What datasets are required to use the customer churn prediction template?

This article outlines the datasets required to use the customer churn template.

To build and train a customer churn model, the Engine needs historical data on customers' behaviour: details of each customer's past transactions, including the date and amount. The Engine uses this information to build a model that predicts the likelihood of churn for each customer.

The customer churn template therefore requires a dataset containing all past customer transactions, called “Transaction Data”. Each row of this dataset must represent a single, unique transaction. The data must include the following three columns:

  • Customer ID: a column that shows the customer’s unique identifier.

  • Transaction date: a column that shows the date of the transaction.

  • Transaction amount: a column that shows the dollar value of the transaction or the number of units sold.

[Image: transaction data columns]
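Before uploading, it can help to confirm that every row carries the three required columns in a usable form. The following is a minimal sketch, not part of the template itself: the column names (customer_id, transaction_date, transaction_amount), the ISO date format (YYYY-MM-DD) and the helper check_transactions are all assumptions made for illustration, and your own file may use different headers or date formats.

```python
import csv
import io
from datetime import date

# Hypothetical column names; the headers in your own file may differ.
REQUIRED_COLUMNS = ("customer_id", "transaction_date", "transaction_amount")


def check_transactions(rows):
    """Return a list of problems found in an iterable of row dicts."""
    problems = []
    for i, row in enumerate(rows, start=1):
        # A required column counts as missing if it is absent or blank.
        missing = [c for c in REQUIRED_COLUMNS if not (row.get(c) or "").strip()]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
            continue
        try:
            # Assumes ISO dates (YYYY-MM-DD); adjust for other formats.
            date.fromisoformat(row["transaction_date"].strip())
        except ValueError:
            problems.append(f"row {i}: unparseable date")
        try:
            float(row["transaction_amount"])
        except ValueError:
            problems.append(f"row {i}: non-numeric amount")
    return problems


# Made-up sample data, for illustration only.
sample = io.StringIO(
    "customer_id,transaction_date,transaction_amount\n"
    "C001,2023-01-15,49.90\n"
    "C001,2023-02-15,49.90\n"
    "C002,not-a-date,12\n"
    "C003,2023-03-01,\n"
)
problems = check_transactions(csv.DictReader(sample))
print(problems)
```

An empty list means every row passed these basic checks. The same customer ID appearing on several rows is expected, since each row is one transaction.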

In addition to the Transaction Data, a “Customer Information” dataset can optionally be included in the template. This dataset must contain one row per customer ID, capturing:

  1. Biographic or demographic profile information, such as the customer's date and place of birth, current employment status, education level, residential address, household size, and annual income.

  2. Any other business-specific or product-specific information about the customer, such as customer type, sign-up date, or ongoing subscription cost.

This dataset must have a column that contains the unique customer identifiers, matching those in the customer identifier column of the transaction dataset:

[Image: Customer ID column]
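The two requirements on the Customer Information dataset (one row per customer ID, and identifiers that match the transaction dataset) can be checked with a short script. This is a rough sketch under stated assumptions: the column name customer_id, the helper check_customer_info and the sample rows are hypothetical and only illustrate the idea.

```python
from collections import Counter


def check_customer_info(customer_rows, transaction_rows, id_column="customer_id"):
    """Report duplicate customer IDs and transaction IDs with no customer row.

    id_column is a hypothetical column name; use the header from your own files.
    """
    counts = Counter(row[id_column] for row in customer_rows)
    # More than one row for the same ID breaks the one-row-per-customer rule.
    duplicates = sorted(cid for cid, n in counts.items() if n > 1)
    # IDs seen in the transactions but absent from the customer information.
    transaction_ids = {row[id_column] for row in transaction_rows}
    unmatched = sorted(transaction_ids - set(counts))
    return {"duplicate_ids": duplicates, "transactions_without_profile": unmatched}


# Made-up sample rows, for illustration only.
customers = [
    {"customer_id": "C001", "customer_type": "retail"},
    {"customer_id": "C002", "customer_type": "business"},
    {"customer_id": "C002", "customer_type": "retail"},
]
transactions = [
    {"customer_id": "C001"},
    {"customer_id": "C003"},
]
report = check_customer_info(customers, transactions)
print(report)
```

Both lists being empty suggests the two datasets line up on the customer identifier. How the Engine treats customers that appear in only one dataset is not covered here, so treat any reported IDs as items to review rather than as confirmed errors.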