How are the models evaluated in the customer churn prediction template?

This article explains how the models are evaluated in the customer churn prediction template.

The Customer Churn Prediction template in the Engine offers an end-to-end ML solution for predicting which customers are likely to churn. Predictions are based on either customers' historical transactional activity or activity recorded in event logs of various kinds, and optionally on customer information. In this template, trained models are automatically evaluated on the test portion of the prepared dataset. Each record in that dataset indicates, at a particular point in time, whether a customer would churn after a given prediction lead period (click here & here for more details).

In particular, the last 30-day period in the prepared dataset is reserved as the test set and the rest is used to train models. Splitting the data by time in this way is a best practice in machine learning for time-sensitive problems such as customer churn prediction with historical transactions. A beginner ML practitioner may mistakenly use a random split instead, which can cause temporal leakage: the model is trained on records from the same period as, or later than, the records it is tested on, so its evaluation scores look better than the performance it achieves once put into production (see the diagram below for a comparison of the two ways of splitting).

 

Figure: Comparison between a time-based split and a random split
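
The time-based split described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Engine's implementation: the function name `time_based_split`, the `snapshot_date` field, and the toy rows are all hypothetical, and the 30-day window is taken from the description above.

```python
from datetime import date, timedelta


def time_based_split(rows, date_key="snapshot_date", test_days=30):
    """Reserve the last `test_days` days of `rows` as the test set.

    `rows` is a list of dicts; `date_key` names a (hypothetical) date field.
    """
    latest = max(row[date_key] for row in rows)
    cutoff = latest - timedelta(days=test_days)
    # Everything up to and including the cutoff is used for training;
    # everything strictly after it is held out for testing.
    train = [row for row in rows if row[date_key] <= cutoff]
    test = [row for row in rows if row[date_key] > cutoff]
    return train, test


# Hypothetical toy data: one record per day for 90 consecutive days.
rows = [
    {
        "customer_id": i % 5,
        "snapshot_date": date(2023, 1, 1) + timedelta(days=i),
        "churned": i % 2,
    }
    for i in range(90)
]

train, test = time_based_split(rows)
print(len(train), len(test))
```

Because every training record is dated before every test record, the evaluation mimics how the model is used in production, where it only ever sees the past. A random split would instead mix dates across both sets.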

 

The Engine automates this best practice as a built-in feature, so non-ML experts do not have to implement it manually, which could take considerably more time and effort (see the screenshot of an App setup below).

Figure: Flow of an app setup for the transactional option in the customer churn prediction template. The split process for the subscription option is similar.

In the evaluation report, the Engine offers a comprehensive list of metrics as well as visualizations. The details differ based on the choice of option (transactional/subscription):

  • Binary classification is used for the transactional option, as we want to predict whether or not each customer will have reduced transaction activity in a certain time window. See here for details about the evaluation metrics for this case, and here for explanations about visualizations of evaluation such as the ROC curve and PR curve.

  • Multi-class classification is used for the subscription option, as we want to predict one of three possibilities for each customer: it is too late to act, it is about time to act, or it is safe not to act. See here for details about the evaluation metrics for this case.
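
To make the two evaluation settings concrete, the sketch below computes precision, recall, and F1 for a binary case, and a macro-averaged F1 for a three-class case. These are standard classification metrics chosen for illustration; the exact list reported by the Engine is given in the pages linked above. The label names (`too_late`, `act_now`, `safe`) and the toy predictions are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = [precision_recall_f1(y_true, y_pred, c)[2] for c in classes]
    return sum(scores) / len(scores)


# Binary case (transactional option): 1 = reduced activity, 0 = otherwise.
bin_true = [1, 1, 1, 0, 0, 0, 0, 1]
bin_pred = [1, 1, 0, 0, 0, 1, 0, 1]
bin_precision, bin_recall, bin_f1 = precision_recall_f1(bin_true, bin_pred, positive=1)

# Three-class case (subscription option), with hypothetical label names.
multi_true = ["too_late", "act_now", "safe", "safe", "act_now", "too_late"]
multi_pred = ["too_late", "safe", "safe", "safe", "act_now", "act_now"]
multi_macro_f1 = macro_f1(multi_true, multi_pred)

print(bin_precision, bin_recall, bin_f1, round(multi_macro_f1, 4))
```

Macro averaging gives each class equal weight regardless of how many customers fall into it, which is one common choice when the classes are imbalanced.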


For users with a less technical background, the Engine also summarizes the performance of a model as “Prediction Quality” on the Model Leaderboard. See the sections on Binary and Multi-class Classification on this page to see how it is derived for the transactional and the subscription cases, respectively.