How to use the templated ML solution for subscription churn?

This article demonstrates how to use an ML Solution Template to predict the likelihood of customer churn for a subscription-based business.

Introduction

Many businesses face the significant challenge of customer churn. It is crucial to predict churn early so retention initiatives can be put into place proactively.

To address this concern, the AI & Analytics Engine offers a template that uses machine learning to predict customer churn.

The template uses historical customer data and customer activity to make accurate predictions.

Currently you have two Customer Churn template options:

  1. Predicting whether transactional activity will decline in volume or frequency for each customer who is currently active

  2. Predicting the likelihood of each customer terminating their ongoing paid subscription service

In this article you will be guided through the second option where you will predict the likelihood of churn for customers with ongoing subscriptions.

🎓 To learn more about the Engine’s template for subscription-based businesses, read how to use the templated ML solution for the subscription option in the customer churn prediction template?

Use case: Customer-churn prediction in telco

Assume you work in the sales department of a mobile-telecommunications service provider. As part of your role, you maintain a customer retention program. In that program, you generate a weekly report of customers who are likely to churn soon, and then use your domain understanding to select the customers to target with retention efforts.

The Engine’s customer-churn template uses your customer datasets to build an application that identifies customers who are likely to churn.

It processes these datasets to derive the churn status of each customer, along with any patterns that a machine-learning model can use to predict that status.

🎓 To learn more about how the Engine processes data read how does the Engine process data for the subscription option in the customer churn template?

Data

To demonstrate this, datasets from an example telco use case are used in this article.

💡 Follow along with this guide by downloading the datasets.

There are 4 datasets available:

Subscriptions - Tabular data that contains past and current customers' subscription start dates (registration_date) and churn dates (churn_date). If a customer is active, the churn_date will be empty. Each row represents a customer with a unique customer_id.


Preview of subscription dataset

 

Billing - This dataset contains the billing records of the customers, identifiable by their customer_id. Each statement contains a recurrent billing amount (recurrent_amount), the cost of value-added services, the additional usage amount, roaming-usage charges and the total amount (total_amount) paid by each customer, along with the date on which each payment was made (billing_date).

Preview of billing dataset

 

Customer_service_requests - This dataset contains call center records. Each record represents a service request made over the phone by a customer, together with the time the request was made. The columns include service-call types, call outcomes, call wait times and call durations.

Preview of customer-service requests dataset

 

Customer_demographics - This dataset contains demographic data for each customer (customer_id), such as their date_of_birth and gender.

Preview of customer demographics dataset

💡 Adding a customer demographics dataset is optional. It may provide additional signals to improve the performance of the model.

Creating a Customer churn prediction app with the template

In this section, you can follow some simple steps and provide the required information to create an application that uses machine-learning models to predict customer churn.

You’ll observe how this templated solution shortens the time between feeding raw data to getting insightful predictions.

Within the project, select Customer Churn Prediction under Machine Learning Solution Templates.

Select Customer Churn

 

Then create the application by selecting Use this template and naming the application. This takes you to the App builder pipeline.

select the subscription customer churn

Step 1: Select a template that matches your business type

Within the App builder pipeline, choose the option “Predict if my customers will terminate their ongoing paid subscription”.

Choose the first option for subscription businesses

Step 2: Define business requirements

This step is to define your business requirements. This requires an understanding of what churn means to your business and the lead time required to action retention initiatives.

The labels too_late, about_time and safe are the classes that the model predicts from your data. In this step, you define what each of these labels means for your business.

For example:

Too late: If a customer churns within the 30-day period following the prediction date, it is too late for the business to start its retention tactics.

For this use case, use 30 days as the churn period for the too_late label, and the 30 days after that (days 31 to 60 from the date of prediction) for the about_time label.

Entering these parameters automatically assigns the safe label to customers who do not churn within the combined too_late and about_time prediction windows (60 days).
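The labelling rule described in this step can be sketched in a few lines of Python. This is only an illustration, not the Engine's implementation: the function name `churn_label`, its parameters and the exact boundary handling (day 30 counts as too_late, day 60 as about_time) are assumptions made for the example.

```python
from datetime import date
from typing import Optional


def churn_label(
    prediction_date: date,
    churn_date: Optional[date],
    too_late_days: int = 30,
    about_time_days: int = 30,
) -> str:
    """Return too_late, about_time or safe for one customer (hypothetical helper)."""
    if churn_date is None:
        # No churn date recorded: the customer is still active.
        return "safe"
    days_to_churn = (churn_date - prediction_date).days
    if days_to_churn < 1:
        # Assumption: only customers active on the prediction date are labelled.
        raise ValueError("customer had already churned on the prediction date")
    if days_to_churn <= too_late_days:
        return "too_late"
    if days_to_churn <= too_late_days + about_time_days:
        return "about_time"
    return "safe"
```

With the 30-day windows chosen here and a prediction date of 21 Mar 2023, a customer who churns on 20 Apr 2023 (day 30) falls under too_late, while one who churns on 21 Apr 2023 (day 31) falls under about_time.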

Use 30 days as the prediction frequency. This means that the models will be tested on the last 30-day portion of the data.

Define churn periods and prediction frequency

Step 3: Review data requirements and import your data

Based on the values provided when you defined the business requirements, you need data from recent history that spans at least 270 days. This is because of the churn periods you identified for your labels, as well as the test portion defined by the prediction frequency. The Engine then works out a minimum training portion to build the model.
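Before importing, you may want to confirm that your own data covers the required history. The sketch below shows one way to do that check; the constant `MIN_HISTORY_DAYS` and the helper `history_span_days` are hypothetical names for this example, and the 270-day figure is simply the minimum quoted above for the settings chosen in this article.

```python
from datetime import date

MIN_HISTORY_DAYS = 270  # minimum quoted in this step for the chosen settings


def history_span_days(first_record: date, last_record: date) -> int:
    """Number of days between the earliest and the latest record."""
    return (last_record - first_record).days


# The example data in this article runs from 01 Mar 2022 to 21 Mar 2023.
span = history_span_days(date(2022, 3, 1), date(2023, 3, 21))
has_enough_history = span >= MIN_HISTORY_DAYS
print(span, has_enough_history)
```

For the example data the span is 385 days, which comfortably exceeds the 270-day minimum.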

The data you need is:

  • Subscription start and churn dates (required data)

  • One or more event logs (required data)

  • Customer info (optional data)

Review data requirements

 

The next step is to add the dataset containing the subscription start and churn dates.

The churn dates for active customers are empty, and each customer should have only one record in this dataset.
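A quick pre-upload check of these two rules might look like the sketch below. It assumes a CSV file with the columns customer_id, registration_date and churn_date from the example dataset and ISO-formatted dates; the helper name `validate_subscriptions` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter
from datetime import date


def validate_subscriptions(rows):
    """Return a list of human-readable problems found in subscription rows."""
    problems = []
    # Rule 1: each customer should appear exactly once.
    counts = Counter(row["customer_id"] for row in rows)
    for customer_id, n in counts.items():
        if n > 1:
            problems.append(f"{customer_id}: {n} records, expected 1")
    # Rule 2: an empty churn_date means active; otherwise it must not
    # precede the subscription start date.
    for row in rows:
        start = date.fromisoformat(row["registration_date"])
        if row["churn_date"]:
            end = date.fromisoformat(row["churn_date"])
            if end < start:
                problems.append(
                    f"{row['customer_id']}: churn_date before registration_date"
                )
    return problems


# Tiny made-up sample; real data would come from your own file or table.
sample = io.StringIO(
    "customer_id,registration_date,churn_date\n"
    "C001,2022-03-01,\n"
    "C002,2022-04-15,2022-12-01\n"
    "C002,2022-04-15,2022-12-01\n"
    "C003,2022-06-10,2022-05-01\n"
)
rows = list(csv.DictReader(sample))
problems = validate_subscriptions(rows)
for problem in problems:
    print(problem)
```

In this sample, C002 is flagged for having two records and C003 for a churn date that precedes its start date.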

You can connect to a database table or upload the data containing this information.

You will need to provide the customer identifier column (customer_id), the subscription start date column (registration_date) and the churn date column (churn_date).

Add subscription data

 

Then add the event-logs data, which in this use case consists of two datasets: the billing dataset and the customer_service_requests dataset.

For each event-logs dataset, you will need to provide a label to designate the type of events it contains, and select the columns containing the customer identifiers and event timestamps.

The event-type label provided here enables the Engine to give the features in the generated training dataset names that relate to the business use case.

Add customer_service_requests dataset and billing dataset

 

The last dataset you can add is the optional customer-information dataset. Adding such a dataset about customers might lead to better model quality later.

You need to select the column containing the customer identifiers here as well.

Add customer information dataset

Step 4: Define contributing factors

Next, define contributing factors that are usually predictive with respect to the likelihood of churn.

🎓 To learn more about contributing factors, read what do "contributing factors" mean in the customer churn prediction template.

Time-based factors from the events datasets

For each event-logs dataset, select the columns and the rolling date ranges that can potentially be used to identify future churn.

Use the default setting for column selection, which is to use the first 15 numerical columns.

For the rolling date-range windows, use “Most 30 recent days”, “30-day range, 30 days ago” and “30-day range, 60 days ago” for the customer_service_requests dataset and “Most 60 recent days” for the billing dataset.

Contributing factors from the customer_service_requests dataset. You can define the contributing factors for the billing dataset in a similar way.
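To make the idea of rolling date ranges concrete, the sketch below counts service requests per customer in three consecutive 30-day windows before a reference date. It is an illustration under stated assumptions, not the Engine's feature-generation code: the helper `count_events`, the feature names and the sample events are all hypothetical, and the window boundaries (start exclusive, end inclusive) are a choice made for this example.

```python
from collections import Counter
from datetime import date, timedelta


def count_events(events, as_of, window_days, offset_days=0):
    """Count events per customer in a window ending `offset_days` before `as_of`.

    `events` is an iterable of (customer_id, event_date) pairs.
    The window is (end - window_days, end], where end = as_of - offset_days.
    """
    end = as_of - timedelta(days=offset_days)
    start = end - timedelta(days=window_days)
    counts = Counter()
    for customer_id, event_date in events:
        if start < event_date <= end:
            counts[customer_id] += 1
    return counts


# Made-up service-request events for two customers.
events = [
    ("C001", date(2023, 3, 20)),
    ("C001", date(2023, 3, 1)),
    ("C001", date(2023, 2, 10)),
    ("C002", date(2023, 1, 5)),
]
as_of = date(2023, 3, 21)

features = {
    "requests_last_30d": count_events(events, as_of, 30),
    "requests_30d_range_30d_ago": count_events(events, as_of, 30, offset_days=30),
    "requests_30d_range_60d_ago": count_events(events, as_of, 30, offset_days=60),
}
```

Comparing the same measure across adjacent windows is what lets a model pick up trends, such as a customer whose service requests have recently increased.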

Attributes from the customer information dataset

Use the default selection, which is to have all relevant columns selected.

Contributing factors from customer info dataset

Step 5: Build models

The last step in the App builder pipeline is to select the algorithms for training.

You can either let the Engine select the best algorithms for training the churn prediction model (default) or you can choose manually.

For this use case, continue by selecting Find and train the top 3 algorithms.

Select machine learning algorithms to be used

 

You can also specify the minimum desired prediction quality and maximum training time. This allows you to build models that are estimated to meet both criteria, ensuring that the generated models are of sufficient quality, and training time is within the desired limits.

At this stage, the app configuration is complete, and you can start the build process by clicking Start building.

You will be directed to the App summary page with the label processing. You can see the progress of the app on the right-side panel. Once models are trained, the App is ready to use.

At this stage, you can:

  • View model insights: select View models

  • Generate predictions: select Make a prediction

App summary page when app is ready

 

Accessing model insights

Model insights are useful for understanding the model’s performance.

Follow the steps below to reach an individual model’s insights page.

Once the app is ready, go to the Model leaderboard using the Models tab. Here you can see a list of trained models and their performance in terms of prediction quality and training time.

Model leaderboard

 

If you need more in-depth details on a model’s performance, you can click on the model to go to its Model details page. Here you can see detailed performance insights such as multiple evaluation metrics and a multi-class confusion matrix.

Model insights for the model with best prediction quality
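If you have not read a multi-class confusion matrix before, the sketch below shows how one is built for the three labels used in this template. The labels come from this article; the helper `confusion_matrix` and the six sample customers are made up, and the Engine computes its own version of this table for you.

```python
from collections import Counter

LABELS = ["too_late", "about_time", "safe"]


def confusion_matrix(actual, predicted, labels=LABELS):
    """Return a nested dict: matrix[actual_label][predicted_label] -> count."""
    pairs = Counter(zip(actual, predicted))
    return {a: {p: pairs[(a, p)] for p in labels} for a in labels}


# Made-up actual and predicted labels for six customers.
actual = ["safe", "safe", "too_late", "about_time", "safe", "too_late"]
predicted = ["safe", "about_time", "too_late", "about_time", "safe", "safe"]

matrix = confusion_matrix(actual, predicted)
# Correct predictions sit on the diagonal (actual label == predicted label).
accuracy = sum(matrix[label][label] for label in LABELS) / len(actual)
```

Off-diagonal cells show which labels the model confuses. Here, for example, one customer who actually churned in the too_late window was predicted as safe.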

 

Apart from the evaluation in the Performance tab, you can also generate other model insights in the corresponding tabs, such as feature importance, prediction explanation and what-if analysis.

Generating predictions

On the App summary page you can start the process of making predictions by clicking Make predictions.

The predictions from the Churn prediction template provide the following:

  • A list of currently active customers

  • Their predicted likelihood of churn for each of the 3 defined time-period labels:

    • prob_too_late

    • prob_about_time

    • prob_safe

Two options are available for making predictions:

    1. Make a one-off prediction: You can use this option if you want to test the model quickly or make a single prediction.

    2. Schedule periodic predictions: This option puts the ML prediction pipeline built by the template into production, automatically generating updated predictions at regular intervals with no manual intervention. To schedule churn predictions in this way, you connect your live customer data to the Engine, which periodically ingests the data that has become available since the last prediction and generates the latest predictions. The first step is to set up a database connection (e.g. a MySQL database) that allows the Engine to query the relevant tables periodically and fetch the data required to generate new predictions.

Two options for prediction

🎓 For more information about these prediction options, read what are the options for predictions in the customer churn prediction template.

Make a one-off prediction

There are three steps to set up a one-off prediction.

1. Select the model

Use the recommended model which is the model with the highest prediction quality.

Select the model to generate predictions

 

2. Define prediction input

Depending on the data availability, you can either use the data already uploaded to the Engine to make a one-off prediction for the next time period or, if you have more data at the time of prediction, you have the option to use it as well.

For this use case, you can continue by using the data already uploaded to the Engine.

The data uploaded spans from 01 Mar 2022 to 21 Mar 2023. Given the input, the template will predict the likelihoods of churn for the users who were active as of 21 Mar 2023.
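The notion of "active as of 21 Mar 2023" can be expressed as a simple rule on the subscription dates. The sketch below is an assumption about how such a filter could look, not the Engine's own logic; in particular, treating a customer whose churn date equals the reference date as no longer active is a choice made for this example, and the customer records are made up.

```python
from datetime import date


def is_active(registration_date, churn_date, as_of):
    """True if the subscription has started and has not ended by `as_of`."""
    started = registration_date <= as_of
    not_churned = churn_date is None or churn_date > as_of
    return started and not_churned


# Made-up records: (customer_id, registration_date, churn_date or None).
customers = [
    ("C001", date(2022, 3, 1), None),
    ("C002", date(2022, 4, 15), date(2022, 12, 1)),
    ("C003", date(2023, 3, 21), None),
]
as_of = date(2023, 3, 21)

active = [cid for cid, start, end in customers if is_active(start, end, as_of)]
```

Only the customers in `active` (here C001 and C003) would receive churn likelihoods in the prediction output.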

Define prediction input

 

3. Define output destination

This is an optional step to specify where to export the prediction output.

You can export the output as a new table in a database, append it to an existing table in a database, or save it as a dataset in a project in the Engine.

Even if you don’t specify an output destination, you can download the prediction output once it is ready.

Optionally define prediction output destination

 

Once these steps are completed you can click Run prediction to start the prediction process which will direct you to the Prediction details page. You can see the prediction status in this prediction details page or in the App details page.

prediction processing

Once the prediction status changes to ready, you can consume the predictions in three ways.

  1. Preview a sample of the output

  2. Download the output as a CSV, JSON Lines or Parquet file

  3. Export output to a dataset within the Engine or to an external database

Prediction preview after exporting to the Engine

Prediction output as a CSV. This file also contains the features generated by the Engine; those columns are hidden in this image for clarity.
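Once downloaded, the CSV output can feed directly into the weekly retention report described in the use case. The sketch below ranks customers by prob_about_time, since those are the customers for whom retention efforts can still be started in time. The probability column names come from this article; the customer_id column in the output, the helper `top_at_risk`, the 0.5 threshold and the sample rows are assumptions for illustration.

```python
import csv
import io


def top_at_risk(rows, column="prob_about_time", threshold=0.5):
    """Return customer ids whose probability in `column` meets the threshold,
    highest probability first."""
    flagged = [
        (float(row[column]), row["customer_id"])
        for row in rows
        if float(row[column]) >= threshold
    ]
    return [customer_id for _, customer_id in sorted(flagged, reverse=True)]


# Made-up output; a real file would be read with open("<your file>.csv").
sample = io.StringIO(
    "customer_id,prob_too_late,prob_about_time,prob_safe\n"
    "C001,0.05,0.15,0.80\n"
    "C002,0.10,0.70,0.20\n"
    "C003,0.60,0.30,0.10\n"
    "C004,0.05,0.55,0.40\n"
)
rows = list(csv.DictReader(sample))
retention_targets = top_at_risk(rows)
```

The threshold is a business decision rather than a fixed rule: lowering it widens the list of customers to review, and your domain understanding still decides who is actually contacted.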

 

Schedule periodic predictions

Scheduling periodic predictions requires 4 steps to complete:

1. Select the model

This is the same as for one-off predictions above.

2. Define the prediction input

Provide a connection to a database table or collection containing up-to-date subscription start and churn dates, events and customer-info data. (If you didn't provide customer-info data while building the app, you won't be asked for it.)

Provide the data connection by choosing the database server type and entering the credentials and details of the corresponding tables.


Providing connection to a database to get inputs for periodic predictions

3. Define the output destination

This is the same as for one-off predictions above.

4. Schedule

Specify when and how often you want predictions to be generated.

For example: Schedule predictions at 9 a.m. on the 1st day of every month.

Define the schedule for periodic predictions
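To see what the example schedule means in practice, the sketch below lists the next few run times for "9 a.m. on the 1st day of every month". The Engine handles scheduling itself through the form shown above; the helper `next_monthly_runs` is a hypothetical name used only to illustrate when the first predictions would appear.

```python
from datetime import datetime


def next_monthly_runs(after, count, day=1, hour=9):
    """Return the next `count` run times at `hour`:00 on `day` of each month.

    `day` should be 28 or lower so that it exists in every month.
    """
    year, month = after.year, after.month
    runs = []
    while len(runs) < count:
        candidate = datetime(year, month, day, hour)
        if candidate > after:
            runs.append(candidate)
        # Move to the following month, rolling over the year after December.
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return runs


# A schedule created at midday on 21 Mar 2023 first runs on 1 Apr 2023.
runs = next_monthly_runs(datetime(2023, 3, 21, 12, 0), count=3)
```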

 

Once these steps are completed, you can click Run prediction to start the prediction process. In a similar manner to the one-off predictions, you can see the prediction status on the Prediction details page or the App details page. However, note that you won’t see any predictions until the scheduled time.

Once predictions are made automatically by the Engine at scheduled times, you can consume them in the same three ways as one-off predictions.

Conclusion

This article demonstrated a convenient and easy-to-use ML Solution Template in the AI & Analytics Engine. The solution focuses specifically on predicting customer churn for businesses operating on a paid-subscription model.