How to use the templated ML solution for subscription churn?

In this article, we demonstrate how to use the templated machine-learning solution of the AI & Analytics Engine to predict the possibility of customer churn for a subscription-based business.


Many businesses face the significant challenge of customer churn. Predicting churn early and taking preventative measures can greatly boost a business's growth. To address this, the AI & Analytics Engine offers a user-friendly template that uses machine learning to predict customer churn. The template uses historical customer data and activity to make its predictions, and it currently offers two options:

  1. Predicting whether transactional activity will decline in volume or frequency for each customer who is currently active

  2. Predicting the likelihood of each customer terminating their ongoing paid subscription service

In this article, let's look at the second option, where we predict the likelihood of churn for customers with ongoing subscriptions.

If you’re interested in the first option for a transaction-based business, read this article.

Use case: Customer-churn prediction in telco

Assume you work in the sales department of a mobile-telecommunications service provider. As part of your role, you maintain a customer-retention program. In that program, you generate a weekly report of customers who are likely to churn soon. Then, based on factors such as customer lifetime value, budgets, and current competitor offers, you select the customers to target with retention efforts. The data available to you includes:

  • subscription start dates and churn dates of customers

  • their past events/activity with timestamps

  • their demographics (optional)

The Engine’s customer-churn template uses these datasets to build an application that identifies customers who are likely to churn. It processes the datasets to generate the churn status of each customer, along with any patterns that a machine-learning model can use to predict that status. (See here for more information on how the Engine processes data to that end.)

To demonstrate this, we will be using datasets from an example telco use case that meet the requirements of the subscription-based churn option discussed in this article. There are 4 datasets available.

  1. Subscriptions - This dataset contains past and current customers' subscription start dates and churn dates. If a customer is active, the churn date will be empty.

Preview of subscription dataset

2. Billing - This dataset contains the billing records of the customers. Each statement contains the recurring billing amount, the cost of value-added services, the additional usage amount, roaming-usage charges, and the total amount paid by the customer, along with the date on which the payment was made.

Preview of billing dataset

3. Customer_service_requests - This dataset contains call-center records. Each record represents a service request made by a customer over the phone and the time the request was made. The columns include service-call types, call outcomes, call wait times, and call durations.

Preview of customer-service requests dataset

4. Customer_demographics - This dataset contains demographic data of customers, such as their date of birth and gender.

Preview of customer demographics dataset
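As a rough illustration of how the Subscriptions dataset encodes churn status, the sketch below reads a few rows and flags a customer as active when the churn date is empty. The column names (`customer_id`, `start_date`, `churn_date`) and the sample values are assumptions made for this example; they are not the actual schema of the example datasets, and the code is not the Engine's internal processing.

```python
import csv
import io
from datetime import date

# Hypothetical sample in the shape described above: one row per customer,
# with an empty churn date for customers who are still active.
SAMPLE_SUBSCRIPTIONS = """customer_id,start_date,churn_date
C001,2022-03-05,2022-11-20
C002,2022-04-12,
C003,2022-06-30,2023-01-15
C004,2022-09-01,
"""

def load_subscriptions(text):
    """Parse subscription rows and derive an `is_active` flag."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        churn = raw["churn_date"].strip()
        rows.append({
            "customer_id": raw["customer_id"],
            "start_date": date.fromisoformat(raw["start_date"]),
            # An empty churn date means the subscription is ongoing.
            "churn_date": date.fromisoformat(churn) if churn else None,
            "is_active": not churn,
        })
    return rows

subscriptions = load_subscriptions(SAMPLE_SUBSCRIPTIONS)
active_ids = [r["customer_id"] for r in subscriptions if r["is_active"]]
print(active_ids)
```

A quick check like this, run before uploading, can confirm that churn dates parse correctly and that active customers really do have an empty churn date.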

Creating a Customer Churn Prediction Application

Within the project, we create an application using the “Use template” option and select “Customer Churn”.

Select the “Use template” option

Select the “Customer Churn” template

After we select the template and name the application, the application is created and we are taken to the App builder page.

Within the App builder, we choose the option “Predict if my customers will terminate their ongoing paid subscription”.

Select the Subscription option

In the next steps of the App builder, we add datasets and provide other inputs required for the Engine to create the Churn prediction application.

  1. Firstly, we add the dataset containing the subscription start and churn dates. For example, we can connect to a database table containing this information. We will need to provide the customer identifier column, subscription start date column and the churn date column.

Add subscription data

2. We then add event-log data, for which we have two datasets: the billing dataset and the service requests dataset. For each event-log dataset, we provide a label that designates the type of events it contains, and select the columns containing the customer identifiers and event timestamps. The event-type label enables the Engine to generate feature names in the training dataset that relate to the business use case.

Add billing dataset

Add service requests dataset

3. The last dataset we add is a customer-information dataset. This is an optional step, but adding such a dataset might lead to better model quality later. We need to select the column containing the customer identifiers here as well.

Add customer information dataset

4. Next, we need to define the churn period. There are two parameters to define, depending on the time the business requires to apply retention actions if a customer is predicted to churn. For more details about these churn periods, see here. Let's use 30 days for both parameters.

Define churn periods

5. Next, we define contributing factors that are usually predictive of the likelihood of churn. (Further explanations of contributing factors can be found here.) There are two types of contributing factors:

a. Time-based factors from the event datasets: For each event-log dataset, we select the columns and the relevant rolling date ranges that can potentially be used to identify future churn. We use the default column selection, which is the first 15 numerical columns. For the rolling date ranges, we use “Most 30 recent days” and “30-day range, 30 days ago” for the billing dataset, and “Most 60 recent days” for the customer_service_requests dataset.

Contribution factors from billing dataset. We can define the contribution factors for the service requests dataset in a similar way.

b. Attributes from the customer information dataset: We go with the default selection, which is to have all relevant columns selected.

Contribution factors from customer info dataset

6. Finally, we either let the Engine select the best algorithms for training the churn prediction model (the default) or choose them manually. Here, we let the Engine select the top 3 algorithms.

Select machine learning algorithms to be used

We also have the option to specify the minimum desired prediction quality and maximum training time. This allows us to build models that are estimated to meet both criteria, ensuring that the generated models are of sufficient quality, and training time is within the desired limits.
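The rolling date ranges chosen for the billing dataset in step 5a can be pictured with a small sketch. It assumes that “Most 30 recent days” covers the 30 days up to a reference date and that “30-day range, 30 days ago” covers the 30 days before that. The sample events, the sum aggregation, and the feature names are illustrative assumptions, not the Engine's actual feature-generation logic or naming.

```python
from datetime import date, timedelta

# Hypothetical billing events for one customer: (payment date, total amount).
billing_events = [
    (date(2023, 1, 25), 40.0),
    (date(2023, 2, 24), 45.0),
    (date(2023, 3, 10), 55.0),
]

def window_sum(events, reference, start_days_ago, end_days_ago):
    """Sum amounts dated after (reference - start_days_ago)
    and on or before (reference - end_days_ago)."""
    start = reference - timedelta(days=start_days_ago)
    end = reference - timedelta(days=end_days_ago)
    return sum(amount for day, amount in events if start < day <= end)

reference_date = date(2023, 3, 21)
features = {
    # Assumed reading of "Most 30 recent days".
    "billing_total_amount_sum_last_30_days":
        window_sum(billing_events, reference_date, 30, 0),
    # Assumed reading of "30-day range, 30 days ago".
    "billing_total_amount_sum_30_to_60_days_ago":
        window_sum(billing_events, reference_date, 60, 30),
}
print(features)
```

Comparing the two windows side by side is what lets a model pick up a change in behaviour, such as a drop or rise in billed amounts, rather than only the most recent level.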

At this stage, the app configuration is complete, and we can start the build process by clicking “Start building”. We will be directed to the App summary page with the label “processing”. We can see the progress of the app on the right-side panel.

App building

Once models are trained, the App is ready to use. At this stage we can:

  • View model insights

  • Generate predictions

App summary page when app is ready

Accessing model insights

Model insights are useful for understanding a model’s performance. We can follow the steps below to reach an individual model’s insights page.

Once the app is ready, let’s go to the model leaderboard using the “View all models” button. Here we can see a list of trained models and their performance in terms of prediction quality, prediction time, and training time.

Model leaderboard

If we need more in-depth details on a model’s performance, we can click on the model to open its details page. Here we can see detailed performance insights, such as multiple evaluation metrics and a multiclass confusion matrix.

Model insights for the model with best prediction quality
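For readers unfamiliar with a multiclass confusion matrix, the following sketch shows how one is tallied and how a per-class recall can be read from it. The labels reuse the three time-period names that appear in the prediction output described later in this article; the actual and predicted values are invented for illustration, and the Engine computes its own metrics on its own validation data.

```python
from collections import Counter

# Hypothetical actual and predicted labels for eight customers.
actual = ["safe", "safe", "about_time", "too_late",
          "safe", "about_time", "too_late", "safe"]
predicted = ["safe", "about_time", "about_time", "too_late",
             "safe", "safe", "too_late", "safe"]
labels = ["too_late", "about_time", "safe"]

# Count each (actual, predicted) pair; rows are actual classes,
# columns are predicted classes, both in the order given by `labels`.
pair_counts = Counter(zip(actual, predicted))
matrix = [[pair_counts[(a, p)] for p in labels] for a in labels]

def recall(label):
    """Share of customers truly in `label` that were predicted as `label`."""
    i = labels.index(label)
    row_total = sum(matrix[i])
    return matrix[i][i] / row_total if row_total else 0.0

for label, row in zip(labels, matrix):
    print(label, row, round(recall(label), 2))
```

The diagonal holds the correct predictions; off-diagonal cells show which classes the model confuses with each other.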

We can also generate other important model insights such as feature importance, prediction explanation and What-if analysis from this page.

Generating predictions

On the app’s summary page, we can start making predictions by clicking the “Make Prediction” button. The predictions from this churn prediction template provide the following key information:

  • List of currently active customers

  • Their predicted likelihood of churn in each of the 3 defined time periods: “too_late”, “about_time” and “safe”
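One way to picture the three time periods is the sketch below. It rests on an assumed reading of the two churn-period parameters set earlier (30 days each): churn within the first period is labelled “too_late”, churn within the period that follows is labelled “about_time”, and anything later, or no churn at all, is labelled “safe”. The function and parameter names are hypothetical, and the Engine's exact definitions may differ from this interpretation.

```python
from datetime import date

def churn_period(reference, churn_date,
                 first_period_days=30, second_period_days=30):
    """Label a customer who is active at `reference` by how soon they churn.

    Assumed semantics only; not the Engine's documented definition.
    """
    if churn_date is None:
        return "safe"  # no churn recorded
    days_until_churn = (churn_date - reference).days
    if days_until_churn <= 0:
        raise ValueError("customer is not active at the reference date")
    if days_until_churn <= first_period_days:
        return "too_late"
    if days_until_churn <= first_period_days + second_period_days:
        return "about_time"
    return "safe"

reference = date(2023, 3, 21)
print(churn_period(reference, date(2023, 4, 5)))   # churns 15 days later
print(churn_period(reference, date(2023, 5, 10)))  # churns 50 days later
print(churn_period(reference, None))               # still subscribed
```

Under this reading, the “about_time” group is the one where a retention action still has time to take effect.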

Depending on the requirements of the business, there are two options available to make predictions.

  • Make a one-off prediction - We can use this option if we want to test the model quickly or make a single prediction.

  • Schedule periodic predictions - This option puts the ML prediction pipeline built by the template into production, automatically generating updated predictions periodically with no manual intervention. To schedule churn predictions in this way, we connect our live customer data to the Engine, which periodically ingests the data that has become available since the last prediction and generates the latest predictions. Therefore, we first set up a database connection (for example, to a MySQL database) that allows the relevant tables to be queried periodically to fetch the data required for updated predictions.

Two options for prediction

For more information on these prediction options, read this article.
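The idea of ingesting only the data that has arrived since the last prediction can be sketched as a simple incremental query. The example uses an in-memory SQLite database as a stand-in for a production database such as MySQL; the table name, column names, and sample rows are assumptions for illustration. The Engine performs this ingestion itself once the connection is configured, so this is only a conceptual picture.

```python
import sqlite3

# In-memory SQLite stands in for the production database (for example MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE billing (customer_id TEXT, paid_at TEXT, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO billing VALUES (?, ?, ?)",
    [
        ("C002", "2023-02-24", 45.0),
        ("C002", "2023-03-24", 45.0),
        ("C004", "2023-04-02", 60.0),
    ],
)

def fetch_new_events(connection, last_prediction_date):
    """Return billing rows recorded after the previous prediction run.

    Dates are stored as ISO 8601 strings, so string comparison
    matches chronological order.
    """
    cursor = connection.execute(
        "SELECT customer_id, paid_at, total_amount FROM billing "
        "WHERE paid_at > ? ORDER BY paid_at",
        (last_prediction_date,),
    )
    return cursor.fetchall()

new_rows = fetch_new_events(conn, "2023-03-21")
print(new_rows)
```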

One-off predictions

There are three steps to set up a one-off prediction:

1. Select the model - We select the recommended model which is the model with the highest prediction quality.


Select the model to generate predictions

2. Define prediction input - Depending on data availability, we can either use the data already uploaded to the Engine to make a one-off prediction for the next time period or, if we have more data at the time of prediction, we can use that as well.

Let us use the data already uploaded to the Engine. This data spans from 01 Mar 2022 to 21 Mar 2023. Given this input, the template will therefore predict the likelihoods of churn for the users who were active as of 21 Mar 2023.

Define prediction input

3. Define output destination - This is an optional step that specifies where to export the prediction output. We can export the output as a new table in a database, append it to an existing table in a database, or save it as a dataset in a project in the Engine. Even if we don’t specify an output destination, we can download the prediction output once it is ready.

Optionally define prediction output destination

Once these steps are completed, we can click “Run prediction” to start the prediction process, which directs us to the prediction details page. We can see the prediction status on this page or on the App details page.

prediction processing

Once the prediction status changes to “ready”, we can consume the predictions in three ways:

  • Preview a sample of the output

  • Download the output as a CSV, JSON Lines, or Parquet file

  • Export the output to a dataset within the Engine or to an external database

Prediction preview

Prediction output as a CSV. This file also contains the features generated by the Engine; those columns are hidden in this image for clarity.
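Once the output is downloaded as a CSV, a retention team might shortlist customers from it along the following lines. The column names, the sample likelihoods, and the 0.5 threshold are hypothetical; the real file's layout should be checked against the prediction preview. The choice to rank by the “about_time” likelihood is also an assumption about which group is most useful to target.

```python
import csv
import io

# Hypothetical prediction output; the real file's column names may differ.
SAMPLE_OUTPUT = """customer_id,too_late,about_time,safe
C002,0.10,0.65,0.25
C004,0.05,0.15,0.80
C007,0.20,0.50,0.30
C009,0.70,0.20,0.10
"""

def shortlist(text, threshold=0.5):
    """Customers whose 'about_time' likelihood meets the threshold,
    sorted from highest to lowest likelihood."""
    rows = list(csv.DictReader(io.StringIO(text)))
    picked = [r for r in rows if float(r["about_time"]) >= threshold]
    picked.sort(key=lambda r: float(r["about_time"]), reverse=True)
    return [(r["customer_id"], float(r["about_time"])) for r in picked]

retention_targets = shortlist(SAMPLE_OUTPUT)
print(retention_targets)
```

In practice the shortlist would be combined with the business factors mentioned in the use case, such as customer lifetime value and available budget, before deciding whom to contact.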

Schedule periodic predictions

Scheduling periodic predictions requires 4 steps to complete:

1. Select the model - Same as one-off predictions above.

2. Define the prediction input - In this step, we provide a connection to the database tables/collections containing up-to-date subscription start and churn dates, events, and customer information. (If we didn't provide customer information while building the app, we won't be asked for it.) We provide the data connection by choosing the database server type and entering the credentials and the details of the corresponding tables.


Providing a connection to a database to get inputs for periodic predictions

3. Define the output destination - This is the same as for one-off predictions above.

4. Schedule - Here we specify when and how often we want to get predictions. For example, let’s schedule them for 9 a.m. on the 1st day of every month.

Define the schedule for periodic predictions
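To make the example schedule concrete, the sketch below computes when a run set for 9 a.m. on the 1st day of every month would next fire. The function name and its parameters are hypothetical and time zones are ignored; in the Engine, the schedule is configured through the form shown above rather than in code.

```python
from datetime import datetime

def next_monthly_run(now, day=1, hour=9):
    """Next occurrence of `hour`:00 on day `day` of a month, strictly after `now`.

    This sketch only supports days 1 to 28, which exist in every month.
    """
    if not 1 <= day <= 28:
        raise ValueError("day must be between 1 and 28 in this sketch")
    candidate = now.replace(day=day, hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Roll over to the following month, wrapping December into January.
        if candidate.month == 12:
            candidate = candidate.replace(year=candidate.year + 1, month=1)
        else:
            candidate = candidate.replace(month=candidate.month + 1)
    return candidate

print(next_monthly_run(datetime(2023, 3, 21, 10, 0)))
print(next_monthly_run(datetime(2023, 12, 1, 9, 0)))
```

Readers familiar with cron would write the same schedule as `0 9 1 * *`.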


Once these steps are completed, we can click “Run prediction” to start the prediction process. As with one-off predictions, we can see the prediction status on the prediction details page or on the App details page. Note, however, that we won’t see any predictions until the scheduled time.

Once predictions are made automatically by the Engine at scheduled times, we can consume them in the same three ways as one-off predictions.

Conclusion

In this article, we demonstrated a convenient and easy-to-use templated machine-learning solution using the AI & Analytics Engine. This solution focuses specifically on predicting customer churn for businesses operating on a paid-subscription model. It allows business users with similar problems to apply customer-churn prediction in their business without the complex task of writing software to manually put together the components of an ML pipeline for data preparation, ingestion, training, validation, tuning, evaluation, and scheduled prediction runs.