What features are generated after the Engine processes the data in a subscription option in the customer churn prediction app?

This article explains the features that are generated after the Engine processes the data in a subscription option in the customer churn prediction app.

Features are known attributes used as input by machine learning models to predict the unknown target.

For churn prediction, the Engine automatically generates features from the transaction data and the customer information data provided by the user. These features summarize customer behavior over different periods of time.

Different types of aggregated features are generated over the selected time windows:

  • Events-based features: minimum, maximum, standard deviation and total for the selected attributes of logged events

  • Count-based features: counts for the event attribute

  • Time-interval based features: minimum, maximum, average and standard deviation of the number of days between events

  • Recency features: days since last event

  • Additional features from the customer info data and subscription data
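The aggregations listed above can be illustrated with a minimal sketch. It is not the Engine's implementation: the function names, the sample events, the reference date, the window boundaries (start exclusive, end inclusive) and the use of the population standard deviation are all assumptions made for illustration.

```python
from datetime import date, timedelta
from statistics import mean, pstdev


def window_features(events, as_of, window_days):
    """Aggregate (event_date, value) pairs from the last `window_days` days.

    Assumption: the window covers dates d with as_of - window_days < d <= as_of.
    """
    start = as_of - timedelta(days=window_days)
    in_window = sorted((d, v) for d, v in events if start < d <= as_of)
    dates = [d for d, _ in in_window]
    values = [v for _, v in in_window]
    # Days between consecutive events inside the window.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return {
        "count": len(values),
        "min": min(values, default=None),
        "max": max(values, default=None),
        "total": sum(values),
        # Population standard deviation is an assumption; the Engine may differ.
        "stddev": pstdev(values) if values else None,
        "min_days_btw_events": min(gaps, default=None),
        "max_days_btw_events": max(gaps, default=None),
        "avg_days_btw_events": mean(gaps) if gaps else None,
    }


def days_since_last_event(events, as_of):
    """Recency: days between the reference date and the most recent event."""
    past = [d for d, _ in events if d <= as_of]
    return (as_of - max(past)).days if past else None


# Hypothetical events for one customer: (date, attribute value).
events = [
    (date(2023, 5, 10), 10),
    (date(2023, 5, 20), 30),
    (date(2023, 5, 28), 20),
]
as_of = date(2023, 5, 31)  # hypothetical reference date

last_15d = window_features(events, as_of, 15)
last_30d = window_features(events, as_of, 30)
recency = days_since_last_event(events, as_of)
print(last_15d)
print(last_30d)
print(recency)
```

With these sample events, the 15-day window contains two events and the 30-day window contains all three, which shows how the same attribute yields a different value for each selected window.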

The user selects the time windows used to generate these aggregated features (referred to as “contributing factors”) when defining time-based factors from the events datasets:

Specifying time windows (bottom) for computing contributing factors (upper) from a streaming sessions events logs dataset

As an example, assume we have an imaginary streaming company called PetFlix.

The company logs the streaming sessions in which its customers view pet videos. It also holds some information about its customers (all of whom are assumed to be active).

Subscription information:

User ID | sign_up_date | churn_date
1 | May 1, 2023 |
2 | May 1, 2023 |
3 | May 1, 2023 |

Streaming logs (we give this dataset the nickname "streaming_sessions"):

User ID | Date | session_length_minutes
1 | May 29, 2023 | 20
2 | May 30, 2023 | 30
3 | May 30, 2023 | 40
4 | May 30, 2023 | 20

Customer information:

User ID | Gender | Date of Birth (DOB)
1 | F | Oct 23, 1999
2 | F | Apr 17, 1983
3 | M | Feb 29, 1984

The Engine then generates the following features. For the event-activity-based features, suffixes such as “_last_30d” and “_last_15d” correspond to the rolling date ranges that the business user confirmed when selecting the time-based contributing factors.
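The naming pattern of the event-based features can be sketched as follows. This is an illustration inferred from the feature names in this article, not the Engine's code; the helper `event_feature_names` and its parameters are hypothetical.

```python
def event_feature_names(dataset, attribute, windows_days):
    """Build event-based feature names for one dataset nickname and attribute.

    Hypothetical helper: the pattern is inferred from the names in this article.
    """
    names = []
    for days in windows_days:
        suffix = f"_last_{days}d"  # rolling window selected by the user
        names.append(f"count_of_{dataset}{suffix}")
        for stat in ("min", "max", "total", "stddev"):
            names.append(f"{stat}_{attribute}_in_{dataset}{suffix}")
        for stat in ("min", "max", "avg", "stddev"):
            names.append(f"{stat}_days_btw_events_in_{dataset}{suffix}")
    # Recency has no window suffix.
    names.append(f"days_since_last_event_in_{dataset}")
    return names


names = event_feature_names("streaming_sessions", "session_length_minutes", [15, 30])
for name in names:
    print(name)
```

Each selected window adds its own set of count, attribute and interval features, so selecting more windows or more attributes increases the number of generated features accordingly.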

Description: Feature name

Events-based features

Count based:

  • count_of_streaming_sessions_last_15d
  • count_of_streaming_sessions_last_30d

Stats from event attributes:

  • min_session_length_minutes_in_streaming_sessions_last_30d
  • min_session_length_minutes_in_streaming_sessions_last_15d
  • max_session_length_minutes_in_streaming_sessions_last_30d
  • max_session_length_minutes_in_streaming_sessions_last_15d
  • total_session_length_minutes_in_streaming_sessions_last_30d
  • total_session_length_minutes_in_streaming_sessions_last_15d
  • stddev_session_length_minutes_in_streaming_sessions_last_30d
  • stddev_session_length_minutes_in_streaming_sessions_last_15d

Interval between events:

  • min_days_btw_events_in_streaming_sessions_last_30d
  • min_days_btw_events_in_streaming_sessions_last_15d
  • max_days_btw_events_in_streaming_sessions_last_30d
  • max_days_btw_events_in_streaming_sessions_last_15d
  • avg_days_btw_events_in_streaming_sessions_last_30d
  • avg_days_btw_events_in_streaming_sessions_last_15d
  • stddev_days_btw_events_in_streaming_sessions_last_30d
  • stddev_days_btw_events_in_streaming_sessions_last_15d

Recency:

  • days_since_last_event_in_streaming_sessions

Features from subscription start and end dates dataset:

  • Tenure (days since subscription start date)

Time based features from customer info data:

  • year_of_dob
  • month_of_dob
  • week_in_year_of_dob
  • weekday_of_dob
  • days_since_dob

Other features from customer info data:

  • gender
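As a rough illustration, the tenure and date-of-birth features for PetFlix user 1 could be derived as shown below. The reference date, the function names, the ISO week numbering and the Monday = 1 weekday convention are assumptions; the Engine may use different conventions.

```python
from datetime import date


def dob_features(dob, as_of):
    """Time-based features derived from a date of birth."""
    return {
        "year_of_dob": dob.year,
        "month_of_dob": dob.month,
        # ISO week number (assumed convention).
        "week_in_year_of_dob": dob.isocalendar()[1],
        # 1 = Monday ... 7 = Sunday (assumed convention).
        "weekday_of_dob": dob.isoweekday(),
        "days_since_dob": (as_of - dob).days,
    }


def tenure_days(sign_up_date, as_of):
    """Tenure: days since the subscription start date."""
    return (as_of - sign_up_date).days


as_of = date(2023, 5, 31)  # hypothetical reference date

# PetFlix user 1: DOB Oct 23, 1999; signed up May 1, 2023.
user_1 = dob_features(date(1999, 10, 23), as_of)
user_1["tenure"] = tenure_days(date(2023, 5, 1), as_of)
print(user_1)
```

Under these assumptions, user 1 has a tenure of 30 days on the reference date, and the date of birth is split into year, month, week and weekday components that a model can use directly.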