What is a Feature Set?

A feature set is a candidate subset of the available columns (independent variables) chosen as the input to an ML model. The model can be for classification, regression, clustering, or anomaly detection.

A feature set does not include the target variable. In the Engine, users can create different feature sets to see how predictions change when certain variables (columns) are added or removed.

Feature Set Example

We have a dataset of customer transactions, from which we want to predict whether a particular transaction is fraudulent. This dataset contains these columns:

  • year

  • month

  • day_of_month

  • day_of_week

  • customer_id

  • amount

  • account_balance

  • customer_age

  • location

  • is_fraudulent (target column)

To make this prediction, we choose the columns we consider relevant and group them into a feature set. Neither the target column nor every available column has to appear in it. Here are two possible feature sets:

Feature set 1: {day_of_month, day_of_week, amount, account_balance, customer_age}

Feature set 2: {amount, location}
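As an illustration only, the two feature sets above can be written out in plain Python. This is a minimal sketch and not how the Engine itself stores or applies feature sets: the helper names (validate_feature_set, select_features) and the two sample transactions are hypothetical, invented for this example.

```python
# All columns of the example dataset; is_fraudulent is the target column.
ALL_COLUMNS = [
    "year", "month", "day_of_month", "day_of_week", "customer_id",
    "amount", "account_balance", "customer_age", "location", "is_fraudulent",
]
TARGET = "is_fraudulent"

# The two feature sets from the example above.
FEATURE_SETS = {
    "feature_set_1": ["day_of_month", "day_of_week", "amount",
                      "account_balance", "customer_age"],
    "feature_set_2": ["amount", "location"],
}


def validate_feature_set(features, all_columns=ALL_COLUMNS, target=TARGET):
    """Hypothetical helper: reject empty sets, the target, and unknown columns."""
    if not features:
        raise ValueError("feature set is empty")
    if target in features:
        raise ValueError(f"feature set must not contain the target '{target}'")
    unknown = [c for c in features if c not in all_columns]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")


def select_features(rows, features, target=TARGET):
    """Hypothetical helper: split rows into model inputs X and target values y."""
    validate_feature_set(features, target=target)
    X = [[row[c] for c in features] for row in rows]
    y = [row[target] for row in rows]
    return X, y


# Two made-up transactions, for illustration only.
rows = [
    {"year": 2023, "month": 5, "day_of_month": 14, "day_of_week": 7,
     "customer_id": 101, "amount": 250.0, "account_balance": 1200.0,
     "customer_age": 34, "location": "Sydney", "is_fraudulent": 0},
    {"year": 2023, "month": 5, "day_of_month": 15, "day_of_week": 1,
     "customer_id": 202, "amount": 9800.0, "account_balance": 300.0,
     "customer_age": 51, "location": "Lagos", "is_fraudulent": 1},
]

# The same rows produce different model inputs under each feature set.
for name, features in FEATURE_SETS.items():
    X, y = select_features(rows, features)
    print(name, X, y)
```

Each feature set yields a different input matrix X from the same rows, while the target values y stay the same. Training one model per feature set and comparing the results is one way to see how predictions vary with the chosen columns.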