What is a Feature Set (classification and regression apps)?

In classification and regression apps, a feature set is a subset of columns that a model uses to make a prediction.

Put another way, a feature set is a candidate subset of the available columns (independent variables) that can serve as the input to a classification or regression machine learning model.

A feature set never includes the target variable. On the Engine, users can create different feature sets to see how predictions change when certain variables (columns) are added or removed.
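
The definition above can be expressed as a small check. The following is a minimal sketch in plain Python, not the Engine's own interface: the function name is_valid_feature_set and the column names are hypothetical, and rejecting an empty feature set is an added assumption rather than something stated in this article.

```python
def is_valid_feature_set(feature_set, columns, target):
    """Check that feature_set is a subset of columns that excludes the target.

    Assumption: an empty feature set is treated as invalid.
    """
    features = set(feature_set)
    return bool(features) and features <= set(columns) and target not in features


# Hypothetical column names, for illustration only.
columns = ["a", "b", "c", "label"]
target = "label"

print(is_valid_feature_set(["a", "b"], columns, target))      # True
print(is_valid_feature_set(["a", "label"], columns, target))  # False: contains the target
print(is_valid_feature_set(["a", "z"], columns, target))      # False: "z" is not a column
```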

Feature Set Example

We have a dataset of customer transactions, from which we want to predict whether a particular transaction is fraudulent. This dataset contains the following columns:

  • year

  • month

  • day_of_month

  • day_of_week

  • customer_id

  • amount

  • account_balance

  • customer_age

  • location

  • is_fraudulent (target column)

To predict whether a particular transaction is fraudulent, we choose the relevant columns to form a feature set. Here are two possible feature sets:

Feature set 1: {day_of_month, day_of_week, amount, account_balance, customer_age}

Feature set 2: {amount, location}
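
To make the two feature sets concrete, the sketch below shows which values each one would pass to a model for a single transaction. It is an illustration in plain Python rather than the Engine's interface: the helper select_features is hypothetical, and the values in the sample row are invented for the example.

```python
TARGET = "is_fraudulent"

# The two feature sets from the example above.
FEATURE_SETS = {
    "Feature set 1": ["day_of_month", "day_of_week", "amount",
                      "account_balance", "customer_age"],
    "Feature set 2": ["amount", "location"],
}


def select_features(row, feature_set):
    """Keep only the columns named in feature_set (hypothetical helper)."""
    return {name: row[name] for name in feature_set}


# One transaction with made-up values, for illustration only.
row = {
    "year": 2023, "month": 5, "day_of_month": 17, "day_of_week": "Wednesday",
    "customer_id": "C-001", "amount": 250.0, "account_balance": 1200.0,
    "customer_age": 34, "location": "Sydney", "is_fraudulent": False,
}

for name, feature_set in FEATURE_SETS.items():
    # The target column is never part of the model input.
    assert TARGET not in feature_set
    print(name, select_features(row, feature_set))
```

Both feature sets are drawn from the same columns, so comparing models trained on each one shows how much the extra columns in feature set 1 affect the predictions.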