How are features pre-processed for machine learning algorithms?

This article outlines how features are pre-processed for machine learning algorithms.

On the AI & Analytics Engine, users can train models once the dataset is ML-ready. Here, an ML-ready dataset can still contain features with missing values, unscaled numeric values, or text in its original format. ML algorithms typically do not accept such data directly as input, so the Engine automatically applies the steps that handle these features, called pre-processing steps, before model training.

For a given feature set, pre-processing steps are recommended for each feature based on the feature type.

For more details, see the flow chart below.

Pre-processing steps recommended by the AI & Analytics Engine
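As a rough illustration of what type-based pre-processing means, the sketch below picks a pre-processing step for each feature according to its type. It is not the Engine's implementation. The specific steps chosen here (mean imputation with standard scaling, one-hot encoding, lowercase tokenization, 0/1 casting) are common defaults assumed for the example, and the function names are hypothetical. The steps the Engine actually recommends are the ones given in the flow chart.

```python
from statistics import mean, pstdev

# Feature types handled in this sketch.
SUPPORTED_TYPES = {"Categorical", "Text", "Numeric", "Boolean"}


def impute_and_scale(values):
    """Fill missing numeric values with the mean, then standardize."""
    observed = [v for v in values if v is not None]
    if not observed:
        return [0.0] * len(values)
    mu = mean(observed)
    sigma = pstdev(observed) or 1.0  # avoid dividing by zero for constant columns
    return [((mu if v is None else v) - mu) / sigma for v in values]


def one_hot(values):
    """Encode each category as a 0/1 vector; a missing value becomes all zeros."""
    categories = sorted({v for v in values if v is not None})
    return [[1 if v == c else 0 for c in categories] for v in values]


def preprocess(columns, feature_types):
    """Apply one assumed pre-processing step per feature, chosen by feature type."""
    processed = {}
    for name, values in columns.items():
        ftype = feature_types[name]
        if ftype not in SUPPORTED_TYPES:
            continue  # other types are skipped in this sketch
        if ftype == "Numeric":
            processed[name] = impute_and_scale(values)
        elif ftype == "Categorical":
            processed[name] = one_hot(values)
        elif ftype == "Boolean":
            # Assumption: a missing Boolean is treated as False.
            processed[name] = [int(bool(v)) for v in values]
        else:  # Text
            processed[name] = [(v or "").lower().split() for v in values]
    return processed


columns = {
    "age": [20.0, None, 40.0],
    "colour": ["red", "blue", None],
    "signed_up": ["2021-01-01", "2021-06-30", None],
}
feature_types = {"age": "Numeric", "colour": "Categorical", "signed_up": "Datetime"}
result = preprocess(columns, feature_types)
```

In this example, "age" and "colour" are transformed, while "signed_up" does not appear in the result because its type is not in the supported set.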

Note that the Engine supports only features of type Categorical, Text, Numeric, and Boolean. Features of other types, such as JSONArray and Datetime, are currently ignored during pre-processing. If features of unsupported types are needed for model training, users can first transform them with a data-preparation recipe.
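To make the idea of transforming an unsupported type more concrete, the sketch below turns an ISO-formatted Datetime string into a handful of numeric parts. On the Engine this kind of transformation would be defined in a data-preparation recipe rather than written in Python. The function name and the choice of derived parts are assumptions made only for illustration.

```python
from datetime import datetime


def datetime_to_numeric(value):
    """Split an ISO 8601 timestamp string into numeric parts (hypothetical helper)."""
    ts = datetime.fromisoformat(value)
    return {
        "year": ts.year,
        "month": ts.month,
        "day_of_week": ts.weekday(),  # Monday is 0, Sunday is 6
        "hour": ts.hour,
    }


parts = datetime_to_numeric("2023-03-15T08:30:00")
```

Each derived part is an ordinary number, so it can be treated as a Numeric feature instead of being ignored.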