
Evaluating machine learning models' performance in the Engine

This article explains how to evaluate machine learning model performance and describes the metrics and plots available in the AI & Analytics Engine.

What is model evaluation?

Model evaluation is the process of using metrics and plots to quantify and visualize the performance of a machine learning model. The metrics and plots are typically derived by feeding the test portion's features to the trained model as prediction input, then comparing the model's predicted target values against the test portion's actual target values.

This process requires the data to be split in two before training the model (a train/test split). The training portion is used to train the model, while the test portion is held out for testing. The AI & Analytics Engine automatically creates the train/test split and generates the evaluation metrics and plots. The split can be configured during the app creation phase.

For more information on train/test split, see: What is the train/test split for classification and regression apps?
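As a minimal sketch of this process, assuming scikit-learn for illustration (the Engine performs the split, training, and evaluation automatically, and the dataset and model here are hypothetical):

```python
# Minimal sketch of the split -> train -> evaluate loop described above.
# Illustrative only; the Engine automates all of these steps.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# Hold a test portion out before training; 80/20 is a common split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Evaluate by comparing the predicted target values for the test-portion
# features against the test portion's actual target values.
y_pred = model.predict(X_test)
print("Accuracy on held-out test portion:", accuracy_score(y_test, y_pred))
```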

Diagram illustrating the train/test split and the relationship between the predicted target column and the actual target column as input to the model evaluation

Model evaluation metrics and plots indicate how well the model makes predictions. When used as part of the ML development process, they can guide model optimization and help anticipate real-world model performance, ensuring alignment with business objectives.

Evaluating machine learning model performance in the Engine

Within the Engine, model evaluation metrics and plots are automatically generated after a model has been trained.

Basic model performance evaluation

Basic evaluation metrics are quickly accessible from the Model Leaderboard page. These include prediction quality and training time. Each model shows a prediction quality score ranging from 0% to 100%.

Models' basic evaluation metrics from the leaderboard page.

Prediction quality is calculated differently for each machine learning problem type: regression, binary classification, and multi-class classification.
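The exact formula behind the prediction quality score is not documented here; as an illustrative assumption only, one common convention is to rescale a headline metric for each problem type onto a single 0-100% range:

```python
# Illustrative assumption only: this is NOT the Engine's documented
# formula, just one common way to map problem-specific headline
# metrics onto a single 0-100% quality scale.
def prediction_quality(problem_type: str, metric_value: float) -> float:
    """Rescale a headline metric (e.g. R2 or F1) to a 0-100% score."""
    if problem_type == "regression":
        # R2 can be negative; clip at zero so the score stays in range
        return 100 * max(0.0, metric_value)
    # Classification metrics such as accuracy or F1 already lie in [0, 1]
    return 100 * metric_value

print(prediction_quality("regression", 0.82))             # 82.0
print(prediction_quality("binary classification", 0.91))  # 91.0
```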

Advanced model performance evaluation

The Engine provides several advanced evaluation metrics and plots. Which ones are displayed depends on whether the ML problem type is regression, binary classification, or multi-class classification.

To access the advanced metrics and plots, navigate to the Model Leaderboard page and select a model.

Model Leaderboard page after the model has been trained.

Within the model's “Insights” tab, the “Performance” tab contains the advanced evaluation metrics and plots. This view displays the full list of evaluation metrics and the plots relevant to the problem type.

Model Insights tab, viewing the Model Performance tab

What metrics and plots are available for regression?

Basic metrics

  • Prediction quality

  • Prediction Error

  • Percentage Error

Additional metrics

  • R2 Score

  • Explained Variance

  • RMSE (Root Mean Squared Error)

  • MAE (Mean Absolute Error)

  • MedABE (Median Absolute Error)

  • MAPE (Mean Absolute Percentage Error)

  • MSLE (Mean Squared Logarithmic Error)

For a detailed overview of regression metrics and their meaning, see: Which metrics are used to evaluate a regression model's performance?
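As a minimal sketch, assuming scikit-learn (not necessarily what the Engine uses internally), the additional metrics listed above can be computed from the actual and predicted target columns, with y_true and y_pred as hypothetical stand-ins:

```python
# Illustrative computation of the regression metrics listed above.
import numpy as np
from sklearn.metrics import (
    explained_variance_score,
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    mean_squared_log_error,
    median_absolute_error,
    r2_score,
)

# Hypothetical actual and predicted target columns from a test portion
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.5])

print("R2 score:          ", r2_score(y_true, y_pred))
print("Explained variance:", explained_variance_score(y_true, y_pred))
print("RMSE:              ", mean_squared_error(y_true, y_pred) ** 0.5)
print("MAE:               ", mean_absolute_error(y_true, y_pred))
print("MedABE:            ", median_absolute_error(y_true, y_pred))
print("MAPE:              ", mean_absolute_percentage_error(y_true, y_pred))
print("MSLE:              ", mean_squared_log_error(y_true, y_pred))
```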

Plots

  • Predicted vs. actual values

  • Residuals vs. predicted values

  • Residuals distribution

What metrics and plots are available for binary classification?

Metrics

  • Precision

  • Recall

  • F1 Score

  • FPR (False Positive Rate)

  • AUC-ROC (Area Under Curve - Receiver Operating Characteristic)

For a detailed overview of binary classification metrics and their meaning, see: Which metrics are used to evaluate a binary classification model's performance?

Plots

  • Precision-recall curve

  • ROC curve

  • Confusion matrix
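As a hedged sketch, again assuming scikit-learn with hypothetical y_true, y_score, and y_pred arrays, the metrics above and the curve inputs behind these plots can be computed as follows:

```python
# Illustrative computation of the binary classification metrics and
# plot inputs listed above.
import numpy as np
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_recall_curve,
    precision_score,
    recall_score,
    roc_auc_score,
    roc_curve,
)

# Hypothetical actual labels and predicted probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])
y_pred = (y_score >= 0.5).astype(int)  # class labels at a 0.5 threshold

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1 score: ", f1_score(y_true, y_pred))
print("AUC-ROC:  ", roc_auc_score(y_true, y_score))

# FPR = FP / (FP + TN), read off the confusion matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("FPR:      ", fp / (fp + tn))

# Curve inputs behind the precision-recall and ROC plots
precision, recall, _ = precision_recall_curve(y_true, y_score)
fpr, tpr, _ = roc_curve(y_true, y_score)
```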

What metrics and plots are available for multi-class classification?

Metrics

  • Macro and weighted average of:

    • F1 Score

    • Precision

    • Recall

    • FPR (False Positive Rate)

    • AUC-ROC (Area Under Curve - Receiver Operating Characteristic)

    • Average Precision Score

  • Log-loss Score

  • Accuracy

For a detailed overview of multi-class classification metrics and their meaning, see: Which metrics are used to evaluate a multiclass classification model's performance?

Plots

  • Multi-class confusion matrix
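As with the previous sections, here is a minimal scikit-learn sketch with hypothetical labels and class probabilities showing how the macro and weighted averages, log-loss, accuracy, and multi-class confusion matrix above are commonly computed:

```python
# Illustrative computation of the multi-class metrics listed above.
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    confusion_matrix,
    f1_score,
    log_loss,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.preprocessing import label_binarize

# Hypothetical actual labels, predicted labels, and class probabilities
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 2, 2, 2, 1, 0, 1, 1])
# Each row holds the predicted probabilities for classes 0, 1, 2
y_proba = np.array([
    [0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.2, 0.7], [0.1, 0.1, 0.8],
    [0.2, 0.6, 0.2], [0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.7, 0.2],
])
y_true_bin = label_binarize(y_true, classes=[0, 1, 2])  # one column per class

for avg in ("macro", "weighted"):
    print(avg, "F1:           ", f1_score(y_true, y_pred, average=avg))
    print(avg, "precision:    ", precision_score(y_true, y_pred, average=avg))
    print(avg, "recall:       ", recall_score(y_true, y_pred, average=avg))
    print(avg, "AUC-ROC:      ", roc_auc_score(y_true, y_proba,
                                               multi_class="ovr", average=avg))
    print(avg, "avg precision:", average_precision_score(y_true_bin, y_proba,
                                                         average=avg))

print("Log-loss:", log_loss(y_true, y_proba))
print("Accuracy:", accuracy_score(y_true, y_pred))
# Per-class FPR can be read off this matrix as FP / (FP + TN) per class
print("Multi-class confusion matrix:\n", confusion_matrix(y_true, y_pred))
```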