What algorithms are available to build machine learning models on the Engine?

The AI & Analytics Engine offers a variety of machine learning algorithms for each problem type available within the Engine.

Learn more about machine learning problem types: clustering, classification and regression.

The model templates are grouped by:

  1. Supervised learning

  2. Unsupervised learning

Supervised learning

Supervised machine learning trains algorithms on labeled datasets, in which each input is paired with a known label. The labels “supervise” the algorithm as it learns to classify data or predict continuous outcomes. Because inputs and labels are paired, these models can be evaluated against known answers and improved as more data becomes available over time.

Regression

  1. AdaBoost regressor

  2. Bayesian ridge regression

  3. Decision tree regressor

  4. Extremely randomized trees (Extra-trees) regressor

  5. Gradient boosting regressor

  6. K-nearest neighbors (KNN) regressor

  7. LightGBM regressor

  8. Linear regression (supports GPU and multi-GPU)

  9. Ridge regression (supports multi-GPU)

  10. Mini-batch SGD (stochastic gradient descent) regressor (supports GPU)

  11. Random forest regressor (supports GPU and multi-GPU)

  12. XGBoost regressor (supports GPU, multi-node, and multi-GPU)
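
To make the idea of a regression model concrete, the following is a minimal sketch of ordinary least squares on a single feature, written in plain Python. It is not how the Engine implements its linear regression template; the function names (`fit_line`, `rmse`) and the toy data are hypothetical and chosen only for illustration. The model is fitted on labeled training pairs and then evaluated on held-out pairs it has not seen.

```python
# Illustrative sketch only: one-feature least squares in plain Python.
# fit_line and rmse are hypothetical helper names, not Engine APIs.
from math import sqrt


def fit_line(xs, ys):
    """Return (slope, intercept) that minimize the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(y_true, y_pred):
    """Root mean squared error between known labels and predictions."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Labeled data: every input x is paired with a numeric target y (toy data, y = 2x + 1).
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
test_x, test_y = [4.0, 5.0], [9.0, 11.0]

slope, intercept = fit_line(train_x, train_y)
predictions = [slope * x + intercept for x in test_x]
print("slope:", slope, "intercept:", intercept)
print("held-out RMSE:", rmse(test_y, predictions))
```

Evaluating on held-out pairs rather than on the training pairs is what lets the paired labels serve as a check on how well the model generalizes.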

Classification

  1. AdaBoost classifier

  2. Random forest classifier (supports GPU and multi-GPU)

  3. K-nearest neighbors (KNN) classifier

  4. Logistic regression (supports GPU)

  5. Mini-batch SGD (stochastic gradient descent) classifier (supports GPU)

  6. Decision tree classifier

  7. Extremely randomized trees (Extra-trees) classifier

  8. Gradient boosting classifier

  9. LightGBM classifier

  10. Gaussian Naive Bayes

  11. XGBoost classifier (supports GPU, multi-node, and multi-GPU)
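
As a rough illustration of what a classifier does, here is a minimal sketch of k-nearest neighbors voting in plain Python. It is an assumption-laden toy, not the Engine's KNN template: the function name `knn_predict`, the value `k=3`, and the sample points are all hypothetical. Each query point receives the majority label among its `k` closest labeled training points.

```python
# Illustrative sketch only: k-nearest neighbors classification by majority vote.
# knn_predict is a hypothetical helper name, not an Engine API.
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    order = sorted(range(len(train_points)), key=lambda i: dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Labeled training data: two well-separated groups (toy data).
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

# Held-out labeled points used to measure accuracy.
test_points = [(1.1, 1.0), (5.1, 5.0)]
test_labels = ["a", "b"]

predicted = [knn_predict(train_points, train_labels, p) for p in test_points]
accuracy = sum(p == t for p, t in zip(predicted, test_labels)) / len(test_labels)
print("predicted:", predicted, "accuracy:", accuracy)
```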

Unsupervised learning

Unsupervised learning is a branch of machine learning in which algorithms (model templates) are trained on unlabeled data to identify hidden patterns. Common problem types in this category include clustering, anomaly detection, association, dimensionality reduction, and topic modeling. The Engine supports the following algorithms (templates):

Clustering

  1. Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) with dimensionality reduction achieved via Uniform Manifold Approximation and Projection (UMAP)

  2. Gaussian Mixture Model (Spark ML)

  3. K-Means (Spark ML)
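
To show how clustering finds groups without labels, the following is a minimal sketch of the standard k-means procedure (Lloyd's algorithm) in plain Python. The Engine's K-Means template is listed as Spark ML; this sketch does not use Spark and is not the Engine's implementation. The function name `kmeans`, the fixed iteration count, the explicit starting centroids, and the sample points are hypothetical choices made to keep the example small and deterministic.

```python
# Illustrative sketch only: k-means (Lloyd's algorithm) in plain Python.
# kmeans is a hypothetical helper name; the Engine's template uses Spark ML instead.
from math import dist


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and update steps; return (centroids, labels)."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    labels = [min(range(len(centroids)), key=lambda i: dist(p, centroids[i])) for p in points]
    return centroids, labels


# Unlabeled data: no targets are supplied, only the points themselves (toy data).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Starting centroids are picked by hand here so the run is repeatable.
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
print("centroids:", centroids)
print("labels:", labels)
```

Unlike the supervised templates, nothing here is compared against known answers: the cluster indices are arbitrary identifiers, and the grouping comes only from distances between points.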