What is clustering?

This article explains clustering both as a Machine Learning term and as a concept within the AI & Analytics Engine. Clustering is an unsupervised machine learning (ML) technique that determines the intrinsic groupings in unlabeled data.

Clustering is a set of techniques in Machine Learning

Clustering can automatically discover groups of similar entities from a dataset and segment it accordingly.

Example of input & output dataset

How many clusters are there in the input dataset on the left? A good clustering technique will produce results as seen on the right side. Colours indicate the cluster ID assigned to each entity in the data.

Unlike Classification and Regression, where the goal is to build a model that predicts a specified column in the dataset, Clustering has no column to predict. Instead, it is used to discover and describe patterns in data by analyzing similarities between entities and automatically finding the most coherent groupings.
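To make the absence of a target column concrete, here is a minimal sketch of k-means, one common clustering algorithm. It is an illustration only and not necessarily the algorithm the AI & Analytics Engine runs; the function name `kmeans`, the sample points, and the use of Euclidean distance are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Group points into k clusters; returns one cluster ID per point."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random as initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels


# Unlabeled input: there is no column to predict, only coordinates.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.9)]
labels = kmeans(points, k=2)
# The first three points share one cluster ID and the last three share the other.
print(labels)
```

Note that the caller supplies only the points and the number of clusters; the cluster IDs are discovered from similarity alone, and which group receives ID 0 or 1 is arbitrary.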

Clustering thus provides a way to generate valuable, actionable insights from datasets of varying sizes. Real-world applications of clustering include:

  • Demographic and behavioral segmentation of customers
  • Product recommendation
  • Market research, and
  • Biological data analysis, among others.

Clustering within the AI & Analytics Engine 

On the AI & Analytics Engine, Clustering is supported as an App type. To use clustering, the user needs to provide the “problem description” as an input, which includes:

  • The dataset for which clustering needs to be performed,
  • The columns that must be taken into consideration while determining similarity between items, and
  • The algorithm to use and its configurations.
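The choice of columns matters because similarity is measured only over the columns selected. The sketch below illustrates this with a plain Euclidean distance; how the Engine actually measures similarity is not stated in this article, so the `distance` helper, the record fields, and the metric are hypothetical.

```python
import math


def distance(row_a, row_b, columns):
    """Euclidean distance between two records, using only the chosen columns."""
    return math.sqrt(sum((row_a[c] - row_b[c]) ** 2 for c in columns))


# Hypothetical customer records; the column names are made up for illustration.
alice = {"age": 34, "annual_spend": 1200.0, "visits": 12}
bob = {"age": 36, "annual_spend": 1200.0, "visits": 40}

# With "visits" left out, the two customers look nearly identical.
print(distance(alice, bob, ["age", "annual_spend"]))  # 2.0

# Including "visits" makes them look far less similar.
print(distance(alice, bob, ["age", "annual_spend", "visits"]))
```

Two items can therefore land in the same cluster under one column selection and in different clusters under another.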

Users can generate multiple clustering results using different algorithms and configurations, based on the same columns. Under each clustering result, one can:

Examine an overview, detailed analysis, and insights about each cluster

Overview of the clusters

Analysis: Important dimensions for each cluster

A detailed description of clusters

Clustering Results

Users can generate the output as a dataset with a cluster ID and other associated columns, or export the result as a CSV file.
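As a rough picture of what such an output could look like, the sketch below appends a cluster ID column to the original rows and serializes them as CSV. The column name `cluster_id`, the `to_csv` helper, and the sample rows are assumptions for illustration; the Engine's actual column names and export format may differ.

```python
import csv
import io


def to_csv(rows, cluster_ids, id_column="cluster_id"):
    """Return CSV text for `rows` with one extra column holding cluster IDs."""
    if len(rows) != len(cluster_ids):
        raise ValueError("need exactly one cluster ID per row")
    fieldnames = list(rows[0].keys()) + [id_column]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row, cid in zip(rows, cluster_ids):
        # Copy the original columns and add the assigned cluster ID.
        writer.writerow({**row, id_column: cid})
    return buffer.getvalue()


# Hypothetical input rows and the cluster IDs assigned to them.
rows = [{"customer": "A", "age": 34}, {"customer": "B", "age": 61}]
print(to_csv(rows, [0, 1]))
# customer,age,cluster_id
# A,34,0
# B,61,1
```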


💡 Follow the detailed step-by-step guide to use clustering and learn how the AI & Analytics Engine makes Clustering easy and intuitive for all users.