How to create an app to discover natural groups of similar items in your data using the Engine (Clustering)

This article outlines the steps for using the Engine to discover natural groups of similar items in your data.

Step 1 - Create new app and select problem

There are two ways to create a new app:

  • From the project homepage, click the “Create new app” button

Create new app button from project homepage

  • From a dataset page, click the “Create app” button

Create app button from a dataset page

In the app creation dialog:

  1. Select a dataset

  2. Select “Discover natural groups of similar items”

  3. Click “Next”

App creation dialog page

Step 2 - Select columns

Select or deselect the columns used as criteria for grouping similar items by clicking the check box to the left of each column name.

To assist your selection, you can:

  • Type in the search bar to look for columns by name

  • Click the “select all” checkbox to select all columns

  • Click the “Hide disabled” toggle to hide all columns that cannot be used as criteria

  • Click on a row in the column list to view the stats and histogram of the column

  • Review the selected columns using the column tags in the “Selected columns” section, and deselect columns by clicking the cross icon in each tag

Note

  • To select a column as a grouping criterion, you must click the check box to the left of the column name

  • Clicking on a row in the column list doesn’t add the column to the selected list; it only shows the column analysis

Select columns page

Step 3 - Select and configure options

  1. Select or deselect an option by clicking on it

  2. Adjust the configuration of each selected option to get the desired output

    1. For HDBSCAN on UMAP dimensionality-reduced dataset, specify the smallest number of items per output cluster

    2. For the Gaussian Mixture Model (GMM) and K-Means algorithms, choose one of the following two options:

      1. Specify a range for the number of clusters you expect from your data

      2. Specify the number of output clusters you expect from your data

  3. Click “Next”

Select and configure options page

Choose whichever of the two options better suits your data.
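To make the difference between the two options concrete, here is a minimal sketch in plain Python. It is not the Engine's implementation: the article does not say how the Engine chooses a value inside a range, so the silhouette score used below is an assumption, and the helper names (init_centroids, kmeans, silhouette, best_k_in_range) and the toy data are hypothetical.

```python
from math import dist
from statistics import mean


def init_centroids(points, k):
    """Deterministic farthest-point start: each new centroid is the point
    farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(tuple(map(mean, zip(*members))) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; values near 1 mean tight, well-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-point clusters
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)  # cohesion
        b = min(  # separation from the nearest other cluster
            mean(dist(p, q) for q in other)
            for key, other in clusters.items()
            if key != lab
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


def best_k_in_range(points, k_min, k_max):
    """Try every k in the range and keep the one with the highest silhouette
    (an assumed selection rule, not necessarily the Engine's)."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores


# Toy data: three tight 3x3 grids of points around three distant centers.
offsets = (-0.5, 0.0, 0.5)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 12.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx in offsets for dy in offsets]

# Option 2: you already know the number of clusters you expect.
fixed_labels = kmeans(points, 3)

# Option 1: you only know a plausible range and let a score decide.
best_k, scores = best_k_in_range(points, 2, 5)
print("clusters chosen from range 2-5:", best_k)
```

On this toy data the range search settles on three clusters, the same number passed directly in the fixed case. A range is the safer choice when you are unsure how many groups exist; a fixed number fits when the count is dictated by how you plan to use the result.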

Step 4 - Review and run

In this step:

  1. Give the app a name

  2. Review the columns selected as criteria

  3. Review the selected options and their configurations, and edit the output name

  4. Review how much data is deducted from the “total size of processed data per month”

  5. Click “Run”

Review and run page

Step 5 - Check progress and troubleshoot any issues

On the Result listing of the clustering app:

  1. To see the list of completed outputs, select the “Completed” tab

  2. To see the progress of outputs still being processed, select the “In progress” tab

  3. To see the list of failed outputs and troubleshoot them, select the “Failed” tab

Note: If you want to know how to view the output detail page, read this article.