Create a data-wrangling recipe to prepare your data for your needs. When the recipe is complete, it is applied to the input dataset to create a new, transformed dataset.
Tip: If your dataset is already ready for machine learning or analysis and needs no further preparation, you can skip this step and create an app directly after importing your dataset.
Continuing from the previous article, Create a new dataset, you next have the option to prepare data to suit your needs. Let’s use the same german_credit_score dataset for this.
1. Create a recipe
You can use “Process data” from “Quick Access” on the dataset listing page, or select “Create Recipe” on the dataset detail page. Then, select "Create a new data wrangling recipe", name your recipe, and click "Create".
Note: For more information, see What is a recipe?
2. Start a recipe-building session
You will then enter the recipe-building session. The Engine needs 1-2 minutes to prepare the session before you can begin.
3. Add suggested actions
The Engine will automatically generate suggestions on what actions to add to the recipe. These suggestions are shown in the suggestions tab.
Click on the (+) buttons next to the 2 suggestions to queue them up in the recipe:
- Convert columns to numeric type
- Drop columns
After adding both suggestions, click "Commit Actions" in the recipe panel.
Tip: Are you curious about why the Engine provided these suggestions? Click on "see analysis" in the suggestion box to find out.
Note: For more information, see What are suggestions?
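For readers who think in code, the two suggested actions can be pictured roughly as follows. This is only an illustrative sketch using the Python standard library, not the Engine's implementation. The column names, sample values, and the helper functions convert_to_numeric and drop_columns are hypothetical and are not taken from the actual german_credit_score dataset.

```python
# Hypothetical sample rows; the real german_credit_score columns may differ.
rows = [
    {"duration": "6", "credit_amount": "1169", "constant_flag": "A"},
    {"duration": "48", "credit_amount": "5951", "constant_flag": "A"},
    {"duration": "12", "credit_amount": "2096", "constant_flag": "A"},
]


def convert_to_numeric(rows, columns):
    """Return new rows with the named columns parsed from text into floats."""
    return [
        {name: float(value) if name in columns else value
         for name, value in row.items()}
        for row in rows
    ]


def drop_columns(rows, columns):
    """Return new rows without the named columns."""
    return [
        {name: value for name, value in row.items() if name not in columns}
        for row in rows
    ]


# "Convert columns to numeric type": numbers stored as text become floats.
rows = convert_to_numeric(rows, {"duration", "credit_amount"})

# "Drop columns": remove a column that adds no information.
rows = drop_columns(rows, {"constant_flag"})
```

Each helper returns new rows rather than modifying its input, which mirrors the way a recipe produces a new dataset and leaves the input dataset unchanged.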
4. Add actions
Next, we want to add one or more specific actions to the recipe:
- Drop columns: Column1 is just the row number
Add Drop Columns action
- Click on the "Add Action" tab.
- In the search field, enter "Drop".
- Select the Drop action.
- Under Input Columns, select "Column1".
- Click “ADD” to add the action to the queue.
Note: To see a full list of actions supported in the Engine, see the action catalogue.
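Before dropping a column such as Column1, it can be worth confirming that it really is just a row counter. The sketch below shows one way to check this in plain Python. It is an assumption-laden illustration: the helper is_row_number and the sample rows are hypothetical, and only the column name "Column1" comes from this tutorial.

```python
def is_row_number(values):
    """Return True if the values simply count up by 1 from the first value."""
    if not values:
        return False
    start = values[0]
    return all(value == start + offset for offset, value in enumerate(values))


# Hypothetical sample rows; only the name "Column1" comes from the tutorial.
rows = [
    {"Column1": 0, "duration": 6},
    {"Column1": 1, "duration": 48},
    {"Column1": 2, "duration": 12},
]

# Drop Column1 only if it carries nothing beyond the row position.
if is_row_number([row["Column1"] for row in rows]):
    rows = [
        {name: value for name, value in row.items() if name != "Column1"}
        for row in rows
    ]
```

The check accepts a counter that starts at any value, since this tutorial does not say whether the row numbers begin at 0 or 1.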
5. Commit actions
Select the "RECIPE" tab, and at the bottom of the tab, click on "Commit Actions".
Caution: Once actions are committed, they can no longer be edited.
This will apply the actions to the entire dataset and generate a fresh set of suggestions based on the latest dataset. At this stage, you may choose to repeat steps (3) and (4) to further transform the dataset as desired.
For this tutorial, we are happy with the current state of the dataset. Proceed to the next step to finalize & end.
6. Finalize & end
Click on "Finalize & end" to finalize the recipe. This will generate a transformed dataset (german_credit_score - Processed) by applying the actions in your recipe to the selected input dataset.
When you finalize the recipe, any queued actions are committed automatically.
Caution: Finalized recipes can no longer be edited. To make changes, you will need to create a new recipe.
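The queue, commit, and finalize behaviour described in this article can be summarised with a small model. This is a simplified sketch for intuition only, not how the Engine is built: the Recipe class, its method names, and the sample rows are all hypothetical.

```python
class Recipe:
    """Hypothetical model of a recipe: queue actions, commit them, finalize."""

    def __init__(self):
        self.queued = []      # actions added but not yet applied
        self.committed = []   # actions already applied; treated as read-only
        self.finalized = False

    def add(self, action):
        """Queue an action; a finalized recipe rejects further changes."""
        if self.finalized:
            raise RuntimeError("finalized recipes cannot be edited")
        self.queued.append(action)

    def commit(self, rows):
        """Apply every queued action to the rows, then mark them committed."""
        if self.finalized:
            raise RuntimeError("finalized recipes cannot be edited")
        for action in self.queued:
            rows = action(rows)
        self.committed.extend(self.queued)
        self.queued = []
        return rows

    def finalize(self, rows):
        """Commit anything still queued, then lock the recipe."""
        rows = self.commit(rows)
        self.finalized = True
        return rows


def drop_column1(rows):
    """Example action: return new rows without Column1."""
    return [
        {name: value for name, value in row.items() if name != "Column1"}
        for row in rows
    ]


recipe = Recipe()
recipe.add(drop_column1)  # queued, not yet committed

source = [{"Column1": 0, "duration": 6}, {"Column1": 1, "duration": 48}]
processed = recipe.finalize(source)  # the queued action is committed here
```

In this model, finalize commits the queued action automatically, the source rows are left untouched, and any later call to add raises an error, which corresponds to having to create a new recipe to make changes.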