
Download dataset for the tutorial

Download the dataset needed for the tutorial here

Tutorial

Our task is to build a model that estimates the probability of default from the features collected at credit card application time. The model uses SeriousDlqin2yrs as the target column.

Note: The dataset we are using is a modified version of the original dataset from Kaggle (source: https://www.kaggle.com/c/GiveMeSomeCredit).

Data Description

German Credit Card Application data is a collection of data about borrowers (or applicants) who applied for a credit card with the bank and subsequently had the card approved.

Each borrower is then monitored for two years after obtaining the credit card. If a borrower defaults on their credit card debt within those two years, they receive a value of 1 in the SeriousDlqin2yrs column (serious delinquency within two years). Otherwise, the value in the column is 0.
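Because the target is coded 0/1, its mean is the observed default rate, which is a useful baseline to compare a probability model against. The snippet below is a minimal sketch of that calculation in plain Python. The function name default_rate and the toy labels are assumptions made for illustration; they are not part of the tutorial dataset or the platform.

```python
def default_rate(labels):
    """Return the share of borrowers labeled 1 (defaulted within two years)."""
    if not labels:
        raise ValueError("labels must not be empty")
    if any(value not in (0, 1) for value in labels):
        raise ValueError("labels must contain only 0 or 1")
    return sum(labels) / len(labels)


# Made-up labels for illustration only; not taken from the tutorial dataset.
toy_labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
print(default_rate(toy_labels))
```

A model that always predicts this rate is the simplest possible probability estimate, so any trained model should do better than it.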

Data Dictionary

Column Name | Description | Data Type
SeriousDlqin2yrs | Person experienced 90 days past due delinquency or worse | Y/N
RevolvingUtilizationOfUnsecuredLines | Total balance on credit cards and personal lines of credit (excluding real estate and installment debt such as car loans), divided by the sum of credit limits | percentage
age | Age of borrower in years | integer
NumberOfTime30-59DaysPastDueNotWorse | Number of times borrower has been 30-59 days past due but no worse in the last 2 years | integer
DebtRatio | Monthly debt payments, alimony, and living costs divided by monthly gross income | percentage
MonthlyIncome | Monthly income | real
NumberOfOpenCreditLinesAndLoans | Number of open loans (installment loans such as a car loan or mortgage) and lines of credit (e.g. credit cards) | integer
NumberOfTimes90DaysLate | Number of times borrower has been 90 days or more past due | integer
NumberRealEstateLoansOrLines | Number of mortgage and real estate loans, including home equity lines of credit | integer
NumberOfTime60-89DaysPastDueNotWorse | Number of times borrower has been 60-89 days past due but no worse in the last 2 years | integer
NumberOfDependents | Number of dependents in the family, excluding the borrower (spouse, children, etc.) | integer
constant_XYZ | Column with a constant value | text
high_cor_col | Simulated column that is highly correlated with the target column, added to simulate target leakage | integer
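The last two columns are there to be caught: a constant column carries no information, and a column that tracks the target too closely suggests leakage. The snippet below is a minimal sketch of how such columns could be screened for in plain Python. The helper names, the 0.9 threshold, and the toy rows are all assumptions made for illustration; this is not how the platform itself performs these checks.

```python
import math


def constant_columns(rows):
    """Return names of columns that hold a single distinct value."""
    return [c for c in rows[0] if len({r[c] for r in rows}) == 1]


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant sequence; treated as 0 here
    return cov / math.sqrt(var_x * var_y)


def leakage_suspects(rows, target, threshold=0.9):
    """Return numeric columns whose |correlation| with the target >= threshold.

    The 0.9 default is an arbitrary illustrative cut-off, not a recommended value.
    """
    ys = [r[target] for r in rows]
    numeric = [
        c for c in rows[0]
        if c != target and all(isinstance(r[c], (int, float)) for r in rows)
    ]
    return [c for c in numeric if abs(pearson([r[c] for r in rows], ys)) >= threshold]


# Made-up toy rows for illustration only (not taken from the tutorial dataset).
rows = [
    {"SeriousDlqin2yrs": 0, "age": 45, "constant_XYZ": "XYZ", "high_cor_col": 1},
    {"SeriousDlqin2yrs": 1, "age": 30, "constant_XYZ": "XYZ", "high_cor_col": 9},
    {"SeriousDlqin2yrs": 0, "age": 52, "constant_XYZ": "XYZ", "high_cor_col": 0},
    {"SeriousDlqin2yrs": 1, "age": 61, "constant_XYZ": "XYZ", "high_cor_col": 10},
    {"SeriousDlqin2yrs": 0, "age": 38, "constant_XYZ": "XYZ", "high_cor_col": 2},
    {"SeriousDlqin2yrs": 0, "age": 27, "constant_XYZ": "XYZ", "high_cor_col": 1},
]

print(constant_columns(rows))
print(leakage_suspects(rows, "SeriousDlqin2yrs"))
```

A high correlation alone does not prove leakage. It only flags a column for a closer look at whether its value would really be known at application time.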

 
