Download the dataset needed for the tutorial here
Tutorial
Our task is to build a model that estimates the probability of default from the features collected at credit card application time, using SeriousDlqin2yrs as the target.
Note: The dataset we are using is a modified version of the original dataset from Kaggle (source: https://www.kaggle.com/c/GiveMeSomeCredit).
Data Description
The German Credit Card Application data is a collection of records about borrowers (or applicants) who applied for a credit card with the bank and were subsequently approved.
Each borrower is then monitored for two years after obtaining the credit card. If a borrower defaults on their credit card debt within those two years, they receive a value of 1 in the SeriousDlqin2yrs column (serious delinquency within two years). Otherwise, the value in the column is 0.
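As a minimal sketch of how the target encoding above can be used, the snippet below computes the share of borrowers flagged with 1 in SeriousDlqin2yrs. The sample rows and the helper name `default_rate` are hypothetical and only illustrate the idea; the real file would be read from wherever you saved the download.

```python
import csv
import io

# Hypothetical sample rows for illustration only; not taken from the real dataset.
SAMPLE_CSV = """SeriousDlqin2yrs,age,MonthlyIncome
1,45,9120
0,40,2600
0,38,3042
0,30,3300
1,49,63588
"""


def default_rate(csv_file, target="SeriousDlqin2yrs"):
    """Return the fraction of rows whose target value is 1.

    Assumes the target is stored as the integers 0 and 1,
    as described in the text above.
    """
    reader = csv.DictReader(csv_file)
    labels = [int(row[target]) for row in reader]
    if not labels:
        raise ValueError("no rows found")
    return sum(labels) / len(labels)


# io.StringIO stands in for an open file handle on the downloaded CSV.
rate = default_rate(io.StringIO(SAMPLE_CSV))
print(f"Default rate: {rate:.1%}")
```

Checking this rate early is useful because default datasets are often imbalanced, which affects how a model's accuracy should be interpreted.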
Data Dictionary
Column Name | Description | Data Type |
SeriousDlqin2yrs | Borrower experienced a delinquency of 90 days past due or worse | Y/N (1 = yes, 0 = no) |
RevolvingUtilizationOfUnsecuredLines | Total balance on credit cards and personal lines of credit (excluding real estate and installment debt such as car loans) divided by the sum of credit limits | percentage |
age | Age of borrower in years | integer |
NumberOfTime30-59DaysPastDueNotWorse | Number of times borrower has been 30-59 days past due but no worse in the last 2 years. | integer |
DebtRatio | Monthly debt payments, alimony, and living costs divided by monthly gross income | percentage |
MonthlyIncome | Monthly income | real |
NumberOfOpenCreditLinesAndLoans | Number of open loans (installment loans such as a car loan or mortgage) and lines of credit (e.g. credit cards) | integer |
NumberOfTimes90DaysLate | Number of times borrower has been 90 days or more past due. | integer |
NumberRealEstateLoansOrLines | Number of mortgage and real estate loans including home equity lines of credit | integer |
NumberOfTime60-89DaysPastDueNotWorse | Number of times borrower has been 60-89 days past due but no worse in the last 2 years. | integer |
NumberOfDependents | Number of dependents in the family, excluding the borrower (spouse, children, etc.) | integer |
constant_XYZ | Column with a constant value | text |
high_cor_col | Simulated column with high correlation to target column to simulate target leakage. | integer |
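
The last two rows of the data dictionary describe columns that are usually dropped before modeling: a constant column carries no information, and a column that correlates almost perfectly with the target suggests leakage. The sketch below shows one possible way to flag both. The toy rows, the helper names, and the 0.9 correlation threshold are assumptions chosen for illustration, not values from the tutorial.

```python
import math

# Hypothetical toy rows for illustration only; not taken from the real dataset.
rows = [
    {"SeriousDlqin2yrs": 1, "age": 45, "constant_XYZ": "XYZ", "high_cor_col": 9},
    {"SeriousDlqin2yrs": 0, "age": 40, "constant_XYZ": "XYZ", "high_cor_col": 1},
    {"SeriousDlqin2yrs": 0, "age": 38, "constant_XYZ": "XYZ", "high_cor_col": 2},
    {"SeriousDlqin2yrs": 1, "age": 30, "constant_XYZ": "XYZ", "high_cor_col": 8},
    {"SeriousDlqin2yrs": 0, "age": 52, "constant_XYZ": "XYZ", "high_cor_col": 1},
]


def constant_columns(rows):
    """Return the columns that hold a single distinct value."""
    return [col for col in rows[0] if len({r[col] for r in rows}) == 1]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def leakage_suspects(rows, target, threshold=0.9):
    """Return numeric columns whose absolute correlation with the target
    meets the threshold (0.9 is an arbitrary, assumed cut-off)."""
    y = [r[target] for r in rows]
    suspects = []
    for col in rows[0]:
        if col == target:
            continue
        values = [r[col] for r in rows]
        # Skip text columns such as constant_XYZ.
        if not all(isinstance(v, (int, float)) for v in values):
            continue
        if abs(pearson(values, y)) >= threshold:
            suspects.append(col)
    return suspects


print("Constant columns:", constant_columns(rows))
print("Possible leakage:", leakage_suspects(rows, "SeriousDlqin2yrs"))
```

A high correlation alone does not prove leakage, so any flagged column is best reviewed against how and when its values were collected.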