Roll Rate Analysis

Python

Roll Rate

Roll rate analysis using transition matrices for the Default of Credit Card Clients dataset

Author

Joel Burbano

Publication date

August 23, 2024

Loading the Dataset

Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

df = pd.read_csv("UCI_Credit_Card.csv")
df.head()

	ID	LIMIT_BAL	SEX	EDUCATION	MARRIAGE	AGE	PAY_0	PAY_2	PAY_3	PAY_4	...	BILL_AMT4	BILL_AMT5	BILL_AMT6	PAY_AMT1	PAY_AMT2	PAY_AMT3	PAY_AMT4	PAY_AMT5	PAY_AMT6	default.payment.next.month
0	1	20000.0	2	2	1	24	2	2	-1	-1	...	0.0	0.0	0.0	0.0	689.0	0.0	0.0	0.0	0.0	1
1	2	120000.0	2	2	2	26	-1	2	0	0	...	3272.0	3455.0	3261.0	0.0	1000.0	1000.0	1000.0	0.0	2000.0	1
2	3	90000.0	2	2	2	34	0	0	0	0	...	14331.0	14948.0	15549.0	1518.0	1500.0	1000.0	1000.0	1000.0	5000.0	0
3	4	50000.0	2	2	1	37	0	0	0	0	...	28314.0	28959.0	29547.0	2000.0	2019.0	1200.0	1100.0	1069.0	1000.0	0
4	5	50000.0	1	2	1	57	-1	0	-1	0	...	20940.0	19146.0	19131.0	2000.0	36681.0	10000.0	9000.0	689.0	679.0	0

5 rows × 25 columns

Data Cleaning and Preparation

Code

# Missing values per column
print(df.isnull().sum())

# Number of fully duplicated rows
print(df.duplicated().sum())

# Uncomment to drop duplicates if any are found
# df = df.drop_duplicates()

# df.info() prints its own report and returns None, so print() also emits "None"
print(df.info())


# The PAY_i columns, i in {0, 2, 3, ..., 6} (there is no PAY_1), hold the client's repayment status.
# Interpretation used here: -2: no debt, -1 and 0: current, 1: 1-30 days past due, ...,
# 6: 151-180 days past due, 7 and 8: 180+ days past due

# Columns to standardize
pay_cols = ["PAY_" + str(i) for i in [0, 2, 3, 4, 5, 6]]

# Replace the numeric codes with labels that are easier to read.
# Labels are kept in Spanish to match the outputs: 'No deuda' = no debt,
# 'Corriente' = current, 'días' = days.
df[pay_cols] = df[pay_cols].replace({-2: 'No deuda', -1: 'Corriente', 0: 'Corriente',
                                      1: '1-30 días', 2: '31-60 días', 3: '61-90 días',
                                      4: '91-120 días', 5: '121-150 días',
                                      6: '151-180 días', 7: '180+ días', 8: '180+ días'})

# Desired order of the payment states, from least to most delinquent
estado_pago_orden = ['No deuda', 'Corriente', '1-30 días', '31-60 días', '61-90 días',
                     '91-120 días', '121-150 días', '151-180 días', '180+ días']
for col in pay_cols:
  df[col] = pd.Categorical(df[col], categories=estado_pago_orden, ordered=True)

ID                            0
LIMIT_BAL                     0
SEX                           0
EDUCATION                     0
MARRIAGE                      0
AGE                           0
PAY_0                         0
PAY_2                         0
PAY_3                         0
PAY_4                         0
PAY_5                         0
PAY_6                         0
BILL_AMT1                     0
BILL_AMT2                     0
BILL_AMT3                     0
BILL_AMT4                     0
BILL_AMT5                     0
BILL_AMT6                     0
PAY_AMT1                      0
PAY_AMT2                      0
PAY_AMT3                      0
PAY_AMT4                      0
PAY_AMT5                      0
PAY_AMT6                      0
default.payment.next.month    0
dtype: int64
0
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30000 entries, 0 to 29999
Data columns (total 25 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   ID                          30000 non-null  int64  
 1   LIMIT_BAL                   30000 non-null  float64
 2   SEX                         30000 non-null  int64  
 3   EDUCATION                   30000 non-null  int64  
 4   MARRIAGE                    30000 non-null  int64  
 5   AGE                         30000 non-null  int64  
 6   PAY_0                       30000 non-null  int64  
 7   PAY_2                       30000 non-null  int64  
 8   PAY_3                       30000 non-null  int64  
 9   PAY_4                       30000 non-null  int64  
 10  PAY_5                       30000 non-null  int64  
 11  PAY_6                       30000 non-null  int64  
 12  BILL_AMT1                   30000 non-null  float64
 13  BILL_AMT2                   30000 non-null  float64
 14  BILL_AMT3                   30000 non-null  float64
 15  BILL_AMT4                   30000 non-null  float64
 16  BILL_AMT5                   30000 non-null  float64
 17  BILL_AMT6                   30000 non-null  float64
 18  PAY_AMT1                    30000 non-null  float64
 19  PAY_AMT2                    30000 non-null  float64
 20  PAY_AMT3                    30000 non-null  float64
 21  PAY_AMT4                    30000 non-null  float64
 22  PAY_AMT5                    30000 non-null  float64
 23  PAY_AMT6                    30000 non-null  float64
 24  default.payment.next.month  30000 non-null  int64  
dtypes: float64(13), int64(12)
memory usage: 5.7 MB
None
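
One caveat with the mapping step: `replace` leaves any code that is missing from the dictionary untouched, and `pd.Categorical` then turns every value outside `categories` into NaN without raising an error. The following is a minimal sketch, on made-up data, of how such codes could be flagged before the conversion. The helper name `find_unmapped_codes`, the toy frame, and the abbreviated mapping are illustrative assumptions and not part of the original analysis.

```python
import pandas as pd

# Abbreviated, illustrative mapping (the analysis uses a longer one).
status_map = {-2: 'No deuda', -1: 'Corriente', 0: 'Corriente',
              1: '1-30 días', 2: '31-60 días'}

def find_unmapped_codes(frame, columns, mapping):
    """Return, per column, the sorted raw codes that `mapping` does not cover."""
    unmapped = {}
    for col in columns:
        missing = sorted(set(frame[col].unique()) - set(mapping))
        if missing:
            unmapped[col] = [int(code) for code in missing]
    return unmapped

# Toy data: code 9 is deliberately absent from the mapping.
toy = pd.DataFrame({'PAY_0': [-1, 0, 2, 9], 'PAY_2': [-2, 1, 2, 0]})
unmapped = find_unmapped_codes(toy, ['PAY_0', 'PAY_2'], status_map)
print(unmapped)

# Without the check, the unmapped code silently becomes NaN.
order = ['No deuda', 'Corriente', '1-30 días', '31-60 días']
converted = pd.Categorical(toy['PAY_0'].replace(status_map),
                           categories=order, ordered=True)
n_lost = int(pd.isna(converted).sum())
print(n_lost)
```

Running a check like this on the real `pay_cols` before the `replace` call would show whether any raw code falls outside the dictionary.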

Roll Rate Calculation

Code

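# Transition rates (roll rates) between consecutive PAY columns
roll_rates = {}

# Iterate over adjacent column pairs; normalize='index' makes each row sum to 1,
# so each cell is the share of accounts in the row state found in the column state.
# Note: per the UCI data dictionary, PAY_0 is the most recent month (September 2005)
# and PAY_6 the oldest (April 2005), so the rows here are the later month of each pair.
for i in range(len(pay_cols) - 1):
  transition_matrix = pd.crosstab(df[pay_cols[i]], df[pay_cols[i+1]], normalize='index')
  label = f'Transición de {pay_cols[i]} a {pay_cols[i+1]}'  # "Transition from ... to ..."
  roll_rates[label] = transition_matrix

# Show the first transition matrix
roll_rates['Transición de PAY_0 a PAY_2']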

PAY_2	No deuda	Corriente	1-30 días	31-60 días	61-90 días	91-120 días	121-150 días	151-180 días	180+ días
PAY_0
No deuda	0.928235	0.069953	0.000000	0.001812	0.000000	0.000000	0.000000	0.000000	0.000000
Corriente	0.000000	0.978358	0.000000	0.018949	0.002301	0.000245	0.000147	0.000000	0.000000
1-30 días	0.331074	0.166757	0.007592	0.453362	0.029555	0.008677	0.001898	0.000542	0.000542
31-60 días	0.000000	0.371579	0.000000	0.596550	0.026622	0.005249	0.000000	0.000000	0.000000
61-90 días	0.000000	0.000000	0.000000	0.844720	0.127329	0.024845	0.003106	0.000000	0.000000
91-120 días	0.000000	0.000000	0.000000	0.000000	0.763158	0.197368	0.039474	0.000000	0.000000
121-150 días	0.000000	0.000000	0.000000	0.000000	0.000000	0.961538	0.000000	0.038462	0.000000
151-180 días	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	1.000000	0.000000	0.000000
180+ días	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.321429	0.678571
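
Two points help when reading a matrix like this one. First, the rows are PAY_0, which is the later month of the pair in the UCI data dictionary, whereas a roll rate in the usual sense conditions on the earlier month. Second, a rate such as 1.000000 in a sparsely populated row may rest on very few accounts. The sketch below, on made-up data, puts the earlier month in the rows and reports raw counts next to the rates. The toy frame and the names `counts` and `rates` are illustrative assumptions.

```python
import pandas as pd

states = ['Corriente', '1-30 días', '31-60 días']

# Toy data: PAY_2 is the earlier month (August), PAY_0 the later one (September).
toy = pd.DataFrame({
    'PAY_2': ['Corriente', 'Corriente', 'Corriente', 'Corriente',
              '1-30 días', '1-30 días', '31-60 días'],
    'PAY_0': ['Corriente', 'Corriente', 'Corriente', '1-30 días',
              'Corriente', '31-60 días', '31-60 días'],
})

# Raw transition counts: earlier month in the rows, later month in the columns.
counts = pd.crosstab(toy['PAY_2'], toy['PAY_0'])
# Reindex so that every state appears on both axes, in delinquency order.
counts = counts.reindex(index=states, columns=states, fill_value=0)

# Row-normalize: rates.loc[a, b] is the share of accounts in state a that end in b.
# A state with no accounts would give a row of NaN (division by zero).
row_totals = counts.sum(axis=1)
rates = counts.div(row_totals, axis=0)

print(counts.assign(total=row_totals))
print(rates.round(3))
```

Applied to the real data, the same pattern would use `pay_cols[i+1]` for the rows and `pay_cols[i]` for the columns.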

Visualizing the Results

Code

# Plot each transition matrix as a heatmap
for label, matrix in roll_rates.items():
  plt.figure(figsize=(10, 6))
  sb.heatmap(matrix, annot=True, cmap="YlGnBu", cbar=True)
  plt.title(label)
  plt.xlabel('Estado de Pago Siguiente')  # "Next payment state" (columns: PAY_{i+1})
  plt.ylabel('Estado de Pago Actual')     # "Current payment state" (rows: PAY_i)
  plt.show()
  plt.close()  # release the figure; plt.clf() after show() would open a new, empty figure

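
Besides heatmaps, each transition matrix can be condensed into three numbers per starting state: the share that stays in the same bucket, the share that rolls to a worse bucket, and the share that moves to a better one. This is a minimal sketch under two assumptions: rows are the starting month, and rows and columns list the same states ordered from least to most delinquent. The function name `summarize_rolls` and the toy matrix are hypothetical.

```python
import numpy as np
import pandas as pd

def summarize_rolls(matrix):
    """Split each row of a square, row-normalized transition matrix into
    'improve' (below the diagonal), 'stay' (diagonal) and 'worsen' (above).

    Assumes rows and columns list the same states, ordered from least to
    most delinquent, with the starting month in the rows.
    """
    values = matrix.to_numpy(dtype=float)
    return pd.DataFrame({
        'improve': np.tril(values, k=-1).sum(axis=1),
        'stay': np.diag(values),
        'worsen': np.triu(values, k=1).sum(axis=1),
    }, index=matrix.index)

states = ['Corriente', '1-30 días', '31-60 días']
# Made-up transition rates; each row sums to 1.
toy_matrix = pd.DataFrame(
    [[0.90, 0.10, 0.00],
     [0.40, 0.10, 0.50],
     [0.20, 0.00, 0.80]],
    index=states, columns=states,
)
summary = summarize_rolls(toy_matrix)
print(summary)
```

The 'worsen' column is what is usually reported as the forward roll rate for each bucket. If the rows hold the later month instead, the 'improve' and 'worsen' columns swap meaning.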