Standard Dataset
DataCredit_With_External_Factors
- Submitted by: Jomark Noriega
- Last updated: Wed, 10/16/2024 - 01:01
- DOI: 10.21227/qy2q-1f11
Abstract
This dataset offers a comprehensive mix of financial, demographic, temporal, and external factor data to help predict credit delinquency. It includes key information such as loan terms, credit balances, and effective interest rates, along with client details like salary, marital status, and profession.
In addition to tracking historical credit behavior and overdue days at different time points, the dataset incorporates critical external factors, including climate change, social unrest, and global crises like COVID-19, which may influence payment delays and financial behavior.
With this broad scope, the dataset is well-suited for building machine learning models that can improve credit risk management by analyzing the combined effects of financial, socio-demographic, and external influences.
1. Columns: Each column is described with its name, data type, and meaning. Familiarize yourself with these details to understand what each field represents.
2. Data Types: Columns are classified as integer, decimal, bit, or string. This tells you how to handle the data:
   - Integer: Whole numbers.
   - Decimal: Numbers with decimal points, often for financial values.
   - Bit: Binary values (0 or 1).
   - String: Text fields, like income source codes.
3. Data Origins: Columns may contain extracted, calculated, or time series data, indicating whether they are raw values, derived metrics, or changing over time.
4. Delinquency Data: Fields like DiasVencido and Class track overdue payments. Binary classifications help flag cases where payment delays exceed certain thresholds (e.g., 30 or 29 days).
5. External Factors: Includes variables related to external events such as COVID-19, climate change, and social unrest, which are useful for analyzing the impact of those events on delinquency.
6. Normalized Data: Some fields, such as SalarioNormalizado, are adjusted relative to other values, so they may not reflect the original scale.
7. Time Series Data: Delinquency information is available for multiple months. Ensure consistency when analyzing trends over time.
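The type handling and threshold-based flagging described in the notes above can be sketched as follows. This is a minimal illustration, not the dataset author's procedure: the column names `DiasVencido`, `SalarioNormalizado`, and `Class` come from the description, but the types assigned to them here, the 30-day default threshold, and the helper names `parse_record` and `is_delinquent` are assumptions. Check the data dictionary for the actual types and cut-offs.

```python
from decimal import Decimal

# Assumed default cut-off in days; the description mentions 30 or 29 days
# as examples, and the data dictionary defines the real one per column.
OVERDUE_THRESHOLD_DAYS = 30


def parse_record(raw: dict) -> dict:
    """Convert raw string fields to the four documented data types.

    The type chosen for each column is an assumption for illustration.
    """
    return {
        "DiasVencido": int(raw["DiasVencido"]),                    # integer
        "SalarioNormalizado": Decimal(raw["SalarioNormalizado"]),  # decimal
        "Class": raw["Class"] == "1",                              # bit -> bool
    }


def is_delinquent(days_overdue: int,
                  threshold: int = OVERDUE_THRESHOLD_DAYS) -> bool:
    """Flag a record whose overdue days strictly exceed the threshold."""
    return days_overdue > threshold


# Made-up sample row, not taken from the dataset.
record = parse_record(
    {"DiasVencido": "45", "SalarioNormalizado": "0.82", "Class": "1"}
)
print(is_delinquent(record["DiasVencido"]))
```

Using `Decimal` rather than `float` avoids binary rounding surprises on financial values, at some cost in speed. For bulk analysis a dataframe library would likely be more practical.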
Dataset Files
- Financial, demographic, and time-based data for analyzing credit delinquency: Data011_with_FE.zip (39.04 MB)
- Road blockage dataset: ALERTAS_INTERRUMPIDO_RESTRINGIDO_2022-06_2023-11-01.zip (1.57 MB)
- COVID-19 deaths dataset: fallecidos_sinadef.zip (42.85 MB)
- COVID-19 positive cases dataset: positivos_covid.zip (58.71 MB)
- Temperature anomaly dataset: Temperatura4.zip (462.76 kB)
- Time series dataset of default by economic activity, 2020-2023: DiasMoraxActividad.zip (16.28 MB)
- Time series dataset of default by economic activity, 2016-2023: DiasMoraxActividad_2016_2023.zip (41.89 MB)
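Because the files are distributed as zip archives, a quick way to inspect one without extracting it fully is to stream the first few rows. The sketch below assumes each archive contains at least one comma-separated `.csv` member encoded as UTF-8. The page does not state the inner file names, delimiter, or encoding, so the helper picks the first `.csv` it finds and exposes the encoding as a parameter. `read_first_csv` is a hypothetical helper name.

```python
import csv
import io
import zipfile
from itertools import islice


def read_first_csv(zip_path: str, limit: int = 5,
                   encoding: str = "utf-8") -> list[dict]:
    """Return up to `limit` rows from the first .csv member of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        csv_names = [name for name in archive.namelist()
                     if name.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError(f"no .csv member found in {zip_path}")
        with archive.open(csv_names[0]) as raw:
            # Wrap the binary stream so csv can read it as text.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(islice(csv.DictReader(text), limit))
```

For example, `read_first_csv("Temperatura4.zip", limit=3)` would return the first three rows as dictionaries keyed by column name, provided the assumptions above hold for that archive. If the file uses another delimiter or encoding, the column names will come back merged or garbled, which is a useful early signal to consult the matching data dictionary.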
Documentation
Attachment | Size |
---|---|
Data dictionary FE | 114.2 KB |
Data dictionary Time series | 2.02 KB |
Road Blockages Data_Dictionary | 786 bytes |
COVID 19 DEATH CASES Data_Dictionary | 2.05 KB |
COVID 19 POSITIVE CASES Data_Dictionary | 809 bytes |
TEMPERATURE ANOMALY Data_Dictionary | 457 bytes |