Data for the study were retrieved from a publicly available dataset of Bondora (https://www.bondora.com/en), a leading European P2P lending platform. The retrieved data is a pool of both defaulted and non-defaulted loans from the period between 1 March 2009 and 27 January 2020. The data comprises demographic and financial information about borrowers, together with loan transactions. In P2P lending, loans are typically uncollateralized, and lenders seek higher returns as compensation for the financial risk they take. In addition, they must make decisions under information asymmetry that works in favor of the borrowers. To make rational decisions, lenders want to minimize the default risk of each lending decision and realize a return that compensates for that risk.

In the financial research domain, very few datasets are available for building and analyzing credit risk models. This dataset is intended to help the research community build such models and carry out research in the credit risk domain.


The dataset also includes a Jupyter notebook for data preprocessing, which helps with working with the data and performing basic preprocessing steps. The zip file contains both the preprocessed dataset and the raw dataset extracted directly from the Bondora website (https://www.bondora.com/en).

The preprocessing choices made in the attached notebook are based on my own intuition and assumptions.
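As a rough illustration of what "basic preprocessing" on loan records can look like, here is a minimal sketch in plain Python. It is not the code from the attached notebook. The field names (`LoanId`, `DefaultDate`, `Notes`, `Default`), the missing-value threshold, and the rule "a loan counts as defaulted when a default date is present" are all assumptions made for this example and may not match the actual dataset or the notebook's choices.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def is_missing(value: Any) -> bool:
    """Treat None and empty/whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def drop_sparse_columns(rows: List[Row], max_missing: float = 0.4) -> List[Row]:
    """Drop every column whose share of missing values exceeds max_missing."""
    if not rows:
        return []
    keep = []
    for col in rows[0].keys():
        missing = sum(1 for r in rows if is_missing(r.get(col)))
        if missing / len(rows) <= max_missing:
            keep.append(col)
    return [{c: r.get(c) for c in keep} for r in rows]


def add_default_flag(rows: List[Row],
                     date_field: str = "DefaultDate",  # assumed field name
                     flag_field: str = "Default") -> List[Row]:
    """Add a binary flag: 1 if a default date is present, else 0 (assumed rule)."""
    labeled = []
    for r in rows:
        new_row = dict(r)
        new_row[flag_field] = 0 if is_missing(r.get(date_field)) else 1
        labeled.append(new_row)
    return labeled


# Hypothetical records, invented for this example only.
sample = [
    {"LoanId": "A1", "DefaultDate": "2016-05-01", "Notes": ""},
    {"LoanId": "A2", "DefaultDate": "", "Notes": ""},
    {"LoanId": "A3", "DefaultDate": "", "Notes": ""},
]

cleaned = drop_sparse_columns(sample, max_missing=0.7)  # removes the empty "Notes" column
labeled = add_default_flag(cleaned)
```

In this sketch the sparse-column filter runs before the label is derived, and the threshold is set high enough that the mostly empty date field survives. A stricter threshold would drop that field and leave every loan flagged as non-defaulted.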


Thank you so much for this dataset. I will be using it for my ML project.

Submitted by Sany Sunney on Wed, 01/06/2021 - 02:05

Could you provide descriptions of the variables in the preprocessed dataset? I have found some descriptions on the Bondora website, but some variables have no description (e.g., for Default, does 1 mean yes and 0 mean no?).

Submitted by Ruoyi Nan on Mon, 03/15/2021 - 08:49