Credit Card Fraud Detection - Turkish Data

Yapı Kredi Teknoloji
Daniel de Souza
Thu, 11/18/2021 - 20:12
This dataset was provided by the Turkish company Yapı Kredi Teknoloji and contains 62,380 transactions, of which 61,572 (98.70%) are genuine and 808 (1.30%) are fraudulent. This imbalanced dataset has 26 variables: the transaction timestamp, the class label (1 for fraud and 0 otherwise), and 24 additional explanatory variables. The dataset was also made available in two parts: one for training/testing, with 31,190 transactions (30,400 genuine and 790 fraudulent), and the other, also with 31,190 transactions (31,172 genuine and 18 fraudulent), for validation. Although the transactions in this dataset are real, the explanatory variables are numeric, resulting from preprocessing by Principal Component Analysis (PCA) [43], and have no descriptions due to confidentiality concerns.
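The split sizes and class shares quoted above can be checked with a little arithmetic. The sketch below uses only the counts stated in the description; the `summarize` helper and the "always genuine" baseline are illustrative additions, included to show how misleading plain accuracy is on data this imbalanced.

```python
# Class counts as reported in the dataset description.
SPLITS = {
    "training/testing": {"genuine": 30_400, "fraud": 790},
    "validation": {"genuine": 31_172, "fraud": 18},
}


def summarize(genuine: int, fraud: int) -> dict:
    """Return the size, fraud share, and accuracy of an 'always genuine' baseline."""
    total = genuine + fraud
    return {
        "total": total,
        "fraud_pct": 100 * fraud / total,
        # A classifier that labels every transaction as genuine is right on
        # every genuine row, so its accuracy equals the genuine share.
        "baseline_accuracy_pct": 100 * genuine / total,
    }


# Combine both parts to recover the full-dataset figures.
overall = {
    "genuine": sum(s["genuine"] for s in SPLITS.values()),
    "fraud": sum(s["fraud"] for s in SPLITS.values()),
}

for name, counts in {**SPLITS, "overall": overall}.items():
    s = summarize(**counts)
    print(f"{name:>16}: n={s['total']:,}  fraud={s['fraud_pct']:.2f}%  "
          f"all-genuine accuracy={s['baseline_accuracy_pct']:.2f}%")
```

The two parts add up to the stated totals, and the validation part (about 0.06% fraud) is far more imbalanced than the training/testing part (about 2.53% fraud).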

  • Balanced Training Dataset

Dataset Description: The dataset is anonymized using PCA and balanced at the card level to reduce the severe class imbalance. Half of the credit cards it includes were selected because they have at least one fraudulent transaction in the given time frame; the remaining half are credit cards with no fraudulent transactions in that time frame. This dataset can be used to train models but should not be used to evaluate their performance. For evaluation, use the unbalanced dataset, which is also provided.
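The card-level balancing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the provider's procedure: the description does not say how the fraud-free cards were chosen, so uniform random sampling is assumed, and the `balance_by_card` helper and the `card_id`/`label` field names are hypothetical (the real explanatory variables are anonymized PCA components).

```python
import random
from collections import defaultdict


def balance_by_card(transactions, seed=0):
    """Hypothetical sketch of card-level balancing.

    Keeps every card with at least one fraudulent transaction, plus an equal
    number of randomly chosen cards with no fraud. Each transaction is assumed
    to be a dict with hypothetical keys 'card_id' and 'label'
    (1 = fraud, 0 = genuine).
    """
    transactions = list(transactions)  # accept any iterable; it is read twice

    # Group transactions by card.
    by_card = defaultdict(list)
    for tx in transactions:
        by_card[tx["card_id"]].append(tx)

    # Split cards by whether they have any fraud in the time frame.
    fraud_cards, clean_cards = [], []
    for card, txs in by_card.items():
        if any(tx["label"] == 1 for tx in txs):
            fraud_cards.append(card)
        else:
            clean_cards.append(card)

    # Assumption: fraud-free cards are drawn uniformly at random.
    # random.sample raises ValueError if there are fewer clean than fraud cards.
    rng = random.Random(seed)
    sampled_clean = rng.sample(clean_cards, len(fraud_cards))

    # Keep all transactions of the selected cards: balance is per card,
    # not per transaction.
    keep = set(fraud_cards) | set(sampled_clean)
    return [tx for tx in transactions if tx["card_id"] in keep]


# Toy example: card "A" has one fraud; "B", "C", "D" are fraud-free.
toy = [
    {"card_id": "A", "label": 0},
    {"card_id": "A", "label": 1},
    {"card_id": "B", "label": 0},
    {"card_id": "C", "label": 0},
    {"card_id": "C", "label": 0},
    {"card_id": "D", "label": 0},
]
balanced = balance_by_card(toy)
cards = {tx["card_id"] for tx in balanced}
frauds = sum(tx["label"] for tx in balanced)
print(f"cards kept: {len(cards)}, transactions: {len(balanced)}, frauds: {frauds}")
```

With one fraud-bearing card, the sketch keeps two cards, yet only one of the three or four retained transactions is fraudulent: balance holds over cards, not transactions. Because the share of fraud-bearing cards is artificially raised to one half, metrics measured on such a set would not reflect the real class distribution.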

  • Unbalanced Test Dataset

Dataset Description: An unbalanced test set, anonymized with PCA, containing the transactions of all credit cards within a given time frame.


