Credit Card Fraud Detection - European Data

Citation Author(s):
Machine Learning Group
Submitted by:
Daniel de Souza
Last updated:
Thu, 11/18/2021 - 20:11
Data Format:


This dataset contains real credit card transactions made in Europe in September 2013. It is highly unbalanced: of 284,807 transactions in total, 284,315 (99.83%) are genuine and 492 (0.17%) are fraudulent. It has 31 variables: the transaction timestamp, the class label (1 for fraud, 0 otherwise), and 29 additional explanatory variables. The dataset is provided in two parts: one for training/testing, with 227,845 transactions (227,455 genuine and 390 fraudulent), and one for validation, with 56,962 transactions (56,860 genuine and 102 fraudulent).
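As a quick arithmetic check of the figures quoted above, the short sketch below recomputes the part totals and class shares from the per-class counts. The variable names are illustrative only; the per-class counts add up to 227,845 transactions for the training/testing part and 56,962 for the validation part, which together give the stated total of 284,807.

```python
# Per-class counts as quoted in the description (variable names are illustrative).
total = 284_807
genuine, fraudulent = 284_315, 492
train_test = {"genuine": 227_455, "fraudulent": 390}
validation = {"genuine": 56_860, "fraudulent": 102}

# Size of each part, derived from its per-class counts.
train_test_total = sum(train_test.values())
validation_total = sum(validation.values())

# The two parts should account for every transaction and every class count.
parts_total = train_test_total + validation_total
genuine_total = train_test["genuine"] + validation["genuine"]
fraud_total = train_test["fraudulent"] + validation["fraudulent"]

# Class shares as percentages of the full dataset.
genuine_share = 100 * genuine / total
fraud_share = 100 * fraudulent / total

print(train_test_total, validation_total, parts_total)
print(f"genuine: {genuine_share:.2f}%  fraudulent: {fraud_share:.2f}%")
```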

  • Balanced Training Dataset

Dataset Description: The dataset is anonymized using PCA and balanced at the card level to reduce the high class imbalance. Half of the credit cards were selected because they have at least one fraudulent transaction in the given time frame; the other half consists of credit cards with no fraudulent transactions in that time frame. This dataset can be used to train models but should not be used to evaluate their performance. For evaluation, please use the unbalanced dataset that is also provided.
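The card-level balancing described above could be sketched roughly as follows. This is not the procedure the dataset authors used, which is not documented here; it is a minimal illustration under stated assumptions: the field names `card_id` and `is_fraud` are hypothetical, every card with at least one fraudulent transaction is kept, and an equal number of fraud-free cards is drawn at random.

```python
import random
from collections import defaultdict


def balance_at_card_level(transactions, seed=0):
    """Keep cards with >= 1 fraud plus an equally sized random sample of fraud-free cards.

    `transactions` is a list of dicts with the hypothetical keys
    `card_id` and `is_fraud` (1 for fraud, 0 otherwise).
    """
    # Group transactions by card.
    by_card = defaultdict(list)
    for tx in transactions:
        by_card[tx["card_id"]].append(tx)

    # Split cards into those with at least one fraud and those with none.
    fraud_cards = sorted(c for c, txs in by_card.items()
                         if any(t["is_fraud"] for t in txs))
    fraud_set = set(fraud_cards)
    clean_cards = sorted(c for c in by_card if c not in fraud_set)

    # Sample as many fraud-free cards as there are fraud cards
    # (fewer if not enough fraud-free cards exist).
    rng = random.Random(seed)
    n = min(len(fraud_cards), len(clean_cards))
    keep = fraud_set | set(rng.sample(clean_cards, n))

    # Return all transactions of the selected cards, in original order.
    return [tx for tx in transactions if tx["card_id"] in keep]


# Toy example: card A has a fraud; cards B, C and D do not.
transactions = [
    {"card_id": "A", "is_fraud": 0}, {"card_id": "A", "is_fraud": 1},
    {"card_id": "B", "is_fraud": 0},
    {"card_id": "C", "is_fraud": 0}, {"card_id": "C", "is_fraud": 0},
    {"card_id": "D", "is_fraud": 0},
]
balanced = balance_at_card_level(transactions)
selected_cards = {tx["card_id"] for tx in balanced}
```

Note that balancing happens over cards, not transactions, so the resulting transaction-level class ratio is reduced but not necessarily 50/50: a card with one fraud can still carry many genuine transactions.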

  • Unbalanced Test Dataset

Dataset Description: Unbalanced test set, anonymized using PCA, containing the transactions of all credit cards in a certain time frame.
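When evaluating on an unbalanced set like this one, plain accuracy can be misleading, so precision and recall on the fraud class are usually more informative. The sketch below illustrates this with synthetic labels (998 genuine, 2 fraudulent); the `evaluate` helper and the toy predictions are assumptions made for illustration and are not part of the dataset.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, precision and recall for the fraud class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Convention chosen here: report 0.0 when a ratio is undefined.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Synthetic, heavily unbalanced labels: 998 genuine (0) and 2 fraudulent (1).
y_true = [0] * 998 + [1] * 2

# A trivial model that labels everything as genuine.
always_genuine = [0] * 1000
baseline = evaluate(y_true, always_genuine)

# A model that catches one of the two frauds and raises one false alarm.
some_model = [0] * 997 + [1] + [1, 0]
scores = evaluate(y_true, some_model)

print(baseline)
print(scores)
```

The trivial model reaches very high accuracy while detecting no fraud at all, which is why recall and precision on the fraud class are worth reporting alongside it.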