File Fragment Type (FFT) - 75 Dataset

Name: File Fragment Type (FFT) - 75 Dataset
Creator: Pawel Korus
Keywords: Security

Citation Author(s): Govind Mittal (New York University), Pawel Korus (New York University), Nasir Memon (New York University)
Submitted by: Pawel Korus
Last updated: Wed, 05/18/2022 - 02:21
DOI: 10.21227/kfxw-8084
Data Format: Numpy
Research Article Link: FiFTy: Large-Scale File Fragment Type Identification Using Convolutional Neural…
Links: Fifty (File Type Classifier)

Categories: Security

Keywords: forensics, carving, file type classification, Machine Learning, Multimedia Security

Abstract

The FFT-75 dataset contains randomly sampled, potentially overlapping file fragments from 75 popular file types (see details below). To the best of our knowledge, it is the most diverse and balanced dataset of its kind available. The dataset is labeled with class IDs and is ready for training supervised machine learning models. We distinguish 6 scenarios that differ in class granularity, and provide variants with 512-byte and 4096-byte blocks. In each case, we sampled a balanced dataset and split the data as follows: 80% for training, 10% for testing, and 10% for validation.
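
To make the sampling and splitting described above concrete, here is a minimal Python sketch using NumPy. It is only an illustration under stated assumptions, not the procedure the dataset authors used: the helper names `sample_fragments` and `split_80_10_10`, the synthetic input bytes, and the single placeholder class ID are all hypothetical.

```python
import numpy as np


def sample_fragments(data: bytes, block_size: int, count: int,
                     rng: np.random.Generator) -> np.ndarray:
    """Draw `count` fragments of `block_size` bytes at random offsets.

    Hypothetical helper: offsets are drawn independently, so fragments
    may overlap, as in the "potentially overlapping" description.
    """
    buf = np.frombuffer(data, dtype=np.uint8)
    if buf.size < block_size:
        raise ValueError("input is shorter than one block")
    # Valid start offsets are 0 .. len - block_size (inclusive).
    starts = rng.integers(0, buf.size - block_size + 1, size=count)
    # Index matrix of shape (count, block_size): one row per fragment.
    idx = starts[:, None] + np.arange(block_size)[None, :]
    return buf[idx]


def split_80_10_10(x: np.ndarray, y: np.ndarray, rng: np.random.Generator):
    """Shuffle, then split into 80% train, 10% test, 10% validation."""
    order = rng.permutation(len(x))
    n_train = len(x) * 8 // 10
    n_test = len(x) // 10
    train, test, val = np.split(order, [n_train, n_train + n_test])
    return (x[train], y[train]), (x[test], y[test]), (x[val], y[val])


rng = np.random.default_rng(0)
# Synthetic stand-in for the contents of one file (assumption, not real data).
fake_file = rng.integers(0, 256, size=100_000, dtype=np.uint8).tobytes()

x = sample_fragments(fake_file, block_size=512, count=1000, rng=rng)
y = np.zeros(len(x), dtype=np.int64)  # placeholder class ID for this file type

train, test, val = split_80_10_10(x, y, rng)
print(train[0].shape, test[0].shape, val[0].shape)
```

Passing `block_size=4096` would give the larger block variant. A balanced dataset would additionally require drawing the same number of fragments per class, which this sketch does not show.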