Open Access
File Fragment Type (FFT) - 75 Dataset
- Submitted by: Pawel Korus
- Last updated: Tue, 05/17/2022 - 22:21
- DOI: 10.21227/kfxw-8084
- License: Creative Commons Attribution
Abstract
The FFT-75 dataset contains randomly sampled, potentially overlapping file fragments from 75 popular file types (see details below). To the best of our knowledge, it is the most diverse and balanced dataset of its kind currently available. Every fragment is labeled with a class ID, so the data is ready for training supervised machine learning models. We distinguish six scenarios of differing granularity and provide each in two variants, with 512-byte and 4096-byte blocks. In each case, we sampled a balanced dataset and split it as follows: 80% for training, 10% for testing, and 10% for validation.
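To make the sampling and splitting described above concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the function names `sample_fragments` and `split_80_10_10` are hypothetical, and it assumes fragments are drawn at uniformly random byte offsets and that the split is made per fragment after shuffling. The actual procedure may differ (for example, offsets aligned to block boundaries or a per-file split); see readme.md for the authoritative description.

```python
import random


def sample_fragments(data: bytes, block_size: int, count: int,
                     rng: random.Random) -> list[bytes]:
    """Draw `count` fragments of `block_size` bytes from `data`.

    Offsets are chosen independently (an assumption of this sketch),
    so two fragments may overlap.
    """
    if len(data) < block_size:
        raise ValueError("input is shorter than one block")
    max_offset = len(data) - block_size
    offsets = [rng.randint(0, max_offset) for _ in range(count)]
    return [data[o:o + block_size] for o in offsets]


def split_80_10_10(items: list, rng: random.Random) -> tuple[list, list, list]:
    """Shuffle a copy of `items` and split it 80% / 10% / 10%."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_test = len(shuffled) // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


# Example: 1000 fragments of 512 bytes from a synthetic 16 KiB buffer.
rng = random.Random(0)
synthetic_file = bytes(range(256)) * 64
fragments = sample_fragments(synthetic_file, 512, 1000, rng)
train, test, validation = split_80_10_10(fragments, rng)
```

Passing 4096 instead of 512 as `block_size` would correspond to the larger block variant.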
Instructions:
See documentation (readme.md).
Dataset Files
- Scenario #1 (4096-byte blocks) 4k_1.tar.gz (24.18 GB)
- Scenario #2 (4096-byte blocks) 4k_2.tar.gz (6.08 GB)
- Scenario #3 (4096-byte blocks) 4k_3.tar.gz (8.25 GB)
- Scenario #4 (4096-byte blocks) 4k_4.tar.gz (3.72 GB)
- Scenario #5 (4096-byte blocks) 4k_5.tar.gz (3.67 GB)
- Scenario #6 (4096-byte blocks) 4k_6.tar.gz (3.77 GB)
- Scenario #1 (512-byte blocks) 512_1.tar.gz (3.18 GB)
- Scenario #2 (512-byte blocks) 512_2.tar.gz (825.67 MB)
- Scenario #3 (512-byte blocks) 512_3.tar.gz (1.06 GB)
- Scenario #4 (512-byte blocks) 512_4.tar.gz (490.66 MB)
- Scenario #5 (512-byte blocks) 512_5.tar.gz (484.36 MB)
- Scenario #6 (512-byte blocks) 512_6.tar.gz (490.54 MB)
- Human-readable labels classes.json (902 bytes)
- Use-case hierarchy tags.txt (1.09 kB)
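Since the archives are large, it can be useful to inspect one before unpacking it. The sketch below uses only Python's standard `tarfile` module to list the regular files in a downloaded archive together with their sizes. The helper name `list_archive` is hypothetical, and the internal layout of the archives is not described on this page, so consult readme.md for what each member contains.

```python
import tarfile


def list_archive(path: str) -> list[tuple[str, int]]:
    """Return (member name, size in bytes) for each regular file in a .tar.gz."""
    with tarfile.open(path, "r:gz") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]
```

For example, `list_archive("512_4.tar.gz")` would list the contents of the smallest 512-byte archive, assuming it has been downloaded to the current directory.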
Open Access dataset files are accessible to all logged-in users. A free IEEE account is sufficient; IEEE membership is not required.
Documentation
Attachment | Size |
---|---|
readme.md | 10.17 KB |