content-based dataset comprising 12 features for eight common file types (JPG, PNG, HTML, TXT, MP4, M4A, MOV, and MP3), intended for file type identification (FTI). The features were extracted from a pool of file fragments, each 512 bytes in size, drawn from all eight of these types. The dataset is designed for use with both supervised and unsupervised ML models, and it supports classifying and clustering the file types at two levels.
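As a rough illustration of how content-based features can be derived from 512-byte fragments, the following is a minimal sketch. The description above does not name the 12 features, so the three computed here (byte entropy, mean byte value, printable-ASCII ratio) are assumptions chosen for illustration only, and the function names are hypothetical rather than part of the dataset.

```python
import math
from collections import Counter

FRAGMENT_SIZE = 512  # fragment size stated in the dataset description


def split_into_fragments(data: bytes, size: int = FRAGMENT_SIZE) -> list[bytes]:
    """Split raw bytes into full-size fragments, dropping a short trailing remainder."""
    return [data[i:i + size] for i in range(0, len(data) - size + 1, size)]


def shannon_entropy(fragment: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not fragment:
        return 0.0
    counts = Counter(fragment)
    total = len(fragment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def illustrative_features(fragment: bytes) -> dict[str, float]:
    """Hypothetical content features; NOT the dataset's actual 12 features.

    Expects a non-empty fragment.
    """
    total = len(fragment)
    printable = sum(1 for b in fragment if 32 <= b <= 126)
    return {
        "entropy": shannon_entropy(fragment),
        "mean_byte": sum(fragment) / total,
        "printable_ratio": printable / total,
    }


if __name__ == "__main__":
    # Synthetic text-like input; a real pipeline would read bytes from files.
    sample = b"hello world " * 100
    for frag in split_into_fragments(sample):
        print(illustrative_features(frag))
```

Feature vectors of this kind, paired with a file-type label, could feed a supervised classifier, or be used without labels for clustering. Which features actually separate the eight types well is an empirical question that this sketch does not answer.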


[1] Saja Khudhur, Hassan Jeiad, "Content-based file identification (512-byte)", IEEE Dataport, 2022. [Online]. Available: http://dx.doi.org/10.21227/mef8-ft96. Accessed: Dec. 11, 2023.