Content-based file identification (512-byte)

Citation Author(s): Saja Khudhur (University of Technology-Iraq), Hassan Jeiad
Submitted by: Saja Khudhur
Last updated: Fri, 10/21/2022 - 18:16
DOI: 10.21227/mef8-ft96
Data Format: *.avi; *.csv; *.txt
Research Article Link: A Content-based File Identification Dataset: collection, construction, and eval…


Abstract

This content-based dataset comprises 12 features for eight common file types (JPG, PNG, HTML, TXT, MP4, M4A, MOV, and MP3) and is intended for file type identification (FTI). The features were extracted from a pool of file fragments, each 512 bytes in size, drawn from all eight types. The dataset is designed to be usable with both supervised and unsupervised ML models. It supports classification and clustering of these types at two levels: a fine-grained level (by exact file type: JPG, PNG, HTML, TXT, MP4, M4A, MOV, or MP3) and a coarse-grained level (by broad type: image, text, audio, or video).
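
The fragmenting and two-level labelling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' extraction code. The helper names `split_into_fragments` and `coarse_label` are hypothetical, and the grouping of the eight types into image, text, audio, and video is an assumption based on common usage, since the page lists the four broad types but not which file type falls under each. Dropping a trailing partial fragment is also an assumption. The 12 features are not named on this page, so the sketch does not attempt to compute them.

```python
FRAGMENT_SIZE = 512  # bytes per fragment, as stated in the abstract

# Assumed grouping of fine-grained types into broad types (not given on the page).
COARSE_LABEL = {
    "JPG": "image", "PNG": "image",
    "HTML": "text", "TXT": "text",
    "MP4": "video", "MOV": "video",
    "M4A": "audio", "MP3": "audio",
}


def split_into_fragments(data: bytes, size: int = FRAGMENT_SIZE) -> list[bytes]:
    """Split raw file content into consecutive fragments of exactly `size` bytes.

    A trailing fragment shorter than `size` is dropped (an assumption).
    """
    full_length = len(data) - len(data) % size
    return [data[i:i + size] for i in range(0, full_length, size)]


def coarse_label(fine_label: str) -> str:
    """Map a fine-grained label such as 'MP3' to its assumed broad type."""
    return COARSE_LABEL[fine_label.upper()]
```

For example, 1300 bytes of content would yield two 512-byte fragments, and a fragment labelled `MP3` at the fine-grained level would carry the label `audio` at the coarse-grained level under the assumed grouping.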

