Standard Dataset
Dataset for lossless compression testing on already compressed data
- Submitted by: Flavio Barros
- Last updated: Tue, 05/23/2023 - 13:50
- DOI: 10.21227/hpch-xv71
Abstract
The following dataset is used to test lossless compression algorithms on already compressed data, i.e., to reduce data size even further without compromising its integrity. The dataset contains 1,000 “.txt” text files extracted from other datasets and compressed in “.zip” format, each with a maximum size of 1 kB (kilobyte), that is, 1,024 bytes. The largest file, “text_zip (970).zip”, has 1670 characters including spaces, and the smallest, “text_zip (1).zip”, has 14 characters including spaces. The maximum compression achieved on a dataset file is 25.42% (file “text_zip (5).zip”) and the minimum is 1.53% (file “text_zip (972).zip”). The average compression over the entire dataset is 8.16%. All files in the dataset can be compressed further without loss, by at least 1.53%.
- Publication: Lossless Compression Algorithms in Compressed Data (submitted).
- MATLAB processing code: contact the authors at flavioifma@gmail.com, leticia.correia@ifma.edu.br, allan.kardec@ufma.edu.br.
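
The compression percentages quoted in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code. It assumes that "compression" means the size reduction relative to the input, 100 × (1 − compressed size / original size), which the dataset page does not state explicitly. The function names `compression_percent`, `recompress`, and `summarize` are hypothetical, and zlib is only a stand-in for whichever lossless algorithm is under test.

```python
import zlib
from statistics import mean


def compression_percent(original: bytes, compressed: bytes) -> float:
    """Size reduction as a percentage of the original size.

    Assumed definition: 100 * (1 - len(compressed) / len(original)).
    A negative value means the output is larger than the input.
    """
    if not original:
        raise ValueError("original data must be non-empty")
    return 100.0 * (1.0 - len(compressed) / len(original))


def recompress(data: bytes) -> bytes:
    """Stand-in lossless compressor (zlib at its highest level).

    Replace this with the algorithm being evaluated.
    """
    return zlib.compress(data, 9)


def summarize(percentages: dict[str, float]) -> dict[str, object]:
    """Return the files with maximum and minimum compression, and the mean."""
    best = max(percentages, key=percentages.get)
    worst = min(percentages, key=percentages.get)
    return {
        "max": (best, percentages[best]),
        "min": (worst, percentages[worst]),
        "mean": mean(percentages.values()),
    }


if __name__ == "__main__":
    # Synthetic payloads for illustration only; these are NOT dataset files.
    samples = {
        "sample_a": b"abc" * 200,
        "sample_b": bytes(range(256)) * 2,
    }
    results = {}
    for name, payload in samples.items():
        packed = recompress(payload)
        # Lossless check: decompressing must restore the input exactly.
        assert zlib.decompress(packed) == payload
        results[name] = compression_percent(payload, packed)
    print(summarize(results))
```

To apply the sketch to the dataset itself, read each “.zip” file as raw bytes and pass those bytes to `recompress`. General-purpose compressors often enlarge data that is already compressed, so negative percentages are possible with the zlib stand-in.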