Dataset for lossless compression testing on already compressed data

Citation Author(s):
José Flávio Gomes Barros
Letícia Cabral
Allan Kardec Barros
Submitted by:
Flavio Barros
Last updated:
Tue, 05/23/2023 - 13:50
DOI:
10.21227/hpch-xv71

Abstract 

The following dataset is used to test lossless compression algorithms on already compressed data, that is, to reduce data size even further without compromising its integrity. The dataset contains 1,000 “.txt” text files extracted from other datasets, each compressed in “.zip” format to a maximum size of 1 kB (1,024 bytes). The largest file, “text_zip (970).zip”, contains 1,670 characters including spaces, and the smallest, “text_zip (1).zip”, contains 14 characters including spaces. The maximum compression achieved on a single file is 25.42% (“text_zip (5).zip”) and the minimum is 1.53% (“text_zip (972).zip”). The average compression across the entire dataset is 8.16%. Every file in the dataset achieved further lossless compression, by at least 1.53%.
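The figures above (file sizes, character counts, per-file and average compression) can be checked in outline with a short script. This is a hypothetical sketch, not the authors' MATLAB code. It assumes that each archive holds a single UTF-8 “.txt” member, that "compression" means the size reduction (1 - new/old) × 100, and that the dataset average is the mean of the per-file values. Python's `zlib` at level 9 stands in for the second-stage compressor under test; the function names `compression_percent`, `recompress`, `inspect_zip`, and `summarize` are illustrative.

```python
import zlib
import zipfile
from pathlib import Path
from statistics import mean

MAX_ZIP_BYTES = 1024  # size limit stated in the abstract: 1 kB = 1,024 bytes


def compression_percent(original: bytes, compressed: bytes) -> float:
    """Size reduction in percent (assumed definition: (1 - new/old) * 100).

    A negative value means the second stage made the file larger.
    """
    return (1.0 - len(compressed) / len(original)) * 100.0


def recompress(data: bytes) -> bytes:
    """Stand-in second-stage compressor (zlib level 9), NOT the authors' algorithm."""
    return zlib.compress(data, 9)


def inspect_zip(path: Path) -> dict:
    """Check the size limit, count characters of the inner text, and measure recompression."""
    raw = path.read_bytes()
    if len(raw) > MAX_ZIP_BYTES:
        raise ValueError(f"{path.name} exceeds {MAX_ZIP_BYTES} bytes")
    with zipfile.ZipFile(path) as zf:
        # Assumption: each archive contains exactly one UTF-8 text member.
        text = zf.read(zf.namelist()[0]).decode("utf-8")
    return {
        "file": path.name,
        "zip_bytes": len(raw),
        "chars": len(text),  # character count, spaces included
        "percent": compression_percent(raw, recompress(raw)),
    }


def summarize(folder: Path) -> dict:
    """Report file count, best/worst file, and the mean of per-file percentages."""
    rows = [inspect_zip(p) for p in sorted(folder.glob("*.zip"))]
    if not rows:
        raise ValueError(f"no .zip files found in {folder}")
    best = max(rows, key=lambda r: r["percent"])
    worst = min(rows, key=lambda r: r["percent"])
    return {
        "files": len(rows),
        "max": (best["file"], best["percent"]),
        "min": (worst["file"], worst["percent"]),
        "mean_percent": mean(r["percent"] for r in rows),
    }
```

For example, `summarize(Path("dataset"))` on a local copy of the files (the folder name is hypothetical) returns the file count, the files with the highest and lowest compression, and the mean percentage. Because `zlib` is a general-purpose stand-in, its numbers will not match the values reported above, and some files may come out negative when the stream header and checksum outweigh any savings.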

Instructions: 

- Publication: Lossless Compression Algorithms in Compressed Data (submitted).

- MATLAB processing code: available on request from the authors at flavioifma@gmail.com, leticia.correia@ifma.edu.br, allan.kardec@ufma.edu.br.