Differentiable Image Compression via KAN-Driven Dynamic Quantization (model and dataset)

Citation Author(s): TianYi Liu (Lanzhou University)
Submitted by: TianYi Liu
Last updated: Wed, 02/12/2025 - 07:15
DOI: 10.21227/jrxa-rq63

Categories:

Artificial Intelligence

Keywords:

KAN

Abstract

The rapid evolution of visual data demands compression technologies that balance theoretical expressiveness with practical deployment constraints. Current learning-based approaches face dual challenges: non-differentiable quantization operations that hinder end-to-end optimization, and rigid architectural components limiting adaptability to diverse content characteristics. This paper introduces a novel neural compression framework that integrates principles from Kolmogorov-Arnold Networks (KANs) with dynamic quantization mechanisms. Our threefold contribution addresses these limitations through: (1) A spline-enhanced hybrid architecture combining KAN's adaptive nonlinearities with convolutional feature extraction, theoretically grounded in function decomposition theory; (2) A trainable quantization process employing content-dependent step sizes with bounded gradient approximation errors; (3) An autonomous rate control system that dynamically balances distortion and entropy constraints. Extensive evaluations demonstrate the framework's superiority in rate-distortion performance compared to state-of-the-art codecs, particularly in preserving high-frequency components critical for perceptual quality. Practical implementations reveal robust performance across standard benchmarks and emerging multimedia formats. Beyond immediate compression applications, this work establishes foundational insights for developing explainable neural codecs, suggesting promising extensions to video and volumetric data compression through adaptive basis function learning.
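As a rough illustration of the second contribution, the sketch below applies uniform scalar quantization with a step size derived from the content of each block. It is not the authors' method: the function names, the `base_step` and `scale` parameters, and the rule that ties the step size to the block's standard deviation are all assumptions made for this example. The only property it relies on is the standard one that rounding to the nearest multiple of a step Δ introduces an error of at most Δ/2. In training, the rounding would typically be replaced by a differentiable surrogate, which this sketch leaves out.

```python
import math


def content_dependent_step(block, base_step=0.5, scale=0.1):
    """Hypothetical rule: use a coarser step for blocks with more variation.

    base_step and scale are illustrative parameters, not values from the paper.
    """
    mean = sum(block) / len(block)
    variance = sum((v - mean) ** 2 for v in block) / len(block)
    return base_step + scale * math.sqrt(variance)


def quantize(block, step):
    """Uniform scalar quantization: snap each value to the nearest multiple of step."""
    return [step * round(v / step) for v in block]


def max_quantization_error(block, step):
    """Largest absolute difference between a value and its quantized version."""
    return max(abs(v - q) for v, q in zip(block, quantize(block, step)))


if __name__ == "__main__":
    block = [0.1, 0.4, 1.3, -2.2]  # made-up latent values
    step = content_dependent_step(block)
    print("step size:", step)
    print("quantized:", quantize(block, step))
    print("max error:", max_quantization_error(block, step), "<= step / 2 =", step / 2)
```

The bound on the maximum error is what makes a per-block step size a direct handle on distortion: a smaller step lowers the error ceiling at the cost of more distinct symbols to encode.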

Instructions:

The evaluation dataset used in this study is the Kodak dataset (commonly referred to as Kodak24 or kodim). It contains 24 uncompressed true-color (RGB) images with a resolution of 768×512 pixels (in landscape or portrait orientation) and is widely used as a benchmark in the image compression and quality evaluation community. Each file is stored as a lossless PNG, preserving the original image fidelity. The images cover a diverse range of content, including landscapes, portraits, and indoor scenes, which allows compression algorithms to be evaluated across varied visual contexts.
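Results on Kodak are usually reported as distortion (for example PSNR) against rate in bits per pixel. The following is a minimal sketch of both quantities, assuming 8-bit samples supplied as flat sequences of integers; the function names are chosen for this example and are not part of the dataset or the released model. Decoding the PNG files into such sequences is left to whichever image library you use.

```python
import math


def psnr(reference, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized sample sequences."""
    if len(reference) != len(decoded):
        raise ValueError("reference and decoded must have the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no distortion
    return 10.0 * math.log10(peak ** 2 / mse)


def bits_per_pixel(num_bytes, width=768, height=512):
    """Rate of a compressed file in bits per pixel; defaults match the Kodak image size."""
    return 8.0 * num_bytes / (width * height)


if __name__ == "__main__":
    original = [10, 20, 30, 40]        # made-up sample values
    reconstructed = [11, 19, 31, 39]   # each sample is off by one
    print("PSNR (dB):", psnr(original, reconstructed))
    print("bpp:", bits_per_pixel(49152))
```

Because a Kodak image has 768 × 512 = 393,216 pixels in either orientation, the default arguments of `bits_per_pixel` apply to all 24 files.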

In addition to the evaluation dataset, our work includes a complete set of training logs recorded during model training. The logs contain epoch-level and batch-level records of loss values, gradient information, and other key metrics that support analysis, reproducibility, and further optimization. They allow researchers to examine the model's behavior during training, including its convergence trends.