A Perceptual Study of the Decoding Process of the SoftCast Wireless Video Broadcast Scheme
The SoftCast scheme has been proposed as a promising alternative to traditional video broadcasting systems in wireless environments. In its current form, SoftCast performs image decoding at the receiver side using a Linear Least Square Error (LLSE) estimator. Such an approach maximizes the reconstructed quality in terms of Peak Signal-to-Noise Ratio (PSNR). However, we show that the LLSE estimator induces an annoying blur effect at low Channel Signal-to-Noise Ratio (CSNR). To cancel this artifact, we propose replacing the LLSE estimator with the Zero-Forcing (ZF) one. To better understand the perceived quality offered by these two estimators, a mathematical characterization as well as objective and subjective studies are performed. Results show that the gains brought by the LLSE estimator, in terms of PSNR and Structural SIMilarity (SSIM), are limited and quickly tend to zero as the CSNR increases. However, higher gains are obtained by the ZF estimator when considering the recent Video Multi-method Assessment Fusion (VMAF) metric proposed by Netflix, which evaluates the perceptual video quality. This result is confirmed by the subjective assessment.
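To make the difference between the two estimators concrete, here is a minimal sketch under a simplified scalar model that is an assumption of this example, not the paper's exact formulation: a zero-mean coefficient x with variance signal_var is scaled by a gain g and received as y = g*x + n, where n is noise with variance noise_var. The function and variable names are hypothetical.

```python
def zf_estimate(y, g):
    """Zero-Forcing: invert the transmit gain and ignore the noise."""
    return y / g


def llse_estimate(y, g, signal_var, noise_var):
    """LLSE for the assumed scalar model y = g*x + n.

    The weight g*signal_var / (g^2*signal_var + noise_var) shrinks the
    estimate toward zero as the noise variance grows.
    """
    return (g * signal_var) / (g * g * signal_var + noise_var) * y


# Illustrative values only: a coefficient x = 1.0 received without a noise sample.
g, signal_var = 2.0, 1.0
y = g * 1.0

zf = zf_estimate(y, g)                                         # recovers x exactly
llse_noisy = llse_estimate(y, g, signal_var, noise_var=4.0)    # strongly attenuated
llse_clean = llse_estimate(y, g, signal_var, noise_var=4e-4)   # close to the ZF value
```

In this toy setting, the LLSE output is attenuated when the assumed noise variance is large and approaches the ZF output as the noise variance shrinks, which is consistent with the blur at low CSNR and the vanishing LLSE gain at high CSNR described above.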
For more information, please refer to the following paper:
Anthony Trioux, Giuseppe Valenzise, Marco Cagnazzo, Michel Kieffer, François-Xavier Coudoux, et al., A Perceptual Study of the Decoding Process of the SoftCast Wireless Video Broadcast Scheme. 2021 IEEE Workshop on Multimedia Signal Processing (MMSP), Oct. 2021, Tampere, Finland.
The SoftCast Database consists of 8 RAW HD reference videos and 156 cropped videos transmitted and received through the SoftCast linear video coding and transmission scheme, using either the LLSE or the ZF estimator. Each video has a duration of 5 seconds. Note that only the luminance is considered in this database. The number of frames depends on the frame rate of the video (125 frames at 25 fps and 150 frames at 30 fps).
The GoP size was set to 32 frames. Two compression ratios (CR) were considered: CR=1 (no compression applied) and CR=0.25 (75% of the DCT coefficients are discarded before transmission). The Channel Signal-to-Noise Ratio (CSNR) considered in this test varies from 0 to 27 dB in 3 dB steps. This database was evaluated by 30 participants (9 women and 21 men). In a forced-choice pairwise comparison (PWC) test, they were asked to select which of the two displayed versions of the reconstructed videos they preferred. A training session was organized prior to the test for each observer to familiarize them with the procedure.
Video files are named using the following structure:
Video_filename_y_only_GoP_32_CR_X_Y_ZdB_crop.yuv, where X is the compression ratio (either 1 or 0.25), Y is the estimator used (ZF or LLSE), and Z is the CSNR in dB (0, 3, 6, 9, 12, 15, 18, 21, 24 or 27).
The original video files are denoted: Video_filename_y_only_crop.yuv.
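The naming convention above can be sketched as a pair of helpers that build and parse file names. This is an illustrative sketch: the function names are hypothetical, and "Foo" stands in for an actual Video_filename, which is not listed here.

```python
import re

CSNR_DB = list(range(0, 28, 3))          # 0, 3, ..., 27 dB
COMPRESSION_RATIOS = ("1", "0.25")       # as written in the file names
ESTIMATORS = ("ZF", "LLSE")

_RECEIVED = re.compile(
    r"^(?P<video>.+)_y_only_GoP_32_CR_(?P<cr>1|0\.25)"
    r"_(?P<estimator>ZF|LLSE)_(?P<csnr>\d+)dB_crop\.yuv$"
)


def received_name(video, cr, estimator, csnr_db):
    """Build the name of a received (decoded) video file."""
    return f"{video}_y_only_GoP_32_CR_{cr}_{estimator}_{csnr_db}dB_crop.yuv"


def reference_name(video):
    """Build the name of the matching original (reference) file."""
    return f"{video}_y_only_crop.yuv"


def parse_received_name(name):
    """Return the fields encoded in a received file name, or None if it does not match."""
    m = _RECEIVED.match(name)
    if m is None:
        return None
    return {
        "video": m["video"],
        "cr": float(m["cr"]),
        "estimator": m["estimator"],
        "csnr_db": int(m["csnr"]),
    }


example = received_name("Foo", "0.25", "ZF", 12)
fields = parse_received_name(example)
```

Parsing the name rather than keeping a separate index makes it easy to pair each received file with its reference through reference_name(fields["video"]).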
Each video file is in *.yuv format (4:2:0), with all chrominance planes set to 128. (This makes the VMAF computation possible, as VMAF requires a yuv420p, yuv422p, yuv444p, yuv420p10le, yuv422p10le or yuv444p10le video format.)
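Since only the luminance carries information, a reader can skip the constant chrominance planes. The sketch below assumes 8-bit planar 4:2:0 data with even frame dimensions; the bit depth and the cropped resolution are not stated here, so width and height must be supplied by the user. The function name is hypothetical, and the demo uses synthetic bytes rather than a file from the database.

```python
import io


def iter_luma_frames(stream, width, height):
    """Yield the Y plane (row-major bytes) of each frame in a raw 8-bit 4:2:0 stream."""
    luma_size = width * height
    # U and V planes are each subsampled by 2 in both directions (even dims assumed).
    chroma_size = 2 * (width // 2) * (height // 2)
    while True:
        luma = stream.read(luma_size)
        if len(luma) < luma_size:
            return  # end of stream or truncated frame
        stream.seek(chroma_size, io.SEEK_CUR)  # skip the chrominance planes
        yield luma


# Synthetic two-frame stream, 4x2 pixels, chrominance fixed at 128.
width, height = 4, 2
frame0 = bytes(range(8)) + bytes([128]) * 4
frame1 = bytes(range(8, 16)) + bytes([128]) * 4
frames = list(iter_luma_frames(io.BytesIO(frame0 + frame1), width, height))
```

For a file on disk, the same function can be fed a handle opened with open(path, "rb").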
The preference scores for each of the stimuli are available in the PWC_scores.xlsx file.
The frame-by-frame objective scores for each video are available in the objective_scores_ZF_LLSE.zip file.
- PWC_scores.xlsx (16.54 kB)
- objective_scores_ZF_LLSE.zip (3.45 MB)
- SoftCast_ZF_LLSE_database.zip (15.90 GB)