SCO Dataset: A Dataset for Semantic Communication

Citation Author(s):
Beijing Key Laboratory of Network System Architecture and Convergence
Beijing Key Laboratory of Network System Architecture and Convergence
Beijing Laboratory of Advanced Information Networks
Beijing Laboratory of Advanced Information Networks
Beijing Key Laboratory of Network System Architecture and Convergence
Last updated: Fri, 10/28/2022 - 07:14


The integration of communication and artificial intelligence has become a clear development trend, and semantic communication is one of its applications; however, current research lacks the support of comprehensive datasets. To address this problem, we built a new image and video dataset, named the SCO dataset, for research on semantic communication and computing. First, we introduce the characteristics of the dataset, which contains 5100 images and 138 video clips. Second, we present the data generation and processing methods for both images and videos. Then we describe the labeling method of the dataset for intelligent tasks, including classification and object detection, and the method used to evaluate the subjective experience of human users. Next, we present and analyze the experimental results on the dataset. Finally, we give some practical applications of the dataset, including end-to-end semantic communication, joint source-channel semantic coding, and resource management for multi-user semantic communication.


The purpose of semantic communication is to extract semantic information, such as features of the original data, through deep learning and other methods, thereby reducing the amount of transmitted data, improving communication efficiency, and better completing intelligent tasks. Semantic communication is therefore essentially driven by data and artificial intelligence, and training and optimizing the deep networks in semantic communication systems requires the support of large datasets.

Existing datasets fall mainly into two categories. The first is intelligent-task-oriented datasets, such as PASCAL and Waymo, which provide original data from specific scenarios. They are mainly used to train and test deep network models and generally carry labels for specific tasks, but they ignore the communication process and apply no further processing to the data.

The second category is communication-process-oriented datasets, such as TID2013 and LIVE, which apply distortions such as compression and added noise to the original data. These datasets are often used to optimize the parameters of a communication system so that it meets QoE and QoS requirements, but they are costly to generate, and therefore hard to obtain, and they do not include labels for intelligent tasks.

However, existing work indicates that semantic communication research places the following requirements on datasets.

  • Semantic communication research needs multi-modal data because of the variety of intelligent tasks.
  • Semantic communication research needs pre-processed versions of the original data, because it must account for how the data change between acquisition and transmission and must evaluate performance after the data are distorted.
  • Semantic communication research needs several kinds of labels to evaluate different system performance metrics. Besides objective metrics for intelligent tasks, such as classification accuracy and detection mAP, it must also consider subjective metrics such as the quality of human users' experience.

Based on the above analysis, the special needs of semantic communication mean that existing datasets can hardly support its research comprehensively. This paper therefore constructs a new semantic-communication-oriented dataset, named the SCO Dataset.

We create a dataset of 5100 images and 138 videos based on existing datasets. It contains 100 original images and 10 original videos. By simulating distortions that occur during communication, such as blur, compression, and added noise, we generate 5000 expanded images and 128 expanded videos using different distortion methods.
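The distortion step can be illustrated with a small sketch. The source does not give the blur kernels, noise levels, or codec settings it used, so all parameters here are placeholders. Images are assumed to be 2-D grayscale lists of floats in [0, 255], the "mixed" distortion (blur followed by noise) is a hypothetical ordering, and compression is left out because it requires a real codec such as JPEG at a chosen quality factor.

```python
from __future__ import annotations

import math
import random


def gaussian_kernel(sigma: float, radius: int) -> list[list[float]]:
    """Return a normalized (2*radius+1) x (2*radius+1) Gaussian kernel."""
    k = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
         for x in range(-radius, radius + 1)]
        for y in range(-radius, radius + 1)
    ]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]


def gaussian_blur(img: list[list[float]], sigma: float) -> list[list[float]]:
    """Blur a grayscale image; borders are handled by edge replication."""
    radius = max(1, math.ceil(3 * sigma))  # kernel covers about +/- 3 sigma
    k = gaussian_kernel(sigma, radius)
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-radius, radius + 1):
                ii = min(max(i + di, 0), h - 1)  # clamp row index
                for dj in range(-radius, radius + 1):
                    jj = min(max(j + dj, 0), w - 1)  # clamp column index
                    acc += k[di + radius][dj + radius] * img[ii][jj]
            out[i][j] = acc
    return out


def gaussian_noise(img: list[list[float]], sigma: float,
                   rng: random.Random) -> list[list[float]]:
    """Add zero-mean Gaussian noise and clip the result to [0, 255]."""
    return [[min(255.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
            for row in img]


def mixed_distortion(img: list[list[float]], blur_sigma: float,
                     noise_sigma: float, rng: random.Random) -> list[list[float]]:
    """Hypothetical mixed distortion: Gaussian blur followed by additive noise."""
    return gaussian_noise(gaussian_blur(img, blur_sigma), noise_sigma, rng)
```

Calling each function over a range of placeholder levels (for example several blur sigmas and noise sigmas per original image) yields a family of expanded images; passing a seeded `random.Random` keeps the noisy variants reproducible.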

We describe how each type of label in the dataset is generated. For classification and object detection tasks, we provide the labels needed to train deep networks, such as categories and object locations, together with the label generation method. We also test the subjective experience of human users, provide the mean opinion score (MOS) to quantify it, and describe the method used to assess human user experience.
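As a minimal sketch of how a MOS summarizes subjective ratings, the function below computes the mean and standard deviation of one item's individual scores (the dataset provides nine ratings per image). It assumes the sample standard deviation with an N - 1 denominator; the description does not state which definition was used.

```python
from __future__ import annotations

import statistics


def mos_and_std(scores: list[float]) -> tuple[float, float]:
    """Return (MOS, STD) for one item's subjective ratings.

    Assumption: STD is the sample standard deviation (N - 1 denominator).
    statistics.stdev raises StatisticsError for fewer than two ratings.
    """
    return statistics.mean(scores), statistics.stdev(scores)
```

For example, `mos_and_std([3, 4, 4, 5, 3, 4, 4, 3, 5])` summarizes nine hypothetical ratings; these values are illustrative and are not taken from the dataset.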

We conduct basic tests and numerical analysis on the dataset, provide performance benchmarks for a series of mainstream deep learning models such as Faster R-CNN and YOLOv3, and report numerical results of subjective and objective quality assessment on the dataset. In addition, we apply the dataset to end-to-end and multi-user semantic communication and give the corresponding experimental results.
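Detection benchmarks such as those for Faster R-CNN and YOLOv3 are usually reported as mAP, the mean over classes of per-class average precision (AP), which in turn rests on intersection-over-union (IoU) matching. The sketch below computes IoU and single-class AP. The box format `(x1, y1, x2, y2)`, the 0.5 IoU threshold, and PASCAL VOC-style all-point interpolation are assumptions, not the evaluation protocol stated for this dataset.

```python
from __future__ import annotations

Box = tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: list[tuple[str, float, Box]],
                      ground_truth: dict[str, list[Box]],
                      iou_thr: float = 0.5) -> float:
    """Single-class AP with all-point interpolation.

    detections: (image_id, confidence, box) triples for one class.
    ground_truth: image_id -> list of ground-truth boxes for that class.
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    precisions: list[float] = []
    recalls: list[float] = []
    tp = fp = 0
    # Process detections from most to least confident.
    for img, _conf, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best:
                best, best_j = overlap, j
        if best_j >= 0 and best >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True  # each ground-truth box matches once
            tp += 1
        else:
            fp += 1  # low overlap or duplicate detection
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # Make precision monotonically non-increasing, then integrate over recall.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))
```

mAP then follows by averaging `average_precision` over the object classes; classification accuracy, the other objective metric mentioned above, is simply the fraction of correctly predicted labels.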

The SCO Dataset includes two parts: Images and Videos.

The Images folder contains images with four distortion types: Gaussian blur, compression, Gaussian noise, and mixed distortion. Each folder name indicates the corresponding distortion type and distortion level, and the corresponding subjective scores are stored in the Subjective_scores folder. Each image has 9 subjective scores, and we provide their mean opinion score (MOS) and standard deviation (STD) to facilitate subsequent use.

The Videos folder contains three parts: Videos, Video_labels, and Video_scores. The Videos subfolder contains a total of 138 videos covering 10 scenes and 4 types of distortion from the IVP Subjective Quality Video Database (IVP), along with detailed information about IVP. Video_labels contains data labels for classification and object detection, covering scene categories and object locations; these labels can be used to train deep networks. Video_scores contains the subjective scores from the IVP dataset, as well as scores for the classification and object detection tasks.
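Because folder names under Images encode the distortion type and level, a first step when loading the data is to group image files by their containing folder. The sketch below does this. The root path, the set of image extensions, and the decision to skip anything under a `Subjective_scores` folder are assumptions; the exact naming scheme of the distortion folders is not given, so folder names are kept as opaque keys rather than parsed.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import Path

# Assumed image extensions; adjust to what the download actually contains.
IMAGE_EXTS = {".bmp", ".jpg", ".jpeg", ".png"}


def index_images(images_root: str | Path) -> dict[str, list[Path]]:
    """Group image files under images_root by the name of their parent folder.

    The folder name (per the dataset description) encodes distortion type and
    level; it is returned unparsed because the naming scheme is not specified.
    """
    root = Path(images_root)
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTS:
            continue
        if "Subjective_scores" in path.relative_to(root).parts:
            continue  # assumed location of score files, not distorted images
        groups[path.parent.name].append(path)
    return dict(groups)
```

Each resulting key can then be paired with the matching entries in the subjective scores, for example to compute per-level MOS statistics.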

Tools such as MATLAB and PyCharm can be used to analyze the SCO Dataset.