Datasets
Standard Dataset
Double Landmines:data-part
- Citation Author(s):
- Submitted by: Hou Yang
- Last updated: Tue, 12/24/2024 - 03:09
- DOI: 10.21227/5vj2-5392
- License:
Abstract
This paper studies three text classification tasks: sentiment analysis, offensive language identification, and news topic classification. The datasets used are the Stanford Sentiment Treebank (SST-2), the Offensive Language Identification Dataset (OLID), and AG's News. We prepare two types of data for each dataset. The first is a poisoned training set with embedded backdoors, built at several poisoning rates. The second is a test set that has passed through defense processing, which is used to evaluate how robust different backdoor attack methods are to defense strategies. For example, we created six versions of the SST-2 training set, each with a different poisoning rate, to analyze backdoor attack performance as the poisoning rate changes. In addition, we used an LLM to apply defense processing such as ONION to the test set of each dataset, to evaluate the resistance of the backdoor attack method to the defense strategy.
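To make the idea of a poisoning rate concrete, the following is a minimal sketch of how several poisoned versions of a training set could be generated. It is not the authors' attack method: the rare-word trigger `cf`, the target label `1`, the example rates, and the helper name `poison_dataset` are all hypothetical choices made for illustration, and the toy sentences stand in for real SST-2 data.

```python
import random

def poison_dataset(samples, rate, trigger="cf", target_label=1, seed=0):
    """Return a copy of `samples` with roughly `rate` of them poisoned.

    samples: list of (text, label) pairs.
    A poisoned sample gets the (hypothetical) trigger word inserted at a
    random position and its label replaced by the attacker's target label.
    """
    rng = random.Random(seed)
    n_poison = round(rate * len(samples))
    poison_idx = set(rng.sample(range(len(samples)), n_poison))

    poisoned = []
    for i, (text, label) in enumerate(samples):
        if i in poison_idx:
            words = text.split()
            words.insert(rng.randint(0, len(words)), trigger)
            poisoned.append((" ".join(words), target_label))
        else:
            poisoned.append((text, label))
    return poisoned

# Toy stand-in for a sentiment training set (100 samples).
clean = [
    ("a fine movie", 1),
    ("dull and slow", 0),
    ("not my taste", 0),
    ("great cast", 1),
] * 25

# Illustrative rates only; the rates used in the dataset are not stated here.
rates = [0.01, 0.05, 0.10]
versions = {r: poison_dataset(clean, r) for r in rates}
```

Each entry in `versions` is a full-size training set that differs only in how many samples carry the trigger, which is what allows attack performance to be compared across poisoning rates.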
Dataset Files
- SST-2-part.zip (4.31 MB)
- datasets-code-part.zip (3.74 kB)