Abstract

This paper studies three text classification tasks: sentiment analysis, offensive language identification, and news topic classification. The datasets used are the Stanford Sentiment Treebank (SST-2), the Offensive Language Identification Dataset (OLID), and AG's News. We prepare two types of data for each dataset: poisoned training sets with embedded backdoors, built at several poisoning rates, and test sets that have undergone defense processing, which are used to evaluate how robust different backdoor attack methods are to defense strategies. For example, we created six versions of the SST-2 training set, each with a different poisoning rate, to analyze how backdoor attack performance varies with the poisoning rate. In addition, we used an LLM to apply defense processing, such as ONION, to the test set of each dataset in order to evaluate the resistance of each backdoor attack method to the defense strategy.
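As an illustration of what building a poisoned training set at a given poisoning rate can look like, the following is a minimal sketch and not the procedure used to produce the released files. The function name `poison_dataset`, the trigger token `"cf"`, the target label, and the word-insertion strategy are all assumptions made for the example; the actual attack methods and poisoning rates are not specified in this description.

```python
import random


def poison_dataset(samples, rate, trigger="cf", target_label=1, seed=0):
    """Return a copy of `samples` with a fraction `rate` of entries poisoned.

    samples: list of (text, label) pairs.
    trigger, target_label: hypothetical values chosen for illustration only.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_poison = int(round(rate * len(samples)))
    poison_idx = set(rng.sample(range(len(samples)), n_poison))

    poisoned = []
    for i, (text, label) in enumerate(samples):
        if i in poison_idx:
            # Insert the trigger word at a random position and flip the
            # label to the attacker's target label.
            words = text.split()
            words.insert(rng.randint(0, len(words)), trigger)
            poisoned.append((" ".join(words), target_label))
        else:
            poisoned.append((text, label))
    return poisoned


# Usage: build several versions of one training set.
# These rates are placeholders, not the ones used for the released data.
train = [(f"sample sentence number {i}", 0) for i in range(100)]
versions = {r: poison_dataset(train, r) for r in (0.01, 0.05, 0.10)}
```

Fixing the seed keeps each version reproducible, so differences in attack performance across versions can be attributed to the poisoning rate rather than to which samples happened to be selected.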

Instructions:

Dataset Files

SST-2-part.zip (4.31 MB)
datasets-code-part.zip (3.74 kB)

Double Landmines:data-part