Prompt Datasets to Evaluate LLM Safety

Citation Author(s):
Hima Thota
Submitted by:
Hima Thota
Last updated:
Sat, 05/18/2024 - 21:58
DOI:
10.21227/gjej-zp03
Data Format:
CSV
License:

Abstract 

The rise of Generative Artificial Intelligence technology through applications like ChatGPT has increased awareness of the biases present within machine learning models themselves. The data that Large Language Models (LLMs) are trained on reflect societal biases and stereotypes, and models can therefore propagate those biases further. In this paper, I establish a baseline measurement of gender and racial bias in the domains of crime and employment across major LLMs, using “ground truth” data published by the U.S. Bureau of Labor Statistics and the FBI’s Uniform Crime Reporting (UCR) program. I then propose a novel approach, fact-based prompting (called “fact-shot prompting”), for mitigating bias in LLM-generated responses. Fact-shot prompting supplements the prompt given to an LLM with factual information. This approach also allows observation of how effective current LLM training is. I observe patterns in predictions involving race and gender when applied to various occupations and crimes. This approach to mitigating bias within LLMs themselves may provide key insights into specific areas of training failure and a plan for correction.

Instructions: 

The CSV files whose names contain the word "prompt" fall into the categories of race, gender, and crime. Each model (Gemini, Vertex, GPT) has its own separate prompt file in each of these three categories. These files are used to generate responses from the LLMs through their respective APIs, and the responses are then saved. We also provide the model responses to these initial prompts.
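As a minimal sketch of this step (not the authors' exact pipeline), the snippet below reads one of the prompt files and saves the responses returned by an API. The file names and the "prompt" column name are assumptions chosen for illustration, and the GPT client shown would be replaced with the Gemini or Vertex SDK for those models.

```python
import csv

from openai import OpenAI  # GPT only; Gemini/Vertex use their own client libraries

client = OpenAI()  # assumes an API key is configured in the environment

# Hypothetical file and column names, used only for illustration.
with open("gpt_gender_prompt.csv", newline="") as f_in, \
        open("gpt_gender_responses.csv", "w", newline="") as f_out:
    reader = csv.DictReader(f_in)
    writer = csv.writer(f_out)
    writer.writerow(["prompt", "response"])
    for row in reader:
        prompt = row["prompt"]
        completion = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        writer.writerow([prompt, completion.choices[0].message.content])
```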

The CSV files whose names contain "prompt_grounded" hold the same prompts as before, except that each prompt is supplemented with factual data meant to deter the model from biased predictions. Each model (Gemini, Vertex, GPT) has its own separate prompt file in each of the three categories. These files are used to generate responses from each LLM through their respective APIs, and the responses should be saved. We also provide the saved responses we retrieved using these "grounding" prompts.
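The dataset already contains the grounded prompt files, but as an illustration of the general shape of a grounded prompt, the sketch below prepends a factual statement to a base prompt. The template, example prompt, and placeholder statistic are hypothetical and are not the paper's exact wording.

```python
def ground_prompt(base_prompt: str, fact: str) -> str:
    """Return a grounded prompt: a base prompt prefixed with factual context."""
    return f"{fact}\n\n{base_prompt}"

# Hypothetical example; the real facts come from BLS and FBI UCR data.
base = "Is a man or a woman more likely to work as a registered nurse?"
fact = "According to U.S. Bureau of Labor Statistics data, <insert statistic here>."
print(ground_prompt(base, fact))
```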

With both sets of prompts, a side-by-side comparison can be conducted to evaluate the efficacy of grounding in mitigating model bias in the three respective areas.
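One simple way to set up such a comparison, assuming the baseline and grounded response files are aligned row by row (the file and column names below are hypothetical), is:

```python
import pandas as pd

baseline = pd.read_csv("gpt_gender_responses.csv")
grounded = pd.read_csv("gpt_gender_grounded_responses.csv")

# Align the two runs on the shared prompts for a side-by-side view.
comparison = pd.DataFrame({
    "prompt": baseline["prompt"],
    "baseline_response": baseline["response"],
    "grounded_response": grounded["response"],
})
print(comparison.head())
```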

Comments

This dataset accompanies a research paper titled: Enhancing Large Language Model Safety: A Novel Fact-Based Prompting Approach to Mitigating Bias
 

Submitted by Hima Thota on Sat, 05/18/2024 - 21:59