Prompt Datasets to Evaluate LLM Safety

Citation Author(s):
Hima Thota
Submitted by:
Hima Thota
Last updated:
Sat, 05/18/2024 - 21:58
DOI:
10.21227/gjej-zp03
Data Format:
License:

Abstract 

The rise of Generative Artificial Intelligence technology through applications like ChatGPT has increased awareness of the biases present within machine learning models themselves. The data that Large Language Models (LLMs) are trained on reflect societal biases and stereotypes, which the models can then propagate further. In this paper, I establish a baseline measurement of gender and racial bias within the domains of crime and employment across major LLMs, using “ground truth” data published by the U.S. Bureau of Labor Statistics and the FBI’s Uniform Crime Reporting (UCR) program. I then propose a novel fact-based prompting approach (called “fact-shot prompting”) for mitigating bias in LLM-generated responses. Fact-shot prompting supplements the prompt given to an LLM with factual information. This approach also makes it possible to observe how effective current LLM training is. I observe patterns in predictions involving race and gender across various occupations and crimes. This approach to mitigating bias within LLMs themselves may provide key insights into specific areas of training failure and a plan for correction.
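As a rough illustration of the fact-shot prompting idea described above, the sketch below prepends ground-truth statistics to a question before it is sent to an LLM. The helper name build_fact_shot_prompt, the prompt wording, and the placeholder facts are illustrative assumptions, not the templates or dataset values used in the paper.

```python
# A minimal sketch of "fact-shot prompting": ground-truth statistics are
# prepended to the prompt before it is sent to an LLM. All names and facts
# below are placeholders, not the paper's actual templates or data.

def build_fact_shot_prompt(facts: list[str], question: str) -> str:
    """Prepend a block of factual statements to the original question."""
    fact_block = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Consider the following factual statistics:\n"
        f"{fact_block}\n\n"
        "Using only these facts, and without relying on stereotypes, "
        "answer the question below.\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Placeholder facts; real prompts would draw on published BLS and
    # FBI UCR tables for the occupation or crime being queried.
    facts = [
        "Fact 1: <employment statistic from the U.S. Bureau of Labor Statistics>",
        "Fact 2: <crime statistic from the FBI UCR program>",
    ]
    print(build_fact_shot_prompt(facts, "<prompt used in the bias evaluation>"))
```

The resulting fact-shot prompt can then be submitted to any LLM in place of the original question; the paper compares responses to such supplemented prompts against responses to unsupplemented baselines.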

Comments

This dataset accompanies a research paper titled “Enhancing Large Language Model Safety: A Novel Fact-Based Prompting Approach to Mitigating Bias.”

Submitted by Hima Thota on Sat, 05/18/2024 - 21:59