Standard Dataset

CodePromptEval

Citation Author(s):
Ranim Khojah (Chalmers University of Technology and University of Gothenburg)
Francisco Gomes de Oliveira Neto (Chalmers University of Technology and University of Gothenburg)
Mazen Mohamad (RISE Research Institutes of Sweden | Chalmers University of Technology and University of Gothenburg)
Philipp Leitner (Chalmers University of Technology and University of Gothenburg)
Submitted by:
Ranim Khojah
Last updated:
DOI:
10.21227/sj94-ez71
Data Format:
CSV
Links:

Abstract

CodePromptEval is a dataset of 7072 prompts designed to evaluate five prompt techniques (few-shot, persona, chain-of-thought, function signature, list of packages) and their effect on the correctness, similarity, and quality of the generated complete functions. Each data point in the dataset includes a function generation task, a combination of prompt techniques to apply, the natural-language prompt that applies those techniques, the ground-truth function (a human-written function drawn from the CoderEval dataset by Yu et al.), and the tests used to evaluate the correctness of the generated functions. All prompts in the dataset are carefully constructed so that the assigned combination of the five prompt techniques is applied.

Instructions:

The dataset is provided in CSV format. It is recommended to load the dataset into a Pandas DataFrame.
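
A minimal sketch of loading the dataset with pandas is shown below; the filename "CodePromptEval.csv" is an assumption and may need to be adjusted to match the downloaded file.

import pandas as pd

# Load the prompts and associated metadata into a DataFrame.
# Note: the filename here is an assumption; use the name of the downloaded CSV file.
df = pd.read_csv("CodePromptEval.csv")

# Inspect the available columns (e.g., task, prompt-technique combination,
# prompt text, ground-truth function, tests) and the first few rows.
print(df.columns.tolist())
print(df.head())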