Large Language Models

To train critique models capable of delivering step-level supervision and constructive feedback for reasoning, we introduce AutoMathCritique, an automated and scalable framework for collecting critique data.
The framework consists of three main stages: flawed reasoning path construction, critique generation, and data filtering. Using AutoMathCritique, we create MathCritique-76k, a dataset of 76,321 samples.
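For intuition, the three stages might be orchestrated roughly as in the sketch below; the helper names (construct_flawed_paths, generate_critiques, filter_samples) and the llm.generate interface are illustrative placeholders, not the authors' implementation.

```python
# Hypothetical sketch of the three AutoMathCritique stages; the helper
# names and the `llm` interface are placeholders, not the authors' code.
from dataclasses import dataclass

@dataclass
class CritiqueSample:
    problem: str
    flawed_solution: str   # reasoning path containing an error
    critique: str          # step-level feedback locating the error

def construct_flawed_paths(llm, problems):
    """Stage 1: sample reasoning paths and keep those containing errors."""
    return [(p, llm.generate(f"Solve step by step:\n{p}")) for p in problems]

def generate_critiques(llm, flawed_paths):
    """Stage 2: ask a critique model to review each step of the flawed path."""
    return [
        CritiqueSample(p, sol, llm.generate(
            f"Problem:\n{p}\nSolution:\n{sol}\n"
            "Review each step and point out the first incorrect one."))
        for p, sol in flawed_paths
    ]

def filter_samples(samples, is_valid):
    """Stage 3: keep only critiques that pass a validity check, e.g. the
    critique flags a step that actually disagrees with the reference answer."""
    return [s for s in samples if is_valid(s)]
```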

CodePromptEval is a dataset of 7072 prompts designed to evaluate five prompt techniques (few-shot, persona, chain-of-thought, function signature, list of packages) and their effect on the correctness, similarity, and quality of the complete functions that LLMs generate. Each data point in the dataset includes a function generation task, a combination of prompt techniques to apply, the natural-language prompt with those techniques applied, the ground-truth function (human-written, based on the CoderEval dataset by Yu et al.), and the tests used to evaluate the correctness of the generated function.
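As a rough illustration of that record layout, one data point might be represented as below; the field names are assumptions for exposition, not the dataset's actual schema.

```python
# Illustrative record layout for one CodePromptEval data point; the field
# names are guesses for exposition, not the dataset's actual column names.
from dataclasses import dataclass
from typing import List

@dataclass
class CodePromptEvalItem:
    task: str              # natural-language function generation task
    techniques: List[str]  # e.g. ["few-shot", "chain-of-thought"]
    prompt: str            # prompt text with the techniques applied
    ground_truth: str      # human-written reference function (from CoderEval)
    tests: str             # test code used to check correctness

example = CodePromptEvalItem(
    task="Write a function that returns the n-th Fibonacci number.",
    techniques=["persona", "function signature"],
    prompt="You are an expert Python developer. Implement: def fib(n: int) -> int: ...",
    ground_truth="def fib(n):\n    a, b = 0, 1\n    ...",
    tests="assert fib(0) == 0\nassert fib(10) == 55",
)
```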

Maternal, sexual and reproductive healthcare (MSRH) covers sensitive and urgent public health issues that require timely, trustworthy, and authentic medical responses. Unfortunately, the curative healthcare systems of Low- and Middle-Income Countries (LMICs) are insufficiently responsive to such healthcare needs. These needs vary across social groups and are often shaped by social inequalities such as income, gender, and education.

This dataset comprises over 38,000 seed inputs generated by a range of Large Language Models (LLMs), including ChatGPT-3.5, ChatGPT-4, Claude-Opus, Claude-Instant, and Gemini Pro 1.0, specifically designed for use in fuzzing Python functions. The seeds were produced as part of a study evaluating the utility of LLMs in automating the creation of effective fuzzing inputs, a technique crucial for uncovering software defects in the Python programming environment, where traditional seed-generation methods show limitations.
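As a hedged sketch of how such seeds might be consumed, the loop below mutates LLM-generated seeds and feeds them to a target Python function; it is not the harness used in the study.

```python
# Hedged sketch of a seed-driven fuzz loop over a Python function;
# this is illustrative only, not the study's harness.
import random

def mutate(seed: str) -> str:
    """Apply a trivial character-level mutation to a seed string."""
    if not seed:
        return chr(random.randrange(32, 127))
    i = random.randrange(len(seed))
    return seed[:i] + chr(random.randrange(32, 127)) + seed[i + 1:]

def fuzz(target, seeds, iterations=10_000):
    """Feed mutated seeds to `target` and record inputs that raise exceptions.
    A real harness would filter out expected errors such as parse failures."""
    crashes = []
    corpus = list(seeds)
    for _ in range(iterations):
        candidate = mutate(random.choice(corpus))
        try:
            target(candidate)
        except Exception as exc:
            crashes.append((candidate, repr(exc)))
    return crashes

# Usage sketch: fuzz a parser with seeds drawn from the dataset.
# crashes = fuzz(json.loads, llm_generated_seeds)
```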

The rise of Generative Artificial Intelligence technology through applications like ChatGPT has increased awareness of the biases present within machine learning models themselves. The data that Large Language Models (LLMs) are trained on reflect societal biases and stereotypes, which the models can then further propagate. In this paper, I establish a baseline measurement of the gender and racial bias within the domains of crime and employment across major LLMs using “ground truth” data published by the U.S.
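One simple way to quantify such a gap, shown here only as an illustrative sketch, is to compare the demographic distribution implied by model outputs with the ground-truth distribution; the group names and counts below are placeholders, not figures from the paper.

```python
# Illustrative baseline bias measurement: compare the demographic
# distribution implied by LLM outputs with ground-truth statistics.
# All numbers and group labels are placeholders, not results from the paper.

def distribution(counts):
    """Normalize raw counts into a categorical distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def total_variation_distance(p, q):
    """Half the L1 distance between two categorical distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

ground_truth = distribution({"group_a": 480, "group_b": 520})  # e.g. official statistics
llm_outputs  = distribution({"group_a": 630, "group_b": 370})  # model-generated mentions

print(f"bias gap (TVD): {total_variation_distance(ground_truth, llm_outputs):.3f}")
```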

This named-entity dataset is built by applying the widely used Large Language Model (LLM) BERT to the CORD-19 biomedical literature corpus. By fine-tuning the pre-trained BERT model on the CORD-NER dataset, the model gains the ability to comprehend the context and semantics of biomedical named entities. The fine-tuned model is then applied to CORD-19 to extract more contextually relevant and up-to-date named entities. However, fine-tuning LLMs on large datasets poses a challenge; to address this, two distinct sampling methodologies are used.
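The inference step might look roughly like the sketch below, using the Hugging Face transformers token-classification pipeline; the checkpoint path is a placeholder, not the dataset's released weights.

```python
# Hedged sketch of the inference step: a BERT model fine-tuned for token
# classification (NER) applied to CORD-19 text. The checkpoint path is a
# placeholder; the dataset's actual fine-tuned weights are not referenced here.
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

checkpoint = "path/to/bert-finetuned-on-cord-ner"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint)

ner = pipeline("token-classification", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")

text = ("Remdesivir was evaluated as a treatment for SARS-CoV-2 infection "
        "in hospitalized patients.")
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```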

This is the full ChatGPT transcript for the IEEE Power Engineering Letter "On the Potential of ChatGPT to Generate Distribution Systems for Load Flow Studies using OpenDSS". The abstract for the letter is as follows:
