LLM

LLM Empowering Urban Science: The Exploration of Constructing a New Instruction Dataset

The application of large language models (LLMs) in urban planning has gained momentum, with prior research demonstrating their value in participatory planning, process streamlining, and event forecasting. This study focuses on further enhancing urban planning through the integration of more comprehensive datasets. We introduce a newly developed instruction dataset that amalgamates crucial information from several prominent urban datasets, including highD, NGSIM, the Road Networks dataset, TLC Trip data, and the Urban Flow Prediction Survey dataset.

Categories:: Artificial Intelligence
Transportation

14 Views

DragonVerseQA

DragonVerseQA is an open-domain, long-form Over-The-Top (OTT) Question-Answering (QA) dataset oriented specifically to the fantasy universe of the "The House of the Dragon" and "Game of Thrones" TV series. The curated dataset combines four kinds of material: full episode summaries sourced from HBO and fandom wiki websites, user reviews from sources like IMDb and Rotten Tomatoes, other high-quality, open-domain, legally admissible sources, and structured data from repositories like WikiData.

Categories:: Artificial Intelligence

140 Views

CHVM-1K-A1

A partial sample of the CHVM-1K dataset, provided for illustration purposes.

{

"question": "What stages can be divided into in the development history of ancient Chinese bronzes? Why?",

"answer": "The development history of ancient Chinese bronzes can be divided into several stages: Xia (2100-1600 BCE), Shang (1600-1046 BCE), Early Western Zhou (1046-771 BCE), Middle Western Zhou (771-720 BCE), Late Western Zhou (720-256 BCE), and Eastern Zhou (256-256 BCE). These stages are marked by technological advancements, stylistic evolution, and cultural significance.",

Categories:: Artificial Intelligence

13 Views

Human Thinking Data

Recent research indicates that fine-tuning smaller language models on reasoning samples generated by large language models (LLMs) can effectively improve the small models' performance on complex reasoning tasks. However, small models fine-tuned with the existing Zero-shot-CoT method still show shortcomings in problem understanding, mathematical calculation, and logical reasoning, and they sometimes skip steps when working through a problem.

Categories:: Artificial Intelligence

40 Views

knowledge conflict test dataset

A dataset for detecting knowledge conflicts.
The dataset contains 90 groups of natural language sentences with contradictions and 10 groups without contradictions. Each group contains 5 sentences, usually 3 identical questions and 2 declarative sentences. The agent under test should be able to accurately detect the contradictory statements.
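
To make the group structure concrete, here is a minimal sketch of how detection accuracy over such groups might be scored. The record layout (`sentences`, `has_conflict`), the example sentences, and the `naive_detect` baseline are all assumptions made for illustration; the dataset's actual file format is not described above.

```python
def detection_accuracy(groups, detect):
    """Fraction of groups where detect() agrees with the conflict label."""
    correct = sum(
        1 for group in groups
        if detect(group["sentences"]) == group["has_conflict"]
    )
    return correct / len(groups)


def naive_detect(sentences):
    # Hypothetical baseline: flag a conflict when the declarative
    # sentences (those not ending in "?") are not all identical.
    statements = {s for s in sentences if not s.endswith("?")}
    return len(statements) > 1


# Two made-up groups in the 3-questions-plus-2-statements shape.
question = "Where is the meeting?"
groups = [
    {"sentences": [question] * 3
        + ["The meeting is in room A.", "The meeting is in room B."],
     "has_conflict": True},
    {"sentences": [question] * 3
        + ["The meeting is in room A.", "The meeting is in room A."],
     "has_conflict": False},
]

accuracy = detection_accuracy(groups, naive_detect)
```

A real detector would need semantic comparison rather than string equality; the baseline only shows where such a detector plugs in.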

Categories:: Artificial Intelligence

17 Views

Python Program Resource Usage from Fuzzing Corpora

Resource usage fuzzing samples and related data. Contains samples from Pythoin, random data, GPT-3.5, GPT-4, Gemini-1.0, Claude Instant, and Claude Opus. These samples are generated for 50 Python functions. Also included are resource measures for CPU time, instruction count, function calls, peak RAM usage, final RAM allocated, and coverage. These values were collected on an isolated system and account for interference from other processes.
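
As a rough illustration of two of the measures listed above (CPU time and peak memory), the sketch below times a single function call using only the Python standard library. It is not the collection method used for this dataset: `measure` and `build_squares` are hypothetical names, and `tracemalloc` reports memory allocated by Python objects, which is only a proxy for peak RAM usage.

```python
import time
import tracemalloc


def measure(func, *args):
    """Return (result, cpu_seconds, peak_bytes) for one call of func."""
    tracemalloc.start()
    start = time.process_time()  # CPU time of this process, not wall time
    try:
        result = func(*args)
    finally:
        cpu_seconds = time.process_time() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, cpu_seconds, peak_bytes


def build_squares(n):
    # Hypothetical function under test, not one of the 50 in the dataset.
    return [i * i for i in range(n)]


result, cpu_seconds, peak_bytes = measure(build_squares, 10_000)
```

Instruction counts and coverage need external tooling (for example hardware performance counters or a coverage tracer) and are outside the scope of this sketch.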

Categories:: Machine Learning
Security

77 Views

Dataset on RAG Pipeline Evaluation for Retrieval and Generative Response Accuracy Testing

This dataset has been meticulously curated to evaluate the efficiency of Retrieval-Augmented Generation (RAG) pipelines in both retrieval and generative accuracy, with a particular focus on scenarios involving overlapping contexts. The dataset comprises two primary components: Motor data and Employee data. The Motor dataset includes master data of various motor models along with their corresponding manuals, linked by the motor's model name. Similarly, the Employee dataset encompasses employee master data and associated policy documents, linked by department.
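
The linking described above can be sketched as a simple key-based join between master records and their documents. All field names and values here (`model_name`, `rated_power_kw`, `text`, the `M-100` and `M-200` models) are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def link_by_key(master_rows, documents, key):
    """Attach to each master row the texts of documents sharing its key."""
    docs_by_key = defaultdict(list)
    for doc in documents:
        docs_by_key[doc[key]].append(doc["text"])
    return [
        {**row, "documents": docs_by_key.get(row[key], [])}
        for row in master_rows
    ]


# Hypothetical motor master data and manuals, linked by model name.
motors = [
    {"model_name": "M-100", "rated_power_kw": 1.5},
    {"model_name": "M-200", "rated_power_kw": 3.0},
]
manuals = [
    {"model_name": "M-100", "text": "M-100 manual: check bearings monthly."},
    {"model_name": "M-200", "text": "M-200 manual: check bearings weekly."},
]

linked = link_by_key(motors, manuals, "model_name")
```

The two manuals share almost all of their wording, which is the kind of overlapping context that makes retrieval harder: a retriever has to return the manual for the right model, not just a similar one. The same function would apply to the Employee data with `key="department"`, assuming a similar layout.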

Categories:: Artificial Intelligence

449 Views

Text2RDF: LLM Fine-tuning Dataset for NER and RE

The Text2RDF dataset is primarily designed to facilitate the transformation from text to RDF. It contains 1,000 annotated text segments, encompassing a total of 7,228 triplets. Utilizing this dataset to fine-tune large language models enables the models to extract triplets from text, which can ultimately be used to construct knowledge graphs.
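
As a small illustration of the last step, the sketch below groups extracted (subject, predicate, object) triplets into an adjacency-list graph. The example triplets and the `build_graph` helper are assumptions for illustration; they do not come from the dataset, and a production pipeline would more likely use an RDF library.

```python
from collections import defaultdict


def build_graph(triples):
    """Group (subject, predicate, object) triplets by subject."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


# Hypothetical triplets of the kind a fine-tuned model might extract.
triples = [
    ("Marie Curie", "bornIn", "Warsaw"),
    ("Marie Curie", "fieldOfWork", "Physics"),
    ("Warsaw", "capitalOf", "Poland"),
]

graph = build_graph(triples)
```

Each subject becomes a node whose outgoing edges are labeled by predicate, so `graph["Marie Curie"]` lists every fact extracted about that entity.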

Categories:: Artificial Intelligence

358 Views