large language models

LLM-RIMSA: Large Language Models driven Reconfgurable Intelligent Metasurface Antenna Systems

The LLM-RIMSA dataset, designed to advance 6G networks through ultra-massive connectivity and intelligent radio environments. The dataset is built around a novel framework that integrates large language models (LLMs) with a reconfigurable intelligent metasurface antenna (RIMSA) architecture. This integration addresses limitations in hardware efficiency, dynamic control, and scalability seen in existing RIS technologies.

Categories:

Machine Learning

FedE4RAG_Dataset

This is the dataset of the paper Privacy-Preserving Federal Embedding Learning for Localized Retrieval-Augmented Generation.

Categories:

Financial

IRWOZ 2.0 - A Large Language Model-driven Dialogue Dataset for Industrial Robot Conversations

IRWOZ has improved industrial human-robot interaction (HRI) dialogue systems through domain-specific annotations. However, its initial version contains substantial noise in dialogue states and utterances, limiting state-tracking accuracy. We introduce IRWOZ 2.0, which addresses these limitations through large language model (LLM) enhanced generation (Mistral/Claude-3.5) and quality refinements. Our improved

Categories:

Artificial Intelligence

TrafficLLM Dataset

We released TrafficLLM's training datasets, which contain over 0.4M traffic data and 9K human instructions for LLM adaptation across different traffic analysis tasks.

Categories:

Artificial Intelligence

Simulation Data of Emotional Delivery Service Workers

Computational experiments within metaverse service ecosystems enable the identification of social risks and governance crises, and the optimization of governance strategies through counterfactual inference to dynamically guide real-world service ecosystem operations. The advent of Large Language Models (LLMs) has empowered LLM-based agents to function as autonomous service entities capable of executing diverse service operations within metaverse ecosystems, thereby facilitating the governance of metaverse service ecosystem with computational experiments.

Categories:

Artificial Intelligence

Readability-Aware Summarization Dataset for Turkish

This dataset is constructed in a study that addresses the gap between text summarization and content readability for diverse Turkish-speaking audiences. It contains paired original texts and corresponding summaries optimized for different readability levels using the YOD (Yeni Okunabilirlik Düzeyi) formula.

Categories:

Artificial Intelligence

Challenge: Abnormal Activity Detection in Individuals with Developmental Disabilities

Facilities for the developmentally disabled face the challenge of detecting abnormal behaviors because of limited staff and the difficulty of spotting subtle movements. Traditional methods often struggle to identify these behaviors because abnormal actions are irregular and unpredictable, leading to frequent misses or misclassifications.

Categories:

Automated Repair of Declarative Software Specifications in the Era of Large Language Models

The growing adoption of declarative software specification languages, coupled with their inherent difficulty in debugging, has underscored the need for effective and automated repair techniques applicable to such languages. Researchers have recently explored various methods to automatically repair declarative software specifications, such as template-based repair, feedback-driven iterative repair, and bounded exhaustive approaches. The latest developments in Large Language Models (LLMs) provide new opportunities for the automatic repair of declarative specifications.

Categories:

Machine Learning

DecoderLLMs-CodeSearch-main

Code search is essential for code reuse, allowing developers to efficiently locate relevant code snippets. Traditional encoder-based models, however, face challenges with poor generalization and input length limitations. In contrast, decoder-only large language models (LLMs), with their larger size, extensive pre-training, and ability to handle longer inputs, present a promising solution to these issues. However, their effectiveness in code search has not been fully explored.

Categories:

Artificial Intelligence

Code and Data for Empirical and Synthetic Experiments on Formula L Optimization

This dataset provides the foundational resources for evaluating and optimizing Formula L , a novel mathematical framework for semantic-driven task allocation in multi-agent systems (MAS) powered by large language models (LLM). The dataset includes Python code and both empirical and synthetic data, specifically designed to validate the effectiveness of Formula L in improving task distribution, contextual relevance, and dynamic adaptation within MAS.

The dataset comprises:

Categories:

Artificial Intelligence

Category