

Standard Dataset

IRWOZ 2.0 - A Large Language Model-driven Dialogue Dataset for Industrial Robot Conversations

Citation Author(s):
Chen Li (Department of Materials and Production, Aalborg University)
Dimitrios Chrysostomou (Department of Materials and Production, Aalborg University)
Submitted by:
Chen Li
DOI:
10.21227/zpnj-5245

Abstract

IRWOZ has improved industrial human-robot interaction (HRI) dialogue systems through domain-specific annotations. However, its initial version contains substantial noise in dialogue states and utterances, which limits state-tracking accuracy. We introduce IRWOZ 2.0, which addresses these limitations through large language model (LLM)-enhanced generation (Mistral/Claude-3.5) and quality refinements. Our improved dataset expands to 390 dialogues across 4 industrial domains (Assembly, Delivery, Position, Relocation) and features manual corrections and automated typo removal. Benchmark experiments on dialogue state tracking demonstrate significant improvements: GPT-2’s BLEU-4 score increases from 0.1651 on the original IRWOZ to 0.5604 on IRWOZ 2.0.

Instructions:

Introduction

The Industrial Robots Domain Wizard-of-Oz dataset (IRWOZ 2.0) is a fully labeled dialogue dataset of large language model-generated conversations spanning four domains (Product Assembly, Transportation, Position, Relocation). With 390 dialogues, it provides simulated dialogues between shop-floor workers and industrial robots to support language-assisted human-robot interaction (HRI) in industrial settings. To the best of our knowledge, IRWOZ 2.0 is the first LLM-generated and annotated task-oriented corpus for the manufacturing domain.

Data Structure

To maintain high scalability, IRWOZ 2.0 adopts a data structure similar to that of the widely used Multi-Domain Wizard-of-Oz dataset (MultiWOZ). Each dialogue consists of a domain, multiple user and system utterances, a belief state, and a system act.
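
As a rough illustration of this layout, the Python sketch below parses one dialogue record into typed objects. It assumes, as in MultiWOZ, that a belief state and a system act accompany each turn. The JSON keys (`domain`, `turns`, `user`, `system`, `belief_state`, `system_act`) and the sample record are hypothetical and may not match the released files.

```python
import json
from dataclasses import dataclass, field


# Hypothetical in-memory view of one IRWOZ 2.0 dialogue. Field names are
# illustrative assumptions, not the dataset's confirmed JSON keys.
@dataclass
class Turn:
    user: str
    system: str
    belief_state: dict = field(default_factory=dict)
    system_act: dict = field(default_factory=dict)


@dataclass
class Dialogue:
    domain: str
    turns: list[Turn]


def parse_dialogue(raw: dict) -> Dialogue:
    """Convert a raw JSON-like dict into a Dialogue object."""
    turns = [
        Turn(
            user=t["user"],
            system=t["system"],
            # Missing annotations default to empty dicts.
            belief_state=t.get("belief_state", {}),
            system_act=t.get("system_act", {}),
        )
        for t in raw["turns"]
    ]
    return Dialogue(domain=raw["domain"], turns=turns)


# Made-up example record, not taken from the dataset.
raw_json = """
{"domain": "Delivery",
 "turns": [{"user": "Bring the gearbox to station 2.",
            "system": "Which gearbox do you need?"}]}
"""
dialogue = parse_dialogue(json.loads(raw_json))
print(dialogue.domain, len(dialogue.turns))  # Delivery 1
```

Keeping utterances and annotations together per turn mirrors how MultiWOZ-style state trackers consume data, one turn at a time with its dialogue history.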

The belief state has two sections: DB_request and T_inform. DB_request contains the slots used to query the database, while T_inform contains the slots related to the task itself. Each section is further divided into required (req) and optional (opt) parts: "req" contains all slots that must be obtained during the dialogue, while the slots in "opt" are optional. The system act contains all database (DB) search results and the status of the required slots.
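
The req/opt split lends itself to a simple completeness check. The following Python sketch lists the required slots that are still missing and assembles a system-act-like dictionary holding DB search results and required-slot status. Only `DB_request`, `T_inform`, `req`, and `opt` come from the description above; the slot names, the empty-string convention for unfilled slots, the helper functions, and the toy parts database are hypothetical.

```python
# Hypothetical belief state using the DB_request / T_inform sections,
# each split into required ("req") and optional ("opt") slots.
# Slot names and values are invented; "" marks a slot not yet filled.
belief_state = {
    "DB_request": {
        "req": {"object": "gearbox", "location": ""},
        "opt": {"color": ""},
    },
    "T_inform": {
        "req": {"destination": "station 2"},
        "opt": {"speed": ""},
    },
}

# Toy parts database, invented for illustration only.
PARTS_DB = [
    {"object": "gearbox", "location": "shelf A"},
    {"object": "gearbox", "location": "shelf B"},
    {"object": "motor", "location": "shelf C"},
]


def missing_required(state: dict) -> dict:
    """Return, per section, the required slots that are still empty."""
    return {
        section: [slot for slot, value in parts["req"].items() if not value]
        for section, parts in state.items()
    }


def build_system_act(state: dict, db: list) -> dict:
    """Assemble a system-act-like dict: DB search results plus required-slot status."""
    # Query the DB with every filled DB_request slot, required or optional.
    constraints = {
        slot: value
        for part in ("req", "opt")
        for slot, value in state["DB_request"][part].items()
        if value
    }
    results = [
        row for row in db
        if all(row.get(slot) == value for slot, value in constraints.items())
    ]
    # Mark each required slot in both sections as filled or missing.
    status = {
        f"{section}.{slot}": "filled" if value else "missing"
        for section, parts in state.items()
        for slot, value in parts["req"].items()
    }
    return {"db_results": results, "req_status": status}


print(missing_required(belief_state))
# {'DB_request': ['location'], 'T_inform': []}
act = build_system_act(belief_state, PARTS_DB)
print(len(act["db_results"]), act["req_status"]["DB_request.location"])
# 2 missing
```

In a dialogue manager built this way, a non-empty list from `missing_required` would be a natural trigger for the robot to ask a follow-up question, while optional slots only narrow the DB query when the worker happens to supply them.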

Funding Agency
Villum Fonden
Grant Number
58627