Standard Dataset

SynEL: A Synthetic Benchmark for Entity Linking

Citation Author(s):
Ilia Karpov (HSE University)
Alexander Kirillovich (HSE University)
Elisaveta Goncharova (HSE University)
Andrey Parinov (HSE University)
Alexander Chernyavskiy (HSE University)
Dmitry Ilvovsky (HSE University)
Natalia Semenova (AIRI)
Artyom Sosedka (NUST MISiS)
Ekaterina Lisitsyna (Independent Researcher)
Belkin Belkin (Independent Researcher)
Submitted by:
Ilia Karpov
DOI:
10.21227/25m4-h372

Abstract

Dataset for the paper "SynEL: A Synthetic Benchmark for Entity Linking". The dataset integrates structured information from two primary sources: DBpedia for English, representing a high-resource language setting, and the Russian Public Company Register, a challenging low-resource source. Each subset includes extensive annotations and structured entity links, which keeps it relevant to real-world applications across industries. The dataset supports training and evaluating graph neural network (GNN) and large language model (LLM) techniques across varied linguistic contexts. Experimental results indicate that models trained on this dataset achieve significant gains in entity linking precision and recall, especially in specialized domains such as finance and regulatory compliance.

Instructions:

Unzip the downloaded archive; no further setup is needed.
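
For users who prefer to script the extraction, the step can be sketched in Python with the standard-library zipfile module. The archive name synel.zip and the target directory synel_data are placeholders, not names taken from this page, and the helper extract_archive is hypothetical; the layout of the files inside the archive is not described here, so the sketch only lists whatever it extracts.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a ZIP archive into target_dir and return the extracted file paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        names = archive.namelist()
    # Skip directory entries; keep only regular files.
    return [target / name for name in names if not name.endswith("/")]


if __name__ == "__main__":
    # Placeholder names: substitute the actual downloaded file and a directory of your choice.
    archive_name = "synel.zip"
    if Path(archive_name).exists():
        for path in extract_archive(archive_name, "synel_data"):
            print(path)
    else:
        print(f"{archive_name} not found; download the archive first.")
```

Listing the extracted paths is a quick way to see which files the archive contains before loading them.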