SynEL: A Synthetic Benchmark for Entity Linking

Citation Author(s):
Ilia Karpov, HSE University
Alexander Kirillovich, HSE University
Elisaveta Goncharova, HSE University
Andrey Parinov, HSE University
Alexander Chernyavskiy, HSE University
Dmitry Ilvovsky, HSE University
Natalia Semenova, AIRI
Artyom Sosedka, NUST MISiS
Ekaterina Lisitsyna, Independent Researcher
Belkin Belkin, Independent Researcher
Submitted by:
Ilia Karpov
Last updated:
Thu, 11/14/2024 - 17:06
DOI:
10.21227/25m4-h372
Data Format:
License:

Abstract 

This dataset accompanies the paper "SynEL: A Synthetic Benchmark for Entity Linking". It integrates structured information from two primary sources: DBpedia for English, representing a high-resource language environment, and the Russian Public Company Register, representing a challenging low-resource setting. Each subset includes extensive annotations and structured entity links, which keeps it relevant to real-world applications across industries. The dataset supports training and evaluation of graph neural network (GNN) and large language model (LLM) techniques across both linguistic contexts. Experimental results indicate that models trained on this dataset achieve significant gains in entity linking precision and recall, especially in specialized domains such as finance and regulatory compliance.

 

Instructions: 

Just unzip the archive.
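
For scripted setups, the unzip step can also be done from Python's standard library. The following is only a minimal sketch: the archive name `synel.zip` and the output folder `synel_data` are placeholders, since this page does not state the actual file name or the layout of the archive, and `extract_archive` is a hypothetical helper rather than part of the dataset.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a zip archive into dest_dir and return its member names, sorted."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        return sorted(zf.namelist())


if __name__ == "__main__":
    # Placeholder names (assumptions): replace with the downloaded archive
    # and the folder where the data should be unpacked.
    for name in extract_archive("synel.zip", "synel_data"):
        print(name)
```

Listing the extracted members is a quick way to see which files the archive contains before loading them.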