TASTEset - Recipe and Food Entities Dataset
- Submitted by: Anna Wroblewska
- Last updated: Mon, 07/08/2024 - 15:59
- DOI: 10.21227/11bb-v380
Abstract
Food computing is currently a fast-growing field of research. Web mining and content analysis are also increasingly essential in this field, especially for recognising food entities.
However, there are still only a few well-defined tasks that serve as benchmarks for solutions in this area. To bridge this gap, we introduce a new dataset called TASTEset. In this dataset, Named Entity Recognition (NER) models are expected to find or infer various types of entities helpful in processing recipes, e.g. food products, quantities and their units, names of cooking processes, physical qualities of ingredients, their purpose, and taste.
The dataset consists of 1,000 recipes with about 19,500 entities to extract. We share the dataset and the entity extraction task to encourage progress on more in-depth and complex information extraction from recipes.
TASTEset
This is a dataset for the food entity recognition problem. It consists of the manually annotated ingredient lists of 1,000 recipes. Recipes were scraped from the following websites:
using this tool.
The dataset covers 15 entity types:
FOOD, UNIT, QUANTITY, PHYSICAL QUALITY, PROCESS, COLOR, TASTE, PURPOSE, PART, TRADE NAME, DIET, EXAMPLE, EXCLUDED, EXCLUSIVE, POSSIBLE SUBSTITUTE.
The dataset is available in CSV format. It contains two columns:
ingredients - list of recipe's ingredients
ingredients_entities - entities manually annotated in the list of ingredients
Each entry in ingredients_entities has the following fields:
"span": list of tuples, each containing the start and end character index of one part of the entity. More than one tuple means that the entity is discontinuous.
"type": entity type
"entity": entity text
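As a rough illustration of how the two columns and the span format fit together, the sketch below builds a one-row sample in memory and reads it back. The sample recipe, the JSON encoding of the ingredients_entities cell, the exclusive end offsets, and the helper names entity_text and read_rows are all assumptions made for this example, not something the dataset description confirms; check them against the actual file before relying on them.

```python
import csv
import io
import json

# Hypothetical one-row sample in the two-column layout described above.
# Assumptions: ingredients_entities is stored as a JSON string, and each
# span is a [start, end) pair of character offsets (end exclusive).
ingredients = "2 cups flour"
entities = [
    {"span": [[0, 1]], "type": "QUANTITY", "entity": "2"},
    {"span": [[2, 6]], "type": "UNIT", "entity": "cups"},
    {"span": [[7, 12]], "type": "FOOD", "entity": "flour"},
]

# Write the sample to an in-memory CSV so the example needs no file on disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["ingredients", "ingredients_entities"])
writer.writerow([ingredients, json.dumps(entities)])
buffer.seek(0)


def entity_text(text, spans):
    """Join the pieces of a (possibly discontinuous) entity with spaces."""
    return " ".join(text[start:end] for start, end in spans)


def read_rows(handle):
    """Yield (ingredients, parsed entities) for each CSV row."""
    for row in csv.DictReader(handle):
        yield row["ingredients"], json.loads(row["ingredients_entities"])


rows = list(read_rows(buffer))
for text, ents in rows:
    for ent in ents:
        print(ent["type"], repr(entity_text(text, ent["span"])))
```

Under the same assumptions, a discontinuous entity is handled by passing several pairs: entity_text("red and green peppers", [[0, 3], [14, 21]]) joins the two pieces into "red peppers".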