TASTEset - Recipe and Food Entities Dataset

Citation Author(s):
Anna Wróblewska
the Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
Agnieszka Kaliska
the Faculty of Modern Languages, Adam Mickiewicz University, Poznan, Poland
Maciej Pawłowski
the Faculty of Computing and Telecommunications, Poznan University of Technology, Poznan, Poland
Dawid Wiśniewski
the Faculty of Computing and Telecommunications, Poznan University of Technology, Poznan, Poland
Witold Sosnowski
the Polish-Japanese Academy of Information Technology, Warsaw, Poland
Agnieszka Ławrynowicz
the Faculty of Computing and Telecommunications and the Center for Artificial Intelligence and Machine Learning (CAMIL), Poznan University of Technology, Poznan, Poland
Submitted by:
Anna Wroblewska
Last updated:
Mon, 07/08/2024 - 15:59
DOI:
10.21227/11bb-v380

Abstract 

Food computing is currently a fast-growing field of research. Web mining and content analysis are also increasingly essential in this field, especially for recognising food entities.

However, few well-defined tasks currently serve as benchmarks for solutions in this area. To bridge this gap, we introduce a new dataset called TASTEset. In this dataset, Named Entity Recognition (NER) models are expected to find or infer various types of entities helpful in processing recipes, e.g. food products, quantities and their units, names of cooking processes, physical qualities of ingredients, their purpose, and taste.

The dataset consists of 1,000 recipes with about 19,500 entities to extract. We share the dataset and the entity extraction task to encourage progress on more in-depth and complex information extraction from recipes.

Instructions: 

TASTEset

This is a dataset for the food entity recognition problem. It consists of the manually annotated ingredient lists of 1,000 recipes. Recipes were scraped from the following websites:

https://www.allrecipes.com

http://food.com

https://tasty.co

www.yummly.com

using this tool.

 

The dataset covers 15 entity types:

FOOD, UNIT, QUANTITY, PHYSICAL QUALITY, PROCESS, COLOR, TASTE, PURPOSE, PART, TRADE NAME, DIET, EXAMPLE, EXCLUDED, EXCLUSIVE, POSSIBLE SUBSTITUTE.

 

The dataset is available in the CSV format. It contains two columns:

ingredients - list of recipe's ingredients

ingredients_entities - entities manually annotated in the list of ingredients
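As a rough illustration of the two-column layout, the sketch below reads rows with Python's standard csv module. The sample text stands in for the real file and is made up for illustration: the ingredient strings, the empty annotation lists, and the assumption that one cell holds a whole ingredient list with one ingredient per line are not taken from the dataset. The helper name read_rows is hypothetical as well.

```python
import csv
import io

# Hypothetical two-row sample standing in for the real CSV file.
# The cell contents are invented; only the column names come from
# the dataset description.
SAMPLE_CSV = '''ingredients,ingredients_entities
"2 cups flour
1 tsp salt","[]"
"3 eggs","[]"
'''


def read_rows(handle):
    """Yield (ingredients, ingredients_entities) string pairs from a CSV handle."""
    reader = csv.DictReader(handle)
    for row in reader:
        yield row["ingredients"], row["ingredients_entities"]


# For a file on disk, open it with newline="" as the csv module recommends.
rows = list(read_rows(io.StringIO(SAMPLE_CSV)))
```

Both values come back as plain strings, so the annotation column still has to be parsed before use.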

Each annotation in ingredients_entities has the following format:

"span": list of tuples, each holding the start and end character index of one fragment of the entity. If more than one tuple is present, the entity is discontinuous.

"type": entity type

"entity": entity

 

Funding Agency: 
the Norway Grants 2014-2021 via the National Centre for Research and Development
Grant Number: 
NOR/SGS/TAISTI/0323/2020