Abstract

Food computing is a fast-growing field of research. Web mining and content analysis are increasingly essential in this field, especially for recognising food entities.

However, there are still only a few well-defined tasks that serve as benchmarks for solutions in this area. To bridge this gap, we introduce a new dataset called TASTEset. In this dataset, Named Entity Recognition (NER) models are expected to find or infer various types of entities helpful in processing recipes, e.g. food products, quantities and their units, names of cooking processes, physical qualities of ingredients, their purpose, and taste.

The dataset consists of 1,000 recipes with about 19,500 entities to extract. We share the dataset and the entity extraction task to encourage progress on more in-depth and complex information extraction from recipes.

Instructions:

TASTEset

This is a dataset for the food entity recognition problem. It consists of manually annotated ingredient lists from 1,000 recipes. Recipes were scraped from the following websites:

https://www.allrecipes.com

http://food.com

https://tasty.co

www.yummly.com

using this tool.

The dataset covers 15 entity types:

FOOD, UNIT, QUANTITY, PHYSICAL QUALITY, PROCESS, COLOR, TASTE, PURPOSE, PART, TRADE NAME, DIET, EXAMPLE, EXCLUDED, EXCLUSIVE, POSSIBLE SUBSTITUTE.

The dataset is available in CSV format. It contains two columns:

ingredients - the list of a recipe's ingredients

ingredients_entities - the entities manually annotated in that list of ingredients
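As a rough illustration of the two-column layout, the sketch below reads rows with Python's standard csv module. It assumes the file has a header row naming the columns `ingredients` and `ingredients_entities`; the helper `read_recipes` and the sample row are hypothetical and not taken from the dataset. The annotation column is left as a raw string, because its exact serialization is not specified here.

```python
import csv
import io


def read_recipes(handle):
    """Yield (ingredients, ingredients_entities) pairs as raw strings.

    ASSUMPTION: the CSV has a header row with exactly these column names.
    """
    reader = csv.DictReader(handle)
    for row in reader:
        yield row["ingredients"], row["ingredients_entities"]


# In-memory stand-in for TASTEset.csv (made-up row, not real dataset content).
sample = io.StringIO(
    "ingredients,ingredients_entities\n"
    '"2 cups flour","<annotations for this recipe>"\n'
)
rows = list(read_recipes(sample))
print(rows)
```

For the real file, the same function could be called on `open("TASTEset.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is also an assumption.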

Each annotated entity in ingredients_entities has the following fields:

"span": list of tuples, each containing the start and end character indices of one fragment of the entity. If more than one tuple is present, the entity is discontinuous.

"type": entity type

"entity": entity text

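To make the span format concrete, here is a minimal sketch that rebuilds an entity's text from its spans. It assumes that each annotation is available as a Python dict with the fields above, that indices are 0-based with an exclusive end (as in Python slicing), and that the fragments of a discontinuous entity are joined with a single space. The helper names and the example ingredient line are hypothetical, so these conventions should be checked against the actual file.

```python
def entity_text(ingredients, annotation):
    """Rebuild an entity's text from its character spans.

    ASSUMPTION: spans are (start, end) pairs, 0-based, end exclusive,
    and fragments of a discontinuous entity are joined with a space.
    """
    pieces = [ingredients[start:end] for start, end in annotation["span"]]
    return " ".join(pieces)


def is_discontinuous(annotation):
    """An entity is discontinuous when it has more than one span."""
    return len(annotation["span"]) > 1


# Made-up example, not taken from the dataset.
ingredients = "2 tbsp olive or vegetable oil"
annotations = [
    {"span": [(0, 1)], "type": "QUANTITY", "entity": "2"},
    {"span": [(2, 6)], "type": "UNIT", "entity": "tbsp"},
    {"span": [(7, 12), (26, 29)], "type": "FOOD", "entity": "olive oil"},
    {"span": [(16, 29)], "type": "FOOD", "entity": "vegetable oil"},
]

for ann in annotations:
    print(ann["type"], repr(entity_text(ingredients, ann)), is_discontinuous(ann))
```

Here "olive oil" is covered by two spans because the words "or vegetable" sit between its two fragments, which is the situation the discontinuous case describes.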
Funding Agency:

the Norway Grants 2014-2021 via the National Centre for Research and Development

Grant Number:

NOR/SGS/TAISTI/0323/2020

Dataset Files

TASTEset.csv (1.57 MB): dataset CSV file
TASTEset_gold_standard_exploratory_analysis.ipynb (651.39 kB): exploratory data analysis of the dataset

TASTEset - Recipe and Food Entities Dataset