Wineinformatics: 21st Century Bordeaux Wines Dataset

Citation Author(s):
University of Central Arkansas
Submitted by:
Bernard Chen
Last updated:
Thu, 04/30/2020 - 16:14


Wine has been popular with the public for centuries, and the market offers a wide variety of wines to choose from. Among wine regions, Bordeaux, France, is considered the most famous in the world. In this paper, we try to understand Bordeaux wines made in the 21st century through a Wineinformatics study. We developed and studied two datasets: the first contains all Bordeaux wines from 2000 to 2016, and the second contains all wines listed in a famous collection of Bordeaux wines, the 1855 Bordeaux Wine Official Classification, from 2000 to 2016. A total of 14,349 wine reviews were collected in the first dataset and 1,359 wine reviews in the second. To understand the relation between wine quality and characteristics, a Naïve Bayes classifier is applied to predict the quality class (90+/89−) of each wine. A Support Vector Machine (SVM) classifier is also applied as a comparison. In the first dataset, the SVM classifier achieves the best accuracy of 86.97%; in the second dataset, the Naïve Bayes classifier achieves the best accuracy of 84.62%. Precision, recall, and F-score are also used to describe the performance of our models. Meaningful features associated with high-quality 21st-century Bordeaux wines are presented in this research paper.
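To make the classification setup concrete, the following is a minimal sketch of a Bernoulli-style Naïve Bayes classifier over binary wine attributes, together with precision, recall, and F-score. It is not the authors' implementation: the function names, the Laplace smoothing parameter `alpha`, and the six toy wines are all assumptions made for illustration, and the toy rows are not taken from the dataset.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a log-prior and per-feature presence probabilities for each class."""
    n_features = len(X[0])
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        prior = len(rows) / len(X)
        # Laplace-smoothed probability that each attribute is present in this class.
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(n_features)]
        model[label] = (math.log(prior), probs)
    return model

def predict(model, x):
    """Return the class with the highest log-posterior for one wine."""
    def log_posterior(label):
        log_prior, probs = model[label]
        return log_prior + sum(
            math.log(p if present else 1.0 - p)
            for p, present in zip(probs, x))
    return max(model, key=log_posterior)

def precision_recall_f1(y_true, y_pred, positive=True):
    """Compute precision, recall, and F-score for the positive (90+) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical toy data: three binary attributes per wine, label True = 90+.
X = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [True, True, True, False, False, False]

model = train_bernoulli_nb(X, y)
predictions = [predict(model, x) for x in X]
precision, recall, f1 = precision_recall_f1(y, predictions)
```

Working in log space avoids numerical underflow when the number of attributes is large, which matters for a dataset with many true/false characteristics per wine.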


The dataset comes from Wine Spectator reviews of Bordeaux wines, written in natural language, from 2000 to 2016. A total of 14,349 wines were collected: 4,263 wines scored 90 or above out of 100, and 10,086 wines scored 89 or below. Detailed information is available in the paper. The reviews were processed by the Computational Wine Wheel to produce the uploaded dataset. The first attribute is the name of the wine, the second is its vintage, the third is the score given by Wine Spectator, and the fourth is the price. A price of $NA indicates that the price was not available at the time the wine was reviewed. The remaining attributes are characteristics describing the wine, each with a true/false value.
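The attribute layout described above can be read with a short parser such as the sketch below. Several details are assumptions, since the description does not state them: that the file is comma-separated with no header row, that prices carry a leading dollar sign (suggested by the $NA marker), and that the characteristic flags are spelled "true" and "false". The function name `parse_row` and the two sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io

def parse_row(row):
    """Split one record into its four leading fields and its binary attributes."""
    name, vintage, score, price = row[:4]
    price = price.strip()
    # "$NA" marks a price that was not available when the wine was reviewed.
    if price == "$NA":
        price_value = None
    else:
        # Assumption: prices look like "$45" or "$1,200".
        price_value = float(price.lstrip("$").replace(",", ""))
    # Assumption: flags are spelled "true"/"false" (case-insensitive).
    flags = [cell.strip().lower() == "true" for cell in row[4:]]
    return {
        "name": name.strip(),
        "vintage": int(vintage),
        "score": int(score),
        "price": price_value,
        "flags": flags,
        # Quality class used in the paper: 90 or above versus 89 or below.
        "is_90_plus": int(score) >= 90,
    }

# Two made-up rows for illustration only.
sample = io.StringIO(
    "Example Chateau A,2010,93,$45,true,false,true\n"
    "Example Chateau B,2004,87,$NA,false,false,true\n"
)
records = [parse_row(row) for row in csv.reader(sample)]
```

Representing an unavailable price as `None` rather than zero keeps missing values from being mistaken for real prices in later analysis.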


For publications, please cite the following papers:

Dong, Zeqing, Xiaowan Guo, Syamala Rajana, and Bernard Chen. "Understanding 21st Century Bordeaux Wines from Wine Reviews Using Naïve Bayes Classifier." Beverages 6, no. 1 (2020): 5.

Chen, Bernard, Christopher Rhodes, Aaron Crawford, and Lorri Hambuchen. "Wineinformatics: applying data mining on wine sensory reviews processed by the computational wine wheel." In 2014 IEEE International Conference on Data Mining Workshop, pp. 142-149. IEEE, 2014.

Chen, Bernard, Christopher Rhodes, Alexander Yu, and Valentin Velchev. "The Computational Wine Wheel 2.0 and the TriMax Triclustering in Wineinformatics." In Industrial Conference on Data Mining, pp. 223-238. Springer, Cham, 2016.