An annotated dataset of tongue images

Name: An annotated dataset of tongue images
Creator: Chunlei Tang
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Biomedical and Health Sciences

Citation Author(s): Chunlei Tang, Dan Shi, Xiao Shi, Yun Xiong
Submitted by: Chunlei Tang
Last updated: Wed, 07/01/2020 - 07:41
DOI: 10.21227/mtt8-q366
Links: Two labeled tongue images compared to unlabeled ones


Categories: Biomedical and Health Sciences

Keywords: tongue images, geriatric disease


Abstract

To develop a non-invasive assessment tool that uses machine learning to support timely, accurate diagnosis in the elderly, we created an annotated dataset of 668 tongue images collected from hospitalized geriatric patients at a tertiary hospital in Shanghai, China. Images were captured with a light-field camera using the CIELAB color space (to simulate human visual perception). A panel of subject matter experts then labeled them manually after reviewing patients’ clinical information documented in the hospital’s information system.
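Readers working with the released *.jpg files may want to express pixel colors in CIELAB themselves. The page does not describe the intelligent mirror's internal color pipeline. The sketch below therefore assumes the JPEG pixels are sRGB-encoded and uses the standard D65 reference white. The function name `srgb_to_lab` is hypothetical and is not part of the dataset.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (assumes sRGB encoding, D65 white)."""

    def linearize(c):
        # Undo the sRGB transfer curve to get linear light in [0, 1].
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear sRGB -> CIE XYZ using the standard sRGB (D65) matrix.
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    # D65 reference white used to normalize XYZ.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        # Cube root above the small-value threshold, linear segment below it.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    l_star = 116 * fy - 16          # lightness
    a_star = 500 * (fx - fy)        # green (-) to red (+)
    b_star = 200 * (fy - fz)        # blue (-) to yellow (+)
    return l_star, a_star, b_star
```

For example, pure sRGB red maps to roughly L* ≈ 53, a* ≈ 80, b* ≈ 67. In practice one would apply the function per pixel, or use a vectorized library routine, over the segmented tongue region.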

Instructions:

Subject	Aging
Specific subject area	Diagnosis – image and text data analysis. Hospitalized geriatric patients are a highly heterogeneous group, often with a variety of diseases and conditions. Physicians, and geriatricians in particular, are seeking non-invasive testing tools to support timely, accurate diagnosis. Chinese tongue diagnosis, based mainly on the color and texture of the tongue, offers a unique approach.
Type of data	Free-text documents, tables, and images. Each patient has a folder containing 1 face image, 1 tongue image, and 2 narrative documents. An additional summary table is also provided.
How data were acquired	We used a patented light-field camera (CN201520303463.5), called the intelligent mirror, that operates in the CIE L*a*b* color space. Data acquisition was standardized as far as possible (i.e., by ensuring consistent sitting height and placement of the intelligent mirror).
Data format	The face and tongue images are raw data. They were taken at 600 pixels per inch (about 42.3 µm per pixel) and saved as *.jpg files with minimal compression (at most 10%). One annotated narrative document contains the parameters the intelligent mirror generated when creating the face and tongue images. The other contains the expert panel's annotation results (e.g., vital signs, clinical imaging examinations, and laboratory indicators).
Parameters for data collection	The study was conducted at a tertiary comprehensive hospital in China. Beginning on January 1, 2019, we recruited hospitalized patients in the Geriatrics Department, excluding minority groups and other sensitive or disempowered populations. Images were captured with a light-field camera using the CIELAB color space (to simulate human visual perception). A panel of subject matter experts then labeled them manually after reviewing patients’ clinical information documented in the hospital’s information system.
Description of data collection	Data acquisition and image annotation were conducted by subject matter experts: four fully credentialed senior physicians (associate chief physician and above), one resident, and two medical students. One medical student was in charge of data acquisition. The resident consolidated patients’ past chronic medical history, clinical imaging examinations, and laboratory indicators. One physician diagnosed patients’ constitutional types. Another physician gave a final admission diagnosis by considering the patient’s constitution from the perspectives of both traditional Chinese medicine (TCM) and Western medicine. Constitutional types are based on TCM analysis and differentiation of pathological conditions according to the eight principal syndromes (八纲辨证): yin and yang (阴阳), exterior and interior (表里), cold and heat (寒热), and hypofunction and hyperfunction (虚实). One medical student documented all information from the free-text labeling digitally in Chinese and then translated it into English. The remaining two physicians reviewed and annotated the treatment plan corresponding to the admission diagnosis. A total of 12 items must be merged into each annotated document. These include indices related to tongue diagnosis, physical or mental factors, and clinicians’ observations, among others. To reduce this manual effort, we used a previously designed algorithm to generate templates automatically. Based on K-means clustering, the algorithm worked in three steps:
(1) It embedded the annotated document of each of the first 200 patients as a vector.
(2) It partitioned those vectors into several clusters (e.g., K = 10).
(3) For each cluster, it designated as the prototype template the real annotated document whose vector is closest to the cluster centroid.
For the remaining 468 patients, these prototype templates were used to assist with annotation.
Data source location	Shanghai, China; Cambridge, MA, USA
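The template-generation step in the data collection description has three parts: embed the first 200 annotated documents, cluster them with K-means, and keep the real document nearest each centroid as a prototype. It can be sketched roughly as follows.

This is not the authors' algorithm. The source does not specify the embedding, so a bag-of-words count vector stands in for it. The deterministic farthest-first seeding, all function names, and the toy documents are likewise hypothetical choices made for illustration.

```python
import math
from collections import Counter


def embed(doc, vocab):
    """Bag-of-words count vector: a hypothetical stand-in for the unspecified embedding."""
    counts = Counter(doc.lower().split())
    return [float(counts[w]) for w in vocab]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(vectors, k):
    """Deterministic seeding: start at vectors[0], then add the point farthest from those chosen."""
    chosen = [0]
    while len(chosen) < k:
        nxt = max(range(len(vectors)),
                  key=lambda i: min(dist(vectors[i], vectors[j]) for j in chosen))
        chosen.append(nxt)
    return [list(vectors[i]) for i in chosen]


def kmeans(vectors, k, iters=100):
    """Lloyd's K-means; returns (centroids, labels)."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(iters):
        # Assignment step: attach every vector to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members (empty clusters keep theirs).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def prototype_templates(vectors, k):
    """For each cluster, return the index of the real document closest to its centroid."""
    centroids, labels = kmeans(vectors, k)
    prototypes = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            prototypes.append(min(members, key=lambda i: dist(vectors[i], centroid)))
    return prototypes


def pick_template(vector, vectors, prototypes):
    """Suggest the prototype template nearest to a new patient's document vector."""
    return min(prototypes, key=lambda i: dist(vector, vectors[i]))


# Toy annotated documents (hypothetical, not from the dataset).
docs = [
    "pale tongue white coating cold",
    "pale tongue white coating cold deficiency",
    "pale tongue thin white coating",
    "red tongue yellow coating heat",
    "red tongue yellow coating heat excess",
    "red tongue thick yellow coating",
]
vocab = sorted({w for d in docs for w in d.split()})
vectors = [embed(d, vocab) for d in docs]
prototypes = prototype_templates(vectors, k=2)
new_vec = embed("red tongue yellow coating", vocab)
print(prototypes, pick_template(new_vec, vectors, prototypes))  # prints: [0, 3] 3
```

On this toy corpus the prototypes are documents 0 and 3, and a new "red tongue yellow coating" record is matched to template 3. The description above uses K = 10 over 200 documents. With real annotations, one would also swap in a stronger document embedding.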

Dear authors,

Thank you so much for making this dataset available to the community. It can enable research in integrative medicine (combining Chinese and Western medicine), a very attractive and high-potential area.

Best regards,

Hugo Ferreira

Hugo Ferreira Mon, 12/30/2019 - 14:49

Dear Authors,

Thanks for this interesting paper and for sharing the dataset.

Best regards,

Narges Manouchehri

Narges Manouchehri Sun, 04/26/2020 - 17:54

We will share it; it is currently being prepared.

Chunlei Tang Sun, 04/26/2020 - 23:56

Where can we access the full dataset?

Sanjana K Thu, 07/09/2020 - 04:32

Where can I download the full dataset?

andy tian Mon, 08/31/2020 - 08:12

Hi, where can we access the full dataset?

Harry Chen Thu, 10/08/2020 - 05:13

Where can I download the full dataset?

Lin Jian-Ho Sun, 03/07/2021 - 09:07

Thanks for your interest. Please see the information below:
Repository name: Harvard Dataverse
Data identification number: N/A
Direct URL to data: https://doi.org/10.7910/DVN/COJZMQ
If you use the data, please cite this paper: Dan Shi, Chunlei Tang, Suzanne V. Blackley, Liqin Wang, Jiahong Yang, Yanming He, Samuel I. Bennett, Yun Xiong, Xiao Shi, Li Zhou, David W. Bates. An annotated dataset of tongue images supporting geriatric disease diagnosis, Data in Brief, Volume 32, 2020, 106153, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2020.106153.

Chunlei Tang Sun, 03/07/2021 - 13:26

Your email address rejects all data-request emails.

Xinzhou Wang Tue, 07/06/2021 - 03:43

Hi there, we are working on improving the dataset so that it is 100% labeled, and will upload it soon.

Chunlei Tang Tue, 07/06/2021 - 11:35

Thanks a lot for your patience and selfless contribution. I would really appreciate it if you could send the unlabeled (or raw) tongue images to me, and we will cite your work with gratitude. My email: 709510112@qq.com

Xinzhou Wang Wed, 07/07/2021 - 08:22

How can I access the dataset?

fei ling Mon, 08/16/2021 - 10:16

Dear author, if I am not mistaken, data for 102 subjects are currently available.

Bushra Jalil Thu, 04/07/2022 - 08:40

How can I access the dataset? I would like to request it.

ranin test ran… Mon, 06/06/2022 - 08:08

I am developing a deep learning medical diagnosis model using tongue images. Thank you for your valuable data. Your data includes tongue images, but do you provide the medical information (labels) for each patient in a separate file?

Hideki Mori Sun, 02/25/2024 - 05:04

Hi, we at IIT Delhi are working on tongue biometrics. I would be very thankful if you could share the data with me at anz208484@iitd.ac.in. Thanks

Amber Hayat Thu, 07/25/2024 - 06:42

Hi, we at IIIT Gwalior are working on tongue image analysis. I would be very grateful if you could share the dataset with me at imt_2020056@iiitm.ac.in.
Thanks!

Harshil Mendpara Mon, 09/02/2024 - 11:15