Micro-Cell Attribute Database

Citation Author(s):
Xiaohui Du
Submitted by:
Xiaohui Du
Last updated:
Wed, 08/18/2021 - 22:42
DOI:
10.21227/12p3-hp42

Abstract 

In MCAD, the seen classes comprise 5 categories and 68,939 cell samples, and the unseen classes comprise 6 categories and 32,679 cell samples. Seen class samples are collected from fecal microscopic images, while unseen class samples are collected from leucorrhea microscopic images.

Instructions: 

MCAD was collected at the Sixth People’s Hospital of Chengdu (Sichuan Province, China). The collected samples include fecal samples and leucorrhea samples. The fecal samples serve as the seen classes for training, while the leucorrhea samples serve as the unseen classes for testing and performance verification. Each fecal sample is pretreated by dilution and stirring and then drawn onto the slide. The sample is left to stand for one minute, and images are captured with an optical microscope once the formed components have settled. A Motic B1Digital microscope with a 40× objective lens (Numerical Aperture (NA): 0.65, Material Distance: 0.6 mm) is used for fecal sample imaging. The resolution of the microscope is 1600 × 1200. Each leucorrhea sample is diluted, mixed and centrifuged, then drawn into the counting pool for image acquisition by the optical system. We used an OLYMPUS CX31 biological microscope with a 40× objective lens, the same as that of the Motic microscope, for leucorrhea imaging. An EXCCD01400KMA CCD camera with a pixel size of 6.45 µm × 6.45 µm is used for exposure (resolution 1920 × 1080).

We collected 1,885 fecal samples with 53,550 images. For leucorrhea, 183 samples were preprocessed for imaging, and a total of 5,286 images were obtained. The images of feces and leucorrhea were annotated by medical researchers. In total, 68,939 fecal cells and 32,679 leucorrhea cells were labeled. For the fecal samples, we selected 5 categories for classification, namely red blood cells (RBCs), white blood cells (WBCs), molds, pyocysts (Pyos), and impurities. An instance whose bounding box IoU with a positive sample is between 0.01 and 0.1 is regarded as an impurity. For the unseen leucorrhea samples, there are 6 categories: RBCs, WBCs, molds, Pyos, epithelial cells (Epis) and impurities. Impurities are annotated in the same way as above. Notably, although the names are the same, the RBCs, WBCs and Pyos in the leucorrhea samples show different morphology from those in the fecal samples, owing to differences in the environment and in the dilution concentration of the diluent.
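The impurity rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' annotation tool: the `(x1, y1, x2, y2)` box format, the function names `iou` and `is_impurity`, the inclusive bounds, and the choice to compare against the best-matching positive box are all assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_impurity(candidate, positives, low=0.01, high=0.1):
    """True if the candidate's best IoU with any positive box lies in [low, high].

    Assumption: the IoU range is inclusive and is checked against the
    best-overlapping positive box; the source does not specify either detail.
    """
    best = max((iou(candidate, p) for p in positives), default=0.0)
    return low <= best <= high
```

For example, with a single positive box `(0, 0, 10, 10)`, a candidate at `(8, 8, 18, 18)` overlaps it only slightly and would be treated as an impurity, while a fully disjoint candidate would not.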

Seen classes (Fecal)

Categories      #instances
Impurities      15149
RBCs            25021
WBCs            5695
Molds           22577
Pyos            497
Total           68939

 

Unseen classes (Leucorrhea)

Categories      #instances
Impurities      9327
RBCs            1438
WBCs            9828
Molds           1723
Pyos            1170
Epis            9193
Total           32679

Attributes:

Visual Space: The visual features of MCAD are extracted with a ResNet-101 backbone. Each image is scaled to 224 × 224 by bilinear interpolation and fed into the model for inference, which yields a 2048-dimensional visual feature vector.
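The scaling step can be illustrated without any imaging library. The sketch below resizes a single-channel image, stored as a list of rows, with bilinear interpolation. It assumes the half-pixel-centre sampling convention, and the function name `resize_bilinear` is hypothetical; the source does not say which library or convention was actually used, and the ResNet-101 forward pass is not shown.

```python
def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D list of pixel values with bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the output pixel centre back into input coordinates
        # (half-pixel-centre convention, an assumption of this sketch).
        y = (i + 0.5) * in_h / out_h - 0.5
        y = min(max(y, 0.0), in_h - 1)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = (j + 0.5) * in_w / out_w - 0.5
            x = min(max(x, 0.0), in_w - 1)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Blend horizontally on the two neighbouring rows, then vertically.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Scale a tiny 2 x 2 image to the 224 x 224 network input size.
resized = resize_bilinear([[0, 10], [20, 30]], 224, 224)
```

A colour image would be handled by applying the same function to each channel separately.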

Semantic Space: MCAD provides two types of attribute vectors: one for discrete attributes and one for continuous attributes.

For the discrete attributes, a vocabulary of 9 separate attributes was selected based on morphological features. These 9 attributes are: volume big, volume small, circle or ellipse, edge black, annular, inner black spot, inner multi black spot, multi part, and reticular.
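A discrete attribute vector of this kind can be sketched as a 9-element 0/1 list. The attribute order below simply follows the sentence above and may differ from the order used in the dataset files; the `encode` helper and the example description are hypothetical and are not taken from the dataset's predicate matrix.

```python
ATTRIBUTES = [
    "volume big", "volume small", "circle or ellipse", "edge black",
    "annular", "inner black spot", "inner multi black spot",
    "multi part", "reticular",
]


def encode(present):
    """Return a 9-element 0/1 vector marking which attributes are present."""
    unknown = set(present) - set(ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return [1 if name in present else 0 for name in ATTRIBUTES]


# Hypothetical description, NOT read from the dataset.
example = encode({"volume small", "circle or ellipse", "annular"})
```

Stacking one such vector per class gives a matrix with one row per class and one column per attribute.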

The continuous attributes are obtained with the Doc2Vec method proposed by Quoc Le and Tomas Mikolov. Feeding the semantic description of each class into the Doc2Vec model for inference yields a 256-dimensional feature vector.

The file details are as follows:

Image folder                    Images of positive samples
Impurity folder                 Images of impurities
attributes_description.txt      Continuous attributes description
certainties.txt                 Certainties for discrete attributes
classes.txt                     Class name: Impurity, Ery(RBC), Leu(WBC), Mid(mold), Pyo
predicate.txt                   Discrete attributes description
predicate-matrix-binary.txt     Discrete attributes: [5, 9]
predicate-matrix-doc2vec.txt    Continuous attributes: [5, 256]
statistic.txt                   Statistic information
test.txt                        Test image list (without impurities) and labels
test_with_impurities.txt        Test image list (with impurities) and labels
trainval.txt                    Training image list (without impurities) and labels
trainval_with_impurities.txt    Training image list (with impurities) and labels
Attributes.mat                  Mat format file of attributes and image list with labels
Res101.mat                      Mat format file of visual space. Res-101 features.
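As a rough guide to reading the text files listed above, the sketch below parses an attribute matrix and an image list from strings. The on-disk layout is an assumption: it supposes whitespace-separated numbers with one class per row for the predicate matrices, and one `path label` pair per line for the image lists. The helper names and the sample contents are hypothetical, so check the actual files before relying on this.

```python
def parse_matrix(text, rows, cols):
    """Parse whitespace-separated numbers into a rows x cols list of floats."""
    matrix = [[float(v) for v in line.split()]
              for line in text.splitlines() if line.strip()]
    if len(matrix) != rows or any(len(r) != cols for r in matrix):
        raise ValueError(f"expected shape [{rows}, {cols}]")
    return matrix


def parse_image_list(text):
    """Parse 'path label' lines into (path, int_label) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last whitespace so paths containing spaces survive.
        path, label = line.rsplit(maxsplit=1)
        pairs.append((path, int(label)))
    return pairs


# Hypothetical stand-ins for file contents, NOT real dataset values.
sample_matrix = "\n".join(
    " ".join("1" if (r + c) % 2 == 0 else "0" for c in range(9))
    for r in range(5)
)
sample_list = "Image/example_0001.jpg 1\nImpurity/example_0002.jpg 0\n"

binary = parse_matrix(sample_matrix, rows=5, cols=9)
items = parse_image_list(sample_list)
```

With real files, the same helpers could be applied to the result of `open(path).read()`, passing `cols=256` for the Doc2Vec matrix.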