Micro-Cell Attribute Database
- Submitted by: Xiaohui Du
- Last updated: Wed, 08/18/2021 - 22:42
- DOI: 10.21227/12p3-hp42
Abstract
MCAD contains 5 seen classes with 68,939 cell samples and 6 unseen classes with 32,679 cell samples. Seen-class samples are collected from fecal microscopic images, while unseen-class samples are collected from leucorrhea microscopic images.
MCAD was collected at the Sixth People’s Hospital of Chengdu (Sichuan Province, China) and includes fecal and leucorrhea samples. The fecal samples provide the seen classes used for training, while the leucorrhea samples provide the unseen classes used for testing and performance verification. During pretreatment, each fecal sample is diluted and stirred, then drawn onto a slide. The slide is left standing for one minute so that the formed components settle, after which images are acquired with an optical microscope. Fecal samples are imaged with a Motic B1Digital microscope and a 40× objective lens (numerical aperture (NA): 0.65; working distance: 0.6 mm) at a resolution of 1600 × 1200 pixels. Each leucorrhea sample is diluted, mixed, and centrifuged, then drawn into a counting chamber, where images are acquired by the optical system. Leucorrhea samples are imaged with an OLYMPUS CX31 biological microscope fitted with a 40× objective lens of the same specification as the one used with the Motic microscope. Exposure is performed with an EXCCD01400KMA CCD camera with a pixel size of 6.45 µm × 6.45 µm (resolution 1920 × 1080).
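The optical parameters above allow a quick estimate of how finely the leucorrhea images sample the specimen. The sketch below assumes the camera adapter adds no extra magnification (adapter factor 1.0) and that the 6.45 µm value is the native pixel pitch of the recorded frames; neither is stated in the description. It also uses the Rayleigh criterion with an assumed green-light wavelength of 0.55 µm, which is not part of the dataset specification.

```python
def sample_plane_pixel_um(sensor_pixel_um: float, objective_mag: float,
                          adapter_mag: float = 1.0) -> float:
    """Size of one camera pixel projected onto the specimen, in micrometres.

    adapter_mag=1.0 is an assumption: the dataset description does not state it.
    """
    return sensor_pixel_um / (objective_mag * adapter_mag)


def rayleigh_limit_um(na: float, wavelength_um: float = 0.55) -> float:
    """Rayleigh lateral resolution 0.61 * lambda / NA (wavelength is assumed)."""
    return 0.61 * wavelength_um / na


# Leucorrhea setup: 6.45 um CCD pixels behind a 40x, NA 0.65 objective.
pixel = sample_plane_pixel_um(6.45, 40)
resolution = rayleigh_limit_um(0.65)
# Nyquist-style check: the pixel should be at most half the optical resolution.
print(f"pixel {pixel:.4f} um, optical limit {resolution:.3f} um, "
      f"adequately sampled: {pixel <= resolution / 2}")
```

Under these assumptions each pixel covers about 0.16 µm of the specimen, below half of the roughly 0.52 µm optical limit, so the camera does not limit resolution.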
We collected 1,885 fecal samples, yielding 53,550 images. For leucorrhea, 183 samples were preprocessed for imaging, yielding 5,286 images. Medical researchers annotated the fecal and leucorrhea images, producing 68,939 labeled fecal cells and 32,679 labeled leucorrhea cells. For the fecal samples, 5 categories were selected for classification: red blood cells (RBCs), white blood cells (WBCs), molds, pyocytes (Pyos), and impurities. An instance is regarded as an impurity when the IoU between its bounding box and the bounding box of a positive sample lies between 0.01 and 0.1. The unseen leucorrhea samples have 6 categories: RBCs, WBCs, molds, Pyos, epithelial cells (Epis), and impurities, with impurities annotated in the same way. Notably, although the class names are the same, the RBCs, WBCs, and Pyos in the leucorrhea samples differ in morphology from those in the fecal samples because of the different environment and diluent concentration.
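The impurity rule can be sketched as follows. The description does not say how candidate boxes are generated, whether the 0.01 and 0.1 bounds are inclusive, or how multiple positive boxes are combined; this sketch assumes inclusive bounds and uses the candidate's highest IoU with any positive box. The names `iou` and `is_impurity` are hypothetical helpers, not part of the dataset release.

```python
from typing import Sequence, Tuple

# Axis-aligned box as (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_impurity(candidate: Box, positives: Sequence[Box],
                low: float = 0.01, high: float = 0.1) -> bool:
    """Assumed rule: the candidate's best IoU with a positive box is in [low, high]."""
    if not positives:
        return False
    best = max(iou(candidate, p) for p in positives)
    return low <= best <= high
```

For example, a candidate `(8, 8, 18, 18)` next to a positive cell at `(0, 0, 10, 10)` overlaps it by IoU 4/196 ≈ 0.02 and is labeled an impurity, whereas a box identical to the cell or far away from it is not.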
Seen classes (Fecal)
Category | #Instances |
---|---|
Impurities | 15149 |
RBCs | 25021 |
WBCs | 5695 |
Molds | 22577 |
Pyos | 497 |
Total | 68939 |
Unseen classes (Leucorrhea)
Category | #Instances |
---|---|
Impurities | 9327 |
RBCs | 1438 |
WBCs | 9828 |
Molds | 1723 |
Pyos | 1170 |
Epis | 9193 |
Total | 32679 |
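The counts in the two class tables can be cross-checked with a few lines of Python: the per-class numbers sum to the stated totals, and the seen split is strongly imbalanced (Pyos make up about 0.7% of fecal instances, roughly 50 times fewer than RBCs). This is only an arithmetic check on the published numbers; `class_shares` is a hypothetical helper, not dataset tooling.

```python
# Per-class instance counts copied from the class tables of this dataset page.
SEEN_FECAL = {"Impurities": 15149, "RBCs": 25021, "WBCs": 5695,
              "Molds": 22577, "Pyos": 497}
UNSEEN_LEUCORRHEA = {"Impurities": 9327, "RBCs": 1438, "WBCs": 9828,
                     "Molds": 1723, "Pyos": 1170, "Epis": 9193}


def class_shares(counts: dict) -> tuple:
    """Return (total, {class name: fraction of the total})."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


for split, counts in (("seen", SEEN_FECAL), ("unseen", UNSEEN_LEUCORRHEA)):
    total, shares = class_shares(counts)
    print(split, total, {name: round(s, 3) for name, s in shares.items()})
```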
Attributes:
Visual Space: The visual features of MCAD are extracted with a ResNet-101 backbone. Each image is resized to 224 × 224 by bilinear interpolation and fed into the model for inference, yielding a 2048-dimensional visual feature vector.
Semantic Space: MCAD provides two types of attribute vectors: discrete attributes and continuous attributes.
For the discrete attributes, a vocabulary of 9 attributes was selected based on morphological features: volume big, volume small, circle or ellipse, edge black, annular, inner black spot, inner multi black spot, multi part, and reticular.
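Each class's discrete attribute vector is a 9-dimensional binary indicator over this vocabulary, and stacking the 5 class rows gives a [5, 9] matrix. The sketch below assumes the attribute order in which the vocabulary is listed here; the authoritative order is whatever `predicate.txt` specifies. The attribute set in the usage example is hypothetical and is not taken from the dataset.

```python
from typing import Iterable, List

# Discrete attribute vocabulary as listed in the description (order assumed).
ATTRIBUTES = [
    "volume big", "volume small", "circle or ellipse", "edge black", "annular",
    "inner black spot", "inner multi black spot", "multi part", "reticular",
]


def encode(present: Iterable[str]) -> List[int]:
    """Binary attribute vector: 1 where the attribute applies to the class."""
    present = set(present)
    unknown = present - set(ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return [1 if name in present else 0 for name in ATTRIBUTES]
```

For a hypothetical class described as "circle or ellipse" with "edge black", `encode({"circle or ellipse", "edge black"})` returns `[0, 0, 1, 1, 0, 0, 0, 0, 0]`.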
The continuous attributes use the Doc2Vec method proposed by Quoc Le and Tomas Mikolov. Feeding the semantic description of each class into a Doc2Vec model for inference yields a 256-dimensional feature vector.
The files are as follows:
File | Description |
---|---|
Image (folder) | Images of positive samples |
Impurity (folder) | Images of impurities |
attributes_description.txt | Description of the continuous attributes |
certainties.txt | Certainties of the discrete attributes |
classes.txt | Class names: Impurity, Ery(RBC), Leu(WBC), Mid(mold), Pyo |
predicate.txt | Description of the discrete attributes |
predicate-matrix-binary.txt | Discrete attributes, shape [5, 9] |
predicate-matrix-doc2vec.txt | Continuous attributes, shape [5, 256] |
statistic.txt | Statistical information |
test.txt | Test image list and labels (without impurities) |
test_with_impurities.txt | Test image list and labels (with impurities) |
trainval.txt | Training image list and labels (without impurities) |
trainval_with_impurities.txt | Training image list and labels (with impurities) |
Attributes.mat | MATLAB .mat file containing the attributes and the image lists with labels |
Res101.mat | MATLAB .mat file of the visual space (ResNet-101 features) |
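The text files can be read with plain Python. The exact line layout is not documented here, so this sketch assumes that each image-list line is `<image path> <integer label>` and that the predicate matrices store one class per row with whitespace-separated values; check the files (or `Dataset Instruction.docx`) before relying on it. All function names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def parse_image_list(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Assumed format: '<relative image path> <integer label>' per line."""
    items = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        path, label = line.rsplit(maxsplit=1)  # label is the last field
        items.append((path, int(label)))
    return items


def parse_matrix(lines: Iterable[str]) -> List[List[float]]:
    """Assumed format: one class per row, whitespace-separated values."""
    return [[float(v) for v in line.split()] for line in lines if line.strip()]


def load(path: str, parser: Callable):
    """Open a dataset text file and apply one of the parsers above."""
    with open(path, encoding="utf-8") as f:
        return parser(f)
```

If the assumed layout holds, `load("predicate-matrix-binary.txt", parse_matrix)` should return 5 rows of 9 values, and `load("trainval.txt", parse_image_list)` a list of `(path, label)` pairs.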
Documentation
Attachment | Size |
---|---|
Dataset Instruction.docx | 638.57 KB |