Standard Dataset
SDTB dataset
- Submitted by: Hongyu Zhang
- Last updated: Mon, 07/22/2024 - 11:52
- DOI: 10.21227/byq2-0445
Abstract
The proposed SDTB dataset is collected from microscopic images of testicular tissue sections from 15 patients diagnosed with azoospermia. It simulates the process of selecting high-quality sperm during testicular puncture for subsequent infertility diagnosis and treatment. Specifically, each patient undergoes a testicular puncture, after which the tubule tissue is washed and minced. The samples are then examined under a Nikon ECLIPSE Ti microscope at 200× magnification. Sperm are first annotated by one physician, and the annotations are then reviewed and adjusted by a second physician. Between 45 and 148 images, containing 193 to 755 sperm instances, are collected from each patient. Each image is 1320 × 1080 pixels, which we further resize and pad to 640 × 640 pixels.

The sperm instances have an average width of 9.88 pixels and an average height of 9.92 pixels, and 99.3% of them are smaller than 16 × 16 pixels. Combined with complex backgrounds and substantial image noise, these tiny targets make sperm detection on the SDTB dataset highly challenging. Finally, we split the dataset by patient according to examination time: 689 images from patients 1 to 7 for training, 346 images from patients 8 to 10 for validation, and 306 images from patients 11 to 15 for testing (1,341 images in total).
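The abstract does not say exactly how the 1320 × 1080 images are resized and padded to 640 × 640, nor how the patient-level split is encoded. The sketch below is a hypothetical illustration, not the authors' code. It assumes an aspect-preserving resize with centered padding (a common "letterbox" scheme), that 1320 × 1080 means width × height, that boxes are stored as (x1, y1, x2, y2) pixel coordinates, and that "smaller than 16 × 16" means both width and height are below 16 pixels. All names (`letterbox_params`, `map_box`, `fraction_small`, `split_of`) are illustrative.

```python
"""Hypothetical helpers for SDTB-style preprocessing and patient splits (illustrative only)."""
from dataclasses import dataclass

SRC_W, SRC_H = 1320, 1080  # assumed width x height of the raw microscope images
DST = 640                  # side length of the padded square network input


@dataclass(frozen=True)
class Letterbox:
    sx: float       # horizontal scale from raw pixels to resized pixels
    sy: float       # vertical scale from raw pixels to resized pixels
    new_w: int      # resized width before padding
    new_h: int      # resized height before padding
    pad_left: int   # columns of padding added on the left
    pad_top: int    # rows of padding added on the top


def letterbox_params(src_w: int = SRC_W, src_h: int = SRC_H, dst: int = DST) -> Letterbox:
    """Assumed scheme: scale so the longer side fits `dst`, then pad evenly to a square."""
    r = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * r), round(src_h * r)
    return Letterbox(
        sx=new_w / src_w,
        sy=new_h / src_h,
        new_w=new_w,
        new_h=new_h,
        pad_left=(dst - new_w) // 2,
        pad_top=(dst - new_h) // 2,
    )


def map_box(box, lb: Letterbox):
    """Map an (x1, y1, x2, y2) box from raw-image pixels into the padded square frame."""
    x1, y1, x2, y2 = box
    return (x1 * lb.sx + lb.pad_left, y1 * lb.sy + lb.pad_top,
            x2 * lb.sx + lb.pad_left, y2 * lb.sy + lb.pad_top)


def fraction_small(boxes, limit: float = 16.0) -> float:
    """Share of boxes whose width AND height are both below `limit` pixels (assumed reading)."""
    if not boxes:
        return 0.0
    small = sum(1 for x1, y1, x2, y2 in boxes if (x2 - x1) < limit and (y2 - y1) < limit)
    return small / len(boxes)


# Patient-level split and image counts as stated in the abstract.
PATIENT_SPLITS = {"train": range(1, 8), "val": range(8, 11), "test": range(11, 16)}
IMAGES_PER_SPLIT = {"train": 689, "val": 346, "test": 306}


def split_of(patient_id: int) -> str:
    """Return the split name for a patient ID in 1..15."""
    for name, ids in PATIENT_SPLITS.items():
        if patient_id in ids:
            return name
    raise ValueError(f"unknown patient id: {patient_id}")


if __name__ == "__main__":
    lb = letterbox_params()
    print(lb.new_w, lb.new_h, lb.pad_top)   # 640 524 58
    print(sum(IMAGES_PER_SPLIT.values()))   # 1341
```

Under these assumptions, a 1320 × 1080 frame is scaled to 640 × 524 and receives 58 rows of padding above and below, so box coordinates must be shifted as well as scaled. Splitting by patient rather than by image keeps images of the same patient out of more than one split.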