Visual Grounding - Endoscopic Third Ventriculostomy (VG-ETV) dataset

Abstract 

This dataset comprises 1718 annotated images extracted from 29 video clips recorded at 25 FPS during Endoscopic Third Ventriculostomy (ETV) procedures. Of these images, 1645 are allocated to the training set and the remaining 73 to the test set. The images contain a total of 4013 anatomical or intracranial structures, each annotated with a bounding box and a class name. In addition, at least three language descriptions of varying technicality are provided for each structure. The dataset is a resource for tasks such as visual grounding (VG), or referring expression comprehension (REC), and classification. It also addresses a gap in the availability of specialized surgical datasets, giving researchers data to advance research in the surgical domain.

Instructions: 

Our dataset format follows the Pascal VOC standard. It is divided into train and test data splits, with images for each of the splits extracted from distinct video clips. Within each split, users will find folders for images and annotations. The annotation folders contain XML files detailing the annotated intracranial structures, bounding box coordinates, and language descriptions for each of the structures.
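As a rough illustration of how the annotation files might be read, the sketch below parses one Pascal VOC style XML document with Python's standard library. The `filename`, `object`, `name`, and `bndbox` elements are the usual Pascal VOC names; the `description` tag used for the language descriptions, the helper names, and the sample XML (file name, class name, coordinates, and texts) are assumptions made up for this example, so check them against the actual files before relying on this.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: the tag holding each language description is not documented here.
# "description" is a placeholder; inspect a real annotation file and adjust.
DESCRIPTION_TAG = "description"

# Made-up sample in Pascal VOC layout, for illustration only (not real dataset content).
SAMPLE_XML = """
<annotation>
  <filename>frame_0001.jpg</filename>
  <object>
    <name>structure_a</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
    <description>plain wording</description>
    <description>intermediate wording</description>
    <description>technical wording</description>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text, description_tag=DESCRIPTION_TAG):
    """Return (filename, objects) for one annotation given as an XML string."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # Coordinates are read via float() first in case they are stored as "10.0".
        bbox = tuple(
            int(float(box.findtext(key)))
            for key in ("xmin", "ymin", "xmax", "ymax")
        )
        descriptions = [
            d.text.strip() for d in obj.iter(description_tag) if d.text
        ]
        objects.append(
            {
                "class_name": obj.findtext("name"),
                "bbox": bbox,
                "descriptions": descriptions,
            }
        )
    return filename, objects


def to_grounding_samples(filename, objects):
    """Flatten objects into (image, expression, bbox) triples for VG/REC use."""
    return [
        (filename, text, obj["bbox"])
        for obj in objects
        for text in obj["descriptions"]
    ]


filename, objects = parse_voc_annotation(SAMPLE_XML)
samples = to_grounding_samples(filename, objects)
```

To process a real file, the XML text can be read from disk first (for example with `pathlib.Path(path).read_text()`) and passed to `parse_voc_annotation`. Each structure with several descriptions yields one grounding sample per description, all sharing the same bounding box.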

 

For further details on the dataset, users are encouraged to refer to our published paper, 'Interactive Surgical Training in Neuroendoscopy: Real-Time Anatomical Feature Localization using Natural Language Expressions' (https://doi.org/10.1109/TBME.2024.3405814).

We ask users to cite our paper 'Interactive Surgical Training in Neuroendoscopy: Real-Time Anatomical Feature Localization using Natural Language Expressions' (https://doi.org/10.1109/TBME.2024.3405814) if they find the dataset useful in their research.