Standard Dataset
Cross-Domain Data Alignment and Multi-Task Driven Language-Image Pre-Training for Radiology Tasks
- Submitted by:
- Shenshen Bu
- Last updated:
- Sun, 07/14/2024 - 23:19
- DOI:
- 10.21227/s7yk-4w39
Abstract
Medical Vision-Language Pre-training (Medical-VLP) has advanced rapidly by learning representations from paired radiology images and reports. Nevertheless, two issues still restrict the development of Medical-VLP: the scarcity of parallel image-report pairs and the monotony of pre-training tasks. We therefore propose a Multi-Grained Cross-Domain Report Searching (CDRS) strategy and a Multi-Task Driven Language-Image Pre-Training (MLIP) framework. CDRS matches report-less radiology images with suitable reports by leveraging collaborative alignment training that bridges cross-domain data, operating at the in-domain cross-modal, out-of-domain self-supervised, and patch-wise levels. MLIP uses a unified model to compute multiple training objectives with minimal overhead by sharing the same computational graph. It integrates single-encoder, dual-encoder, and encoder-decoder paradigms, enabling the decoupled model to perform both discriminative and generative tasks, such as report generation and medical visual question answering. Extensive experiments and analyses demonstrate that our method achieves state-of-the-art performance across multiple discriminative and generative medical downstream tasks.
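The sketch below is a minimal, hypothetical illustration (not the authors' released code) of the multi-task idea described in the abstract: a single forward pass through shared image and text encoders feeds a dual-encoder contrastive loss, a single-encoder image-text matching loss, and an encoder-decoder report-generation loss, so all objectives reuse one computational graph. All class and variable names, the toy backbones, and the loss weighting are assumptions for illustration only.

```python
# Hypothetical sketch of shared-graph multi-task language-image pre-training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskVLP(nn.Module):
    def __init__(self, dim=256, vocab_size=1000):
        super().__init__()
        # Toy stand-ins for real vision / language backbones.
        self.image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, dim))
        self.text_embed = nn.Embedding(vocab_size, dim)
        self.text_encoder = nn.GRU(dim, dim, batch_first=True)
        self.fusion = nn.Linear(2 * dim, dim)        # single-encoder style fusion head
        self.itm_head = nn.Linear(dim, 2)            # image-text matching classifier
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.lm_head = nn.Linear(dim, vocab_size)    # report-generation head
        self.temperature = nn.Parameter(torch.tensor(0.07))

    def forward(self, images, tokens):
        # Shared encoders: computed once, reused by every objective.
        img = F.normalize(self.image_encoder(images), dim=-1)           # (B, D)
        txt_seq, _ = self.text_encoder(self.text_embed(tokens))         # (B, T, D)
        txt = F.normalize(txt_seq.mean(dim=1), dim=-1)                  # (B, D)

        # 1) Dual-encoder contrastive alignment (InfoNCE over the batch).
        logits = img @ txt.t() / self.temperature
        targets = torch.arange(images.size(0))
        loss_contrastive = (F.cross_entropy(logits, targets) +
                            F.cross_entropy(logits.t(), targets)) / 2

        # 2) Single-encoder image-text matching on fused features
        #    (positives only here; a real setup would also mine negatives).
        fused = self.fusion(torch.cat([img, txt], dim=-1))
        loss_itm = F.cross_entropy(self.itm_head(fused),
                                   torch.ones(images.size(0), dtype=torch.long))

        # 3) Encoder-decoder report generation conditioned on the image feature.
        dec_in = txt_seq[:, :-1, :] + img.unsqueeze(1)                  # teacher forcing
        dec_out, _ = self.decoder(dec_in)
        loss_gen = F.cross_entropy(
            self.lm_head(dec_out).reshape(-1, self.lm_head.out_features),
            tokens[:, 1:].reshape(-1))

        return loss_contrastive + loss_itm + loss_gen

# Usage: one backward pass over a toy batch of image-report pairs.
model = MultiTaskVLP()
images = torch.randn(4, 3, 224, 224)
tokens = torch.randint(0, 1000, (4, 32))
loss = model(images, tokens)
loss.backward()
```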
Dataset of Cross-Domain Data Alignment and Multi-Task Driven Language-Image Pre-Training for Radiology Tasks