Standard Dataset
Cross-Domain Data Alignment and Multi-Task Driven Language-Image Pre-Training for Radiology Tasks
- Submitted by:
- Shenshen Bu
- Last updated:
- Sun, 07/14/2024 - 23:19
- DOI:
- 10.21227/s7yk-4w39
Abstract
Medical Vision-Language Pre-training (Medical-VLP) has advanced rapidly by learning representations from paired radiology images and reports. Nevertheless, two issues still restrict the development of Medical-VLP: the scarcity of parallel image-report pairs and the monotony of pre-training tasks. We therefore propose a Multi-Grained Cross-Domain Report Searching (CDRS) strategy and a Multi-Task Driven Language-Image Pre-Training (MLIP) framework. CDRS matches report-less radiology images with suitable reports by leveraging collaborative alignment training that bridges cross-domain data, operating at the in-domain cross-modal, out-of-domain self-supervised, and patch-wise levels. MLIP uses a unified model to compute multiple training objectives with minimal overhead by sharing the same computational graph. It integrates single-encoder, dual-encoder, and encoder-decoder paradigms, enabling the decoupled model to perform both discriminative and generative tasks, such as report generation and medical visual question answering. Extensive experiments and analyses demonstrate that our method achieves state-of-the-art performance across multiple discriminative and generative medical downstream tasks.
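The sketch below is a minimal, hypothetical illustration (not the authors' released code) of the multi-task idea described in the abstract: a single forward pass through shared image and text encoders feeds a dual-encoder contrastive loss, a single-encoder image-text matching loss, and an encoder-decoder report-generation loss, so all objectives reuse one computational graph. All class and variable names, the toy backbones, and the loss weighting are assumptions for illustration only.

```python
# Hypothetical sketch of shared-graph multi-task language-image pre-training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskVLP(nn.Module):
    def __init__(self, dim=256, vocab_size=1000):
        super().__init__()
        # Toy stand-ins for real vision / language backbones.
        self.image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, dim))
        self.text_embed = nn.Embedding(vocab_size, dim)
        self.text_encoder = nn.GRU(dim, dim, batch_first=True)
        self.fusion = nn.Linear(2 * dim, dim)        # single-encoder style fusion head
        self.itm_head = nn.Linear(dim, 2)            # image-text matching classifier
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.lm_head = nn.Linear(dim, vocab_size)    # report-generation head
        self.temperature = nn.Parameter(torch.tensor(0.07))

    def forward(self, images, tokens):
        # Shared encoders: computed once, reused by every objective.
        img = F.normalize(self.image_encoder(images), dim=-1)           # (B, D)
        txt_seq, _ = self.text_encoder(self.text_embed(tokens))         # (B, T, D)
        txt = F.normalize(txt_seq.mean(dim=1), dim=-1)                  # (B, D)

        # 1) Dual-encoder contrastive alignment (InfoNCE over the batch).
        logits = img @ txt.t() / self.temperature
        targets = torch.arange(images.size(0))
        loss_contrastive = (F.cross_entropy(logits, targets) +
                            F.cross_entropy(logits.t(), targets)) / 2

        # 2) Single-encoder image-text matching on fused features
        #    (positives only here; a real setup would also mine negatives).
        fused = self.fusion(torch.cat([img, txt], dim=-1))
        loss_itm = F.cross_entropy(self.itm_head(fused),
                                   torch.ones(images.size(0), dtype=torch.long))

        # 3) Encoder-decoder report generation conditioned on the image feature.
        dec_in = txt_seq[:, :-1, :] + img.unsqueeze(1)                  # teacher forcing
        dec_out, _ = self.decoder(dec_in)
        loss_gen = F.cross_entropy(
            self.lm_head(dec_out).reshape(-1, self.lm_head.out_features),
            tokens[:, 1:].reshape(-1))

        return loss_contrastive + loss_itm + loss_gen

# Usage: one backward pass over a toy batch of image-report pairs.
model = MultiTaskVLP()
images = torch.randn(4, 3, 224, 224)
tokens = torch.randint(0, 1000, (4, 32))
loss = model(images, tokens)
loss.backward()
```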
Dataset of Cross-Domain Data Alignment and Multi-Task Driven Language-Image Pre-Training for Radiology Tasks