Cross-Domain Data Alignment and Multi-Task Driven Language-Image Pre-Training for Radiology Tasks

Citation Author(s):
Shenshen Bu
Sun Yat-sen University
Submitted by:
Shenshen Bu
Last updated:
Sun, 07/14/2024 - 23:19
DOI:
10.21227/s7yk-4w39
License:

Abstract 

Medical Vision-Language Pre-training (Medical-VLP) has advanced rapidly by learning representations from paired radiology images and reports. Nevertheless, two issues still restrict its development: the scarcity of parallel image-report pairs and the monotony of pre-training tasks. We therefore propose a Multi-Grained Cross-Domain Report Searching (CDRS) strategy and a Multi-Task Driven Language-Image Pre-Training (MLIP) framework. CDRS matches report-less radiology images with suitable reports through collaborative alignment training that bridges cross-domain data, operating at the in-domain cross-modal, out-of-domain self-supervised, and patch-wise levels. MLIP uses a unified model to compute multiple training objectives with minimal overhead by sharing a single computational graph. It integrates the single-encoder, dual-encoder, and encoder-decoder paradigms, enabling the decoupled model to perform both discriminative and generative tasks, such as report generation and medical visual question answering. Extensive experiments and analyses demonstrate that our method achieves state-of-the-art performance across multiple discriminative and generative medical downstream tasks.
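The abstract's key idea in MLIP is that several training objectives (e.g., a dual-encoder contrastive loss and an encoder-decoder generative loss) reuse the same encoded features, so the extra objectives add little overhead. The toy NumPy sketch below illustrates that pattern only; all array shapes, the linear "encoders", and the loss forms are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def encode(x, W):
    # Stand-in "encoder": linear projection followed by L2 normalization.
    h = x @ W
    return h / np.linalg.norm(h, axis=1, keepdims=True)

def contrastive_loss(img, txt, temp=0.07):
    # Dual-encoder objective: InfoNCE over the image-text similarity matrix.
    logits = img @ txt.T / temp
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))  # matched pairs lie on the diagonal

def generative_loss(img, W_vocab, target_ids):
    # Encoder-decoder-style objective: token cross-entropy conditioned on
    # the SAME image features, i.e., the shared part of the graph.
    logits = img @ W_vocab
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(target_ids)), target_ids])

rng = np.random.default_rng(0)
images = rng.normal(size=(4, 16))   # 4 toy radiology image feature vectors
reports = rng.normal(size=(4, 16))  # 4 paired toy report embeddings
W_img, W_txt = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
W_vocab = rng.normal(size=(8, 32))  # toy 32-token vocabulary head
targets = rng.integers(0, 32, size=4)

img_feat = encode(images, W_img)    # computed once, reused by both heads
txt_feat = encode(reports, W_txt)
total = contrastive_loss(img_feat, txt_feat) + generative_loss(img_feat, W_vocab, targets)
```

Because `img_feat` is computed once and fed to both loss heads, the discriminative and generative objectives share the forward pass, which is the "minimal overhead" property the abstract describes.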

Instructions: 

Dataset for "Cross-Domain Data Alignment and Multi-Task Driven Language-Image Pre-Training for Radiology Tasks".

Dataset Files

    Files have not been uploaded for this dataset