Anatomical Structure-Guided Medical Vision-Language Pre-training

Anonymous Organization

Two limitations of existing methods and our corresponding improvements: (a) a lack of interpretability and clinical relevance; (b) insufficient representation learning of image-report pairs.

The pipeline of Anatomical Structure-Guided Medical Vision-Language Pre-training.

Demo for Anatomical Structure-Sentence Alignment


Three scenarios of anatomical region-sentence alignment.

We provide an example showing the alignment results of two different strategies: merging bounding boxes (Merge Bbox) and splitting sentences (Split Sent).
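The two strategies can be illustrated with a minimal, hypothetical sketch. It is not the authors' implementation. It assumes that each anatomical region comes with an axis-aligned box `(x1, y1, x2, y2)` and that the regions a sentence mentions are already known. Merge Bbox is modeled as pairing the sentence with the union of those boxes. Split Sent is modeled as pairing a copy of the sentence with each region's own box. The function names `merge_bbox`, `split_sent`, and `union_box`, the region names, and the box coordinates are all invented for illustration.

```python
# Hypothetical sketch of the two alignment strategies; not the authors' code.
# Boxes are assumed to be axis-aligned (x1, y1, x2, y2) pixel coordinates.
Box = tuple[int, int, int, int]


def union_box(boxes: list[Box]) -> Box:
    """Return the smallest axis-aligned box that encloses every input box."""
    x1s, y1s, x2s, y2s = zip(*boxes)
    return (min(x1s), min(y1s), max(x2s), max(y2s))


def merge_bbox(sentence: str, regions: dict[str, Box]) -> list[tuple[str, Box]]:
    """Merge Bbox (as modeled here): pair the sentence with the union of all
    regions it mentions. Returns no pair if the sentence mentions no region."""
    if not regions:
        return []
    return [(sentence, union_box(list(regions.values())))]


def split_sent(sentence: str, regions: dict[str, Box]) -> list[tuple[str, Box]]:
    """Split Sent (as modeled here): one (sentence, box) pair per mentioned region."""
    return [(sentence, box) for box in regions.values()]


# Example: a sentence from the raw report that mentions both lungs.
# The region names and box coordinates are invented for illustration.
sentence = "lungs are otherwise clear."
regions = {
    "right lung": (30, 40, 250, 400),
    "left lung": (270, 45, 490, 410),
}

print(merge_bbox(sentence, regions))  # [('lungs are otherwise clear.', (30, 40, 490, 410))]
print(split_sent(sentence, regions))  # two pairs, one per lung box
```

The trade-off this toy version exposes is one supervision target versus several. Merging yields a single, coarser box per sentence. Splitting keeps each region's box precise but reuses the same text for several regions.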



Raw report: mild left basal atelectasis. otherwise unremarkable. ap upright and lateral views the chest were provided. mild left basal atelectasis. lungs are otherwise clear. no signs of pneumonia or edema. no large effusion or pneumothorax. cardiomediastinal silhouette is normal. bony structures are intact. no free air below the right hemidiaphragm.

Merge Bbox


Split Sent
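One simple way to prepare a raw report such as the one above for region-sentence alignment is to split it into sentences and tag each sentence with the regions it mentions. The sketch below makes two simplifying assumptions. First, it splits sentences on periods and drops verbatim repeats; the raw report states "mild left basal atelectasis" twice. Second, it matches regions by whole-word keyword lookup. The `REGION_KEYWORDS` table and the helper names are hypothetical and are not the vocabulary or parser used by the method.

```python
import re

# The raw report from the demo, copied verbatim.
RAW_REPORT = (
    "mild left basal atelectasis. otherwise unremarkable. ap upright and lateral "
    "views the chest were provided. mild left basal atelectasis. lungs are "
    "otherwise clear. no signs of pneumonia or edema. no large effusion or "
    "pneumothorax. cardiomediastinal silhouette is normal. bony structures are "
    "intact. no free air below the right hemidiaphragm."
)

# Hypothetical region -> keyword table, for illustration only.
REGION_KEYWORDS = {
    "left lower lung zone": ["left basal"],
    "right lung": ["lungs"],
    "left lung": ["lungs"],
    "cardiac silhouette": ["cardiomediastinal"],
    "mediastinum": ["cardiomediastinal"],
    "right hemidiaphragm": ["right hemidiaphragm"],
}


def split_sentences(report: str) -> list[str]:
    """Split on periods, strip whitespace, and drop empty or repeated sentences."""
    seen: set[str] = set()
    sentences: list[str] = []
    for part in report.split("."):
        sentence = part.strip()
        if sentence and sentence not in seen:
            seen.add(sentence)
            sentences.append(sentence)
    return sentences


def match_regions(sentence: str) -> list[str]:
    """Return every region whose keywords occur as whole words in the sentence."""
    return [
        region
        for region, keywords in REGION_KEYWORDS.items()
        if any(re.search(rf"\b{re.escape(kw)}\b", sentence) for kw in keywords)
    ]


for s in split_sentences(RAW_REPORT):
    print(f"{s!r} -> {match_regions(s)}")
```

Under this toy table, a sentence such as "lungs are otherwise clear" maps to two regions. "bony structures are intact" maps to none. How such unmatched sentences should be handled is a design choice this sketch leaves open.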