This repository contains the implementation for our MIDL 2025 paper "Visual Prompt Engineering for Vision Language Models in Radiology".
This project evaluates how different visual prompting techniques affect the performance of vision-language models (VLMs) in radiology image analysis. We assess the impact of four visual annotations (bounding boxes, circles, arrows, and crops) on model performance across several common chest X-ray datasets. We demonstrate that visual markers, particularly a red circle, improve AUROC by up to 0.185.
- Evaluation of multiple vision-language models (BiomedCLIP, BMC_CLIP_CF) on radiology datasets
- Implementation of various visual prompting techniques (illustrated in the sketch below):
  - Bounding boxes
  - Circles
  - Arrows
  - Region cropping
- Support for multiple radiology datasets:
  - PadChest-GR
  - VinDr-CXR
  - NIH14 (ChestX-ray14)
  - JSRT (Japanese Society of Radiological Technology)
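Conceptually, visual prompting here means drawing a marker directly onto the image before it is passed to the VLM. Below is a minimal sketch of the idea using Pillow; the helper is illustrative, not the repository's implementation, and assumes the region of interest is given as pixel coordinates:

```python
from PIL import Image, ImageDraw

def add_circle_prompt(image_path, bbox, color="red", width=3):
    """Draw a colored ellipse around a region of interest (x0, y0, x1, y1)."""
    # Convert to RGB so a colored marker can be drawn on a grayscale X-ray.
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    # Pillow draws the ellipse inscribed in the given box, so the
    # marker tightly encloses the region of interest.
    draw.ellipse(bbox, outline=color, width=width)
    return image

prompted = add_circle_prompt("chest_xray.png", bbox=(120, 80, 260, 210))
prompted.save("chest_xray_circle.png")
```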
- Clone this repository:

  ```bash
  git clone https://github.com/MIC-DKFZ/VPE-in-Radiology.git
  cd VPE-in-Radiology
  ```
- Make sure you have Python 3.12 installed:

  ```bash
  python --version
  ```
- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
- Download the required datasets:
  - PadChest-GR: Download from the BIMCV website (requires registration for the grounded reports version)
  - VinDr-CXR: Download from PhysioNet (requires PhysioNet credentialed access)
  - NIH ChestX-ray14: Download from the NIH
  - JSRT Dataset: Download from the JSRT website (requires registration)
- Configure dataset paths in `default_config.json` to match your local environment. Update the following fields for each dataset:

  ```json
  {
    "PadChestGRTrainDataset": {
      "metadata_file": "/path/to/padchest/metadata.csv",
      "images_dir": "/path/to/padchest/images",
      "grounded_reports": "/path/to/padchest/grounded_reports.json"
    },
    "VinDrCXRTrainDataset": {
      "metadata_file": "/path/to/vindrcxr/annotations_train.csv",
      "images_dir": "/path/to/vindrcxr/pngs/train"
    }
  }
  ```
  Alternatively, you can create a custom config file (e.g., `custom_config.json`) and specify it via the `CONFIG` environment variable:

  ```bash
  export CONFIG=/path/to/custom_config.json
  python src/process.py --model_name BiomedCLIPModel --dataset_name PadChestGRTrain
  ```
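How the `CONFIG` variable is resolved internally is not shown here; the sketch below illustrates one plausible pattern, assuming a JSON config with a fallback to `default_config.json` (the `load_config` helper is hypothetical, not the repository's exact code):

```python
import json
import os

def load_config(default_path="default_config.json"):
    """Load dataset paths, preferring the CONFIG environment variable."""
    # Hypothetical helper: fall back to the default config when CONFIG is unset.
    config_path = os.environ.get("CONFIG", default_path)
    with open(config_path) as f:
        return json.load(f)

config = load_config()
print(config["PadChestGRTrainDataset"]["images_dir"])
```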
Run the core processing script with different parameters:

```bash
python src/process.py --model_name BiomedCLIPModel --dataset_name PadChestGRTrain
```
Add visual annotations to images:

```bash
# Add bounding box
python src/process.py --model_name BiomedCLIPModel --dataset_name PadChestGRTrain --image_annotation_type bbox --color red

# Add circle
python src/process.py --model_name BiomedCLIPModel --dataset_name PadChestGRTrain --image_annotation_type circle --color red

# Add arrow
python src/process.py --model_name BiomedCLIPModel --dataset_name PadChestGRTrain --image_annotation_type arrow --color red

# Use cropping
python src/process.py --model_name BiomedCLIPModel --dataset_name PadChestGRTrain --image_annotation_type crop
```
You can add text annotation suffixes that describe the visual prompts:

```bash
python src/process.py --model_name BiomedCLIPModel --dataset_name PadChestGRTrain --image_annotation_type bbox --color red --text_annotation_suffix ' indicated by a red bounding box'
```
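To compare several annotation settings in one go, the calls above can also be scripted. A minimal sketch follows; the CLI flags match the examples above, but the particular list of settings is an illustrative assumption:

```python
import subprocess

# Annotation settings to compare; the suffix describes the marker in text.
settings = [
    ("bbox", "red", " indicated by a red bounding box"),
    ("circle", "red", " indicated by a red circle"),
    ("arrow", "red", " indicated by a red arrow"),
    ("crop", None, None),
]

for annotation, color, suffix in settings:
    cmd = [
        "python", "src/process.py",
        "--model_name", "BiomedCLIPModel",
        "--dataset_name", "PadChestGRTrain",
        "--image_annotation_type", annotation,
    ]
    if color:
        cmd += ["--color", color]
    if suffix:
        cmd += ["--text_annotation_suffix", suffix]
    subprocess.run(cmd, check=True)
```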
To reproduce all experiments from our paper, run:

```bash
bash reproduce_paper.sh
```
- `src/`: Core source code
  - `models.py`: Implementation of vision-language models
  - `process.py`: Main processing script for running experiments
  - `datasets/`: Dataset-specific implementations
    - `base_dataloader.py`: Base class for all datasets
    - `padchestgr/`, `vindrcxr/`, `nih14/`, `jsrt/`: Dataset-specific modules
  - `utils/`: Utility functions
- `experiments/`: Output directory for experiment results
- `default_config.json`: Configuration file for dataset paths
- `requirements.txt`: Required Python packages
- `reproduce_paper.sh`: Script to reproduce all experiments from the paper
Experiment results are saved under the `experiments/` directory with the following structure:

```
experiments/
  {DATASET_NAME}/
    {MODEL_NAME}/
      {ANNOTATION_TYPE}/
        {COLOR}/
          {LINE_WIDTH}/
            {TEXT_ANNOTATION}/
              metrics.csv
              metrics.json
              results.csv
```
Each experiment directory contains:

- `metrics.json`: Detailed performance metrics
- `metrics.csv`: Summary of performance metrics in CSV format
- `results.csv`: Raw prediction results for each image
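Because every run writes to a predictable path, results can be collected across experiments with a short script. Below is a sketch that assumes each `metrics.json` is a flat mapping of metric names to values (the exact keys are not specified here):

```python
import json
from pathlib import Path

# Walk the experiments tree and collect one row per run.
rows = []
for metrics_file in Path("experiments").rglob("metrics.json"):
    with open(metrics_file) as f:
        metrics = json.load(f)  # assumed shape: {"AUROC": 0.87, ...}
    # The parent path encodes dataset, model, and annotation settings,
    # mirroring the directory layout shown above.
    rows.append({"experiment": str(metrics_file.parent), **metrics})

for row in sorted(rows, key=lambda r: r["experiment"]):
    print(row)
```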
If you use this code or our findings in your research, please cite our paper:
```bibtex
@inproceedings{denner2025visual,
    title={Visual Prompt Engineering for Vision Language Models in Radiology},
    author={Denner, Stefan and Bujotzek, Markus Ralf and Bounias, Dimitrios and Zimmerer, David and Stock, Raphael and Maier-Hein, Klaus},
    booktitle={Medical Imaging with Deep Learning},
    year={2025}
}
```