LLM Benchmark for Oncology Treatment Planning

This repository contains the code to create the curated benchmark dataset from the original CORAL dataset (CORAL: expert-Curated medical Oncology Reports to Advance Language model inference), which is provided by PhysioNet.

Generate the dataset from the CORAL dataset

Download the following dataset:
CORAL dataset: https://physionet.org/content/curated-oncology-reports/1.0/

We used the unannotated files:

breast cancer: brca_unannotated.csv
pancreatic cancer: pdac_unannotated.csv
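
As a minimal sketch of reading these files, the snippet below loads one unannotated CSV into a list of row dictionaries using only the standard library. The file names are taken from the list above; the `data_dir` layout and the `load_notes` helper are assumptions made for illustration and are not part of this repository.

```python
import csv
from pathlib import Path

# File names come from the list above; the directory layout is an assumption.
CORAL_FILES = {
    "breast": "brca_unannotated.csv",
    "pancreatic": "pdac_unannotated.csv",
}


def load_notes(data_dir, cancer_type):
    """Return the rows of one unannotated CORAL file as a list of dicts."""
    path = Path(data_dir) / CORAL_FILES[cancer_type]
    # newline="" lets the csv module handle line breaks inside quoted notes.
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling `load_notes("path/to/coral", "breast")` would then give one dictionary per note, keyed by the column names in the CSV header.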

To separate the assessment and treatment plans from the rest of the unstructured clinical notes (column note_text), run the following .py files:
split_assessmentplan_BRCA.py
split_assessmentplan_PDAC.py

This results in two columns: "clinical_case" and "assessment_plan".
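
The split can be pictured with the following minimal sketch. It is not the logic of the split_assessmentplan_*.py scripts, whose implementation is not shown here. It assumes the assessment and plan section starts at a recognizable heading; the `PLAN_HEADING` pattern and the `split_note` helper are hypothetical, and only the column names come from the text above.

```python
import re

# Hypothetical heading pattern; real notes may label this section differently.
PLAN_HEADING = re.compile(
    r"^\s*(assessment\s*(and|&|/)\s*plan|a\s*/\s*p|impression\s*(and|&|/)\s*plan)\b\s*:?",
    re.IGNORECASE | re.MULTILINE,
)


def split_note(note_text):
    """Split a note at the first assessment/plan heading.

    Returns (clinical_case, assessment_plan); assessment_plan is empty
    when no heading is found.
    """
    match = PLAN_HEADING.search(note_text)
    if match is None:
        return note_text.strip(), ""
    clinical_case = note_text[: match.start()].strip()
    assessment_plan = note_text[match.end():].strip()
    return clinical_case, assessment_plan
```

Applied to each value of note_text, the two returned strings would populate the "clinical_case" and "assessment_plan" columns. Notes without a matching heading keep their full text in the first element, so they can be reviewed manually.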

Both columns were further structured using Claude 3.5 Sonnet, which extracted the relevant information without changing the original text. This was done with the following .py files:
To structure the clinical_case column:
structure_BRCA.py
structure_PDAC.py

To structure the assessment_plan column:
structure_plan_BRCA.py
structure_plan_PDAC.py
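
Because the structuring step is meant to leave the original wording untouched, it can help to check the model output against its source. The sketch below is an illustration under stated assumptions, not the code in the structure_*.py scripts: the prompt wording, the helper names, and the `call_model` parameter (any function that maps a prompt string to a reply string) are all hypothetical.

```python
def build_prompt(text, column):
    """Build an extraction prompt; the wording here is illustrative only."""
    return (
        f"Organize the following {column} into labeled sections. "
        "Copy the relevant sentences verbatim and do not paraphrase.\n\n"
        f"{text}"
    )


def normalize(text):
    """Collapse whitespace so line wrapping does not affect comparisons."""
    return " ".join(text.split())


def unsupported_lines(source, structured):
    """Return content lines of the output that do not occur verbatim in source.

    Lines ending in ':' are treated as section labels and skipped.
    """
    haystack = normalize(source)
    missing = []
    for line in structured.splitlines():
        content = normalize(line.lstrip("-* "))
        if not content or content.endswith(":"):
            continue
        if content not in haystack:
            missing.append(content)
    return missing


def structure_text(text, column, call_model):
    """Run one extraction; call_model is any function mapping prompt -> reply."""
    reply = call_model(build_prompt(text, column))
    return reply, unsupported_lines(text, reply)
```

In practice `call_model` would wrap a request to Claude 3.5 Sonnet, for example through the Anthropic Python SDK. Keeping it as a parameter lets the verbatim check be exercised with a stub, without network access or an API key.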

Environment

Create a new environment with python==3.10 and install the libraries from requirements.txt:

pip install -r requirements.txt

BIMSBbioinfo/OncoLLMBenchmark

Folders and files

Latest commit

History

Repository files navigation

LLM Benchmark for Oncology Treatment Planning

Generate the dataset from the CORAL dataset

Environment

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages