MedAlign

A Clinician-Generated Benchmark Dataset for Instruction Following with Electronic Medical Records

Large language models (LLMs) have demonstrated human-level fluency in following natural language instructions, offering potential to reduce administrative burdens in healthcare. However, evaluating LLMs on real-world clinical tasks remains a challenge. MEDALIGN addresses this by introducing a benchmark dataset of 983 natural language instructions for Electronic Health Record (EHR) data, curated by 15 clinicians across 7 specialties. The dataset and its accompanying evaluation include:

  • 302 clinician-written reference responses for instruction-following evaluation.
  • 275 longitudinal EHRs to ground instruction-response pairs.
  • Comparative analysis of six LLMs, revealing high error rates ranging from 35% (GPT-4) to 68% (MPT-7B-Instruct).
  • An 8.3% drop in GPT-4 accuracy when its context length was reduced from 32k to 2k tokens.
  • Assessment of correlations between clinician rankings and automated metrics, highlighting COMET as the best-performing automated evaluation metric.

Figure: MEDALIGN workflow.
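
To make the last point in the list above concrete, the sketch below shows one way agreement between a clinician ranking and an automated metric could be measured for a single instruction. It assumes Kendall's tau as the rank-correlation measure and uses made-up ranks and scores; the function names (`kendall_tau`, `scores_to_ranks`) and all numbers are hypothetical and are not taken from the MEDALIGN code or results.

```python
from itertools import combinations


def scores_to_ranks(scores):
    """Convert metric scores (higher = better) to ranks (1 = best). Assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def kendall_tau(ranks_a, ranks_b):
    """Kendall's tau-a between two equal-length rankings without ties."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rankings must have the same length")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two items to compare")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both rankings order the pair the same way.
        product = (ranks_a[i] - ranks_a[j]) * (ranks_b[i] - ranks_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical example: six model responses to one instruction.
clinician_ranks = [1, 2, 3, 4, 5, 6]                  # 1 = best, assigned by a clinician
metric_scores = [0.81, 0.74, 0.77, 0.60, 0.52, 0.55]  # made-up automated metric scores
metric_ranks = scores_to_ranks(metric_scores)

tau = kendall_tau(clinician_ranks, metric_ranks)
print(f"Kendall's tau: {tau:.3f}")
```

A value near 1 means the metric orders the responses much as the clinician did, and a value near -1 means it reverses the order. In practice such a score would be averaged over many instructions, and a library routine that handles ties would be preferable to this tie-free version.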

Citation

@inproceedings{DBLP:conf/aaai/FlemingLHJRTBGS24,
  author       = {Scott L. Fleming and Alejandro Lozano and William J. Haberkorn and Jenelle A. Jindal and Eduardo Reis and Rahul Thapa and Louis Blankemeier and Julian Z. Genkins and Ethan Steinberg and Ashwin Nayak and Birju S. Patel and Chia{-}Chun Chiang and Alison Callahan and Zepeng Huo and Sergios Gatidis and Scott J. Adams and Oluseyi Fayanju and Shreya J. Shah and Thomas Savage and Ethan Goh and Akshay S. Chaudhari and Nima Aghaeepour and Christopher D. Sharp and Michael A. Pfeffer and Percy Liang and Jonathan H. Chen and Keith E. Morse and Emma P. Brunskill and Jason A. Fries and Nigam H. Shah},
  title        = {MedAlign: {A} Clinician-Generated Dataset for Instruction Following with Electronic Medical Records},
  booktitle    = {Thirty-Eighth {AAAI} Conference on Artificial Intelligence},
  year         = {2024},
  url          = {https://doi.org/10.1609/aaai.v38i20.30205},
  doi          = {10.1609/AAAI.V38I20.30205},
}