EHRSHOT

A benchmark for few-shot evaluation of EHR foundation models

15 clinical prediction tasks · longitudinal data · reproducible leaderboard

Leaderboard


6,739 Patients: Longitudinal, not restricted to ICU/ED
41.7M Events: Structured EHR signals
921k Visits: Real-world care trajectories
15 Tasks: Curated prediction suite

What is EHRSHOT?

New Longitudinal EHR Dataset

Our dataset contains de-identified structured data from the electronic health records of 6,739 patients from Stanford Medicine. Unlike MIMIC-III/IV, EHRSHOT is longitudinal and not restricted to ICU/ED patients.

Curated Tasks for Benchmarking

Evaluate machine learning models on 15 clinical tasks covering diagnostics, patient outcomes, and resource allocation. The tasks are designed for few-shot evaluation: models must adapt to each task using only a handful of labeled examples.
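To make the few-shot setting concrete, here is a minimal sketch of drawing a reproducible k-shot training set for one binary task. It assumes a balanced draw of k positive and k negative examples under a fixed random seed; the function `sample_k_shot`, the `labels` mapping, and the `patient_N` IDs are hypothetical illustrations, not EHRSHOT's official sampling code.

```python
import random
from typing import Dict, List


def sample_k_shot(labels: Dict[str, int], k: int, seed: int = 0) -> List[str]:
    """Return k positive and k negative example IDs, sampled reproducibly.

    `labels` is a hypothetical mapping from example ID to a binary label
    (1 = positive, 0 = negative).
    """
    rng = random.Random(seed)  # local RNG: the draw depends only on `seed`
    # Sorting first makes the result independent of dict insertion order.
    positives = sorted(i for i, y in labels.items() if y == 1)
    negatives = sorted(i for i, y in labels.items() if y == 0)
    if k > len(positives) or k > len(negatives):
        raise ValueError("not enough examples of each class for this k")
    return rng.sample(positives, k) + rng.sample(negatives, k)


# Hypothetical toy task: every fifth patient is a positive case.
labels = {f"patient_{n}": int(n % 5 == 0) for n in range(100)}
for k in (1, 2, 4, 8):
    shot = sample_k_shot(labels, k, seed=42)
    print(k, len(shot), sum(labels[i] for i in shot))
```

Fixing the seed means any two runs with the same k and seed train on exactly the same examples, which is what allows few-shot results to be compared across models.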

Reproducible Leaderboard & Baselines

Use our leaderboard to compare your model against strong baselines and other submissions across all 15 prediction tasks. Reproducible evaluation scripts make it easy to add and update results.
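As an illustration of per-task scoring, the sketch below computes AUROC, a common metric for binary clinical prediction, and then averages it across tasks. The choice of AUROC, the unweighted mean, and the task names `task_a`/`task_b` are assumptions for illustration only; they are not necessarily how the leaderboard actually ranks submissions.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive is scored above
    a randomly chosen negative, with ties counted as one half.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:  # O(P * N) pairwise comparison: fine for a sketch
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-task predictions from one model.
results = {
    "task_a": auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    "task_b": auroc([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.3]),
}
macro = sum(results.values()) / len(results)  # unweighted mean over tasks
print(results, macro)  # {'task_a': 0.75, 'task_b': 1.0} 0.875
```

A pairwise definition is used here because it is easy to verify by hand; a rank-based implementation gives the same value in O(n log n).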

Citation

@inproceedings{wornow2023ehrshot,
  title={EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models},
  author={Wornow, Michael and Thapa, Rahul and Steinberg, Ethan and Fries, Jason and Shah, Nigam},
  booktitle={Advances in Neural Information Processing Systems},
  volume={36},
  year={2023}
}