Dataset

EHRSHOT is a collection of structured data from 6,739 deidentified longitudinal electronic health records (EHRs) sourced from Stanford Medicine.

Access

Datasheet

EHRSHOT contains:

  • 6,739 patients
  • 41,661,637 clinical events
  • 921,499 visits
  • 15 prediction tasks

Each patient is represented as a chronologically ordered timeline of clinical events taken from the structured data of their EHR (e.g. diagnoses, procedures, prescriptions). Note that EHRSHOT does NOT contain clinical text or images.

Sample

The raw EHRSHOT dataset is a single CSV with 41M rows that looks like:

patient_id  start                end                  code              value  unit  visit_id  omop_table
12          2010-04-08 01:30:00  2010-04-09 10:33:00  CPT4/86850                     4930      procedure_occurrence
105         2001-10-28 16:11:00  2001-10-28 16:11:00  SNOMED/387458008               940       drug_exposure

Format

The EHRSHOT dataset is a single CSV with these columns:

  1. patient_id - Integer - Unique identifier for patient
  2. start - Datetime - Start time of event
  3. end - Datetime (optional) - End time of event
  4. code - String - Name of the clinical event (e.g. “SNOMED/3950001” or “ICD10/I25.110”)
  5. value - Float/String (optional) - Either a numerical value associated with an event (e.g. a lab test result) or a string associated with a categorical variable (e.g. “Yes/No” questions)
  6. unit - String (optional) - Unit of measurement for value
  7. visit_id - Integer (optional) - Unique identifier for the visit during which this event occurred
  8. omop_table - String - Name of the source OMOP-CDM table where this event was recorded
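
The column list above maps naturally onto a typed record. The following is a minimal sketch, using only the Python standard library, of how such a CSV could be read row by row. It rests on several assumptions that this page does not confirm: the header names match the column names listed above, timestamps use the `YYYY-MM-DD HH:MM:SS` layout seen in the sample, optional fields are left empty when missing, and `visit_id` is written as a plain integer. The `Event` class, the `parse_value` and `read_events` helpers, and the two sample rows (including the code `LOINC/0000-0`) are hypothetical and exist only for illustration.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, Optional, Union

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumption, inferred from the sample rows


@dataclass
class Event:
    patient_id: int
    start: datetime
    end: Optional[datetime]
    code: str
    value: Union[float, str, None]
    unit: Optional[str]
    visit_id: Optional[int]
    omop_table: str


def parse_value(raw: str) -> Union[float, str, None]:
    """Return None for an empty field, a float when numeric, else the string."""
    if raw == "":
        return None
    try:
        return float(raw)
    except ValueError:
        return raw


def read_events(handle) -> Iterator[Event]:
    """Yield one Event per CSV row; empty optional fields become None."""
    for row in csv.DictReader(handle):
        yield Event(
            patient_id=int(row["patient_id"]),
            start=datetime.strptime(row["start"], TIME_FORMAT),
            end=datetime.strptime(row["end"], TIME_FORMAT) if row["end"] else None,
            code=row["code"],
            value=parse_value(row["value"]),
            unit=row["unit"] or None,
            visit_id=int(row["visit_id"]) if row["visit_id"] else None,
            omop_table=row["omop_table"],
        )


# Made-up rows for illustration only (not taken from EHRSHOT).
SAMPLE = """patient_id,start,end,code,value,unit,visit_id,omop_table
1,2015-01-02 08:00:00,,LOINC/0000-0,7.5,mg/dL,10,measurement
1,2015-01-02 09:30:00,2015-01-02 10:00:00,ICD10/I25.110,,,10,condition_occurrence
"""

events = list(read_events(io.StringIO(SAMPLE)))
for event in events:
    print(event.patient_id, event.start, event.code, event.value)
```

Reading through an iterator rather than loading everything at once keeps memory use flat, which may matter for a file with roughly 41M rows. To read from disk instead of the in-memory sample, pass an open file handle (opened with `newline=""`) to `read_events`.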

Statistics

Events
  • Events per patient (median): 2592.0
  • Events per patient (mean): 6182.2
Visits
  • Visits per patient (median): 58.0
  • Visits per patient (mean): 136.7
Timeline Lengths

Note: The timeline length is the time between the first and last visit for each patient. We exclude patients without any visits.

  • Timeline length in years per patient (median): 7.8
  • Timeline length in years per patient (mean): 8.6
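
As a rough illustration of how the per-patient statistics above could be derived, the sketch below computes events per patient, visits per patient, and timeline length from a handful of made-up rows. It is not the authors' script, and it makes assumptions this page does not settle: a visit's time is approximated by the `start` of the events that carry its `visit_id`, patients with no visits count as zero in the visit statistics, and a year is taken as 365.25 days. All rows and variable names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median

# Hypothetical (patient_id, start, visit_id) triples; visit_id None means the
# event is not attached to any visit.
rows = [
    (1, datetime(2010, 1, 1), 100),
    (1, datetime(2012, 1, 1), 101),
    (1, datetime(2012, 1, 1), 101),
    (2, datetime(2015, 6, 1), 200),
    (2, datetime(2016, 6, 1), None),
    (3, datetime(2018, 3, 1), None),
]

events_per_patient = defaultdict(int)
visits_per_patient = defaultdict(set)
visit_times = defaultdict(list)

for patient_id, start, visit_id in rows:
    events_per_patient[patient_id] += 1
    visit_ids = visits_per_patient[patient_id]  # creates an empty set if new
    if visit_id is not None:
        visit_ids.add(visit_id)
        visit_times[patient_id].append(start)

visit_counts = {pid: len(ids) for pid, ids in visits_per_patient.items()}

# Time between first and last visit; patients without any visit never get an
# entry in visit_times, so they are excluded here.
timeline_years = {
    pid: (max(times) - min(times)).days / 365.25
    for pid, times in visit_times.items()
}

print("Events per patient (median):", median(events_per_patient.values()))
print("Events per patient (mean):", mean(events_per_patient.values()))
print("Visits per patient (median):", median(visit_counts.values()))
print("Visits per patient (mean):", mean(visit_counts.values()))
print("Timeline years (median):", median(timeline_years.values()))
print("Timeline years (mean):", mean(timeline_years.values()))
```

Different choices for any of these assumptions (for example, measuring a visit by its end time) would shift the resulting numbers, so this sketch should not be expected to reproduce the figures reported above exactly.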

Additional Details

For more information, please read the original EHRSHOT paper.

Questions?

For questions and feedback, please open an issue on GitHub.