Version 1.0 — Now Live

HealthAdminBench

Suhana Bedi1,*, Ryan Welch1,*, Ethan Steinberg2,*, Michael Wornow2, Taeil Matthew Kim1,
Haroun Ahmed2, Peter Sterling2, Bravim Purohit2, Qurat Akram3, Angelic Acosta3,
Esther Nubla3, Pritika Sharma3, Mike Pfeffer3, Sanmi Koyejo1, Nigam Shah1

1 Stanford University 2 Kinetic Systems 3 Stanford Healthcare
135 Tasks
1698 Subtasks
4 GUI Environments
5 Models Evaluated

Model Rankings

Interactive leaderboard listing Rank, Model, Score, and Avg. Steps for each model, filterable by prompt and observation mode.

Model Comparisons

Metric Explorer: an interactive chart that plots one metric against another and reports their correlation, filterable by prompt and observation mode.

Live GUI Environments

Click the links below to view live, hosted versions of all four GUI environments. You can also self-host them by following the instructions in the GitHub repo.

Dataset Explorer

HealthAdminBench contains 135 tasks sourced from three core healthcare administrative workflows (prior authorization, DME orders, appeals) across three difficulty levels.

All raw task JSON files are available on GitHub.

How It Works

A reproducible evaluation framework for measuring AI agent capability on healthcare workflows.

1. Agent Observes Portal

The AI agent is given a healthcare task that mirrors real administrative workflows and observes a live web portal through an accessibility tree, screenshots, or both.

2. Agent Takes Actions

The agent navigates forms, reviews clinical documentation, checks coverage criteria, and submits decisions, all through standard browser interactions.

3. Multi-Faceted Evaluation

Each task is scored against multiple evaluation criteria, ranging from exact state checks to LLM-judged clinical accuracy, with statistical reproducibility across runs.
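
The three steps above can be illustrated with a toy observe-act-score loop. This is a minimal sketch under stated assumptions, not the benchmark's actual harness: the portal is modeled as a plain dictionary, the observation is a copy of that dictionary standing in for an accessibility tree or screenshot, and every name here (`Subtask`, `run_episode`, `score_task`, `scripted_policy`, the field names and values) is hypothetical. Only exact state checks are shown; LLM-judged criteria are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, str]
Action = Tuple[str, str]  # (field name, value to enter); hypothetical action format


@dataclass
class Subtask:
    """One evaluation criterion, expressed as a check on the final portal state."""
    name: str
    check: Callable[[State], bool]


def run_episode(policy: Callable[[State], Optional[Action]],
                state: State,
                max_steps: int = 20) -> Tuple[State, int]:
    """Alternate observation and action until the policy stops or the step budget runs out."""
    steps = 0
    while steps < max_steps:
        observation = dict(state)      # stand-in for accessibility tree / screenshot
        action = policy(observation)
        if action is None:             # the policy signals that it is done
            break
        field, value = action
        state[field] = value           # stand-in for a browser interaction
        steps += 1
    return state, steps


def score_task(final_state: State, subtasks: List[Subtask]) -> Tuple[float, bool]:
    """Return the fraction of subtasks passed and whether all of them passed."""
    results = [s.check(final_state) for s in subtasks]
    return sum(results) / len(results), all(results)


def scripted_policy(observation: State) -> Optional[Action]:
    """A fixed policy standing in for an LLM agent (illustrative only)."""
    if "member_id" not in observation:
        return ("member_id", "A123")
    if "decision" not in observation:
        return ("decision", "approve")
    return None


SUBTASKS = [
    Subtask("member id entered", lambda s: s.get("member_id") == "A123"),
    Subtask("decision submitted", lambda s: s.get("decision") == "approve"),
    Subtask("clinical note attached", lambda s: "note" in s),
]

final_state, steps = run_episode(scripted_policy, {})
fraction, complete = score_task(final_state, SUBTASKS)
print(f"steps={steps} subtasks_passed={fraction:.2f} task_complete={complete}")
```

In this sketch the scripted policy fills in two fields and never attaches a note, so it passes two of three subtasks and the task as a whole is not complete. Reporting both a per-subtask fraction and an all-or-nothing flag is one plausible way to aggregate multiple criteria; the benchmark's own scoring rule may differ.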

If you find this work helpful, please cite it as:

@article{healthadminbench,
  title={HealthAdminBench: A Benchmark for Evaluating LLMs on Solving Administrative Healthcare Tasks},
  author={Suhana Bedi and Ryan Welch and Ethan Steinberg and Michael Wornow and Taeil Matthew Kim and Haroun Ahmed and Peter Sterling and Bravim Purohit and Qurat Akram and Angelic Acosta and Esther Nubla and Pritika Sharma and Mike Pfeffer and Sanmi Koyejo and Nigam Shah},
  year={2026}
}