HealthBench Professional
HealthBench Professional (from OpenAI) is a benchmark of 525 physician-authored tasks testing LLMs on real clinician workflows across care consult, documentation, and medical research, graded by multiple physicians.
Orinn consistently outperforms leading AI models on HealthBench, MedQA, PubMedQA, and MMLU (Medical), delivering superior clinical reasoning, diagnostic accuracy, and real-world healthcare intelligence. Built on an advanced medical agentic system, Orinn is designed to handle complex clinical workflows, decision support, and healthcare operations with precision and reliability. From diagnosis assistance to medical documentation and workflow automation, it is optimized for high-stakes healthcare environments where accuracy matters most. Its benchmark performance reflects not only strong academic results but also practical readiness for enterprise-scale healthcare deployment.
MedAgentBench (from Stanford) is a benchmark of 300 cases testing LLMs as autonomous clinical agents across 10 EHR task categories, evaluating FHIR API use, clinical decisions, and protocol adherence.
On real-world medical coding tasks drawn from our internal evaluation datasets, Orinn 1.7 outperformed GPT-5.2, Gemini 3.1 Pro, Claude Opus 4.6, and Corti Symphony, delivering higher accuracy and stronger clinical performance.