Labops

Medical Benchmark Leadership

Orinn consistently outperforms leading AI models across HealthBench, MedQA, PubMedQA, and MMLU (Medical), delivering superior clinical reasoning, diagnostic accuracy, and real-world healthcare intelligence. Built with an advanced medical agentic system, Orinn is designed to handle complex clinical workflows, decision support, and healthcare operations with precision and reliability. From diagnosis assistance to medical documentation and workflow automation, Orinn is optimized for high-stakes healthcare environments where accuracy matters most. Its benchmark performance reflects not just strong academic results, but practical readiness for enterprise-scale healthcare deployment.

Model          HealthBench   HealthBench Hard   Hallucination
Orinn-1.7      75.3          48.1               2.1
Baichuan-M3    65.1          44.4               3.5
GPT-5.2-High   63.0          42.0               3.8
Gemini-3-Pro   46.2          15.2               7.1
Open Access
Update 17-05-2026

HealthBench Professional

HealthBench Professional (from OpenAI) is a benchmark of 525 physician-authored tasks that tests LLMs on real clinician workflows across care consult, documentation, and medical research. Responses are graded by multiple physicians.

Agentic / EHR
525 samples
Text-only
Model                    Score
Orinn-1.7                64.8
GPT-5.4 for Clinicians   59
GPT-5.4                  48.1
Opus 4.7                 46.2
GPT-5                    46.2
Open Access
Update 17-05-2026

MedAgentBench

MedAgentBench (from Stanford) is a benchmark of 300 cases testing LLMs as autonomous clinical agents across 10 EHR tasks, evaluating FHIR API use, clinical decisions, and protocol adherence.

Agentic / EHR
300 samples
Text-only
Model             Score
Orinn-1.7         99.67
Gemini 3.1 Pro    91.3
GPT-5.5           89.4
Claude Opus 4.7   89
Gemini 3 Flash    88
Open Access
Update 17-05-2026

Medical Coding

On real-world medical coding tasks drawn from our internal evaluation datasets, Orinn-1.7 achieved higher accuracy than GPT-5.2, Gemini 3.1 Pro, Claude Opus 4.6, and Corti Symphony.

Others
180 samples
Text-only
Model            Score
Orinn-1.7        69
GPT-5.2          63
Gemini-3.1-Pro   61