May 25, 2026

Issue 1 - May 25, 2026

Large language model diagnostic assistance for physicians in a lower-middle-income country

Physicians trained to use GPT-4o scored 71.4% on diagnostic reasoning tests; colleagues using conventional resources scored 42.6%. The reported gap is 27.5 percentage points (the raw difference between the two scores is 28.8). The figures come from a randomized controlled trial of 58 physicians at Lahore University of Management Sciences, published in Nature Health. All participants completed the same 20-hour AI literacy curriculum first, so the difference reflects access, not training.

GPT-4o running solo scored 82.9%. The AI outperformed the human-AI team overall. In 31.4% of cases, though, the physician-plus-AI combination exceeded the model's median, meaning human judgment contributed something the model alone did not.

The finding on prior experience was counterintuitive. Physicians with the least prior LLM experience improved by 46.4 percentage points; those using LLMs monthly gained 25.4 points. Familiarity did not predict better use.

Pakistan's control group scored 42.6%. In a comparable US RCT, the control group scored 74%. That gap of roughly 31 points, before AI enters the picture, is the resource disparity the researchers were measuring.
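
For readers who want to check the arithmetic, the raw gaps can be recomputed from the scores quoted in this issue. This is a minimal sketch: the dictionary keys and the `gap` helper are illustrative names, not anything from the study, and a raw difference will not necessarily match a gap the paper reports if that figure was statistically adjusted.

```python
# Scores (percent) exactly as quoted in this issue.
# Key names are illustrative assumptions, not the study's own labels.
scores = {
    "pk_gpt4o_arm": 71.4,  # Pakistani physicians with GPT-4o access
    "pk_control": 42.6,    # Pakistani physicians with conventional resources
    "gpt4o_alone": 82.9,   # GPT-4o without a physician
    "us_control": 74.0,    # control group in the comparable US trial
}


def gap(a: str, b: str) -> float:
    """Raw difference scores[a] - scores[b] in percentage points, to one decimal."""
    return round(scores[a] - scores[b], 1)


raw_gaps = {
    "AI arm vs control (Pakistan)": gap("pk_gpt4o_arm", "pk_control"),
    "Model alone vs AI arm": gap("gpt4o_alone", "pk_gpt4o_arm"),
    "US control vs Pakistan control": gap("us_control", "pk_control"),
}

for label, value in raw_gaps.items():
    print(f"{label}: {value} percentage points")
```

These are simple subtractions of group means, so they say nothing about uncertainty or about differences between the two trials' designs.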

Read the source →

AI Upside
