Abstract
A council of five AI models that discuss their answers through an iterative process answered 325 U.S. Medical Licensing Examination (USMLE) questions with 97%, 93%, and 94% accuracy on Step 1, Step 2 CK, and Step 3, respectively, according to a study published in PLOS Medicine by researcher Yahya Shaikh of Baltimore, U.S., and colleagues. When the council's responses diverge, a facilitator algorithm runs a deliberative process: it summarizes the reasoning in each response and asks the council to deliberate and re-answer the original question. The 325 publicly available questions included those focused on foundational biomedical sciences as well as clinical diagnosis and management, and the council's consensus responses outperformed single-instance GPT-4 models. According to the authors, the work provides the first clear evidence that AI systems can self-correct through structured dialog, with the collective performing better than any single AI.
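The facilitator loop described above can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the `Member` callable signature, the `summarize` helper, the `max_rounds` cap, and the majority-vote fallback are all assumptions made for the example, and the council members are simple stubs standing in for real model calls.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical member interface (an assumption, not from the study):
# takes the question and an optional summary of the previous round,
# returns (answer, reasoning).
Member = Callable[[str, Optional[str]], Tuple[str, str]]


def summarize(responses: List[Tuple[str, str]]) -> str:
    """Condense each member's answer and reasoning into one summary."""
    return "\n".join(
        f"Member {i + 1} chose {answer}: {reasoning}"
        for i, (answer, reasoning) in enumerate(responses)
    )


def deliberate(question: str, council: List[Member],
               max_rounds: int = 5) -> Tuple[str, int]:
    """Ask every member, and re-ask with a summary while answers diverge.

    Returns (answer, rounds_used). The round cap and the majority-vote
    fallback are assumptions added so the sketch always terminates.
    """
    summary: Optional[str] = None
    responses: List[Tuple[str, str]] = []
    for round_no in range(1, max_rounds + 1):
        responses = [member(question, summary) for member in council]
        answers = {answer for answer, _ in responses}
        if len(answers) == 1:  # consensus reached
            return answers.pop(), round_no
        summary = summarize(responses)  # feed reasoning back to the council
    majority = Counter(a for a, _ in responses).most_common(1)[0][0]
    return majority, max_rounds


def steady(answer: str) -> Member:
    """Stub member that always gives the same answer."""
    def member(question: str, summary: Optional[str]) -> Tuple[str, str]:
        return answer, f"{answer} fits the findings best."
    return member


def persuadable(first: str, later: str) -> Member:
    """Stub member that changes its answer after seeing a summary."""
    def member(question: str, summary: Optional[str]) -> Tuple[str, str]:
        answer = first if summary is None else later
        return answer, f"{answer} seems most likely."
    return member


if __name__ == "__main__":
    council = [steady("B")] * 4 + [persuadable("C", "B")]
    print(deliberate("Which option is correct?", council))  # ('B', 2)
```

In this toy run the council disagrees in the first round, the facilitator summarizes the reasoning, and the dissenting stub switches its answer in the second round, so consensus is reached after two rounds.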
Key Data
- Publication Date: 09 October 2025
- Primary Author: Public Library of Science
- Source: Medical Xpress
- Language: English