Skip to main content

Scaling the Prompt: How Batch Size Shapes Performance of Mid-2025 State-of-the-Art LLMs in Automated Title-and-Abstract Screening

News


Abstract

Objectives To evaluate the classification performance of four state-of-the-art LLMs (Gemini 2.5 Pro, Gemini 2.5 Flash, GPT-5, and GPT-5 mini) in predicting study eligibility across a wide range of batch sizes for a systematic review of randomised controlled trials. Yes I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes bioRxiv and medRxiv thank the following for their generous financial support: The Chan Zuckerberg Initiative, Cold Spring Harbor Laboratory, the Sergey Brin Family Foundation, California Institute of Technology, Centre National de la Recherche Scientifique, Fred Hutchinson Cancer Center, Imperial College London, Massachusetts Institute of Technology, Stanford University, The University of Edinburgh, University of Washington, and Vrije Universiteit Amsterdam.
Key Data

  • Publication Date
    04 September 2025
  • Primary Author
    Natalia Borg
  • Source
    medRxiv
  • Language
    English
Click below to visit original source: