The event aims to redefine how newsrooms evaluate large language models (LLMs). Led by Charlotte Li, Jeremy Gilbert, and Nicholas Diakopoulos from Northwestern University’s Generative AI in the Newsroom (GAIN) initiative, the session will address the persistent gap between technical AI benchmarks and the real-world editorial standards that matter to journalists.
Despite the growing adoption of LLMs in reporting, most newsrooms continue to rely on tech-centric evaluation methods. These often fail to capture critical journalistic values like accuracy, transparent sourcing, and editorial integrity. The GAIN team will share insights from their May 2025 workshop, focusing on the design and implementation of rubric-based benchmarks that reflect the unique demands of newsrooms.
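To make the idea of a rubric-based benchmark concrete, here is a minimal sketch in Python of what an evaluation harness for an information extraction task might look like. The criteria names, weights, and naive substring scoring below are illustrative assumptions only, not the GAIN team’s actual rubric, which centers on editorial judgment rather than automated matching.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical rubric criteria; a real newsroom rubric would be defined by editors.
@dataclass
class Criterion:
    name: str
    weight: float
    score_fn: Callable[[str, dict], float]  # returns a score between 0.0 and 1.0

def factual_accuracy(output: str, reference: dict) -> float:
    """Fraction of reference facts that appear in the model output (naive check)."""
    facts = reference["facts"]
    return sum(f.lower() in output.lower() for f in facts) / len(facts)

def transparent_sourcing(output: str, reference: dict) -> float:
    """Fraction of named sources the model output actually attributes."""
    sources = reference["sources"]
    return sum(s.lower() in output.lower() for s in sources) / len(sources)

# Illustrative weights; in practice these would reflect newsroom priorities.
RUBRIC = [
    Criterion("factual_accuracy", 0.6, factual_accuracy),
    Criterion("transparent_sourcing", 0.4, transparent_sourcing),
]

def evaluate(output: str, reference: dict) -> float:
    """Weighted rubric score for a single model output."""
    return sum(c.weight * c.score_fn(output, reference) for c in RUBRIC)

if __name__ == "__main__":
    reference = {
        "facts": ["voted 5-2", "takes effect in June"],
        "sources": ["city clerk"],
    }
    output = ("The council voted 5-2 to approve the ordinance, which takes "
              "effect in June, according to the city clerk.")
    print(f"Rubric score: {evaluate(output, reference):.2f}")
```

The point of the sketch is the structure, not the scoring: each editorial value becomes an explicit, weighted criterion, so the benchmark reflects what the newsroom cares about rather than generic model metrics.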
The session will cover why editorially grounded benchmarks are essential for journalism, how to design effective evaluation frameworks, and a case study on benchmarking information extraction tasks. It is tailored for newsroom leaders, data journalists, product teams, and anyone responsible for assessing or implementing AI tools in journalism.