Fantastic Bugs: Quality Issues in AI Benchmarks Exposed
New research systematically catalogs bugs and quality issues plaguing AI benchmarks, revealing how evaluation flaws impact model assessment across vision, language, and multimodal systems.