AI Benchmarks
FrontierScience Benchmark Tests AI on Expert Science Tasks
A new benchmark evaluates whether frontier AI models can perform PhD-level scientific research tasks, revealing significant gaps between current capabilities and expert human performance.