LLM Research
Brittlebench: New Benchmark Measures LLM Fragility to Prompt Variations
New research introduces Brittlebench, a systematic framework for quantifying how sensitive large language models are to minor prompt variations, revealing critical reliability gaps in AI systems.