LLM Agents
Task2Quiz: New Framework Tests How AI Agents Understand Environments
Researchers introduce Task2Quiz, a systematic paradigm for evaluating what LLM agents actually know about their operating environments, revealing critical gaps in agent world models.