LLM Agents
Diagnosing Tool Failures in Multi-Agent LLM Systems
New research introduces a systematic framework for identifying why LLM agents fail to invoke tools correctly, addressing a critical reliability gap in multi-agent AI systems.
LLM Agents
Researchers introduce a determinism-faithfulness assurance harness for tool-using LLM agents, enabling reliable replay testing to catch unpredictable AI behavior in critical applications.
LLM Agents
New research introduces Aeon, a memory management system combining neural and symbolic approaches to help LLM agents maintain coherent reasoning across extended task sequences.
Agentic AI
A new comprehensive survey systematically categorizes agentic AI architectures and evaluation frameworks, offering taxonomies of large language model agents and foundational insights for autonomous AI systems.
LLM Agents
Researchers propose a constrained-topology planning approach for LLM agents that improves reliability in automated feature engineering, addressing key challenges in ML pipeline automation.
LLM Agents
Researchers introduce Task2Quiz, a systematic paradigm for evaluating what LLM agents actually know about their operating environments, revealing critical gaps in agent world models.
LLM Agents
New research presents SimpleMem, an efficient memory architecture enabling LLM agents to maintain persistent context across extended interactions without traditional retrieval overhead.
LLM Agents
Researchers challenge the assumption that LLM agents work reliably when given well-specified APIs, revealing how real-world API complexity degrades agent performance.
AI Safety
New research explores how LLM-powered agents may develop biases against humans based on perceived belief systems, revealing critical vulnerabilities in autonomous AI decision-making.
LLM Agents
New research uses multi-agent LLM systems simulating venture capitalists to evaluate startups, achieving notable predictive accuracy through collective roleplay-based reasoning.
LLM Agents
New research introduces GenEnv, a framework where LLM agents and environment simulators co-evolve through difficulty-aligned training, enabling more robust agent capabilities.
LLM Agents
New research introduces ABBEL, an architecture that constrains LLM agents to act through explicit belief states expressed in natural language, improving interpretability and decision-making in complex environments.