LLM Agents
New Benchmark Tests LLM Agents Against Messy Real-World APIs
Researchers challenge the assumption that LLM agents work reliably with perfect APIs, revealing how real-world complexity degrades AI performance.
AI Safety
New research explores how LLM-powered agents may develop biases against humans based on their belief systems, revealing critical vulnerabilities in autonomous AI decision-making.
LLM Agents
New research uses multi-agent LLM systems simulating venture capitalists to evaluate startups, achieving notable predictive accuracy through collective roleplay-based reasoning.
LLM Agents
New research introduces GenEnv, a framework where LLM agents and environment simulators co-evolve through difficulty-aligned training, enabling more robust agent capabilities.
LLM Agents
New research introduces ABBEL, an architecture that constrains LLM agents to act through explicit belief states expressed in natural language, improving interpretability and decision-making in complex environments.
AI Safety
Researchers present a framework for making multi-turn LLM agents more trustworthy through behavioral guidance, addressing critical safety concerns as AI systems become more autonomous.
Agentic AI
New research surveys the core architectural patterns enabling autonomous AI agents, from single-agent designs to multi-agent orchestration frameworks that power complex AI workflows.
LLM Agents
New research introduces a co-adaptive dual-strategy framework combining fast intuitive reasoning with slow deliberative thinking to improve LLM-based agent performance.
LLM Agents
New research introduces SABER, a safeguarding framework that identifies how small errors in LLM agent actions can cascade into significant failures, and proposes intervention mechanisms to contain them.
LLM Agents
New research introduces SelfAI, a framework enabling LLM agents to autonomously generate training data and improve performance without human annotation. The system uses multi-agent collaboration for self-supervised learning.
LLM Agents
Researchers introduce a simple yet effective approach to managing conversational memory in LLM agents, addressing context window limitations through structured memory organization and retrieval mechanisms.
AI Safety
New research presents a technical framework for detecting and neutralizing malicious web-based LLM agents through real-time monitoring and intervention systems, addressing growing AI safety concerns.