LLM Agents
New Benchmark Tests LLM Agents Against Messy Real-World APIs
Researchers challenge the assumption that LLM agents evaluated against clean, well-specified APIs will hold up in practice, revealing how real-world API messiness degrades agent performance.
LLM
New research introduces entropy-based adaptive speculation that detects reasoning phases in LLMs, dynamically adjusting decoding strategies to improve both speed and output quality.
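As a rough sketch of the general idea (the function name, thresholds, and interpolation rule below are illustrative assumptions, not the paper's method), an entropy-adaptive speculator can shorten its draft when the next-token distribution is flat and lengthen it when the distribution is peaked:

```python
import torch

def adaptive_draft_length(logits: torch.Tensor,
                          low: float = 2.0, high: float = 4.0,
                          min_draft: int = 1, max_draft: int = 8) -> int:
    """Map next-token entropy to a speculative draft length.

    Hypothetical heuristic: low entropy suggests a predictable phase where
    long drafts are likely to be accepted; high entropy suggests a
    'reasoning' phase where short drafts avoid wasted verification.
    """
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum().item()
    if entropy <= low:
        return max_draft
    if entropy >= high:
        return min_draft
    # Linearly interpolate draft length between the two thresholds.
    frac = (entropy - low) / (high - low)
    return max(min_draft, round(max_draft - frac * (max_draft - min_draft)))
```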
LLM
New research introduces STED and Consistency Scoring, which together form a systematic framework for measuring how reliably large language models produce structured outputs, a critical property for production AI systems.
LLM Inference
New research introduces Yggdrasil, a tree-based speculative decoding architecture that bridges dynamic speculation with static runtime for faster LLM inference.
LLM Agents
New research uses multi-agent LLM systems simulating venture capitalists to evaluate startups, achieving notable predictive accuracy through collective roleplay-based reasoning.
AI Governance
Researchers propose a comprehensive framework for governing agentic AI systems, mapping capabilities to risks and establishing safety protocols as autonomous agents become more prevalent.
Reinforcement Learning
Liquid AI's LFM2-2.6B-Exp uses pure reinforcement learning without supervised fine-tuning, achieving dynamic hybrid reasoning that outperforms larger models on key benchmarks.
Diffusion Models
New research reveals how diffusion models suffer 'generative collapse' when trained on synthetic data, with dominated samples disappearing while dominating ones proliferate across generations.
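A toy simulation can make the dynamic concrete (assumptions: a two-mode population and density-biased resampling standing in for a learned generator; this is not the paper's training setup). Because the already-dominant mode sits in higher-density regions, it is oversampled each generation and the dominated mode collapses:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Mode A (dominating) outnumbers mode B (dominated) 9:1 at generation 0.
samples = np.concatenate([rng.normal(0, 1, 900), rng.normal(6, 1, 100)])

for gen in range(8):
    kde = gaussian_kde(samples)        # stand-in for the learned model
    weights = kde(samples)             # density-biased sampling favors
    weights /= weights.sum()           # the already-dominant mode
    samples = rng.choice(samples, size=samples.size, replace=True, p=weights)
    print(f"gen {gen + 1}: dominated-mode share = {np.mean(samples > 3):.3f}")
```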
LLM Interpretability
New research maps LLM internal representations to brain-derived axes, enabling interpretable reading and targeted steering of model behavior without fine-tuning.
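In spirit this resembles activation steering; here is a minimal sketch of reading and steering along a fixed concept axis (the function and its arguments are hypothetical, and the paper's derivation of axes from brain data is not reproduced):

```python
import torch

def read_and_steer(hidden: torch.Tensor, axis: torch.Tensor,
                   strength: float = 4.0) -> tuple[float, torch.Tensor]:
    """Project a residual-stream activation onto a concept axis to 'read'
    a behavior score, then add the axis back with a chosen strength to
    'steer' generation, without fine-tuning any model weights."""
    axis = axis / axis.norm()
    score = (hidden @ axis).item()      # interpretable readout
    steered = hidden + strength * axis  # targeted intervention
    return score, steered
```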
LLM Agents
New research introduces ABBEL, an architecture that constrains LLM agents to act through explicit belief states expressed in natural language, improving interpretability and decision-making in complex environments.
Neural Networks
New research proposes training graph-based neural networks using few-shot learning without traditional backpropagation, potentially revolutionizing how AI models are trained.
Diffusion Models
New research introduces SD2AIL, combining diffusion models with adversarial imitation learning to generate synthetic expert demonstrations, reducing dependence on human-collected training data.