AI Safety
Research: LLM Safety Training Survives RL Optimization
New research examines whether safety guardrails in large language models remain intact when agents are optimized for helpfulness through reinforcement learning.
Mechanistic Interpretability
New research goes beyond behavioral analysis to trace the internal mechanisms LLMs use when weighing competing reward signals, offering insights into AI decision-making at the circuit level.
LLM Research
New survey examines how classical narrative frameworks are being integrated with large language models to improve automatic story generation and comprehension capabilities.
LLM Research
New research introduces a comprehensive benchmark for evaluating how well LLMs can quantify their own uncertainty when grading, with implications for AI reliability and trustworthy automated systems.
LLM Research
New research reveals how benchmark data contamination undermines the reliability of LLM-based recommendation systems, raising critical questions about AI evaluation integrity.
LLM Research
Researchers propose measuring LLM reasoning quality through 'deep-thinking tokens' rather than output length, offering new insights into how AI models actually process complex problems.
LLM Research
New research explores techniques for stabilizing native low-rank pretraining in large language models, potentially enabling more efficient training of foundation models.
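The paper's stabilization techniques aren't detailed in this teaser, but the parameter savings that motivate low-rank pretraining are easy to illustrate. The sketch below (dimensions chosen arbitrarily, not taken from the paper) factorizes a dense weight matrix W into two thin matrices B and A of rank r:

```python
import numpy as np

d_in, d_out, r = 512, 512, 32

# Dense layer: one (d_out x d_in) weight matrix
dense_params = d_out * d_in

# Low-rank parameterization: W is never materialized; instead
# W ~= B @ A, with B: (d_out, r) and A: (r, d_in)
lowrank_params = d_out * r + r * d_in

# A forward pass applies the two factors in sequence
rng = np.random.default_rng(0)
A = rng.normal(size=(r, d_in)) / np.sqrt(d_in)
B = rng.normal(size=(d_out, r)) / np.sqrt(r)
x = rng.normal(size=d_in)
y = B @ (A @ x)  # output has the same shape as a dense layer's

print(dense_params, lowrank_params)  # 262144 vs 32768: an 8x reduction
```

Training these factors directly from scratch ("native" low-rank pretraining) is what tends to be unstable, which is the problem the research targets.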
LLM Research
Researchers assess how well large language models handle questions about recent events, revealing critical limitations in temporal knowledge that affect AI system reliability.
LLM Research
Researchers propose a novel framework for visualizing and benchmarking factual hallucinations in large language models by analyzing internal neural activations and clustering patterns.
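The teaser doesn't specify the clustering method, but the general recipe of grouping internal activations can be sketched with a minimal k-means (farthest-point initialization, pure numpy; the two-cluster "factual vs. hallucinated" framing here is an illustrative assumption, not the paper's pipeline):

```python
import numpy as np

def kmeans(X, k=2, iters=50):
    """Minimal k-means over activation vectors X: (n_samples, dim)."""
    # Farthest-point initialization: start from X[0], then repeatedly
    # pick the point farthest from all chosen centers.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min(((X[:, None] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each activation to its nearest center
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Recompute centers; keep the old one if a cluster empties
        centers = np.array([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels, centers
```

In the framework's spirit, activations collected while the model emits factual versus hallucinated statements would (ideally) land in separable clusters, which can then be visualized or benchmarked.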
LLM Research
New research shows that requiring LLMs to think step-by-step before responding can backfire in conversational settings, making AI agents appear cold and disengaged to users.
Voice AI
New research equips large language models with directional multi-talker speech capabilities, enabling AI to understand who is speaking and from where in complex audio environments.
LLM Research
Researchers propose a two-phase sparse attention mechanism that scouts relevant tokens before full computation, promising significant efficiency gains for large language model inference.
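The scout-then-attend idea can be sketched in a few lines. This toy version (not the paper's implementation; the cheap scout score is assumed to be a plain dot product here) selects the top-k keys in a first pass, then runs exact softmax attention over only that subset:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def two_phase_sparse_attention(q, K, V, k=4):
    """Phase 1 (scout): cheaply score all keys and keep the top-k.
    Phase 2: exact softmax attention restricted to the scouted tokens."""
    # Phase 1: approximate relevance scores (assumed: unscaled dot product)
    scout_scores = K @ q
    top = np.argsort(scout_scores)[-k:]  # indices of the k most relevant tokens
    # Phase 2: full scaled-dot-product attention over the selected subset only
    d = q.shape[-1]
    weights = softmax((K[top] @ q) / np.sqrt(d))
    return weights @ V[top]
```

With k equal to the sequence length this reduces to ordinary full attention; the efficiency gain comes from running the expensive phase on only k of the tokens.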