LLM Security
Special Token Attacks: The 96% LLM Jailbreak Exploit
Security researchers uncover how special tokens in LLM architectures create hidden attack surfaces, enabling jailbreak success rates as high as 96% across major models.
LLM Research
Researchers propose methods to measure and eliminate hallucination risks in large language models, shifting from generative to consultative AI for high-stakes legal applications.
AI Safety
New research presents comprehensive guardrails for LLM trust, safety, and ethical deployment, addressing critical challenges in preventing harmful outputs and ensuring responsible AI development.
AI Funding
Humans&, a new AI startup founded by veterans of Anthropic, xAI, and Google, secures one of the largest seed rounds ever at $480M to pursue 'human-centric' AI development.
LLM Research
New research reveals how LLMs develop 'directional attractors' during reasoning tasks, showing that similarity-based retrieval mechanisms systematically steer iterative summarization toward predictable patterns.
LLM Research
New research introduces PrivacyReasoner, a framework enabling LLMs to emulate human privacy reasoning patterns for better protection of personal information in AI systems.
LLM Security
New research introduces State-Transition Amplification Ratio (STAR) to identify inference-time backdoor attacks in large language models by analyzing anomalous reasoning patterns.
LLM Alignment
Researchers introduce ECLIPTICA, a framework using Contrastive Instruction-Tuned Alignment (CITA) to enable dynamic switching between aligned and unaligned LLM behaviors for safety research.
LLM Unlearning
New research introduces a domain-to-instance framework for generating synthetic data to help large language models selectively forget harmful knowledge while preserving useful capabilities.
AI Safety
Researchers introduce GuardEval, a comprehensive benchmark evaluating LLM moderators across safety, fairness, and robustness dimensions—critical metrics for AI content authentication systems.
AI Certification
New research proposes maturity-based certification for embodied AI systems, introducing quantifiable trustworthiness metrics that could reshape how we evaluate AI reliability and authenticity.
LLM Security
New research proposes ALERT, a training-free method to detect jailbreak attacks on LLMs by analyzing discrepancies between internal model representations and output behavior.