AI Security
Top AI Red Teaming Tools for Securing ML Models in 2026
A roundup of leading AI red teaming tools used to probe, stress-test, and harden machine learning models against adversarial attacks, jailbreaks, and data leakage in 2026.
Disinformation
New research moves beyond surface-level detection to examine how humans actually evaluate the risk of LLM-generated disinformation, revealing gaps in current assessment frameworks.
AI Security
New research proposes combining LLM-as-a-Judge with Mixture-of-Models to detect prompt injection attacks, a growing threat to generative AI systems including video and image generators.
LLM Safety
New research introduces explainable approaches to LLM unlearning, enabling models to selectively forget information while providing transparent reasoning for the process.
LLM Safety
New research introduces FlexGuard, a continuous risk scoring framework that enables adaptive content moderation strictness for LLMs, moving beyond binary safe/unsafe classifications.
LLM Safety
New research explores whether constraining specific parameter regions in large language models can ensure safety, examining the theoretical foundations of alignment through architectural constraints.
LLM Safety
New research proposes geometric methods to enhance LLM safety alignment robustness, offering potential improvements for AI systems that moderate synthetic media and deepfake content.
LLM Safety
New research examines how persuasive content propagates through multi-agent LLM systems, revealing critical insights for AI safety and synthetic influence detection.
LLM Safety
Researchers introduce Q-realign, a technique that piggybacks safety realignment onto quantization, addressing the safety degradation that compression introduces in LLMs deployed for efficiency.
LLM Safety
New research introduces a framework for evaluating implicit regulatory compliance in LLM tool invocations using logic-guided synthesis, addressing critical AI safety concerns.
LLM Safety
Researchers propose a novel technique for removing toxic behaviors from large language models by projecting out malicious representations in the model's latent space.
AI Alignment
Researchers propose a scalable self-improving framework for open-ended LLM alignment that leverages collective agency principles to address evolving AI safety challenges.