YouTube
YouTube Expands Likeness Detection Tool Worldwide
YouTube is rolling out its likeness detection feature globally, letting creators identify and request removal of AI-generated deepfake videos that use their face or voice without consent.
YouTube
YouTube is rolling out its AI likeness detection tool to all adult creators, letting them find and request removal of deepfake videos that use their face or voice without consent.
Content Moderation
A deep dive into designing data labeling pipelines for content moderation systems—critical infrastructure for detecting harmful synthetic media, deepfakes, and policy-violating AI-generated content at scale.
Canva
Canva issued an apology after users discovered its Magic Studio AI image tools were stripping the word 'Palestine' from designs and replacing it with unrelated content, raising fresh concerns about bias in generative AI systems.
Content Moderation
A former Facebook insider launches Moonbounce, a startup building content moderation tools designed for the AI era — tackling synthetic media, deepfakes, and AI-generated content at platform scale.
AI Policy
X announces that creators face suspension from revenue sharing for posting unlabeled AI-generated content depicting armed conflict, marking a significant enforcement shift in synthetic media disclosure policies.
LLM Safety
New research introduces FlexGuard, a continuous risk scoring framework that enables adaptive content moderation strictness for LLMs, moving beyond binary safe/unsafe classifications.
Content Moderation
New research proposes combining ML-assisted sampling with LLM labeling to measure policy-violating content at scale, offering a methodological breakthrough for detecting synthetic media and deepfakes.
LLM Safety
Researchers propose a novel technique for removing toxic behaviors from large language models by projecting out malicious representations in the model's latent space.
AI Safety
Researchers introduce GuardEval, a comprehensive benchmark evaluating LLM moderators across safety, fairness, and robustness dimensions—critical metrics for AI content authentication systems.
AI Safety
Comprehensive technical guide to implementing AI safety guardrails, from prompt-based filtering to advanced validation architectures. Covers practical methods, with code examples, for ensuring secure and relevant AI interactions.
AI Safety
New research exposes a critical AI safety flaw: rhyming prompts bypass guardrails in 62% of language models tested, revealing how poetic formatting defeats content moderation systems by exploiting pattern recognition.