LLM Safety
FlexGuard: Adaptive Risk Scoring for LLM Content Moderation
New research introduces FlexGuard, a continuous risk-scoring framework that lets LLM content moderation adjust its strictness to context, moving beyond binary safe/unsafe classification.
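The core idea of continuous scoring with adjustable strictness can be sketched as follows. This is a minimal illustration, not FlexGuard's actual implementation: the `ModerationPolicy` class, the `score_risk` keyword heuristic, and the threshold values are all hypothetical stand-ins for a trained, calibrated risk model.

```python
from dataclasses import dataclass

@dataclass
class ModerationPolicy:
    """Maps a continuous risk score in [0, 1] to an allow/block decision.

    `threshold` controls strictness: lower values block more content.
    """
    threshold: float

    def decide(self, risk_score: float) -> str:
        return "block" if risk_score >= self.threshold else "allow"

def score_risk(text: str) -> float:
    # Placeholder scorer: a real system would use a trained classifier
    # that outputs a calibrated probability of harm in [0, 1].
    risky_terms = {"exploit", "weapon", "bypass"}
    hits = sum(term in text.lower() for term in risky_terms)
    return min(1.0, hits / len(risky_terms))

strict = ModerationPolicy(threshold=0.3)   # e.g. a children's product
lenient = ModerationPolicy(threshold=0.7)  # e.g. a security-research tool

score = score_risk("How do I bypass this filter?")
print(strict.decide(score), lenient.decide(score))  # prints "block allow"
```

The same continuous score yields different decisions under different policies, which is what a binary safe/unsafe label cannot express: strictness becomes a deployment-time parameter rather than a property baked into the classifier.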