research - SkrewAI (Page 16)

AI Safety

Model Raising vs Training: New AI Development Paradigm

Researchers propose a fundamental shift from post-hoc alignment to intrinsic, identity-based AI development, arguing that current training methods create misaligned systems that require extensive correction after the fact.

Generative AI

Rectified Noise: New Generative AI Model Architecture

Researchers propose a novel generative modeling approach using positive-incentive noise, offering an alternative to traditional diffusion and flow-based methods for synthetic content generation.

Generative AI

AI Predicts Human Behavior Using Causal Graphs

New research demonstrates how generative AI combined with causal graphs can forecast counterfactual human behavior, with implications for synthetic media creation and understanding how AI models human decision-making.

Agentic AI

5.5 Billion Tokens Later: New Benchmark for Enterprise AI Agents

Researchers propose a standardized benchmark for evaluating agentic AI systems after analyzing 5.5 billion tokens across enterprise workflows, revealing critical gaps in current evaluation methods and defining metrics for real-world agent performance.

LLM Watermarking

WaterMod: New Token-Rank Method for Balanced LLM Watermarking

Researchers introduce WaterMod, a modular token-rank partitioning approach that improves LLM watermarking by maintaining probability balance across model outputs, enhancing detection while preserving text quality.

LLM

Measuring the Energy Cost of Every LLM Response

New research quantifies the energy footprint of large language model inference, revealing how prompt complexity and model size affect power consumption and offering critical insights for sustainable AI deployment.

LLM

New Method Verifies AI Reasoning Steps Using Uncertainty

Researchers develop uncertainty heads to efficiently verify LLM reasoning steps, achieving 93% accuracy in detecting errors while reducing compute costs by 90% compared to existing verification methods.

AI Detection

New Computational Test Spots AI Text via Language Patterns

Researchers develop a computational framework that reveals systematic linguistic differences between human and AI-generated text, advancing detection methods for synthetic content authentication.

LLM

KnowThyself: Agentic Assistant for LLM Interpretability

New research introduces KnowThyself, an agentic assistant that helps researchers understand how large language models work internally through automated interpretability and mechanistic analysis.

AI Security

Multi-Agent LLMs Team Up to Break AI Safety Guardrails

New research demonstrates how multiple LLMs working together can generate adaptive adversarial attacks that bypass AI safety filters. The technique uses collaborative reasoning to craft prompts that exploit model vulnerabilities more effectively than single-agent approaches.

AI Safety

New Framework Evaluates Control Protocols for AI Agents

Researchers introduce a comprehensive evaluation framework for control protocols designed to manage untrusted AI agents, addressing key safety challenges as autonomous systems become more capable and potentially misaligned.

AI Security

Synthetic Data Optimizes Adversarial Attacks on AI Agents

New research demonstrates how synthetic data generation can systematically optimize adversarial attacks against AI agents, revealing critical security vulnerabilities in autonomous systems through automated testing frameworks.