Machine Learning - SkrewAI (Page 7)

Agentic AI

How to Test and Measure Agentic AI System Performance

A comprehensive guide to evaluating AI agents, covering benchmarks, testing frameworks, and metrics for measuring autonomous system performance in real-world applications.

LLM Evaluation

New Rubric Generation Method Improves LLM Judge Accuracy

Researchers propose rethinking how evaluation rubrics are generated for LLM judges and reward models, addressing critical challenges in assessing open-ended AI outputs.

LLM Research

New Method Internalizes LLM Reasoning Through Latent Actions

Researchers propose a novel approach to improve LLM reasoning by discovering and replaying latent actions, potentially reducing inference costs while maintaining reasoning quality.

AI Security

MultiKrum: Defending Distributed AI Training from Byzantine Attacks

New research on MultiKrum explores optimal robustness definitions for Byzantine machine learning, which are critical for securing distributed AI training against adversarial participants.

AI Research

Research Questions Exponential AI Growth: A Competing Hypothesis

New arXiv research challenges the widely held belief that AI capabilities grow exponentially, presenting alternative mathematical models that could reshape how we predict and plan for AI advancement.

AI Agents

Seven-Dimensional Taxonomy Proposed for Healthcare AI Agents

New research proposes a comprehensive framework for empirically evaluating LLM-based agentic AI systems in healthcare, establishing seven key dimensions for systematic assessment.

LLM Agents

Agent-Omit: Teaching LLMs to Think More Efficiently

New research introduces Agent-Omit, a reinforcement learning framework that trains LLM agents to selectively omit unnecessary reasoning steps and observations in order to improve computational efficiency.

LLM Research

Knowledge Model Prompting Boosts LLM Planning Performance

New research introduces Knowledge Model Prompting, a technique that enhances LLM reasoning on complex planning tasks by structuring domain knowledge representation.

LLM Agents

AgentArk: Distilling Multi-Agent Systems Into Single LLMs

New research introduces AgentArk, a framework that transfers multi-agent intelligence into single LLM agents, potentially changing how efficiently complex AI systems can be deployed.

LLM Research

Accordion-Thinking: A New Method for Efficient LLM Reasoning

New research introduces Accordion-Thinking, a self-regulated approach that compresses reasoning steps dynamically to improve LLM efficiency while maintaining readable chain-of-thought outputs.

LLM Efficiency

Dynamic Mix Precision Routing Optimizes Multi-Step LLM Efficiency

New research proposes dynamic precision routing to optimize computational resources across multi-step LLM interactions, balancing quality and efficiency through adaptive quantization strategies.

AI Agents

MARS: A New Modular Agent Architecture for AI Research Automation

New research introduces MARS, a modular agent with reflective search capabilities designed to automate AI research tasks through intelligent decomposition and self-correction.