AI Hardware
DABench-LLM: New Framework Benchmarks Post-Moore AI Accelerators
Researchers introduce DABench-LLM, a standardized framework for evaluating dataflow AI accelerators designed for large language model inference in the post-Moore era.
LLM Inference
New research introduces DART, a speculative decoding method that borrows denoising concepts from diffusion models to dramatically accelerate large language model inference without sacrificing output quality.
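The core speculative-decoding loop that methods like DART build on can be sketched in a few lines. This is an illustrative toy, not the DART algorithm itself: `draft_model` and `target_model` are hypothetical stand-ins for a cheap proposer and the full LLM, and tokens are small integers rather than real vocabulary IDs. The draft proposes k tokens ahead; the target verifies them and the longest agreeing prefix is accepted, with the first disagreement corrected.

```python
# Toy sketch of speculative decoding (illustrative only; not DART):
# a cheap draft model proposes k tokens, the target model verifies them,
# and we keep the longest matching prefix plus one corrected/bonus token.

def draft_model(context, k):
    # Hypothetical cheap model: guesses the next k tokens from the last one.
    return [(context[-1] + i + 1) % 10 for i in range(k)]

def target_model(context):
    # Hypothetical exact model: the "correct" next token for a context.
    return (sum(context) * 7 + 3) % 10

def speculative_step(context, k=4):
    """Propose k draft tokens, verify each against the target model,
    and return the accepted tokens (at least one per step)."""
    proposed = draft_model(context, k)
    accepted = []
    for tok in proposed:
        expected = target_model(context + accepted)
        if tok == expected:
            accepted.append(tok)        # draft agreed: token accepted for free
        else:
            accepted.append(expected)   # mismatch: take target's token, stop
            break
    else:
        # All k drafts accepted; the verify pass yields one bonus token.
        accepted.append(target_model(context + accepted))
    return accepted

tokens = [3]
for _ in range(3):
    tokens += speculative_step(tokens)
```

Because the target model checks every accepted token, the output matches what greedy decoding with the target alone would produce; the speedup comes from verifying several draft tokens in one target pass.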
LLM Inference
New research introduces Yggdrasil, a tree-based speculative decoding architecture that bridges dynamic speculation with static runtime for faster LLM inference.
LLM Inference
A deep dive into LLM inference server architecture reveals the critical optimizations enabling real-time AI applications, from batching strategies to memory management techniques.
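One of the batching strategies such servers rely on can be sketched minimally: group waiting requests so the model runs once per batch instead of once per request. The names below (`run_model`, `MAX_BATCH`) are hypothetical placeholders, not any particular server's API.

```python
# Illustrative sketch of simple request batching for an inference server.
# Waiting requests are drained into groups of up to MAX_BATCH, amortizing
# the cost of each model forward pass across many requests.
from collections import deque

MAX_BATCH = 4  # hypothetical batch-size limit

def run_model(batch):
    # Stand-in for one batched forward pass over several prompts.
    return [f"echo:{prompt}" for prompt in batch]

def serve(requests):
    queue = deque(requests)
    responses = []
    while queue:
        # Drain up to MAX_BATCH waiting requests into a single model call.
        batch = [queue.popleft() for _ in range(min(MAX_BATCH, len(queue)))]
        responses.extend(run_model(batch))
    return responses

out = serve([f"req{i}" for i in range(10)])  # 10 requests -> 3 model calls
```

Production servers go further (continuous batching admits new requests mid-generation rather than waiting for a batch to finish), but the amortization principle is the same.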
LLM Inference
A deep dive into the key-value (KV) cache mechanism that enables fast language model inference, exploring the memory optimization strategies and architectural decisions that power modern AI systems, including video generation models.
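The mechanism itself is simple to sketch: during autoregressive decoding, each token's key and value projections are computed once and cached, so every new step only projects the newest token and attends over the stored history. The toy below uses pure Python with a hypothetical 4-dimensional head and a stand-in `project` function in place of learned weight matrices.

```python
# Minimal KV-cache sketch for autoregressive attention (toy dimensions;
# `project` is a hypothetical stand-in for learned Q/K/V projections).
import math

D = 4  # toy head dimension

def project(x, seed):
    # Deterministic stand-in for a learned linear projection.
    return [math.sin(seed * (i + 1) + x) for i in range(D)]

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, token):
        # Compute this token's K/V once and cache them; past entries
        # are never recomputed, which is the whole point of the cache.
        self.keys.append(project(token, seed=1.0))
        self.values.append(project(token, seed=2.0))
        q = project(token, seed=3.0)
        # Scaled dot-product attention over all cached keys.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(D)
                  for k in self.keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        probs = [w / z for w in weights]
        # Weighted sum of cached values.
        return [sum(p * v[i] for p, v in zip(probs, self.values))
                for i in range(D)]

cache = KVCache()
outputs = [cache.step(t) for t in [0.1, 0.5, 0.9]]
```

Without the cache, step t would recompute keys and values for all t previous tokens, making decoding quadratic in sequence length; with it, per-step projection work is constant and memory grows linearly, which is exactly the trade-off the memory-optimization strategies above manage.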