EAGLE 3.1 Fixes Attention Drift in LLM Speculative Decoding
EAGLE 3.1 introduces a refined speculative decoding algorithm that addresses attention drift in draft models, boosting LLM inference throughput without sacrificing output fidelity.