LLM Research
Sketch-and-Walk: New Sparse Attention Method Speeds Up LLM Inference
Researchers propose a two-phase sparse attention mechanism that scouts relevant tokens before full computation, promising significant efficiency gains for large language model inference.
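The headline only sketches the idea, so the snippet below is a minimal, hypothetical illustration of a generic two-phase scheme rather than the paper's actual Sketch-and-Walk algorithm: a cheap "scout" pass scores tokens approximately, and exact attention then runs only over the selected candidates. All function names, the random-projection sketch, and parameters such as `k_sketch` and `top_k` are assumptions for illustration.

```python
import numpy as np

def two_phase_sparse_attention(q, K, V, k_sketch=16, top_k=64):
    """Hypothetical two-phase sparse attention for a single query vector.

    Phase 1 ("scout"): score keys in a low-dimensional random sketch
    space and keep only the top-k candidate tokens.
    Phase 2 ("full"): run exact softmax attention over the survivors.
    """
    d = q.shape[-1]

    # Phase 1: cheap approximate dot products via a random projection.
    rng = np.random.default_rng(0)
    P = rng.standard_normal((d, k_sketch)) / np.sqrt(k_sketch)
    approx_scores = (K @ P) @ (q @ P)
    candidates = np.argsort(approx_scores)[-top_k:]  # most promising tokens

    # Phase 2: exact attention restricted to the selected tokens.
    scores = K[candidates] @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V[candidates]

# Usage: one query attending over 4096 cached tokens of width 128.
q = np.random.randn(128)
K = np.random.randn(4096, 128)
V = np.random.randn(4096, 128)
out = two_phase_sparse_attention(q, K, V)
```

The efficiency argument in this toy version is that the scout pass costs O(n * k_sketch) instead of O(n * d), and the exact pass touches only `top_k` tokens instead of all n; how the actual method selects and scores tokens may differ.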