LLM Quantization
FLRQ: Faster LLM Quantization via Low-Rank Matrix Sketching
FLRQ, a new quantization method, achieves up to 2.5x faster compression of large language models while maintaining accuracy, using flexible low-rank matrix approximation techniques.
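The summary does not spell out FLRQ's algorithm, but the general idea of combining quantization with a low-rank correction can be sketched as follows: quantize the weight matrix, then approximate the quantization residual with a truncated SVD. This is a minimal illustration of that pattern, not FLRQ itself; all function names and parameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_sym(W, bits=4):
    """Uniform symmetric quantization to a given bit-width (dequantized back to float)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / qmax
    return np.clip(np.round(W / scale), -qmax - 1, qmax) * scale

def low_rank_sketch(E, rank):
    """Best rank-r approximation of the residual via truncated SVD."""
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

W = rng.standard_normal((256, 256))   # stand-in for a weight matrix
W_q = quantize_sym(W, bits=4)         # coarse quantized approximation
E = W - W_q                           # quantization residual
L = low_rank_sketch(E, rank=16)       # cheap low-rank correction term
W_hat = W_q + L                       # combined approximation

err_q = np.linalg.norm(W - W_q) / np.linalg.norm(W)
err_hat = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(err_hat < err_q)  # the low-rank term strictly reduces reconstruction error here
```

Storing `W_q` plus the two thin factors of `L` costs far less than full precision, which is why low-rank terms pair naturally with aggressive quantization.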