LLM Quantization
FLRQ: Faster LLM Quantization via Low-Rank Matrix Sketching
FLRQ, a new quantization method, achieves up to 2.5x faster compression of large language models while maintaining accuracy, using flexible low-rank matrix approximation techniques.
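The summary does not spell out FLRQ's algorithm, but the general idea of combining quantization with a low-rank correction can be sketched as follows: quantize the weight matrix, then approximate the quantization residual with a truncated SVD. This is a minimal illustration of that pattern, not FLRQ itself; all function names and parameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_sym(W, bits=4):
    """Uniform symmetric quantization to a given bit-width (dequantized back to float)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / qmax
    return np.clip(np.round(W / scale), -qmax - 1, qmax) * scale

def low_rank_sketch(E, rank):
    """Best rank-r approximation of the residual via truncated SVD."""
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

W = rng.standard_normal((256, 256))   # stand-in for a weight matrix
W_q = quantize_sym(W, bits=4)         # coarse quantized approximation
E = W - W_q                           # quantization residual
L = low_rank_sketch(E, rank=16)       # cheap low-rank correction term
W_hat = W_q + L                       # combined approximation

err_q = np.linalg.norm(W - W_q) / np.linalg.norm(W)
err_hat = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(err_hat < err_q)  # the low-rank term strictly reduces reconstruction error here
```

Storing `W_q` plus the two thin factors of `L` costs far less than full precision, which is why low-rank terms pair naturally with aggressive quantization.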