LLM compression
Hierarchical Sparse Plus Low Rank: A New Approach to LLM Compression
New research introduces hierarchical sparse plus low rank compression for LLMs, combining structured sparsity with matrix decomposition for efficient model deployment.
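The paper's exact algorithm isn't reproduced here, but the general sparse-plus-low-rank idea it builds on is straightforward to sketch: approximate a weight matrix W as a low-rank term plus a sparse residual. Below is a minimal PyTorch sketch, assuming a truncated SVD for the low-rank part and magnitude-based selection for the sparse part; `sparse_plus_low_rank` and its parameters are illustrative names, not the paper's API:

```python
import torch

def sparse_plus_low_rank(W: torch.Tensor, rank: int, sparsity: float):
    """Approximate W as L + S: a rank-`rank` matrix plus a sparse residual
    keeping roughly a `sparsity` fraction of entries. Illustrative only."""
    # Low-rank term: truncated SVD of the weight matrix.
    U, s, Vh = torch.linalg.svd(W, full_matrices=False)
    L = U[:, :rank] @ torch.diag(s[:rank]) @ Vh[:rank, :]

    # Sparse term: keep the largest-magnitude entries of the residual.
    R = W - L
    k = max(1, int(sparsity * R.numel()))
    threshold = R.abs().flatten().topk(k).values.min()
    S = torch.where(R.abs() >= threshold, R, torch.zeros_like(R))
    return L, S

W = torch.randn(1024, 1024)
L, S = sparse_plus_low_rank(W, rank=32, sparsity=0.01)
rel_err = (W - (L + S)).norm() / W.norm()
print(f"relative reconstruction error: {rel_err:.3f}")
```

The appeal of this split is that each term compresses well on its own: the low-rank factors store O((m+n)·rank) values instead of m·n, and the sparse residual can be kept in a compressed sparse format.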
Learn how to reduce a 7-billion-parameter language model from ~14 GB to 4.5 GB using quantization, pruning, and knowledge distillation while maintaining accuracy.
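A quick sanity check on those sizes (our arithmetic, not stated in the source): 7B parameters at 16 bits per weight is about 14 GB, while 4-bit weights would be about 3.5 GB, leaving headroom for quantization scales, embeddings, and any layers kept at higher precision within a 4.5 GB budget. Here is a minimal sketch of symmetric per-row 4-bit post-training quantization in PyTorch; the function names are illustrative, and this is only one of the techniques the pipeline combines:

```python
import torch

def quantize_int4_per_row(W: torch.Tensor):
    """Symmetric per-row quantization to the signed 4-bit range [-8, 7]."""
    scale = W.abs().amax(dim=1, keepdim=True) / 7.0  # one scale per output row
    q = torch.clamp(torch.round(W / scale), min=-8, max=7).to(torch.int8)
    return q, scale  # in practice, two 4-bit values would be packed per byte

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

W = torch.randn(4096, 4096)
q, scale = quantize_int4_per_row(W)
rel_err = (W - dequantize(q, scale)).norm() / W.norm()
print(f"relative quantization error: {rel_err:.3f}")

# Back-of-the-envelope storage: 7e9 params * 4 bits = 3.5 GB for weights,
# plus per-row scales and fp16 embeddings, consistent with a ~4.5 GB target.
```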