HQP: Hybrid Quantization-Pruning for Edge AI Inference
New research combines sensitivity-aware quantization with pruning to enable ultra-low-latency AI inference on edge devices, which could change how generative models are deployed on mobile hardware.