Liquid AI
Liquid AI's LFM2.5-350M: Big Performance, Tiny Model
Liquid AI releases a 350M parameter model trained on 28 trillion tokens with scaled reinforcement learning, challenging assumptions about what compact models can achieve.
NVIDIA
NVIDIA releases a compact 4B parameter model combining Mamba and Transformer architectures for efficient local AI inference with 8K context support.
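The pattern behind such hybrids is straightforward to sketch: a stack of mostly linear-time SSM blocks with full self-attention interleaved every few layers for global recall. Below is a generic, hedged stand-in in PyTorch, not NVIDIA's actual layer recipe; the SSM block is stubbed with a causal depthwise convolution for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSMBlockStub(nn.Module):
    """Stand-in for a Mamba-style selective SSM block (O(n) in sequence
    length); a causal depthwise conv plays the role of the recurrence."""
    def __init__(self, dim):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=4, padding=3, groups=dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: [batch, seq, dim]
        h = self.conv(x.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return x + self.proj(F.silu(h))

class AttnBlock(nn.Module):
    """Full self-attention block, interleaved sparsely for exact recall."""
    def __init__(self, dim, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return x + out

class HybridStack(nn.Module):
    """Mostly SSM blocks, with one attention block every `attn_every` layers."""
    def __init__(self, dim=512, n_layers=12, attn_every=4):
        super().__init__()
        self.layers = nn.ModuleList(
            [AttnBlock(dim) if (i + 1) % attn_every == 0 else SSMBlockStub(dim)
             for i in range(n_layers)]
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
```

The appeal for local inference is that SSM blocks carry constant-size state during decoding, so only the sparse attention layers grow a KV cache across the 8K context window.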
LLM optimization
A new library called AirLLM enables running 70B parameter AI models on older laptops with limited RAM by processing layers sequentially rather than loading the entire model into memory.
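The core trick is easy to picture: keep only one layer's weights in memory at a time. The following is a minimal sketch of layer-streaming inference in plain PyTorch, illustrating the idea rather than AirLLM's actual API; the per-layer checkpoint files are a hypothetical on-disk layout:

```python
import gc
import torch

@torch.no_grad()
def layered_forward(hidden, layer_paths, device="cpu"):
    """Run hidden states through a model whose layers are saved as separate
    checkpoint files (hypothetical layout), one layer resident at a time."""
    for path in layer_paths:
        layer = torch.load(path, map_location=device)  # load ONE layer's weights
        hidden = layer(hidden)                         # apply it to the activations
        del layer                                      # discard the weights
        gc.collect()                                   # reclaim memory before the next layer
    return hidden
```

Peak memory is roughly one layer plus activations, so a 70B model can run where only a few gigabytes of RAM are free; the cost is heavy disk I/O on every forward pass, which makes generation slow.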
LLM optimization
New research introduces quantized KV cache persistence for running multi-agent LLM systems on resource-constrained edge hardware, enabling local AI agents without cloud dependency.
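The idea is that an agent's attention (KV) cache can be compressed and written out between turns, then restored instead of recomputing the prompt. Here is a minimal sketch of one plausible scheme, per-tensor symmetric int8 quantization with the cache persisted to disk; the paper's exact method may differ, and the file name is illustrative:

```python
import torch

def quantize_kv(kv: torch.Tensor):
    """Per-tensor symmetric int8 quantization of a K or V cache tensor."""
    scale = kv.abs().max().float().clamp(min=1e-8) / 127.0
    q = torch.round(kv.float() / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor, dtype=torch.float16):
    return (q.float() * scale).to(dtype)

# Persist one agent's cache so a later turn (or another agent) can reload it
# without re-running the prompt -- the "persistence" half of the idea.
kv = torch.randn(1, 8, 1024, 64, dtype=torch.float16)  # [batch, heads, seq, head_dim]
q, scale = quantize_kv(kv)
torch.save({"q": q, "scale": scale}, "agent0_kv.pt")   # half the fp16 footprint

state = torch.load("agent0_kv.pt")
kv_restored = dequantize_kv(state["q"], state["scale"])
```

Halving the cache footprint matters most in multi-agent settings, where each concurrent agent would otherwise hold its own full-precision cache in the edge device's limited memory.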
AI Hardware
Startup Taalas is challenging GPU dominance with hardwired AI chips designed specifically for inference, claiming 17,000 tokens per second of throughput in pursuit of ubiquitous AI deployment.
Edge AI
New research combines sensitivity-aware quantization and pruning to enable ultra-low-latency AI inference on edge devices, potentially transforming how generative models are deployed on mobile hardware.
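"Sensitivity-aware" generally means measuring how much each layer degrades under compression and spending the bit budget accordingly. Below is a hedged sketch of one common way to score layers, quantizing each Linear layer alone and recording the loss increase; the `loss_fn` and `batch` arguments and the 4-bit trial are illustrative assumptions, not the paper's exact criterion:

```python
import copy
import torch

def fake_quantize_(weight: torch.Tensor, bits: int = 4):
    """In-place uniform fake quantization of a weight tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().max().clamp(min=1e-8) / qmax
    weight.copy_(torch.round(weight / scale).clamp(-qmax, qmax) * scale)

@torch.no_grad()
def layer_sensitivities(model, loss_fn, batch, layer_names):
    """Loss increase when each named layer is quantized by itself;
    low scores mark layers that are safe to compress more aggressively."""
    base = loss_fn(model, batch).item()
    scores = {}
    for name in layer_names:
        trial = copy.deepcopy(model)  # perturb a copy, keep the original intact
        fake_quantize_(dict(trial.named_modules())[name].weight)
        scores[name] = loss_fn(trial, batch).item() - base
    return scores
```

The same scores can drive pruning: rank layers by sensitivity and remove the most weights (or channels) from the least sensitive layers first.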