Edge AI - SkrewAI

Liquid AI

Liquid AI's LFM2.5-350M: Big Performance, Tiny Model

Liquid AI releases a 350M-parameter model trained on 28 trillion tokens with scaled reinforcement learning, challenging assumptions about what compact models can achieve.

NVIDIA

NVIDIA Nemotron 3 Nano 4B: Hybrid Architecture for Edge AI

NVIDIA releases a compact 4B-parameter model that combines Mamba and Transformer architectures for efficient local AI inference with 8K context support.

LLM optimization

AirLLM: Running 70B Parameter Models on Consumer Laptops

A new library called AirLLM runs massive 70B-parameter AI models on old laptops with limited RAM by processing layers sequentially rather than loading the entire model into memory.
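
The layer-by-layer idea can be illustrated with a toy sketch. This is not AirLLM's API; the function names (`save_layers`, `layered_forward`), the JSON file layout, and the tiny ReLU network are all assumptions made up for illustration. The point is only that the forward pass reads one layer's weights from disk, applies it, and releases it before touching the next.

```python
import json
import os
import tempfile


def save_layers(layers, directory):
    """Write each layer's weight matrix to its own file (a stand-in for a sharded checkpoint)."""
    paths = []
    for i, weights in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.json")
        with open(path, "w") as f:
            json.dump(weights, f)
        paths.append(path)
    return paths


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def layered_forward(paths, x):
    """Run the forward pass while holding only one layer's weights in memory at a time."""
    for path in paths:
        with open(path) as f:
            weights = json.load(f)  # load just this layer
        x = relu(matvec(weights, x))
        del weights  # release it before loading the next layer
    return x


def demo():
    # Hypothetical two-layer network with 2x2 weight matrices.
    layers = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[2.0, -1.0], [0.5, 0.5]],
    ]
    with tempfile.TemporaryDirectory() as directory:
        paths = save_layers(layers, directory)
        return layered_forward(paths, [1.0, 2.0])
```

Peak memory in this sketch is bounded by the largest single layer rather than by the whole model, at the cost of re-reading weights from disk on every pass.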

LLM optimization

Persistent Q4 KV Cache Enables Multi-Agent LLM on Edge

New research introduces quantized KV cache persistence for running multi-agent LLM systems on resource-constrained edge hardware, enabling local AI agents without cloud dependency.
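
A rough sketch of what persisting a 4-bit cache could look like follows. It is not the format from the research; the single per-tensor scale, the 12-byte header, and every function name here are assumptions for illustration. Values are quantized to 4-bit codes, packed two per byte, written to disk, and dequantized on reload.

```python
import os
import struct
import tempfile


def quantize_q4(values):
    """Symmetric 4-bit quantization: one shared scale, integer codes in [-8, 7]."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 7.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, codes


def pack_nibbles(codes):
    """Pack two 4-bit codes per byte, offset by 8 so each nibble is unsigned."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count
    return bytes(((a + 8) << 4) | (b + 8) for a, b in zip(codes[0::2], codes[1::2]))


def unpack_nibbles(data, count):
    """Reverse pack_nibbles and drop any padding code."""
    codes = []
    for byte in data:
        codes.append((byte >> 4) - 8)
        codes.append((byte & 0x0F) - 8)
    return codes[:count]


def save_cache(path, values):
    """Persist the quantized values: header (scale, count) followed by packed codes."""
    scale, codes = quantize_q4(values)
    with open(path, "wb") as f:
        f.write(struct.pack("<dI", scale, len(values)))
        f.write(pack_nibbles(codes))


def load_cache(path):
    """Read the file back and dequantize to floats."""
    with open(path, "rb") as f:
        scale, count = struct.unpack("<dI", f.read(12))
        codes = unpack_nibbles(f.read(), count)
    return [c * scale for c in codes]


def roundtrip(values):
    """Save to a temporary file, reload, and report the restored values and file size."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "kv_cache.bin")
        save_cache(path, values)
        return load_cache(path), os.path.getsize(path)
```

Each restored value differs from the original by at most half the scale. Practical 4-bit schemes typically use one scale per small block of values rather than one per tensor, which this sketch omits for brevity.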

AI Hardware

Taalas Hardwired AI Chips Hit 17K Tokens Per Second

Startup Taalas is challenging GPU dominance with hardwired AI chips designed specifically for inference, claiming throughput of 17,000 tokens per second and targeting ubiquitous AI deployment.

Edge AI

HQP: Hybrid Quantization-Pruning for Edge AI Inference

New research combines sensitivity-aware quantization and pruning to enable ultra-low-latency AI inference on edge devices, potentially transforming how generative models are deployed on mobile hardware.
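
As a minimal illustration of combining the two steps, the sketch below prunes and then quantizes a flat list of weights. It is not the HQP method: weight magnitude stands in as a crude proxy for sensitivity, the 8-bit symmetric quantizer is a generic choice, and all names are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (magnitude as a sensitivity proxy)."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric 8-bit quantization with one shared scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    return scale, [round(w / scale) for w in weights]


def compress(weights, sparsity):
    """Prune first, then quantize what remains."""
    pruned = prune_by_magnitude(weights, sparsity)
    return quantize_int8(pruned)


def decompress(scale, codes):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]
```

For example, `compress([0.5, -0.01, 0.02, -1.27], sparsity=0.5)` zeroes the two smallest weights and maps the survivors to integer codes; a sensitivity-aware scheme would choose what to prune and how many bits to keep per layer from a measured effect on accuracy instead.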