LLM Optimization
LLM Agent Automates Hardware-Aware Model Quantization
New research introduces an LLM-based agent that automatically selects optimal quantization strategies for deploying large language models across diverse hardware platforms.
LLM Optimization
A deep dive into the engineering fundamentals behind efficient large language model inference, covering memory optimization, the underlying mathematical principles, and the performance metrics that power modern generative AI systems.