mechanistic interpretability
Mechanistic Tracing Reveals How LLMs Navigate Pain-Pleasure Decisions
New research goes beyond behavioral analysis to trace the internal mechanisms LLMs use when weighing competing reward signals, offering insights into AI decision-making at the circuit level.