mechanistic interpretability
SALVE: New Technique Enables Mechanistic Control of Neural Networ
Researchers introduce SALVE, combining sparse autoencoders with latent vector editing for precise mechanistic control over neural network behaviors and outputs.