LLM Alignment
Study Questions Role of Diversity in LLM Moral Alignment
New research examines whether diversity in training data actually improves moral reasoning in LLMs when using reinforcement learning with verifiable rewards (RLVR), challenging common assumptions about alignment approaches.
LLM Alignment
Researchers introduce GRADE, a technique that replaces traditional policy gradient methods with direct backpropagation for aligning large language models, potentially offering more efficient training.
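To make the distinction concrete: classical policy-gradient methods estimate the gradient of expected reward from sampled actions via the score function (REINFORCE), whereas a differentiable reward lets you backpropagate through it exactly. The sketch below is a generic toy illustration of that contrast on a three-action softmax policy, not GRADE's actual algorithm; the payoff values and function names are invented for the example.

```python
import math
import random

random.seed(0)

# Toy setup: a softmax policy over three actions with fixed, known payoffs.
# (Illustrative values only -- not from the GRADE paper.)
PAYOFFS = [1.0, 2.0, 0.5]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def direct_gradient(logits):
    """Exact gradient of expected reward w.r.t. the logits.

    This is what backpropagation computes when the reward is a
    differentiable function of the policy: d E[r] / d logit_i
    = p_i * (payoff_i - E[r]).
    """
    p = softmax(logits)
    expected = sum(pi * r for pi, r in zip(p, PAYOFFS))
    return [pi * (r - expected) for pi, r in zip(p, PAYOFFS)]

def reinforce_gradient(logits, n_samples=200_000):
    """Score-function (policy-gradient) estimate of the same quantity.

    Averages r(a) * grad log pi(a) over sampled actions, where
    grad log pi(a) w.r.t. logit_i is (1[a == i] - p_i).
    """
    p = softmax(logits)
    grad = [0.0] * len(p)
    for _ in range(n_samples):
        a = random.choices(range(len(p)), weights=p)[0]
        r = PAYOFFS[a]
        for i in range(len(p)):
            grad[i] += r * ((1.0 if i == a else 0.0) - p[i])
    return [g / n_samples for g in grad]

logits = [0.1, -0.2, 0.3]
g_exact = direct_gradient(logits)
g_sampled = reinforce_gradient(logits)
# The sampled estimate converges to the exact gradient, but only the
# direct version is noise-free -- the efficiency argument for replacing
# policy gradients with backpropagation when the reward permits it.
```

The catch, and the reason policy gradients exist at all, is that a sampled reward signal is usually not differentiable in the policy parameters; methods in this vein must make the reward pathway differentiable before direct backpropagation applies.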
LLM Alignment
Researchers introduce ECLIPTICA, a framework using Contrastive Instruction-Tuned Alignment (CITA) to enable dynamic switching between aligned and unaligned LLM behaviors for safety research.