DUET — ICLR 2026 Accepted

Oct 2025 · 1 min read

DUET introduces an efficiently contextualized teacher that demonstrates refusals on unsafe knowledge, then distills that behavior into a student model using Top-K logit alignment in place of a full-vocabulary KL objective (a sketch of this loss follows the list below). The approach:

  • Achieves precise forgetting with minimal utility loss on MUSE, TOFU, and WMDP.
  • Uses only 2,233 tokens of training data (~1/645 of the corpus) yet improves both ROUGE-Retain and MMLU by 10% while lowering leakage by 4%.
  • Remains robust under reverse prompts and QA→continuation format shifts, keeping leakage near 6–7 even when the teacher's leakage spikes to 37+.
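
For intuition, here is a minimal sketch of what a Top-K logit-alignment loss could look like in PyTorch. The function name `topk_align_loss`, the choice of K, and the temperature scaling are illustrative assumptions, not DUET's actual implementation; the idea is simply to match the student to the teacher only on the teacher's K most likely tokens instead of over the full vocabulary.

```python
import torch
import torch.nn.functional as F

def topk_align_loss(
    student_logits: torch.Tensor,  # (batch, seq, vocab)
    teacher_logits: torch.Tensor,  # (batch, seq, vocab)
    k: int = 64,
    temperature: float = 1.0,
) -> torch.Tensor:
    """KL-style distillation restricted to the teacher's top-K tokens.

    Hypothetical sketch: both distributions are renormalized over the
    teacher's K highest-scoring vocabulary entries, so the student is
    pulled toward the teacher only where the teacher puts real mass.
    """
    # Teacher's K most likely tokens (and their logits) at each position.
    topk_vals, topk_idx = teacher_logits.topk(k, dim=-1)

    # Student logits gathered at the same vocabulary positions.
    student_topk = student_logits.gather(-1, topk_idx)

    # Renormalize over the restricted K-token support.
    teacher_probs = F.softmax(topk_vals / temperature, dim=-1)
    student_logp = F.log_softmax(student_topk / temperature, dim=-1)

    # Mean per-position forward KL(teacher || student); the T^2 factor
    # is the standard distillation gradient rescaling.
    return (
        F.kl_div(
            student_logp.reshape(-1, k),
            teacher_probs.reshape(-1, k),
            reduction="batchmean",
        )
        * temperature**2
    )
```

Restricting the loss to the top-K support concentrates the gradient on the tokens the teacher actually prefers (e.g., refusal phrasings) and keeps the per-position comparison cheap relative to a full-vocabulary KL.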

Role: first author. I designed the unlearning objective, scaling recipe, and evaluation harness.