1 comment

  • dapurv5 12 hours ago
    We've analyzed how popular watermarking methods (KGW, Gumbel) affect language model alignment, revealing critical tradeoffs that impact truthfulness, safety, and helpfulness. We propose "Alignment Resampling," a simple method to mitigate this alignment degradation, supported by theoretical insights and empirical results.

    Paper: https://huggingface.co/papers/2506.04462

    Feedback appreciated!