Skip to content

Apply KL-divergence epsilon floor symmetrically to p and q#184

Merged
MaxGhenis merged 1 commit intomainfrom
fix/kl-epsilon-symmetric
Apr 17, 2026
Merged

Apply KL-divergence epsilon floor symmetrically to p and q#184
MaxGhenis merged 1 commit intomainfrom
fix/kl-epsilon-symmetric

Conversation

@MaxGhenis
Copy link
Copy Markdown
Contributor

Summary

Fixes finding #12. kl_divergence previously clipped only q_receiver by epsilon=1e-10, so a category present in p but absent in q contributed p * log(p/eps) (a large finite value that depends arbitrarily on epsilon), while a category present in q but absent in p contributed rel_entr(0, q) = 0 — a free pass. The sign of the divergence bias depended entirely on which side had the missing mass.

Now both p_donor and q_receiver get the same epsilon floor and are renormalised back to probability vectors.

Test plan

  • test_kl_divergence_symmetric_epsilon_handling: a category absent from the donor but present in the receiver contributes positively to KL and the result is finite.
  • Full tests/test_metrics.py passes (53/53)

Previously kl_divergence clipped only q_receiver by epsilon=1e-10:
    q_receiver = np.maximum(q_receiver, epsilon)
    return np.sum(rel_entr(p_donor, q_receiver))

This was asymmetric: a category present in p but absent in q
contributed p * log(p / eps) (a large finite value that depends
arbitrarily on epsilon), while a category present in q but absent in p
contributed rel_entr(0, q) = 0 (a free pass). The sign of the
divergence bias depended entirely on which side had the missing mass.

Now apply the epsilon floor to BOTH p_donor and q_receiver, then
renormalise so they are still valid probability vectors. KL behaves
consistently regardless of which side has a missing category.

Tests
- test_kl_divergence_symmetric_epsilon_handling: a category absent
  from the donor but present in the receiver contributes positively
  to KL and the result is finite.
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
microimpute-dashboard Ready Ready Preview, Comment Apr 17, 2026 0:55am

Copy link
Copy Markdown
Contributor Author

@MaxGhenis MaxGhenis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KL symmetric epsilon verified:

  • Both p_donor and q_receiver are floored with epsilon=1e-10 and renormalised back to probability vectors.
  • test_kl_divergence_symmetric_epsilon_handling asserts strictly positive and finite KL when donor lacks a category the receiver has (previously this was a free pass: rel_entr(0, q) = 0).

Trivial, correct. CI all green. Mergeable. LGTM.

@MaxGhenis MaxGhenis merged commit 079403d into main Apr 17, 2026
7 checks passed
@MaxGhenis MaxGhenis deleted the fix/kl-epsilon-symmetric branch April 17, 2026 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant