Fix flaky normalize_distribution test (float32 tolerance) #228
Merged
The test compares normalize_distribution output (float32, from the TF pose body) against a numpy reference (float64) using np.allclose with its default atol=1e-8. On near-zero normalized values, float32 rounding (~1e-7..1e-5) exceeds that tolerance, so the test flaked ~2% of runs depending on the random data. Reproduced at ~6/300 runs; root cause is purely the dtype mismatch (max abs diff ~1e-5, never a logic error). Use atol=1e-4, well above float32 noise and still orders of magnitude tighter than any real regression in z-score output.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
pose_tensorflow_test.py::test_pose_tf_posebody_normalize_distribution_eager_mode_correct_result flakes in ~2% of runs (it's what failed on #226's PR CI).
It compares normalize_distribution output (float32, from the TensorFlow pose body) against a NumPy reference computed in float64, using np.allclose with its default atol=1e-8. On near-zero normalized values the default relative term (rtol=1e-5) contributes almost nothing, so float32 rounding error (~1e-7…1e-5) exceeds the tolerance and the assertion fails, depending on the random mask/data drawn that run.
Investigation
Reproduced locally at 6/300 runs. Captured a failing case: max absolute difference was ~1e-5, on an expected value of ~0.0018 — never a logic error, purely dtype precision.
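For reference, here is a minimal sketch of why a difference of that size trips the default tolerance. The numbers are illustrative only: they reuse the magnitudes quoted above (expected ~0.0018, absolute difference ~1e-5) and are not the captured arrays from the failing run.

```python
import numpy as np

# Illustrative values only: magnitudes from the failing case described above,
# not the actual arrays captured during the investigation.
expected = np.array([0.0018], dtype=np.float64)   # float64 reference
actual = (expected + 1e-5).astype(np.float32)     # float32 result, off by ~1e-5

abs_diff = float(np.max(np.abs(actual.astype(np.float64) - expected)))

# np.allclose checks |actual - expected| <= atol + rtol * |expected|,
# with defaults atol=1e-8 and rtol=1e-5.
default_bound = 1e-8 + 1e-5 * float(np.abs(expected[0]))

passes_default = bool(np.allclose(actual, expected))
passes_loose = bool(np.allclose(actual, expected, atol=1e-4))

print(f"abs diff      = {abs_diff:.2e}")
print(f"default bound = {default_bound:.2e}")
print(f"passes with default atol: {passes_default}")
print(f"passes with atol=1e-4:    {passes_loose}")
```

Near zero the allowed error is dominated by atol, which is why only the small normalized values trigger the failure.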
act dtype float32, exp dtype float64.
Tolerance sweep over 5000 trials each:
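A sweep of this kind could be set up roughly as follows. This is only a sketch: zscore is a hypothetical NumPy stand-in for the normalization (not the TensorFlow pose body), the data distribution, trial count, and atol grid are assumptions, and it makes no attempt to reproduce the numbers measured for this PR.

```python
import numpy as np

def zscore(x, dtype):
    """Hypothetical stand-in for the normalization: z-score along axis 0."""
    x = x.astype(dtype)
    return (x - x.mean(axis=0)) / x.std(axis=0)

def sweep(atols, trials=200, seed=0):
    """Count, per atol, how many trials fail np.allclose(float32, float64)."""
    rng = np.random.default_rng(seed)
    failures = {atol: 0 for atol in atols}
    for _ in range(trials):
        # Assumed data: a large offset relative to the spread, so the float32
        # mean subtraction loses precision.
        data = rng.normal(loc=500.0, scale=50.0, size=(64, 3))
        actual = zscore(data, np.float32)
        expected = zscore(data, np.float64)
        # Every atol is checked against the same trial, so the failure count
        # can only stay equal or drop as atol grows.
        for atol in atols:
            if not np.allclose(actual, expected, atol=atol):
                failures[atol] += 1
    return failures

results = sweep([1e-8, 1e-6, 1e-4])
for atol, count in results.items():
    print(f"atol={atol:g}: {count}/200 failing trials")
```

Checking every tolerance against the same random draws keeps the comparison between tolerances fair, since each one sees identical inputs.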
Fix
Use atol=1e-4: comfortably above float32 noise, yet still 4 orders of magnitude tighter than any real regression (the output is z-scores, O(1)). 30 reruns of the test: 0 failures.
🤖 Generated with Claude Code