Fix flaky normalize_distribution test (float32 tolerance) #228

Merged
AmitMY merged 1 commit into master from fix-flaky-normalize-distribution on Jun 20, 2026

@AmitMY AmitMY commented Jun 20, 2026

Collaborator

Problem

pose_tensorflow_test.py::test_pose_tf_posebody_normalize_distribution_eager_mode_correct_result fails in roughly 2% of runs (it is the test that failed in CI on #226).

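It compares the normalize_distribution output (float32, from the TensorFlow pose body) against a NumPy reference computed in float64, using np.allclose with its default atol=1e-8. np.allclose accepts a difference of up to atol + rtol * |expected|, and with the default rtol=1e-5 the relative term becomes negligible for near-zero expected values, so the absolute tolerance dominates. On those values, float32 rounding error (~1e-7 to 1e-5) exceeds the tolerance and the assertion fails, depending on the random mask and data drawn in that run.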
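To make the failure mode concrete, here is a minimal sketch of how the same absolute error is judged differently by np.allclose depending on the magnitude of the expected value. The helper `passes` and the constant `ABS_ERROR` are hypothetical names introduced for illustration; the numbers 1e-5 and 0.0018 are the ones reported in this PR.

```python
import numpy as np

ABS_ERROR = 1e-5  # float32-level discrepancy reported in this PR


def passes(expected_value, atol=1e-8, rtol=1e-5):
    """Check whether an absolute error of ABS_ERROR is accepted by np.allclose.

    np.allclose accepts |actual - expected| <= atol + rtol * |expected|.
    """
    expected = np.array([expected_value], dtype=np.float64)
    actual = expected + ABS_ERROR
    return bool(np.allclose(actual, expected, rtol=rtol, atol=atol))


# Near-zero expected value: the rtol term is ~1.8e-8, so atol decides the outcome.
near_zero_default = passes(0.0018)
# O(1) expected value: the rtol term alone (~1e-5) absorbs the same error.
order_one_default = passes(1.0)
# Same near-zero case with the tolerance chosen in this PR.
near_zero_fixed = passes(0.0018, atol=1e-4)

print(near_zero_default, order_one_default, near_zero_fixed)
```

The near-zero case is rejected under the defaults and accepted with atol=1e-4, while the O(1) case is accepted either way, which is why only some random draws trip the assertion.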

Investigation

Reproduced locally in 6 of 300 runs. In a captured failing case, the maximum absolute difference was ~1e-5 on an expected value of ~0.0018: purely a dtype precision issue, never a logic error. The actual output was float32 and the expected reference was float64.

Tolerance sweep over 5000 trials each:

| atol | fails |
| --- | --- |
| default (1e-8) | flaky |
| 1e-6 | 4/5000 |
| 5e-6 | 0/5000 |
| 1e-5 | 0/5000 (worst diff spiked ~1.1e-5 in another sweep) |
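A sweep of this kind can be sketched as follows. This is not the script used for the numbers above: `zscore` is a hypothetical stand-in for normalize_distribution (a plain z-score without masking), the data is synthetic, and the trial count is reduced, so the failure counts it prints will not match the table.

```python
import numpy as np

ATOLS = [1e-8, 1e-6, 5e-6, 1e-5, 1e-4]
TRIALS = 200  # reduced from the 5000 used in the PR to keep the sketch fast


def zscore(x, dtype):
    """Hypothetical stand-in for normalize_distribution: z-score along axis 0."""
    x = x.astype(dtype)
    return (x - x.mean(axis=0)) / x.std(axis=0)


def sweep(atols, trials=TRIALS, seed=0):
    """Count, per atol, how many random trials fail the float32 vs float64 check."""
    rng = np.random.default_rng(seed)
    fails = {atol: 0 for atol in atols}
    for _ in range(trials):
        data = rng.normal(size=(64, 3))
        actual = zscore(data, np.float32)    # float32 path, like the TF pose body
        expected = zscore(data, np.float64)  # float64 reference
        # Reuse the same draw for every tolerance so the counts are comparable.
        for atol in atols:
            if not np.allclose(actual, expected, atol=atol):
                fails[atol] += 1
    return fails


fails = sweep(ATOLS)
for atol in ATOLS:
    print(f"atol={atol:g}: {fails[atol]}/{TRIALS} failures")
```

Evaluating every tolerance on the same random draw guarantees that the failure count never increases as atol grows, which makes the threshold at which failures vanish easy to read off.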

Fix

Use atol=1e-4: comfortably above float32 noise, yet still 4 orders of magnitude below the O(1) scale of the z-score output, so any real regression would still be caught. 30 reruns of the test: 0 failures.
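As a rough check of that trade-off, the sketch below shows atol=1e-4 accepting a float32 result while still rejecting a simulated regression. Both `zscore` and the "missing division by std" bug are hypothetical illustrations, not code from the library or its test suite.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=2.0, size=(100, 3))


def zscore(x):
    # Hypothetical stand-in for normalize_distribution, not the library's code.
    return (x - x.mean(axis=0)) / x.std(axis=0)


expected = zscore(data)                     # float64 reference
actual = zscore(data.astype(np.float32))    # float32 result, as the TF path produces
regressed = data - data.mean(axis=0)        # simulated bug: division by std is missing

# float32 rounding noise sits well inside the relaxed tolerance.
float32_ok = bool(np.allclose(actual, expected, atol=1e-4))
# A logic error shifts z-scores by O(1), far outside the tolerance.
regression_caught = not np.allclose(regressed, expected, atol=1e-4)

print(float32_ok, regression_caught)
```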

🤖 Generated with Claude Code

The test compares normalize_distribution output (float32, from the TF pose
body) against a numpy reference (float64) using np.allclose with its default
atol=1e-8. On near-zero normalized values, float32 rounding (~1e-7..1e-5)
exceeds that tolerance, so the test flaked ~2% of runs depending on the
random data.

Reproduced at ~6/300 runs; root cause is purely the dtype mismatch (max abs
diff ~1e-5, never a logic error). Use atol=1e-4 — well above float32 noise,
still orders of magnitude tighter than any real regression in z-score output.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@AmitMY AmitMY merged commit 980a16e into master Jun 20, 2026
8 checks passed