Avoid dropping metrics when forceFlush() races with the periodic export#8437
Open
mahitha-ada wants to merge 1 commit into
Open
Avoid dropping metrics when forceFlush() races with the periodic export#8437mahitha-ada wants to merge 1 commit into
mahitha-ada wants to merge 1 commit into
Conversation
|
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8437 +/- ##
============================================
+ Coverage 90.96% 91.00% +0.04%
- Complexity 7809 7814 +5
============================================
Files 892 892
Lines 23702 23711 +9
Branches 2361 2363 +2
============================================
+ Hits 21561 21579 +18
+ Misses 1420 1411 -9
Partials 721 721 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
PeriodicMetricReader.forceFlush() delegated to Scheduled.doRun(), which acquires the exportAvailable flag with compareAndSet(true, false). When a force flush happened while the periodic export (or a previous force flush) was still in progress, doRun() took the else branch, logged 'Exporter busy. Dropping metrics.' and returned a failed result -- silently dropping the metrics the caller asked to flush. Make forceFlush() wait for the in-flight export to complete and then retry, mirroring how shutdown() already uses flushInProgress to wait for an in-flight export before its final collection. doRun() is split into tryDoRun(), which returns null when an export is already in progress, and doRun(), which preserves the previous drop-and-fail behavior for the periodic schedule (it retries on the next tick). forceFlush() uses tryDoRun() and, on contention, chains off flushInProgress to retry. Adds a regression test that fails without this change: a forceFlush() racing an in-flight export now waits and exports the latest metrics instead of dropping them. Fixes open-telemetry#8433
4d50446 to
54dafb3
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Avoid dropping metrics when
forceFlush()races with the periodic exportFixes #8433
What's the problem this PR is trying to solve?
PeriodicMetricReader.forceFlush()delegates toScheduled.doRun(), which acquires the export slot viaexportAvailable.compareAndSet(true, false). When a force flush happens while the periodic export (or a previous force flush) is still in progress,doRun()takes theelsebranch:So the force flush logs "Exporter busy. Dropping metrics." and returns a failed result — silently dropping the very metrics the caller asked to flush. The reporter saw this happen on a recurring basis (~0.2% of force flushes on a health-check path that flushes every few seconds), with no exception surfaced to the
forceFlush()API.What is the proposed solution?
Make
forceFlush()wait for the in-flight export to finish and then retry, rather than dropping. This mirrors the patternshutdown()already uses: it joins onflushInProgressbefore performing its final collection so it doesn't lose the last batch.Scheduled.doRun()is split into:tryDoRun()— returnsnullwhen an export is already in progress (instead of dropping).doRun()— preserves the previous drop-and-fail behavior for the periodic schedule (which simply retries on the next tick, so dropping a contended periodic collection is harmless).forceFlush()callstryDoRun(). On contention (null), it chains offflushInProgress.whenComplete(...)and retries, so the flush reflects the latest metrics.No public API changes; the periodic path behavior is unchanged.
Tests
Adds
forceFlush_whileExportInFlight_waitsAndExportsLatest, which starts a blocking export, fires a secondforceFlush()that collides with it, and asserts the second flush:This test fails against the current code (the second flush fails immediately with "Dropping metrics") and passes with the fix. All 18 existing
PeriodicMetricReaderTestcases continue to pass (19 total).