test: Fix flaky BasicAuthMSQTest #19593
Open
amaechler wants to merge 7 commits into
FrankChen021
left a comment
Member
I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.
Reviewed 1 of 1 changed files.
This is an automated review by Codex GPT-5.5
Basic-auth state propagates from the Coordinator to other services asynchronously: an async push from the Coordinator plus a poll on each service every druid.auth.basic.common.pollingPeriod. The Broker never reads the security metadata store directly, so there is a window after a security API call during which its auth cache is stale.

BasicAuthMSQTest creates the test user/role in @BeforeEach and grants permissions in each test body, then immediately submits an MSQ task to a Broker. When the request beats the propagation, the test sees:

- 401 Unauthorized instead of the expected 403 Forbidden, when the newly created user has not yet propagated (authentication), and
- a transient 403 Forbidden in the positive tests, before the granted permission has propagated (authorization).

Retry the task submission while it fails with these transient auth errors so the assertions only run once the Broker's auth cache reflects the test setup. Other failures are not retried, so real errors still fail fast.
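The retry described above could be sketched roughly as follows. This is not the code from the PR: `TransientAuthRetry`, `retryWhile`, `messageInChainContains`, the attempt limit, and the sleep interval are all hypothetical names and values chosen for illustration, and the sketch uses only the JDK rather than Druid's test utilities. The predicate is passed in because the set of transient errors differs per test: the negative tests expect a final `403 Forbidden`, so only `401 Unauthorized` is transient there.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper, not part of Druid: retries an action only while it
// fails with an error the caller classifies as transient.
final class TransientAuthRetry {

    /** True if any exception in the cause chain has a message containing the fragment. */
    static boolean messageInChainContains(Throwable error, String fragment) {
        for (Throwable cur = error; cur != null; cur = cur.getCause()) {
            String message = cur.getMessage();
            if (message != null && message.contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    /** Runs the action, retrying while it throws a transient error, up to maxAttempts calls. */
    static <T> T retryWhile(Supplier<T> action, Predicate<Throwable> isTransient,
                            int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                // Non-transient failures and exhausted budgets fail immediately.
                if (attempt >= maxAttempts || !isTransient.test(e)) {
                    throw e;
                }
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting to retry", ie);
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated submission: the first two calls fail as if the user had not propagated yet.
        String taskId = retryWhile(
            () -> {
                if (calls[0]++ < 2) {
                    throw new RuntimeException("submit failed",
                        new IllegalStateException("401 Unauthorized"));
                }
                return "task-1";
            },
            e -> messageInChainContains(e, "401 Unauthorized"),
            10, 50L);
        System.out.println(taskId + " after " + calls[0] + " attempts");
    }
}
```

Bounding the number of attempts keeps a genuine misconfiguration from hanging the test, and rethrowing anything the predicate rejects preserves the fail-fast behavior for real errors.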
Replace the hand-rolled messageContains cause-chain walk with the ExceptionMatcher already used for the assertion, so the retry predicate and the assertion inspect the exception through the same mechanism.
a78fbab to
dfd90f3
Description
`BasicAuthMSQTest` is intermittently flaky: a test occasionally fails with `401 Unauthorized` instead of the expected `403 Forbidden`. The permission updates in the tests are eventually consistent: they propagate to other services (like the Broker) asynchronously, so the MSQ task in the test can reach the Broker before its auth cache has caught up.
Fix
Retry the task submission while it fails with a transient auth error, so the assertions only run once the Broker's auth cache reflects the test setup. Other failures are not retried, so genuine errors still fail fast. This follows the retry-on-propagation pattern already used by sibling tests (e.g. `TLSTest`).

Verified by compiling, running checkstyle, and running the test; a fault-injection run that forces a transient `401` then `403` confirms the retries fire and all four tests recover.

Analysis and implementation done with the help of Claude Code.
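The description does not show how the fault-injection run was set up. One hypothetical way to force a transient `401` followed by a `403` is to wrap the submission in a decorator that throws a scripted sequence of errors before delegating; `FaultInjectingSubmitter` and everything inside it are assumed names for illustration, not code from the PR or from Druid.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical test-only decorator: throws each scripted fault once, in order,
// then delegates every later call to the real submission.
final class FaultInjectingSubmitter<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private final Deque<RuntimeException> faults;
    private int calls;

    FaultInjectingSubmitter(Supplier<T> delegate, RuntimeException... faults) {
        this.delegate = delegate;
        this.faults = new ArrayDeque<>(Arrays.asList(faults));
    }

    @Override
    public T get() {
        calls++;
        RuntimeException fault = faults.poll(); // null once the script is exhausted
        if (fault != null) {
            throw fault;
        }
        return delegate.get();
    }

    /** Number of times get() has been invoked, including failed calls. */
    int calls() {
        return calls;
    }

    public static void main(String[] args) {
        FaultInjectingSubmitter<String> submitter = new FaultInjectingSubmitter<>(
            () -> "task-1",
            new RuntimeException("401 Unauthorized"),
            new RuntimeException("403 Forbidden"));

        String taskId = null;
        // Minimal stand-in for the retry loop: keep calling until a result arrives.
        for (int attempt = 0; attempt < 5 && taskId == null; attempt++) {
            try {
                taskId = submitter.get();
            } catch (RuntimeException e) {
                System.out.println("transient failure: " + e.getMessage());
            }
        }
        System.out.println(taskId + " after " + submitter.calls() + " calls");
    }
}
```

Counting calls lets a test assert that the retries actually fired rather than only that the final result was correct.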
This PR has: