chore(auth): Address remaining Regional Access Boundary feedback #12867
vverman wants to merge 4 commits into googleapis:regional-access-boundaries
Code Review
This pull request refactors the RegionalAccessBoundary and RegionalAccessBoundaryManager classes to improve resource management and testability. Key changes include replacing environment-variable-based feature toggling with a ThreadLocal mechanism for tests, migrating from manual thread creation to a bounded ExecutorService for asynchronous refresh tasks to prevent resource exhaustion, and ensuring HTTP responses are properly disconnected in a finally block. Additionally, the test suite has been migrated to JUnit 5, and several tests were cleaned up by removing deprecated environment provider mocks. I have no feedback to provide.
    // Unbounded thread creation is discouraged in library code to avoid resource
    // exhaustion. A shared, bounded executor service ensures a hard limit (5)
    // on concurrent refresh tasks, while threadCount provides unique names
    // for easier debugging.
    private static final AtomicInteger threadCount = new AtomicInteger(0);
    private static final ExecutorService EXECUTOR =
        Executors.newFixedThreadPool(
            5,
            r -> {
              Thread t = new Thread(r, "RAB-refresh-" + threadCount.getAndIncrement());
              t.setDaemon(true);
              return t;
            });
The tradeoff we are making is between:
- Executor - Possible memory leak where the executor and the threads cannot be released as the auth library does not have a lifecycle.
- Thread - Possible unbounded thread creation (note: we aim to limit the number of threads spawned per credential, but it may be possible to create an unbounded number of credentials).
I don't think there will be a perfect solution. As a middle ground, is it possible to call allowCoreThreadTimeOut() and set a keep-alive of about an hour? A single credential would effectively end up invoking new Thread() in the pool each time, since the pool should sit idle for the other ~5 hours between RAB calls. If multiple credentials need RAB calls, the executor should be able to reuse the existing threads, since they will not have been idle long enough to time out. WDYT?
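For reference, the suggestion above could look roughly like the following. This is a minimal sketch, not the PR's actual code: the class name RefreshExecutorSketch and the create() helper are hypothetical, and the unbounded queue simply mirrors the original fixed pool.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical holder class used only for illustration. */
final class RefreshExecutorSketch {
  private static final AtomicInteger THREAD_COUNT = new AtomicInteger(0);

  static final ThreadPoolExecutor EXECUTOR = create();

  static ThreadPoolExecutor create() {
    // Daemon threads with unique names, as in the original snippet.
    ThreadFactory factory =
        r -> {
          Thread t = new Thread(r, "RAB-refresh-" + THREAD_COUNT.getAndIncrement());
          t.setDaemon(true);
          return t;
        };
    ThreadPoolExecutor executor =
        new ThreadPoolExecutor(
            5, // corePoolSize
            5, // maximumPoolSize: hard limit on concurrent refresh tasks
            1, // keepAliveTime
            TimeUnit.HOURS, // unit for keepAliveTime
            new LinkedBlockingQueue<>(), // unbounded work queue, same as newFixedThreadPool
            factory);
    // Apply the keep-alive to core threads too, so a pool that has been idle
    // for an hour shrinks back to zero threads.
    executor.allowCoreThreadTimeOut(true);
    return executor;
  }
}
```

With this setup no thread exists until the first task is submitted, and threads that stay idle past the keep-alive are terminated, which is the behaviour the comment asks for.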
There was a problem hiding this comment.
I think this is a good idea for managing memory usage, especially in situations where the application is redeployed.
In the present implementation, since we never shut down the threads in the pool, they would not be garbage collected, and their references would live on even after the server stops using the old application.
With this thread timeout suggestion, we avoid that pitfall.
I agree that threads which have been idle for an hour can be timed out and removed from the pool.
@vverman The sdk-platform-java-ci CI issues are not relevant for this PR (existing issues as part of the monorepo migration). PTAL at the conventional commit CI complaints: https://github.com/googleapis/google-cloud-java/pull/12867/checks?check_run_id=72739487063

Conventional commits addressed.
    private static final ThreadLocal<Boolean> DISABLE_RAB_FOR_TESTS =
        ThreadLocal.withInitial(() -> false);

    @VisibleForTesting
    static void disableForTests() {
      DISABLE_RAB_FOR_TESTS.set(true);
    }

    @VisibleForTesting
    static void enableForTests() {
      DISABLE_RAB_FOR_TESTS.set(false);
    }

    @VisibleForTesting
    static void resetForTests() {
      DISABLE_RAB_FOR_TESTS.remove();
    }
Can you explain this part a bit more? I'm not sure I follow why this is needed.
There was a problem hiding this comment.
With this PR, the env variable gate on RAB refresh is removed, which means the RAB refresh is triggered on each request call.
As a result, every test that exercises the request-headers flow in the auth library also triggers a RAB refresh.
I did spend some time trying to fix this, but it is complicated for the following reasons:
- Tests that verify the header flow (such as getDefaultCredentials_compute_providesToken or tests calling the testUserProvidesToken helper) explicitly call getRequestMetadata to check the bearer token. Because getRequestMetadata now triggers the async RAB refresh on every call, these tests will spawn background requests that interfere with assertions on mock transports.
- Without this isolation, a background thread started in one test could leak into subsequent tests and interfere with their mocks, leading to flaky tests in CI.
Hence we have an injectable DISABLE_RAB_FOR_TESTS flag, which ensures that the tests for the non-RAB flow don't trigger the RAB refresh.
There was a problem hiding this comment.
But nothing calls disableForTests(), enableForTests(), or resetForTests()?
Also, this only disables RAB on the thread that calls it, so it may not cover async getRequestMetadata(...) tests where the callback runs on an executor thread.
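The thread-scoping concern can be reproduced in isolation. The following is a minimal sketch with hypothetical names (ThreadLocalScopeSketch, DISABLED, observe()); DISABLED is only a stand-in for the DISABLE_RAB_FOR_TESTS flag, and a single-thread executor stands in for whatever executor an async test would use.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical demo class; not part of the auth library. */
final class ThreadLocalScopeSketch {
  // Stand-in for the DISABLE_RAB_FOR_TESTS flag shown above.
  private static final ThreadLocal<Boolean> DISABLED = ThreadLocal.withInitial(() -> false);

  /** Returns {value seen on the calling thread, value seen on an executor thread}. */
  static boolean[] observe() {
    DISABLED.set(true); // what a test calling disableForTests() would do
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      boolean onCaller = DISABLED.get();
      Callable<Boolean> readFlag = DISABLED::get;
      // The worker thread has its own copy, initialised to false.
      boolean onWorker = executor.submit(readFlag).get();
      return new boolean[] {onCaller, onWorker};
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      executor.shutdown();
      DISABLED.remove(); // what resetForTests() would do
    }
  }
}
```

Here observe() returns {true, false}: the flag set on the test thread is not visible to the executor thread, so code running on that thread would still see RAB as enabled.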
lqiu96 left a comment
Changes LGTM overall. Added a few comments if you could take a look. I'm not entirely sure what the DISABLE_RAB_TESTS_ENV is for.
    }

    @VisibleForTesting
    static void setEnvironmentProviderForTest(@Nullable EnvironmentProvider provider) {
This is called in LoggingTest.java which also needs to be updated.
            5, // maximumPoolSize: max threads allowed
            1, // keepAliveTime: time to wait before terminating idle threads
            TimeUnit.HOURS, // unit for keepAliveTime
            new LinkedBlockingQueue<>(), // work queue
LinkedBlockingQueue<>() is unbounded, so a process with many credentials could still build up unlimited pending refreshes.
Can we use a bounded queue here?
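One possible shape for a bounded queue is sketched below. The class name, the trySubmit helper, and the choice to drop a refresh when the pool is saturated are all assumptions made for illustration; the PR may pick a different capacity or rejection policy.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a refresh pool with a bounded work queue. */
final class BoundedRefreshQueueSketch {
  static ThreadPoolExecutor create(int threads, int queueCapacity) {
    ThreadPoolExecutor executor =
        new ThreadPoolExecutor(
            threads, // corePoolSize
            threads, // maximumPoolSize
            1, // keepAliveTime
            TimeUnit.HOURS,
            new ArrayBlockingQueue<>(queueCapacity), // bounded: at most queueCapacity pending tasks
            r -> {
              Thread t = new Thread(r, "RAB-refresh");
              t.setDaemon(true);
              return t;
            },
            new ThreadPoolExecutor.AbortPolicy()); // throws when threads and queue are both full
    executor.allowCoreThreadTimeOut(true);
    return executor;
  }

  /** Returns false instead of throwing when the pool is saturated, so the caller can skip a refresh. */
  static boolean trySubmit(ThreadPoolExecutor executor, Runnable task) {
    try {
      executor.execute(task);
      return true;
    } catch (RejectedExecutionException e) {
      return false;
    }
  }

  /** Demo: one thread, queue capacity two, five blocked tasks submitted. */
  static int acceptedWhenSaturated() {
    ThreadPoolExecutor executor = create(1, 2);
    CountDownLatch release = new CountDownLatch(1);
    Runnable blocked =
        () -> {
          try {
            release.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        };
    int accepted = 0;
    for (int i = 0; i < 5; i++) {
      if (trySubmit(executor, blocked)) {
        accepted++;
      }
    }
    release.countDown();
    executor.shutdown();
    return accepted;
  }
}
```

In the demo, one task occupies the single thread, two wait in the queue, and the remaining two are rejected, so pending work can never exceed the queue capacity. Whether a skipped refresh is acceptable depends on how the caller retries, which this sketch does not model.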
      }
    - String lowercased = enabled.toLowerCase();
    - return "true".equals(lowercased) || "1".equals(enabled);
    + return !DISABLE_RAB_FOR_TESTS.get();
Why was this behavior changed to remove the gate? Are you planning on holding this PR until we're ready to launch?
There was a problem hiding this comment.
This PR is to a feature branch so it won't be released till we're ready to launch.
The RAB refresh uses a direct executor with a fixed thread pool as opposed to instantiating a new thread each time.
The RAB env gate (GOOGLE_AUTH_TRUST_BOUNDARY_ENABLE_EXPERIMENT) has been removed, so RAB refresh now triggers by default.
Added other fixes/suggestions made in the previous Java PR.