feat(iv): OperationRepo IV runtime — pre-HYDRATE deferral, FAIL_UNAUTHORIZED handler, IdentityVerificationService (3/6) by nan-li · Pull Request #2625 · OneSignal/OneSignal-Android-SDK

nan-li · 2026-04-24T19:20:58Z

Description

One Line Summary

PR 3 of 6 against the Identity Verification integration branch — wires queue-level IV behavior (pre-HYDRATE deferral, FAIL_UNAUTHORIZED handler, anonymous-op purge) on top of PR 2's Operation.externalId base-class refactor.

Details

Motivation

PR 1 landed the gate state (IdentityVerificationGates) and JWT store; PR 2 gave every operation an externalId. This PR is where IV starts to actually do something at runtime. It lands the three behavioral pieces of OperationRepo + IdentityVerificationService that are necessary before the public API can be added:

Pre-HYDRATE deferral — the SDK can't dispatch ops until it knows whether jwt_required is true or false, otherwise a Phase 2 user's first-launch ops would race past the JWT gate and 401.
FAIL_UNAUTHORIZED handling — without this, a 401 drops the op and the developer has no way to recover; the SDK needs to notify the app so it can supply a fresh JWT, then retry.
Anonymous-op purge — when IV becomes active, ops queued under the previous anonymous user can't authenticate and would otherwise block the queue indefinitely.

Scope

IOperationRepo gains two methods (both used only by internal SDK components — external callers don't see these change on the public surface):

fun setJwtInvalidatedHandler(handler: ((externalId: String) -> Unit)?) — single-handler setter, set once by PR 5's UserManager with a lambda that fires the developer-facing listener EventProducer. Intentionally not an EventProducer here: there's exactly one internal subscriber.
fun removeOperationsWithoutExternalId() — drops anon ops from both the in-memory queue and the persistent OperationModelStore so they don't resurrect on cold start.
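
As a rough illustration of the two additions above, here is a minimal sketch. `Op`, `FakeOperationStore`, and `QueueSketch` are hypothetical stand-ins, not the SDK's `Operation`, `OperationModelStore`, or `OperationRepo`. It only shows the single-handler setter shape and a purge that touches both the in-memory queue and the persisted copy.

```kotlin
// Hypothetical stand-ins; the real SDK classes have richer APIs.
data class Op(val id: String, val externalId: String?)

class FakeOperationStore(initial: List<Op>) {
    private val persisted = initial.toMutableList()
    fun remove(id: String) { persisted.removeAll { it.id == id } }
    fun list(): List<Op> = persisted.toList()
}

class QueueSketch(private val store: FakeOperationStore) {
    // The in-memory queue starts as a copy of what was persisted.
    private val queue = store.list().toMutableList()
    private var jwtInvalidatedHandler: ((externalId: String) -> Unit)? = null

    // Single-handler setter: a later call replaces the earlier handler; null clears it.
    fun setJwtInvalidatedHandler(handler: ((externalId: String) -> Unit)?) {
        jwtInvalidatedHandler = handler
    }

    // Drops anonymous ops from memory and from the persistent store,
    // so they cannot reappear when the queue is reloaded on cold start.
    fun removeOperationsWithoutExternalId() {
        synchronized(queue) {
            val anon = queue.filter { it.externalId == null }
            queue.removeAll(anon)
            anon.forEach { store.remove(it.id) }
        }
    }

    fun queued(): List<Op> = synchronized(queue) { queue.toList() }
}
```

Removing from the store in the same critical section as the queue mutation is one way to keep the two views consistent; whether the real implementation locks the same way is not stated in this description.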

OperationRepo:

Injects JwtTokenStore directly. We considered a narrow IJwtStore interface defined in core (implemented by user-layer JwtTokenStore) to preserve core→user layering, but found OperationRepo already imports user-layer classes directly (e.g. LoginUserOperation, NewRecordsState), and 34 core files import from user overall. The core/user split is organizational, not architectural, so adding an abstraction for this one case would be inconsistent with the codebase.
Pre-HYDRATE deferral at the top of getNextOps:
```
if (_configModelStore.model.useIdentityVerification == JwtRequirement.UNKNOWN) return null
```
Unconditional — applies even to Phase 1 non-IV users — because pre-HYDRATE we can't distinguish phases. First-launch delay is typically <2 s; useIdentityVerification is cached in ConfigModel after the first successful params fetch so subsequent launches skip this.
Outer newCodePathsRun dispatch at the getNextOps filter and the FAIL_UNAUTHORIZED executor-result branch. Ops on the old code path (feature flag off, jwt_required=false) are unchanged. Split FAIL_UNAUTHORIZED out of the combined FAIL_NORETRY/FAIL_CONFLICT branch because its handling is now conditional.

New OperationRepoIvExtensions.kt (side-by-side with OperationRepo.kt per the PR plan's Option A file layout):

hasValidJwtIfRequired(jwtTokenStore, op) — inner ivBehaviorActive guard short-circuits to true when IV behavior is off, so Phase 3 users (new code paths on, IV off) run through the dispatch site but no actual JWT gating happens. Returns false for anonymous ops when IV is active; otherwise checks jwtTokenStore.getJwt(op.externalId) != null.
handleFailUnauthorized(startingOp, ops, jwtTokenStore, handler) — same inner gate. Returns false when IV inactive or op is anon so the caller falls through to the default drop-on-fail path. When handled: invalidates the JWT, fires the handler before waking waiters (otherwise an enqueueAndWait caller could return before the handler propagates to the app), wakes waiters with false, and re-queues ops without their waiters so they can retry once a fresh JWT arrives.

Both extensions take their dependencies as parameters rather than accessing OperationRepo private fields — keeps encapsulation intact and the extensions pure-ish.
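
The behavior of the two extensions can be sketched as follows. This is a simplified model under stated assumptions: `Gates`, `JwtStore`, `QueuedOp`, and the extra `requeue` callback are hypothetical, and the real functions take the four parameters listed above. The point is the inner gate and the ordering inside the handled branch: invalidate, notify, wake waiters with `false`, re-queue without waiters.

```kotlin
// Hypothetical stand-ins for the SDK types named in the description.
object Gates { var ivBehaviorActive = false }

class JwtStore {
    private val tokens = mutableMapOf<String, String>()
    fun putJwt(externalId: String, jwt: String) { tokens[externalId] = jwt }
    fun getJwt(externalId: String): String? = tokens[externalId]
    fun invalidateJwt(externalId: String) { tokens.remove(externalId) }
}

class QueuedOp(val externalId: String?, var waiter: ((Boolean) -> Unit)? = null)

fun hasValidJwtIfRequired(store: JwtStore, op: QueuedOp): Boolean {
    if (!Gates.ivBehaviorActive) return true          // inner gate: IV off, no gating
    val externalId = op.externalId ?: return false    // anonymous ops cannot authenticate
    return store.getJwt(externalId) != null
}

/** Returns true if handled (ops re-queued), false to fall through to drop-on-fail. */
fun handleFailUnauthorized(
    startingOp: QueuedOp,
    ops: List<QueuedOp>,
    store: JwtStore,
    handler: ((String) -> Unit)?,
    requeue: (QueuedOp) -> Unit,                      // assumption: stands in for the repo's queue
): Boolean {
    if (!Gates.ivBehaviorActive) return false
    val externalId = startingOp.externalId ?: return false
    store.invalidateJwt(externalId)
    handler?.invoke(externalId)                       // 1. notify the app first
    ops.forEach { op ->
        val waiter = op.waiter
        op.waiter = null                              // re-queued ops carry no waiter
        waiter?.invoke(false)                         // 2. then wake the waiter with false
        requeue(op)                                   // 3. keep the op for a later retry
    }
    return true
}
```

Once the JWT is invalidated, `hasValidJwtIfRequired` returns `false` for the re-queued ops, so in this model they stay parked until a fresh token is stored for that `externalId`.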

New IdentityVerificationService (DI-registered IStartableService):

Subscribes to ConfigModelStore; reacts only to HYDRATE.
Schedules the purge via suspendifyOnIO { operationRepo.awaitInitialized(); ... } — fix for an earlier race where anon-op purge ran against an empty in-memory queue on cold start because it fired before loadSavedOperations completed.
Purges anon ops only when the new useIdentityVerification is REQUIRED. Always calls forceExecuteOperations to wake the queue from the pre-HYDRATE deferral regardless of the new state.
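
The ordering the service relies on can be modelled without coroutines. The sketch below is an assumption-laden stand-in: `RecordingRepo` and `onHydrate` are hypothetical names, and a plain thread plus a `CountDownLatch` replaces `suspendifyOnIO` and the suspending `awaitInitialized()`. It shows that the purge cannot run before initialization completes and that the wake happens for every requirement value.

```kotlin
import java.util.concurrent.CountDownLatch
import kotlin.concurrent.thread

enum class JwtRequirement { UNKNOWN, REQUIRED, NOT_REQUIRED }

// Hypothetical repo stand-in that records the order of calls it receives.
class RecordingRepo {
    private val initialized = CountDownLatch(1)
    val calls = mutableListOf<String>()   // written only through record()

    fun markInitialized() = initialized.countDown()

    // Blocks until markInitialized() has been called, like loading saved ops.
    fun awaitInitialized() { initialized.await(); record("awaitInitialized") }
    fun removeOperationsWithoutExternalId() = record("purge")
    fun forceExecuteOperations() = record("wake")

    @Synchronized private fun record(name: String) { calls.add(name) }
}

// Runs off the caller's thread, mirroring the background hop in the description.
fun onHydrate(repo: RecordingRepo, requirement: JwtRequirement): Thread = thread {
    repo.awaitInitialized()
    if (requirement == JwtRequirement.REQUIRED) repo.removeOperationsWithoutExternalId()
    repo.forceExecuteOperations()         // always wake the queue, whatever the state
}
```

Calling `onHydrate(repo, JwtRequirement.REQUIRED)` before `repo.markInitialized()` leaves the worker blocked, which is the situation the race fix is meant to handle: the purge waits for the saved operations instead of running against an empty queue.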

CoreModule registers IdentityVerificationService as an IStartableService.

Two-layer gating

The memory-plan design is preserved:

Outer gate (dispatch site, OperationRepo.kt): if (IdentityVerificationGates.newCodePathsRun) extensionFunction(...). When off, old code path runs untouched.
Inner gate (extension body): if (!IdentityVerificationGates.ivBehaviorActive) return <no-op>. Lets Phase 3 users (new code paths on, IV off) run through the new dispatch without any actual IV gating applied — this is how we validate the new code paths structurally before flipping IV active for each customer.
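
A compact way to picture the two layers is shown below. `IvGates`, `ivFilter`, and `canDispatch` are hypothetical names used only for illustration; the real flags live on `IdentityVerificationGates` and the real dispatch site is in `OperationRepo.kt`.

```kotlin
// Hypothetical mirror of the two flags on IdentityVerificationGates.
object IvGates {
    var newCodePathsRun = false
    var ivBehaviorActive = false
}

// Inner gate: lives in the extension body.
fun ivFilter(hasJwt: Boolean): Boolean {
    if (!IvGates.ivBehaviorActive) return true   // new path runs, but applies no gating
    return hasJwt                                // IV active: a JWT is required
}

// Outer gate: lives at the dispatch site.
fun canDispatch(hasJwt: Boolean): Boolean =
    if (IvGates.newCodePathsRun) ivFilter(hasJwt) else true   // old path: unchanged
```

With both flags off, or with only `newCodePathsRun` on, `canDispatch(false)` returns `true`; only when both are on does a missing JWT hold the op back. That matches the intent described above of exercising the new dispatch structurally before IV gating is switched on.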

What is NOT in this PR

Public API (OneSignal.login(externalId, jwt), updateUserJwt, JWT-invalidated listeners) — PR 4.
HTTP layer Authorization: Bearer header attachment — PR 5.
Logout IV-aware behavior + IAM integration — PR 6.
JwtTokenStore.pruneToExternalIds() invocation on cold start — deferred; not needed yet.

Testing

Unit testing

11 new tests; the 789 existing tests still pass. The 2 pre-existing SDKInitTests failures on the integration branch are unrelated: they are the same 2 that fail on feat/iv-foundation-01 / feat/iv-oprepo-gating-02.

New IdentityVerificationServiceTests (5):

start subscribes to ConfigModelStore.
HYDRATE with REQUIRED purges anon ops then wakes the queue (after awaitInitialized) — uses coVerifyOrder to assert the purge runs after awaitInitialized(), which is the whole point of the race fix.
HYDRATE with NOT_REQUIRED wakes the queue but does NOT purge.
HYDRATE with UNKNOWN wakes the queue (no purge).
non-HYDRATE model replacement is ignored.

New OperationRepoTests (6):

Pre-HYDRATE deferral blocks execution (queue retains op; executor not called).
Flipping useIdentityVerification to NOT_REQUIRED and calling forceExecuteOperations unblocks the queue.
removeOperationsWithoutExternalId filters the queue AND removes the persisted ops from the store.
setJwtInvalidatedHandler round-trip via internal accessor.
FAIL_UNAUTHORIZED with IV active: invalidates JWT, fires handler, re-queues ops (not dropped), wakes enqueueAndWait caller with false.
FAIL_UNAUTHORIZED with IV inactive: handler NOT fired, default drop-on-fail path runs (op removed from store).

Each IV test scopes its IdentityVerificationGates mutations in try/finally to reset the singleton; Mocks() sets configModel.useIdentityVerification = JwtRequirement.NOT_REQUIRED by default so the pre-HYDRATE deferral doesn't hold up tests that aren't exercising IV state.
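
The try/finally scoping mentioned above could look roughly like this. `TestGates` and `withIvActive` are hypothetical; the actual tests mutate `IdentityVerificationGates` directly, and whether they share a helper is not stated here.

```kotlin
// Hypothetical singleton standing in for IdentityVerificationGates.
object TestGates { var ivBehaviorActive = false }

// Turns the gate on for the duration of the block and restores the previous value.
inline fun <T> withIvActive(block: () -> T): T {
    val previous = TestGates.ivBehaviorActive
    TestGates.ivBehaviorActive = true
    try {
        return block()
    } finally {
        TestGates.ivBehaviorActive = previous   // restore even if the block throws
    }
}
```

Restoring in `finally` keeps a failing assertion in one test from leaking gate state into the next test that shares the singleton.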

Manual testing

Not applicable for this PR — no runtime behavior change for Phase 1 users (feature flag off, jwt_required eventually false) once HYDRATE completes; IV-specific behavior is gated off pending the rest of the rollout series. The pre-HYDRATE deferral is covered by unit tests.

Affected code checklist

Notifications
Outcomes
Sessions
In-App Messaging
REST API requests (ops can now be held/re-queued/dropped based on IV state; no change to request shapes)
Public API changes

Checklist

Overview

I have filled out all REQUIRED sections above
PR does one thing — wires queue-level IV behavior onto the refactor from PR 2
Any Public API changes are explained in the PR details (none in this PR)

Testing

I have included test coverage for these changes
All automated tests pass locally (pre-existing SDKInitTests failures are unrelated — same 2 on integration branch)
Manual testing N/A (IV-specific behavior is gated off pending PR 4+ wiring)

Final pass

Code is as readable as possible.
I have reviewed this PR myself, ensuring it meets each checklist item

…HORIZED handler, IdentityVerificationService

PR 3 of 6 against the Identity Verification integration branch: wires queue-level IV behavior on top of PR 2's `Operation.externalId` refactor.

Interface additions on `IOperationRepo`:
- `setJwtInvalidatedHandler(handler: ((externalId: String) -> Unit)?)`: single-handler setter; PR 5's UserManager will wire in a lambda that fires the public listener EventProducer.
- `removeOperationsWithoutExternalId()`: drops anon ops from both the in-memory queue and the persistent store, so they don't resurrect on cold start.

`OperationRepo`:
- Injects `JwtTokenStore` (no cross-package-interface abstraction since core already imports user-layer classes elsewhere; 34 existing imports make the layering boundary organizational, not architectural).
- Pre-HYDRATE deferral at the top of `getNextOps`: when `useIdentityVerification == JwtRequirement.UNKNOWN`, return null so the loop waits. Applies unconditionally because we can't tell Phase 1 / 2 / 3 apart pre-HYDRATE; brief first-launch delay for non-IV users is tolerable (typically <2s, cached for subsequent launches).
- Outer `newCodePathsRun` dispatch at the `getNextOps` filter and the FAIL_UNAUTHORIZED executor-result branch. IV-specific logic lives in a new `OperationRepoIvExtensions.kt`.

`OperationRepoIvExtensions.kt` (side-by-side with OperationRepo per the PR plan):
- `hasValidJwtIfRequired(jwtTokenStore, op)`: inner `ivBehaviorActive` check short-circuits to `true` when IV is inactive (Phase 3 runs the new paths without actual JWT gating). Otherwise checks `requiresJwt`, then presence of a JWT for the op's `externalId`.
- `handleFailUnauthorized(startingOp, ops, jwtTokenStore, handler)`: when IV is active and the op has an externalId, invalidates the JWT, fires the handler, wakes waiters with `false`, and re-queues ops without their waiters (so they can retry once a fresh JWT arrives). Returns `false` when IV inactive or op is anon, so the base class falls through to the default drop-on-fail path. Handler fires BEFORE waiters wake so `enqueueAndWait` callers can't return before the handler propagates.

New class `IdentityVerificationService`:
- Subscribes to `ConfigModelStore`; reacts only to HYDRATE.
- Schedules the purge via `suspendifyOnIO` + `awaitInitialized()`: fix for an earlier race where anon-op purge ran against an empty in-memory queue on cold start because it fired before `loadSavedOperations` completed.
- When the new state is `JwtRequirement.REQUIRED`, purges anon ops. Always calls `forceExecuteOperations` to wake the queue from the pre-HYDRATE deferral.

Tests (+11):
- 5 `IdentityVerificationServiceTests`: subscribe behavior, HYDRATE with each requirement (REQUIRED purges + wakes, NOT_REQUIRED wakes only, UNKNOWN wakes only), non-HYDRATE tag ignored. `coVerifyOrder` asserts purge runs after `awaitInitialized()`, the race-fix invariant.
- 6 new OperationRepoTests: pre-HYDRATE deferral (op not executed, queue preserved); flip-to-NOT_REQUIRED unblocks; `removeOperationsWithoutExternalId` filters queue and persists removal; `setJwtInvalidatedHandler` round-trip; FAIL_UNAUTHORIZED with IV active invalidates JWT + fires handler + re-queues; FAIL_UNAUTHORIZED with IV inactive falls through to default drop.

Existing 791 tests still pass (2 pre-existing `SDKInitTests` failures on the integration branch remain; unrelated).

… UNKNOWN

Bot caught a regression: deferral keyed on `useIdentityVerification == UNKNOWN` permanently stalls the queue whenever the backend response omits `require_ident_auth` (older deployments, dev/test environments, partial rollouts; the `TODO: New` comment in ParamsBackendService explicitly flags this field as newly-added). HYDRATE flips `isInitializedWithRemote` to true but leaves `useIdentityVerification` at UNKNOWN; `getNextOps` re-defers on every wake and ops never dispatch.

Switch the gate to `!isInitializedWithRemote`, which is set unconditionally by `ConfigModelStoreListener` on any successful HYDRATE, so it correctly signals "HYDRATE has completed" regardless of which keys the backend returned. Post-HYDRATE with a silent backend, `useIdentityVerification` stays UNKNOWN and `ivBehaviorActive` is false, so ops flow through the old code path normally (no JWT gating applied, matching Phase 1 behavior).

Tests:
- Updated both existing pre-HYDRATE tests to flip `isInitializedWithRemote`.
- New regression-guard test: post-HYDRATE with backend silent on the field (isInitializedWithRemote=true, useIdentityVerification=UNKNOWN) dispatches ops instead of deadlocking.
- `MockHelper.configModelStore` defaults (in the core-module test helpers) now also set `isInitializedWithRemote=true` so tests that don't exercise pre-HYDRATE aren't accidentally held up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
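
The difference between the two gates in the second commit message can be checked with a small truth-table sketch. `Config` here is a hypothetical two-field slice of the config model, and the two functions are illustrative names, not code from the PR.

```kotlin
enum class JwtRequirement { UNKNOWN, REQUIRED, NOT_REQUIRED }

// Hypothetical slice of the config model with only the two fields discussed above.
data class Config(
    val isInitializedWithRemote: Boolean,
    val useIdentityVerification: JwtRequirement,
)

// Original gate: defers while the IV requirement is unknown.
fun defersOnUnknown(c: Config) = c.useIdentityVerification == JwtRequirement.UNKNOWN

// Revised gate: defers only until a successful HYDRATE has happened.
fun defersUntilHydrated(c: Config) = !c.isInitializedWithRemote

// Before the first params fetch completes.
val preHydrate = Config(false, JwtRequirement.UNKNOWN)

// After HYDRATE, when the backend omitted the IV field entirely.
val silentBackend = Config(true, JwtRequirement.UNKNOWN)
```

Both gates defer for `preHydrate`. For `silentBackend`, the original gate still defers, and since nothing will ever change that field the queue never drains, while the revised gate lets ops through.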

nan-li · 2026-04-24T20:34:22Z

@claude review

claude

My prior concern about the pre-HYDRATE deferral deadlocking when the backend omits require_ident_auth is addressed — the gate is now on isInitializedWithRemote and there's a regression test for the backend-silent case. No new bugs found on this pass, but this PR is complex and touches identity-verification/auth-sensitive queue runtime across several files, so a human review is still warranted.

Extended reasoning...

Overview

PR 3 of 6 in the Identity Verification rollout, wiring queue-level IV behavior onto OperationRepo. Changes span: (1) a pre-HYDRATE deferral at the top of getNextOps so the queue waits until remote params are fetched before dispatching ops, (2) a new FAIL_UNAUTHORIZED handler path that invalidates the JWT, fires a developer-facing handler, and re-queues ops (when IV is active), (3) removeOperationsWithoutExternalId to purge anon ops from both the in-memory queue and the persistent OperationModelStore, (4) a new IdentityVerificationService IStartableService that subscribes to ConfigModelStore HYDRATE events and schedules purge + wake via suspendifyOnIO + awaitInitialized to fix a cold-start race, and (5) outer/inner two-layer gating on IdentityVerificationGates.newCodePathsRun and ivBehaviorActive. 11 new unit tests plus a regression test for the backend-silent deadlock case I previously flagged.

Security risks

This PR is squarely in auth-sensitive territory: it changes when and how operations are dispatched based on JWT/IV state, adds a handler for 401 Unauthorized responses, and introduces anonymous-op purging. The handler-fires-before-waiter-wake ordering in handleFailUnauthorized is load-bearing for correctness of enqueueAndWait callers. The two-layer gating (newCodePathsRun outer + ivBehaviorActive inner) is a risk-management mechanism for staged rollout — if either gate is miscomputed, Phase 1/3 users could see IV behavior they shouldn't. The pre-HYDRATE deferral is now correctly gated on isInitializedWithRemote rather than the IV param itself, so no deadlock on silent-backend responses.

Level of scrutiny

High. This touches the core operation dispatch path that every network-bound SDK action flows through (push subscription creates, tag updates, logins, outcomes, etc.). Regressions here could silently stall the entire queue for an app. The IV-specific logic is also directly on the auth/identity path. My prior bug report was the kind of issue that would have slipped past the existing test suite (an UNKNOWN test existed but only asserted forceExecuteOperations was called, not that any op actually dispatched). The new post-HYDRATE with backend silent on require_ident_auth test plus the semantic shift to isInitializedWithRemote closes that hole properly.

Other factors

The PR description is thorough and accurate — it correctly calls out the rationale for taking JwtTokenStore directly rather than abstracting behind an IJwtStore interface, and it explains the layering/Phase matrix clearly. Test coverage is strong for the new paths (6 OperationRepoTests, 5 IdentityVerificationServiceTests). The two pre-existing SDKInitTests failures noted as unrelated have been consistently failing on the integration branch across prior PRs. A code owner (nan-li) is the one who pinged me to re-review, signaling they're already engaged. Given the scope, complexity, cross-PR coupling, and the security-sensitive nature of the surface area, this is exactly the kind of change that benefits from a human code owner's review rather than bot approval.

claude Bot reviewed Apr 24, 2026

Comment thread ...DK/onesignal/core/src/main/java/com/onesignal/core/internal/operations/impl/OperationRepo.kt

nan-li force-pushed the feat/iv-oprepo-gating-02 branch from 0caf0f3 to aec3996 on April 24, 2026 19:55

nan-li force-pushed the feat/iv-queue-runtime-03 branch from 9f21c64 to 89c5d54 on April 24, 2026 20:00

claude Bot reviewed Apr 24, 2026

nan-li mentioned this pull request Apr 24, 2026

feat(iv): attach JWT in HTTP client and operation executors (4/6) #2626

Open

3 tasks

nan-li wants to merge 2 commits into feat/iv-oprepo-gating-02 from feat/iv-queue-runtime-03
