Conversation
…se 3.D PR 5b-1) Foundation for PR 5b's Send/Receive partition fanout. Adds sqsMsg*KeyDispatch wrappers that route to the legacy or partitioned constructor based on meta.PartitionCount. Pure scaffold: helpers are added but no production call site uses them yet — those switch in PR 5b-2. What's added adapter/sqs_keys_dispatch.go (new file): - sqsMsgDataKeyDispatch - sqsMsgVisKeyDispatch - sqsMsgDedupKeyDispatch - sqsMsgGroupKeyDispatch - sqsMsgByAgeKeyDispatch - sqsMsgVisPrefixForQueueDispatch - effectivePartitionCount helper for fanout iteration Each dispatch helper takes (meta, queueName, partition, gen, …) and routes to the legacy single-partition constructor when meta.PartitionCount <= 1 (existing behaviour, byte-identical output) or to the partitioned constructor when > 1 (PR #715's keyspace). Why a separate scaffold PR PR 5b's full scope (Send/Receive fanout + receipt-handle v2 wiring + dormancy gate-lift + CreateQueue capability check) is ~1000 LOC across multiple files. The lessons-learned discipline from the recent rounds (PR #723 missed the raftadmin stub because the audit was function-scoped, not interface-scoped) favours smaller diffs. Splitting PR 5b into three: - PR 5b-1 (this PR): dispatch helpers + tests, zero behaviour change. - PR 5b-2: SendMessage / ReceiveMessage / DeleteMessage / ChangeMessageVisibility wire through the helpers. Receipt- handle v2 dispatch wired. Still gated by §11 PR 2 dormancy. - PR 5b-3 (atomic): gate-lift + CreateQueue capability check via PollSQSHTFIFOCapability. Small-but-critical. What does NOT change yet - 14 call sites in adapter/sqs_messages.go still call the legacy sqsMsg*Key constructors directly. PR 5b-2 swaps them. - encodeReceiptHandleV2 stays dormant. - decodeClientReceiptHandle still rejects v2 unconditionally. - CreateQueue still rejects PartitionCount > 1 with InvalidAttributeValue. Tests (5 top-level) - TestSQSKeysDispatch_LegacyMatchesLegacyConstructor: 8 sub-cases pinning byte-identical output for nil meta, PartitionCount=0, PartitionCount=1. - TestSQSKeysDispatch_PartitionedMatchesPartitionedConstructor: 6 sub-cases pinning byte-identical output for PartitionCount=4 partitioned constructors. - TestSQSKeysDispatch_LegacyAndPartitionedAreDistinct: keyspace isolation invariant — same logical record on legacy vs partitioned never shares a byte sequence. - TestEffectivePartitionCount: iteration-count contract (nil/0/1 → 1; explicit count otherwise). - TestSQSKeysDispatch_PartitionIgnoredOnLegacy: stale- partition caller against a non-partitioned queue produces the same key — protects against silent strand of messages. Self-review (per CLAUDE.md) 1. Data loss — wrappers only; no FSM/Pebble/retention path. Byte- identical output to legacy on non-partitioned queues. No issue. 2. Concurrency — pure functions, no shared state. No issue. 3. Performance — one nil-check + one comparison per dispatch. The non-partitioned path (steady state today) goes through the same constructor it always did. No issue. 4. Data consistency — keyspace isolation is pinned by TestSQSKeysDispatch_LegacyAndPartitionedAreDistinct. Stale- partition stranding is pinned by TestSQSKeysDispatch_PartitionIgnoredOnLegacy. No issue. 5. Test coverage — 19 sub-tests across the contract surface. Existing partitioned-constructor tests (PR #703) and legacy constructor tests continue to pass unchanged.
|
@claude review Phase 3.D PR 5b-1 — pure-additive dispatch helpers for partitioned-FIFO routing. Foundation for PR 5b-2's send/receive/delete wiring; PR 5b-3 then lifts the §11 PR 2 dormancy gate atomically with the CreateQueue capability check. No behaviour change in this PR — the 14 production call sites still use the legacy constructors directly. PR description has the rationale for the 3-way split and the full self-review. |
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR introduces dispatch helper functions that route SQS key construction to either legacy single-partition or partitioned variants based on queue metadata. It includes an ChangesSQS Key Dispatch Helpers
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 51 minutes and 54 seconds.Comment |
|
Claude finished @bootjp's task in 2m 29s —— View job Review of PR #731 — SQS per-key dispatch helpers (Phase 3.D PR 5b-1)Todo
OverallClean scaffold. The dispatch helpers are minimal and correct: all 6 share the same guard ( Issues1. PR description overstates nil/PartitionCount=0 coverageThe PR description says:
Looking at the actual test ( Since all 6 guards are currently identical this is not a correctness risk today, but the gap matters for ratchet value and the description creates a false impression of coverage. Suggest either:
2. No boundary test at PartitionCount=2 (the minimal partitioned value)
Observations (no action required)
Partition arg is silently dropped on legacy path. This is the correct design (documented in the file header), but PR 5b-2 callers should be audited: any path that constructs SummaryTwo suggested improvements before merge:
Neither is a correctness defect in the scaffold itself. The implementation is sound and the split strategy is appropriate. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 08968f9361
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if meta == nil || meta.PartitionCount <= 1 { | ||
| return 1 | ||
| } | ||
| return meta.PartitionCount |
There was a problem hiding this comment.
Honor perQueue mode in effective partition count
effectivePartitionCount returns meta.PartitionCount for every queue with PartitionCount > 1, but partitionFor explicitly forces all message groups to partition 0 when FifoThroughputLimit == "perQueue" (adapter/sqs_partitioning.go). In that valid configuration, a Receive fanout built on this helper will scan empty partitions on every poll, multiplying read/CPU work by up to the configured partition count (max 32) with no correctness benefit. The helper should collapse to 1 partition in perQueue mode to match the routing behavior.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request introduces a central abstraction for SQS key dispatching, routing between legacy single-partition and new partitioned-FIFO layouts based on the queue's partition count. It includes several dispatch helpers for data, visibility, deduplication, and group keys, along with comprehensive unit tests to ensure backward compatibility and keyspace isolation. I have no feedback to provide.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
adapter/sqs_keys_dispatch_test.go (1)
15-63: 💤 Low valueEager key construction in table rows is slightly atypical — consider moving calls inside closures.
Both table-driven tests build
dispatchedandlegacy/partitionedvalues at slice-literal initialization time (before any sub-test runs), so a panic in a key constructor surfaces as a failure of the parent function rather than the named sub-test. This also means thet.Parallel()race detector cannot observe the key-construction code. Moving the calls inside the sub-test closure gives sharper failure attribution and full race coverage.♻️ Proposed refactor (shown for LegacyMatchesLegacyConstructor; apply the same pattern to PartitionedMatchesPartitionedConstructor)
cases := []struct { name string - dispatched []byte - legacy []byte + dispatch func() []byte + legacy func() []byte }{ - {"meta=nil data", sqsMsgDataKeyDispatch(nil, queue, 0, gen, msgID), - sqsMsgDataKey(queue, gen, msgID)}, + {"meta=nil data", + func() []byte { return sqsMsgDataKeyDispatch(nil, queue, 0, gen, msgID) }, + func() []byte { return sqsMsgDataKey(queue, gen, msgID) }, + }, // ... remaining cases follow the same pattern } for _, tc := range cases { tc := tc t.Run(tc.name, func(t *testing.T) { t.Parallel() - require.Equal(t, tc.legacy, tc.dispatched, ...) + require.Equal(t, tc.legacy(), tc.dispatch(), ...) }) }Also applies to: 71-115
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adapter/sqs_keys_dispatch_test.go` around lines 15 - 63, The table builds dispatched and legacy byte slices eagerly which runs before sub-tests and t.Parallel(); move the calls to sqsMsgDataKeyDispatch, sqsMsgDataKey, sqsMsgVisKeyDispatch, sqsMsgVisKey, sqsMsgDedupKeyDispatch, sqsMsgDedupKey, sqsMsgGroupKeyDispatch, sqsMsgGroupKey, sqsMsgByAgeKeyDispatch, sqsMsgByAgeKey, and sqsMsgVisPrefixForQueueDispatch/sqsMsgVisPrefixForQueue inside the t.Run closure so each sub-test computes its own dispatched and legacy values (and any panics or race issues are attributed to the specific sub-test).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@adapter/sqs_keys_dispatch_test.go`:
- Around line 15-63: The table builds dispatched and legacy byte slices eagerly
which runs before sub-tests and t.Parallel(); move the calls to
sqsMsgDataKeyDispatch, sqsMsgDataKey, sqsMsgVisKeyDispatch, sqsMsgVisKey,
sqsMsgDedupKeyDispatch, sqsMsgDedupKey, sqsMsgGroupKeyDispatch, sqsMsgGroupKey,
sqsMsgByAgeKeyDispatch, sqsMsgByAgeKey, and
sqsMsgVisPrefixForQueueDispatch/sqsMsgVisPrefixForQueue inside the t.Run closure
so each sub-test computes its own dispatched and legacy values (and any panics
or race issues are attributed to the specific sub-test).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2165b9b5-ba43-4632-80de-8c61f03e47eb
📒 Files selected for processing (2)
adapter/sqs_keys_dispatch.goadapter/sqs_keys_dispatch_test.go
PR #731 round 1 review findings: 1) Codex P2 — effectivePartitionCount must honor perQueue mode partitionFor (sqs_partitioning.go:76) forces every MessageGroupId to partition 0 when meta.FifoThroughputLimit == "perQueue", regardless of PartitionCount. effectivePartitionCount returned the literal PartitionCount in that mode, so ReceiveMessage's fanout would scan up to 31 guaranteed-empty partitions per poll for no correctness benefit — multiplying read / CPU work by the configured partition count. Fix: collapse to 1 when FifoThroughputLimit is the perQueue short-circuit. Mirrors partitionFor's routing decision. TestEffectivePartitionCount_PerQueueModeCollapsesToOne pins this with 4 sub-cases (perQueue + 4, perQueue + 32, perMessageGroupId + 4, empty + 4) so a future refactor that drops the branch is caught. 2) Claude finding 1 — expand legacy-match test coverage The PR description claimed nil and PartitionCount=0 sub-cases for all 6 helpers, but the actual table only had them for sqsMsgDataKeyDispatch. Expanded to cover all 6 helpers. Without this, a future edit that drops the `meta != nil` guard from one of the non-data helpers would compile and pass the existing test suite. TestSQSKeysDispatch_LegacyMatchesLegacyConstructor grew from 8 to 15 sub-cases. 3) Claude finding 2 — PartitionCount=2 boundary TestSQSKeysDispatch_PartitionedMatchesPartitionedConstructor used PartitionCount=4. An off-by-one in the dispatch threshold (>= 1 vs > 1) wouldn't be caught because 2 was never exercised. New TestSQSKeysDispatch_BoundaryAtPartitionCount2 pins the boundary cheaply. CodeRabbit nitpick on the eager-construction test pattern is acknowledged as "Low value" in their own classification — left as-is since the existing pattern is consistent with the project's other table-driven tests. go test -race ./adapter/... and golangci-lint clean.
|
@claude review Round 2 — addresses 3 round-1 findings: Codex P2 (real, fixed): Claude finding 1 (real, fixed): PR description claimed nil/PartitionCount=0 sub-cases for all 6 helpers, but the actual table only had them for Claude finding 2 (real, fixed): New CodeRabbit nitpick (eager construction in table rows, marked "Low value" in their own classification): left as-is to stay consistent with the project's other table-driven tests.
|
|
Claude finished @bootjp's task in 1m 53s —— View job Review of PR #731 — Round 2Todo
All three round-1 findings are correctly addressed. No new issues. Fix 1 — perQueue collapse in
|
…(Phase 3.D PR 5b-2) (#732) ## Summary Stage 2 of Phase 3.D PR 5b: wire `SendMessage` / `ReceiveMessage` / `DeleteMessage` / `ChangeMessageVisibility` through the per-key dispatch helpers landed in PR 5b-1 (#731). The §11 PR 2 dormancy gate still rejects `PartitionCount > 1` at `CreateQueue`, so production queues stay on the legacy keyspace and observable behaviour is byte-identical. **PR 5b-3 lifts the gate atomically with the capability check.** ## What's wired ### Send paths - `sendMessage`, `sendBatchStandardOnce`: partition is a compile-time `0` (Standard queues reject `PartitionCount > 1` via the cross-attribute validator). Dispatch helpers route to the legacy keyspace on the steady-state path. - `sendFifoMessage`: hashes `MessageGroupId` once via `partitionFor` and threads the result through every key construction (data, vis, byage, dedup, group-lock). ### Receive fanout - `scanAndDeliverOnce` wraps the scan loop with a per-partition fanout over `effectivePartitionCount(meta)` iterations. The shared wall-clock + per-call `Max` budget caps apply across the whole call, not per-partition. - `sqsMsgCandidate` carries a new `partition` field stamped at scan time, so `loadCandidateRecord` / `expireMessage` / `commitReceiveRotation` / `classifyFifoGroupLock` all route to the partition the message was originally stored under. ### Delete + ChangeMessageVisibility - `handle.Partition` from the v2 receipt handle drives every key construction. - `loadMessageForDelete` / `loadAndVerifyMessage` invoke the new `validateReceiptHandleVersion` once meta is in scope. ### Receipt-handle dispatch (the contract shift) - `encodeReceiptHandleDispatch(meta, partition, ...)` is the single point that picks v1 vs v2. - `decodeClientReceiptHandle` is now version-agnostic; the queue-aware version check moved to `validateReceiptHandleVersion`. v1-on-partitioned and v2-on-non-partitioned both surface as `ReceiptHandleIsInvalid`, **preserving the dormancy promise** (no v2 wire-format probability from the public API) under the new contract. Since the §11 PR 2 dormancy gate keeps every production queue non-partitioned in this PR, every v2 handle still surfaces as `ReceiptHandleIsInvalid` downstream — exactly the PR 5a observable behaviour. ### Reaper The reaper iterates legacy byAge keys only — partition-iterating enumeration ships in the later partition-reaper PR. `buildReapOps` / `reapOneRecord` pass `nil` meta + `partition 0` through dispatch helpers so the keys are byte-identical to the pre-PR-5b layout. ## What does NOT change yet - `CreateQueue` still rejects `PartitionCount > 1` with `InvalidAttributeValue` (the §11 PR 2 dormancy gate, lifted atomically in PR 5b-3). - Production queues never enter the partitioned dispatch branches in this PR — observable behaviour is byte-identical. - The reaper does not yet enumerate partitioned byAge keys (a later PR). ## Test plan 10 new tests across the contract surface. **Unit tests** (`adapter/sqs_keys_dispatch_v2_test.go`): - [x] `TestEncodeReceiptHandleDispatch_PicksVersionByPartitionCount` — pins the version dispatch decision across `nil` / `0` / `1` / `4` metas. - [x] `TestEncodeReceiptHandleDispatch_LegacyByteIdenticalToV1` — protects the byte-identical guarantee on legacy queues. - [x] `TestEncodeReceiptHandleDispatch_PerQueueUsesV2` — `perQueue + PartitionCount=4` still produces v2 handles (keyspace is partitioned even when `partitionFor` collapses every group to partition 0). - [x] `TestValidateReceiptHandleVersion_QueueAwareRules` — 10 sub-cases covering the (`meta.PartitionCount` × `handle.Version`) matrix, including `perQueue + PartitionCount=4`. - [x] `TestValidateReceiptHandleVersion_NilHandle` — defensive nil branch. - [x] `TestValidateReceiptHandleVersion_RejectsV2OnNonPartitioned` — named regression for the dormancy guarantee under the new contract. - [x] `TestSQSMsgVisScanBoundsDispatch_LegacyMatchesLegacy` — byte-identical to legacy `sqsMsgVisScanBounds` on legacy meta. - [x] `TestSQSMsgVisScanBoundsDispatch_PartitionedUsesPartitionedPrefix` — different partitions yield disjoint scan ranges. - [x] `TestSQSMsgVisScanBoundsDispatch_PerQueueOnPartitionedKeyspace` — pins the **PR #731 round 2 forward-note invariant**: `perQueue + PartitionCount=4` keeps the partitioned vis prefix at partition 0; collapsing to legacy would silently strand send writes. **Integration tests** (`adapter/sqs_partitioned_dispatch_test.go`) install a partitioned meta directly on a queue created via the public API, short-circuiting the dormancy gate without disabling it for production `CreateQueue`: - [x] `TestSQSServer_PartitionedFIFO_SendReceiveDeleteRoundTrip` — end-to-end smoke test: send 6 groups, receive surfaces all via fanout, every handle is v2, delete via v2 handle, queue is empty afterwards, legacy keyspace stays empty. - [x] `TestSQSServer_PartitionedFIFO_RejectsV1Handle` — forged v1 handle on a partitioned queue surfaces as `ReceiptHandleIsInvalid` via `DeleteMessage` and `ChangeMessageVisibility`. - [x] `TestSQSServer_PartitionedFIFO_PerQueueCollapsesToPartitionZero` — `perQueue + PartitionCount=4` receive surfaces every message in one fanout pass; every v2 handle records `Partition=0`. Updated `TestDecodeClientReceiptHandle_RejectsV2` → `TestDecodeClientReceiptHandle_AcceptsV2` to reflect the contract shift (rejection moved from API boundary to meta-aware `validateReceiptHandleVersion`). - [x] `go test -race ./adapter/...` (targeted SQS scope) clean. - [x] `golangci-lint run ./adapter/...` clean. ## Self-review (per CLAUDE.md) 1. **Data loss** — Dispatch helpers byte-identical on legacy queues (`PartitionCount<=1` routes to legacy constructors). DLQ FIFO computes `dlqPartition` via `partitionFor(dlqMeta, srcRec.MessageGroupId)` so cross-queue redrive lands in the right partition. Receive fanout scans every partition. No issue. 2. **Concurrency** — Receive fanout iterates partitions sequentially under one shared wall-clock + max budget, so the original per-call SLA is preserved. `meta` is loaded once and passed by pointer through helpers; no concurrent meta refresh in a single call. `validateReceiptHandleVersion` runs after the gen check so a `SetQueueAttributes` race cannot flip the answer (`PartitionCount` is immutable). `go test -race` clean. 3. **Performance** — Hot path adds 1 nil-check + 1 `PartitionCount` compare per dispatch. Receive fanout iterates `effectivePartitionCount(meta)` times: 1 on legacy / `perQueue`, N on `perMessageGroupId` with N partitions splitting roughly 1/N of the messages each. 4. **Data consistency** — Receipt handle's recorded partition matches the storage partition (`commitReceiveRotation` uses `cand.partition` for both `newVisKey` and the handle). Cross-version handles rejected as `ReceiptHandleIsInvalid` — no leak between keyspaces. `perQueue + PartitionCount=4` invariant pinned by both unit and integration tests. 5. **Test coverage** — 10 new tests across the contract surface. Existing legacy-path SQS tests pass unchanged. ## Caller audit (semantic-change discipline) `decodeClientReceiptHandle` semantics changed from "reject all v2" to "decode any version, defer version validation to meta-aware caller". All 3 production callers audited via grep: - `parseQueueAndReceipt` (`sqs_messages.go:1576`) → `deleteMessageWithRetry` → `loadMessageForDelete` → `validateReceiptHandleVersion` after meta load. ✅ - `DeleteMessageBatch` entry (`sqs_messages_batch.go:475`) → `deleteMessageWithRetry` → same as above. ✅ - `ChangeMessageVisibilityBatch` entry (`sqs_messages_batch.go:575`) → `changeVisibilityWithRetry` → `loadAndVerifyMessage` → `validateReceiptHandleVersion`. ✅ The dormancy guarantee (v2 → `ReceiptHandleIsInvalid` on non-partitioned) is preserved across every caller path. ## Follow-ups - PR 5b-3 (next): atomic gate-lift + `CreateQueue` capability check via `PollSQSHTFIFOCapability` (#721). Removes the `validatePartitionDormancyGate` call and verifies cluster-wide HT-FIFO support before accepting `PartitionCount > 1`. - PR 6: partition-iterating reaper (drops the "legacy byAge only" caveat in `buildReapOps`). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Partition-aware message routing for FIFO and standard queues. * Receipt-handle v2 introduced for partitioned queues; v1 retained for non-partitioned queues. * Per-queue receive fanout rotation to round-robin partitions. * **Bug Fixes** * Stronger queue-aware receipt-handle validation and bounds checking. * Partition-aware fixes across send/receive/delete/change-visibility/redrive/reaper flows. * Per-queue fanout state reset on queue create/delete. * **Tests** * Extensive partitioned dispatch, receipt-handle, visibility-scan, dedup, rotation, and end-to-end tests. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
…3.D PR 5b-3) (#734) ## Summary Phase 3.D PR 5b-3: lift the §11 PR 2 dormancy gate (PartitionCount > 1 hard-rejected at CreateQueue) and replace it with the §8.5 cluster-wide htfifo capability poll, atomically in one commit. The data plane caught up in #731 + #732, so partitioned FIFO queues are now creatable end-to-end. **What changes** - **Remove** `validatePartitionDormancyGate` and `htfifoTemporaryGateMessage` from `adapter/sqs_partitioning.go`. Both were marked "Removed in PR 5 in the same commit that wires the data plane so the gate-and-lift land atomically" — this is that commit. - **Add** `(*SQSServer).validateHTFIFOCapability` in a new `adapter/sqs_capability_gate.go`, called from `createQueueCore`. Two-stage fail-closed check on `PartitionCount > 1`: 1. **Local**: this binary must advertise htfifo (`htfifoCapabilityAdvertised`). Refuses with `InvalidAttributeValue` otherwise. 2. **Peers**: every entry in `s.leaderSQS` must report htfifo via `/sqs_health` within the poller's per-peer timeout. Any timeout, HTTP error, malformed body, or missing capability blocks the create. Vacuous on `PartitionCount <= 1` and on empty `leaderSQS` (single-node cluster — the local check is the whole cluster). - **Helpers**: `collectSQSPeers` returns `leaderSQS` values in deterministic sorted order with empty/duplicate addresses filtered. `buildHTFIFOCapabilityRejection` composes the rejection message with each failing peer's address + reason so an operator triaging a partial-rolling-upgrade cluster does not need to re-run the poll out-of-band. **CreateQueue control flow on `PartitionCount > 1`** ``` schema validators (validatePartitionConfig, etc.) → validateHTFIFOCapability → htfifoCapabilityAdvertised check (local) → PollSQSHTFIFOCapability(ctx, collectSQSPeers(), …) → reject with InvalidAttributeValue on any failure → createQueueWithRetry ``` **Caller audit:** `validateHTFIFOCapability` has exactly one production caller (`createQueueCore` in `sqs_catalog.go`); both the JSON handler and the future query-protocol handler reach it through that one path. `SetQueueAttributes` is unaffected because `PartitionCount` is immutable post-create (`validatePartitionImmutability`). **Test changes** - Delete `TestValidatePartitionDormancyGate_RejectsAboveOne` (function gone). - Convert `TestSQSServer_HTFIFO_DormancyGate_RejectsPartitionedCreate` → `TestSQSServer_HTFIFO_CapabilityGate_AcceptsOnSingleNode`: the same wire payloads now SUCCEED because the local node advertises htfifo and there are no peers to poll. - Rename `TestSQSServer_HTFIFO_DormancyGate_AllowsPartitionCountOne` → `TestSQSServer_HTFIFO_CapabilityGate_AllowsPartitionCountOne` for consistency. - Refresh comments on related tests + helpers. **New unit tests** (`sqs_capability_gate_test.go`): - `TestValidateHTFIFOCapability_ShortCircuitsOnLegacyMeta` - `TestValidateHTFIFOCapability_AcceptsWhenAllPeersAdvertise` - `TestValidateHTFIFOCapability_AcceptsOnEmptyPeerList` - `TestValidateHTFIFOCapability_RejectsWhenOnePeerLacksCapability` - `TestValidateHTFIFOCapability_RejectsWhenPeerUnreachable` - `TestCollectSQSPeers_Deterministic` - `TestBuildHTFIFOCapabilityRejection_ShapesOperatorMessage` ## Self-review (CLAUDE.md) 1. **Data loss** — None. The gate strictly tightens CreateQueue acceptance vs. the previous dormancy reject; no path now accepts a write that would have been rejected before. The dormancy gate's invariant ("partitioned-shape meta cannot land on a binary that does not handle the partitioned keyspace") is preserved by the local `htfifoCapabilityAdvertised` check and strengthened by the cluster-wide poll. 2. **Concurrency / distributed failures** — Poll runs concurrently across peers via the existing `PollSQSHTFIFOCapability` helper (covered by its own race tests). `collectSQSPeers` + sort are pure / deterministic. `leaderSQS` is only mutated at SQSServer construction (`WithSQSLeaderMap`), not at request time, so no read/write races. Leader transitions during the poll are handled by the existing `proxyToLeader` path that gates `createQueue` before `validateHTFIFOCapability` runs. 3. **Performance** — Poll cost is O(peers) and only paid on `PartitionCount > 1` creates (rare control-plane operation). Legacy / single-partition CreateQueue calls pay one short-circuit branch. `collectSQSPeers`' sort is O(N log N) on a small N (cluster size). No hot-path impact. 4. **Data consistency** — Schema validators (`PartitionCount` shape, dedup-scope rule, perMessageGroupId rule) still run BEFORE the capability gate inside `parseAttributesIntoMeta`, so an invalid shape rejects with the schema's reason rather than the gate's. SetQueueAttributes immutability remains the guard for post-create partition-shape changes. 5. **Test coverage** — Gate function: 5 unit tests covering short-circuit, happy path, vacuous empty, rolling-upgrade, and unreachable-peer classes. Helpers: 2 unit tests pinning deterministic order and message shape. Wire-level: existing HT-FIFO integration tests carry forward, with the dormancy-reject test converted to the new accepts-on-single-node happy path. ## Test plan - [x] `make lint` — golangci-lint 0 issues - [x] Targeted: `TestValidateHTFIFOCapability* | TestCollectSQSPeers | TestBuildHTFIFOCapabilityRejection | TestSQSServer_HTFIFO* | TestValidatePartition*` (-race, clean) - [x] Wider regression sweep on CreateQueue / DeleteQueue / PurgeQueue / SetQueueAttributes / Send/Receive/Delete / PartitionedFIFO / dispatch (-race, clean) - [ ] CI: full Jepsen + race on the PR <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Cluster-wide capability validation for creating partitioned FIFO queues—creation now requires peers to advertise HT‑FIFO. * Client-facing rejection messages are sanitized; detailed per-peer diagnostics are recorded in server logs only. * **Bug Fixes** * Partitioned FIFO queues can be created on single-node clusters (no longer rejected). * **Tests** * Added comprehensive tests for the capability gate, peer collection, deterministic behavior, and server-side log formatting. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Foundation for PR 5b's Send/Receive partition fanout. Adds
sqsMsg*KeyDispatchwrappers that route to the legacy or partitioned constructor based onmeta.PartitionCount. Pure scaffold: helpers are added but no production call site uses them yet — those switch in PR 5b-2.Why a separate scaffold PR
PR 5b's full scope (Send/Receive fanout + receipt-handle v2 wiring + dormancy gate-lift + CreateQueue capability check) is ~1000 LOC across multiple files. The lessons-learned discipline from recent rounds (PR #723 missed the raftadmin stub because the audit was function-scoped, not interface-scoped) favours smaller diffs.
Splitting PR 5b into three:
PollSQSHTFIFOCapability. Small-but-critical.What's added
adapter/sqs_keys_dispatch.go(new file):sqsMsgDataKeyDispatch,sqsMsgVisKeyDispatch,sqsMsgDedupKeyDispatch,sqsMsgGroupKeyDispatch,sqsMsgByAgeKeyDispatch,sqsMsgVisPrefixForQueueDispatcheffectivePartitionCounthelper for fanout iterationEach helper takes
(meta, queueName, partition, gen, …)and routes to the legacy single-partition constructor whenmeta.PartitionCount <= 1(existing behaviour, byte-identical output) or to the partitioned constructor (#703's keyspace) when> 1.What does NOT change yet
adapter/sqs_messages.gostill call legacysqsMsg*Keyconstructors directly. PR 5b-2 swaps them.encodeReceiptHandleV2stays dormant.decodeClientReceiptHandlestill rejects v2 unconditionally.PartitionCount > 1withInvalidAttributeValue.Test plan
5 top-level tests (19 sub-cases):
TestSQSKeysDispatch_LegacyMatchesLegacyConstructor— 8 sub-cases pinning byte-identical output for nil meta,PartitionCount=0,PartitionCount=1across all 6 helpers.TestSQSKeysDispatch_PartitionedMatchesPartitionedConstructor— 6 sub-cases pinning byte-identical output forPartitionCount=4across all partitioned constructors.TestSQSKeysDispatch_LegacyAndPartitionedAreDistinct— keyspace isolation invariant.TestEffectivePartitionCount— iteration-count contract.TestSQSKeysDispatch_PartitionIgnoredOnLegacy— protects against stale-partition stranding on non-partitioned queues.go test -race ./adapter/...pass.golangci-lint ./adapter/...clean.Self-review (per CLAUDE.md)
TestSQSKeysDispatch_LegacyAndPartitionedAreDistinct. Stale-partition stranding pinned byTestSQSKeysDispatch_PartitionIgnoredOnLegacy. No issue.Summary by CodeRabbit
New Features
Tests