From b2e8c14d130c3b25afc9a355f8726b4252e134cc Mon Sep 17 00:00:00 2001
From: "Yoshiaki Ueda (bootjp)"
Date: Tue, 28 Apr 2026 23:55:39 +0900
Subject: [PATCH 1/4] docs(design): promote admin dashboard from _partial_ to _implemented_
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per docs/design/README.md's lifecycle convention. The original P1–P4 plan has fully shipped:

- P1 (admin skeleton + Dynamo + AdminForward) — #634/#635/#644/#648
- P2 (S3 endpoints incl. write paths and AdminForward integration) — #658 / #669 / #673 / #695 (TOCTOU safety net)
- P3 (React SPA + embed) — #649 / #650
- P4 (TLS / role / CSRF / operator doc / deployment runbook / scripts/rolling-update.sh admin support) — #674 / #669 / #678

The AdminDeleteBucket TOCTOU caught during PR #669 review (the last "in-flight" item that kept the doc at _partial_) is fully resolved by the safety-net design landed in #695.

What changed:

- git mv 2026_04_24_partial_admin_dashboard.md → 2026_04_24_implemented_admin_dashboard.md (history follows the rename)
- Header Status line: "Partial" → "Implemented", explanation updated to reflect the post-fix state and the rationale for promotion.
- "Last updated" bumped to 2026-04-28 with the rename trigger.
- Section heading "Outstanding open items" → "Out-of-scope follow-ups" — the remaining three entries (criterion 5, object browser, TLS hot-reload) are not in-flight work; they are deferred-at-design or Non-goal items. The TOCTOU bullet is removed (resolved) and replaced with a one-line cross-link to the safety-net design + admin_deployment.md §4.6 contract.
- Removed the closing "rename trigger" sentence — we just did the rename.
- Status table: P2 row now lists #695 alongside #658/#669/#673 so a future reader can find the TOCTOU fix from the index.
- Cross-references updated everywhere the old filename appeared: docs/admin.md (header link + Cross-references) docs/admin_deployment.md (header link + final cross-ref) docs/design/2026_04_28_proposed_admin_delete_bucket_safety_net.md (Background section pointer) internal/admin/config.go (Section 7.1 reference comment) No code changes other than the comment-only filename refresh in config.go. --- docs/admin.md | 4 ++-- docs/admin_deployment.md | 4 ++-- ...2026_04_24_implemented_admin_dashboard.md} | 19 +++++++++---------- ...proposed_admin_delete_bucket_safety_net.md | 4 ++-- internal/admin/config.go | 2 +- 5 files changed, 16 insertions(+), 17 deletions(-) rename docs/design/{2026_04_24_partial_admin_dashboard.md => 2026_04_24_implemented_admin_dashboard.md} (93%) diff --git a/docs/admin.md b/docs/admin.md index 28c771b7e..14bd4fb85 100644 --- a/docs/admin.md +++ b/docs/admin.md @@ -2,7 +2,7 @@ This document covers configuration and day-2 operation of the admin HTTP listener. Architecture and design rationale live in -[docs/design/2026_04_24_proposed_admin_dashboard.md](design/2026_04_24_proposed_admin_dashboard.md); +[docs/design/2026_04_24_implemented_admin_dashboard.md](design/2026_04_24_implemented_admin_dashboard.md); read that first if you're touching the code. ## What the admin dashboard is @@ -336,6 +336,6 @@ to populate the embedded `dist` directory, then rebuild the binary. 
## Cross-references - Deployment runbook: [docs/admin_deployment.md](admin_deployment.md) (login flow, rollout via `scripts/rolling-update.sh`, key/TLS rotation, failure-mode runbooks) -- Design rationale: [docs/design/2026_04_24_proposed_admin_dashboard.md](design/2026_04_24_proposed_admin_dashboard.md) (renamed to `_partial_` in PR #675; this link will follow once that lands) +- Design rationale: [docs/design/2026_04_24_implemented_admin_dashboard.md](design/2026_04_24_implemented_admin_dashboard.md) - Architecture overview: [docs/architecture_overview.md](architecture_overview.md) - AdminForward RPC contract: `proto/admin_forward.proto` diff --git a/docs/admin_deployment.md b/docs/admin_deployment.md index dc8e94bb0..095aa45ac 100644 --- a/docs/admin_deployment.md +++ b/docs/admin_deployment.md @@ -11,7 +11,7 @@ read [`docs/admin.md`](admin.md) first — this doc assumes you have already skimmed it. For design rationale, see -[`docs/design/2026_04_24_partial_admin_dashboard.md`](design/2026_04_24_partial_admin_dashboard.md). +[`docs/design/2026_04_24_implemented_admin_dashboard.md`](design/2026_04_24_implemented_admin_dashboard.md). --- @@ -402,7 +402,7 @@ mean the cluster has lost quorum. - [`docs/admin.md`](admin.md) — per-flag configuration reference, audit log shapes, troubleshooting catalogue. -- [`docs/design/2026_04_24_partial_admin_dashboard.md`](design/2026_04_24_partial_admin_dashboard.md) — +- [`docs/design/2026_04_24_implemented_admin_dashboard.md`](design/2026_04_24_implemented_admin_dashboard.md) — design rationale, acceptance criteria, outstanding items. - [`scripts/rolling-update.sh`](../scripts/rolling-update.sh) — the rollout driver this doc references throughout. 
diff --git a/docs/design/2026_04_24_partial_admin_dashboard.md b/docs/design/2026_04_24_implemented_admin_dashboard.md similarity index 93% rename from docs/design/2026_04_24_partial_admin_dashboard.md rename to docs/design/2026_04_24_implemented_admin_dashboard.md index 54319ca22..6e88b31c8 100644 --- a/docs/design/2026_04_24_partial_admin_dashboard.md +++ b/docs/design/2026_04_24_implemented_admin_dashboard.md @@ -1,27 +1,26 @@ # elastickv Admin Dashboard Design -**Status:** Partial — every phase of the original P1–P4 plan has shipped. The doc stays at `_partial_` (rather than `_implemented_`) because AdminForward acceptance criterion 5 (rolling-upgrade compatibility flag) is explicitly deferred and the AdminDeleteBucket TOCTOU caught during PR #669 review is tracked here as a pre-existing limitation. See the status table for the per-phase breakdown and Outstanding open items below. +**Status:** Implemented — every phase of the original P1–P4 plan has shipped, the AdminDeleteBucket TOCTOU caught during PR #669 review is fixed (PR #695 with the two-phase split required by the production coordinator's dispatch validation), and operator documentation + deployment tooling are in place. The remaining items in §"Out-of-scope follow-ups" below are either explicitly deferred at design time or were called out as Non-goals in §2.2; none block dashboard usability today. 
**Author:** bootjp **Date:** 2026-04-24 -**Last updated:** 2026-04-27 (P2 write paths + P4 operator doc landed; status table refreshed) +**Last updated:** 2026-04-28 (renamed from `_partial_` to `_implemented_` after PR #695 landed the TOCTOU safety-net fix) -## Implementation status (as of 2026-04-27) +## Implementation status (as of 2026-04-28) | Phase | Status | Landed via | |---|---|---| -| **P1** — `internal/admin/` skeleton, auth, DynamoDB list/create/describe/delete, AdminForward (Section 3.3 acceptance criteria 1–4 + 6; criterion 5 deferred — see outstanding items) | ✅ shipped | #634, #635, #644, #648 | -| **P2** — S3 bucket list/create/delete/ACL, DescribeTable | ✅ shipped | #658 (read-only slice 1) + #669 (writes, slice 2a) + #673 (AdminForward integration, slice 2b) | +| **P1** — `internal/admin/` skeleton, auth, DynamoDB list/create/describe/delete, AdminForward (Section 3.3 acceptance criteria 1–4 + 6; criterion 5 deferred — see follow-ups) | ✅ shipped | #634, #635, #644, #648 | +| **P2** — S3 bucket list/create/delete/ACL, DescribeTable | ✅ shipped | #658 (read-only slice 1) + #669 (writes, slice 2a) + #673 (AdminForward integration, slice 2b) + #695 (AdminDeleteBucket TOCTOU safety net) | | **P3** — React SPA + embed | ✅ shipped | #649, #650 | | **P4** — TLS, read-only role, CSRF, `docs/admin.md`, deployment runbook + `scripts/rolling-update.sh` admin support | ✅ shipped | TLS / role / CSRF live in P1; operator doc + runbook + script wiring in #674 / #669 / #678 | -Outstanding open items (kept here so future readers know what is still owed against the original proposal): +Out-of-scope follow-ups (recorded so future readers know what was deliberately deferred): -- **AdminForward acceptance criterion 5** — rolling-upgrade compatibility flag (`admin.leader_forward_v2`). Deferred behind a cluster-version bump; not blocking dashboard usability today because every node forwards through the same `pb.AdminOperation` enum. 
-- ~~AdminDeleteBucket TOCTOU~~ — **fixed**. The empty-probe → commit race is now covered by a `DEL_PREFIX` safety net on the same `OperationGroup`: `AdminDeleteBucket` and `s3.go:deleteBucket` both wipe every per-bucket key family (manifest / upload-meta / upload-part / blob / gc-upload / route) at the shared commitTS, so objects that landed in the race window are tombstoned together with `BucketMetaKey` instead of orphaning. Trade-off: a `PutObject` that returned 200 OK during the race window can be swept by the concurrent delete — operators should pause writes before bucket delete (now documented in `docs/admin_deployment.md` §4.6). See [`2026_04_28_proposed_admin_delete_bucket_safety_net.md`](2026_04_28_proposed_admin_delete_bucket_safety_net.md) for the design. -- **S3 object browser** — explicitly called out as "next phase" in Section 2 Non-goals; no work item yet. +- **AdminForward acceptance criterion 5** — rolling-upgrade compatibility flag (`admin.leader_forward_v2`). Deferred at design time behind a cluster-version bump that does not exist yet; not blocking dashboard usability today because every node forwards through the same `pb.AdminOperation` enum. +- **S3 object browser** — explicitly called out as "next phase" in §2.2 Non-goals; no work item yet. - **Operator-visible TLS cert reload** — out of scope; restart-to-rotate is the documented model in `docs/admin.md`. -When the rolling-upgrade flag (the only remaining functional blocker after the TOCTOU fix landed) is addressed, this doc is renamed `2026_04_24_implemented_admin_dashboard.md` per `docs/design/README.md`'s lifecycle convention. 
+The AdminDeleteBucket TOCTOU is fully resolved: see [`2026_04_28_proposed_admin_delete_bucket_safety_net.md`](2026_04_28_proposed_admin_delete_bucket_safety_net.md) for the safety-net design and [`docs/admin_deployment.md`](../admin_deployment.md) §4.6 for the operator-side contract (a `PutObject` 200-OK landing during the race window can be swept by the concurrent admin delete; pause writes before delete to retain in-flight writes). --- diff --git a/docs/design/2026_04_28_proposed_admin_delete_bucket_safety_net.md b/docs/design/2026_04_28_proposed_admin_delete_bucket_safety_net.md index 164171e52..1068bb906 100644 --- a/docs/design/2026_04_28_proposed_admin_delete_bucket_safety_net.md +++ b/docs/design/2026_04_28_proposed_admin_delete_bucket_safety_net.md @@ -8,8 +8,8 @@ `AdminDeleteBucket` and the SigV4 `s3.go:deleteBucket` share a known TOCTOU race documented in -[`docs/design/2026_04_24_partial_admin_dashboard.md`](2026_04_24_partial_admin_dashboard.md) -under Outstanding open items. coderabbitai 🔴/🟠 flagged it during PR +[`docs/design/2026_04_24_implemented_admin_dashboard.md`](2026_04_24_implemented_admin_dashboard.md) +under Out-of-scope follow-ups. coderabbitai 🔴/🟠 flagged it during PR #669 review. The current shape: diff --git a/internal/admin/config.go b/internal/admin/config.go index 241a68d00..ffec30bb0 100644 --- a/internal/admin/config.go +++ b/internal/admin/config.go @@ -15,7 +15,7 @@ const ( ) // Config captures everything the admin listener needs at startup. It mirrors -// the Section 7.1 table in docs/design/2026_04_24_proposed_admin_dashboard.md +// the Section 7.1 table in docs/design/2026_04_24_implemented_admin_dashboard.md // and intentionally uses plain Go fields rather than a config library so the // existing flag-based wiring in main.go can hand values over without a new // dependency. 
From 0dd986aa8abc848ed87adc57f02a462e6fe8186e Mon Sep 17 00:00:00 2001 From: "Yoshiaki Ueda (bootjp)" Date: Wed, 29 Apr 2026 01:34:45 +0900 Subject: [PATCH 2/4] Update docs/design/2026_04_24_implemented_admin_dashboard.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- docs/design/2026_04_24_implemented_admin_dashboard.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2026_04_24_implemented_admin_dashboard.md b/docs/design/2026_04_24_implemented_admin_dashboard.md index 6e88b31c8..6fef49097 100644 --- a/docs/design/2026_04_24_implemented_admin_dashboard.md +++ b/docs/design/2026_04_24_implemented_admin_dashboard.md @@ -9,7 +9,7 @@ | Phase | Status | Landed via | |---|---|---| -| **P1** — `internal/admin/` skeleton, auth, DynamoDB list/create/describe/delete, AdminForward (Section 3.3 acceptance criteria 1–4 + 6; criterion 5 deferred — see follow-ups) | ✅ shipped | #634, #635, #644, #648 | +| **P1** — internal/admin/ skeleton, auth, DynamoDB list/create/describe/delete, AdminForward (Section 3.3 acceptance criteria 1–4 + 6; criterion 5 deferred — see [follow-ups](#out-of-scope-follow-ups)) | ✅ shipped | #634, #635, #644, #648 | | **P2** — S3 bucket list/create/delete/ACL, DescribeTable | ✅ shipped | #658 (read-only slice 1) + #669 (writes, slice 2a) + #673 (AdminForward integration, slice 2b) + #695 (AdminDeleteBucket TOCTOU safety net) | | **P3** — React SPA + embed | ✅ shipped | #649, #650 | | **P4** — TLS, read-only role, CSRF, `docs/admin.md`, deployment runbook + `scripts/rolling-update.sh` admin support | ✅ shipped | TLS / role / CSRF live in P1; operator doc + runbook + script wiring in #674 / #669 / #678 | From e2ed240a4f9ebd62f47ae8eeae4a46bbbdb72df7 Mon Sep 17 00:00:00 2001 From: "Yoshiaki Ueda (bootjp)" Date: Wed, 29 Apr 2026 01:55:57 +0900 Subject: [PATCH 3/4] docs(design): restore inline-code formatting on internal/admin/ in P1 row MIME-Version: 1.0 Content-Type: 
text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The gemini suggestion accepted in 0dd986aa added the [follow-ups](#out-of-scope-follow-ups) anchor link as intended, but the suggestion text dropped the backticks around `internal/admin/` in the P1 status-table row. Other rows in the same table (P4's references to docs/admin.md and scripts/rolling-update.sh) still use backticks for path-like identifiers, so the unbacktick'd internal/admin/ is the formatting outlier. Re-added the backticks. No other changes — the anchor link the suggestion introduced is preserved. --- docs/design/2026_04_24_implemented_admin_dashboard.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2026_04_24_implemented_admin_dashboard.md b/docs/design/2026_04_24_implemented_admin_dashboard.md index 6fef49097..ef465147d 100644 --- a/docs/design/2026_04_24_implemented_admin_dashboard.md +++ b/docs/design/2026_04_24_implemented_admin_dashboard.md @@ -9,7 +9,7 @@ | Phase | Status | Landed via | |---|---|---| -| **P1** — internal/admin/ skeleton, auth, DynamoDB list/create/describe/delete, AdminForward (Section 3.3 acceptance criteria 1–4 + 6; criterion 5 deferred — see [follow-ups](#out-of-scope-follow-ups)) | ✅ shipped | #634, #635, #644, #648 | +| **P1** — `internal/admin/` skeleton, auth, DynamoDB list/create/describe/delete, AdminForward (Section 3.3 acceptance criteria 1–4 + 6; criterion 5 deferred — see [follow-ups](#out-of-scope-follow-ups)) | ✅ shipped | #634, #635, #644, #648 | | **P2** — S3 bucket list/create/delete/ACL, DescribeTable | ✅ shipped | #658 (read-only slice 1) + #669 (writes, slice 2a) + #673 (AdminForward integration, slice 2b) + #695 (AdminDeleteBucket TOCTOU safety net) | | **P3** — React SPA + embed | ✅ shipped | #649, #650 | | **P4** — TLS, read-only role, CSRF, `docs/admin.md`, deployment runbook + `scripts/rolling-update.sh` admin support | ✅ shipped | TLS / role / CSRF live in P1; operator doc + runbook + script wiring in #674 / 
#669 / #678 |
From 1825f1d212a2d934d0abd71f911e609b08e370ef Mon Sep 17 00:00:00 2001
From: "Yoshiaki Ueda (bootjp)"
Date: Wed, 29 Apr 2026 02:02:19 +0900
Subject: [PATCH 4/4] docs(design): fix Claude-bot follow-ups from #701 review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three issues from the @claude review of PR #701:

1. **Anchor link target now resolves**. The accepted gemini suggestion added `[follow-ups](#out-of-scope-follow-ups)` to the P1 status-table row, but the target was a plain-text paragraph — GitHub only auto-generates anchors for headings, so the link silently scrolled to the top of the document. Promoted "Out-of-scope follow-ups" to a `### ` heading and moved the parenthetical caption onto its own italicised line below it. Anchor `#out-of-scope-follow-ups` now actually resolves.

2. **Stale subsection pointer in safety-net doc**. The 2026_04_28_proposed_admin_delete_bucket_safety_net.md Background section pointed at "Out-of-scope follow-ups" for the TOCTOU description, but after the rename the TOCTOU is no longer in that section — it's resolved (described in the implementation-status block at the top of the implemented doc, line 23). Updated the prose to say "resolved by this design and recorded in the implementation-status block of …" so the cross-link points where the TOCTOU is actually discussed in the renamed doc.

3. **Stale link description in admin_deployment.md**. The final cross-references list called the design doc's third bullet "outstanding items" — that is the `_partial_`-era wording. Renamed to "out-of-scope follow-ups" to match the promoted doc's actual section heading.

Also reordered the implementation-status block: TOCTOU resolution paragraph now precedes the Out-of-scope section so the doc reads "what shipped → how the late-breaking issue was fixed → what was deliberately deferred". Previously the TOCTOU paragraph sat after the deferred-items bullet list which read awkwardly.
No content changes other than the three fixes above and the section reorder. --- docs/admin_deployment.md | 2 +- docs/design/2026_04_24_implemented_admin_dashboard.md | 8 +++++--- .../2026_04_28_proposed_admin_delete_bucket_safety_net.md | 8 ++++---- 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/docs/admin_deployment.md b/docs/admin_deployment.md index 095aa45ac..42792f31b 100644 --- a/docs/admin_deployment.md +++ b/docs/admin_deployment.md @@ -403,7 +403,7 @@ mean the cluster has lost quorum. - [`docs/admin.md`](admin.md) — per-flag configuration reference, audit log shapes, troubleshooting catalogue. - [`docs/design/2026_04_24_implemented_admin_dashboard.md`](design/2026_04_24_implemented_admin_dashboard.md) — - design rationale, acceptance criteria, outstanding items. + design rationale, acceptance criteria, out-of-scope follow-ups. - [`scripts/rolling-update.sh`](../scripts/rolling-update.sh) — the rollout driver this doc references throughout. - [`scripts/rolling-update.env.example`](../scripts/rolling-update.env.example) — diff --git a/docs/design/2026_04_24_implemented_admin_dashboard.md b/docs/design/2026_04_24_implemented_admin_dashboard.md index ef465147d..a06b8b966 100644 --- a/docs/design/2026_04_24_implemented_admin_dashboard.md +++ b/docs/design/2026_04_24_implemented_admin_dashboard.md @@ -14,14 +14,16 @@ | **P3** — React SPA + embed | ✅ shipped | #649, #650 | | **P4** — TLS, read-only role, CSRF, `docs/admin.md`, deployment runbook + `scripts/rolling-update.sh` admin support | ✅ shipped | TLS / role / CSRF live in P1; operator doc + runbook + script wiring in #674 / #669 / #678 | -Out-of-scope follow-ups (recorded so future readers know what was deliberately deferred): +The AdminDeleteBucket TOCTOU is fully resolved: see [`2026_04_28_proposed_admin_delete_bucket_safety_net.md`](2026_04_28_proposed_admin_delete_bucket_safety_net.md) for the safety-net design and [`docs/admin_deployment.md`](../admin_deployment.md) §4.6 for the 
operator-side contract (a `PutObject` 200-OK landing during the race window can be swept by the concurrent admin delete; pause writes before delete to retain in-flight writes). + +### Out-of-scope follow-ups + +_Recorded so future readers know what was deliberately deferred._ - **AdminForward acceptance criterion 5** — rolling-upgrade compatibility flag (`admin.leader_forward_v2`). Deferred at design time behind a cluster-version bump that does not exist yet; not blocking dashboard usability today because every node forwards through the same `pb.AdminOperation` enum. - **S3 object browser** — explicitly called out as "next phase" in §2.2 Non-goals; no work item yet. - **Operator-visible TLS cert reload** — out of scope; restart-to-rotate is the documented model in `docs/admin.md`. -The AdminDeleteBucket TOCTOU is fully resolved: see [`2026_04_28_proposed_admin_delete_bucket_safety_net.md`](2026_04_28_proposed_admin_delete_bucket_safety_net.md) for the safety-net design and [`docs/admin_deployment.md`](../admin_deployment.md) §4.6 for the operator-side contract (a `PutObject` 200-OK landing during the race window can be swept by the concurrent admin delete; pause writes before delete to retain in-flight writes). - --- ## 1. Background and Motivation diff --git a/docs/design/2026_04_28_proposed_admin_delete_bucket_safety_net.md b/docs/design/2026_04_28_proposed_admin_delete_bucket_safety_net.md index 1068bb906..4565527b8 100644 --- a/docs/design/2026_04_28_proposed_admin_delete_bucket_safety_net.md +++ b/docs/design/2026_04_28_proposed_admin_delete_bucket_safety_net.md @@ -7,10 +7,10 @@ ## 1. Background `AdminDeleteBucket` and the SigV4 `s3.go:deleteBucket` share a known -TOCTOU race documented in -[`docs/design/2026_04_24_implemented_admin_dashboard.md`](2026_04_24_implemented_admin_dashboard.md) -under Out-of-scope follow-ups. coderabbitai 🔴/🟠 flagged it during PR -#669 review. 
+TOCTOU race resolved by this design and recorded in the +implementation-status block of +[`docs/design/2026_04_24_implemented_admin_dashboard.md`](2026_04_24_implemented_admin_dashboard.md). +coderabbitai 🔴/🟠 flagged it during PR #669 review. The current shape: