feat(controlplane): drift-correct iceberg.enabled in reconcileReady#555
Merged
Conversation
Mirror the pgbouncer.enabled drift correction pattern for iceberg so
flipping iceberg_enabled in the configstore (admin API) is sufficient
to propagate to a warehouse already in Ready state. Without this,
operators have to also kubectl patch the Duckling CR.
iceberg.namespace is intentionally not drift-corrected — the XRD's CEL
admission rule rejects post-create namespace changes, so a drift attempt
would just hit a 422. Namespace changes require warehouse re-creation.
Adds GetIcebergEnabled / SetIcebergEnabled on DucklingClient (RFC 7396
merge patch, only touches the iceberg block) and parallel
TestReconcileReadyPatchesCRWhenIcebergFlipped{On,Off} tests covering both
directions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
client.Register was using context.Background() — no timeout. Any environment that can't reach Let's Encrypt (LE outage, CI runner without outbound network, etc.) would hang duckgres startup forever. Failure was already handled as warn-and-continue further down; the timeout just lets us reach that path instead of blocking. CI was hitting this deterministically on TestACMEDNSCertCache (the test runs through NewACMEDNSManager which goes out to LE before reaching the AWS-config-failure path the test is actually exercising). Bundled into this PR rather than separate so the iceberg drift correction can land — same root unblocker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
fuziontech
added a commit
that referenced
this pull request
May 10, 2026
PR #554 added the Iceberg fields to ManagedWarehouse (configstore) and to the upsert column allowlist, but missed the strict-decode struct managedWarehouseRequest that gates which top-level keys a PUT body may carry. The doc comment on that struct literally warns: "If you add a field here without a matching tag on ManagedWarehouse, strict decode will accept it and the merge will silently drop it." Without this fix, `PUT /orgs/<id>/warehouse` with `{"iceberg":{...}}` returns 400 "unknown field iceberg" — making it impossible to enable Iceberg via admin API on existing warehouses (the whole reason the drift correction in #555 exists). Adds the matching test mirroring TestPutWarehouseDisablesPgBouncerWhenSetToFalse. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 tasks
fuziontech
added a commit
that referenced
this pull request
May 12, 2026
The Duckling-status → configstore propagation for iceberg (ARN, region, namespace, state=ready) currently only fires in `reconcileProvisioning`. Once a warehouse transitions to `Ready`, the controller stops pulling status fields from the Duckling CR. That's wrong for the late-iceberg-enable case: an admin flips iceberg.enabled=true via the admin API on a warehouse that's already Ready; #555's drift correction patches `spec.iceberg.enabled` on the Duckling CR; the Crossplane composition provisions the bucket and writes `status.iceberg.tableBucketArn`; but the controller never reads that status back because the warehouse is still Ready (never moved back through Provisioning). The configstore's iceberg.table_bucket_arn stays empty and the worker activator's IcebergConfig.TableBucket stays empty — AttachIcebergCatalog gets gated out, iceberg never actually attaches at the worker. This was observed on prod-us today: the cascade completed at the Crossplane layer (Duckling.status.iceberg.tableBucketArn populated) but `iceberg_state` in the configstore never flipped from "" to "ready". Recovery required a manual admin API PUT. Extract the propagation into `addIcebergStatusUpdates`; call from both `reconcileProvisioning` (initial turn-up) and `reconcileReady` (late-enable). Added TestReconcileReadyPropagatesIcebergStatusToConfigstore covering the bug. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the gap left by #554: when an operator flips
iceberg_enabledin the configstore on a warehouse already in Ready state, the provisioner controller now patchesspec.iceberg.enabledon the existing Duckling CR. Mirrors the pgbouncer.enabled drift correction pattern.Why
Without this, the admin-API path to enable Iceberg on an existing warehouse takes two steps: configstore flip + manual
kubectl patch. Now the configstore flip alone is sufficient.What's drift-corrected and what isn't
iceberg.enabled— both directions (false→true and true→false). Symmetric with pgbouncer.iceberg.namespace— the XRD's CEL admission rule rejects post-create namespace changes (#11032), so a drift attempt would just hit a 422. Namespace changes require warehouse re-creation.What about the table bucket itself?
managementPolicies: ["Observe", "Create"](set-once). Flippingiceberg.enabled=trueprovisions the bucket; flipping back tofalsemakes the GoTemplate skip rendering the Workspace, which removes the K8s MR but does NOT delete the AWS table bucket (noDeletein management policies). Data is preserved; the worker just stops attaching the catalog.Test plan
TestReconcileReadyPatchesCRWhenIcebergFlipped{On,Off}covering both drift directions.TestReconcileReadyNoDriftDoesNotPatchstill passes (no spurious patch).go test -tags kubernetes ./controlplane/...green.go build ./...andgo build -tags kubernetes ./...clean.🤖 Generated with Claude Code