Skip to content

[bundle] Report deploy operations to Deployment Metadata Service (step 5)#5357

Draft
shreyas-goenka wants to merge 6 commits into
mainfrom
shreyas-goenka/cli-op-reporting
Draft

[bundle] Report deploy operations to Deployment Metadata Service (step 5)#5357
shreyas-goenka wants to merge 6 commits into
mainfrom
shreyas-goenka/cli-op-reporting

Conversation

@shreyas-goenka
Copy link
Copy Markdown
Contributor

Summary

Step 5 of the DMS CLI integration plan (see #4856 for the kitchen-sink reference). Stacked on top of #5355 (state-read from DMS) — this branch was cut from that PR's HEAD and should be reviewed after it merges.

When DATABRICKS_BUNDLE_MANAGED_STATE is set, the direct engine now reports per-resource Create/Update/Delete operations to the deployment metadata service inline with apply. With this in place the DMS server has the full picture of which resources exist for a deployment, which closes the loop with the state-read path from step 4.

  • bundle/deploy/lock/operation_reporter.go — new asyncReporter ships operationEvents to DMS from a single sender goroutine fed by a buffered channel (buffer size = direct.defaultParallelism so apply workers rarely block). planActionToOperationAction maps the deploy-plan action enum onto sdkbundle.OperationActionType; Skip → empty (no DMS operation), unknowns → error.
  • bundle/deploy/lock/operation_reporter_test.go — exhaustive mapping test covering every ActionType including Skip, Undefined, and an unknown garbage value.
  • bundle/deploy/lock/deployment_metadata_service.go — start the async reporter on Acquire, Close() it in Release before CompleteVersion so every event lands under the right version_id.
  • bundle/direct/pkg.go — new OperationReporter type and DeploymentBundle field; left nil when DMS is off so the file-state path is unaffected.
  • bundle/direct/bundle_apply.go — call the reporter inline after each Create/Update/Delete (success or failure). Delete reports resource_id only (no state payload), failed operations report the error message with OPERATION_STATUS_FAILED. Migrate+DMS combo is rejected up-front.
  • bundle/deployplan/plan.go, bundle/direct/bundle_plan.go — comment-only notes that Lineage/Serial don't apply under DMS.
  • acceptance/bundle/dms/{add-resources,sequential-deploys,deploy-error} — three new acceptance tests exercising CREATE, DELETE, and FAILED operations end-to-end against the fake DMS server.
  • acceptance/bundle/dms/plan-and-summary — updated golden output now that the deploy reports a CREATE, so plan/summary see the resource as already-deployed.
  • acceptance/bin/print_requests.py — open the recorded-requests file with encoding="utf-8" to handle non-ASCII bytes in the trace.

Error handling

Operation reporting matches the heartbeat behaviour from step 3 (PR #5331): DMS-side delivery errors are logged and the deploy continues. The reporter only fails the deploy when the deploy context is cancelled mid-send. The buffer (size 10) gives the sender a small cushion against transient DMS slowness without serializing the deploy.

Out of scope

  • Bind/unbind support under DMS — still rejected with an error at lock acquisition.
  • --plan under DMS — server-side version ordering already exists, but the lineage/serial bridge isn't built; deferred per the split plan.

Test plan

  • go test ./bundle/direct/... ./bundle/deploy/lock/... ./bundle/deployplan/... green (incl. new TestPlanActionToOperationAction).
  • go test ./acceptance -run "TestAccept/bundle/dms" green across all five DMS acceptance tests (add-resources, sequential-deploys, deploy-error, plan-and-summary, release-lock-error).
  • Verify CI green once the step-4 PR ([bundle] Read deployment state from Deployment Metadata Service (step 4) #5355) merges and this PR is rebased.

This pull request and its description were written by Isaac.

Pure code-movement refactor. Wraps the existing workspace-filesystem lock
behavior behind a DeploymentLock interface so a follow-up PR can introduce
an alternative metadata-service-backed lock implementation without
touching deploy/destroy/bind callers again.

What changed:
- New bundle/deploy/lock/lock.go: DeploymentLock interface, Goal enum
  (moved from release.go), DeploymentStatus enum, and a
  NewDeploymentLock factory that unconditionally returns the workspace
  filesystem implementation.
- New bundle/deploy/lock/workspace_filesystem.go: workspaceFilesystemLock
  struct that implements DeploymentLock. Preserves the historical
  behavior of the deleted acquire.go / release.go mutators: lock-disabled
  short-circuit, locker.CreateLocker initialization, the
  permissions.ReportPossiblePermissionDenied branch on fs.ErrPermission
  / fs.ErrNotExist, and the destroy-mode locker.AllowLockFileNotExist
  unlock quirk.
- Deleted bundle/deploy/lock/acquire.go and bundle/deploy/lock/release.go.
- Updated bundle/phases/{deploy,destroy,bind}.go to construct the lock
  once via NewDeploymentLock and call Acquire / Release directly instead
  of through bundle.ApplyContext. The deferred Release now reports
  DeploymentSuccess / DeploymentFailure based on logdiag.HasError so a
  future DMS-backed implementation can record the outcome.

Behavior is preserved end-to-end: lock-related acceptance goldens
(pipelines/{deploy,destroy}/force-lock, bundle/help/bundle-{deploy,
destroy}) all pass unchanged.
…ent lock

Wires up a new env-var gate (DATABRICKS_BUNDLE_MANAGED_STATE) that switches
the deployment lock from the workspace filesystem to the bundle deployment
metadata service (DMS). The env var accepts the usual boolean spellings
(true/false, 1/0, yes/no, on/off); the historical filesystem lock is the
default.

The DMS-backed lock uses the SDK's databricks-sdk-go/service/bundle client
(merged via #5311, available since v0.135.0) — no hand-rolled DMS client.
Acquire calls CreateDeployment / CreateVersion and starts a background
heartbeat goroutine; Release stops the heartbeat and calls CompleteVersion
(plus DeleteDeployment on successful destroy). Bind and Unbind are not yet
supported under DMS and return an error at lock construction.

Deployment ID persistence lives inside the lock package for now
(managed_service.json in the workspace state dir). In step 4 of the DMS
split it will move to bundle/statemgmt so it can be shared with the
state-from-DMS path.

Co-authored-by: Isaac
Step 4 of the DMS CLI integration plan. When DATABRICKS_BUNDLE_MANAGED_STATE
is set, resource state is now fetched from the deployment metadata service
on the server rather than the workspace-side resources.json file:

- statemgmt/state_dms.go: new LoadStateFromDMS reads resources via the SDK
  Bundle.ListResourcesAll and seeds the in-memory state DB via
  DeploymentState.OpenWithData (no local file is touched).
- statemgmt/state_pull.go: when DMS is active, pull only the deployment_id
  pointer from managed_service.json and short-circuit. The full state is
  loaded later via LoadStateFromDMS so file-state lineage/serial logic stays
  off the DMS path entirely.
- statemgmt/state_push.go: no-op under DMS — the server owns the state,
  there is nothing useful to upload.
- cmd/bundle/utils/process.go: route through LoadStateFromDMS instead of
  StateDB.Open when DMS is on; reject --plan under DMS because plan-vs-state
  lineage checks don't translate to the server's version-based locking.
- bundle/bundle.go: add Bundle.DeploymentID, the in-memory carrier for the
  DMS deployment record id (populated by either state pull or lock acquire).
- statemgmt/managed_service_json.go: promote managed_service.json constant
  and struct out of the lock package so the state-read path can share them.
- deploy/lock/deployment_metadata_service.go: switch to the exported names
  and publish b.DeploymentID after acquiring the lock so subsequent reads
  in the same process see it without re-reading the file.

Operation reporting (writing per-resource operations back to DMS during
deploy) is intentionally deferred to step 5; today the resources list
returned by ListResources is empty until that lands.

The release-lock-error acceptance fixture loses the "Updating deployment
state..." line because PushResourcesState is now a no-op under DMS. A new
plan-and-summary acceptance test exercises the full read path end-to-end
and verifies both bundle plan and bundle summary hit ListResources.
…ient

Adds three table-style tests:
  * NoDeploymentID — empty b.DeploymentID short-circuits without any RPC and
    leaves the in-memory state DB safe to read (ExportState returns empty).
  * PopulatesFromList — resources land under "resources.<key>" with their
    ID and State preserved verbatim; the nil-State path is exercised too.
  * ListError — DMS-side failures are wrapped with the package-level prefix
    so callers can tell them apart from local-filesystem errors.

Together these cover the three branches a code reviewer would otherwise have
to read the source to verify.
Wire per-resource Create/Update/Delete outcomes from the direct engine
through to the deployment metadata service (DMS), completing step 5 of
the DMS CLI integration plan.

* bundle/deploy/lock/operation_reporter.go — new asyncReporter ships
  operationEvents to DMS from a single sender goroutine fed by a buffered
  channel (size matches direct.defaultParallelism so apply workers
  rarely block). planActionToOperationAction maps the deploy-plan action
  enum onto sdkbundle.OperationActionType.
* bundle/deploy/lock/operation_reporter_test.go — exhaustive mapping
  test covering every ActionType (incl. Skip → empty, Undefined → error).
* bundle/deploy/lock/deployment_metadata_service.go — start the reporter
  on Acquire and Close() it in Release before CompleteVersion so all
  per-resource events land under the right version_id.
* bundle/direct/pkg.go — new OperationReporter type and DeploymentBundle
  field; left nil when DMS is off so the file-state path is unaffected.
* bundle/direct/bundle_apply.go — call the reporter inline after each
  Create/Update/Delete (success or failure); skip-actions are filtered
  out at the sender, deletes report resource_id only, failures carry the
  error message and no state payload.
* bundle/deployplan/plan.go, bundle/direct/bundle_plan.go — comment-only
  notes that Lineage/Serial don't apply under DMS.
* acceptance/bundle/dms/{add-resources,sequential-deploys,deploy-error}
  — three new acceptance tests exercising CREATE, DELETE, and FAILED
  operations end-to-end against the fake DMS server.
* acceptance/bundle/dms/plan-and-summary — updated golden output now
  that the deploy reports a CREATE, so plan/summary see the resource
  already deployed.
* acceptance/bin/print_requests.py — open recorded-requests file with
  utf-8 to handle non-ASCII bytes in the trace.

Co-authored-by: Isaac
Migrate mode rewrites local state in-place and never calls resource
adapters, so there are no operations to report. Without this check, a
DMS-enabled migrate would silently leave the server-side resource
inventory empty.

Co-authored-by: Isaac
@eng-dev-ecosystem-bot
Copy link
Copy Markdown
Collaborator

Commit: d98818d

Run: 26549065531

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants