feat(tests/integration): Phase 0 harness + smoke (envtest + linstor CLI)#5
feat(tests/integration): Phase 0 harness + smoke (envtest + linstor CLI)#5kvaps wants to merge 1 commit into
Conversation
Build the Tier 2 integration-test scaffold per docs/test-strategy.md so the 12 Phase 1 group agents have a working harness to extend. The scaffold boots envtest + the full controller-runtime manager + the LINSTOR REST server on a free port, drives it with the real upstream `linstor` CLI, and ships an in-process satellite mock that writes the healthy steady-state Status fields (Ready, FreeCapacity, UpToDate) without shelling out. Files all guarded by `//go:build integration` so `go test ./...` without the tag is unaffected. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Cherry-picked harness files into main directly (dev workflow, no PR review). Playbook revert dropped — keeping the no-PR convention. |
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive Tier 2 integration test harness for the blockstor project, featuring an envtest setup, a controller-runtime manager with all reconcilers wired, a LINSTOR-compatible REST server, and a satellite mock. The review feedback highlights two necessary improvements for the satellite mock: implementing periodic heartbeat updates to prevent nodes from being marked offline during long tests and refining the storage pool capacity defaulting logic to allow for the simulation of completely full pools.
| if node.Status.ConnectionStatus == desiredStatus && hasReadyTrue(node.Status.Conditions) { | ||
| continue | ||
| } |
There was a problem hiding this comment.
The current logic skips updating the node status if it is already Online and Ready. This means LastHeartbeatTime is only set once and never updated again. If the NodeHeartbeatReconciler (which is wired in the manager) enforces a heartbeat timeout, nodes might be marked as offline during long-running tests. Consider updating the heartbeat timestamp periodically (e.g., every few seconds) even if the status hasn't changed.
| free := pool.Status.FreeCapacity | ||
| if free == 0 { | ||
| free = total | ||
| } |
There was a problem hiding this comment.
This defaulting logic makes it impossible to simulate a completely full storage pool (where FreeCapacity is 0). If a test intentionally sets FreeCapacity to 0, the mock will overwrite it with the total capacity. Consider applying these defaults only when the status is uninitialized (e.g., when TotalCapacity is 0).
| free := pool.Status.FreeCapacity | |
| if free == 0 { | |
| free = total | |
| } | |
| free := pool.Status.FreeCapacity | |
| if free == 0 && pool.Status.TotalCapacity == 0 { | |
| free = total | |
| } |
…ug 327) User-reported, repeat #5: after `linstor r d <node> <rd>` followed by `linstor r c <node> <rd>` (no --diskless flag), the new replica came back as Diskless with DRBD connection state stuck in Connecting. Root cause: `createOneResource` persisted the wire body verbatim. The LINSTOR Python CLI posts a bare ResourceCreate with no flags AND no StorPoolName, relying on upstream CtrlRscCrtApiHelper to resolve the pool from the parent RG's SelectFilter.StoragePool before staging. The pre-fix blockstor handler skipped that resolution, so the satellite saw a Resource with no DISKLESS flag AND no StorPoolName and dispatched the replica with an empty pool, bringing the DRBD slot up as `disk none;` (Diskless). Fix mirrors upstream: when the request omits a pool AND the new replica is not explicitly DISKLESS, resolve one from the same fallback chain Bug 260 uses for the witness-takeover path — sibling diskful replica first, parent RG SelectFilter.StoragePool second — and stamp it onto the persisted Resource before the store Create. A TIE_BREAKER peer on another node MUST NOT leak its DISKLESS + TIE_BREAKER flags into a new diskful spawn; the resolver only mines peers for their StorPoolName, never copies flags. Regression test: TestBug327ResourceCreateOnNodeWithExistingTieBreakerProducesDiskful locks in the contract that `r c <node> <rd>` with no flag MUST produce a diskful replica when an existing TIE_BREAKER peer is on another node. Companion tests cover the sibling-fallback path (RG without a default pool) and the inverse invariant (explicit --diskless must NOT be silently stripped). All three fail on the broken code; all pass after fix. User explicitly asked for test coverage — bug has recurred five times; the regression test prevents the next re-introduction. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Summary
Build the Tier 2 integration-test scaffold per
docs/test-strategy.mdso the 12 Phase 1 group agents have a working harness to extend.The scaffold boots envtest + the full controller-runtime manager + the LINSTOR REST server on a free 127.0.0.1 port, drives it with the upstream
linstorPython CLI (--machine-readable), and ships an in-process satellite mock that writes the healthy steady-state Status fields (Node Ready, StoragePool FreeCapacity, Resource UpToDate) without ever shelling out. Every file undertests/integration/is guarded by//go:build integrationso the defaultgo test ./...flow is untouched.What's in this PR
tests/integration/harness/envtest.go— bootsenvtest.Environment{}with the CRDs fromconfig/crd/bases/; readsKUBEBUILDER_ASSETSand fails with an actionable message if unsetharness/manager.go— wires every reconcilercmd/controller/main.goregisters, plus the REST server on a free port, plus the satellite mock, and waits for/v1/healthzbefore returningharness/satellite.go— in-process steady-state simulator. Knobs:SimulatePoolMissing(node, sp),SimulateDRBDState(rd, node, state),FailNext(cmd)harness/linstor.go—exec.Command(\"linstor\", \"--controllers\", URL, \"--machine-readable\", args...)with python-traceback / xml.etree / HTTPConnectionPool guards (Bug-59 class).JSON()flattens the[[obj…]]envelope the CLI emitsharness/csi.go— exposespkg/csi-driver.Driverplus the underlyinglapi.Clientagainst the in-process REST URL. See caveat test(wave2): close 5.W03 — DRBD --ping-timeout net-option #1 belowharness/fixtures.go—SeedThreeNodeClustercreates 3 Nodes (worker-1/2/3), 9 StoragePools (lvm-thin/zfs-thin/file × 3), default RG. Idempotentharness/asserts.go—Eventually, genericMustList[L, T], genericMustGet[T],WaitForDRBDStateharness/concurrent.go—RunParallel(t, n, fn)with panic propagation tot.Fatalsmoke_test.go— single testTestSmokeNodeList. Validates the whole stack in one round-tripReconcilers wired into the manager
In strict lockstep with
cmd/controller/main.go:NodeReconcilerNodeHeartbeatReconcilerNodeLabelSyncReconcilerStoragePoolReconcilerResourceGroupReconcilerRGRebalanceReconcilerResourceDefinitionReconcilerResourceReconcilerResourceMigrationReconcilerSnapshotReconcilerAutoSnapshotRunnableAutoEvictReconcilerPlus the
harness.Satellitemock (manager.Runnable) andpkg/rest.Server(manager.Runnable).Validation
```
$ export KUBEBUILDER_ASSETS=$(setup-envtest use --print path 1.34.x)
$ go test -tags=integration -count=1 ./tests/integration/... -run '^TestSmokeNodeList' -v
=== RUN TestSmokeNodeList
--- PASS: TestSmokeNodeList (6.32s)
PASS
ok github.com/cozystack/blockstor/tests/integration 6.352s
? github.com/cozystack/blockstor/tests/integration/harness [no test files]
```
```
$ go build -tags=integration ./tests/integration/...
clean
$ golangci-lint run --build-tags=integration ./tests/integration/...
0 issues.
$ go vet ./...
clean
$ go test -count=1 ./tests/integration/...
go: warning: "./tests/integration/..." matched no packages
no packages to test # confirms: existing go test ./... is unaffected by the new files
```
Caveats / TODOs for Phase 1
cmd/csi-plugin/main.godoes not exist in the repo today;pkg/csi-driver.Driveris alapi.Client-backed shim that the real linstor-csi sidecar wraps.harness.CSIfollows the existing project shape:Driver+ rawlapi.Clientagainst the in-process REST URL. When/if the project grows a gRPC CSI binary,harness.NewCSIcan return a buffered-listener gRPC stack withcsi.IdentityClient/csi.ControllerClient/csi.NodeClientinstead — Phase 1 Group J authors should call this out in their PR if they need itSatellite.FailNextis a no-op today — the mock never shells out, so there is nothing to fail. The slot exists so Phase 1 Group F (Bug 83 reproduction) can declare the intent without churning the harness API. If a Phase 1 test needs real FakeExec wiring through the simulator, that's a harness extension and should come back here, not be inlined into a group fileAutoDiskfulReconcileris not wired becausecmd/controller/main.goitself does not wire it. The wave2/auto-diskful branch landed the reconciler but never plumbed it into the main bootstrap. Mirroring main exactly (per the task brief) means this harness has the same gap; the production binary's behavior is what these tests need to assert againstStatus.DrbdState(the existing CRD field).WaitForDRBDState(t, stack, rd, node, want)does the rd+node→Resource-name resolution via the CEL-pinned<rd>.<node>conventionTest plan
go build -tags=integration ./tests/integration/...is cleanKUBEBUILDER_ASSETS=… go test -tags=integration -count=1 ./tests/integration/... -run '^TestSmokeNodeList'passes locallygolangci-lint run --build-tags=integration ./tests/integration/...reports 0 issuesgo test ./...(no build tag) is unaffected: integration files compile only with-tags=integration