
cocoon gc: surface per-module summary (scanned / collected / freed) #33

@CMGS

Description


What

`cocoon gc` currently emits a single line, `GC completed`, regardless of whether it scanned 7 modules and freed 30 GB or did nothing at all. Nothing in the default log output tells the operator what actually happened.

Repro on the testbed: `/var/lib/cocoon` was 38 GB, and all blobs/dirs/snapshots were referenced by their respective indices. `cocoon gc` finished in 5 ms and printed only the `GC completed` line. From the operator's side there is no way to tell "GC ran but found no targets" apart from "GC short-circuited because of a bug".

Why

Operators reach for `cocoon gc` when the disk fills up. If GC truly has nothing to delete (because every blob is index-referenced), the log should say so, which tells the operator to `image rm` first instead of poking at GC further. Conversely, if GC does reclaim something, the operator wants to see how many objects (and ideally how many bytes) were freed.

Where the data already exists

`gc/orchestrator.go::Run` already iterates the modules with full per-module knowledge:

```go
// Phase 2: resolve deletion targets.
targets := make(map[string][]string)
for _, m := range locked {
	if ids := m.resolveTargets(...); len(ids) > 0 {
		targets[m.getName()] = ids
	}
}

// Phase 3: collect (skip modules with no targets).
for _, m := range locked {
	ids := targets[m.getName()]
	if len(ids) == 0 {
		continue
	}
	if err := m.collect(ctx, ids); err != nil { ... }
}
```

So per module we already know the name, the number of targets before collect (`len(ids)`), and whether collect returned an error. We do not yet know the bytes freed; that would require either pre-stat'ing the targets or having `collect` return a delta.

Proposed UX

INFO-level output (default): one line per module, including modules that found nothing to collect, followed by a one-line total:

```
gc oci: 0 orphan blobs (16 referenced)
gc cloudimg: 0 orphan blobs (3 referenced)
gc snapshot: 1 stale-pending reclaimed
gc cloudhypervisor: 2 orphan run dirs reclaimed (vm IDs: …)
gc completed: 4 modules, 3 objects collected
```
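A minimal sketch of rendering these lines from per-module counts. The `moduleSummary` type and its `Noun` field are hypothetical; `Noun` exists only so each module can phrase its own object type as in the sample above (the `vm IDs` detail is left out).

```go
package main

import "fmt"

// moduleSummary is a hypothetical per-module record kept by the orchestrator.
type moduleSummary struct {
	Name       string // module name, e.g. "oci"
	Noun       string // what the module collects, e.g. "orphan blobs"
	Referenced int    // objects still referenced by the module's index
	Collected  int    // objects actually deleted by collect
}

// summaryLines renders one INFO line per module plus a final total line.
func summaryLines(sums []moduleSummary) []string {
	lines := make([]string, 0, len(sums)+1)
	total := 0
	for _, s := range sums {
		if s.Collected == 0 {
			// No-op module: say so, and show that its objects are still referenced.
			lines = append(lines, fmt.Sprintf("gc %s: 0 %s (%d referenced)", s.Name, s.Noun, s.Referenced))
		} else {
			lines = append(lines, fmt.Sprintf("gc %s: %d %s reclaimed", s.Name, s.Collected, s.Noun))
		}
		total += s.Collected
	}
	lines = append(lines, fmt.Sprintf("gc completed: %d modules, %d objects collected", len(sums), total))
	return lines
}
```

Keeping no-op modules in the output is what lets an operator tell "nothing was orphaned" apart from "the module never ran".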

If a module is skipped because its lock was busy:

```
gc oci: skipped (lock busy)
gc aborted: modules skipped (lock busy): oci
```

(`Run` already returns this as an error; we just want the summary log to mirror it.)
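One way to keep the log and the returned error in agreement is to build both from the same list of busy modules. This is a sketch; `skipReport` is a hypothetical helper, and the exact error text `Run` returns today is not shown in this issue, so the message format here is an assumption taken from the sample output above.

```go
package main

import (
	"fmt"
	"strings"
)

// skipReport builds the "skipped" log lines for modules whose lock was busy,
// plus a matching error for Run to return. A nil error means nothing was skipped.
func skipReport(busy []string) ([]string, error) {
	if len(busy) == 0 {
		return nil, nil
	}
	lines := make([]string, 0, len(busy)+1)
	for _, name := range busy {
		lines = append(lines, fmt.Sprintf("gc %s: skipped (lock busy)", name))
	}
	err := fmt.Errorf("modules skipped (lock busy): %s", strings.Join(busy, ", "))
	// The final log line reuses the error text, so log and error cannot drift apart.
	lines = append(lines, "gc aborted: "+err.Error())
	return lines, err
}
```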

Suggested implementation

  1. Make each `gc.Module[S].Collect` return `(int, error)`, where the int is the number of objects actually deleted. Some `Collect` implementations walk a list and skip on error, so the count must reflect successful deletions only.
  2. `Orchestrator.Run` accumulates a per-module `(name, scanned, collected)` record in a small struct and logs each one at INFO before returning.
  3. Optional: have each module's `ReadDB` / `Resolve` also surface the "referenced N" count, so the log is informative in the no-op case when nothing was orphaned.

Out of scope: bytes-freed accounting. It would require stat'ing every target, which slows GC; it can be added later at DEBUG level if useful.

Priority

Low: purely a UX/observability fix. No correctness bug, no behavior change, and no API churn beyond the `Module.Collect` return type.

Metadata


Assignees: No one assigned

Labels: No labels

Type: No type

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
