
update install: defer to external supervisor (systemd) instead of spawning a competing detached daemon #82

@Rinse12

Description

Problem

bitsocial update install (default --restart-daemons) restarts the daemon by stopping it and then calling spawn("bitsocial", ["daemon", ...], { detached: true }), which creates a process that no supervisor owns. On a host where the daemon is managed by systemd, this detached daemon competes with systemd for the PKC RPC port (ws://localhost:9138). systemd's own bitsocial.service can then never bind the port: it exits with "PKC RPC port is already in use", and Restart=on-failure restarts it forever. The result is an infinite crash loop that churns through the daemon log files (capacity 5) every few seconds.

Root cause

src/cli/commands/update/install.ts has no notion of who supervises the daemon, so it always tears the daemon down and re-spawns it itself, outside systemd's control.

Fix

Make update install supervisor-aware, keyed off the daemon's own state file:

  1. The daemon records its supervisor at startup (src/cli/commands/daemon.ts + src/common-utils/daemon-state.ts). systemd sets $INVOCATION_ID for every service it spawns; the unit name is parsed from /proc/self/cgroup. The result is stored as supervisor?: { type: "systemd"; unit: string } on DaemonState (absent ⇒ unsupervised/standalone/legacy).
  2. update install routes each daemon by supervisor:
    • Supervised ⇒ don't pre-stop, don't detached-spawn; swap the binary, then systemctl restart <unit> (systemd owns the new process and there is no port race, because systemd waits for the whole cgroup, including kubo, to exit before restarting).
    • Unsupervised ⇒ current behavior unchanged (SIGINT + wait + detached spawn).
    • Legacy daemons (no supervisor field, started before this change) ⇒ fall back to inferring the unit from /proc/<pid>/cgroup.
  3. Off-Linux / no /proc ⇒ falls back to current behavior (zero regression).
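The detection in step 1 could look roughly like the sketch below. The function names and the rule "the last cgroup path segment must end in .service" are assumptions made for illustration; the issue does not show the actual code in daemon-state.ts. Both functions are pure and take the environment and the cgroup file contents as arguments, so they can be tested without a real /proc.

```typescript
// Hypothetical sketch; names and matching rule are assumptions, not the project's code.
type Supervisor = { type: "systemd"; unit: string };

/**
 * Extract a systemd service unit name from the text of a /proc/<pid>/cgroup file.
 * Each line has the form "hierarchy-ID:controller-list:cgroup-path".
 * Returns undefined when no line's path ends in a ".service" segment.
 */
function parseSystemdUnitFromCgroup(cgroupText: string): string | undefined {
  for (const rawLine of cgroupText.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue;
    const firstColon = line.indexOf(":");
    const secondColon = line.indexOf(":", firstColon + 1);
    if (firstColon === -1 || secondColon === -1) continue;
    // Everything after the second colon is the cgroup path.
    const cgroupPath = line.slice(secondColon + 1);
    const segments = cgroupPath.split("/").filter((s) => s.length > 0);
    const last = segments[segments.length - 1];
    // A login session ends in "session-NN.scope", so it is not matched here.
    if (last !== undefined && last.endsWith(".service")) return last;
  }
  return undefined;
}

/**
 * Decide whether the current process runs under systemd.
 * Requires both INVOCATION_ID in the environment and a service cgroup.
 * Pass cgroupText as undefined when /proc is unavailable (for example off Linux).
 */
function detectSelfSupervisor(
  env: Record<string, string | undefined>,
  cgroupText: string | undefined,
): Supervisor | undefined {
  if (!env["INVOCATION_ID"] || cgroupText === undefined) return undefined;
  const unit = parseSystemdUnitFromCgroup(cgroupText);
  return unit === undefined ? undefined : { type: "systemd", unit };
}
```

At daemon startup the caller would pass process.env and the contents of /proc/self/cgroup (or undefined if the file cannot be read). For the legacy fallback, the same parser can be fed /proc/<pid>/cgroup.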

Tests

  • Pure cgroup parser: system.slice/bitsocial.service ⇒ unit; user.slice/.../session-NN.scope ⇒ none; cgroup v1 multi-line; templated foo@1.service; missing file.
  • detectSelfSupervisor: no INVOCATION_ID ⇒ none; set + service cgroup ⇒ unit; set + scope cgroup ⇒ none.
  • resolveDaemonSupervisor: state field wins; legacy fallback to cgroup.
  • Restart routing (injectable lifecycle): supervised ⇒ restartManaged (systemctl), never detached spawn / pre-stop; unsupervised ⇒ stop + detached spawn; mixed ⇒ both; dedup per unit.
  • State round-trip preserves supervisor.
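The restart-routing cases above can be illustrated with a pure planning function, sketched below. The names planRestart and RestartPlan, and the reduced DaemonState shape, are hypothetical and chosen only for this example; the actual change routes through an injectable lifecycle whose interface the issue does not spell out. The sketch only decides what should happen; executing it (systemctl restart for managed units, stop plus detached spawn for the rest) is left to the caller.

```typescript
// Hypothetical sketch of the routing decision; not the project's actual code.
type Supervisor = { type: "systemd"; unit: string };

// Reduced stand-in for DaemonState: only the fields this example needs.
type DaemonState = { pid: number; supervisor?: Supervisor };

type RestartPlan = {
  // Units to restart with `systemctl restart <unit>`, each listed once.
  managedUnits: string[];
  // Daemons that keep the current behavior: SIGINT, wait, detached spawn.
  standalone: DaemonState[];
};

function planRestart(daemons: DaemonState[]): RestartPlan {
  const plan: RestartPlan = { managedUnits: [], standalone: [] };
  for (const daemon of daemons) {
    const unit = daemon.supervisor?.unit;
    if (unit === undefined) {
      // No supervisor recorded: treat as unsupervised.
      plan.standalone.push(daemon);
    } else if (plan.managedUnits.indexOf(unit) === -1) {
      // Deduplicate so each unit is restarted only once.
      plan.managedUnits.push(unit);
    }
  }
  return plan;
}
```

Keeping the decision separate from the side effects is one way to make the "never detached spawn / pre-stop" assertions easy to check: a supervised daemon simply never appears in the standalone list.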

Notes

A StartLimit guardrail on the unit (caps the loop) was considered and explicitly declined in favor of fixing the root cause in the CLI.
