Problem
bitsocial update install (default --restart-daemons) restarts the daemon by stopping it and then calling spawn("bitsocial", ["daemon", ...], { detached: true }), which creates a process that no supervisor owns. On a host where the daemon is managed by systemd, this detached daemon competes with systemd for the PKC RPC port (ws://localhost:9138). systemd's own bitsocial.service can then never bind: it exits with "PKC RPC port is already in use", and Restart=on-failure restarts it forever. The result is an infinite crash loop that churns through daemon log files (capacity 5) every few seconds.
Root cause
src/cli/commands/update/install.ts has no notion of who supervises the daemon, so it always tears it down and re-spawns it itself, escaping systemd.
Fix
Make update install supervisor-aware, keyed off the daemon's own state file:
- Daemon records its supervisor at startup (src/cli/commands/daemon.ts + src/common-utils/daemon-state.ts). systemd sets $INVOCATION_ID for every service it spawns; the unit name is parsed from /proc/self/cgroup. Stored as supervisor?: { type: "systemd"; unit: string } on DaemonState (absent ⇒ unsupervised/standalone/legacy).
- update install routes each daemon by supervisor:
  - Supervised ⇒ don't pre-stop, don't detached-spawn; swap the binary, then systemctl restart <unit> (systemd owns the new process, so there is no port race: systemd waits for the whole cgroup, including kubo, to exit before restart).
  - Unsupervised ⇒ current behavior unchanged (SIGINT + wait + detached spawn).
- Legacy daemons (no supervisor field, started before this change) ⇒ fall back to inferring the unit from /proc/<pid>/cgroup.
- Off-Linux / no /proc ⇒ falls back to current behavior (zero regression).
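For reviewers, the detection step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: parseUnitFromCgroup and the injectable env / readCgroup parameters are hypothetical names, and the rule "the innermost unit-like path segment decides" is an assumption about how the parser treats nested cgroups. Only the Supervisor shape, $INVOCATION_ID, and /proc/self/cgroup come from the description.

```typescript
import { readFileSync } from "node:fs";

// Shape taken from the description; everything else in this block is illustrative.
type Supervisor = { type: "systemd"; unit: string };

// Hypothetical helper: pull a systemd service unit out of the text of a cgroup file.
// Each line has the form "hierarchy-ID:controller-list:path". cgroup v2 has a single
// "0::<path>" line, cgroup v1 has one line per hierarchy.
function parseUnitFromCgroup(contents: string): string | undefined {
  for (const line of contents.split("\n")) {
    const path = line.split(":").slice(2).join(":");
    // Assumption: walk from the innermost segment outwards; a .service wins,
    // a .scope (e.g. a login session) means "not a service" for this line.
    for (const segment of path.split("/").reverse()) {
      if (segment.endsWith(".service")) return segment;
      if (segment.endsWith(".scope")) break;
    }
  }
  return undefined;
}

// Hypothetical signature: env and readCgroup are injectable so tests need no real /proc.
function detectSelfSupervisor(
  env: Record<string, string | undefined> = process.env,
  readCgroup: () => string = () => readFileSync("/proc/self/cgroup", "utf8"),
): Supervisor | undefined {
  // Without INVOCATION_ID the process was not started by systemd as a unit.
  if (!env.INVOCATION_ID) return undefined;
  let contents: string;
  try {
    contents = readCgroup();
  } catch {
    // No /proc (off-Linux) or unreadable file: treat as unsupervised.
    return undefined;
  }
  const unit = parseUnitFromCgroup(contents);
  return unit ? { type: "systemd", unit } : undefined;
}
```

Requiring both signals (the environment variable and a .service cgroup) is what keeps a daemon started by hand from an interactive session classified as unsupervised.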
Tests
- Pure cgroup parser: system.slice/bitsocial.service ⇒ unit; user.slice/.../session-NN.scope ⇒ none; cgroup v1 multi-line; templated foo@1.service; missing file.
- detectSelfSupervisor: no INVOCATION_ID ⇒ none; set + service cgroup ⇒ unit; set + scope cgroup ⇒ none.
- resolveDaemonSupervisor: state field wins; legacy fallback to cgroup.
- Restart routing (injectable lifecycle): supervised ⇒ restartManaged (systemctl), never detached spawn / pre-stop; unsupervised ⇒ stop + detached spawn; mixed ⇒ both; dedup per unit.
- State round-trip preserves supervisor.
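To make the restart-routing cases concrete, here is a rough sketch of routing through an injectable lifecycle. It assumes a simplified DaemonState with only pid and supervisor. Apart from restartManaged, every name here (Lifecycle, stop, spawnDetached, restartDaemons, swapBinary) is hypothetical and may not match install.ts.

```typescript
type Supervisor = { type: "systemd"; unit: string };
// Simplified for illustration; the real DaemonState has more fields.
type DaemonState = { pid: number; supervisor?: Supervisor };

// Hypothetical seam: production code would send SIGINT, spawn, and call
// systemctl here, while tests pass a recording fake.
interface Lifecycle {
  stop(daemon: DaemonState): Promise<void>; // SIGINT + wait
  spawnDetached(daemon: DaemonState): Promise<void>;
  restartManaged(unit: string): Promise<void>; // systemctl restart <unit>
}

async function restartDaemons(
  daemons: DaemonState[],
  lifecycle: Lifecycle,
  swapBinary: () => Promise<void>,
): Promise<void> {
  const unsupervised = daemons.filter((d) => !d.supervisor);
  // Dedup per unit: several state entries may point at the same service.
  const units = new Set<string>();
  for (const d of daemons) {
    if (d.supervisor) units.add(d.supervisor.unit);
  }

  // Only unsupervised daemons are stopped before the swap.
  for (const d of unsupervised) await lifecycle.stop(d);
  await swapBinary();
  for (const d of unsupervised) await lifecycle.spawnDetached(d);
  // Supervised daemons are never pre-stopped or detached-spawned;
  // systemd restarts them and owns the new process.
  for (const unit of units) await lifecycle.restartManaged(unit);
}
```

With a fake Lifecycle that appends to an array, the "mixed" case reduces to asserting the recorded call order, for example one stop, the swap, one detached spawn, and exactly one restartManaged per unit.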
Notes
A StartLimit guardrail on the unit (caps the loop) was considered and explicitly declined in favor of fixing the root cause in the CLI.