NixOS support #379

Open
mrosseel wants to merge 250 commits into brickbots:main from mrosseel:nixos

Conversation

@mrosseel

Collaborator

Summary

  • Full NixOS-based system for PiFinder (replaces Raspbian)
  • Declarative system configuration via Nix flake
  • SD card image, netboot, and migration bootstrap tarball builds
  • Software update via nixos-rebuild with GitHub release/PR channels

Test plan

  • Flash SD image and verify boot
  • Test WiFi AP and client mode switching
  • Test software update UI channels
  • Test hostname rename via web UI

🤖 Generated with Claude Code

mrosseel and others added 30 commits February 4, 2026 19:02
- build.yml: single build + Cachix push + unstable channel updates
- release.yml: manual release workflow for stable/beta channels

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The SD image module provides filesystems, but toplevel builds need
a minimal stub to evaluate successfully.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Required for NixOS module system to accept devMode setting.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Required when module has both options and config sections.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replaces FIXME placeholders with actual SRI hashes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Uses Pi5 runner when RUNNER_LABELS variable is set, falls back to
ubuntu with QEMU emulation otherwise.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Filter to only Pi 4B device tree (CM4 incompatible with our overlays)
- Use shorthand DTS syntax for PWM overlay

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Untracked file was excluded from Nix flake source tree, causing
"No module named 'PiFinder.sys_utils_base'" on SD card boot.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add camera overlay (imx477) to netboot config.txt via flake.nix
- Fix sys_utils import in main.py to use utils.get_sys_utils()
- Add hip_main.dat fetch to pifinder-src.nix for starfield plotting
- Add dma_heap udev rule for libcamera/picamera2 access
- Fix shared memory naming in solver.py (remove leading /)
- Add DNS nameservers for netboot environment
- Document power control scripts in CLAUDE.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add runtimeCameraSelection option to hardware.nix (default: true)
- SD image includes config.txt with "include camera.txt" directive
- Users can edit camera.txt and reboot to switch cameras
- Supported cameras: imx296, imx290 (imx462), imx477
- Fix cameraDriver scope in hardware.nix (moved to top-level let)
- Add sudoers rules for systemctl stop/start pifinder.service
- Add DMA heap udev rule for libcamera video group access
- Netboot config sets cameraType = "imx477" for HQ camera dev

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Refactor sys_utils modules to use common base class
- Add sys_utils_nixos.py for NixOS-specific implementations
- Add get_sys_utils() detection in utils.py for platform selection
- Add flake.lock for reproducible builds
- Add NetworkManager config to networking.nix
- Add deploy-image-to-nfs.sh for netboot development workflow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
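
For reviewers, the platform selection described in this commit could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: `is_nixos` and `pick_sys_utils_module` are hypothetical names, the dotted module paths are guesses based on the file names mentioned above, and keying the decision on the `ID` field of `/etc/os-release` is an assumption about how `get_sys_utils()` might detect NixOS.

```python
def is_nixos(os_release_text):
    """Return True if os-release content declares ID=nixos."""
    for line in os_release_text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "ID":
            # Values in os-release may be quoted; strip both quote styles.
            return value.strip().strip("\"'") == "nixos"
    return False


def pick_sys_utils_module(os_release_path="/etc/os-release"):
    """Choose a sys_utils module name for the running platform.

    Module paths are hypothetical; an unreadable os-release file
    falls back to the non-NixOS implementation.
    """
    try:
        with open(os_release_path, encoding="utf-8") as fh:
            text = fh.read()
    except OSError:
        text = ""
    return "PiFinder.sys_utils_nixos" if is_nixos(text) else "PiFinder.sys_utils"
```

Returning a module name rather than importing it keeps the detection logic testable on a development machine; the caller could pass the result to `importlib.import_module`.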
- Update build.yml CI workflow
- Fix fonts.py import
- Fix marking_menus.py formatting
- Add missing import to preview.py
- Simplify objects_db.py
- Add catalog_imports improvements
- Update pifinder_objects.db

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Switch to NFSv4 with caching disabled (noac, actimeo=0)
- Disable auto-optimise-store in devMode (hard links fail on NFS)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add ServerAliveInterval/CountMax to prevent timeout during transfers
- Use rsync -R (relative) to preserve directory structure correctly

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comets.txt is downloaded at runtime and must be in a writable
location, not the read-only Nix store.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extend eth0 wait to 30 seconds with debug output
- Wait for link carrier before DHCP
- Add DHCP retries (3 attempts)
- Add LIBCAMERA_IPA_MODULE_PATH to pifinder service environment

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Restore SUBSYSTEM=="pwm" udev rule that was accidentally removed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Turns on keypad LEDs during sysinit for early visual boot feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- boot-splash.c: displays welcome image with scanning animation
- Starts at sysinit, stops when pifinder.service starts
- Much faster than Python splash

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove nixos-hardware module (saves 659MB linux-firmware)
- Fetch nixos-rebuild at runtime (saves ~500MB llvm/nix deps)
- Remove git from systemPackages (nix has built-in git for flakes)

Target: ~150MB vs current 1.7GB
- Remove default packages (vim, nano, etc)
- Disable polkit, udisks2, speechd
- Should reduce closure significantly
NetworkManager-vpnc alone has 1.1GB closure (webkitgtk, llvm, etc).
Disable all NM plugins for bootstrap - we just need WiFi.
mrosseel added 3 commits June 25, 2026 16:23
A trusted build (secrets + write token) is the only thing that can push to
the cache and write the nixos-manifest branch — a fork PR never can. Restore
the push trigger but gate the build jobs on github.repository so it acts only
in brickbots: a mrosseel push still builds nothing, while a merge to brickbots
builds and publishes (attic push + update-manifest, whose fork-PR skip doesn't
apply to pushes).
Adopt brickbots' canonical CI + the trusted testable-PR builder from brickbots#493
instead of the nixos branch's divergent workflows:
  + nixos-pr-build.yml  (pull_request_target, label-gated, trusted)
  + nox.yml             (brickbots standard lint/test)
  ~ web-integration-tests.yml  (match brickbots main)
  - build.yml, lint.yml, release.yml  (nixos-only; superseded)
publish_manifest.sh / update_manifest.py already match brickbots#493.
@mrosseel added and then removed the testable label (Ready for testing via PiFinder software update) Jun 25, 2026
mrosseel added 2 commits June 26, 2026 00:21
…over rotation

The cutover recreated the Attic cache with a fresh key (Vkem), but nothing
deployed trusted it — the whole 8UU fleet got stranded. Attic can't dual-sign
(serves only its own key), so the fix is to put the cache back on 8UU (done
server-side) and revert the config to match. pifinder-release key kept (new
cache, never previously trusted).
It holds always-on device config (avahi, hostname, substituters/keys, sudo),
used long after the Debian→NixOS migration — the name was misleading. Update
the flake import and the RELEASE.md reference. Pure rename, no content change.
@mrosseel force-pushed the nixos branch 7 times, most recently from 8920417 to c2bdfcf, June 26, 2026 08:49
mrosseel and others added 2 commits June 26, 2026 10:50
…ostname

nixos_upgrade.py:
- Download progress now shows a size bar that moves *within* a path and names
  the package being copied, parsed live from nix's internal-json (resProgress
  byte events summed over copyPath activities; denominator = the dry-run
  'unpacked' total, since Attic narinfos omit a compressed FileSize). All
  accounting is throttled and wrapped so a counter bug can never stall the
  stream or abort the upgrade; only a short log tail is kept (not the
  ~800k-line stream). Dropped the now-dead per-path path-info query, the size
  map, and write_sizes_file/UPGRADE_SIZES_FILE.
- key-proof: fetch each cache's current signing key from its anonymous Attic
  cache-config endpoint and trust it for the pull (extra-trusted-public-keys;
  verification stays on) so a cache key rotation can't strand the fleet.

networking.nix: own avahi here (the module shared by the running system and the
migration build); fix the boot race (NM dispatcher re-scans avahi on connect)
and make the PiFinder_data hostname stick — hostname-mode=none plus the
dispatcher re-asserting hostname + avahi-set-host-name (NixOS bakes
host-name=<static> into avahi's config, which a restart would otherwise revert).

sys_utils.py / software.py: parse and show the package label under the bar.
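
The byte accounting described for nixos_upgrade.py can be sketched as below. This is a simplified illustration, not the PR's implementation: `copied_bytes` is a hypothetical name, and the numeric codes `ACT_COPY_PATH = 100` and `RES_PROGRESS = 105` are my reading of nix's logger enums and should be checked against the nix version in use. The sketch assumes each internal-json line is prefixed with `@nix ` and that a progress result carries the bytes done as its first field.

```python
import json

# Assumed numeric codes from nix's logging enums; verify before relying on them.
ACT_COPY_PATH = 100
RES_PROGRESS = 105


def copied_bytes(lines):
    """Sum the latest 'done' byte count of every copyPath activity."""
    copy_ids = set()
    done = {}
    for line in lines:
        if not line.startswith("@nix "):
            continue  # ordinary log output, not a structured event
        try:
            event = json.loads(line[5:])
        except ValueError:
            continue  # a malformed event must never abort the upgrade
        if not isinstance(event, dict):
            continue
        action = event.get("action")
        if action == "start" and event.get("type") == ACT_COPY_PATH:
            copy_ids.add(event.get("id"))
        elif (
            action == "result"
            and event.get("type") == RES_PROGRESS
            and event.get("id") in copy_ids
        ):
            fields = event.get("fields") or []
            if fields and isinstance(fields[0], int):
                done[event["id"]] = fields[0]  # keep only the newest value
    return sum(done.values())
```

Dividing the result by the dry-run "unpacked" total mentioned above would give the fraction shown in the size bar. Progress events for other activity types are ignored so they cannot inflate the numerator.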
…crashing

The external observing-lists feature added `list_descriptions` to
CompositeObject, but CACHE_VERSION was not bumped. Devices upgrading from the
prior release keep their existing composite_objects.pkl, whose unpickled
objects lack the new field (dataclass defaults are not applied on unpickle),
so opening any object's details crashes the whole app:

    AttributeError: 'CompositeObject' object has no attribute 'list_descriptions'

The main process hosts the multiprocessing shared-state manager, so its death
cascades BrokenPipe/connection-reset into every worker — the symptom seen in
the logs; the real cause was masked because main()'s handler logs via the
multiprocess queue and then os._exit()s before the record is written.

- catalog_cache: bump CACHE_VERSION 1 -> 2 so pre-list_descriptions caches are
  rebuilt on upgrade (the real fix for deployed devices).
- composite_object: getattr guard in composed_sections so a stale-cached object
  degrades gracefully instead of taking down the process.
- main: print + flush the traceback before os._exit so a fatal exception in
  main() lands in the journal instead of being lost to the log queue.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
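
The failure mode in this commit can be reproduced in isolation. The sketch below is a stand-in, not the PiFinder code: the trimmed `CompositeObject`, `restore_like_pickle`, and `safe_list_descriptions` are hypothetical, and the new field is assumed to use `default_factory`. Unpickling is simulated by what pickle does for plain objects by default: create the instance with `__new__` and fill `__dict__`, without calling `__init__`.

```python
from dataclasses import dataclass, field


@dataclass
class CompositeObject:
    # Hypothetical, trimmed-down stand-in for the real class.
    object_id: int = 0
    # New field added after caches were already written to disk.
    list_descriptions: list = field(default_factory=list)


def restore_like_pickle(cls, state):
    """Rebuild an instance the way default unpickling does: no __init__ call."""
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


def safe_list_descriptions(obj):
    """getattr guard: a stale cached object yields an empty list."""
    return getattr(obj, "list_descriptions", None) or []


# State saved by the previous release, before list_descriptions existed.
stale = restore_like_pickle(CompositeObject, {"object_id": 42})
```

Accessing `stale.list_descriptions` directly raises `AttributeError`, because a `default_factory` field leaves no class-level attribute to fall back on, while `safe_list_descriptions(stale)` returns an empty list. The guard only softens the symptom; bumping `CACHE_VERSION` is what forces stale caches to be rebuilt.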
mrosseel and others added 4 commits June 26, 2026 12:01
Camera switching only changed the persisted choice and ran
switch-to-configuration boot, but the generic-extlinux builder always writes
DEFAULT=nixos-default (the base camera) and device-tree overlays load only at
boot. So a device set to imx477 kept booting the base imx462 DTB, the imx290
driver bound to absent hardware (Error writing reg 0x3038), and the camera
never worked.

Fix A keeps the one-image/no-rebuild specialisation design and just makes the
chosen specialisation the boot default:
- set-extlinux-default: fail-safe helper that repoints extlinux DEFAULT to the
  latest-generation nixos-<gen>-<camera> entry (base camera -> nixos-default);
  leaves a bootable DEFAULT untouched if the entry is missing.
- pifinder-switch-camera: repoint DEFAULT to the chosen camera, then reboot
  (DT overlays are boot-only).
- nixos_upgrade: re-apply the persisted camera's DEFAULT after activation,
  before the upgrade reboot (every rebuild resets DEFAULT to base).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
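
The fail-safe behaviour of set-extlinux-default can be sketched as a pure text transformation. This is an illustration under assumptions, not the helper shipped in this PR: `repoint_default` is a hypothetical name, the entry naming `nixos-<gen>-<camera>` and the base camera `imx462` are taken from the commit message above, and the `DEFAULT` / `LABEL` line layout is assumed.

```python
import re


def repoint_default(conf_text, camera, base_camera="imx462"):
    """Return conf_text with DEFAULT repointed, or unchanged if that is unsafe."""
    labels = re.findall(r"^[ \t]*LABEL[ \t]+(\S+)[ \t]*$", conf_text, re.MULTILINE)
    if camera == base_camera:
        target = "nixos-default"
    else:
        entry = re.compile(rf"nixos-(\d+)-{re.escape(camera)}")
        # Collect (generation, label) pairs so max() picks the latest generation.
        gens = [
            (int(m.group(1)), label)
            for label in labels
            if (m := entry.fullmatch(label))
        ]
        target = max(gens)[1] if gens else None
    if target is None or target not in labels:
        return conf_text  # entry missing: leave the bootable DEFAULT untouched
    return re.sub(
        r"^([ \t]*DEFAULT[ \t]+)\S+",
        lambda m: m.group(1) + target,
        conf_text,
        count=1,
        flags=re.MULTILINE,
    )
```

Because every rebuild resets DEFAULT to the base entry, a function like this would need to run both when the camera is switched and again after each upgrade activation, as the commit describes.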
The ADR said device update downloads ship only the genuinely-new chunks
(~80 MB for a 1.5 GB closure). That conflates server-side storage / CI-upload
dedup with the device download. Attic serves whole NARs over the standard
binary-cache protocol: a device fetches the full compressed NAR of every
changed store path, with no chunk-delta against the previous version. The only
device-side saving is path-level (unchanged paths are not refetched). True
client-side chunk-delta needs a casync/desync client with a local chunk store.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
pifinder-src is rebuilt (new store hash) on every code change, and Attic ships
whole NARs — so each change re-downloads ~43MB and rewrites ~70MB to the SD,
even for a one-line edit. 56MB of that is stable: fonts (~31MB) and the pinned
tetra3/cedar-solve solver (~25MB, ~15MB after trimming examples/tests/docs).

Move both into their own derivations (like astro-data) and symlink them into
pifinder-src. A routine code change now rewrites only the ~16MB code path;
fonts/tetra3 are distributed once and shared across changes. tetra3 keeps
cedar_detect_pb2 (ships in the repo) and is pre-compiled; it is symlinked after
compileall so bytecode isn't written into the read-only store path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
Adds a 'Rollback' channel next to stable/beta/unstable that lists the on-disk
system generations you can roll back to (all but the current one).

- sys_utils.list_rollback_targets(): reads ONLY immutable generation data (the
  /nix/var/nix/profiles symlinks + store-path labels) — no sidecar JSON state to
  evolve or corrupt across up/downgrades. Entry = label + generation + date.
- software.py: the Rollback channel is built locally, so it's available even
  when the manifest fetch fails — i.e. exactly when you're stranded on a bad
  build. Select -> confirm -> reuses update_software() (the entry's ref is the
  generation's store path, so it activates + reboots with no download).
- nixos_upgrade.cleanup_old_generations: keep +3 (was +2) -> 2 rollback targets.

Labels currently come from the store-path name; a follow-up can set
system.nixos.label at build time for 'PR-379'-style names (and matching
bootloader entries).
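
A rough sketch of how rollback targets could be read from the profile symlinks alone follows. It is not the code in this PR: the function body, the returned dictionary keys, and taking the date from the symlink's own mtime are assumptions. The `system-<N>-link` naming and the `system` pointer are the standard NixOS profile layout.

```python
import os
import re
from datetime import datetime, timezone

GEN_LINK = re.compile(r"^system-(\d+)-link$")


def list_rollback_targets(profiles_dir="/nix/var/nix/profiles"):
    """List every system generation except the current one, newest first."""
    # 'system' is a symlink to the current generation link, e.g. system-42-link.
    current = os.path.basename(os.readlink(os.path.join(profiles_dir, "system")))
    targets = []
    for name in os.listdir(profiles_dir):
        match = GEN_LINK.match(name)
        if not match or name == current:
            continue
        link = os.path.join(profiles_dir, name)
        store_path = os.readlink(link)
        # Store paths look like /nix/store/<hash>-<name>; drop the hash prefix.
        label = os.path.basename(store_path).split("-", 1)[-1]
        # Assumption: the symlink's mtime approximates the generation date.
        created = datetime.fromtimestamp(os.lstat(link).st_mtime, tz=timezone.utc)
        targets.append(
            {
                "generation": int(match.group(1)),
                "label": label,
                "date": created.date().isoformat(),
                "ref": store_path,
            }
        )
    return sorted(targets, key=lambda t: t["generation"], reverse=True)
```

Nothing here touches the network or any sidecar state file, which is why such a list stays available when the manifest fetch fails.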
Comment thread nixos/networking.nix
{ config, lib, pkgs, ... }:
{
  networking = {
    hostName = "pifinder";

Collaborator

Does networking.nix hardcode the hostName and revert it every time nix is invoked? (Cf. Hashed password vs. initialPassword)

Collaborator

It seems that the hostName is taken care of later on. Please confirm.

Collaborator Author

Hi, yes, the hostname is stored in the PiFinder data area.

Comment thread nixos/networking.nix
# NTP server can't block the clock. FallbackNTP alone is skipped whenever a
# per-interface server is known — too fragile to rely on for first-boot
# migration, which gates the binary-cache fetch on a synchronized clock.
services.timesyncd.servers = [

Collaborator

Does timesyncd support sourcing from GPS time?

Collaborator Author

No idea. Was this a feature we had before? If not, I propose to fix that in a PR against this branch. It's already a big migration, and I want to wrap it up ASAP.

Comment thread nixos/networking.nix

[wifi]
mode=ap
ssid=PiFinderAP

Collaborator

How does PiFinder with nix handle changes of the WiFi AP name? Is this overwritten every time a new version is selected and downloaded?

Labels

testable Ready for testing via PiFinder software update

2 participants