
[WIP] Gaia charts #362

Open

mrosseel wants to merge 252 commits into brickbots:main from mrosseel:deepchart

mrosseel (Collaborator) commented Nov 23, 2025

Adds a Gaia chart view to all objects, down to magnitude 17, with an automatic limiting magnitude that takes the SQM reading and the equipment into account. The automatic limit can be overridden.
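The description does not say how the automatic limit is computed. As a rough illustration only, the sketch below combines an empirical SQM-to-naked-eye conversion that circulates widely among observers with the usual aperture gain term, 5·log10(D/d). The formula, the 7 mm pupil and the function names (`naked_eye_lm`, `telescope_lm`, `chart_lm`) are assumptions, not PiFinder's code.

```python
import math
from typing import Optional

CATALOG_LIMIT = 17.0  # the Gaia charts go down to mag 17 (from the PR description)


def naked_eye_lm(sqm: float) -> float:
    """Naked-eye limiting magnitude from an SQM reading in mag/arcsec^2.

    Assumed empirical conversion: NELM = 7.93 - 5*log10(10**(4.316 - SQM/5) + 1).
    """
    return 7.93 - 5 * math.log10(10 ** (4.316 - sqm / 5) + 1)


def telescope_lm(sqm: float, aperture_mm: float, pupil_mm: float = 7.0) -> float:
    """Add the aperture's light-gathering gain over a dark-adapted pupil (assumed 7 mm)."""
    # Collecting area scales with diameter squared: 2.5*log10((D/d)**2) == 5*log10(D/d).
    return naked_eye_lm(sqm) + 5 * math.log10(aperture_mm / pupil_mm)


def chart_lm(sqm: float, aperture_mm: float, override: Optional[float] = None) -> float:
    """Limiting magnitude used for a chart: a user override wins over the estimate,
    and neither goes deeper than the catalog itself."""
    lm = override if override is not None else telescope_lm(sqm, aperture_mm)
    return min(lm, CATALOG_LIMIT)
```

For example, `chart_lm(20.5, 250)` estimates the limit for a 250 mm scope under a 20.5 SQM sky, while `chart_lm(20.5, 250, override=12.0)` honours a manual setting. Real visual limits also depend on magnification, seeing and observer experience, which this sketch ignores.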

TODO:

  • further performance optimisations
  • fix missing tile bug
  • get the data under 1 GB
  • choose how to deliver this data to users
  • 90-degree rotation bug?

mrosseel changed the base branch from release to main on November 23, 2025, 22:49
mrosseel and others added 29 commits February 4, 2026 19:08
Required for NixOS module system to accept devMode setting.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Required when module has both options and config sections.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replaces FIXME placeholders with actual SRI hashes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
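An SRI string of the kind that replaced the FIXME placeholders is the algorithm name followed by the base64 encoding of the raw digest. A minimal sketch, assuming a flat hash of a single downloaded file; unpacked sources (for example fetchzip) are hashed over their NAR serialization instead, so this would not match them. The function names are illustrative.

```python
import base64
import hashlib
from pathlib import Path


def _to_sri(digest: bytes) -> str:
    """Format a raw SHA-256 digest as an SRI string."""
    return "sha256-" + base64.b64encode(digest).decode("ascii")


def sri_sha256(data: bytes) -> str:
    """SRI hash of in-memory bytes."""
    return _to_sri(hashlib.sha256(data).digest())


def sri_for_file(path: Path) -> str:
    """Flat SRI hash of a file's contents, streamed in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return _to_sri(h.digest())
```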
Uses Pi5 runner when RUNNER_LABELS variable is set, falls back to
ubuntu with QEMU emulation otherwise.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Filter to only Pi 4B device tree (CM4 incompatible with our overlays)
- Use shorthand DTS syntax for PWM overlay

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
An untracked file was excluded from the Nix flake source tree, causing
"No module named 'PiFinder.sys_utils_base'" on SD card boot.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add camera overlay (imx477) to netboot config.txt via flake.nix
- Fix sys_utils import in main.py to use utils.get_sys_utils()
- Add hip_main.dat fetch to pifinder-src.nix for starfield plotting
- Add dma_heap udev rule for libcamera/picamera2 access
- Fix shared memory naming in solver.py (remove leading /)
- Add DNS nameservers for netboot environment
- Document power control scripts in CLAUDE.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add runtimeCameraSelection option to hardware.nix (default: true)
- SD image includes config.txt with "include camera.txt" directive
- Users can edit camera.txt and reboot to switch cameras
- Supported cameras: imx296, imx290 (imx462), imx477
- Fix cameraDriver scope in hardware.nix (moved to top-level let)
- Add sudoers rules for systemctl stop/start pifinder.service
- Add DMA heap udev rule for libcamera video group access
- Netboot config sets cameraType = "imx477" for HQ camera dev

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Refactor sys_utils modules to use common base class
- Add sys_utils_nixos.py for NixOS-specific implementations
- Add get_sys_utils() detection in utils.py for platform selection
- Add flake.lock for reproducible builds
- Add NetworkManager config to networking.nix
- Add deploy-image-to-nfs.sh for netboot development workflow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update build.yml CI workflow
- Fix fonts.py import
- Fix marking_menus.py formatting
- Add missing import to preview.py
- Simplify objects_db.py
- Add catalog_imports improvements
- Update pifinder_objects.db

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Switch to NFSv4 with caching disabled (noac, actimeo=0)
- Disable auto-optimise-store in devMode (hard links fail on NFS)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add ServerAliveInterval/CountMax to prevent timeout during transfers
- Use rsync -R (relative) to preserve directory structure correctly

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comets.txt is downloaded at runtime and must be in a writable
location, not the read-only Nix store.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extend eth0 wait to 30 seconds with debug output
- Wait for link carrier before DHCP
- Add DHCP retries (3 attempts)
- Add LIBCAMERA_IPA_MODULE_PATH to pifinder service environment

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
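The boot-time network changes above (wait up to 30 s for eth0, require link carrier, retry DHCP three times) amount to a poll-then-retry loop. The actual change lives in the Nix boot configuration; this Python sketch only illustrates the logic, assuming the standard Linux sysfs `carrier` file. `wait_for_carrier` and `with_retries` are hypothetical names.

```python
import time
from pathlib import Path
from typing import Callable


def wait_for_carrier(iface: str = "eth0", timeout_s: float = 30.0, poll_s: float = 0.5,
                     sysfs: Path = Path("/sys/class/net")) -> bool:
    """Poll <sysfs>/<iface>/carrier until it reads "1" or the timeout expires."""
    carrier = sysfs / iface / "carrier"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if carrier.read_text().strip() == "1":
                return True
        except OSError:
            pass  # interface not present yet, or down (the read typically fails then)
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def with_retries(attempt: Callable[[], bool], tries: int = 3, delay_s: float = 2.0) -> bool:
    """Call attempt() (e.g. one DHCP request) up to `tries` times until it succeeds."""
    for i in range(tries):
        if attempt():
            return True
        if i < tries - 1:
            time.sleep(delay_s)
    return False
```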
Restore SUBSYSTEM=="pwm" udev rule that was accidentally removed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Turns on keypad LEDs during sysinit for early visual boot feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- boot-splash.c: displays welcome image with scanning animation
- Starts at sysinit, stops when pifinder.service starts
- Much faster than Python splash

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove nixos-hardware module (saves 659MB linux-firmware)
- Fetch nixos-rebuild at runtime (saves ~500MB llvm/nix deps)
- Remove git from systemPackages (nix has built-in git for flakes)

Target: ~150MB vs current 1.7GB
- Remove default packages (vim, nano, etc)
- Disable polkit, udisks2, speechd
- Should reduce closure significantly
NetworkManager-vpnc alone has 1.1GB closure (webkitgtk, llvm, etc).
Disable all NM plugins for bootstrap - we just need WiFi.
- Disable xdg.mime/icons/sounds (pulls xdg-utils -> perl 112MB)
- Disable command-not-found (pulls perl)
- Disable fuse (86MB)
- Disable initrd extra filesystems
mrosseel and others added 6 commits June 24, 2026 18:51
The update-manifest job branched nixos-manifest off the full source
checkout and only git-added the manifest, so the branch inherited the
entire 1400+ file source tree and history. Concurrent trunk/PR stamps
also clobbered or failed each other on the single update-manifest.json.

Extract publishing into .github/scripts/publish_manifest.sh, shared by
build.yml and release.yml:
- rebuild nixos-manifest as a single-file orphan tree every run
- fetch -> re-apply this run's entry -> retry on a rejected push, so
  concurrent writers cannot lose an update (a git ref update is CAS)
Add a manifest-write concurrency group on the build update-manifest job
as a coarse serializer in front of the retry loop.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
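The fetch, re-apply, retry loop works because a git ref update behaves like compare-and-swap: a push based on a stale tip is rejected rather than silently overwriting someone else's entry. The Python model below illustrates that property; `RemoteRef` is a toy stand-in for the nixos-manifest branch, not the logic of publish_manifest.sh.

```python
import json
import threading
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class RemoteRef:
    """Toy branch: content plus a version that only advances on a successful CAS."""
    content: str = "{}"
    version: int = 0
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def fetch(self) -> Tuple[int, str]:
        with self._lock:
            return self.version, self.content

    def push(self, expected_version: int, new_content: str) -> bool:
        with self._lock:
            if self.version != expected_version:
                return False  # tip moved since our fetch: rejected, like a non-fast-forward
            self.content = new_content
            self.version += 1
            return True


def publish_entry(ref: RemoteRef, key: str, entry: Dict, max_tries: int = 10) -> bool:
    """Fetch the tip, re-apply only this run's entry, push; retry if rejected."""
    for _ in range(max_tries):
        version, content = ref.fetch()
        manifest = json.loads(content)
        manifest[key] = entry  # keep every other writer's entries untouched
        if ref.push(version, json.dumps(manifest, sort_keys=True)):
            return True
    return False
```

A rejected push implies that some other writer succeeded, so with N concurrent writers each one is rejected at most N - 1 times and a small bounded retry count is enough.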
mrosseel and others added 8 commits June 25, 2026 10:45
…ollapse

nixos-manifest is metadata-only (just update-manifest.json); the find-prune +
add -A was one-time cleanup of the already-collapsed branch. Keep only: fetch
tip, rewrite the entry, push, with the concurrency retry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
cache.pifinder.eu's chunk store moved from local disk to AWS S3 (bucket
pifinder-nix-cache, us-west-1). The cutover used a fresh atticd DB, which
regenerated both caches' NAR signing keypairs. Rotate the pinned dev key
and wire in the now-provisioned pifinder-release key.

  pifinder:         8UU/...jBmE=  ->  Vkem...3gck=
  pifinder-release: (newly pinned)    WG/F...KpoM=

ATTIC_TOKEN is unchanged (RS256 server secret preserved), so CI keeps
working once these land.
A push to main/nixos triggered the whole build pipeline (build-native →
build-emulated → update-manifest → migration-tarball). By design, mrosseel
commits must build nothing; only labeled PRs (brickbots contributions)
and pre-releases/releases (release.yml) should build. Drop the `push:`
trigger and the event_name=='push' arms in build-native/native-wait;
update-manifest still follows the build jobs, so it runs on labeled PRs.
…n fork PRs

- software.py: read update-manifest.json from brickbots/PiFinder (canonical),
  not the mrosseel fork — devices were pointed at the wrong repo.
- build.yml build-native: guard 'Push to Attic' on $ATTIC_TOKEN so fork PRs
  (no secrets) build verify-only instead of failing with AccessError, matching
  build-emulated.
- build.yml update-manifest: drop 'ref: <head_ref>' checkout — a fork PR's head
  branch doesn't exist here, so it 404'd; default ref resolves for both.
A trusted build (secrets + write token) is the only thing that can push to
the cache and write the nixos-manifest branch — a fork PR never can. Restore
the push trigger but gate the build jobs on github.repository so it acts only
in brickbots: a mrosseel push still builds nothing, while a merge to brickbots
builds and publishes (attic push + update-manifest, whose fork-PR skip doesn't
apply to pushes).
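The gating rules from these commits (pushes build only in brickbots, pull requests build only when labeled, and publishing needs secrets that fork PRs lack) can be written as a small decision function. This is a sketch of the policy, not the workflow's actual `if:` expressions; the `testable` label name and the `Event` fields are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

CANONICAL_REPO = "brickbots/PiFinder"
BUILD_LABEL = "testable"  # assumption: the label that opts a PR into builds


@dataclass(frozen=True)
class Event:
    name: str                            # "push", "pull_request", ...
    repository: str                      # github.repository of the run
    labels: FrozenSet[str] = frozenset()
    has_secrets: bool = False            # fork PRs run without ATTIC_TOKEN etc.


def plan(event: Event) -> Dict[str, bool]:
    """Decide whether to build, and whether to publish (Attic push + manifest write)."""
    if event.name == "push":
        build = event.repository == CANONICAL_REPO  # a push to the fork builds nothing
    else:
        build = BUILD_LABEL in event.labels         # only labeled PRs build
    # Without secrets the build is verify-only instead of failing on the Attic push.
    return {"build": build, "publish": build and event.has_secrets}
```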
Adopt brickbots' canonical CI + the trusted testable-PR builder from brickbots#493
instead of the nixos branch's divergent workflows:
  + nixos-pr-build.yml  (pull_request_target, label-gated, trusted)
  + nox.yml             (brickbots standard lint/test)
  ~ web-integration-tests.yml  (match brickbots main)
  - build.yml, lint.yml, release.yml  (nixos-only; superseded)
publish_manifest.sh / update_manifest.py already match brickbots#493.
mrosseel added and removed the testable label (Ready for testing via PiFinder software update) on Jun 25, 2026
mrosseel and others added 10 commits June 26, 2026 00:21
…over rotation

The cutover recreated the Attic cache with a fresh key (Vkem), but no deployed
device trusted it: the whole fleet pinned to 8UU was stranded. Attic can't dual-sign
(serves only its own key), so the fix is to put the cache back on 8UU (done
server-side) and revert the config to match. pifinder-release key kept (new
cache, never previously trusted).
It holds always-on device config (avahi, hostname, substituters/keys, sudo),
used long after the Debian→NixOS migration — the name was misleading. Update
the flake import and the RELEASE.md reference. Pure rename, no content change.
…ostname

nixos_upgrade.py:
- Download progress now shows a size bar that moves *within* a path and names
  the package being copied, parsed live from nix's internal-json (resProgress
  byte events summed over copyPath activities; denominator = the dry-run
  'unpacked' total, since Attic narinfos omit a compressed FileSize). All
  accounting is throttled and wrapped so a counter bug can never stall the
  stream or abort the upgrade; only a short log tail is kept (not the
  ~800k-line stream). Dropped the now-dead per-path path-info query, the size
  map, and write_sizes_file/UPGRADE_SIZES_FILE.
- key-proof: fetch each cache's current signing key from its anonymous Attic
  cache-config endpoint and trust it for the pull (extra-trusted-public-keys;
  verification stays on) so a cache key rotation can't strand the fleet.

networking.nix: own avahi here (the module shared by the running system and the
migration build); fix the boot race (NM dispatcher re-scans avahi on connect)
and make the PiFinder_data hostname stick — hostname-mode=none plus the
dispatcher re-asserting hostname + avahi-set-host-name (NixOS bakes
host-name=<static> into avahi's config, which a restart would otherwise revert).

sys_utils.py / software.py: parse and show the package label under the bar.
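The download-progress accounting described above can be sketched as follows: parse each `@nix {...}` line from `--log-format internal-json`, remember which activity ids are copyPath activities, and sum their resProgress byte counts against the dry-run total. The numeric type codes (100 for copyPath, 105 for resProgress) and the field positions are our reading of Nix's logger and should be checked against the deployed Nix version; `CopyProgress` is a hypothetical name. Every parsing error is swallowed so a bug here can never stall the upgrade stream.

```python
import json
from typing import Dict, Optional

# Assumed numeric codes from Nix's internal-json logger (ActivityType / ResultType).
ACT_COPY_PATH = 100
RES_PROGRESS = 105


class CopyProgress:
    """Aggregate byte progress over copyPath activities in internal-json output."""

    def __init__(self, total_bytes: int) -> None:
        self.total = max(total_bytes, 1)    # dry-run 'unpacked' total is the denominator
        self.paths: Dict[int, str] = {}     # activity id -> store path being copied
        self.done: Dict[int, int] = {}      # activity id -> bytes copied so far
        self.current: Optional[str] = None  # most recently started path, for the label

    def feed(self, line: str) -> None:
        """Consume one stderr line; never raises."""
        try:
            if not line.startswith("@nix "):
                return
            msg = json.loads(line[len("@nix "):])
            act_id = msg.get("id")
            if msg.get("action") == "start" and msg.get("type") == ACT_COPY_PATH:
                # Assumption: fields[0] is the store path being copied.
                self.paths[act_id] = str((msg.get("fields") or [""])[0])
                self.current = self.paths[act_id]
            elif (msg.get("action") == "result" and msg.get("type") == RES_PROGRESS
                  and act_id in self.paths):
                # resProgress fields: [done, expected, running, failed]; done is cumulative.
                self.done[act_id] = int(msg["fields"][0])
        except (ValueError, KeyError, IndexError, TypeError, AttributeError):
            pass

    def fraction(self) -> float:
        return min(sum(self.done.values()) / self.total, 1.0)

    def label(self) -> str:
        """Package name from '/nix/store/<hash>-<name>'."""
        if not self.current:
            return ""
        base = self.current.rsplit("/", 1)[-1]
        return base.split("-", 1)[1] if "-" in base else base
```

Throttling (redrawing the bar only a few times per second) would sit in the caller, outside this class.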
…crashing

The external observing-lists feature added `list_descriptions` to
CompositeObject, but CACHE_VERSION was not bumped. Devices upgrading from the
prior release keep their existing composite_objects.pkl, whose unpickled
objects lack the new field (dataclass defaults are not applied on unpickle),
so opening any object's details crashes the whole app:

    AttributeError: 'CompositeObject' object has no attribute 'list_descriptions'

The main process hosts the multiprocessing shared-state manager, so its death
cascades BrokenPipe/connection-reset into every worker — the symptom seen in
the logs; the real cause was masked because main()'s handler logs via the
multiprocess queue and then os._exit()s before the record is written.

- catalog_cache: bump CACHE_VERSION 1 -> 2 so pre-list_descriptions caches are
  rebuilt on upgrade (the real fix for deployed devices).
- composite_object: getattr guard in composed_sections so a stale-cached object
  degrades gracefully instead of taking down the process.
- main: print + flush the traceback before os._exit so a fatal exception in
  main() lands in the journal instead of being lost to the log queue.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
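A minimal reproduction of the failure and both fixes, assuming a trimmed `CompositeObject` whose new field uses `default_factory` (such fields get no class-level fallback, so an instance unpickled from an old cache simply lacks the attribute) and a cache file that stores the version next to the payload. `save_cache`, `load_cache` and `descriptions` are illustrative names, not the catalog_cache API.

```python
import pickle
from dataclasses import dataclass, field
from typing import Any, Callable, List

CACHE_VERSION = 2  # bump whenever a cached class gains, loses or changes fields


@dataclass
class CompositeObject:  # trimmed, hypothetical stand-in for the real class
    name: str
    # default_factory fields get no class attribute, and pickle skips __init__,
    # so an object cached before this field existed has no such attribute.
    list_descriptions: List[str] = field(default_factory=list)


def save_cache(path: str, objects: List[Any]) -> None:
    with open(path, "wb") as fh:
        pickle.dump((CACHE_VERSION, objects), fh)


def load_cache(path: str, rebuild: Callable[[], List[Any]]) -> List[Any]:
    """Return cached objects, rebuilding on a missing/unreadable file or version mismatch."""
    try:
        with open(path, "rb") as fh:
            version, objects = pickle.load(fh)
        if version == CACHE_VERSION:
            return objects
    except Exception:  # any unreadable cache is simply rebuilt
        pass
    objects = rebuild()
    save_cache(path, objects)
    return objects


def descriptions(obj: CompositeObject) -> List[str]:
    """getattr guard: a stale cached object degrades to 'no descriptions' instead of crashing."""
    return getattr(obj, "list_descriptions", None) or []
```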
Camera switching only changed the persisted choice and ran
switch-to-configuration boot, but the generic-extlinux builder always writes
DEFAULT=nixos-default (the base camera) and device-tree overlays load only at
boot. So a device set to imx477 kept booting the base imx462 DTB, the imx290
driver bound to absent hardware (Error writing reg 0x3038), and the camera
never worked.

Fix A keeps the one-image/no-rebuild specialisation design and just makes the
chosen specialisation the boot default:
- set-extlinux-default: fail-safe helper that repoints extlinux DEFAULT to the
  latest-generation nixos-<gen>-<camera> entry (base camera -> nixos-default);
  leaves a bootable DEFAULT untouched if the entry is missing.
- pifinder-switch-camera: repoint DEFAULT to the chosen camera, then reboot
  (DT overlays are boot-only).
- nixos_upgrade: re-apply the persisted camera's DEFAULT after activation,
  before the upgrade reboot (every rebuild resets DEFAULT to base).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
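The fail-safe DEFAULT repointing can be sketched as a pure text transform over extlinux.conf. This Python version assumes the `nixos-<gen>-<camera>` label scheme from the commit message and treats imx462 as the base camera; the implementation of the real set-extlinux-default helper is not shown on this page, and `set_extlinux_default` here is illustrative.

```python
import re
from typing import Optional

# Assumption: the base camera (booted via "nixos-default") is imx462, as in the commit.
BASE_CAMERA = "imx462"


def set_extlinux_default(conf: str, camera: str, base_camera: str = BASE_CAMERA) -> str:
    """Point DEFAULT at the newest nixos-<gen>-<camera> entry (or nixos-default for
    the base camera). Returns `conf` unchanged if the wanted entry is missing."""
    labels = re.findall(r"^LABEL[ \t]+(\S+)[ \t]*$", conf, flags=re.MULTILINE)
    target: Optional[str] = None
    if camera == base_camera:
        if "nixos-default" in labels:
            target = "nixos-default"
    else:
        entry = re.compile(rf"nixos-(\d+)-{re.escape(camera)}")
        gens = [(int(m.group(1)), lbl) for lbl in labels if (m := entry.fullmatch(lbl))]
        if gens:
            target = max(gens)[1]  # latest generation wins
    if target is None:
        return conf  # fail-safe: keep the existing, bootable DEFAULT
    return re.sub(r"^DEFAULT[ \t]+\S+[ \t]*$", f"DEFAULT {target}", conf,
                  count=1, flags=re.MULTILINE)
```

Because every rebuild resets DEFAULT to the base entry, the same function would be re-run after activation, which is what the nixos_upgrade change above does.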
The ADR said device update downloads ship only the genuinely-new chunks
(~80 MB for a 1.5 GB closure). That conflates server-side storage / CI-upload
dedup with the device download. Attic serves whole NARs over the standard
binary-cache protocol: a device fetches the full compressed NAR of every
changed store path, with no chunk-delta against the previous version. The only
device-side saving is path-level (unchanged paths are not refetched). True
client-side chunk-delta needs a casync/desync client with a local chunk store.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
pifinder-src is rebuilt (new store hash) on every code change, and Attic ships
whole NARs — so each change re-downloads ~43MB and rewrites ~70MB to the SD,
even for a one-line edit. 56MB of that is stable: fonts (~31MB) and the pinned
tetra3/cedar-solve solver (~25MB, ~15MB after trimming examples/tests/docs).

Move both into their own derivations (like astro-data) and symlink them into
pifinder-src. A routine code change now rewrites only the ~16MB code path;
fonts/tetra3 are distributed once and shared across changes. tetra3 keeps
cedar_detect_pb2 (ships in the repo) and is pre-compiled; it is symlinked after
compileall so bytecode isn't written into the read-only store path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
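Both of the previous two commits rest on the same accounting: with whole-NAR delivery, a device transfers every store path of the new closure that it does not already have, in full, and nothing else. A sketch of that model using the unpacked sizes quoted in the split commit (what gets written to the SD; the compressed download is smaller). The store-path names are made up.

```python
from typing import Dict


def bytes_to_fetch(old: Dict[str, int], new: Dict[str, int]) -> int:
    """Whole-NAR, path-level model: every store path of the new closure that is not
    already on the device is transferred in full; unchanged paths cost nothing."""
    return sum(size for path, size in new.items() if path not in old)


# Unpacked sizes in MB from the commit message; the store-path names are made up.
monolithic_v1 = {"h1-pifinder-src": 70}
monolithic_v2 = {"h2-pifinder-src": 70}  # after a one-line code edit
split_v1 = {"h3-pifinder-src": 16, "f1-pifinder-fonts": 31, "t1-tetra3": 15}
split_v2 = {"h4-pifinder-src": 16, "f1-pifinder-fonts": 31, "t1-tetra3": 15}

print(bytes_to_fetch(monolithic_v1, monolithic_v2))  # 70
print(bytes_to_fetch(split_v1, split_v2))            # 16
```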
Adds a 'Rollback' channel next to stable/beta/unstable that lists the on-disk
system generations you can roll back to (all but the current one).

- sys_utils.list_rollback_targets(): reads ONLY immutable generation data (the
  /nix/var/nix/profiles symlinks + store-path labels) — no sidecar JSON state to
  evolve or corrupt across up/downgrades. Entry = label + generation + date.
- software.py: the Rollback channel is built locally, so it's available even
  when the manifest fetch fails — i.e. exactly when you're stranded on a bad
  build. Select -> confirm -> reuses update_software() (the entry's ref is the
  generation's store path, so it activates + reboots with no download).
- nixos_upgrade.cleanup_old_generations: keep +3 (was +2) -> 2 rollback targets.

Labels currently come from the store-path name; a follow-up can set
system.nixos.label at build time for 'PR-379'-style names (and matching
bootloader entries).
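A sketch of reading rollback targets purely from the profile symlinks, assuming the standard NixOS layout where `/nix/var/nix/profiles/system` points at `system-<N>-link` and each of those points at a store path. Using the link's mtime as the generation date and stripping the hash to form the label are assumptions; `RollbackTarget` is a hypothetical type.

```python
import os
import re
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import List

GEN_LINK = re.compile(r"^system-(\d+)-link$")


@dataclass
class RollbackTarget:  # hypothetical record type
    generation: int
    label: str       # store-path name with the hash stripped
    store_path: str  # ref to activate: no download needed
    date: datetime   # assumption: link mtime approximates the generation's creation time


def list_rollback_targets(profiles: Path = Path("/nix/var/nix/profiles")) -> List[RollbackTarget]:
    """All system generations except the current one, newest first, read only from symlinks."""
    current = Path(os.readlink(profiles / "system")).name  # e.g. "system-42-link"
    targets = []
    for entry in profiles.iterdir():
        m = GEN_LINK.match(entry.name)
        if m is None or entry.name == current:
            continue
        store_path = os.readlink(entry)
        name = Path(store_path).name
        label = name.split("-", 1)[1] if "-" in name else name
        targets.append(RollbackTarget(int(m.group(1)), label, store_path,
                                      datetime.fromtimestamp(entry.lstat().st_mtime)))
    return sorted(targets, key=lambda t: t.generation, reverse=True)
```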
Re-applied the deepchart feature (Gaia star charts) as a clean delta on top
of current nixos, replacing the cat_images image backend with the new
object_images package (POSS images + generated Gaia charts via
chart_provider / poss_provider / gaia_chart / star_catalog). Adds the
limiting-magnitude entry UI (ui/lm_entry.py) and the Obj Chart / Set LM menus.

Merge decisions (theirs=deepchart, ours=nixos):
- object_details.py: object_images.get_display_image replaces
  cat_images.get_display_image; nixos Contrast Reserve, telemetry and the
  DM_CONTRAST page preserved; deepchart chart-generator and progressive-image
  handling integrated.
- menu_structure.py: keep both UITelemetryList and UILMEntry imports; union
  the nixos Image menu with deepchart Obj Chart / Set LM menus.
- log / solver / main / camera_debug / sqm conflicts resolved to nixos
  (deepchart changes there were dead code, disabled test hacks, or already
  upstream).
- cat_images.py removed (object_images fully replaces it); obsolete
  test_cat_images.py removed and test_equipment.py decoupled from it.

Carried nixos NSEW-label / object-size-box overlays onto the object_images
backend: ported the cardinal/size/vertex geometry helpers and an
add_orientation_overlays() into object_images/image_utils.py, called from
poss_provider so the image_nsew / image_bbox settings work again.

Replaced the obsolete star_catalog tests (which targeted the removed
_read_binary_index API) with real coverage of the v3 CompressedIndex reader:
header/tile-count parsing, cumulative per-run tile offsets, second-run
offset_base, missing-tile lookups (before/after/gap/past-run-length),
single-run, and version validation. Made CompressedIndex.close() idempotent
to fix a double-close ValueError the new tests exposed. The healpy-dependent
_parse_records / _apply_proper_motion paths still need on-device coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
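The new tests exercise a run-based tile index: sorted runs of consecutive tile ids, each with its own offset_base and cumulative per-tile offsets. The on-disk layout below (magic, field widths, struct formats) is invented for illustration and is not the real v3 CompressedIndex format; it only shows how such lookups, the missing-tile cases, version validation and an idempotent close() fit together. `RunIndex` and `pack_index` are hypothetical names.

```python
import bisect
import struct
from typing import BinaryIO, List, Optional, Tuple

# Invented layout for illustration only (NOT the real v3 CompressedIndex format):
#   header:  4-byte magic b"TIDX", uint16 version, uint32 run_count
#   per run: uint32 start_tile, uint32 length, uint64 offset_base,
#            then `length` uint32 tile sizes in bytes; runs sorted by start_tile
HEADER = struct.Struct("<4sHI")
RUN = struct.Struct("<IIQ")
VERSION = 3


class RunIndex:
    def __init__(self, fh: BinaryIO) -> None:
        self._fh: Optional[BinaryIO] = fh
        magic, version, run_count = HEADER.unpack(fh.read(HEADER.size))
        if magic != b"TIDX" or version != VERSION:
            raise ValueError(f"unsupported index (magic={magic!r}, version={version})")
        self.starts: List[int] = []                       # first tile id of each run
        self.runs: List[Tuple[int, int, List[int]]] = []  # (length, offset_base, cumulative)
        for _ in range(run_count):
            start, length, offset_base = RUN.unpack(fh.read(RUN.size))
            sizes = struct.unpack(f"<{length}I", fh.read(4 * length))
            cumulative = [0]
            for size in sizes:  # cumulative[k] = offset of tile k within the run
                cumulative.append(cumulative[-1] + size)
            self.starts.append(start)
            self.runs.append((length, offset_base, cumulative))

    def lookup(self, tile: int) -> Optional[Tuple[int, int]]:
        """(absolute byte offset, byte size) of `tile`, or None if it is not indexed."""
        i = bisect.bisect_right(self.starts, tile) - 1
        if i < 0:
            return None  # before the first run
        length, offset_base, cumulative = self.runs[i]
        k = tile - self.starts[i]
        if k >= length:
            return None  # in a gap, or past this run's length
        return offset_base + cumulative[k], cumulative[k + 1] - cumulative[k]

    def close(self) -> None:
        """Idempotent: calling close() twice is a no-op, not an error."""
        if self._fh is not None:
            self._fh.close()
            self._fh = None


def pack_index(runs: List[Tuple[int, List[int], int]], version: int = VERSION) -> bytes:
    """Serialize (start_tile, tile_sizes, offset_base) runs in the layout above, e.g. for tests."""
    out = bytearray(HEADER.pack(b"TIDX", version, len(runs)))
    for start, sizes, offset_base in runs:
        out += RUN.pack(start, len(sizes), offset_base)
        out += struct.pack(f"<{len(sizes)}I", *sizes)
    return bytes(out)
```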
Gaia deep charts load from utils.data_dir/gaia_stars at runtime, but nothing
provisioned it. Add a fixed-output derivation that fetches the ~454 MB catalog
tarball and symlink it into PiFinder_data from the activation script, mirroring
the pifinder-src / astro_data pattern so both fresh flashes and in-place
upgrades deliver it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Labels

testable Ready for testing via PiFinder software update
