[WIP] Gaia charts #362
Open
mrosseel wants to merge 252 commits into
Required for NixOS module system to accept devMode setting. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Required when module has both options and config sections. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replaces FIXME placeholders with actual SRI hashes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Uses Pi5 runner when RUNNER_LABELS variable is set, falls back to ubuntu with QEMU emulation otherwise. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Filter to only Pi 4B device tree (CM4 incompatible with our overlays)
- Use shorthand DTS syntax for PWM overlay
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Untracked file was excluded from Nix flake source tree, causing "No module named 'PiFinder.sys_utils_base'" on SD card boot. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add camera overlay (imx477) to netboot config.txt via flake.nix
- Fix sys_utils import in main.py to use utils.get_sys_utils()
- Add hip_main.dat fetch to pifinder-src.nix for starfield plotting
- Add dma_heap udev rule for libcamera/picamera2 access
- Fix shared memory naming in solver.py (remove leading /)
- Add DNS nameservers for netboot environment
- Document power control scripts in CLAUDE.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add runtimeCameraSelection option to hardware.nix (default: true)
- SD image includes config.txt with "include camera.txt" directive
- Users can edit camera.txt and reboot to switch cameras
- Supported cameras: imx296, imx290 (imx462), imx477
- Fix cameraDriver scope in hardware.nix (moved to top-level let)
- Add sudoers rules for systemctl stop/start pifinder.service
- Add DMA heap udev rule for libcamera video group access
- Netboot config sets cameraType = "imx477" for HQ camera dev
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Refactor sys_utils modules to use common base class
- Add sys_utils_nixos.py for NixOS-specific implementations
- Add get_sys_utils() detection in utils.py for platform selection
- Add flake.lock for reproducible builds
- Add NetworkManager config to networking.nix
- Add deploy-image-to-nfs.sh for netboot development workflow
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update build.yml CI workflow
- Fix fonts.py import
- Fix marking_menus.py formatting
- Add missing import to preview.py
- Simplify objects_db.py
- Add catalog_imports improvements
- Update pifinder_objects.db
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Switch to NFSv4 with caching disabled (noac, actimeo=0)
- Disable auto-optimise-store in devMode (hard links fail on NFS)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add ServerAliveInterval/CountMax to prevent timeout during transfers
- Use rsync -R (relative) to preserve directory structure correctly
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comets.txt is downloaded at runtime and must be in a writable location, not the read-only Nix store. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extend eth0 wait to 30 seconds with debug output
- Wait for link carrier before DHCP
- Add DHCP retries (3 attempts)
- Add LIBCAMERA_IPA_MODULE_PATH to pifinder service environment
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Restore SUBSYSTEM=="pwm" udev rule that was accidentally removed. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Turns on keypad LEDs during sysinit for early visual boot feedback. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- boot-splash.c: displays welcome image with scanning animation
- Starts at sysinit, stops when pifinder.service starts
- Much faster than Python splash
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove nixos-hardware module (saves 659MB linux-firmware)
- Fetch nixos-rebuild at runtime (saves ~500MB llvm/nix deps)
- Remove git from systemPackages (nix has built-in git for flakes)
Target: ~150MB vs current 1.7GB
- Remove default packages (vim, nano, etc)
- Disable polkit, udisks2, speechd
- Should reduce closure significantly
NetworkManager-vpnc alone has 1.1GB closure (webkitgtk, llvm, etc). Disable all NM plugins for bootstrap - we just need WiFi.
- Disable xdg.mime/icons/sounds (pulls xdg-utils -> perl 112MB)
- Disable command-not-found (pulls perl)
- Disable fuse (86MB)
- Disable initrd extra filesystems
The update-manifest job branched nixos-manifest off the full source checkout and only git-added the manifest, so the branch inherited the entire 1400+ file source tree and history. Concurrent trunk/PR stamps also clobbered or failed each other on the single update-manifest.json.
Extract publishing into .github/scripts/publish_manifest.sh, shared by build.yml and release.yml:
- rebuild nixos-manifest as a single-file orphan tree every run
- fetch -> re-apply this run's entry -> retry on a rejected push, so concurrent writers cannot lose an update (a git ref update is CAS)
Add a manifest-write concurrency group on the build update-manifest job as a coarse serializer in front of the retry loop.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
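The fetch, re-apply, retry-on-rejected-push loop described in this commit is a compare-and-swap pattern. The sketch below illustrates it in Python against an in-memory stand-in for the remote branch. `FakeRemote`, `StaleRefError`, `publish_entry`, and the `before_push` hook are hypothetical names invented for illustration; the actual publish_manifest.sh is a shell script driving git and is not reproduced here.

```python
import json
from typing import Callable, Optional, Tuple


class StaleRefError(Exception):
    """Raised when the branch tip moved since we fetched (a rejected push)."""


class FakeRemote:
    """Hypothetical in-memory remote: one ref (tip counter) and one JSON file."""

    def __init__(self) -> None:
        self.tip = 0
        self.manifest = "{}"

    def fetch(self) -> Tuple[int, str]:
        return self.tip, self.manifest

    def push(self, expected_tip: int, manifest: str) -> None:
        # Compare-and-swap: only accept the update if nobody pushed in between.
        if expected_tip != self.tip:
            raise StaleRefError("tip moved from %d to %d" % (expected_tip, self.tip))
        self.tip += 1
        self.manifest = manifest


def publish_entry(
    remote: FakeRemote,
    key: str,
    entry: dict,
    max_attempts: int = 5,
    before_push: Optional[Callable[[int], None]] = None,
) -> int:
    """Write one manifest entry, retrying on a rejected push.

    Returns the number of attempts used. `before_push` exists only so a test
    can inject a concurrent writer between fetch and push.
    """
    for attempt in range(1, max_attempts + 1):
        tip, raw = remote.fetch()
        manifest = json.loads(raw)
        manifest[key] = entry  # re-apply only this run's entry on the fresh tip
        if before_push is not None:
            before_push(attempt)
        try:
            remote.push(tip, json.dumps(manifest, sort_keys=True))
            return attempt
        except StaleRefError:
            continue  # someone else won the race; refetch and try again
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

Because each attempt rebuilds the manifest from the freshly fetched tip and touches only its own key, a concurrent writer's entry survives the retry instead of being overwritten.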
…ollapse
nixos-manifest is metadata-only (just update-manifest.json); the find-prune + add -A was one-time cleanup of the already-collapsed branch. Keep only: fetch tip, rewrite the entry, push, with the concurrency retry.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
cache.pifinder.eu's chunk store moved from local disk to AWS S3 (bucket pifinder-nix-cache, us-west-1). The cutover used a fresh atticd DB, which regenerated both caches' NAR signing keypairs. Rotate the pinned dev key and wire in the now-provisioned pifinder-release key.
pifinder: 8UU/...jBmE= -> Vkem...3gck=
pifinder-release: (newly pinned) WG/F...KpoM=
ATTIC_TOKEN is unchanged (RS256 server secret preserved), so CI keeps working once these land.
A push to main/nixos triggered the whole build pipeline (build-native → build-emulated → update-manifest → migration-tarball). By design mrosseel commits must build nothing; only labeled PRs (brickbots contributions) and pre-releases/releases (release.yml) should build. Drop the `push:` trigger and the event_name=='push' arms in build-native/native-wait; update-manifest still follows the build jobs, so it runs on labeled PRs.
…n fork PRs
- software.py: read update-manifest.json from brickbots/PiFinder (canonical), not the mrosseel fork; devices were pointed at the wrong repo.
- build.yml build-native: guard 'Push to Attic' on $ATTIC_TOKEN so fork PRs (no secrets) build verify-only instead of failing with AccessError, matching build-emulated.
- build.yml update-manifest: drop 'ref: <head_ref>' checkout; a fork PR's head branch doesn't exist here, so it 404'd, while the default ref resolves for both.
A trusted build (secrets + write token) is the only thing that can push to the cache and write the nixos-manifest branch — a fork PR never can. Restore the push trigger but gate the build jobs on github.repository so it acts only in brickbots: a mrosseel push still builds nothing, while a merge to brickbots builds and publishes (attic push + update-manifest, whose fork-PR skip doesn't apply to pushes).
Adopt brickbots' canonical CI + the trusted testable-PR builder from brickbots#493 instead of the nixos branch's divergent workflows:
+ nixos-pr-build.yml (pull_request_target, label-gated, trusted)
+ nox.yml (brickbots standard lint/test)
~ web-integration-tests.yml (match brickbots main)
- build.yml, lint.yml, release.yml (nixos-only; superseded)
publish_manifest.sh / update_manifest.py already match brickbots#493.
…over rotation
The cutover recreated the Attic cache with a fresh key (Vkem), but nothing deployed trusted it, so the whole 8UU fleet got stranded. Attic can't dual-sign (serves only its own key), so the fix is to put the cache back on 8UU (done server-side) and revert the config to match. pifinder-release key kept (new cache, never previously trusted).
It holds always-on device config (avahi, hostname, substituters/keys, sudo), used long after the Debian→NixOS migration — the name was misleading. Update the flake import and the RELEASE.md reference. Pure rename, no content change.
…ostname
nixos_upgrade.py:
- Download progress now shows a size bar that moves *within* a path and names the package being copied, parsed live from nix's internal-json (resProgress byte events summed over copyPath activities; denominator = the dry-run 'unpacked' total, since Attic narinfos omit a compressed FileSize). All accounting is throttled and wrapped so a counter bug can never stall the stream or abort the upgrade; only a short log tail is kept (not the ~800k-line stream). Dropped the now-dead per-path path-info query, the size map, and write_sizes_file/UPGRADE_SIZES_FILE.
- key-proof: fetch each cache's current signing key from its anonymous Attic cache-config endpoint and trust it for the pull (extra-trusted-public-keys; verification stays on) so a cache key rotation can't strand the fleet.
networking.nix: own avahi here (the module shared by the running system and the migration build); fix the boot race (NM dispatcher re-scans avahi on connect) and make the PiFinder_data hostname stick: hostname-mode=none plus the dispatcher re-asserting hostname + avahi-set-host-name (NixOS bakes host-name=<static> into avahi's config, which a restart would otherwise revert).
sys_utils.py / software.py: parse and show the package label under the bar.
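The byte accounting described for nixos_upgrade.py (progress events summed over copy activities, with every parse wrapped so a bad line cannot abort the upgrade) might look roughly like the sketch below. It is not the PR's code. The event shape is an assumption about Nix's `internal-json` log format: lines prefixed with `@nix `, activity type 100 for a path copy, result type 105 for progress, and the first element of `fields` being bytes done. Check these against the Nix version in use. `CopyProgress` is a hypothetical name, and the throttling mentioned in the commit is left out.

```python
import json
from typing import Dict

ACT_COPY_PATH = 100  # assumed numeric code for a copyPath activity
RES_PROGRESS = 105   # assumed numeric code for a resProgress result


class CopyProgress:
    """Sums per-activity 'bytes done' against a fixed expected total."""

    def __init__(self, total_bytes: int) -> None:
        self.total_bytes = total_bytes       # e.g. the dry-run unpacked total
        self._done: Dict[int, int] = {}      # activity id -> bytes done so far
        self.current_label = ""              # text of the latest copy activity

    def feed(self, line: str) -> None:
        """Consume one log line; never raises on malformed input."""
        try:
            if not line.startswith("@nix "):
                return
            event = json.loads(line[5:])
            action = event.get("action")
            if action == "start" and event.get("type") == ACT_COPY_PATH:
                self._done[event["id"]] = 0
                self.current_label = event.get("text", "")
            elif action == "result" and event.get("type") == RES_PROGRESS:
                # Ignore progress for activities that are not path copies.
                if event.get("id") in self._done:
                    self._done[event["id"]] = int(event["fields"][0])
        except (ValueError, KeyError, IndexError, TypeError, AttributeError):
            return  # a counter bug must not stall the stream

    @property
    def done_bytes(self) -> int:
        return sum(self._done.values())

    def fraction(self) -> float:
        if self.total_bytes <= 0:
            return 0.0
        return min(1.0, self.done_bytes / self.total_bytes)
```

Keeping the latest value per activity, rather than adding deltas, means a repeated or out-of-order progress event cannot double-count bytes.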
…crashing
The external observing-lists feature added `list_descriptions` to
CompositeObject, but CACHE_VERSION was not bumped. Devices upgrading from the
prior release keep their existing composite_objects.pkl, whose unpickled
objects lack the new field (dataclass defaults are not applied on unpickle),
so opening any object's details crashes the whole app:
AttributeError: 'CompositeObject' object has no attribute 'list_descriptions'
The main process hosts the multiprocessing shared-state manager, so its death
cascades BrokenPipe/connection-reset into every worker — the symptom seen in
the logs; the real cause was masked because main()'s handler logs via the
multiprocess queue and then os._exit()s before the record is written.
- catalog_cache: bump CACHE_VERSION 1 -> 2 so pre-list_descriptions caches are
rebuilt on upgrade (the real fix for deployed devices).
- composite_object: getattr guard in composed_sections so a stale-cached object
degrades gracefully instead of taking down the process.
- main: print + flush the traceback before os._exit so a fatal exception in
main() lands in the journal instead of being lost to the log queue.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
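The failure mode in this commit can be reproduced in a few lines. The sketch below uses `CachedObject` as a hypothetical stand-in for CompositeObject and simulates unpickling an old cache entry by creating an instance without running `__init__` and restoring only the old fields, which is what pickle does by default. It assumes the new field uses `default_factory`; a plain class-level default would still resolve through the class and would not raise.

```python
from dataclasses import dataclass, field
from typing import List

CACHE_VERSION = 2  # bump whenever the cached object's fields change


@dataclass
class CachedObject:  # hypothetical stand-in for CompositeObject
    name: str
    list_descriptions: List[str] = field(default_factory=list)


def make_stale(name: str) -> CachedObject:
    """Mimic unpickling a pre-upgrade object: __init__ is skipped and only
    the fields that existed at pickling time are restored."""
    obj = CachedObject.__new__(CachedObject)
    obj.__dict__.update({"name": name})
    return obj


def descriptions_of(obj: CachedObject) -> List[str]:
    """getattr guard: degrade to an empty list instead of crashing."""
    return getattr(obj, "list_descriptions", None) or []


def cache_is_usable(stored_version: int) -> bool:
    """A cache written under another version is rebuilt, not loaded."""
    return stored_version == CACHE_VERSION
```

The version check is the real fix, since it forces a rebuild of stale caches; the `getattr` guard only limits the damage if a stale object slips through anyway.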
Camera switching only changed the persisted choice and ran switch-to-configuration boot, but the generic-extlinux builder always writes DEFAULT=nixos-default (the base camera) and device-tree overlays load only at boot. So a device set to imx477 kept booting the base imx462 DTB, the imx290 driver bound to absent hardware (Error writing reg 0x3038), and the camera never worked.
Fix A keeps the one-image/no-rebuild specialisation design and just makes the chosen specialisation the boot default:
- set-extlinux-default: fail-safe helper that repoints extlinux DEFAULT to the latest-generation nixos-<gen>-<camera> entry (base camera -> nixos-default); leaves a bootable DEFAULT untouched if the entry is missing.
- pifinder-switch-camera: repoint DEFAULT to the chosen camera, then reboot (DT overlays are boot-only).
- nixos_upgrade: re-apply the persisted camera's DEFAULT after activation, before the upgrade reboot (every rebuild resets DEFAULT to base).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
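The selection rule for set-extlinux-default can be sketched as a pure function over the text of extlinux.conf. This is an illustration only, not the helper from the PR: `pick_label` and `repoint_default` are hypothetical names, and the sketch assumes entries appear as `LABEL <name>` lines and that the default is a single `DEFAULT <name>` line.

```python
import re
from typing import Optional


def pick_label(conf_text: str, camera: str, base_camera: str) -> Optional[str]:
    """Return the boot entry for `camera`, or None if no such entry exists."""
    labels = re.findall(r"^[ \t]*LABEL[ \t]+(\S+)[ \t]*$", conf_text, flags=re.MULTILINE)
    if camera == base_camera:
        return "nixos-default" if "nixos-default" in labels else None
    pattern = re.compile(r"^nixos-(\d+)-" + re.escape(camera) + r"$")
    best = None  # (generation, label) of the newest matching entry
    for label in labels:
        match = pattern.match(label)
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), label)
    return best[1] if best else None


def repoint_default(conf_text: str, camera: str, base_camera: str) -> str:
    """Rewrite the DEFAULT line; return the text unchanged if the target
    entry is missing, so a bootable DEFAULT is never replaced by a dead one."""
    label = pick_label(conf_text, camera, base_camera)
    if label is None:
        return conf_text
    new_text, count = re.subn(
        r"^([ \t]*DEFAULT[ \t]+)\S+",
        lambda m: m.group(1) + label,
        conf_text,
        count=1,
        flags=re.MULTILINE,
    )
    return new_text if count else conf_text
```

Comparing generations numerically rather than as strings matters once generation numbers cross a digit boundary (for example 9 to 10).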
The ADR said device update downloads ship only the genuinely-new chunks (~80 MB for a 1.5 GB closure). That conflates server-side storage / CI-upload dedup with the device download. Attic serves whole NARs over the standard binary-cache protocol: a device fetches the full compressed NAR of every changed store path, with no chunk-delta against the previous version. The only device-side saving is path-level (unchanged paths are not refetched). True client-side chunk-delta needs a casync/desync client with a local chunk store. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
pifinder-src is rebuilt (new store hash) on every code change, and Attic ships whole NARs — so each change re-downloads ~43MB and rewrites ~70MB to the SD, even for a one-line edit. 56MB of that is stable: fonts (~31MB) and the pinned tetra3/cedar-solve solver (~25MB, ~15MB after trimming examples/tests/docs). Move both into their own derivations (like astro-data) and symlink them into pifinder-src. A routine code change now rewrites only the ~16MB code path; fonts/tetra3 are distributed once and shared across changes. tetra3 keeps cedar_detect_pb2 (ships in the repo) and is pre-compiled; it is symlinked after compileall so bytecode isn't written into the read-only store path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RjeCZ17KqhKzhKikWGwBDo
Adds a 'Rollback' channel next to stable/beta/unstable that lists the on-disk system generations you can roll back to (all but the current one).
- sys_utils.list_rollback_targets(): reads ONLY immutable generation data (the /nix/var/nix/profiles symlinks + store-path labels), with no sidecar JSON state to evolve or corrupt across up/downgrades. Entry = label + generation + date.
- software.py: the Rollback channel is built locally, so it's available even when the manifest fetch fails, i.e. exactly when you're stranded on a bad build. Select -> confirm -> reuses update_software() (the entry's ref is the generation's store path, so it activates + reboots with no download).
- nixos_upgrade.cleanup_old_generations: keep +3 (was +2) -> 2 rollback targets.
Labels currently come from the store-path name; a follow-up can set system.nixos.label at build time for 'PR-379'-style names (and matching bootloader entries).
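A minimal sketch of listing rollback targets from the profile symlinks alone follows. It is not the PR's list_rollback_targets(): `rollback_targets_sketch` and `RollbackTarget` are hypothetical names, the `system-<N>-link` naming follows the usual NixOS profile layout, and taking the date from the symlink's own mtime is an assumption.

```python
import os
import re
from datetime import datetime, timezone
from typing import List, NamedTuple


class RollbackTarget(NamedTuple):
    generation: int
    store_path: str
    date: str


_GEN_LINK = re.compile(r"^system-(\d+)-link$")


def rollback_targets_sketch(profiles_dir: str = "/nix/var/nix/profiles") -> List[RollbackTarget]:
    """List every system generation except the current one, newest first."""
    current_link = os.path.join(profiles_dir, "system")
    current_name = ""
    if os.path.islink(current_link):
        current_name = os.path.basename(os.readlink(current_link))

    targets = []
    for name in os.listdir(profiles_dir):
        match = _GEN_LINK.match(name)
        if not match or name == current_name:
            continue
        link = os.path.join(profiles_dir, name)
        if not os.path.islink(link):
            continue
        # Assumption: the symlink's mtime approximates the generation date.
        mtime = os.lstat(link).st_mtime
        date = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%d")
        targets.append(RollbackTarget(int(match.group(1)), os.readlink(link), date))
    return sorted(targets, key=lambda t: t.generation, reverse=True)
```

Because the function reads only symlinks that Nix itself maintains, there is no separate state file that could drift out of sync with the generations actually on disk.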
Re-applied the deepchart feature (Gaia star charts) as a clean delta on top of current nixos, replacing the cat_images image backend with the new object_images package (POSS images + generated Gaia charts via chart_provider / poss_provider / gaia_chart / star_catalog). Adds the limiting-magnitude entry UI (ui/lm_entry.py) and the Obj Chart / Set LM menus.
Merge decisions (theirs=deepchart, ours=nixos):
- object_details.py: object_images.get_display_image replaces cat_images.get_display_image; nixos Contrast Reserve, telemetry and the DM_CONTRAST page preserved; deepchart chart-generator and progressive-image handling integrated.
- menu_structure.py: keep both UITelemetryList and UILMEntry imports; union the nixos Image menu with deepchart Obj Chart / Set LM menus.
- log / solver / main / camera_debug / sqm conflicts resolved to nixos (deepchart changes there were dead code, disabled test hacks, or already upstream).
- cat_images.py removed (object_images fully replaces it); obsolete test_cat_images.py removed and test_equipment.py decoupled from it.
Carried nixos NSEW-label / object-size-box overlays onto the object_images backend: ported the cardinal/size/vertex geometry helpers and an add_orientation_overlays() into object_images/image_utils.py, called from poss_provider so the image_nsew / image_bbox settings work again.
Replaced the obsolete star_catalog tests (which targeted the removed _read_binary_index API) with real coverage of the v3 CompressedIndex reader: header/tile-count parsing, cumulative per-run tile offsets, second-run offset_base, missing-tile lookups (before/after/gap/past-run-length), single-run, and version validation. Made CompressedIndex.close() idempotent to fix a double-close ValueError the new tests exposed. The healpy-dependent _parse_records / _apply_proper_motion paths still need on-device coverage.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
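An idempotent close(), as mentioned for CompressedIndex, is usually a matter of releasing each resource at most once. The sketch below shows one way to do that with a memory-mapped file. `IndexReader` is a hypothetical stand-in; the PR's actual class, its file layout, and the exact cause of its double-close error are not shown in the commit message.

```python
import mmap
import os


class IndexReader:  # hypothetical stand-in for CompressedIndex
    """Memory-maps a file read-only and can be closed any number of times."""

    def __init__(self, path: str) -> None:
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def read_header(self, size: int) -> bytes:
        if self._map is None:
            raise ValueError("index is closed")
        return self._map[:size]

    def close(self) -> None:
        # Drop each handle after releasing it so a second call is a no-op.
        if self._map is not None:
            self._map.close()
            self._map = None
        if self._file is not None:
            self._file.close()
            self._file = None

    @property
    def closed(self) -> bool:
        return self._map is None

    def __enter__(self) -> "IndexReader":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()
```

With this shape, an explicit close() inside a `with` block followed by the implicit close on exit does not raise.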
Gaia deep charts load from utils.data_dir/gaia_stars at runtime, but nothing provisioned it. Add a fixed-output derivation that fetches the ~454 MB catalog tarball and symlink it into PiFinder_data from the activation script, mirroring the pifinder-src / astro_data pattern so both fresh flashes and in-place upgrades deliver it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a Gaia chart view to all objects, down to magnitude 17, with an automatic limiting magnitude that takes SQM and equipment into account. The automatic limit can be overridden.
TODO: