Draft
52 commits
4cc3700
feat(demos/ota): scaffold pack_artifact CLI
bburda Apr 26, 2026
0a3ddaf
feat(demos/ota): pack_artifact argparse + dispatcher signature
bburda Apr 26, 2026
3a5b41e
feat(demos/ota): build SOVD-shaped catalog entry
bburda Apr 26, 2026
3d22b16
feat(demos/ota): merge_catalog with id-based replace
bburda Apr 26, 2026
876b960
feat(demos/ota): tarball creation from install dir
bburda Apr 26, 2026
f86c765
feat(demos/ota): pack_artifact end-to-end run() with kind dispatch
bburda Apr 26, 2026
161b707
fix(demos/ota): version default to 0.0.0 + cleanup unused symbols + p…
bburda Apr 26, 2026
398842b
test(demos/ota): cover colcon_build, install e2e, version-required guard
bburda Apr 26, 2026
216acc7
feat(demos/ota): ota_update_server scaffold
bburda Apr 26, 2026
e42af7e
test(ota_server): /catalog endpoint coverage
bburda Apr 26, 2026
ab10c29
test(ota_server): /artifacts endpoint + path traversal guard
bburda Apr 26, 2026
9b71e5a
feat(demos/ota): ota_update_server Dockerfile
bburda Apr 26, 2026
1f838de
chore(ota_server): pyright config to silence venv import warnings
bburda Apr 26, 2026
653902b
fix(ota_server): mark /artifacts route response_class=FileResponse
bburda Apr 26, 2026
248ef32
feat(demos/ota): broken_lidar node with phantom /scan return
bburda Apr 26, 2026
c8e10eb
feat(demos/ota): fixed_lidar (clean /scan, no phantom)
bburda Apr 26, 2026
a14243b
feat(demos/ota): broken_lidar_legacy do-nothing node (uninstall target)
bburda Apr 26, 2026
01e5fc8
feat(demos/ota): obstacle_classifier_v2 (install target, /scan -> /sa…
bburda Apr 26, 2026
9defc1a
feat(demos/ota): build_artifacts.sh + gitignore generated tarballs
bburda Apr 26, 2026
706c62c
fix(demos/ota): use array for pack_artifact invocation in build script
bburda Apr 26, 2026
1a9f20a
feat(demos/ota): ota_update_plugin C++ gateway plugin
bburda Apr 26, 2026
78816db
fix(ota_plugin): double-fork to avoid zombies, init catalog client in…
bburda Apr 26, 2026
3517bb8
feat(demos/ota): thread x_medkit_replaces_executable for update kind
bburda Apr 26, 2026
3bb6b1b
fix(ota_plugin): honor x_medkit_replaces_executable when killing old …
bburda Apr 26, 2026
f088c7e
feat(demos/ota): docker compose stack + gateway config + entrypoint +…
bburda Apr 26, 2026
2f7d817
fix(ota_plugin): __has_include compat for older gateway updates/ head…
bburda Apr 26, 2026
ec5070f
fix(ota_plugin): cmdline-based pgrep + UpdateProvider C export + runt…
bburda Apr 26, 2026
80e4af1
fix(demos/ota): use SOVD spec field names update_name + x_medkit_version
bburda Apr 26, 2026
8f48af1
test(demos/ota): committable Playwright e2e smoke driving web UI agai…
bburda Apr 26, 2026
a1726c4
feat(demos/ota): run-demo / stop-demo / check-demo / trigger-* scripts
bburda Apr 27, 2026
bd7a54f
test(demos/ota): smoke test + CI job mirroring sensor_diagnostics pat…
bburda Apr 27, 2026
5baa086
fix(demos/ota): rename unused loop var in run-demo to satisfy shellch…
bburda Apr 27, 2026
bf6d540
ci(ota): build artifacts inside ros:jazzy container instead of instal…
bburda Apr 27, 2026
e89628e
docs: list ota_nav2_sensor_fix in top-level README + smoke test catalog
bburda Apr 27, 2026
43a66e3
feat(demos/ota): bake foxglove_bridge into gateway image (port 8765)
bburda Apr 27, 2026
26e61f4
fix(demos/ota): launch fault_manager_node so /faults endpoint responds
bburda Apr 27, 2026
add3bb4
feat(demos/ota): bake TurtleBot3 + Nav2 + headless Gazebo into demo
bburda Apr 27, 2026
1dbabd4
docs(demos/ota): refresh README + run-demo for the TB3 / Nav2 / gz-si…
bburda Apr 27, 2026
d82fc94
test(demos/ota): add Uninstall + SetRemap smoke checks; fix flaky log…
bburda Apr 27, 2026
5067ff9
fix(demos/ota): drop runtime function auto-gen + pin gateway to logs-…
bburda Apr 28, 2026
08b0c21
feat(demos/ota): add SOVD manifest with logical functions
bburda Apr 28, 2026
dab79c9
feat(demos/ota): reactive fault narrative + reproducible artefact build
bburda Apr 29, 2026
7bd90be
feat(demos/ota): drop areas from manifest - components are the boundary
bburda Apr 29, 2026
6b9e084
refactor(demos/ota): collapse to one component, keep apps + functions
bburda Apr 29, 2026
ca0232d
fix(demos/ota): subscribe only TwistStamped on /cmd_vel
bburda Apr 29, 2026
602ad90
fix(demos/ota): move robot spawn into the map + revert phantom to sin…
bburda Apr 29, 2026
f48cbba
fix(demos/ota): broadcast map -> odom continuously when robot is idle
bburda Apr 29, 2026
17325d5
fix(demos/ota): bypass turtlebot3_gazebo's hardcoded frame_prefix slash
bburda Apr 29, 2026
af571b4
fix(ota_plugin): forward use_sim_time to OTA-spawned processes
bburda Apr 29, 2026
e7cb757
feat(ota_plugin): write manifest fragments + notify gateway on instal…
bburda Apr 29, 2026
d411a7b
ci(ota): drop "Build artifacts on host" step from narrative job
bburda Apr 29, 2026
041a7a6
fix(ota_demo): report fault source_id with leading slash so per-entit…
bburda Apr 29, 2026
90 changes: 90 additions & 0 deletions .github/workflows/ci.yml
@@ -163,3 +163,93 @@ jobs:
        if: always()
        working-directory: demos/multi_ecu_aggregation
        run: docker compose --profile ci down

  build-and-test-ota:
    needs: lint
    runs-on: ubuntu-24.04
    steps:
      - name: Show triggering source
        if: github.event_name == 'repository_dispatch'
        run: |
          SHA="${{ github.event.client_payload.sha }}"
          RUN_URL="${{ github.event.client_payload.run_url }}"
          echo "## Triggered by ros2_medkit" >> "$GITHUB_STEP_SUMMARY"
          echo "- Commit: \`${SHA:-unknown}\`" >> "$GITHUB_STEP_SUMMARY"
          if [ -n "$RUN_URL" ]; then
            echo "- Run: [View triggering run]($RUN_URL)" >> "$GITHUB_STEP_SUMMARY"
          else
            echo "- Run: (URL not provided)" >> "$GITHUB_STEP_SUMMARY"
          fi

      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Build and start OTA demo
        working-directory: demos/ota_nav2_sensor_fix
        run: docker compose up -d --build

      - name: Run smoke tests
        run: ./tests/smoke_test_ota.sh

      - name: Show gateway logs on failure
        if: failure()
        working-directory: demos/ota_nav2_sensor_fix
        run: docker compose logs gateway --tail=200

      - name: Show update server logs on failure
        if: failure()
        working-directory: demos/ota_nav2_sensor_fix
        run: docker compose logs ota_update_server --tail=200

      - name: Teardown
        if: always()
        working-directory: demos/ota_nav2_sensor_fix
        run: docker compose down

  # Separate job from build-and-test-ota: this one publishes /goal_pose,
  # waits for the controller to actually try to drive, asserts the
  # reactive SCAN_PHANTOM_RETURN fault appears, and only then runs the
  # OTA swap. Catches regressions in the demo narrative (broken_lidar
  # subscribing /cmd_vel, fault_manager debounce, fixed_lidar not
  # reporting). Slower than the API-only smoke job because it has to
  # wait for nav2 lifecycle to settle and for /cmd_vel to actually fire,
  # so it's split out and can fail in isolation without blocking the
  # quick OTA-endpoint check.
  ota-demo-narrative:
    needs: lint
    runs-on: ubuntu-24.04
    steps:
      - name: Show triggering source
        if: github.event_name == 'repository_dispatch'
        run: |
          SHA="${{ github.event.client_payload.sha }}"
          RUN_URL="${{ github.event.client_payload.run_url }}"
          echo "## Triggered by ros2_medkit" >> "$GITHUB_STEP_SUMMARY"
          echo "- Commit: \`${SHA:-unknown}\`" >> "$GITHUB_STEP_SUMMARY"
          if [ -n "$RUN_URL" ]; then
            echo "- Run: [View triggering run]($RUN_URL)" >> "$GITHUB_STEP_SUMMARY"
          fi

      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Build and start OTA demo
        working-directory: demos/ota_nav2_sensor_fix
        # docker compose up --build runs the multi-stage build for
        # ota_update_server which produces the catalog + tarballs
        # internally - no separate "build artifacts on host" step
        # needed (and the host wouldn't have ros2_medkit_msgs anyway).
        run: docker compose up -d --build

      - name: Run demo narrative smoke
        run: ./tests/smoke_test_demo_narrative.sh

      - name: Show gateway logs on failure
        if: failure()
        working-directory: demos/ota_nav2_sensor_fix
        run: docker compose logs gateway --tail=300

      - name: Teardown
        if: always()
        working-directory: demos/ota_nav2_sensor_fix
        run: docker compose down
31 changes: 30 additions & 1 deletion README.md
@@ -49,6 +49,7 @@ All demos support:
| [TurtleBot3 Integration](demos/turtlebot3_integration/) | Full ros2_medkit integration with TurtleBot3 and Nav2 | SOVD-compliant API, manifest-based discovery, fault management | ✅ Ready |
| [MoveIt Pick-and-Place](demos/moveit_pick_place/) | Panda 7-DOF arm with MoveIt 2 manipulation and ros2_medkit | Planning fault detection, controller monitoring, joint limits | ✅ Ready |
| [Multi-ECU Aggregation](demos/multi_ecu_aggregation/) | Multi-ECU peer aggregation with 3 ECUs (perception, planning, actuation), mDNS discovery, cross-ECU functions | Peer aggregation, mDNS discovery, cross-ECU functions | ✅ Ready |
| [OTA over SOVD - nav2 sensor fix](demos/ota_nav2_sensor_fix/) | Dev-grade OTA plugin showing the SOVD `/updates` lifecycle - update a broken lidar node, install a new safety classifier, uninstall a deprecated package | SOVD-spec update / install / uninstall, native binary swap, fork+exec process management, Foxglove panel + curl scripts | ✅ Ready |

### Quick Start

@@ -150,6 +151,32 @@ cd demos/multi_ecu_aggregation
- Unified SOVD-compliant REST API spanning all ECUs
- Web UI for browsing aggregated entity hierarchy

#### OTA over SOVD Demo (Dev-grade Update / Install / Uninstall)

End-to-end demo of the SOVD `/updates` resource: a broken lidar node is
swapped with a fixed version over HTTP, an extra safety classifier is
installed from scratch, and a deprecated package is uninstalled - all
without SSH, all spec-compliant.

```bash
cd demos/ota_nav2_sensor_fix
./run-demo.sh # build artifacts + bring up gateway/plugin/update server
./check-demo.sh # show registered updates + per-id status + live process state
./trigger-update.sh # broken_lidar -> fixed_lidar (the headline)
./trigger-install.sh # install obstacle_classifier_v2
./trigger-uninstall.sh # remove broken_lidar_legacy
./stop-demo.sh
```

**Features:**

- Dev-grade `ota_update_plugin` C++ gateway plugin (UpdateProvider + GatewayPlugin)
- SOVD ISO 17978-3 compliant `/updates` resource: kind derived from
`updated_components` / `added_components` / `removed_components` metadata
- Native binary swap + `fork+exec` process management (no containers, no signing)
- Foxglove Studio panel mirrors the same SOVD client patterns as the web UI
- Pairs with the [`ros2_medkit_foxglove_extension`](https://github.com/selfpatch/ros2_medkit_foxglove_extension) Updates panel
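
The `fork+exec` process management listed above, combined with the double-fork fix mentioned in the commit log, follows a common POSIX idiom. The sketch below illustrates that general idiom in Python; it is not the plugin's C++ code, and `spawn_detached` is a hypothetical name chosen for this example.

```python
import os


def spawn_detached(argv):
    """Start argv as a detached process using the POSIX double-fork idiom.

    The intermediate child exits immediately, so the spawned program is
    re-parented away from the caller (typically to init) and cannot
    linger as a zombie of the calling process.
    """
    pid = os.fork()
    if pid > 0:
        # Caller: reap the short-lived intermediate child, then return.
        os.waitpid(pid, 0)
        return

    # Intermediate child: must never return into the caller's code.
    try:
        os.setsid()  # leave the caller's session and process group
        if os.fork() > 0:
            os._exit(0)  # intermediate child exits at once
        # Grandchild: replace the process image with the target program.
        os.execvp(argv[0], argv)
    except BaseException:
        pass
    os._exit(127)  # reached only if the second fork or the exec failed
```

Because the caller only ever waits on the intermediate child, it does not learn the PID of the spawned program. A manager that later needs to stop the process has to find it another way, for example by matching its command line.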

## Getting Started

### Prerequisites
@@ -209,9 +236,11 @@ Each demo has automated smoke tests that verify the gateway starts and the REST
./tests/smoke_test.sh # Sensor diagnostics (full API coverage + fault injection + beacons)
./tests/smoke_test_turtlebot3.sh # TurtleBot3 (discovery, data, operations, scripts, triggers, logs)
./tests/smoke_test_moveit.sh # MoveIt pick-and-place (discovery, data, operations, scripts, triggers, logs)
./tests/smoke_test_multi_ecu.sh # Multi-ECU aggregation (per-ECU discovery + aggregated view)
./tests/smoke_test_ota.sh # OTA over SOVD (catalog, /updates spec shape, prepare/execute, process swap)
```

-CI runs all 4 demos in parallel - each job builds the Docker image, starts the container, and runs the smoke tests against it. See [CI workflow](.github/workflows/ci.yml).
+CI runs all demos in parallel - each job builds the Docker image, starts the container, and runs the smoke tests against it. See [CI workflow](.github/workflows/ci.yml).

## Related Projects

119 changes: 119 additions & 0 deletions demos/ota_nav2_sensor_fix/Dockerfile.gateway
@@ -0,0 +1,119 @@
# Copyright 2026 bburda
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Builds the ros2_medkit gateway, the ota_update_plugin, and the four demo
# ROS 2 packages into a single ROS 2 Jazzy image. Plugin loads at gateway
# startup via /etc/ros2_medkit/gateway_config.yaml and the entrypoint also
# launches the broken_lidar demo nodes that get swapped/uninstalled at
# runtime by the plugin.

FROM ros:jazzy AS builder

ARG GATEWAY_REPO=https://github.com/selfpatch/ros2_medkit.git
# Pin to the component-logs aggregation fix branch until the upstream PR
# lands on main; without it the per-component Logs tab in the Foxglove
# extension shows zero entries because the synthetic component prefix
# match returns empty for fqn-less components.
ARG GATEWAY_REF=fix/component-logs-aggregation

RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    python3-colcon-common-extensions \
    python3-rosdep \
    build-essential \
    cmake \
    curl \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

RUN rosdep init || true
RUN rosdep update --rosdistro=jazzy

WORKDIR /ws/src
RUN git clone --depth=1 --branch ${GATEWAY_REF} ${GATEWAY_REPO} ros2_medkit

# Copy demo packages (broken_lidar, fixed_lidar, broken_lidar_legacy,
# obstacle_classifier_v2) and the OTA plugin from the build context.
COPY ros2_packages /tmp/ros2_packages
RUN cp -r /tmp/ros2_packages/. /ws/src/ && rm -rf /tmp/ros2_packages
COPY ota_update_plugin /ws/src/ota_update_plugin

WORKDIR /ws
# rosdep needs the apt cache populated to install gateway dependencies
# (nlohmann-json3-dev, libcpp-httplib-dev, etc.). Refresh it in the same
# layer as the install so a cached, stale package index is never reused.
RUN apt-get update && \
    . /opt/ros/jazzy/setup.sh && \
    rosdep install --from-paths src --ignore-src -r -y --rosdistro=jazzy && \
    colcon build \
        --cmake-args -DCMAKE_BUILD_TYPE=Release && \
    rm -rf /var/lib/apt/lists/*


FROM ros:jazzy

# Runtime dependencies. Beyond the gateway/plugin bare minimum we also pull in
# TurtleBot3, Nav2, and gz-sim so the container can self-host the visual demo
# (TB3 + headless Gazebo + Nav2) - no external sim required, the OTA story
# becomes "Foxglove sees a stuck robot, run an update, robot unsticks".
RUN apt-get update && apt-get install -y --no-install-recommends \
    ros-jazzy-rclcpp \
    ros-jazzy-rclcpp-lifecycle \
    ros-jazzy-sensor-msgs \
    ros-jazzy-visualization-msgs \
    ros-jazzy-launch-ros \
    ros-jazzy-test-msgs \
    ros-jazzy-foxglove-bridge \
    ros-jazzy-turtlebot3-gazebo \
    ros-jazzy-turtlebot3-msgs \
    ros-jazzy-turtlebot3-description \
    ros-jazzy-turtlebot3-navigation2 \
    ros-jazzy-nav2-bringup \
    ros-jazzy-nav2-bt-navigator \
    ros-jazzy-nav2-controller \
    ros-jazzy-nav2-planner \
    ros-jazzy-nav2-behaviors \
    ros-jazzy-nav2-costmap-2d \
    ros-jazzy-nav2-lifecycle-manager \
    ros-jazzy-nav2-map-server \
    ros-jazzy-nav2-amcl \
    ros-jazzy-ros-gz-sim \
    ros-jazzy-ros-gz-bridge \
    ros-jazzy-rmw-cyclonedds-cpp \
    libcpp-httplib-dev \
    libsystemd-dev \
    nlohmann-json3-dev \
    curl \
    procps \
    && rm -rf /var/lib/apt/lists/*

COPY --from=builder /ws/install /ws/install
COPY gateway_config.yaml /etc/ros2_medkit/gateway_config.yaml
COPY manifest.yaml /etc/ros2_medkit/manifest.yaml

# Pre-create the fragments directory so the gateway's manifest manager
# scans an existing (empty) dir at boot rather than logging "missing
# fragments_dir" warnings. Plugin writes / removes yaml files here at
# OTA install / uninstall time.
RUN mkdir -p /etc/ros2_medkit/manifest_fragments
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod +x /usr/local/bin/entrypoint.sh

# Default TB3 model + gz-sim resource path so spawn_turtlebot3 + gz can find
# the burger URDF/world models without the launch file having to set them.
# RMW: jazzy's apt-shipped nav2_msgs fastrtps typesupport pulls
# eprosima::fastcdr::Cdr::serialize(uint32_t), which the bundled
# ros-jazzy-fastcdr 2.2.5 does NOT export - amcl/controller_server segfault
# at startup. Switch to cyclonedds, which doesn't use the broken typesupport.
ENV ROS_DOMAIN_ID=42 \
    TURTLEBOT3_MODEL=burger \
    GAZEBO_MODEL_PATH=/opt/ros/jazzy/share/turtlebot3_gazebo/models \
    HEADLESS=true \
    RMW_IMPLEMENTATION=rmw_cyclonedds_cpp

EXPOSE 8080 8765
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
109 changes: 109 additions & 0 deletions demos/ota_nav2_sensor_fix/README.md
@@ -0,0 +1,109 @@
# OTA over SOVD - nav2 sensor fix demo

End-to-end demo: a `ros2_medkit` gateway with a dev-grade OTA plugin that
demonstrates the full Update / Install / Uninstall lifecycle on ROS 2 nodes
without SSH-ing into the robot.

## What this shows

Three things you can do to a ROS 2 robot over the air:

1. **Update** - swap a running sensor node with a fixed version (the
`broken_lidar` -> `fixed_lidar` flip).
2. **Install** - pull and start a new ROS 2 package
(`obstacle_classifier_v2`).
3. **Uninstall** - stop and remove a deprecated package
(`broken_lidar_legacy`).

All three operations are SOVD ISO 17978-3 compliant - the kind is derived
from `updated_components` / `added_components` / `removed_components` in the
update package metadata.
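
As a rough illustration of that derivation, the sketch below maps the three metadata fields to an operation kind. The function name `derive_kind`, the kind labels, and the precedence order when more than one field is populated are assumptions made for this example, not behaviour confirmed by the plugin or the spec.

```python
def derive_kind(entry):
    """Classify an update package as 'update', 'install', or 'uninstall'.

    Assumption: the first non-empty field wins, checked in this order.
    """
    if entry.get("updated_components"):
        return "update"
    if entry.get("added_components"):
        return "install"
    if entry.get("removed_components"):
        return "uninstall"
    raise ValueError("entry lists no updated, added, or removed components")


# The component name below is made up for illustration.
print(derive_kind({"id": "fixed_lidar_2_1_0", "updated_components": ["lidar"]}))
```

An entry with none of the three fields is rejected rather than given a default kind, so a malformed catalog entry fails loudly instead of triggering the wrong operation.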

## Quickstart

```bash
# Build artifacts + start gateway, plugin, demo nodes, update server.
./run-demo.sh
```

The first run pulls `ros:jazzy`, installs the TurtleBot3 + Nav2 + gz-sim
runtime (~3 GB), and builds the gateway from source. Expect roughly 15-20
minutes on a cold cache; subsequent runs reuse the layer cache.

In another terminal, drive the demo:

```bash
./check-demo.sh # show registered updates + live process state
./trigger-update.sh # broken_lidar -> fixed_lidar (the headline scene)
./trigger-install.sh # install obstacle_classifier_v2 from scratch
./trigger-uninstall.sh # remove broken_lidar_legacy
./stop-demo.sh # tear down
```

Each trigger script issues SOVD `PUT /updates/{id}/prepare` then `/execute`
and prints the resulting status plus the live process list.
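
For readers who prefer not to shell out, the same two-step sequence could be issued from Python roughly as follows. This is a sketch under assumptions: the `/api/v1` prefix is taken from the Foxglove setup steps in this README, the requests carry no body or authentication, and the helper names are hypothetical. The real scripts may also wait for `prepare` to finish before calling `execute`; the status endpoint is not shown in this excerpt, so that step is left out.

```python
import os
import urllib.request


def build_trigger_requests(update_id, base_url=None):
    """Return the two PUT requests (prepare, then execute) for one update."""
    if base_url is None:
        # Same default port as the demo; override with OTA_GATEWAY_PORT.
        port = os.environ.get("OTA_GATEWAY_PORT", "8080")
        base_url = f"http://localhost:{port}/api/v1"
    return [
        urllib.request.Request(
            f"{base_url}/updates/{update_id}/{step}", method="PUT"
        )
        for step in ("prepare", "execute")
    ]


def trigger(update_id, base_url=None):
    """Send prepare, then execute. urlopen raises on HTTP error statuses."""
    for request in build_trigger_requests(update_id, base_url):
        with urllib.request.urlopen(request, timeout=30) as response:
            print(request.full_url, response.status)
```

Calling `trigger("fixed_lidar_2_1_0")` against a running demo would send both requests in order. Building the requests in a separate function keeps the URL logic checkable without a live gateway.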

Port overrides (set as env vars before `./run-demo.sh`):

- `OTA_GATEWAY_PORT` - gateway HTTP API (default `8080`)
- `OTA_FOXGLOVE_BRIDGE_PORT` - foxglove_bridge WebSocket (default `8765`)

Tear down: `docker compose down`.

## Foxglove Studio visualization

The gateway container bakes in a TurtleBot3 burger + Nav2 stack running on
top of headless Gazebo. `foxglove_bridge` runs on port `8765` and exposes
the full topic set: `/tf`, `/tf_static`, `/scan`, `/odom`, `/map`,
`/cmd_vel`, `/global_costmap/costmap`, `/local_costmap/costmap`, etc. - so a
Foxglove **3D** panel renders the actual robot in the world out of the box.

1. Open Foxglove Studio -> **Open connection** -> **Foxglove WebSocket** ->
`ws://localhost:8765`. The Topics panel should list all of the topics
above.
2. Drop in a **3D** panel. You should see the TB3 burger sitting in the
default `turtlebot3_world.world` map, with the laser scan cone visible.
Before the OTA update, ray index 180 reports a phantom 1 m return - the
"obstacle" the demo's narrative pivots on.
3. Install the [`ros2_medkit_foxglove_extension`](https://github.com/selfpatch/ros2_medkit_foxglove_extension)
(`npm run local-install` in that repo, or drag-and-drop the `.foxe`
onto Foxglove). It ships three panels: Entity Browser, Faults Dashboard,
and **ros2_medkit Updates**.
4. Add the **ros2_medkit Updates** panel and set its `baseUrl` to
`http://localhost:8080/api/v1` (or the port you picked via
`OTA_GATEWAY_PORT`). Click **Prepare** then **Execute** for
`fixed_lidar_2_1_0`. The 3D panel should show the phantom return
disappearing as `broken_lidar` is killed and `fixed_lidar` starts.
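
To check for the phantom return from a script rather than by eye, a heuristic along these lines could be applied to the `ranges` array of a `/scan` message. It is only a sketch: the function name, the tolerance, and the neighbour-gap threshold are assumptions for illustration, and the demo's own fault detection may work differently.

```python
import math


def has_phantom_return(ranges, index=180, phantom_range=1.0,
                       tolerance=0.05, min_gap=0.5):
    """Heuristic check for an isolated short return at one ray.

    True when ray `index` reads about `phantom_range` metres while both
    neighbouring rays read at least `min_gap` metres farther away, or
    report no return at all (inf).
    """
    if not 0 < index < len(ranges) - 1:
        return False  # index missing or has no neighbour on one side
    if not abs(ranges[index] - phantom_range) <= tolerance:
        return False  # also rejects NaN readings
    neighbours = (ranges[index - 1], ranges[index + 1])
    return all(
        math.isinf(r) or r - ranges[index] >= min_gap for r in neighbours
    )
```

A real wall 1 m away does not trigger the check, because neighbouring rays then read roughly the same distance. Only a single isolated short ray matches.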

### Driving the robot to make the narrative reproducible

The demo doesn't auto-publish a navigation goal - that keeps it deterministic
for CI. To trigger the "robot stuck on phantom obstacle" beat manually:

```bash
# From the host (or any container on the same ROS_DOMAIN_ID=42):
ros2 topic pub --once /goal_pose geometry_msgs/msg/PoseStamped \
  '{header: {frame_id: map}, pose: {position: {x: 1.5, y: 0.0, z: 0.0}, orientation: {w: 1.0}}}'
```

Foxglove's **3D** panel also has a built-in "Publish" tool - select pose
mode, click a point ahead of the robot, and Foxglove publishes `/goal_pose`
for you. Before the update, Nav2 refuses to drive through the phantom return;
after `trigger-update.sh`, the robot completes the goal.
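
The goal above uses the identity orientation (`w: 1.0`), so the robot ends up facing along the map's +x axis. For a goal with a different heading, the quaternion for a rotation of `yaw` radians about the z axis is `(x, y, z, w) = (0, 0, sin(yaw/2), cos(yaw/2))`. The helper below, a hypothetical convenience that is not part of the demo, formats such a payload for `ros2 topic pub`:

```python
import math


def goal_pose_yaml(x, y, yaw=0.0, frame_id="map"):
    """Format a PoseStamped payload whose heading is `yaw` radians about z."""
    qz = math.sin(yaw / 2.0)
    qw = math.cos(yaw / 2.0)
    return (
        f"{{header: {{frame_id: {frame_id}}}, "
        f"pose: {{position: {{x: {x}, y: {y}, z: 0.0}}, "
        f"orientation: {{z: {qz:.6f}, w: {qw:.6f}}}}}}}"
    )


# Goal at (1.5, 0.0) with the robot turned 90 degrees to the left.
print(goal_pose_yaml(1.5, 0.0, yaw=math.pi / 2))
```

The printed string can be pasted as the final argument of the `ros2 topic pub` command shown earlier in this section.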

## Disclosures

This is **dev-grade** OTA. Deliberately missing for production:

- No artifact signing or signature verification
- No atomic swap (in-place overwrite)
- No A/B partition rollout
- No fleet-wide staged rollout
- No persistent update state across gateway restarts
- No automated health-gated rollback policy
- No audit log

Perfect for: prototypes, lab robots, internal demos, dev environments.

For production-grade OTA (rollout safety, signing, A/B partitions,
fleet-aware staging), reach out.
2 changes: 2 additions & 0 deletions demos/ota_nav2_sensor_fix/artifacts/.gitignore
@@ -0,0 +1,2 @@
*.tar.gz
catalog.json