feat: Depth perception from mono camera by MJohnson459 · Pull Request #18 · ClachDev/Mote

MJohnson459 · 2026-06-29T11:10:37Z

Turns the single RGB camera into a PointCloud2 of obstacles (/camera_obstacles)
for a Nav2 voxel/obstacle layer — catching the low/thin things the 2D lidar plane
misses (cables, thresholds, table & chair legs, a robot vacuum). The lidar stays
the primary, low-latency obstacle and clearing source; this is a slower
supplementary marker.

Pipeline: Depth Anything V2 (metric, indoor) gives a dense depth map; its raw
metres are not accurate for our lens, so every frame is metrically rescaled
against the known floor plane (depth_rescale.py, RANSAC affine-in-disparity —
the camera's fixed height/pose is dense per-frame ground truth). The rescaled depth
is back-projected to 3D; points more than z_obstacle (default 0.02 m) above the
floor become the cloud, stamped at image-capture time so Nav2 places it via tf
at the moment it was seen (this is how the off-board latency is absorbed).
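As a rough illustration of the rescale step described above, here is a minimal sketch of a RANSAC affine-in-disparity fit, assuming the model `1/ref = a * (1/model) + b` on floor pixels whose reference depth is known from the camera's fixed pose. The function names, iteration count, and tolerance are hypothetical; this is not the contents of depth_rescale.py.

```python
import numpy as np


def fit_affine_disparity(model_depth, ref_depth, iters=200, tol=0.02, seed=0):
    """RANSAC fit of 1/ref = a * (1/model) + b.

    Returns (a, b, inlier_fraction). `tol` is a disparity residual in 1/m
    (a hypothetical default, not a value from the PR).
    """
    x = 1.0 / np.asarray(model_depth, dtype=float)  # model disparity
    y = 1.0 / np.asarray(ref_depth, dtype=float)    # reference disparity
    rng = np.random.default_rng(seed)
    best = np.zeros(x.size, dtype=bool)
    for _ in range(iters):
        i, j = rng.choice(x.size, size=2, replace=False)
        if abs(x[i] - x[j]) < 1e-9:
            continue  # degenerate pair, cannot define a line
        a = (y[i] - y[j]) / (x[i] - x[j])
        b = y[i] - a * x[i]
        inliers = np.abs(a * x + b - y) < tol
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 2:
        raise ValueError("no consensus set found")
    # Refine with least squares on the consensus set.
    a, b = np.polyfit(x[best], y[best], 1)
    return float(a), float(b), float(best.mean())


def rescale(model_depth, a, b):
    """Apply the fitted correction and return metric depth in metres."""
    disparity = a / np.asarray(model_depth, dtype=float) + b
    return 1.0 / np.clip(disparity, 1e-6, None)
```

The returned inlier fraction is the kind of quantity a caller could use to reject frames where the floor samples are contaminated by obstacles.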

Depth inference is too heavy for the Pi CPU (~0.5 s/frame), so it runs off-board
as two processes:

- tools/depth_server.py: keeps the model resident and serves depth over a
  socket. Runs in a torch venv (kept out of the ROS/robot env on purpose):
```
pixi run depth-server
```
- depth_obstacle_node: light rclpy node (no torch); forwards each compressed
  frame to the server, rescales, and publishes the cloud. Runs anywhere (robot or
  workstation):
```
pixi run -- ros2 run mote_perception depth_obstacle_node \
  --ros-args -r image/compressed:=/image_raw/compressed -p server_host:=<workstation>
```

Key params: z_obstacle (height deadband, default 0.02 m; below about 1.5 cm,
floor noise produces false positives), range_min/range_max, server_host/server_port.
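To make the role of z_obstacle concrete, the following is a minimal sketch of back-projection followed by the height deadband. It assumes a pinhole camera, a base frame with z up and the floor at z = 0, and that range_min/range_max bound the horizontal distance from the robot. Those semantics and the default range values are assumptions for illustration, not taken from the node.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Pinhole back-projection of an HxW depth map (metres) to Nx3 points
    in the camera optical frame (x right, y down, z forward)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def obstacle_points(points_cam, R_base_cam, t_base_cam,
                    z_obstacle=0.02, range_min=0.1, range_max=3.0):
    """Transform camera-frame points into the base frame and keep those
    higher than z_obstacle above the floor and within the range band.
    range_min/range_max defaults here are hypothetical."""
    pts = points_cam @ R_base_cam.T + t_base_cam
    horizontal = np.linalg.norm(pts[:, :2], axis=1)
    keep = (pts[:, 2] > z_obstacle) & (horizontal >= range_min) & (horizontal <= range_max)
    return pts[keep]
```

With a 2 cm deadband, a point 1 cm above the floor is discarded as noise while a 5 cm cable or threshold survives.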

Everything is developed and validated offline against recorded bags
(pixi run record): tools/depth_obstacles.py overlays the obstacle decision and
compares the cloud to lidar; other tools/*.py are the spike harnesses behind the
design.

Monocular obstacle-detection spike under mote_perception (offline; no ROS nodes or deps added yet):
- ground_projection.py: shared pixel<->floor geometry (camera->base via static TF)
- free_space.py: classical appearance floor segmentation (spike; fast but false-positive prone under variable lighting)
- depth_rescale.py: robust per-frame metric rescaling of learned mono-depth against the known floor plane (RANSAC affine-in-disparity), the chosen L1 direction; inlier fraction gates seed contamination
- tools/: offline bag harnesses (geometry overlay, classical/BEV/depth eval, segmentation video) for evaluating approaches against recorded bags

Depth + floor-rescale gives ~0.19 m median range vs lidar and is lighting-robust; findings drove the decision to pursue learned depth off-board.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The L1 obstacle pipeline as two processes so torch stays out of the ROS/robot env:
- tools/depth_server.py: keeps Depth Anything V2 (metric, indoor) resident and serves depth over a socket. Runs in a throwaway torch venv on the workstation.
- depth_obstacle_node (rclpy, no torch; runs anywhere): forwards each compressed frame to the server, metrically rescales the returned depth against the known floor plane (depth_rescale), back-projects, keeps points above z_obstacle (default 0.02 m), and publishes /camera_obstacles.

The cloud is stamped at image capture time so Nav2 places it via tf at the moment it was seen, which is how the off-board (~0.6 s, inference-bound) latency is absorbed without inflation. Lidar stays the primary, low-latency obstacle/clearing source; this is a supplementary marker for the low/thin things the 2D scan misses.

z_obstacle=0.02 was chosen from a floor-noise sweep across the bag: floor height noise is ~1.5 cm p99, so a threshold below 1.5 cm false-positives on the floor; 2 cm is clean and the lidar already covers >=6 cm.

depth_obstacles.py gains an obstacle-tint overlay. Validated end-to-end against a recorded bag via ros2 bag play. README documents it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The workstation depth task runs in the default/ROS env, so the nested `pixi run depth-server` inherited a PYTHONPATH pointing at the ROS Python 3.12 site-packages, and the depth env's Python 3.14 then loaded those incompatible numpy C-extensions. Drop PYTHONPATH for the server child only; the ROS node still needs it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
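One way to drop PYTHONPATH for a single child process while leaving the parent environment intact is sketched below. The helper names are hypothetical, and launching through `subprocess.Popen` is an assumption; the PR does not show how the task actually spawns the server.

```python
import os
import subprocess


def env_without(*names, base=None):
    """Return a copy of the environment with the given variables removed.
    The source mapping (os.environ by default) is left untouched."""
    env = dict(os.environ if base is None else base)
    for name in names:
        env.pop(name, None)
    return env


def launch_depth_server(cmd=("pixi", "run", "depth-server")):
    """Start the depth server with PYTHONPATH stripped so the child's
    interpreter does not import the parent's site-packages."""
    return subprocess.Popen(list(cmd), env=env_without("PYTHONPATH"))
```

Because only the child's environment dict is modified, any ROS process started from the same parent still sees the original PYTHONPATH.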

Diagnostics for the L1 monocular-depth obstacle pipeline:
- depth_obstacle_node gains a publish_debug param (default true) that publishes the rescaled metric depth as a 32FC1 Image (/camera_depth) and the unfiltered, floor-inclusive cloud (/camera_cloud_full) for geometry checks.
- mote.rviz: Camera Obstacles + Camera Cloud (full) PointCloud2 displays (AxisColor by height) and a Depth image display.
- tools/measure_camera_pitch.py: lays the calibration checkerboard on the floor and solvePnPs its plane to read the camera's pitch/roll/height relative to the floor it sits on, folding in chassis tilt and local floor slope.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS

The floor-plane rescale fits a narrow near-floor band and extrapolates to far walls, and depends on the camera->floor angle, which was measured to wander ~1.5 deg across rest positions (floor slope + how the robot rests), so the obstacle cloud over-ranged past the walls even stationary.

Lidar gives metric range on the walls themselves through the body-fixed lidar->camera transform, which is immune to chassis/floor tilt. LidarDepthRescaler matches each scan return to its camera pixel, samples the model depth there, and fits the shared affine-in-disparity correction on those pairs. The node buffers scans and matches the one nearest the image *capture* stamp (absorbing the ~0.6 s off-board latency).

rescale_source = auto|lidar|floor; auto holds the last good lidar (a,b) when a scan can't constrain it rather than falling back to the floor fit it replaces. Logs scale source, pair count, and scan dt.

Scale only; the cloud is still back-projected through the level-URDF transform, so residual pitch can still skew floor-point z-classification. A follow-up plane-fit on the lidar-scaled floor points will recover that.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
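The lidar-based rescale described in this commit can be pictured roughly as follows. This sketch assumes a pinhole camera, a planar scan in the lidar's x-y plane, and a plain least-squares fit on the matched pairs (the commit does not say whether the real LidarDepthRescaler uses RANSAC here). All function and class names and the `min_pairs` threshold are hypothetical.

```python
import numpy as np


def scan_to_pixels(ranges, angles, R_cam_lidar, t_cam_lidar,
                   fx, fy, cx, cy, width, height):
    """Project 2D lidar returns into the image. Returns (u, v, z_cam) for
    returns that land inside the image and in front of the camera."""
    pts_lidar = np.stack([ranges * np.cos(angles),
                          ranges * np.sin(angles),
                          np.zeros_like(ranges)], axis=1)
    pts_cam = pts_lidar @ R_cam_lidar.T + t_cam_lidar
    z = pts_cam[:, 2]
    front = z > 1e-3
    z_safe = np.where(front, z, 1.0)  # avoid dividing by non-positive depth
    u = np.round(fx * pts_cam[:, 0] / z_safe + cx).astype(int)
    v = np.round(fy * pts_cam[:, 1] / z_safe + cy).astype(int)
    ok = front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return u[ok], v[ok], z[ok]


def fit_from_pairs(model_depth_img, u, v, z_lidar, min_pairs=10):
    """Least-squares fit of 1/z_lidar = a * (1/model) + b on matched pixels.
    Returns (a, b), or None when too few valid pairs constrain the fit."""
    m = model_depth_img[v, u]
    good = np.isfinite(m) & (m > 0)
    if good.sum() < min_pairs:
        return None
    a, b = np.polyfit(1.0 / m[good], 1.0 / z_lidar[good], 1)
    return float(a), float(b)


class ScaleHolder:
    """'auto' policy sketch: use a fresh lidar fit when one is available,
    otherwise keep returning the last good (a, b)."""

    def __init__(self):
        self.last = None

    def update(self, fit):
        if fit is not None:
            self.last = fit
        return self.last
```

Holding the last good pair means a frame with no usable scan reuses the previous correction instead of switching to a different estimator mid-run.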

The single-threaded executor was monopolized by the ~0.5 s blocking inference in _on_image, so _on_scan was starved, the scan buffer went stale, and no scan landed within scan_max_dt of an image's capture stamp. As a result the node fit lidar once at startup then held that one (bad) (a,b) forever.

Run on a MultiThreadedExecutor with the scan subscription in its own callback group so scans keep buffering during inference; snapshot the deque when matching (read on the image thread, appended on the scan thread).

Diagnostics for chasing fit quality: log scan-buffer depth, matched dt, pair count, and the fitted (a,b); publish the raw pre-rescale model depth (/camera_depth_raw) next to the rescaled one to separate model noise from a runaway rescale.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
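The buffer-and-snapshot pattern from this commit can be sketched without ROS as below. The class name is hypothetical, stamps are plain float seconds, and the explicit lock is an assumption added for clarity; the commit only says the deque is snapshotted when matching. `max_dt` plays the role of the node's scan_max_dt.

```python
import threading
from collections import deque


class ScanBuffer:
    """Bounded scan history written by one thread and read by another."""

    def __init__(self, maxlen=50):
        self._scans = deque(maxlen=maxlen)
        self._lock = threading.Lock()

    def append(self, stamp, scan):
        """Called from the scan callback thread."""
        with self._lock:
            self._scans.append((stamp, scan))

    def nearest(self, capture_stamp, max_dt):
        """Called from the image thread. Returns (scan, dt) for the scan
        closest to the image capture stamp, or None if none is within max_dt."""
        with self._lock:
            snapshot = list(self._scans)  # copy, then search outside the lock
        if not snapshot:
            return None
        stamp, scan = min(snapshot, key=lambda item: abs(item[0] - capture_stamp))
        dt = abs(stamp - capture_stamp)
        return (scan, dt) if dt <= max_dt else None
```

Copying the deque before searching keeps the critical section short, so the scan thread is never blocked for longer than a list copy while the image thread does its matching.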

Reconcile the lock with the merged pixi.toml so it carries both the depth env (rebased) and the bag-recorder deps (from main #17).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS

MJohnson459 and others added 9 commits June 29, 2026 17:31

PR feedback

8425581

Remove old CV solution. Cleanup

f55886b

MJohnson459 force-pushed the l1-depth-perception branch from 66a032a to d6c1feb Compare June 29, 2026 16:56

MJohnson459 wants to merge 9 commits into main from l1-depth-perception
