
feat: Depth perception from mono camera#18

Open
MJohnson459 wants to merge 9 commits into main from l1-depth-perception


MJohnson459 (Contributor) commented Jun 29, 2026

Turns the single RGB camera into a PointCloud2 of obstacles (/camera_obstacles)
for a Nav2 voxel/obstacle layer — catching the low/thin things the 2D lidar plane
misses (cables, thresholds, table & chair legs, a robot vacuum). The lidar stays
the primary, low-latency obstacle and clearing source; this is a slower
supplementary marker.

Pipeline: Depth Anything V2 (metric, indoor) gives a dense depth map. Its raw
metres are not accurate for our lens, so every frame is metrically rescaled
against the known floor plane (depth_rescale.py, RANSAC affine-in-disparity; the
camera's fixed height/pose provides dense per-frame ground truth). The rescaled
depth is back-projected to 3D; points more than z_obstacle (default 0.02 m) above
the floor become the cloud, stamped at image-capture time so Nav2 places it via tf
at the moment it was seen (this is how the off-board latency is absorbed).
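
As a rough illustration of the rescale step, here is a minimal sketch of a RANSAC affine-in-disparity fit. It assumes the model 1/z_ref = a * (1/z_model) + b over floor pixels, where z_ref is the depth the known floor plane predicts at each pixel. The function names, the tolerance, and the iteration count are illustrative assumptions, not the contents of depth_rescale.py.

```python
import numpy as np


def fit_affine_disparity(z_model, z_ref, iters=200, tol=0.02, seed=0):
    """RANSAC fit of 1/z_ref ~= a * (1/z_model) + b.

    Returns (a, b, inlier_fraction). `tol` is in disparity units (1/m) and is
    an assumed value, not one taken from the PR.
    """
    x = 1.0 / np.asarray(z_model, dtype=float)  # model disparity
    y = 1.0 / np.asarray(z_ref, dtype=float)    # floor-plane disparity
    rng = np.random.default_rng(seed)
    best = np.zeros(x.size, dtype=bool)
    for _ in range(iters):
        # Minimal sample: two points define a candidate line.
        i, j = rng.choice(x.size, size=2, replace=False)
        if abs(x[i] - x[j]) < 1e-9:
            continue
        a = (y[i] - y[j]) / (x[i] - x[j])
        b = y[i] - a * x[i]
        inliers = np.abs(a * x + b - y) < tol
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 2:
        raise ValueError("no consensus set found")
    # Refine with least squares on the consensus set only.
    a, b = np.polyfit(x[best], y[best], 1)
    return float(a), float(b), float(best.mean())


def rescale(z_model, a, b):
    """Apply the fitted correction; non-positive disparity maps to infinity."""
    d = a / np.asarray(z_model, dtype=float) + b
    return np.where(d > 1e-6, 1.0 / np.maximum(d, 1e-6), np.inf)
```

The returned inlier fraction is the kind of quantity that could gate a contaminated fit, for example when an obstacle covers most of the floor seed region.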

Depth inference is too heavy for the Pi CPU (~0.5 s/frame), so it runs off-board
as two processes:

  • tools/depth_server.py — keeps the model resident and serves depth over a
    socket. Runs in a torch venv (kept out of the ROS/robot env on purpose):
    pixi run depth-server
  • depth_obstacle_node — light rclpy node (no torch); forwards each compressed
    frame to the server, rescales, and publishes the cloud. Runs anywhere (robot or
    workstation):
    pixi run -- ros2 run mote_perception depth_obstacle_node \
      --ros-args -r image/compressed:=/image_raw/compressed -p server_host:=<workstation>
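
The wire format between the node and the server is not described here. One common way to move a compressed frame and a depth map over a plain socket is length-prefixed framing, sketched below. The 4-byte big-endian header and the helper names are assumptions for illustration, not the protocol tools/depth_server.py actually uses.

```python
import socket
import struct


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send one message: 4-byte big-endian length, then the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

With framing like this, the node side reduces to send_msg(sock, jpeg_bytes) followed by recv_msg(sock), and the reply can be decoded into a depth array.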

Key params: z_obstacle (height deadband, default 0.02 m; thresholds below ~1.5 cm
false-positive on floor noise), range_min/range_max, server_host/server_port.

Everything is developed and validated offline against recorded bags
(pixi run record). tools/depth_obstacles.py overlays the obstacle decision and
compares the cloud to lidar; the other tools/*.py are the spike harnesses behind
the design.

MJohnson459 and others added 9 commits June 29, 2026 17:31
Monocular obstacle-detection spike under mote_perception (offline; no ROS nodes
or deps added yet):
- ground_projection.py: shared pixel<->floor geometry (camera->base via static TF)
- free_space.py: classical appearance floor segmentation (spike — fast but
  false-positive prone under variable lighting)
- depth_rescale.py: robust per-frame metric rescaling of learned mono-depth
  against the known floor plane (RANSAC affine-in-disparity) — the chosen L1
  direction; inlier fraction gates seed contamination
- tools/: offline bag harnesses (geometry overlay, classical/BEV/depth eval,
  segmentation video) for evaluating approaches against recorded bags

Depth + floor-rescale gives ~0.19 m median range error vs lidar and is lighting-robust;
findings drove the decision to pursue learned depth off-board.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The L1 obstacle pipeline as two processes so torch stays out of the ROS/robot env:
- tools/depth_server.py: keeps Depth Anything V2 (metric, indoor) resident and
  serves depth over a socket. Runs in a throwaway torch venv on the workstation.
- depth_obstacle_node (rclpy, no torch; runs anywhere): forwards each compressed
  frame to the server, metrically rescales the returned depth against the known
  floor plane (depth_rescale), back-projects, keeps points above z_obstacle
  (default 0.02 m), and publishes /camera_obstacles. The cloud is stamped at image
  capture time so Nav2 places it via tf at the moment it was seen — how the
  off-board (~0.6 s, inference-bound) latency is absorbed without inflation. Lidar
  stays the primary, low-latency obstacle/clearing source; this is a supplementary
  marker for the low/thin things the 2D scan misses.

z_obstacle=0.02 chosen from a floor-noise sweep across the bag: floor height noise
is ~1.5 cm at p99, so thresholds below 1.5 cm false-positive on the floor; 2 cm is
clean, and the lidar already covers >=6 cm. depth_obstacles.py gains an
obstacle-tint overlay.
Validated end-to-end against a recorded bag via ros2 bag play. README documents it.
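
To make the back-projection and height deadband concrete, here is a minimal sketch under stated assumptions: a pinhole camera with intrinsics K, depth measured along the optical axis, and a camera->base transform given as a rotation R_base_cam and translation t_base_cam with base z pointing up from the floor. The function name and the range defaults are illustrative, not the node's actual code or values.

```python
import numpy as np


def obstacle_points(depth, K, R_base_cam, t_base_cam,
                    z_obstacle=0.02, range_min=0.1, range_max=3.0):
    """Back-project a metric depth image and keep points above the floor.

    depth: (H, W) metres along the optical axis (assumed convention).
    Returns an (N, 3) array of points in the base frame with z > z_obstacle.
    """
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = np.isfinite(depth) & (depth > range_min) & (depth < range_max)
    # Pinhole back-projection into the camera optical frame (x right, y down).
    pts_cam = np.stack(
        [(u - cx) / fx * depth, (v - cy) / fy * depth, depth], axis=-1
    )[valid]
    # Rigid transform into the base frame, where z is height above the floor.
    pts_base = pts_cam @ np.asarray(R_base_cam).T + np.asarray(t_base_cam)
    return pts_base[pts_base[:, 2] > z_obstacle]
```

Raising z_obstacle trades sensitivity to thin objects for fewer floor false positives, which is the trade-off the sweep above settles at 2 cm.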

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The workstation depth task runs in the default/ROS env, so the nested `pixi run depth-server` inherited a PYTHONPATH pointing at the ROS Python 3.12 site-packages, and the depth env's Python 3.14 then loaded those incompatible numpy C-extensions. Drop PYTHONPATH for the server child only; the ROS node still needs it.
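
The fix amounts to launching the child with a copy of the environment that has PYTHONPATH removed, while the parent keeps its own. A minimal sketch of that pattern follows; the helper names are hypothetical and the PR's actual launch code may differ.

```python
import os
import subprocess


def child_env_without_pythonpath(base=None):
    """Copy the environment and drop PYTHONPATH for the child only."""
    env = dict(os.environ if base is None else base)
    env.pop("PYTHONPATH", None)  # the parent's os.environ is left untouched
    return env


def run_isolated(cmd):
    """Run a command to completion with PYTHONPATH removed from its env."""
    return subprocess.run(
        cmd,
        env=child_env_without_pythonpath(),
        capture_output=True,
        text=True,
        check=True,
    )
```

For a long-running server, subprocess.Popen with the same env argument would be the more natural call, since subprocess.run blocks until the child exits.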

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
Diagnostics for the L1 monocular-depth obstacle pipeline:
- depth_obstacle_node gains a publish_debug param (default true) that publishes
  the rescaled metric depth as a 32FC1 Image (/camera_depth) and the unfiltered,
  floor-inclusive cloud (/camera_cloud_full) for geometry checks.
- mote.rviz: Camera Obstacles + Camera Cloud (full) PointCloud2 displays
  (AxisColor by height) and a Depth image display.
- tools/measure_camera_pitch.py: with the calibration checkerboard laid on the
  floor, runs solvePnP on it and reads the camera's pitch/roll/height relative to
  the floor it sits on, folding in chassis tilt and local floor slope.
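
For the pitch measurement, the geometry after solvePnP can be sketched as follows. It assumes the board pose is already available as a rotation matrix R_cam_board (for example from cv2.Rodrigues on the rvec returned by cv2.solvePnP) and a translation t_cam_board, both in the camera optical frame (x right, y down, z forward). The function name and the sign conventions for pitch and roll are assumptions, not necessarily those of tools/measure_camera_pitch.py.

```python
import math

import numpy as np


def camera_tilt_from_board_pose(R_cam_board, t_cam_board):
    """Return (pitch_down_deg, roll_deg, height_m) from a floor-board pose.

    The board lies flat on the floor, so its z axis is the floor normal.
    """
    n = np.asarray(R_cam_board, dtype=float)[:, 2].copy()  # normal, camera frame
    t = np.asarray(t_cam_board, dtype=float).reshape(3)
    if n @ t > 0:
        # Orient the normal from the floor toward the camera.
        n = -n
    height = float(-(n @ t))  # perpendicular camera-to-floor distance
    # Level camera: normal is (0, -1, 0). Pitching down tilts it toward -z.
    pitch_down = math.asin(float(np.clip(-n[2], -1.0, 1.0)))
    roll = math.atan2(float(n[0]), float(-n[1]))
    return math.degrees(pitch_down), math.degrees(roll), height
```

Because the plane is the floor the robot is actually resting on, the result includes chassis tilt and local slope rather than only the nominal mount angle.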

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
The floor-plane rescale fits a narrow near-floor band and extrapolates to far
walls, and depends on the camera->floor angle, which was measured to wander ~1.5
deg across rest positions (floor slope + how the robot rests) — so the obstacle
cloud over-ranged past the walls even stationary. Lidar gives metric range on the
walls themselves through the body-fixed lidar->camera transform, which is immune
to chassis/floor tilt.

LidarDepthRescaler matches each scan return to its camera pixel, samples the model
depth there, and fits the shared affine-in-disparity correction on those pairs.
The node buffers scans and matches the one nearest the image *capture* stamp
(absorbing the ~0.6 s off-board latency). rescale_source = auto|lidar|floor; auto
holds the last good lidar (a,b) when a scan can't constrain it rather than falling
back to the floor fit it replaces. Logs scale source, pair count, and scan dt.
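
The pairing step can be sketched as below: each scan return is turned into a 3D point in the lidar frame, moved into the camera optical frame through the fixed lidar->camera transform, projected with the pinhole intrinsics, and paired with the model depth at that pixel. The function name, argument layout, and the 0.05 m in-front-of-camera cutoff are assumptions for illustration, not the LidarDepthRescaler implementation.

```python
import numpy as np


def lidar_depth_pairs(ranges, angles, depth, K, R_cam_lidar, t_cam_lidar):
    """Return (z_model, z_lidar) for scan returns that land inside the image.

    ranges/angles: planar scan in the lidar frame (x forward, y left, z up).
    depth: (H, W) model depth along the optical axis (assumed convention).
    """
    r = np.asarray(ranges, dtype=float)
    th = np.asarray(angles, dtype=float)
    ok = np.isfinite(r) & (r > 0)
    r, th = r[ok], th[ok]
    pts_lidar = np.stack([r * np.cos(th), r * np.sin(th), np.zeros_like(r)], axis=1)
    pts_cam = pts_lidar @ np.asarray(R_cam_lidar).T + np.asarray(t_cam_lidar)
    pts_cam = pts_cam[pts_cam[:, 2] > 0.05]  # keep points in front of the camera
    u = np.rint(K[0, 0] * pts_cam[:, 0] / pts_cam[:, 2] + K[0, 2]).astype(int)
    v = np.rint(K[1, 1] * pts_cam[:, 1] / pts_cam[:, 2] + K[1, 2]).astype(int)
    h, w = depth.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    return depth[v[inside], u[inside]], pts_cam[inside, 2]
```

The pairs would then feed the same 1/z fit; a plain least-squares version is np.polyfit(1 / z_model, 1 / z_lidar, 1), with a robust fit preferable when some returns hit glass or fall on depth discontinuities.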

Scale only; the cloud is still back-projected through the level-URDF transform, so
residual pitch can still skew floor-point z-classification — a follow-up plane-fit
on the lidar-scaled floor points will recover that.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
The single-threaded executor was monopolized by the ~0.5 s blocking inference in
_on_image, so _on_scan was starved, the scan buffer went stale, and no scan
landed within scan_max_dt of an image's capture stamp — the node fit lidar once
at startup then held that one (bad) (a,b) forever. Run on a MultiThreadedExecutor
with the scan subscription in its own callback group so scans keep buffering
during inference; snapshot the deque when matching (read on the image thread,
appended on the scan thread).
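
The buffer pattern described above can be sketched without ROS: one thread appends stamped scans, the other takes a snapshot and picks the scan nearest the image capture stamp, rejecting it if it is further than the allowed dt. The class name and the explicit lock are assumptions; the lock is there because iterating a deque while another thread appends can raise RuntimeError.

```python
import threading
from collections import deque


class ScanBuffer:
    """Bounded buffer of (stamp, scan) shared between two callback threads."""

    def __init__(self, maxlen=50):
        self._scans = deque(maxlen=maxlen)  # oldest entries drop automatically
        self._lock = threading.Lock()

    def append(self, stamp, scan):
        """Called from the scan callback thread."""
        with self._lock:
            self._scans.append((stamp, scan))

    def nearest(self, stamp, max_dt):
        """Called from the image thread; returns (stamp, scan) or None."""
        with self._lock:
            snapshot = list(self._scans)  # copy, then search outside the lock
        if not snapshot:
            return None
        best = min(snapshot, key=lambda item: abs(item[0] - stamp))
        return best if abs(best[0] - stamp) <= max_dt else None
```

Holding the lock only for the copy keeps the scan callback from ever waiting on the search, so scans keep buffering during the long inference call.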

Diagnostics for chasing fit quality: log scan-buffer depth, matched dt, pair
count, and the fitted (a,b); publish the raw pre-rescale model depth
(/camera_depth_raw) next to the rescaled one to separate model noise from a
runaway rescale.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
Reconcile the lock with the merged pixi.toml so it carries both the depth env
(rebased) and the bag-recorder deps (from main #17).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
@MJohnson459 MJohnson459 force-pushed the l1-depth-perception branch from 66a032a to d6c1feb Compare June 29, 2026 16:56