feat: Depth perception from mono camera #18
Open
MJohnson459 wants to merge 9 commits into
Monocular obstacle-detection spike under mote_perception (offline; no ROS nodes or deps added yet):

- ground_projection.py: shared pixel<->floor geometry (camera->base via static TF)
- free_space.py: classical appearance-based floor segmentation (spike; fast, but prone to false positives under variable lighting)
- depth_rescale.py: robust per-frame metric rescaling of learned mono-depth against the known floor plane (RANSAC affine-in-disparity), the chosen L1 direction; the inlier fraction gates seed contamination
- tools/: offline bag harnesses (geometry overlay, classical/BEV/depth eval, segmentation video) for evaluating approaches against recorded bags

Depth + floor rescale gives ~0.19 m median range error vs lidar and is robust to lighting; these findings drove the decision to pursue learned depth off-board.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
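The pixel<->floor geometry that ground_projection.py provides comes down to a ray-plane intersection. Below is a minimal, dependency-free sketch of that idea, not the module's code. It assumes a pinhole camera with intrinsics `K = (fx, fy, cx, cy)`, an OpenCV optical frame (x right, y down, z forward), a camera->base rotation `R_bc` and translation `t_bc` taken from the static TF, and a floor at base z = 0. The names `pixel_ray`, `mat_vec` and `floor_depth` are hypothetical.

```python
def pixel_ray(u, v, fx, fy, cx, cy):
    """Ray through pixel (u, v) in the camera optical frame, scaled so z = 1."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def mat_vec(R, x):
    """3x3 matrix (nested lists/tuples) times a 3-vector."""
    return tuple(sum(R[i][j] * x[j] for j in range(3)) for i in range(3))


def floor_depth(u, v, K, R_bc, t_bc):
    """Optical-axis depth at which pixel (u, v)'s ray meets the floor (base z = 0).

    K = (fx, fy, cx, cy); R_bc, t_bc map camera-frame points into the base frame.
    Returns None if the ray points at or above the horizon.
    """
    ray_b = mat_vec(R_bc, pixel_ray(u, v, *K))  # ray direction in the base frame
    if ray_b[2] >= -1e-9:
        return None  # the ray never descends to the floor
    # Points on the ray are t_bc + s * ray_b; solve for base z = 0.
    s = -t_bc[2] / ray_b[2]
    # The camera-frame ray has z = 1, so the parameter s is the depth itself.
    return s
```

Take a level camera 0.2 m up. A pixel whose ray drops 0.1 m per metre forward meets the floor at 2 m. At or above the principal row the function returns None. That per-pixel expected floor depth is the kind of reference a floor-plane rescale can fit against.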
The L1 obstacle pipeline as two processes, so torch stays out of the ROS/robot env:

- tools/depth_server.py: keeps Depth Anything V2 (metric, indoor) resident and serves depth over a socket. Runs in a throwaway torch venv on the workstation.
- depth_obstacle_node (rclpy, no torch; runs anywhere): forwards each compressed frame to the server, metrically rescales the returned depth against the known floor plane (depth_rescale), back-projects, keeps points above z_obstacle (default 0.02 m), and publishes /camera_obstacles.

The cloud is stamped at image capture time so Nav2 places it via tf at the moment it was seen; this is how the off-board (~0.6 s, inference-bound) latency is absorbed without inflation. Lidar stays the primary, low-latency obstacle/clearing source; this is a supplementary marker for the low/thin things the 2D scan misses.

z_obstacle=0.02 was chosen from a floor-noise sweep across the bag: floor height noise is ~1.5 cm p99, so a threshold below 1.5 cm false-positives on the floor; 2 cm is clean, and the lidar already covers obstacles >= 6 cm tall.

depth_obstacles.py gains an obstacle-tint overlay. Validated end-to-end against a recorded bag via ros2 bag play. README documents it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
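Here is a sketch of the back-project-and-threshold step. It uses the same assumed frame conventions: pinhole `K`, camera->base `R_bc`/`t_bc`, floor at base z = 0. The function name is hypothetical. The node's exact `range_min`/`range_max` semantics aren't described here, so treating them as bounds on optical-axis depth is an assumption. A real node would vectorize this with numpy instead of looping per pixel.

```python
def obstacle_points(depth, K, R_bc, t_bc, range_min, range_max, z_obstacle=0.02):
    """Back-project a metric depth image and keep points above the floor deadband.

    depth: rows of metric optical-axis depth (m); 0, None or NaN marks invalid pixels.
    Returns (x, y, z) points in the base frame with z > z_obstacle.
    """
    fx, fy, cx, cy = K
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Skip invalid pixels (NaN fails the range test) and out-of-band depths.
            if not z or not (range_min <= z <= range_max):
                continue
            # Back-project pixel (u, v) at depth z into the camera optical frame.
            p_cam = ((u - cx) / fx * z, (v - cy) / fy * z, z)
            # Transform into the base frame, where the floor is z = 0.
            p_base = tuple(
                sum(R_bc[i][j] * p_cam[j] for j in range(3)) + t_bc[i]
                for i in range(3)
            )
            if p_base[2] > z_obstacle:
                points.append(p_base)
    return points
```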
The workstation depth task runs in the default/ROS env, so the nested `pixi run depth-server` inherited a PYTHONPATH pointing at the ROS Python 3.12 site-packages, and the depth env's Python 3.14 then loaded those incompatible numpy C-extensions. Drop PYTHONPATH for the server child only; the ROS node still needs it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
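The same "strip it for the child only" idea can be sketched as a Python launcher. This is hypothetical: how the pixi task actually drops `PYTHONPATH` is not shown here, and `env_without` and `start_depth_server` are illustrative names.

```python
import os
import subprocess


def env_without(*names, base=None):
    """Return a copy of `base` (default: os.environ) with the named variables removed."""
    env = dict(os.environ if base is None else base)
    for name in names:
        env.pop(name, None)
    return env


def start_depth_server(cmd=("pixi", "run", "depth-server")):
    """Launch the server child without PYTHONPATH; this (ROS) process keeps its own."""
    return subprocess.Popen(list(cmd), env=env_without("PYTHONPATH"))
```

The parent's environment is never mutated. That matters because the ROS node running in the same env still needs `PYTHONPATH`.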
Diagnostics for the L1 monocular-depth obstacle pipeline:

- depth_obstacle_node gains a publish_debug param (default true) that publishes the rescaled metric depth as a 32FC1 Image (/camera_depth) and the unfiltered, floor-inclusive cloud (/camera_cloud_full) for geometry checks.
- mote.rviz: Camera Obstacles and Camera Cloud (full) PointCloud2 displays (AxisColor by height) and a Depth image display.
- tools/measure_camera_pitch.py: with the calibration checkerboard laid on the floor, solvePnPs its plane to read the camera's pitch/roll/height relative to the floor itself, folding in chassis tilt and local floor slope.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
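Once `solvePnP` has returned the board pose, the geometry behind the pitch measurement needs no OpenCV. The board's z axis is the floor normal in the camera frame. The camera's height is the distance from the camera origin to that plane. Pitch and roll follow from how the normal projects onto the optical axes. This is a minimal sketch, not the tool's code. It assumes the OpenCV optical frame and that `rvec`/`tvec` map board coordinates into the camera frame, as `cv2.solvePnP` returns them. The roll sign convention is an arbitrary choice, and `floor_pose_from_pnp` is a hypothetical name.

```python
import math


def floor_pose_from_pnp(rvec, tvec):
    """Camera pitch, roll (rad) and height (m) above a checkerboard lying on the floor.

    rvec, tvec: board->camera Rodrigues rotation and translation, as cv2.solvePnP
    returns them. Assumes the OpenCV optical frame (x right, y down, z forward).
    """
    theta = math.sqrt(sum(r * r for r in rvec))
    if theta < 1e-12:
        n = [0.0, 0.0, 1.0]  # identity rotation: board z axis = camera z axis
    else:
        kx, ky, kz = (r / theta for r in rvec)
        c, s = math.cos(theta), math.sin(theta)
        # Third column of the Rodrigues rotation matrix: the board's z axis
        # (the floor normal) expressed in the camera frame.
        n = [(1 - c) * kx * kz + s * ky,
             (1 - c) * ky * kz - s * kx,
             c + (1 - c) * kz * kz]
    # Orient the normal from the floor toward the camera ("up"), i.e. n . tvec < 0.
    if sum(ni * ti for ni, ti in zip(n, tvec)) > 0:
        n = [-ni for ni in n]
    # Distance from the camera origin to the plane through tvec with normal n.
    height = -sum(ni * ti for ni, ti in zip(n, tvec))
    # For a level camera "up" is (0, -1, 0); pitching down tilts it toward -z.
    pitch = math.atan2(-n[2], -n[1])  # 0 = level, +pi/2 = looking straight down
    roll = math.atan2(n[0], -n[1])    # sign convention is an arbitrary choice
    return pitch, roll, height
```

Flipping the normal toward the camera makes the result independent of which way the board's z axis happens to point. That direction depends on how the board's object points were ordered.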
The floor-plane rescale fits a narrow near-floor band and extrapolates it to far walls, and it depends on the camera->floor angle, which was measured to wander ~1.5 deg across rest positions (floor slope + how the robot rests). As a result, the obstacle cloud over-ranged past the walls even with the robot stationary.

Lidar gives metric range on the walls themselves through the body-fixed lidar->camera transform, which is immune to chassis/floor tilt. LidarDepthRescaler matches each scan return to its camera pixel, samples the model depth there, and fits the shared affine-in-disparity correction on those pairs. The node buffers scans and matches the one nearest the image *capture* stamp (absorbing the ~0.6 s off-board latency).

rescale_source = auto|lidar|floor; auto holds the last good lidar (a,b) when a scan can't constrain it, rather than falling back to the floor fit it replaces. Logs scale source, pair count, and scan dt.

This corrects scale only: the cloud is still back-projected through the level-URDF transform, so residual pitch can still skew floor-point z-classification. A follow-up plane fit on the lidar-scaled floor points will recover that.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
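Here is a sketch of the scan-to-pixel matching step. It assumes a planar scan in the lidar's z = 0 plane, a fixed lidar->camera-optical transform `R_cl`/`t_cl`, a pinhole `K`, and a model that outputs optical-axis (z) depth. The function name and the `model_depth(u, v)` accessor are hypothetical. The resulting (model depth, lidar depth) pairs are what an affine-in-disparity fit would consume. Nearest-pixel sampling is a simplification; sampling a small window would be more robust to edge misalignment.

```python
import math


def lidar_depth_pairs(ranges, angle_min, angle_increment, R_cl, t_cl, K,
                      width, height, model_depth):
    """Pair each lidar return with the model depth at the pixel it projects to.

    R_cl, t_cl: fixed lidar->camera-optical transform; K = (fx, fy, cx, cy).
    model_depth(u, v): raw model depth at integer pixel (u, v).
    Returns (model_depth, lidar_depth) pairs, both as optical-axis (z) depth.
    """
    fx, fy, cx, cy = K
    pairs = []
    for i, r in enumerate(ranges):
        if not math.isfinite(r) or r <= 0.0:
            continue  # no return on this beam
        ang = angle_min + i * angle_increment
        p_l = (r * math.cos(ang), r * math.sin(ang), 0.0)  # scan plane: lidar z = 0
        p_c = [sum(R_cl[k][j] * p_l[j] for j in range(3)) + t_cl[k] for k in range(3)]
        if p_c[2] <= 0.0:
            continue  # behind the camera
        u = round(fx * p_c[0] / p_c[2] + cx)
        v = round(fy * p_c[1] / p_c[2] + cy)
        if 0 <= u < width and 0 <= v < height:
            d = model_depth(u, v)
            if d and d > 0.0:  # skip invalid model depth (0, None, NaN)
                pairs.append((d, p_c[2]))
    return pairs
```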
The single-threaded executor was monopolized by the ~0.5 s blocking inference in _on_image, so _on_scan was starved, the scan buffer went stale, and no scan landed within scan_max_dt of an image's capture stamp. The node fit lidar once at startup and then held that one (bad) (a,b) forever.

Run on a MultiThreadedExecutor with the scan subscription in its own callback group so scans keep buffering during inference; snapshot the deque when matching (it is read on the image thread and appended on the scan thread).

Diagnostics for chasing fit quality: log scan-buffer depth, matched dt, pair count, and the fitted (a,b); publish the raw pre-rescale model depth (/camera_depth_raw) next to the rescaled one to separate model noise from a runaway rescale.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
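In rclpy, this wiring uses `rclpy.executors.MultiThreadedExecutor`. A dedicated callback group (for example `rclpy.callback_groups.MutuallyExclusiveCallbackGroup`) is passed to the scan subscription through `create_subscription(..., callback_group=...)`. The buffer itself can be sketched in plain Python. Appends happen on the scan thread and lookups on the image thread. Each lookup copies the deque under a lock before searching, so it never iterates a deque that is being mutated. `ScanBuffer` and its `maxlen` default are hypothetical; `max_dt` plays the role of `scan_max_dt`.

```python
import threading
from collections import deque


class ScanBuffer:
    """Recent scans: appended from the scan callback, queried from the image callback."""

    def __init__(self, maxlen=50):
        self._scans = deque(maxlen=maxlen)  # (stamp_s, scan); oldest dropped first
        self._lock = threading.Lock()

    def append(self, stamp_s, scan):
        with self._lock:
            self._scans.append((stamp_s, scan))

    def nearest(self, stamp_s, max_dt):
        """(scan, scan_stamp - stamp_s) for the closest scan, or None if none is within max_dt."""
        with self._lock:
            snapshot = list(self._scans)  # copy under the lock, search outside it
        if not snapshot:
            return None
        t, scan = min(snapshot, key=lambda item: abs(item[0] - stamp_s))
        dt = t - stamp_s
        return (scan, dt) if abs(dt) <= max_dt else None
```

Returning `dt` alongside the scan makes the "matched dt" diagnostic a one-line log.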
Reconcile the lock with the merged pixi.toml so it carries both the depth env (rebased) and the bag-recorder deps (from main #17).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
66a032a to d6c1feb
Turns the single RGB camera into a `PointCloud2` of obstacles (`/camera_obstacles`) for a Nav2 voxel/obstacle layer, catching the low/thin things the 2D lidar plane misses (cables, thresholds, table & chair legs, a robot vacuum). The lidar stays the primary, low-latency obstacle and clearing source; this is a slower supplementary marker.
Pipeline: Depth Anything V2 (metric, indoor) gives a dense depth map, but its raw metres are not accurate for our lens, so every frame is metrically rescaled against the known floor plane (`depth_rescale.py`, RANSAC affine-in-disparity; the camera's fixed height/pose makes the floor dense per-frame ground truth). The rescaled depth is back-projected to 3D; points more than `z_obstacle` (default 0.02 m) above the floor become the cloud, stamped at image-capture time so Nav2 places it via tf at the moment it was seen (this is how the off-board latency is absorbed).
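Below is a minimal sketch of an affine-in-disparity RANSAC fit, in the spirit of `depth_rescale.py` but not its code. It assumes the correction has the form 1/z_ref ~ a * (1/z_model) + b over (model depth, reference depth) pairs. The reference depth can come from floor geometry or, per the lidar commit, from scan returns. The tolerance, iteration count, and function names are hypothetical. The returned inlier fraction is the kind of quantity a caller could threshold to reject contaminated fits.

```python
import random


def fit_affine_disparity(pairs, iters=200, tol=0.02, seed=0):
    """RANSAC fit of 1/z_ref ~ a * (1/z_model) + b.

    pairs: (z_model, z_ref) depths in metres; tol: inlier threshold on the
    disparity residual (1/m). Returns (a, b, inlier_fraction), or None.
    """
    pts = [(1.0 / zm, 1.0 / zr) for zm, zr in pairs if zm > 0 and zr > 0]
    if len(pts) < 2:
        return None
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        # Minimal sample: two points define a line in disparity space.
        (x1, y1), (x2, y2) = rng.sample(pts, 2)
        if abs(x2 - x1) < 1e-9:
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in pts if abs(a * x + b - y) < tol]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        return None
    # Least-squares refit on the consensus set.
    n = len(best)
    mx = sum(x for x, _ in best) / n
    my = sum(y for _, y in best) / n
    sxx = sum((x - mx) ** 2 for x, _ in best)
    if sxx < 1e-12:
        return None
    a = sum((x - mx) * (y - my) for x, y in best) / sxx
    b = my - a * mx
    return a, b, n / len(pts)


def rescale(z_model, a, b):
    """Apply the fitted correction to one model depth; None if disparity is non-positive."""
    disp = a / z_model + b
    return 1.0 / disp if disp > 0 else None
```

Fitting in disparity rather than depth weights near points more heavily, which suits a near-floor band of references. It also gives a linear model that two samples determine.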
Depth inference is too heavy for the Pi CPU (~0.5 s/frame), so it runs off-board as two processes:

- `tools/depth_server.py`: keeps the model resident and serves depth over a socket. Runs in a torch venv (kept out of the ROS/robot env on purpose):
- `depth_obstacle_node`: a light rclpy node (no torch) that forwards each compressed frame to the server, rescales, and publishes the cloud. Runs anywhere (robot or workstation):
Key params:

- `z_obstacle`: height deadband, default 0.02 m (below ~1.5 cm, floor noise produces false positives)
- `range_min` / `range_max`
- `server_host` / `server_port`

Everything is developed and validated offline against recorded bags (`pixi run record`): `tools/depth_obstacles.py` overlays the obstacle decision and compares the cloud to lidar; the other `tools/*.py` scripts are the spike harnesses behind the design.
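A lidar comparison could be scored roughly as follows. Bin camera points by bearing onto the scan's beams, keep the closest camera point per beam, and take the median absolute difference from that beam's lidar return. This is a hypothetical sketch, not the method `tools/depth_obstacles.py` uses. It assumes the points are already expressed in the lidar frame and ignores beams without a finite return. `range_error_vs_lidar` is an illustrative name.

```python
import math
from statistics import median


def range_error_vs_lidar(points, ranges, angle_min, angle_increment):
    """Median |camera range - lidar range| over beams both sensors see.

    points: (x, y, z) camera points already in the lidar frame.
    Returns (median_abs_error_m, n_beams_compared), or (None, 0) if no overlap.
    """
    nearest = {}
    for x, y, _ in points:
        # Bin each point to the nearest scan beam by bearing.
        i = round((math.atan2(y, x) - angle_min) / angle_increment)
        if 0 <= i < len(ranges):
            r = math.hypot(x, y)
            if r < nearest.get(i, math.inf):
                nearest[i] = r  # closest camera point per beam
    errs = [abs(r - ranges[i]) for i, r in nearest.items()
            if math.isfinite(ranges[i]) and ranges[i] > 0.0]
    return (median(errs), len(errs)) if errs else (None, 0)
```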