Decouple gateway managers from rclcpp via provider injection for testability and reduced flakiness #396

@bburda

Description

Summary

Today every gateway test pulls in rclcpp and the ament environment, because the manager classes (DataAccessManager, OperationManager, ConfigurationManager, the FaultManager facade, LogManager, TriggerManager) take an rclcpp::Node * directly and perform their I/O inline. That coupling makes unit tests heavy (multi-second startup), forces every test fixture through the MultiThreadedExecutor lifecycle, and makes test failures timing-sensitive: flaky "process spawned but graph not yet seen" and "subscription destroyed mid-callback" patterns appear repeatedly in CI history.

This issue tracks the next pass after #391: each manager keeps its routing and business logic but gains a register_provider(...) injection point, with the ROS-specific I/O extracted into Ros2*Provider classes living in the adapter layer. After the change, manager unit tests compose with mock providers and link only against gateway_core (the neutral library introduced in #391), which removes rclcpp from the unit-test link line and eliminates the executor-lifecycle setup that is the root cause of intermittent failures.
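
To make the injection point concrete, here is a minimal sketch of the pattern for one manager. It is an illustration under assumptions, not the gateway's actual API: the DataProvider interface, its read() method, the get_sample() method, and MockDataProvider are hypothetical names, and the real register_provider(...) signature may differ. The point is only that nothing in it needs rclcpp.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical provider interface: the only seam the manager sees.
class DataProvider {
public:
  virtual ~DataProvider() = default;
  // Returns the latest sample for a topic, or nullopt if none is available.
  virtual std::optional<std::string> read(const std::string & topic) = 0;
};

// The manager keeps routing and business logic and holds no rclcpp types.
// Method names other than register_provider are assumptions for this sketch.
class DataAccessManager {
public:
  void register_provider(std::shared_ptr<DataProvider> provider) {
    assert(provider != nullptr);
    provider_ = std::move(provider);
  }

  // Validation and error mapping stay in the manager; I/O goes to the provider.
  std::string get_sample(const std::string & topic) const {
    if (topic.empty()) {
      return "error: empty topic";
    }
    if (!provider_) {
      return "error: no provider registered";
    }
    return provider_->read(topic).value_or("error: no data");
  }

private:
  std::shared_ptr<DataProvider> provider_;
};

// Mock provider for unit tests: canned samples, no node, no executor.
class MockDataProvider : public DataProvider {
public:
  void set(const std::string & topic, const std::string & value) {
    samples_[topic] = value;
  }

  std::optional<std::string> read(const std::string & topic) override {
    ++read_calls;
    auto it = samples_.find(topic);
    if (it == samples_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

  int read_calls = 0;

private:
  std::map<std::string, std::string> samples_;
};
```

In this shape, a Ros2*Provider in the adapter layer would implement the same interface on top of rclcpp subscriptions, while a manager test constructs the manager, registers the mock, and asserts on return values without spinning an executor.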


Proposed solution (optional)

Concretely:

  • Six managers gain provider injection: Data, Operation, Configuration, Fault, Log, Trigger. Each manager moves to core/managers/. The ROS-specific default behaviour is extracted into Ros2*Provider implementations under src/ros2/providers/ (or equivalent path in the adapter layer), registered statically by gateway_node at startup. Plugins continue to register per-entity providers via PluginManager exactly as today.
  • RuntimeDiscoveryStrategy becomes Ros2RuntimeIntrospection : IntrospectionProvider. The discovery framework in core/ then routes through the existing IntrospectionProvider chain and stops referencing rclcpp::Node. HybridDiscoveryStrategy collapses into the standard MergePipeline configuration.
  • TypeIntrospection moves to ros2_medkit_serialization, where rosidl_typesupport_cpp and rosidl_typesupport_introspection_cpp already live. The gateway then depends on the serializer for type schemas instead of duplicating the rosidl bridge inside the gateway tree.
  • The runtime discovery trigger switches from cyclic wall_timer(refresh_interval_ms_) polling to an event-driven check via graph_event->check_and_clear(). The polling timer remains as a low-frequency safety backstop. This should reduce idle CPU usage when the graph is stable and improve new-node detection latency.
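
The event-driven trigger with a polling backstop could look roughly like the sketch below. It is deliberately free of rclcpp so that it compiles on its own: GraphEventFlag is a hypothetical stand-in that mimics the check-and-clear behaviour of the graph event handle (in rclcpp that handle would presumably come from the node's get_graph_event()), and RefreshTrigger, should_refresh(), and the backstop interval are assumed names and values, not existing gateway code.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the graph event handle: set by the middleware
// side, consumed by the discovery side with check-and-clear semantics.
class GraphEventFlag {
public:
  void set() { flag_.store(true); }
  // Returns true if the flag was set since the last call, then clears it.
  bool check_and_clear() { return flag_.exchange(false); }

private:
  std::atomic<bool> flag_{false};
};

// Decides when discovery should re-run: immediately after a graph change,
// or when the low-frequency backstop interval has elapsed without one.
class RefreshTrigger {
public:
  using Clock = std::chrono::steady_clock;

  RefreshTrigger(GraphEventFlag & event, std::chrono::milliseconds backstop,
                 Clock::time_point start)
  : event_(event), backstop_(backstop), last_refresh_(start) {
    assert(backstop.count() > 0);
  }

  // The caller passes the current time, which keeps the logic testable
  // without sleeping.
  bool should_refresh(Clock::time_point now) {
    const bool graph_changed = event_.check_and_clear();
    const bool backstop_due = (now - last_refresh_) >= backstop_;
    if (graph_changed || backstop_due) {
      last_refresh_ = now;
      return true;
    }
    return false;
  }

private:
  GraphEventFlag & event_;
  std::chrono::milliseconds backstop_;
  Clock::time_point last_refresh_;
};
```

On a stable graph, should_refresh() returns false on every tick until the backstop interval expires, which is where the idle CPU saving would come from. Passing the time in explicitly is a design choice of this sketch so that tests need no timing-sensitive sleeps.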

Out of scope (intentionally deferred): folding core/ into a separate colcon package. That is a layout change and does not affect the test-link-line goal.


Additional context (optional)

Outcome metrics expected after the work:

  • Manager unit tests link only gateway_core + GTest (no ament_target_dependencies), proven by extending test_gateway_core_smoke to include the manager headers and by compiling a representative manager test against the mock-provider variant.
  • Idle gateway process CPU usage drops measurably once the discovery refresh stops firing on a stable graph (sample with top before/after on the demo nodes scenario).
  • Existing test suite (~2500 unit, ~3200 integration, ~2600 clang-tidy) stays green without timing-sensitive sleeps in tests we touch.
  • New-node detection latency (time from ros2 run to gateway /apps listing the new entity) drops below 500 ms in steady-state operation.

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
