refactor(gateway): pure-C++ managers via Transport adapters + discovery split + graph-event refresh#397
Draft
bburda wants to merge 18 commits intofeat/medkit-core-scaffoldfrom
Draft
Conversation
Extracts ServiceCallResult, the ActionGoalStatus enum + free action_status_to_string helper, the four Action*Result structs, and ActionGoalInfo into a dedicated header under core/operations/. The types were already neutral C++ but lived behind the rclcpp-pulling operation_manager.hpp include path, which prevented the neutral build layer from consuming them. The OperationManager header forwards to the new location; the helper definition moves alongside into a small core translation unit so the symbol is provided by gateway_core.
…ation Extracts ParameterErrorCode and ParameterResult into a dedicated header under core/configuration/. The structured-error contract becomes available to the neutral build layer; the ConfigurationManager header forwards to the new location.
Extracts FaultResult, the JSON-bag outcome shared by seven of the eight fault-manager methods, into a dedicated header under core/faults/. The existing FaultWithEnvResult still exposes raw message types and stays inside fault_manager.hpp until the transport extraction moves the message-to-JSON conversion to the adapter layer.
…sports Defines seven abstract ports - TopicTransport, ServiceTransport, ActionTransport, ParameterTransport, FaultServiceTransport, LogSource, TopicSubscriptionTransport - that the manager refactors will route through. Each port carries the same operations the corresponding manager performs today, expressed in pure C++ types (already-neutral result structs from core/operations, core/configuration, core/faults). The smoke test extends to assert each interface is abstract and reachable from the gateway_core build layer with no ament dependency on the link line. Adapter implementations land under src/ros2/transports/ in subsequent phases. Two name choices avoid collisions with structs that currently live in the ROS-coupled headers: * TopicTransport::sample returns TopicSample rather than TopicSampleResult, because core/data/data_types.hpp already defines a same-named struct with a different shape (per-topic batch errors, endpoint QoS) for the topic-data-provider plugin surface. * FaultServiceTransport::get_fault_with_env returns FaultWithEnvJsonResult rather than FaultWithEnvResult, because the legacy ros2_medkit_msgs-typed FaultWithEnvResult still lives next to the FaultManager facade. The two will be reconciled when the FaultManager migration lands. TopicSubscriptionHandle is a polymorphic RAII base (virtual destructor only) rather than a fully abstract port, so the smoke assertion uses sizeof > 0 instead of std::is_abstract_v.
…pter The data-access manager body becomes pure C++. All rclcpp use - generic publisher cache, JSON serializer instance, native sample backend, type introspection - moves into the new Ros2TopicTransport adapter under src/ros2/transports/. The manager now takes a shared_ptr<TopicTransport> plus the sample timeout and routes publish / sample / native-sample through it. The adapter exposes the type-introspection helper through the transport interface so handlers retain their existing accessor. The handler-facing public API (publish_to_topic, get_topic_sample_with_ fallback, get_topic_sample_native, get_type_introspection, set_topic_data_provider, get_topic_data_provider, get_topic_sample_ timeout) is preserved verbatim. The legacy include path ros2_medkit_gateway/data_access_manager.hpp is preserved as a forwarding shim. New mock-transport tests link only against gateway_core + GTest, proving the manager logic compiles without rclcpp on the link line.
…sports OperationManager retains all goal-tracking, UUID generation, validation, and component-namespace resolution logic in pure C++. The rclcpp side - GenericServiceClient cache, action-internal-service clients, the GoalStatusArray subscription cache - moves into two new adapters, Ros2ServiceTransport and Ros2ActionTransport, under src/ros2/transports/. The manager registers a per-goal status callback at subscribe time so the action transport pushes goal-status updates back into the manager's tracking map on its own thread. Goal-status array decoding moves into the transport; the manager only consumes the typed (path, goal_id, status) tuples. The handler-facing public API (call_service, call_component_service, send_action_goal, send_component_action_goal, cancel_action_goal, get_action_result, list_tracked_goals, update_goal_status, update_goal_feedback, cleanup_old_goals, subscribe_to_action_status, unsubscribe_from_action_status, the four static is_*-type helpers, the UUID helpers) is preserved. The component-name resolver dependency moves behind a new ServiceActionResolver port that DiscoveryManager now implements; this keeps the manager translation unit out of discovery_manager.hpp's rclcpp transitive include chain. Mock-transport tests link only against gateway_core + GTest, exercising 13 routing / lifecycle scenarios.
ConfigurationManager loses its rclcpp::SyncParametersClient cache, the rclcpp::Parameter defaults cache, the negative-cache for unreachable nodes, and the spin_mutex serialising parameter-client spins. All of these move into the new Ros2ParameterTransport adapter under src/ros2/transports/. The adapter also owns the JSON <-> rclcpp:: ParameterValue conversion helpers and the gateway-own-FQN check that previously lived as private members of the manager. Manager retains: per-entity reset orchestration (reset_parameter and reset_all_parameters compose set_parameter with transport-supplied defaults via the new get_default and list_defaults methods on ParameterTransport), the self-node short circuit (delegated to the transport's is_self_node), and the shutdown ordering contract (idempotent manager.shutdown() fans out to transport.shutdown() before rclcpp::shutdown()). The handler-facing public API (list_parameters, get_parameter, set_parameter, reset_parameter, reset_all_parameters, shutdown) is preserved verbatim; the legacy include path ros2_medkit_gateway/configuration_manager.hpp is preserved as a forwarding shim. Parameter-service tuning parameters (parameter_service_timeout_sec, parameter_service_negative_cache_sec) are declared at gateway_node wiring time and passed to the transport constructor. ParameterTransport gains two new pure-virtual methods (get_default, list_defaults) that surface the cached defaults as neutral JSON for manager-level reset orchestration. The gateway_core link-time smoke test now also pins ConfigurationManager. Mock-transport tests link exclusively against gateway_core + GTest, with no rclcpp on the link line, covering CRUD delegation, self-node short-circuit, reset via cached defaults, partial-failure aggregation in reset_all_parameters, and shutdown idempotency.
FaultManager loses its seven rclcpp::Client members and their seven
per-client mutexes; all move into the new Ros2FaultServiceTransport
adapter under src/ros2/transports/. The adapter performs the
ros2_medkit_msgs to JSON conversion internally and returns the neutral
FaultResult / FaultWithEnvJsonResult structures, so the manager body now
lives in the ROS-free build layer (gateway_core).
The handler-facing public API (report_fault, list_faults, get_fault,
get_fault_with_env, clear_fault, get_snapshots, get_rosbag,
list_rosbags, is_available, wait_for_services) is preserved verbatim.
The single behaviour change is that get_fault_with_env now returns
the response body as JSON: data carries { "fault": {...},
"environment_data": {...} } already converted by the transport. The
single handler call site (build_sovd_fault_response) is updated to
consume the JSON, post-processing rosbag snapshots to add the per-
request bulk_data_uri and freeze_frame snapshots to extract the
primary value.
The previously static FaultManager::fault_to_json helper is removed.
Two external call sites (SSEFaultHandler, TriggerFaultSubscriber)
subscribe directly to the FaultEvent topic and convert the message
themselves; they now use the small ros2/conversions/fault_msg_conversions
module that lives at the ROS-coupled boundary alongside the transport.
The legacy include path ros2_medkit_gateway/fault_manager.hpp is
preserved as a forwarding shim. New mock-transport tests link only
against gateway_core + GTest, proving the manager logic compiles
without rclcpp on the link line.
LogManager keeps its ring-buffer storage, per-entity config map, monotonic id counter, plugin observer pattern, and the manages_ingestion short-circuit - all pure C++. The /rosout subscription and the rcl_interfaces::msg::Log to LogEntry conversion move into the new Ros2LogSource adapter under src/ros2/transports/. The adapter emits LogEntry through a callback the manager registers at start() time; when the primary LogProvider declares full ingestion, the manager never calls start() (matching today's behaviour of not creating the subscription at all in that mode). To let the manager body live in gateway_core alongside the other managers, the PluginManager pointer is replaced with a thin LogProviderRegistry port (primary_log_provider + log_observers). PluginManager implements the port inline, so production wiring is unchanged. The handler-facing public API (get_logs, get_config, update_config, add_log_entry, set_notifier, set_node_to_entity_resolver, the static helpers, inject_entry_for_testing) is preserved verbatim. Mock-source tests link only against gateway_core + GTest, proving the manager logic compiles without rclcpp on the link line. The legacy include path ros2_medkit_gateway/log_manager.hpp is preserved as a forwarding shim.
…sport TriggerManager keeps all its condition-evaluation, retry-unresolved, sweep-orphaned, persistence, dispatch-index, and entity-cache logic in pure C++. The pointer to TriggerTopicSubscriber is replaced with a shared TopicSubscriptionTransport interface; the new Ros2TopicSubscriptionTransport adapter wraps the existing TriggerTopicSubscriber, preserving its subscription-destructor pattern. Per-trigger subscription handles live in the manager's tracking map - destruction unsubscribes - so trigger lifetime fully drives subscription lifetime. The trigger manager source moves to src/core/managers/ so it is picked up by gateway_core's GLOB. TriggerTopicSubscriber moves to include/ros2_medkit_gateway/ros2/ and src/ros2/, and a forwarding shim at the old header path keeps existing includes working. TriggerTopicSubscriber itself becomes a generic per-handle subscription executor: subscribe(topic, type, handle_key, callback) creates one GenericSubscription per key, unsubscribe(handle_key) tears it down, and pending/retry semantics are preserved with the same 60s timeout. The handler-facing public API (create / get / list / update / remove, wait_for_event, consume_pending_event, set_on_removed, set_entity_children_fn, set_entity_exists_fn, set_resolve_topic_fn, retry_unresolved_triggers, sweep_orphaned_triggers, load_persistent_triggers) is preserved verbatim. Mock-transport tests link only against gateway_core + GTest, proving the manager body compiles without rclcpp on the link line.
RuntimeDiscoveryStrategy is replaced by Ros2RuntimeIntrospection, which implements the existing IntrospectionProvider interface used by the merge pipeline for plugin-driven entity discovery. Built-in ROS graph queries are now treated identically to plugin-provided introspection - one chain, one merge policy. HybridDiscoveryStrategy is removed; the equivalent runtime + manifest + plugin merge is the default MergePipeline configuration. The discovery_mode parameter (runtime_only / manifest_only / hybrid) now controls which layers the merge pipeline activates rather than which strategy class to instantiate.
The class is the rosidl_typesupport_cpp + rosidl_typesupport_introspection_cpp
bridge built on top of JsonSerializer; it semantically belongs alongside the
rest of the rosidl glue rather than in the gateway. Gateway packages now
depend on the serialization library for type schemas instead of duplicating
the introspection backend.
- Move type_introspection.{hpp,cpp} to ros2_medkit_serialization (preserves
Apache 2.0 header verbatim).
- Migrate the namespace from ros2_medkit_gateway to ros2_medkit_serialization;
qualify all consumer call sites (gateway managers, transports, providers,
handlers, discovery, tests).
- Update forward declarations in topic_transport.hpp,
data_access_manager.hpp, discovery_manager.hpp.
- Move the TypeIntrospection unit suite from
src/ros2_medkit_gateway/test/test_data_access_manager.cpp to
src/ros2_medkit_serialization/test/test_type_introspection.cpp; register
the new test target.
- Drop src/type_introspection.cpp from the gateway gateway_ros2 source list.
The rosidl_typesupport_cpp / rosidl_typesupport_introspection_cpp deps remain
in the gateway because the GenericClient compat shim still uses them.
Replace the cyclic wall-timer loop that called refresh_cache() every refresh_interval_ms with an event-driven design: - A 100 ms wall timer polls rclcpp::Node::get_graph_event()->check_and_clear() and runs refresh_cache() only when the ROS 2 graph actually changed (node up/down, topic/service/action add or remove). On a stable graph the callback is a single atomic check. - A second wall timer at refresh_interval_ms acts as a safety backstop, forcing an unconditional refresh so liveness is preserved if a graph event is ever missed. Semantics of the existing refresh_interval_ms parameter shift from "primary refresh interval" to "safety-backstop interval", and the default moves from 10 s (gateway_params.yaml) / 2 s (gateway.launch.py) to 30 s. The validation range stays 100-60000 ms; existing test overrides at 1000 ms continue to work, just as a tighter backstop. Test impact: - New integration test verifies that a node spawned mid-run appears in /apps within 5 s while the gateway runs with a 30 s backstop, proving the graph-event poll path is what drives the refresh. - test_operation_handlers seeds the entity cache directly and used to rely on refresh_cache() never firing during the 1 s settle period. Graph events from the test executor now do trigger refreshes; seed after the settle so the test's manually-injected component wins. Design intent on idle CPU: the previous timer woke and ran the full refresh_cache() pipeline every refresh_interval_ms regardless of whether the graph had changed. With this change an idle gateway runs only the 100 ms graph-event check (a single atomic load) plus the 30 s backstop. Documentation updated: README.md parameter tables, gateway_params.yaml comments, design/aggregation.rst.
Persistent triggers loaded at startup were being removed before DDS discovery had populated the entity cache. The graph-event refresh path fires on every node up/down event, including the cold-start window where the cache reflects only a partial view of the ROS 2 graph. Running the orphan sweep on every graph event treated restored triggers as orphans because their target entities had not yet been seen by this gateway, removing them from both memory and the persistent SQLite store. Move the sweep call to the backstop timer only. The backstop runs at the configured refresh cadence (default 30 s, 1 s in tests), giving DDS time to converge before any orphan check. The graph-event path keeps doing the cheap cache refresh, so spawn-detection latency is unchanged.
Each test in the suite connects to a non-existent OPC UA host and waits about 3.8 s for the DNS / TCP failure path. With 13 tests the wallclock run requires roughly 90 s. The ament_add_gtest default timeout of 60 s truncated the run mid-suite, which CTest then surfaced as a "missing_result" error under heavy parallelism (visible in concurrent colcon test invocations on multi-core hosts). Set TIMEOUT 240 to give a comfortable margin without hiding genuine hangs.
The neutral-managers refactor moved DataAccessManager, OperationManager, ConfigurationManager, FaultManager, LogManager, and TriggerManager into ``gateway_core`` and gave each one a Transport interface for ROS interaction. Update the layered-library note to reflect that placement and the new neutral interfaces. TriggerTopicSubscriber became a generic per-handle subscription executor wrapped by Ros2TopicSubscriptionTransport. Update its class summary, component list, and the TriggerManager arrow on the class diagram so they describe the new per-trigger handle ownership rather than the old reference-counted shared-subscription model.
Apply clang-tidy fixes across the files this branch already modifies so
the incremental clang-tidy job (which scans every file changed in a PR)
stays clean.
- ``fault_handlers.cpp``: replace temp-allocating ``operator+`` chain in
``bulk_data_uri`` construction with explicit ``operator+=`` /
``std::move`` (performance-inefficient-string-concatenation).
- ``test_configuration_manager.cpp``: pre-allocate ``threads.reserve(...)``
before ``emplace_back`` loops (performance-inefficient-vector-operation).
- ``test_configuration_manager_routing.cpp``,
``test_data_access_manager_routing.cpp``,
``test_operation_manager_routing.cpp``,
``test_operation_handlers.cpp``: name unused parameters in mock and
override signatures (readability-named-parameter,
misc-unused-parameters).
- ``test_fault_handlers.cpp``: switch single-character
``std::string::find("X")`` calls to ``find('X')``
(performance-faster-string-find).
- ``test_operation_handlers.cpp``: replace ``std::bind(...)`` with
forwarding lambdas (modernize-avoid-bind), and pass shared_ptr
arguments by const-reference instead of by value
(performance-unnecessary-value-param).
- ``test_trigger_manager_routing.cpp``: use ``empty()`` instead of
``size()`` in a boolean context (readability-container-size-empty).
After these changes ``clang-tidy -p build`` is clean on every cpp this
branch touches.
Follow-up to the earlier clang-tidy fixup. After clang-format wrapped the goal-callback lambda onto its own line, clang-tidy re-evaluated the parameter list and re-flagged ``goal`` as a copied ``std::shared_ptr`` only used as a const reference. Switch the parameter to ``const ...&`` to match the rest of the action lambdas in the file.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pull Request
Summary
This PR is the second pass after #391: each SOVD manager keeps its routing and
business logic, but the ROS 2 binding is extracted behind neutral Transport
adapters. Manager unit tests can now compose with mock transports and link only
against
gateway_core, which removesrclcppfrom the unit-test link line andeliminates the executor-lifecycle setup that was the root cause of intermittent
test failures (
process spawned but graph not yet seen,subscription destroyed mid-callback).The change also folds
RuntimeDiscoveryStrategyinto the existingIntrospectionProviderchain (built-in graph queries become just anotherprovider), relocates
TypeIntrospectionnext to the rest of therosidl_typesupportglue inros2_medkit_serialization, and switches theruntime-discovery refresh from cyclic polling to rclcpp graph-event driven, with
the timer kept only as a low-frequency safety backstop.
This PR stacks on #392 (
feat/medkit-core-scaffold). When #392 merges, GitHubauto-rebases this branch onto
main.Issue
Link the related issue (required):
Type
Testing
How was this tested / how should reviewers verify it?
The changes were exercised through the full local test suite. Reviewers can
reproduce by running, from the gateway worktree:
colcon build --symlink-install- whole workspace compiles cleanly(15 packages, no errors)
./scripts/test.sh unit- 2727 unit tests, 0 failures, 0 errors. Thenew mock-transport tests link only against
gateway_core+ GTest withno
ament_target_dependencies../scripts/test.sh lint --packages-select ros2_medkit_gateway- linterspass (clang-format, copyright, cmake-lint, flake8, pep257, xmllint)
./scripts/test.sh integ --packages-select ros2_medkit_integration_tests- 4501 tests, 0 failures, including the
test_graph_event_discoveryand
test_triggers_persistentfeaturescolcon build --packages-select ros2_medkit_gateway --cmake-args -DENABLE_CLANG_TIDY=ONthen
./scripts/test.sh tidy --packages-select ros2_medkit_gateway-clang-tidy clean on every changed file
colcon test --packages-select ros2_medkit_gateway --ctest-args -R "gateway_core_smoke"- the smoke test links only
gateway_core+ GTest and proves all eightprovider interfaces, seven Transport ports, and six manager class names
are reachable without rclcpp
To verify the discovery latency improvement, launch the gateway, then
ros2 runa demo node and observe/appslisting the new entity below 500 ms.Checklist
Commits
13 commits build up the main change in self-contained, reviewable steps. Each
step keeps the suite green and ends with the project compiling cleanly.
refactor(gateway): relocate operation result types into core/operationsrefactor(gateway): relocate parameter result types into core/configurationrefactor(gateway): relocate FaultResult into core/faultsfeat(gateway): introduce neutral Transport interfaces under core/transportsrefactor(gateway): route DataAccessManager through TopicTransport adapterrefactor(gateway): route OperationManager through Service/Action transportsrefactor(gateway): route ConfigurationManager through ParameterTransportrefactor(gateway): route FaultManager through FaultServiceTransportrefactor(gateway): route LogManager through LogSource adapterrefactor(gateway): route TriggerManager through TopicSubscriptionTransportrefactor(gateway): convert runtime discovery to IntrospectionProviderrefactor: relocate TypeIntrospection to ros2_medkit_serializationrefactor(gateway): drive discovery refresh from rclcpp graph eventsTwo functional fixes, one doc sweep, and a small clang-tidy / formatting
follow-up were folded into the branch during verification:
fix(gateway): only sweep orphan triggers on the backstop refresh- thegraph-event refresh path was running the orphan-trigger sweep during the
cold-start window, before DDS discovery had populated the entity cache.
Restored persistent triggers were then treated as orphans and removed from
the SQLite store. Sweep now runs only from the backstop tick, where the
cache is guaranteed to reflect a settled DDS view.
fix(opcua): bump test_opcua_plugin TIMEOUT to 240s- the suite legitimatelyneeds ~90 s wallclock for its 13 OPC UA connection-failure tests; the
default 60 s timeout truncated the run mid-suite under heavy parallel
colcon test load.
docs(gateway): refresh design index for managers-in-core layoutupdatesthe layered-library note, the TriggerTopicSubscriber class summary, and
the TriggerManager arrow on the class diagram so the design document
matches the new manager / transport split.
fix(gateway): resolve clang-tidy findings on touched filesand a smallfollow-up commit address every clang-tidy warning the incremental PR
job emits on the cpp files this branch modifies (named-parameter,
std::bind-> lambda, single-characterfind(), copiedshared_ptrparameters,vector::reservebeforeemplace_back,and one
operator+=rewrite infault_handlers).