fix(gateway): aggregate component /logs and /bulk-data from hosted apps' fqns#394
fix(gateway): aggregate component /logs and /bulk-data from hosted apps' fqns#394
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes empty results for synthetic/runtime-discovered components on /logs and /bulk-data/rosbags by aggregating from hosted apps’ effective FQNs (matching existing AREA/FUNCTION aggregation behavior) and updates docs/tests accordingly.
Changes:
- Update
LogHandlers::handle_get_logsto aggregate component logs from hosted apps and emit component-level aggregation metadata. - Update
BulkDataHandlers::get_source_filtersto aggregate component bulk-data sources from hosted apps with manifest fallback behavior. - Add integration tests for component aggregation behavior and update REST API documentation for component logs aggregation.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ros2_medkit_gateway/src/http/handlers/log_handlers.cpp | Aggregate component logs from hosted apps’ FQNs + add x-medkit aggregation metadata. |
| src/ros2_medkit_gateway/src/http/handlers/bulkdata_handlers.cpp | Extend hosted-app aggregation to component source filters with fallback behavior. |
| src/ros2_medkit_integration_tests/test/features/test_logging_api.test.py | Add integration test validating non-empty aggregated component logs + metadata. |
| src/ros2_medkit_integration_tests/test/features/test_bulk_data_api.test.py | Add integration test validating aggregated component bulk-data descriptors. |
| docs/api/rest.rst | Document component logs aggregation and fallback behavior. |
Synthetic / runtime-discovered components have an empty fqn (no manifest
namespace), so the existing GET /components/{id}/logs path - which fed
entity.fqn to LogManager::get_logs with prefix_match=true - silently
returned zero items even when 20+ active nodes were mapped to that
component. Manifest-defined components with a real fqn worked because
their fqn matched the per-node logger name buffers.
Mirror the AREA / FUNCTION semantics already in this handler: look up
hosted apps via cache.get_apps_for_component(), build their effective
fqns, and call get_logs(host_fqns, prefix_match=false). Fall through to
the original namespace prefix path only when the component has no
hosted apps but still has a non-empty fqn (manifest deployments where
the component groups topics rather than nodes).
Response now also carries x-medkit.aggregation_level=component plus
aggregation_sources / app_count, parity with how AREA and FUNCTION
already advertise their aggregation. Adds an integration test verifying
component logs aggregate child app entries (verifies REQ_INTEROP_061).
…fqns
The same synthetic-component bug already fixed for /components/{id}/logs
also applies to BulkDataHandlers::get_source_filters. Synthetic /
runtime-discovered components have empty fqn AND empty namespace_path,
so the FUNCTION-only aggregation arm was not enough: components silently
returned zero source filters, producing empty rosbag descriptor lists
and failing ownership checks on download.
Extend the aggregation arm to handle COMPONENT identically - resolve
hosted apps via cache.get_apps_for_component() and build their
effective_fqn() list. Manifest deployments where a component groups
topics fall through to the existing namespace-prefix path.
Strengthen the integration test for the original logs fix: assert items
is non-empty (the bug returned 0 items with valid x-medkit metadata, so
app_count alone is insufficient) and match aggregation_sources by FQN
substring instead of bare app id. Add an integration test for component
bulk-data aggregation, and update docs/api/rest.rst to describe the
actual aggregation behavior for /components/{id}/logs.
…olve_app_host_fqns
Address review feedback on the component log/bulk-data aggregation fix:
- Extract the per-app FQN resolution loop (cache lookup + effective_fqn +
empty-skip) to HandlerContext::resolve_app_host_fqns and call it from
log_handlers (FUNCTION/AREA/COMPONENT branches) and bulkdata_handlers
(FUNCTION/COMPONENT). Removes three duplicate copies of the same loop
and gives both handlers a single unit-tested helper.
- Split FUNCTION and COMPONENT branches in BulkDataHandlers::get_source_filters.
The previous combined arm fell through to entity.fqn / namespace_path for
both types when host_fqns was empty, which silently changed FUNCTION
semantics: functions are pure aggregated views and must return an empty
filter list when no apps host them. COMPONENT keeps the fall-through for
manifest-only deployments.
- Make BulkDataHandlers::get_source_filters public (was private). The
branching logic is non-trivial and needs unit coverage; calling it
through the REST surface forces a real FaultManager service which is
not viable in unit tests.
- Add unit tests:
* 7 ResolveAppHostFqnsTest cases pinning the silent-skip semantics
(missing apps, empty effective_fqn, bound_fqn precedence).
* 9 BulkDataSourceFiltersTest cases covering APP / AREA / FUNCTION /
COMPONENT branching, including the regression guard that FUNCTION
without hosts returns empty even if the entity carried a fqn.
* 3 ComponentLogs* cases on the existing AreaAggregationTest fixture
that exercise the COMPONENT branch in handle_get_logs end-to-end via
the REST server: synthetic component aggregates from hosted apps,
manifest-only component falls through, empty component returns empty
items without crashing.
- Set ROS_DOMAIN_ID for test_bulkdata_handlers (now stands up a real
GatewayNode, needs domain isolation per CMakeLists.txt convention).
- Strengthen the integration test for component logs aggregation:
* Wrap the assertion in poll_endpoint_until - the /rosout buffer fills
asynchronously after node startup, the prior immediate assertion was
timing-dependent.
* Match aggregation_sources by FQN substring instead of bare app id
(the response carries full FQNs like /powertrain/engine/temp_sensor).
- Clarify docs/api/rest.rst: app_count and aggregation_sources are
populated only when hosted-app aggregation is active and are omitted
under the namespace-prefix fallback.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/ros2_medkit_gateway/test/test_handler_context.cpp:57
jsonis used throughout this test file (e.g.,json::parse(...)) but theusing json = nlohmann::json;alias was removed and there is no replacement alias/namespace import. This will not compile. Re-introduce the alias (or fully-qualifynlohmann::json) in this file.
using namespace ros2_medkit_gateway;
using namespace ros2_medkit_gateway::handlers;
// =============================================================================
// HandlerContext static method tests (don't require GatewayNode)
// =============================================================================
TEST(HandlerContextStaticTest, SendErrorSetsStatusAndBody) {
httplib::Response res;
HandlerContext::send_error(res, 400, ERR_INVALID_REQUEST, "Test error message");
EXPECT_EQ(res.status, 400);
EXPECT_EQ(res.get_header_value("Content-Type"), "application/json");
auto body = json::parse(res.body);
EXPECT_EQ(body["error_code"], ERR_INVALID_REQUEST);
EXPECT_EQ(body["message"], "Test error message");
| /** | ||
| * @brief Get source filters for rosbag queries based on entity type. | ||
| * | ||
| * For apps/components/areas: returns the entity's FQN or namespace path. | ||
| * For functions: aggregates FQNs from all hosting apps (read-only | ||
| * aggregated view - upload/delete are blocked at the route level). | ||
| * - APP / AREA: returns the entity's FQN or namespace path (single filter). | ||
| * - FUNCTION: aggregates non-empty effective_fqn() values across all hosted apps | ||
| * (no fallback - functions are pure aggregated views). | ||
| * - COMPONENT: aggregates from hosted apps; falls back to FQN/namespace_path only | ||
| * when the component has no hosted apps (manifest deployments where the component | ||
| * groups topics rather than nodes). This avoids the synthetic-component bug where | ||
| * empty fqn + empty namespace_path produced zero source filters. | ||
| * | ||
| * Public to enable unit testing the entity-type branching directly. | ||
| * | ||
| * @param entity Entity information | ||
| * @return Vector of source filter strings (empty if no valid filters) | ||
| */ | ||
| std::vector<std::string> get_source_filters(const EntityInfo & entity) const; | ||
|
|
There was a problem hiding this comment.
get_source_filters() was moved into the public section specifically for unit testing, but this header is under include/ and is likely installed/exported. Exposing this internal helper expands the public API surface and makes future refactors harder. Prefer keeping it private and granting test access via FRIEND_TEST(...), or extracting the branching logic into a separate testable helper (e.g., a free function in a detail namespace) without changing the handler's public interface.
There was a problem hiding this comment.
Done. Moved get_source_filters back to private and extracted the entity-type branching to handlers::detail::compute_bulkdata_source_filters (a free function in a detail namespace, declared at the bottom of bulkdata_handlers.hpp). The instance method is now a one-line wrapper that fetches the cache from ctx_ and delegates. Tests target the free function directly with a bare ThreadSafeEntityCache - no BulkDataHandlers instance, no GatewayNode, no DDS context required.
| auto comp_fqns = HandlerContext::resolve_app_host_fqns(cache, cache.get_apps_for_component(comp_id)); | ||
| host_fqns.insert(host_fqns.end(), std::make_move_iterator(comp_fqns.begin()), | ||
| std::make_move_iterator(comp_fqns.end())); | ||
| } |
There was a problem hiding this comment.
This code uses std::make_move_iterator(...) but the file doesn't include <iterator>. Relying on indirect includes is non-portable and can break builds with different standard library implementations. Add #include <iterator> (or avoid make_move_iterator here).
There was a problem hiding this comment.
Done. Added #include <iterator> to log_handlers.cpp.
7f8884f to
778dd0f
Compare
Address two review findings: - Move ``get_source_filters`` back to ``private``. Extract the pure entity-type branching to ``handlers::detail::compute_bulkdata_source_filters``, a free function that takes a ``ThreadSafeEntityCache`` reference plus an ``EntityInfo``. The instance method becomes a one-line wrapper that fetches the cache from ``ctx_`` and delegates. This keeps the handler's public API unchanged while letting the unit tests target the helper directly without spinning up a ``GatewayNode`` (the previous fixture stood up a real ``GatewayNode`` purely to obtain a cache reference). - Add ``#include <iterator>`` to ``log_handlers.cpp`` for ``std::make_move_iterator`` (was relying on transitive inclusion). Test rewrite: ``BulkDataSourceFiltersTest`` now constructs a bare ``ThreadSafeEntityCache``, calls ``cache_.update_all(...)``, and invokes ``handlers::detail::compute_bulkdata_source_filters`` directly. No DDS context, no rclcpp init, no ``ros2_medkit_test_utils`` shared library required. Drops the ``medkit_set_test_domain`` line for the same reason - this fixture no longer creates a ROS node, so domain isolation is moot. All 9 entity-type cases (synthetic component, manifest fallback, function without hosts regression guard, etc.) still cover the same scenarios.
…-error The rosdep keys for ``ament_cmake_clang_tidy`` and ``ament_cmake_clang_format`` are not registered for noble (the Rolling runner image), so ``rosdep install`` hard-fails before the build can start. Linters / clang-tidy already run only in the Jazzy job in ``quality.yml`` (gated by ``ENABLE_CLANG_TIDY`` in ``ROS2MedkitLinting.cmake``), so the build-and-test matrix on Humble + Rolling does not need those CMake packages installed - skip the keys instead of trying to resolve them. ``opcua-plugin.yml`` already follows the same pattern. Drop ``continue-on-error`` for the Rolling job. Rolling is now treated as a required-green distro per CLAUDE.md - any Rolling-specific failure is a real breakage that needs a code-side fix or an explicit owner sign-off, not a silent soft-fail.
ci.yml skips the ``ament_cmake_clang_format`` rosdep key on Humble + Rolling (linters only run in the Jazzy quality job, the rosdep key isn't even registered for noble), so ``find_package(... REQUIRED)`` hard-fails the configure step on those runners. Switch all 8 affected packages to ``find_package(ament_cmake_clang_format QUIET)`` plus an ``if(_FOUND)`` guard around the corresponding ``ament_clang_format(...)`` call. On Jazzy (where the package is installed in the quality job's rosdep step) the behavior is unchanged - clang-format still runs and passes 310 tests on ``ros2_medkit_gateway``. On Humble + Rolling the configure step now proceeds without the package and the linter is silently skipped, which matches the project policy that linters only run on Jazzy. The companion ``ament_cmake_clang_tidy`` find_package in ``ROS2MedkitLinting.cmake`` is already gated by the ``ENABLE_CLANG_TIDY`` option (default OFF; only set ON in the Jazzy clang-tidy job in quality.yml), so no change needed there.
…test job The opcua-plugin Unit tests matrix runs on humble + jazzy + rolling. On noble (Rolling) the ament_cmake_clang_format / ament_cmake_clang_tidy rosdep keys are not registered, so the previous skip-keys list (which covered only nav2_msgs) caused rosdep install to fail before any test could run. Add the linter keys to skip-keys, mirroring the change just made in ci.yml. The upstream medkit packages have already been switched to QUIET + FOUND for those find_package calls so the build degrades gracefully when the linter packages aren't installed.
The first RUN clears ``/var/lib/apt/lists/*`` to keep the builder image small. The next RUN then calls ``rosdep install``, which in turn invokes ``apt-get install`` under the hood without first refreshing the apt cache. On cold Docker builds this fails with ``E: Unable to locate package python3-jsonschema`` (declared as a ``<test_depend>`` by ``ros2_medkit_integration_tests`` and pulled in by rosdep even with ``BUILD_TESTING=OFF``). The OPC-UA plugin Integration jobs were masking this on main via warm BuildKit layer caches and only blew up after the cache was evicted. Add an explicit ``apt-get update`` at the start of the rosdep RUN, and move the ``rm -rf /var/lib/apt/lists/*`` cleanup to the end of the same layer so the runtime image still inherits no apt cache.
Summary
Fix two handlers that silently returned empty results for synthetic / runtime-discovered components, by mirroring the AREA / FUNCTION aggregation pattern already present in the same files.
LogHandlers::handle_get_logs- the COMPONENT branch fedentity.fqntoLogManager::get_logs(..., prefix_match=true). Synthetic components have an emptyfqn, so the prefix match selected nothing and the response was an emptyitemslist even when the component grouped N active apps. Resolve hosted apps viacache.get_apps_for_component()and callget_logs(host_fqns, prefix_match=false). Manifest deployments with a non-emptyfqnand no hosted apps fall through to the original namespace-prefix path. Response now also carriesx-medkit.aggregation_level=component,app_count, andaggregation_sourcesfor parity with AREA and FUNCTION.BulkDataHandlers::get_source_filters- identical pattern. The handler had a FUNCTION-only aggregation arm and fell through toentity.fqn/entity.namespace_pathfor COMPONENT. Both empty for synthetic components, so source filters were{}, descriptor lists came back empty, and ownership checks inhandle_downloadshort-circuited. Extended the aggregation arm to handle COMPONENT identically; manifest-only components still fall through to the namespace prefix path.docs/api/rest.rstupdated to describe the actual aggregation behavior for/components/{id}/logsinstead of the stalenamespace prefix matchwording.Tests
test_component_get_logs_aggregates_child_appsintest_logging_api.test.pyverifies a synthetic component returns non-emptyitemsplus the new aggregation metadata. Assertslen(items) > 0directly because the original bug returned 0 items with otherwise validx-medkitmetadata, so checkingapp_countalone is not sufficient. Matchesaggregation_sourcesby FQN substring instead of bare app id.test_bulk_data_component_aggregates_child_appsintest_bulk_data_api.test.pyverifies the equivalent fix for the bulk-data endpoint.SOVD API impact
x-medkitextensions on/components/{id}/logsonly - new optionalaggregation_level,aggregated,app_count,aggregation_sourcesfields, emitted only when local aggregation is active. Existing clients that ignore unknownx-medkit.*fields are unaffected./components/{id}/bulk-data/rosbagspayload schema is unchanged.Issue
A related but distinct symptom on
handle_get_faultfor synthetic components is a scope-leak that needs a different fix (changing the GetFault service contract), tracked separately - see comment on #380.Type
x-medkitaggregation metadata on/components/{id}/logs)docs/api/rest.rstlog endpoints description)Testing
All local runs green:
./scripts/test.sh unit --packages-select ros2_medkit_gateway- 2099 assertions across 81 tests, 0 failures../scripts/test.sh test_logging_api --packages-select ros2_medkit_integration_tests- 22 logging tests + shutdown, 0 failures. Includes the newtest_component_get_logs_aggregates_child_apps../scripts/test.sh test_bulk_data_api --packages-select ros2_medkit_integration_tests- 10 tests, 0 failures. Includes the newtest_bulk_data_component_aggregates_child_apps../scripts/test.sh lint --packages-select ros2_medkit_gateway ros2_medkit_integration_tests- clean (clang-format, cmake-lint, ament-copyright, flake8, pep257, cppcheck, xmllint).Checklist
x-medkitextensions on/components/{id}/logs)