Skip to content

fix(gateway): aggregate component /logs and /bulk-data from hosted apps' fqns#394

Open
bburda wants to merge 8 commits intomainfrom
fix/component-logs-aggregation
Open

fix(gateway): aggregate component /logs and /bulk-data from hosted apps' fqns#394
bburda wants to merge 8 commits intomainfrom
fix/component-logs-aggregation

Conversation

@bburda
Copy link
Copy Markdown
Collaborator

@bburda bburda commented Apr 28, 2026

Summary

Fix two handlers that silently returned empty results for synthetic / runtime-discovered components, by mirroring the AREA / FUNCTION aggregation pattern already present in the same files.

  • LogHandlers::handle_get_logs - the COMPONENT branch fed entity.fqn to LogManager::get_logs(..., prefix_match=true). Synthetic components have an empty fqn, so the prefix match selected nothing and the response was an empty items list even when the component grouped N active apps. Resolve hosted apps via cache.get_apps_for_component() and call get_logs(host_fqns, prefix_match=false). Manifest deployments with a non-empty fqn and no hosted apps fall through to the original namespace-prefix path. Response now also carries x-medkit.aggregation_level=component, app_count, and aggregation_sources for parity with AREA and FUNCTION.
  • BulkDataHandlers::get_source_filters - identical pattern. The handler had a FUNCTION-only aggregation arm and fell through to entity.fqn / entity.namespace_path for COMPONENT. Both empty for synthetic components, so source filters were {}, descriptor lists came back empty, and ownership checks in handle_download short-circuited. Extended the aggregation arm to handle COMPONENT identically; manifest-only components still fall through to the namespace prefix path.
  • docs/api/rest.rst updated to describe the actual aggregation behavior for /components/{id}/logs instead of the stale namespace prefix match wording.

Tests

  • New integration test test_component_get_logs_aggregates_child_apps in test_logging_api.test.py verifies a synthetic component returns non-empty items plus the new aggregation metadata. Asserts len(items) > 0 directly because the original bug returned 0 items with otherwise valid x-medkit metadata, so checking app_count alone is not sufficient. Matches aggregation_sources by FQN substring instead of bare app id.
  • New integration test test_bulk_data_component_aggregates_child_apps in test_bulk_data_api.test.py verifies the equivalent fix for the bulk-data endpoint.

SOVD API impact

  • No changes to SOVD-defined request paths, response schemas, or error codes.
  • Additive x-medkit extensions on /components/{id}/logs only - new optional aggregation_level, aggregated, app_count, aggregation_sources fields, emitted only when local aggregation is active. Existing clients that ignore unknown x-medkit.* fields are unaffected. /components/{id}/bulk-data/rosbags payload schema is unchanged.

Issue

A related but distinct symptom on handle_get_fault for synthetic components is a scope-leak that needs a different fix (changing the GetFault service contract), tracked separately - see comment on #380.


Type

  • Bug fix
  • New feature or tests (two new integration tests + x-medkit aggregation metadata on /components/{id}/logs)
  • Breaking change
  • Documentation only (docs/api/rest.rst log endpoints description)

Testing

All local runs green:

  • ./scripts/test.sh unit --packages-select ros2_medkit_gateway - 2099 assertions across 81 tests, 0 failures.
  • ./scripts/test.sh test_logging_api --packages-select ros2_medkit_integration_tests - 22 logging tests + shutdown, 0 failures. Includes the new test_component_get_logs_aggregates_child_apps.
  • ./scripts/test.sh test_bulk_data_api --packages-select ros2_medkit_integration_tests - 10 tests, 0 failures. Includes the new test_bulk_data_component_aggregates_child_apps.
  • ./scripts/test.sh lint --packages-select ros2_medkit_gateway ros2_medkit_integration_tests - clean (clang-format, cmake-lint, ament-copyright, flake8, pep257, cppcheck, xmllint).
  • pre-push clang-tidy on changed files - clean.

Checklist

  • Breaking changes are clearly described (there are none; only additive x-medkit extensions on /components/{id}/logs)
  • Tests were added or updated if needed
  • Docs were updated if behavior or public API changed

Copilot AI review requested due to automatic review settings April 28, 2026 16:22
@bburda bburda self-assigned this Apr 28, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes empty results for synthetic/runtime-discovered components on /logs and /bulk-data/rosbags by aggregating from hosted apps’ effective FQNs (matching existing AREA/FUNCTION aggregation behavior) and updates docs/tests accordingly.

Changes:

  • Update LogHandlers::handle_get_logs to aggregate component logs from hosted apps and emit component-level aggregation metadata.
  • Update BulkDataHandlers::get_source_filters to aggregate component bulk-data sources from hosted apps with manifest fallback behavior.
  • Add integration tests for component aggregation behavior and update REST API documentation for component logs aggregation.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/ros2_medkit_gateway/src/http/handlers/log_handlers.cpp Aggregate component logs from hosted apps’ FQNs + add x-medkit aggregation metadata.
src/ros2_medkit_gateway/src/http/handlers/bulkdata_handlers.cpp Extend hosted-app aggregation to component source filters with fallback behavior.
src/ros2_medkit_integration_tests/test/features/test_logging_api.test.py Add integration test validating non-empty aggregated component logs + metadata.
src/ros2_medkit_integration_tests/test/features/test_bulk_data_api.test.py Add integration test validating aggregated component bulk-data descriptors.
docs/api/rest.rst Document component logs aggregation and fallback behavior.

Comment thread src/ros2_medkit_gateway/src/http/handlers/log_handlers.cpp Outdated
Comment thread src/ros2_medkit_integration_tests/test/features/test_logging_api.test.py Outdated
Comment thread docs/api/rest.rst Outdated
Comment thread src/ros2_medkit_gateway/src/http/handlers/bulkdata_handlers.cpp Outdated
Comment thread src/ros2_medkit_gateway/src/http/handlers/bulkdata_handlers.cpp Outdated
bburda added 3 commits April 29, 2026 11:43
Synthetic / runtime-discovered components have an empty fqn (no manifest
namespace), so the existing GET /components/{id}/logs path - which fed
entity.fqn to LogManager::get_logs with prefix_match=true - silently
returned zero items even when 20+ active nodes were mapped to that
component. Manifest-defined components with a real fqn worked because
their fqn matched the per-node logger name buffers.

Mirror the AREA / FUNCTION semantics already in this handler: look up
hosted apps via cache.get_apps_for_component(), build their effective
fqns, and call get_logs(host_fqns, prefix_match=false). Fall through to
the original namespace prefix path only when the component has no
hosted apps but still has a non-empty fqn (manifest deployments where
the component groups topics rather than nodes).

Response now also carries x-medkit.aggregation_level=component plus
aggregation_sources / app_count, parity with how AREA and FUNCTION
already advertise their aggregation. Adds an integration test verifying
component logs aggregate child app entries (verifies REQ_INTEROP_061).
…fqns

The same synthetic-component bug already fixed for /components/{id}/logs
also applies to BulkDataHandlers::get_source_filters. Synthetic /
runtime-discovered components have empty fqn AND empty namespace_path,
so the FUNCTION-only aggregation arm was not enough: components silently
returned zero source filters, producing empty rosbag descriptor lists
and failing ownership checks on download.

Extend the aggregation arm to handle COMPONENT identically - resolve
hosted apps via cache.get_apps_for_component() and build their
effective_fqn() list. Manifest deployments where a component groups
topics fall through to the existing namespace-prefix path.

Strengthen the integration test for the original logs fix: assert items
is non-empty (the bug returned 0 items with valid x-medkit metadata, so
app_count alone is insufficient) and match aggregation_sources by FQN
substring instead of bare app id. Add an integration test for component
bulk-data aggregation, and update docs/api/rest.rst to describe the
actual aggregation behavior for /components/{id}/logs.
…olve_app_host_fqns

Address review feedback on the component log/bulk-data aggregation fix:

- Extract the per-app FQN resolution loop (cache lookup + effective_fqn +
  empty-skip) to HandlerContext::resolve_app_host_fqns and call it from
  log_handlers (FUNCTION/AREA/COMPONENT branches) and bulkdata_handlers
  (FUNCTION/COMPONENT). Removes three duplicate copies of the same loop
  and gives both handlers a single unit-tested helper.
- Split FUNCTION and COMPONENT branches in BulkDataHandlers::get_source_filters.
  The previous combined arm fell through to entity.fqn / namespace_path for
  both types when host_fqns was empty, which silently changed FUNCTION
  semantics: functions are pure aggregated views and must return an empty
  filter list when no apps host them. COMPONENT keeps the fall-through for
  manifest-only deployments.
- Make BulkDataHandlers::get_source_filters public (was private). The
  branching logic is non-trivial and needs unit coverage; calling it
  through the REST surface forces a real FaultManager service which is
  not viable in unit tests.
- Add unit tests:
  * 7 ResolveAppHostFqnsTest cases pinning the silent-skip semantics
    (missing apps, empty effective_fqn, bound_fqn precedence).
  * 9 BulkDataSourceFiltersTest cases covering APP / AREA / FUNCTION /
    COMPONENT branching, including the regression guard that FUNCTION
    without hosts returns empty even if the entity carried a fqn.
  * 3 ComponentLogs* cases on the existing AreaAggregationTest fixture
    that exercise the COMPONENT branch in handle_get_logs end-to-end via
    the REST server: synthetic component aggregates from hosted apps,
    manifest-only component falls through, empty component returns empty
    items without crashing.
- Set ROS_DOMAIN_ID for test_bulkdata_handlers (now stands up a real
  GatewayNode, needs domain isolation per CMakeLists.txt convention).
- Strengthen the integration test for component logs aggregation:
  * Wrap the assertion in poll_endpoint_until - the /rosout buffer fills
    asynchronously after node startup, the prior immediate assertion was
    timing-dependent.
  * Match aggregation_sources by FQN substring instead of bare app id
    (the response carries full FQNs like /powertrain/engine/temp_sensor).
- Clarify docs/api/rest.rst: app_count and aggregation_sources are
  populated only when hosted-app aggregation is active and are omitted
  under the namespace-prefix fallback.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/ros2_medkit_gateway/test/test_handler_context.cpp:57

  • json is used throughout this test file (e.g., json::parse(...)) but the using json = nlohmann::json; alias was removed and there is no replacement alias/namespace import. This will not compile. Re-introduce the alias (or fully-qualify nlohmann::json) in this file.
using namespace ros2_medkit_gateway;
using namespace ros2_medkit_gateway::handlers;

// =============================================================================
// HandlerContext static method tests (don't require GatewayNode)
// =============================================================================

TEST(HandlerContextStaticTest, SendErrorSetsStatusAndBody) {
  httplib::Response res;

  HandlerContext::send_error(res, 400, ERR_INVALID_REQUEST, "Test error message");

  EXPECT_EQ(res.status, 400);
  EXPECT_EQ(res.get_header_value("Content-Type"), "application/json");

  auto body = json::parse(res.body);
  EXPECT_EQ(body["error_code"], ERR_INVALID_REQUEST);
  EXPECT_EQ(body["message"], "Test error message");

Comment on lines 115 to 132
/**
* @brief Get source filters for rosbag queries based on entity type.
*
* For apps/components/areas: returns the entity's FQN or namespace path.
* For functions: aggregates FQNs from all hosting apps (read-only
* aggregated view - upload/delete are blocked at the route level).
* - APP / AREA: returns the entity's FQN or namespace path (single filter).
* - FUNCTION: aggregates non-empty effective_fqn() values across all hosted apps
* (no fallback - functions are pure aggregated views).
* - COMPONENT: aggregates from hosted apps; falls back to FQN/namespace_path only
* when the component has no hosted apps (manifest deployments where the component
* groups topics rather than nodes). This avoids the synthetic-component bug where
* empty fqn + empty namespace_path produced zero source filters.
*
* Public to enable unit testing the entity-type branching directly.
*
* @param entity Entity information
* @return Vector of source filter strings (empty if no valid filters)
*/
std::vector<std::string> get_source_filters(const EntityInfo & entity) const;

Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_source_filters() was moved into the public section specifically for unit testing, but this header is under include/ and is likely installed/exported. Exposing this internal helper expands the public API surface and makes future refactors harder. Prefer keeping it private and granting test access via FRIEND_TEST(...), or extracting the branching logic into a separate testable helper (e.g., a free function in a detail namespace) without changing the handler's public interface.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Moved get_source_filters back to private and extracted the entity-type branching to handlers::detail::compute_bulkdata_source_filters (a free function in a detail namespace, declared at the bottom of bulkdata_handlers.hpp). The instance method is now a one-line wrapper that fetches the cache from ctx_ and delegates. Tests target the free function directly with a bare ThreadSafeEntityCache - no BulkDataHandlers instance, no GatewayNode, no DDS context required.

Comment on lines +121 to 124
auto comp_fqns = HandlerContext::resolve_app_host_fqns(cache, cache.get_apps_for_component(comp_id));
host_fqns.insert(host_fqns.end(), std::make_move_iterator(comp_fqns.begin()),
std::make_move_iterator(comp_fqns.end()));
}
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code uses std::make_move_iterator(...) but the file doesn't include <iterator>. Relying on indirect includes is non-portable and can break builds with different standard library implementations. Add #include <iterator> (or avoid make_move_iterator here).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added #include <iterator> to log_handlers.cpp.

@bburda bburda force-pushed the fix/component-logs-aggregation branch from 7f8884f to 778dd0f Compare April 29, 2026 09:56
bburda added 5 commits April 29, 2026 13:01
Address two review findings:

- Move ``get_source_filters`` back to ``private``. Extract the pure
  entity-type branching to ``handlers::detail::compute_bulkdata_source_filters``,
  a free function that takes a ``ThreadSafeEntityCache`` reference plus an
  ``EntityInfo``. The instance method becomes a one-line wrapper that fetches
  the cache from ``ctx_`` and delegates. This keeps the handler's public API
  unchanged while letting the unit tests target the helper directly without
  spinning up a ``GatewayNode`` (the previous fixture stood up a real
  ``GatewayNode`` purely to obtain a cache reference).
- Add ``#include <iterator>`` to ``log_handlers.cpp`` for
  ``std::make_move_iterator`` (was relying on transitive inclusion).

Test rewrite: ``BulkDataSourceFiltersTest`` now constructs a bare
``ThreadSafeEntityCache``, calls ``cache_.update_all(...)``, and invokes
``handlers::detail::compute_bulkdata_source_filters`` directly. No DDS
context, no rclcpp init, no ``ros2_medkit_test_utils`` shared library
required. Drops the ``medkit_set_test_domain`` line for the same reason -
this fixture no longer creates a ROS node, so domain isolation is moot.
All 9 entity-type cases (synthetic component, manifest fallback, function
without hosts regression guard, etc.) still cover the same scenarios.
…-error

The rosdep keys for ``ament_cmake_clang_tidy`` and ``ament_cmake_clang_format``
are not registered for noble (the Rolling runner image), so ``rosdep install``
hard-fails before the build can start. Linters / clang-tidy already run only
in the Jazzy job in ``quality.yml`` (gated by ``ENABLE_CLANG_TIDY`` in
``ROS2MedkitLinting.cmake``), so the build-and-test matrix on Humble + Rolling
does not need those CMake packages installed - skip the keys instead of
trying to resolve them. ``opcua-plugin.yml`` already follows the same pattern.

Drop ``continue-on-error`` for the Rolling job. Rolling is now treated as a
required-green distro per CLAUDE.md - any Rolling-specific failure is a real
breakage that needs a code-side fix or an explicit owner sign-off, not a
silent soft-fail.
ci.yml skips the ``ament_cmake_clang_format`` rosdep key on Humble + Rolling
(linters only run in the Jazzy quality job, the rosdep key isn't even
registered for noble), so ``find_package(... REQUIRED)`` hard-fails the
configure step on those runners. Switch all 8 affected packages to
``find_package(ament_cmake_clang_format QUIET)`` plus an ``if(_FOUND)``
guard around the corresponding ``ament_clang_format(...)`` call. On Jazzy
(where the package is installed in the quality job's rosdep step) the
behavior is unchanged - clang-format still runs and passes 310 tests on
``ros2_medkit_gateway``. On Humble + Rolling the configure step now
proceeds without the package and the linter is silently skipped, which
matches the project policy that linters only run on Jazzy.

The companion ``ament_cmake_clang_tidy`` find_package in
``ROS2MedkitLinting.cmake`` is already gated by the ``ENABLE_CLANG_TIDY``
option (default OFF; only set ON in the Jazzy clang-tidy job in
quality.yml), so no change needed there.
…test job

The opcua-plugin Unit tests matrix runs on humble + jazzy + rolling. On
noble (Rolling) the ament_cmake_clang_format / ament_cmake_clang_tidy
rosdep keys are not registered, so the previous skip-keys list (which
covered only nav2_msgs) caused rosdep install to fail before any test
could run.

Add the linter keys to skip-keys, mirroring the change just made in
ci.yml. The upstream medkit packages have already been switched to
QUIET + FOUND for those find_package calls so the build degrades
gracefully when the linter packages aren't installed.
The first RUN clears ``/var/lib/apt/lists/*`` to keep the builder image
small. The next RUN then calls ``rosdep install``, which in turn invokes
``apt-get install`` under the hood without first refreshing the apt cache.
On cold Docker builds this fails with ``E: Unable to locate package
python3-jsonschema`` (declared as a ``<test_depend>`` by
``ros2_medkit_integration_tests`` and pulled in by rosdep even with
``BUILD_TESTING=OFF``). The OPC-UA plugin Integration jobs were
masking this on main via warm BuildKit layer caches and only blew up
after the cache was evicted.

Add an explicit ``apt-get update`` at the start of the rosdep RUN, and
move the ``rm -rf /var/lib/apt/lists/*`` cleanup to the end of the same
layer so the runtime image still inherits no apt cache.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Synthetic components return empty results from /logs and /bulk-data resource collections

2 participants