Skip to content

SOVD service interface: aggregate peer faults#419

Open
bburda wants to merge 2 commits into
mainfrom
feat/sovd-service-interface-aggregation
Open

SOVD service interface: aggregate peer faults#419
bburda wants to merge 2 commits into
mainfrom
feat/sovd-service-interface-aggregation

Conversation

@bburda

@bburda bburda commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Summary

The ROS 2 SOVD service interface returned only local fault-manager faults, while the HTTP API already fans out to aggregation peers. This wires GatewayPluginContext::list_entity_faults and list_all_faults to the aggregation manager (fan_out_get), mirroring the HTTP fault handler, so a gateway with aggregation.enabled exposes peer faults over the ROS 2 service path too - consistent with the HTTP /api/v1/faults and /api/v1/<entity>/faults endpoints.

  • per-entity: list_entity_faults fans out /api/v1/<type>/<id>/faults to peers and merges.
  • all-faults: list_all_faults fans out /api/v1/faults and merges.
  • Aggregation is gated on get_aggregation_manager() != nullptr (no behavior change when aggregation is disabled).

Issue


Type

  • Bug fix

Testing

  • New test_plugin_context_aggregation (mock peer server): per-entity and all-faults paths surface a peer fault when aggregation is enabled; local-only behavior unchanged when disabled.
  • Full gateway unit suite: colcon test --packages-select ros2_medkit_gateway - 0 failures.
  • clang-tidy clean on the changed files.

bburda added 2 commits June 15, 2026 15:36
… via SOVD service interface

list_entity_faults and list_all_faults now call fan_out_get on AggregationManager
(when enabled) after collecting local faults, mirroring the HTTP fault handlers.
get_entity_snapshot returns the same local-cache snapshot; peer entities are
already merged into the cache during discovery refresh.

Peer paths used:
  - list_entity_faults: /api/v1/<type>/<id>/faults (apps/components/areas/functions)
  - list_all_faults: /api/v1/faults

Add test_plugin_context_aggregation with 7 tests covering the no-aggregation
(local-only) baseline and live-mock-peer fan-out cases.
Copilot AI review requested due to automatic review settings June 15, 2026 15:28

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aligns the ROS 2 SOVD service interface fault listing behavior with the HTTP API by fanning out fault queries to aggregation peers (when aggregation.enabled is on), so peer faults become visible via the ROS 2 service path as well.

Changes:

  • Updated GatewayPluginContext::list_entity_faults() to optionally fan out /api/v1/<type>/<id>/faults to peers and merge results.
  • Updated GatewayPluginContext::list_all_faults() to optionally fan out /api/v1/faults to peers and merge results.
  • Added a new GTest (test_plugin_context_aggregation) plus CMake wiring.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
src/ros2_medkit_gateway/src/plugins/plugin_context.cpp Adds aggregation-manager fan-out + merge behavior for plugin-context fault listing APIs.
src/ros2_medkit_gateway/test/test_plugin_context_aggregation.cpp New unit test covering peer fan-out behavior for per-entity and global faults.
src/ros2_medkit_gateway/CMakeLists.txt Registers the new test target and includes it in coverage configuration.

Comment on lines +112 to 124
if (fault_manager_ && fault_manager_->is_available()) {
std::string source_id;
if (entity->type == SovdEntityType::COMPONENT) {
source_id = entity->namespace_path;
} else if (entity->type == SovdEntityType::APP) {
source_id = entity->fqn;
}

auto result = fault_manager_->list_faults(source_id);
if (result.success && result.data.is_array()) {
local_faults = result.data;
}
}
Comment on lines +266 to +272
// Ensure local response has a "faults" array to append to.
if (!response.contains("faults") || !response["faults"].is_array()) {
response["faults"] = nlohmann::json::array();
}
for (const auto & item : fan_result.merged_items) {
response["faults"].push_back(item);
}
Comment on lines +115 to +121
int start() {
port_ = server_->bind_to_any_port("127.0.0.1");
thread_ = std::thread([this]() {
server_->listen_after_bind();
});
return port_;
}
Comment on lines +381 to +386
auto ctx = ros2_medkit_gateway::make_gateway_plugin_context(node_.get(), nullptr, nullptr);
ASSERT_NE(ctx, nullptr);

// "brakes" is in the local manifest. No local fault manager (null), but the
// peer returns a PROTECTIVE_STOP fault via /api/v1/apps/brakes/faults.
auto faults = ctx->list_entity_faults("brakes");
}

auto result = fault_manager_->list_faults(source_id);
if (result.success && result.data.is_array()) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list_entity_faults drops every local fault here: result.data.is_array() is always false because FaultManager::list_faults returns an object {"faults":[...],"count":N} (fault_manager.cpp:209), not an array. The old code had the same check and silently returned empty for COMPONENT/APP all along; now the method returns peer faults only, so the local+peer merge never happens. Read the array out: if (result.success && result.data.contains("faults") && result.data["faults"].is_array()) local_faults = result.data["faults"];

const std::string type_segment = entity_type_to_url_segment(entity->type);
if (!type_segment.empty()) {
const std::string peer_path = "/api/v1/" + type_segment + "/" + entity_id + "/faults";
auto fan_result = agg->fan_out_get(peer_path, "");

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fan-out passes an empty Authorization header, but the HTTP mirror forwards the caller credential (fault_handlers.cpp:446). Against an authenticated peer every request 401s and the peer contributes nothing. Thread an auth/service token through the service request, or document this path as trusted/loopback-only. Reusing fan_out_helpers.hpp's merge_peer_items (which already handles auth + failed_peers) would avoid re-deriving this inline.

if (!type_segment.empty()) {
const std::string peer_path = "/api/v1/" + type_segment + "/" + entity_id + "/faults";
auto fan_result = agg->fan_out_get(peer_path, "");
if (fan_result.merged_items.is_array()) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both methods use only merged_items and drop fan_result.is_partial / fan_result.failed_peers, so a peer that is down, 401s, or times out silently truncates the fault list with no field and no log - whereas the HTTP path surfaces both into x-medkit (fault_handlers.cpp:452). For a diagnostics gateway that's a real hazard. At least RCLCPP_WARN on is_partial listing the failed peers; ideally carry partial/failed_peers in the service response.

std::string source_id;
if (entity->type == SovdEntityType::COMPONENT) {
source_id = entity->namespace_path;
} else if (entity->type == SovdEntityType::APP) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

source_id is set only for COMPONENT and APP; FUNCTION and AREA fall through with source_id == "", so list_faults("") returns ALL local faults as that entity's faults (once the extraction bug above is fixed). The HTTP path filters FUNCTION faults by host-app FQNs and rejects types without a faults collection (validate_collection_access, SOVD Table 8). Please mirror that scoping/validation, and skip the peer fan-out for types with no faults collection.

response["faults"] = nlohmann::json::array();
}
for (const auto & item : fan_result.merged_items) {
response["faults"].push_back(item);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Peer items are appended to response["faults"], but response is result.data which also carries "count" from the local query - so count is now stale (local-only) while faults includes peers. Recompute count = response["faults"].size() after the merge, or drop it.


// @verifies REQ_INTEROP_013
TEST_F(PluginContextWithPeerTest, ListEntityFaults_IncludesPeerFault) {
auto ctx = ros2_medkit_gateway::make_gateway_plugin_context(node_.get(), nullptr, nullptr);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every fixture builds the context with a null fault_manager (make_gateway_plugin_context(node_.get(), nullptr, nullptr)), so the local+peer MERGE is never exercised - only the peer-only path. That's exactly the gap that hides the list_entity_faults extraction bug. Please add a fixture with a real fault_manager returning local faults and assert both local and peer faults appear (plus FUNCTION/AREA, an authenticated peer, and a down peer).

}

int start() {
port_ = server_->bind_to_any_port("127.0.0.1");

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start() dereferences server_ via bind_to_any_port, but server_ is only created lazily in server(). It works because the fixtures call server() first, but a caller that does start() first segfaults. Create the server here too (or assert it is set).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SOVD service interface returns only local faults, not aggregated peer faults

3 participants