Unbounded debounce counter of faults

## Summary

While using the ros2_medkit reporter, manager and gateway, I noticed that some of my faults seemed to be "stuck" in a PREPASSED state even while a fault was occurring. Only after some time did they appear as CONFIRMED faults. I investigated the code and found something unexpected: the debounce counter is unbounded (except by the maximum integer value). As a result, my periodic heal() calls drive the debounce counter to a very large value, and it then takes just as many fault reports to bring it back down. The system builds up inertia which, from my experience with integral controllers in control theory (integrator windup), could lead to undesired behavior.

Here are the debounce_counter increment lines that are only limited by the int maximum value:

- https://github.com/selfpatch/ros2_medkit/blob/be1b03a6219c7f076383972b9dbc6ac4bb1541c1/src/ros2_medkit_fault_manager/src/sqlite_fault_storage.cpp#L400
- https://github.com/selfpatch/ros2_medkit/blob/be1b03a6219c7f076383972b9dbc6ac4bb1541c1/src/ros2_medkit_fault_manager/src/sqlite_fault_storage.cpp#L453
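
To illustrate the inertia, here is a minimal sketch of how I understand the current behavior. It is not the actual ros2_medkit code: `UnboundedDebounce`, `on_passed`, `on_failed` and `failed_events_until_confirmed` are hypothetical names, and I am assuming that PASSED events add 1, FAILED events subtract 1, and `confirmation_threshold` is negative.

```cpp
#include <cstdint>

// Hypothetical model of an unbounded debounce counter (not the real storage code).
// Assumption: PASSED events add 1 and FAILED events subtract 1.
struct UnboundedDebounce {
  std::int32_t counter = 0;
  void on_passed() { ++counter; }  // e.g. a periodic heal() call
  void on_failed() { --counter; }  // e.g. a fault report
};

// Counts how many consecutive FAILED events are needed before the counter
// reaches confirmation_threshold (assumed negative). Takes the state by value,
// so the caller's counter is left untouched.
int failed_events_until_confirmed(UnboundedDebounce state,
                                  std::int32_t confirmation_threshold) {
  int events = 0;
  while (state.counter > confirmation_threshold) {
    state.on_failed();
    ++events;
  }
  return events;
}
```

Under these assumptions, with a `confirmation_threshold` of -3, a fresh counter needs 3 FAILED events to confirm, while a counter that has first received 1000 PASSED events needs 1003.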

I then looked into the counter-based debounce logic of the AUTOSAR DEM (https://www.autosar.org/fileadmin/standards/R20-11/CP/AUTOSAR_SWS_DiagnosticEventManager.pdf) and it appears that the counter should be bounded by thresholds on the PASSED and FAILED statuses, which serve as their confirmation criteria; the fault detection counter in that standard ranges from -128 to 127. From my understanding, an ECU that reports a fault applies a configurable step size to heal or fail faster. In summary, it is a system where a gain determines the number of occurrences before passing or failing, but with bounded thresholds.
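
Here is a simplified sketch of how I read the counter-based debouncing in that specification, including its jump-up / jump-down behavior. This is my interpretation, not code from AUTOSAR or ros2_medkit: `DemLikeConfig`, `Report` and `dem_like_update` are hypothetical names, and the default values are only illustrative. Note that the sign convention here (pre-failed reports increment) is the opposite of what I observed in ros2_medkit.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical, simplified model of DEM-style counter-based debouncing.
// All names and default values are illustrative assumptions.
struct DemLikeConfig {
  std::int16_t failed_threshold = 127;   // counter saturates here (FAILED)
  std::int16_t passed_threshold = -128;  // counter saturates here (PASSED)
  std::int16_t increment_step = 1;       // added on each pre-failed report
  std::int16_t decrement_step = 1;       // subtracted on each pre-passed report
  bool jump_up = true;                   // restart from jump_up_value when failing
  std::int16_t jump_up_value = 0;
  bool jump_down = true;                 // restart from jump_down_value when passing
  std::int16_t jump_down_value = 0;
};

enum class Report { PreFailed, PrePassed };

std::int16_t dem_like_update(std::int16_t counter, Report report,
                             const DemLikeConfig& cfg) {
  int next = counter;
  if (report == Report::PreFailed) {
    // Jump-up: discard accumulated "passed" credit before counting up.
    if (cfg.jump_up && next < cfg.jump_up_value) next = cfg.jump_up_value;
    next += cfg.increment_step;
  } else {
    // Jump-down: discard accumulated "failed" credit before counting down.
    if (cfg.jump_down && next > cfg.jump_down_value) next = cfg.jump_down_value;
    next -= cfg.decrement_step;
  }
  // The counter never leaves the [passed_threshold, failed_threshold] range.
  const int lo = cfg.passed_threshold;
  const int hi = cfg.failed_threshold;
  return static_cast<std::int16_t>(std::clamp(next, lo, hi));
}
```

With both jumps enabled, a direction change restarts the count from the jump value instead of unwinding whatever had accumulated on the other side, so no inertia can build up.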

I understand that confirmation_threshold and healing_threshold simulate that gain, but since the debounce counter is unbounded, the logic differs from the DEM AUTOSAR specification.

So I was wondering: is there a reason why the debounce counter is not bounded? If there is, I might stop calling heal() constantly while the system is healthy, though I still feel there could be race conditions on initialization that would prevent the system from healing normally. If there is not, I propose adding bounds equal to confirmation_threshold and healing_threshold.

---

## Proposed solution (optional)

Add bounds on the debounce counter equal to the confirmation and healing thresholds. A new hysteresis system should be designed to ensure that healing once from a fault does not put the system into an automatic PREPASSED state. I believe the DEM AUTOSAR standard has a solution for this.
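
A minimal sketch of the bounded update, under my assumptions about the sign convention (PASSED adds 1, FAILED subtracts 1, `confirmation_threshold` negative, `healing_threshold` positive). `DebounceConfig` and `apply_event` are hypothetical names, not the existing API, and the default values are placeholders:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical configuration; field names mirror the thresholds discussed above.
struct DebounceConfig {
  std::int32_t confirmation_threshold = -3;  // assumed negative: fault confirmed here
  std::int32_t healing_threshold = 3;        // assumed positive: fault healed here
};

// Applies one PASSED (passed == true) or FAILED (passed == false) event and
// clamps the result to [confirmation_threshold, healing_threshold].
std::int32_t apply_event(std::int32_t counter, bool passed,
                         const DebounceConfig& cfg) {
  const std::int32_t next = counter + (passed ? 1 : -1);
  return std::clamp(next, cfg.confirmation_threshold, cfg.healing_threshold);
}
```

With this clamp, any number of heal() calls leaves the counter at `healing_threshold`, so the number of fault reports needed to confirm is at most `healing_threshold - confirmation_threshold`. It does not address the hysteresis question above by itself.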
