Unbounded debounce counter of faults #428

@evTessellate

Summary

While using the ros2_medkit reporter, manager, and gateway, I noticed that some of my faults seemed to be "stuck" in a PREPASSED state even while the fault was actively occurring. Only after some time did they appear as CONFIRMED faults. I investigated the code and found something unexpected: the debounce counter is unbounded (except by the maximum integer value). My periodic heal() calls therefore push the debounce counter to a very large value, and it then takes many fault reports to bring it back down. As a result, the system accumulates inertia which, in my experience with integral controllers in control theory, can lead to undesired behavior.
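To make the inertia concrete, here is a minimal Python sketch of the behavior I am describing. It is a toy model, not the ros2_medkit code: the class name, the default threshold of -3, and the convention that heal() adds 1 while a fault report subtracts 1 are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class UnboundedDebounce:
    """Hypothetical model of an unbounded debounce counter (not ros2_medkit code)."""

    confirmation_threshold: int = -3  # assumed: counter <= this means CONFIRMED
    counter: int = 0

    def heal(self) -> None:
        # PASSED event: the counter grows with no upper bound.
        self.counter += 1

    def report_fault(self) -> None:
        # FAILED event: the counter moves toward the confirmation threshold.
        self.counter -= 1

    def is_confirmed(self) -> bool:
        return self.counter <= self.confirmation_threshold


def reports_until_confirmed(heal_calls: int) -> int:
    """Count the fault reports needed to confirm after `heal_calls` heals."""
    debounce = UnboundedDebounce()
    for _ in range(heal_calls):
        debounce.heal()
    reports = 0
    while not debounce.is_confirmed():
        debounce.report_fault()
        reports += 1
    return reports
```

In this model, the number of fault reports needed for confirmation grows linearly with the number of heal() calls made beforehand, which is the "stuck in PREPASSED" effect I observed.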

Here are the debounce_counter increment lines that are only limited by the int maximum value:

I then looked into the AUTOSAR DEM counter logic (https://www.autosar.org/fileadmin/standards/R20-11/CP/AUTOSAR_SWS_DiagnosticEventManager.pdf). It appears that the PASSED and FAILED statuses should have thresholds that serve as their confirmation criteria, set at -128 and 128 in that standard. As I understand it, an ECU that reports a fault applies a multiplicative value to heal or fail faster. In short, a gain determines how many occurrences are needed before passing or failing, but the thresholds themselves are bounded.
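As a rough illustration of that idea, the sketch below models a counter with per-direction step sizes (the "gain") that saturates at its thresholds. It follows the convention in which failing counts up and passing counts down. The function names, step sizes, and threshold values are illustrative assumptions, not values or identifiers taken from the specification.

```python
def debounce_step(counter: int, failed: bool, *, fail_step: int, pass_step: int,
                  failed_threshold: int, passed_threshold: int) -> int:
    """Apply one report and clamp the counter to [passed_threshold, failed_threshold]."""
    if failed:
        return min(counter + fail_step, failed_threshold)
    return max(counter - pass_step, passed_threshold)


def reports_to_reach(target: int, failed: bool, **cfg: int) -> int:
    """Count identical reports needed to move the counter from 0 to `target`."""
    counter, reports = 0, 0
    while counter != target:
        counter = debounce_step(counter, failed, **cfg)
        reports += 1
    return reports


# Illustrative configuration only: failing is weighted five times more than passing.
EXAMPLE_CFG = dict(fail_step=5, pass_step=1, failed_threshold=10, passed_threshold=-10)
```

With this configuration, two failed reports reach the failed threshold from zero while ten passed reports are needed to reach the passed threshold, and further reports in either direction leave the counter at the bound.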

I understand that confirmation_threshold and healing_threshold simulate that gain, but because the debounce counter is unbounded, the logic differs from the AUTOSAR DEM specification.

Is there a reason the debounce counter has no bounds? If there is, I could stop calling heal() constantly while the system is healthy, although I suspect race conditions at initialization could still prevent the system from healing normally. If there is not, I propose adding bounds equal to confirmation_threshold and healing_threshold.


Proposed solution (optional)

Add bounds on the debounce counter equal to the confirmation and healing thresholds. A new hysteresis mechanism would also need to be designed so that a single heal after a fault does not automatically put the system into a PREPASSED state. I believe the AUTOSAR DEM standard has a solution for this.
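A possible shape for this, sketched in Python under stated assumptions: the counter is clamped to [confirmation_threshold, healing_threshold], and the confirmed status is latched so that it only changes when the counter reaches the opposite bound. The class name, the default thresholds, the sign convention (heal() counts up), and the latch itself are hypothetical and only meant to start the discussion.

```python
from dataclasses import dataclass


@dataclass
class BoundedDebounce:
    """Hypothetical clamped debounce counter with a latched status."""

    confirmation_threshold: int = -3  # assumed lower bound of the counter
    healing_threshold: int = 3        # assumed upper bound of the counter
    counter: int = 0
    confirmed: bool = False           # latched: changes only at a bound

    def report_fault(self) -> None:
        # Clamp at the lower bound so the counter cannot wind down indefinitely.
        self.counter = max(self.counter - 1, self.confirmation_threshold)
        if self.counter == self.confirmation_threshold:
            self.confirmed = True

    def heal(self) -> None:
        # Clamp at the upper bound so periodic heals cannot wind the counter up.
        self.counter = min(self.counter + 1, self.healing_threshold)
        if self.counter == self.healing_threshold:
            self.confirmed = False
```

In this sketch, the worst-case number of fault reports needed to confirm is healing_threshold - confirmation_threshold regardless of how many heal() calls came before, and a single heal() after confirmation moves the counter without clearing the confirmed status.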

