fix(metrics): align metric descriptions with updated docs#2017

Open
poroh wants to merge 1 commit into NVIDIA:main from poroh:metrics-docs-nico-health-descriptions

@poroh
Contributor

@poroh poroh commented May 29, 2026

Description

The docs were manually updated. This PR fixes the metric names in the code to match the docs.
It also adds metrics that are generated during the integration test run.

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@poroh poroh requested review from a team and Coco-Ben as code owners May 29, 2026 22:52
@poroh poroh enabled auto-merge (squash) May 29, 2026 22:57
@srinivasadmurthy
Contributor

Should we add the core_metrics.md to .gitignore so that people don't check it in unnecessarily?

@poroh
Contributor Author

poroh commented May 29, 2026

> Should we add the core_metrics.md to .gitignore so that people don't check it in unnecessarily?

No. core_metrics.md should be checked in whenever it changes. The problem was that this autogenerated file was edited manually without changing the code-level source.
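To illustrate the kind of drift described here, the following is a minimal sketch of a check that compares code-level metric definitions against the rows of a generated table. It is not this repository's generator: `METRICS`, `render_rows`, `parse_rows`, and `find_drift` are hypothetical names, and the row layout is assumed from the `<tr>` lines visible in the diff.

```python
import re

# Hypothetical code-level source of truth: (name, type, description).
METRICS = [
    ("nico_switches_total", "gauge",
     "The total number of nico_switches in the system"),
    ("carbide_active_host_firmware_update_count", "gauge",
     "The number of host machines in the system currently working on "
     "updating their firmware."),
]

# Assumed row layout: one <tr> with three <td> cells per metric.
ROW_RE = re.compile(r"<tr><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td></tr>")


def render_rows(metrics):
    """Render one table row per metric definition."""
    return [f"<tr><td>{n}</td><td>{t}</td><td>{d}</td></tr>"
            for n, t, d in metrics]


def parse_rows(doc_text):
    """Extract (name, type, description) tuples from table rows."""
    return [m.groups() for m in ROW_RE.finditer(doc_text)]


def find_drift(metrics, doc_text):
    """Return (missing_from_doc, unexpected_in_doc), both sorted."""
    expected = set(metrics)
    documented = set(parse_rows(doc_text))
    return sorted(expected - documented), sorted(documented - expected)
```

Run in CI against the checked-in file, a check like this would report a manual rename in the document as one missing row and one unexpected row, instead of letting the edit pass unnoticed.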

Signed-off-by: Dmitry Porokh <dporokh@nvidia.com>
@poroh poroh force-pushed the metrics-docs-nico-health-descriptions branch from d567271 to d9245f5 Compare May 29, 2026 23:38
<tr><td>nico_switches_total</td><td>gauge</td><td>The total number of nico_switches in the system</td></tr>
<tr><td>nico_total_ips_count</td><td>gauge</td><td>The total number of ips in the site</td></tr>
<tr><td>nico_unavailable_dpu_nic_firmware_update_count</td><td>gauge</td><td>The number of machines in the system that need a firmware update but are unavailable for update.</td></tr>
<tr><td>carbide_active_host_firmware_update_count</td><td>gauge</td><td>The number of host machines in the system currently working on updating their firmware.</td></tr>
Contributor Author

@poroh poroh May 29, 2026

These metrics were wrongly renamed by #1974. The actual metric names have not been renamed yet. This file is auto-generated and should stay that way, without manual editing. See line 3.

3 participants