feat: add Prometheus metrics for backup recovery window #69

Open
ermakov-oleg wants to merge 1 commit into operasoftware:main from ermakov-oleg:feat/prometheus-metrics

@ermakov-oleg

Summary

Port of upstream #459, #467

Problem: There was no observability into backup health. Operators could not alert on stale backups or monitor recovery point objectives (RPO) without manually querying pgBackRest.

Fix: Implements the cnpg-i Metrics service, exposing two Prometheus gauges:

  • cnpg_pgbackrest_first_recoverability_point — unix timestamp of the earliest restore point (first successful backup stop time)
  • cnpg_pgbackrest_last_available_backup_timestamp — unix timestamp of the most recent completed backup (latest backup stop time)

These allow standard Prometheus alerts like "no backup in the last 24h" or "RPO exceeds 1h".

Implementation:

  • New MetricsServiceImplementation in internal/cnpgi/instance/metrics.go
  • Registers TYPE_METRICS capability in plugin identity
  • Collect() calls pgbackrest info to get the backup catalog, then delegates to getRecoveryWindow(), which uses catalog.FirstRecoverabilityPoint() and catalog.GetLastSuccessfulBackupTime(). These methods filter out errored backups (Start=0 or Stop=0) and use Time.Stop for recoverability
  • Returns 0 for both metrics if no backups exist or credentials fail (graceful degradation)
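
The filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the `backup` type and the `recoveryWindow` function are hypothetical stand-ins for the catalog entries and for getRecoveryWindow(), whose real signatures are not shown here.

```go
package main

import "fmt"

// backup is a hypothetical stand-in for one catalog entry; Start and Stop
// are Unix timestamps in seconds, with 0 marking an errored backup.
type backup struct {
	Start int64
	Stop  int64
}

// recoveryWindow is a hypothetical sketch of the described logic. It skips
// errored backups (Start == 0 or Stop == 0) and returns the earliest and
// latest stop times of the remaining ones. Both values are 0 when no
// usable backup exists, mirroring the graceful-degradation behavior.
func recoveryWindow(backups []backup) (first, last float64) {
	for _, b := range backups {
		if b.Start == 0 || b.Stop == 0 {
			continue // errored backup, not usable for recovery
		}
		stop := float64(b.Stop)
		if first == 0 || stop < first {
			first = stop
		}
		if stop > last {
			last = stop
		}
	}
	return first, last
}

func main() {
	first, last := recoveryWindow([]backup{
		{Start: 100, Stop: 160},
		{Start: 200, Stop: 0}, // errored, ignored
		{Start: 300, Stop: 380},
	})
	fmt.Println(first, last) // 160 380
}
```

Using the stop time rather than the start time reflects the point at which a backup becomes usable for restore, as the description notes.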

Unit tests in metrics_test.go cover: nil/empty catalog, single backup, multiple backups, filtering of errored backups, and an all-errored catalog.

Related issues

Signed-off-by: ermakov-oleg <ermakovolegs@gmail.com>
@ermakov-oleg
Author

Hi @Agalin, just following up on this PR - would you have a chance to review it when you have time? The changes from all my PRs have been running in our production for a while now without issues, but I’m happy to adjust anything if needed.

@Agalin
Collaborator

Agalin commented May 20, 2026

Sorry for the late reply, my team has been jumping from one P1 to another for the last few months. I'll discuss this solution with the team. Our original plan was to try to embed pgbackrest exporter in the plugin to get the same set of metrics it provides, along with compatibility with its Grafana dashboards.
