fix(monitoring): Migrate Grafana to grafana-community chart and fix PVC deadlock#927

Merged: ybezsonov merged 2 commits into main from fix/grafana-helm-chart-migration on Jul 3, 2026
@ybezsonov (Contributor)

Summary

Fixes the workshop bootstrap failure in the performance-platform setup, where Grafana could not be installed/upgraded via Helm. Two independent root causes, both in the setup scripts (infra/scripts/setup/monitoring.sh and infra/scripts/setup/perf-platform.sh):

  1. Deprecated Helm chart. On March 16, 2026, Grafana Labs forked their Helm charts to grafana-community/helm-charts. The grafana chart in the old grafana.github.io/helm-charts repo is now marked deprecated: true and frozen at chart 10.5.15 (Grafana 12.3.1), so installs printed WARNING: This chart is deprecated.

  2. PVC deadlock on upgrade. The chart defaults to deploymentStrategy: RollingUpdate, but the workshop enables persistence on a gp3 EBS volume (ReadWriteOnce). On the plugin upgrade in perf-platform.sh, RollingUpdate starts the new pod before deleting the old one; the new pod cannot attach the RWO volume still held by the old pod, so it hangs until helm --wait hits its deadline: UPGRADE FAILED: context deadline exceeded.
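
The second root cause reduces to a simple condition on two settings. The sketch below is a deliberately simplified illustration, not code from the scripts: the function name `rollout_may_deadlock` is hypothetical, and it ignores the case where the new pod happens to land on the same node as the old one (a ReadWriteOnce volume is limited to one node, not one pod).

```shell
#!/usr/bin/env bash
# Simplified illustration of the deadlock condition described above.
# rollout_may_deadlock is a hypothetical helper, not part of the workshop scripts.
#
# Arguments:
#   $1  Deployment strategy type (RollingUpdate or Recreate)
#   $2  PVC access mode (for example ReadWriteOnce)
# Returns 0 (true) when the new pod may be started while the old pod
# still holds a volume that only one node can attach.
rollout_may_deadlock() {
  local strategy="$1" access_mode="$2"
  [ "$strategy" = "RollingUpdate" ] && [ "$access_mode" = "ReadWriteOnce" ]
}

# Example: report the risk for a given combination.
describe_rollout() {
  if rollout_may_deadlock "$1" "$2"; then
    echo "at risk: $1 with $2"
  else
    echo "ok: $1 with $2"
  fi
}
```

On a live cluster the two inputs could be read with `kubectl get deployment <name> -o jsonpath='{.spec.strategy.type}'` and `kubectl get pvc <name> -o jsonpath='{.spec.accessModes[0]}'`, where the resource names depend on the release.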

Changes

  • Point the grafana chart at the maintained community repo grafana-community/grafana (chart 12.7.2 / Grafana 13.1.0) in both scripts.
  • Keep the original grafana repo in perf-platform.sh for grafana/pyroscope, which was not moved to the community repo.
  • Set deploymentStrategy.type: Recreate in the Grafana values so the old pod is torn down first and releases the EBS volume. The value is persisted on the release, so perf-platform.sh inherits it via --reuse-values.
  • Raise the plugin-install --timeout from 5m to 10m to cover the EBS detach/reattach cycle plus plugin download on the Recreate path.
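
As a rough sketch of how these changes fit together, the functions below print (rather than run) the kind of Helm commands involved. This is not the actual content of monitoring.sh or perf-platform.sh: the release name, namespace, values-file path, and plugin argument are all hypothetical, and the sketch assumes the community repo has already been added under the alias grafana-community.

```shell
#!/usr/bin/env bash
# Sketch only. Release name and namespace are hypothetical; they are
# not taken from monitoring.sh or perf-platform.sh.
GRAFANA_RELEASE="grafana"
GRAFANA_NAMESPACE="monitoring"
GRAFANA_CHART="grafana-community/grafana"   # chart reference named in this PR
GRAFANA_CHART_VERSION="12.7.2"              # chart version named in this PR

# Emit the values fragment that switches the Deployment to Recreate,
# so the old pod releases the RWO volume before the new pod starts.
grafana_strategy_values() {
  cat <<'EOF'
deploymentStrategy:
  type: Recreate
EOF
}

# Print the initial install command. $1 is the path to a values file
# that includes the fragment above.
grafana_install_cmd() {
  local values_file="$1"
  printf 'helm upgrade --install %s %s --version %s --namespace %s --values %s --wait\n' \
    "$GRAFANA_RELEASE" "$GRAFANA_CHART" "$GRAFANA_CHART_VERSION" \
    "$GRAFANA_NAMESPACE" "$values_file"
}

# Print the later plugin upgrade command. --reuse-values carries the
# Recreate strategy over from the release; --timeout 10m leaves room for
# the EBS detach/reattach cycle. $1 is a placeholder plugin id.
grafana_plugin_upgrade_cmd() {
  local plugin="$1"
  printf 'helm upgrade %s %s --version %s --namespace %s --reuse-values --set plugins={%s} --wait --timeout 10m\n' \
    "$GRAFANA_RELEASE" "$GRAFANA_CHART" "$GRAFANA_CHART_VERSION" \
    "$GRAFANA_NAMESPACE" "$plugin"
}
```

Printing the commands keeps the sketch safe to run anywhere; for example, `grafana_plugin_upgrade_cmd some-plugin-id` shows the upgrade invocation without touching a cluster.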

Verification

  • Confirmed the old grafana chart is deprecated: true and the community repo serves an active grafana chart (12.7.2).
  • Confirmed pyroscope still resolves only from the old repo.
  • Confirmed every values key the scripts pass (admin.existingSecret/userKey/passwordKey, service, persistence, resources, sidecar.dashboards/datasources, plugins, grafana.ini) exists in the community 12.7.2 chart.
  • bash -n passes on both scripts. CI green (the one flaky failure was a transient Maven Central 403 on spring-boot-starter-parent:4.1.0, unrelated to these changes; cleared on re-run).
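
The deprecation check can be reproduced along these lines. This is a minimal sketch, assuming the chart metadata is piped in as YAML text (for instance from `helm show chart`); the helper name `chart_is_deprecated` is hypothetical and is not part of either script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads chart metadata (Chart.yaml content) on stdin
# and succeeds when it carries a top-level "deprecated: true" field.
chart_is_deprecated() {
  grep -Eq '^deprecated:[[:space:]]*true[[:space:]]*$'
}

# Example wrapper: label the metadata arriving on stdin.
report_chart_status() {
  if chart_is_deprecated; then
    echo "deprecated"
  else
    echo "active"
  fi
}
```

With both repos added, `helm show chart grafana/grafana | report_chart_status` and `helm show chart grafana-community/grafana | report_chart_status` would be one way to compare the old and new charts.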

Yuriy Bezsonov added 2 commits July 3, 2026 17:07
The grafana chart in grafana.github.io/helm-charts is now deprecated
(frozen at chart 10.5.15 / Grafana 12.3.1, marked deprecated: true) after
Grafana Labs forked the charts to grafana-community/helm-charts on
March 16, 2026. Installs emit "WARNING: This chart is deprecated".

Point the grafana chart at the maintained community repo
(grafana-community/grafana, chart 12.7.2 / Grafana 13.1.0) in both
monitoring.sh and perf-platform.sh. All values keys used by the scripts
(admin, service, persistence, resources, sidecar, plugins, grafana.ini)
remain valid in the new chart.

The pyroscope chart was not moved, so perf-platform.sh keeps the original
grafana repo alongside the new grafana-community repo for grafana/pyroscope.

The Grafana chart defaults to deploymentStrategy RollingUpdate, but the
workshop enables persistence on a gp3 EBS volume (ReadWriteOnce). On the
plugin upgrade in perf-platform.sh, RollingUpdate starts a new pod before
deleting the old one; the new pod cannot attach the RWO volume still held
by the old pod, so it hangs with a Multi-Attach error until helm --wait
hits its deadline (UPGRADE FAILED: context deadline exceeded at line 292).

Set deploymentStrategy.type: Recreate in the Grafana values so the old pod
is terminated first and releases the volume. The value is persisted on the
release, so perf-platform.sh picks it up via --reuse-values.

Also raise the plugin-install --timeout from 5m to 10m to allow for the
EBS detach/reattach cycle plus plugin download on the Recreate path.
@ybezsonov ybezsonov merged commit 70bb517 into main Jul 3, 2026
70 of 71 checks passed
@ybezsonov ybezsonov deleted the fix/grafana-helm-chart-migration branch July 3, 2026 16:19