Skip to content

feat(health): add VelocityHealthCheck for K8s readiness probe (#35602)#35771

Open
dsolistorres wants to merge 1 commit into
mainfrom
issue-35602-velocity-health-check
Open

feat(health): add VelocityHealthCheck for K8s readiness probe (#35602)#35771
dsolistorres wants to merge 1 commit into
mainfrom
issue-35602-velocity-health-check

Conversation

@dsolistorres
Copy link
Copy Markdown
Member

Summary

Closes #35602 (follow-up implementation from spike #35329, companion to #35601 / #35768).

Adds a readiness-only health check that verifies the Velocity engine's global macro library is registered. When the library failed to load (see spike #35329), engine.init() can return successfully while every public page renders #renderMarks / #editContentlet as literal text. This check surfaces that state through /readyz so K8s can stop routing traffic to the affected pod.

Defense-in-depth alongside the fail-loud flag in #35601: that flag handles the common case (init throws → existing ServletContainerHealthCheck gates traffic via dotcms.started.up). This check covers the opt-out case (velocimacro.library.fail-on-missing=false) and any future post-init macro-registry corruption.

Changes

  • VelocityHealthCheck (new)com.dotcms.health.checks.cdi.VelocityHealthCheck extends HealthCheckBase. Evaluates the probe #renderMarks($null) through VelocityUtil.getEngine() and returns DOWN when the rendered output contains the literal #renderMarks( marker.
  • CoreHealthCheckProvider — register the new check alongside the existing CDI checks (database, cache, elasticsearch).
  • VelocityHealthCheckTest (new) — three unit tests covering the UP path, the literal-rendering DOWN path, and a propagated-exception DOWN path.

Design notes

  • isLivenessCheck() returns false — a one-time init failure should remove the pod from the load balancer, not trigger a restart loop.
  • isReadinessCheck() returns true.
  • getOrder() returns 110 — runs after database (default 100), cache (30), and elasticsearch (40).
  • Default mode is PRODUCTION (no monitor-mode downgrade) — when this is DOWN the system genuinely cannot render correctly.
  • Skips during shutdown.
  • The check is small and cheap — 0ms in practice on a healthy engine.

Test plan

  • VelocityHealthCheckTest — 3/3 pass:
    • UP when probe renders macro body output
    • DOWN when probe renders the literal directive (macro not registered)
    • DOWN when VelocityUtil.getEngine() throws
  • Local smoke test on a real dotCMS startup: just dev-run, then GET /dotmgt/health:
    {
      "name": "velocity",
      "status": "UP",
      "durationMs": 0,
      "message": "Velocity macro library registered (#renderMarks resolved) [0ms]",
      "monitorModeApplied": false
    }
    Confirmed alongside the other 10 health checks (database, cache, elasticsearch, etc.).

🤖 Generated with Claude Code

Adds a readiness-only health check that verifies the Velocity engine's global
macro library is registered by evaluating "#renderMarks(\$null)" through
VelocityUtil.getEngine() and detecting the literal directive marker in the
output. When the macro library failed to load (see spike #35329), the engine
reports init success while every public page renders macros as literal text;
this check surfaces that state through /readyz so K8s can stop routing traffic.

isLivenessCheck() returns false — a one-time init failure should remove the
pod from the load balancer, not trigger a restart loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 20, 2026

Claude finished @dsolistorres's task in 4m 28s —— View job


🔍 dotCMS Backend Review

  • Load dotCMS conventions
  • Get Java diff
  • Run 4 specialized reviewers (Security, Database, Java Standards, REST API) in parallel
  • Consolidate findings
  • Post review comment

Result

All four reviewers returned NO_FINDINGS. Consolidated review posted as a separate comment with the `` marker so subsequent pushes update it in place.

Reviewer Result
🔐 Security No issues — probe template is a hardcoded literal, no user input, no PII/secrets, error messages don't leak internals
🗄️ Database No issues — no SQL/DotConnect/Hibernate/CacheLocator activity in the change
☕ Java Standards No issues — throws Exception and throw new Exception(...) match the established pattern in sibling checks (CacheHealthCheck, ElasticsearchHealthCheck); all @Overrides present; Java 11 syntax; correct Logger.debug usage
🌐 REST API No issues — no JAX-RS surface in this PR; check is consumed by the pre-existing health endpoint infrastructure

Note on formal review

I was unable to submit a formal gh pr review --approve from this context (approval is outside my permissions — see the FAQ). The consolidated review comment is posted; a human approver still needs to formally approve the PR.
· Branch: issue-35602-velocity-health-check

@dsolistorres dsolistorres self-assigned this May 20, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 20, 2026

dotCMS Backend Review: no issues found.

@dsolistorres
Copy link
Copy Markdown
Member Author

Heads up — the intermittent Postman failures in Template_Resource and AI collections (with language_id = -1 FK violation) are unrelated to this PR. Tracked separately in #35780, which includes the root-cause analysis and three suggested fixes (preferred: fail-fast 400 in WorkflowResource instead of silent -1L default).

Integration Tests - Junit5 Suite 1 passed in the same run (it needs dotCMS up and serving requests), so startup isn't impacted by the new VelocityHealthCheck. A re-run should clear the matrix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Add VelocityHealthCheck for Kubernetes readiness probe (follow-up to spike #35329)

2 participants