ci: fast static analysis with early-fail lint + inline SARIF (PMD/Checkstyle/SpotBugs) #1396

Draft
joaodinissf wants to merge 1 commit into dsldevkit:master from joaodinissf:ci/static-analysis-sarif

@joaodinissf
Collaborator

Problem

Today a single PMD or Checkstyle nit only surfaces after the ~15-minute build: maven-verify runs clean verify and re-runs every static-analysis goal on top of it. So "I forgot a semicolon" feedback arrives ~14 minutes late, and the analysis work is duplicated.

What this does

Four independent parallel jobs (verify.yml), so feedback is early and findings show up inline:

| Job | Runs | Gate |
|---|---|---|
| lint | compile + pmd:pmd + checkstyle:checkstyle (SARIF) + pmd:cpd-check, -T 2C | count merged SARIF (+ cpd.xml grep) |
| spotbugs | compile + spotbugs:spotbugs (SARIF), -Xmx4g | count merged SARIF |
| maven-verify | clean verify (build + tests only) | Tycho-surefire |
| line-endings | git check | shell |

  • lint goes red in ~3–5 min on its own check, independent of the build — a lint issue no longer waits behind ~14 min of build+tests.
  • SpotBugs is isolated in its own lane (it's the slow analysis), so it never delays lint.
  • maven-verify drops the redundant analysis goals it used to re-run.
  • All three analyzers emit SARIF 2.1.0, merged per tool and uploaded to Code Scanning (security-events: write) → inline annotations on the diff + Security tab. No custom annotator script.
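
To illustrate the "merged per tool" step, here is a minimal Python sketch that concatenates the `runs` arrays of several SARIF 2.1.0 documents into one file. It is not the mechanism this PR uses (the actual merge lives in verify.yml and is not shown here); the function names `merge_sarif` and `merge_sarif_files` are hypothetical, and concatenating runs rather than folding results into a single run is an assumption.

```python
import json
from pathlib import Path
from typing import Iterable

SARIF_VERSION = "2.1.0"


def merge_sarif(docs: Iterable[dict]) -> dict:
    """Combine several SARIF documents by concatenating their runs.

    Hypothetical helper: each input is an already-parsed SARIF log
    (a dict with a top-level "runs" list). Inputs are not modified.
    """
    merged = {"version": SARIF_VERSION, "runs": []}
    for doc in docs:
        # A report with no "runs" key contributes nothing.
        merged["runs"].extend(doc.get("runs", []))
    return merged


def merge_sarif_files(paths: Iterable[str], out_path: str) -> dict:
    """Read per-module SARIF files, merge them, and write one file."""
    docs = [json.loads(Path(p).read_text(encoding="utf-8")) for p in paths]
    merged = merge_sarif(docs)
    Path(out_path).write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return merged
```

In a multi-module reactor, each module would typically produce its own report, so a call such as `merge_sarif_files(per_module_reports, "pmd-merged.sarif")` (paths hypothetical) would yield one upload per tool.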

Why count-gate, not the :check goals

The pmd:check / spotbugs:check goals @Execute-fork a second analysis and can't emit SARIF; run without the full compile classpath they false-positive on type-resolving rules (e.g. InvalidLogMessageFormat on the SLF4J trailing-Throwable idiom). So each report goal runs once (full-reactor compile → correct results + SARIF) and the gate just counts. Full rationale, the report-vs-check tables, and the fidelity caveats are in docs/ci-static-analysis-design.md.
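
The count-gate idea can be sketched in a few lines of Python: parse the merged SARIF, sum the `results` across all runs, and fail when the total is non-zero. This is only an illustration under stated assumptions; the PR's real gate is defined in verify.yml and may use a different tool, and the names `count_sarif_results` and `gate_exit_code` are hypothetical.

```python
def count_sarif_results(sarif: dict) -> int:
    """Return the total number of results across all runs of a SARIF log.

    Expects an already-parsed SARIF 2.1.0 document. Runs without a
    "results" array (or with a null one) count as zero findings.
    """
    return sum(len(run.get("results") or []) for run in sarif.get("runs", []))


def gate_exit_code(sarif: dict) -> int:
    """Map a SARIF log to a process exit code: 0 if clean, 1 otherwise."""
    return 1 if count_sarif_results(sarif) > 0 else 0
```

A CI step would load the merged file with `json.load` and pass the result of `gate_exit_code` to `sys.exit`. The analysis itself runs only once, in the report goal; the gate reads its output.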

Optimizes / trades off

  • Optimizes: feedback latency (lint ~3–5 min vs ~14) and overall wall-clock (~10 min vs ~14.5, since the redundant analysis is gone).
  • Trades off: more total compute — compile runs once per parallel job. This is a deliberate, cheap trade here: the repo is public (CI is free) and low-activity, so we buy early + inline feedback with compute that costs nothing.

Validation

  • Clean run → all four jobs green; wall-clock ~10 min.
  • Planted one violation per tool → lint and spotbugs correctly went red (exact counts), and Code Scanning ingested PMD/Checkstyle/SpotBugs SARIF → annotations appeared on the PR diff + Security tab.

Notes

  • CPD gating is wired (pmd:cpd-check + <duplication> grep) but inert until #1339 ("chore: pmd.cpd.min set to 100000 effectively disables CPD") lowers the token threshold; pmd.cpd.min=100000 currently disables it. Tracked separately. CPD has no SARIF renderer, so it gates but doesn't annotate inline.
  • Branch protection (admin, at merge): required checks change from pmd + checkstyle to lint + spotbugs; keep line-endings, maven-verify.
  • See also docs/ci-measurement-protocol.md for how to gather trustworthy timing medians.
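
For the CPD gate mentioned in the first note, a structured equivalent of the `<duplication>` grep might look like the Python sketch below. It is an assumption-laden illustration rather than the PR's implementation (which greps cpd.xml in the workflow): the helper name `count_cpd_duplications` is hypothetical, and the code matches on the element's local name so that it works whether or not the report declares an XML namespace.

```python
import xml.etree.ElementTree as ET


def count_cpd_duplications(xml_text: str) -> int:
    """Count <duplication> elements in a CPD XML report.

    Hypothetical helper. Tags are compared by local name, so a
    namespaced tag like "{uri}duplication" also matches.
    """
    root = ET.fromstring(xml_text)
    return sum(
        1 for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "duplication"
    )
```

A gate built on this would fail the job when the count is greater than zero, mirroring what the grep does.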

Supersedes #1333 (sequential, Python) and #1337 (parallel, Python).

🤖 Generated with Claude Code

Replace the pmd/checkstyle jobs with a parallel shape that gives early,
inline feedback and stops re-running analysis inside the build:

  - lint: compile + pmd:pmd + checkstyle:checkstyle (SARIF) + pmd:cpd-check,
    -T 2C, --fail-never; gates by counting the merged SARIF (+ cpd.xml grep).
    Fails in ~3-5 min on its own check, independent of the build.
  - spotbugs: compile + spotbugs:spotbugs (SARIF), -Xmx4g, own parallel lane
    (the slow analysis).
  - maven-verify: build + tests only; the redundant checkstyle/pmd/spotbugs
    goals are dropped (now owned by lint/spotbugs).
  - line-endings: unchanged.

All three emit SARIF 2.1.0, merged per tool and uploaded to Code Scanning
(security-events: write) for inline annotations on the PR diff + Security tab.
No custom Python annotator.

Count-gate rather than the *:check goals: the check goals @Execute-fork a
second analysis and cannot emit SARIF, and without the full compile classpath
they false-positive on type-resolving rules. Each report goal runs once
(full-reactor compile -> correct + SARIF) and the gate counts the result.
Rationale + tables in docs/ci-static-analysis-design.md; measurement protocol
in docs/ci-measurement-protocol.md.

CPD gating is wired but inert until dsldevkit#1339 lowers the token threshold.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>