
chore: enable CPD detection (pmd.cpd.min → 100), de-dup and baseline #1397

Draft
joaodinissf wants to merge 2 commits into dsldevkit:master from joaodinissf:ci/cpd-min-threshold


joaodinissf (Collaborator) commented May 30, 2026

Closes #1339.

Note

Stacked on #1396. The first commit (ci: fast static analysis…) belongs to #1396 and will drop out once that merges; review only the second commit here (chore: enable CPD…). I'll rebase to a clean single commit after #1396 lands.

Problem

ddk-parent/pom.xml set pmd.cpd.min=100000, which effectively disables PMD's copy/paste detector — pmd:cpd-check passes on virtually any duplication (#1339).

What this does

  • Lowers pmd.cpd.min to 100 (PMD's default), so CPD actually detects duplication going forward, and the lint job's CPD gate (introduced in #1396) becomes meaningful.
  • At 100 the reactor surfaces 9 pre-existing duplications (5 file-groups). Each was assessed for whether it is genuinely extractable logic or incidental/deliberate similarity:
    • 2 refactored away (C, E) — genuine duplicated logic; behavior-neutral extraction.
    • 3 baselined (A, B, D) with // CPD-OFF markers carrying a per-site reason — incidental token overlap, or deliberately-explicit code where sharing would hurt readability.
  • Verified: check.ui and xtext compile, and pmd:cpd-check is clean across the full reactor at threshold 100 (0 duplications).

The 9 duplications — disposition

| Group | File(s) | Kind | Disposition | Rationale |
|---|---|---|---|---|
| A | AbstractValidationTest (test core) | test scaffolding | baseline | error-message assembly + assert helpers in a test base; per-test clarity beats a shared extraction that would obscure intent. |
| B | ParameterListMatcherTest (typesystem test) | test cases | baseline | sibling testForceMatchByPosition* cases share an arrange-act-assert skeleton, kept explicit for readability. |
| C | Check{Validator,Quickfix}ExtensionHelper (prod UI) | helper pair | refactored | identical isExtensionUpdateRequired guard chain lifted to AbstractCheckExtensionHelper, keyed on the existing getExtensionPointId() plus a new protected getTargetClassName() hook. The documentation-helper subtree (no target class) declares that hook unsupported once in its shared parent. |
| D | DispatchingCheckImpl (prod runtime) | incidental | baseline | the matched spans are incidental token overlap (DI field declarations + the trace/try/run/catch/finally idiom), not an extractable unit. |
| E | Abstract{,Streaming}FingerprintComputer (prod) | helper pair | refactored | the identical private featureIterable helper extracted to a shared package-private FingerprintFeatures utility. |
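
For readers unfamiliar with the shape of the group C refactoring, here is a minimal sketch of the hook-based extraction. It is not the PR's code: `ExtensionElement`, the string IDs, `ExampleDocumentationHelper`, and the bodies of both checks are hypothetical simplifications, and the real methods presumably take Eclipse plugin-model arguments. Only the class and method names `AbstractCheckExtensionHelper`, `AbstractCheckDocumentationExtensionHelper`, `CheckValidatorExtensionHelper`, `isExtensionUpdateRequired`, `isTargetClassExtensionUpdateRequired`, `getExtensionPointId`, and `getTargetClassName` come from the description.

```java
import java.util.Objects;

// Hypothetical stand-in for a plugin extension element; the real model types are not shown here.
final class ExtensionElement {
  final String pointId;
  final String targetClass;

  ExtensionElement(String pointId, String targetClass) {
    this.pointId = pointId;
    this.targetClass = targetClass;
  }
}

abstract class AbstractCheckExtensionHelper {
  protected abstract String getExtensionPointId();

  // New hook: the class name a helper expects its extension to reference.
  protected abstract String getTargetClassName();

  // Coarse check kept on the base class: only the extension point is compared (assumed logic).
  public boolean isExtensionUpdateRequired(ExtensionElement element) {
    return pointMismatch(element);
  }

  // Shared guard chain. It avoids calling the overridable method above,
  // so subclasses can delegate to it without recursing.
  protected final boolean isTargetClassExtensionUpdateRequired(ExtensionElement element) {
    return pointMismatch(element)
        || !Objects.equals(element.targetClass, getTargetClassName());
  }

  private boolean pointMismatch(ExtensionElement element) {
    return !Objects.equals(element.pointId, getExtensionPointId());
  }
}

// One of the two formerly duplicated helpers; it now only supplies the two keys.
class CheckValidatorExtensionHelper extends AbstractCheckExtensionHelper {
  @Override protected String getExtensionPointId() { return "example.validator"; }
  @Override protected String getTargetClassName() { return "example.MyValidator"; }
  @Override public boolean isExtensionUpdateRequired(ExtensionElement element) {
    return isTargetClassExtensionUpdateRequired(element);
  }
}

// Documentation helpers have no target class, so the hook is declared unsupported once.
abstract class AbstractCheckDocumentationExtensionHelper extends AbstractCheckExtensionHelper {
  @Override protected final String getTargetClassName() {
    throw new UnsupportedOperationException("documentation extensions have no target class");
  }
}

// Hypothetical documentation helper that relies on the coarse base check only.
class ExampleDocumentationHelper extends AbstractCheckDocumentationExtensionHelper {
  @Override protected String getExtensionPointId() { return "example.toc"; }
}
```

In this sketch the documentation helper never reaches the unsupported hook, because it inherits the coarse check and never calls the shared guard chain.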

Notes

🤖 Generated with Claude Code

Replace the pmd/checkstyle jobs with a parallel shape that gives early,
inline feedback and stops re-running analysis inside the build:

  - lint: compile + pmd:pmd + checkstyle:checkstyle (SARIF) + pmd:cpd-check,
    -T 2C, --fail-never; gates by counting the merged SARIF (+ cpd.xml grep).
    Fails in ~3-5 min on its own check, independent of the build.
  - spotbugs: compile + spotbugs:spotbugs (SARIF), -Xmx4g, own parallel lane
    (the slow analysis).
  - maven-verify: build + tests only; the redundant checkstyle/pmd/spotbugs
    goals are dropped (now owned by lint/spotbugs).
  - line-endings: unchanged.

All three emit SARIF 2.1.0, merged per tool and uploaded to Code Scanning
(security-events: write) for inline annotations on the PR diff + Security tab.
No custom Python annotator.

Count-gate rather than the *:check goals: the check goals @Execute-fork a
second analysis and cannot emit SARIF, and without the full compile classpath
they false-positive on type-resolving rules. Each report goal runs once
(full-reactor compile -> correct + SARIF) and the gate counts the result.
Rationale + tables in docs/ci-static-analysis-design.md; measurement protocol
in docs/ci-measurement-protocol.md.

CPD gating is wired but inert until dsldevkit#1339 lowers the token threshold.
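
To illustrate what counting the CPD report amounts to, the sketch below counts `<duplication>` elements in a cpd.xml document and treats any non-zero count as a gate failure. This is an assumption-laden illustration in Java, not the workflow's actual implementation (which the message above describes as a grep); the class name `CpdGateSketch` and its method are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical gate helper: counts duplications reported in a CPD XML report.
final class CpdGateSketch {
  private CpdGateSketch() {}

  // Returns the number of <duplication> elements found in the report text.
  static int countDuplications(String cpdXml) {
    try {
      // Default (non-namespace-aware) parsing matches on the plain tag name,
      // whether or not the report root declares a default namespace.
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(cpdXml.getBytes(StandardCharsets.UTF_8)));
      return doc.getElementsByTagName("duplication").getLength();
    } catch (Exception e) {
      throw new IllegalStateException("could not parse CPD report", e);
    }
  }

  // The gate passes only when the report lists no duplications.
  static boolean gatePasses(String cpdXml) {
    return countDuplications(cpdXml) == 0;
  }
}
```

With the threshold at 100000 such a count is effectively always zero, which is why the gate stays inert until the threshold is lowered.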

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
joaodinissf force-pushed the ci/cpd-min-threshold branch from 5f1c381 to d45faf3 on May 30, 2026 21:30
joaodinissf changed the title from "chore: enable CPD detection (pmd.cpd.min → 100) and baseline existing duplications" to "chore: enable CPD detection (pmd.cpd.min → 100), de-dup and baseline" on May 30, 2026
…aseline

pmd.cpd.min was 100000, which effectively disabled PMD's copy/paste detector
(dsldevkit#1339). Lower it to PMD's default of 100 so CPD catches duplication going
forward.

At threshold 100 the reactor surfaces 9 pre-existing duplications (5 groups).
Two are genuinely extractable logic and are refactored away; the other three
are incidental or deliberately explicit and are baselined with // CPD-OFF
markers that each carry a per-site reason.

Refactored:
- check.ui ExtensionHelper guard chain: the identical isExtensionUpdateRequired
  body in CheckValidatorExtensionHelper and CheckQuickfixExtensionHelper is
  extracted to a shared isTargetClassExtensionUpdateRequired method on
  AbstractCheckExtensionHelper (keyed on the existing getExtensionPointId() and
  a new protected getTargetClassName() hook); the two helpers now delegate to
  it. The base isExtensionUpdateRequired keeps its coarse extension-point check,
  which the documentation helpers rely on via super; that subtree has no target
  class, so it declares the getTargetClassName hook unsupported once in its
  shared parent (AbstractCheckDocumentationExtensionHelper).
- xtext fingerprint featureIterable: the identical private helper in
  AbstractFingerprintComputer and AbstractStreamingFingerprintComputer is
  extracted to a shared package-private FingerprintFeatures utility.
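
A rough sketch of the second extraction, assuming the helper normalizes a feature value into an `Iterable`. The body of `featureIterable`, its signature, and `ExampleFingerprintComputer` are hypothetical; only the names `FingerprintFeatures` and `featureIterable` and the package-private visibility come from the message above.

```java
import java.util.Collections;

// Package-private utility (no public modifier), visible only to classes in the same package.
final class FingerprintFeatures {
  private FingerprintFeatures() {}

  // Assumed behavior: null yields nothing, a many-valued feature is iterated as-is,
  // and a single value is wrapped in a one-element list.
  @SuppressWarnings("unchecked")
  static <T> Iterable<T> featureIterable(Object featureValue) {
    if (featureValue == null) {
      return Collections.emptyList();
    }
    if (featureValue instanceof Iterable) {
      return (Iterable<T>) featureValue;
    }
    return Collections.singletonList((T) featureValue);
  }
}

// Hypothetical caller; both fingerprint computers would call the utility the same way
// instead of each keeping a private copy of the helper.
class ExampleFingerprintComputer {
  int countValues(Object featureValue) {
    int count = 0;
    for (Object ignored : FingerprintFeatures.featureIterable(featureValue)) {
      count++;
    }
    return count;
  }
}
```

A static utility fits here, rather than a shared superclass, when the two computers do not otherwise need a common parent for this helper.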

Baselined (CPD-OFF, with reason):
- DispatchingCheckImpl: incidental token overlap (DI field + trace/try idiom),
  not an extractable unit.
- ParameterListMatcherTest: explicit parameterized test cases, kept readable.
- AbstractValidationTest: test scaffolding; per-test clarity beats extraction.
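
As an illustration of the baselining style, the sketch below shows a `// CPD-OFF` comment carrying its reason inline, closed by PMD's counterpart `// CPD-ON` comment. The class `ExampleDispatcher` and its contents are hypothetical and only mimic the field-plus-try/catch/finally idiom mentioned above; whether the PR's markers are paired with `// CPD-ON` is not stated in the message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class showing where suppression comments would sit.
class ExampleDispatcher {
  // CPD-OFF: field declaration and try/catch/finally idiom overlap with a sibling class token-for-token; not an extractable unit
  private final List<String> trace = new ArrayList<>();

  void run(String name, Runnable check) {
    trace.add("start " + name);
    try {
      check.run();
    } catch (RuntimeException e) {
      trace.add("failed " + name);
    } finally {
      trace.add("end " + name);
    }
  }
  // CPD-ON

  // Code after CPD-ON is analysed for duplication again.
  List<String> trace() {
    return trace;
  }
}
```

Keeping the reason on the marker line means a later reader can judge whether the suppression is still justified without digging through history.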

Verified: check.ui and xtext compile; pmd:cpd-check is clean across the full
reactor at threshold 100.

Closes dsldevkit#1339

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
joaodinissf force-pushed the ci/cpd-min-threshold branch from d45faf3 to b4bfa9d on May 30, 2026 21:44
