Skip to content

test(menu): browser interaction + visual-regression testing (thin slice)#4147

Open
adrianschmidt wants to merge 3 commits into
mainfrom
test/menu-visual-interaction-tests
Open

test(menu): browser interaction + visual-regression testing (thin slice)#4147
adrianschmidt wants to merge 3 commits into
mainfrom
test/menu-visual-interaction-tests

Conversation

@adrianschmidt

@adrianschmidt adrianschmidt commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

What this adds

A first slice of browser-based component testing for lime-elements, proven end-to-end on limel-menu. It covers two complementary things our example-tests didn't do before:

  • Interaction tests — drive the real docs example (#/debug/limel-example-menu-basic) in a real browser: click the trigger, confirm the menu opens, pick an item, and verify the selection is actually emitted out to the consumer (and that the menu closes). Also covers opening via keyboard (Enter). These run anywhere, including locally on macOS.
  • Visual-regression test — a pixel screenshot of the open menu, compared against a committed baseline, so an unintended visual change (layout, styling, a menu that silently stops opening) is caught automatically.

This is intentionally a thin slice on one component. The goal is to establish the reusable pattern and the CI/baseline plumbing; rolling it out to more components is follow-up work.

How the visual baselines stay reliable

Screenshots render differently on macOS vs. the Linux CI machines (fonts, anti-aliasing), which normally makes pixel comparison flaky. To avoid that, baselines are only ever generated and compared inside one pinned Docker image (mcr.microsoft.com/playwright:v1.60.0-jammy) that matches the CI environment exactly — so the same pixels are produced on a developer's Mac and in CI.

  • Locally: npm run test:examples:visual:update regenerates baselines (needs Docker); npm run test:examples:visual compares.
  • On a plain host run (npm run test:examples:components), the interaction tests run and the pixel test is automatically skipped rather than failing on a font mismatch.
  • A new CI job runs the suite inside that same image on every PR.

See example-tests/components/README.md for the full developer workflow.

Commits

  1. test(menu) — interaction tests + the per-component suite wiring
  2. test(menu) — the gated visual-regression baseline + Docker/config plumbing
  3. ci — the PR job that runs it all in the pinned container

Verification

  • Interaction suite: deterministic locally (ran repeatedly, consistently green).
  • Visual test: generated and compared cleanly inside the Docker image.
  • Host run correctly skips the visual test.

The one thing only CI can confirm is the new job's Node setup inside the container — this draft PR's checks are the gate for that. Opening as a draft until those pass.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added browser-based component checks for example components, including interaction and visual snapshot coverage.
    • Added a containerized test runner for consistent snapshot results in CI.
  • Bug Fixes

    • Improved screenshot stability by reducing animation- and cursor-related flakiness.
    • Standardized snapshot storage to keep example test baselines predictable.
  • Documentation

    • Added guidance for running example component tests and updating visual snapshots.

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@adrianschmidt, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 18 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0afc82f3-2123-4cbd-8774-eacc0d015dd1

📥 Commits

Reviewing files that changed from the base of the PR and between 08b4a33 and 33dcb29.

⛔ Files ignored due to path filters (1)
  • example-tests/components/menu.spec.ts-snapshots/menu-open.png is excluded by !**/*.png
📒 Files selected for processing (6)
  • .github/workflows/pr-checks.yml
  • example-tests/components/README.md
  • example-tests/components/menu.spec.ts
  • package.json
  • playwright.example-tests.config.ts
  • scripts/visual-tests-docker.sh
📝 Walkthrough

Walkthrough

Adds a Playwright component test suite for limel-menu with interaction and visual regression tests, a digest-pinned Docker script for snapshot generation, Playwright config overrides for stable baselines, four new npm scripts, and a component-tests CI job that depends on the prebuilt docs artifact.

Changes

Component Example Playwright Tests

Layer / File(s) Summary
Playwright snapshot config and npm scripts
playwright.example-tests.config.ts, package.json
Adds snapshotPathTemplate, animation/caret-freeze options to the Playwright config, and four npm scripts (test:examples:components, test:examples:components:ci, test:examples:visual, test:examples:visual:update).
Docker visual test runner script
scripts/visual-tests-docker.sh
New Bash script that runs tests inside a digest-pinned Playwright container, isolates node_modules via anonymous volume, sets RUN_VISUAL_SNAPSHOTS=1, and forwards CLI arguments.
limel-menu component test suite
example-tests/components/menu.spec.ts, example-tests/components/README.md
Adds scoped locator helpers, click and keyboard interaction tests, cross-component select-result assertions, a gated visual snapshot test, and README documentation for the test area.
CI component-tests job
.github/workflows/pr-checks.yml
Adds the component-tests job depending on build-example-docs, running in a digest-pinned Playwright container with Node version verification, artifact download, test execution, and failure artifact upload.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Lundalogik/lime-elements#4101: Extends the same playwright.example-tests.config.ts and pr-checks.yml infrastructure for a different Playwright suite (runtime-tests) also consuming the built www artifact.
  • Lundalogik/lime-elements#4103: Modifies pr-checks.yml to add/consume the build-example-docs/www artifact that the new component-tests job also depends on.

Suggested reviewers

  • lime-opensource
  • Kiarokh
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: a thin slice of menu browser interaction and visual-regression tests.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/menu-visual-interaction-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@github-actions

Copy link
Copy Markdown

Documentation has been published to https://lundalogik.github.io/lime-elements/versions/PR-4147/

Drive the real limel-example-menu-basic docs example to assert the menu
opens on click and via keyboard, emits the selected item to the consumer,
and closes after selection. New per-component example-tests suite selected
by the `components` Playwright filter.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@adrianschmidt adrianschmidt force-pushed the test/menu-visual-interaction-tests branch from dc78e6b to d3c1082 Compare June 29, 2026 13:50
@adrianschmidt

Copy link
Copy Markdown
Contributor Author

Consolidated PR Review

🤖 6-agent review (7 dimensions incl. Lime-platform). Posted by Claude Code on behalf of @adrianschmidt.

PR Summary

Adds a thin slice of browser-based component testing for lime-elements, proven on limel-menu: Playwright interaction tests that drive the real docs example (click/keyboard, assert the select event reaches the consumer) plus a Docker-gated visual-regression screenshot. 7 files, +196/−0 — a new example-tests/components/ suite, a pinned-image CI job, 4 npm scripts, a shared playwright-config addition, a Docker wrapper script, and a committed PNG baseline.

Merge Readiness — MERGE WITH CAVEATS ⚠️

This PR is strictly better than main — it adds test coverage and CI where there was none, is cleanly additive, and introduces no regressions. No blockers. Two [Medium] follow-ups are worth resolving before this graduates from a thin slice to a rolled-out pattern, both rooted in the same fact: the pinned Playwright image ships Node 20 while .nvmrc pins Node 24.

  • Blockers: None. Every change is additive (new job/script/file or an inert config key); existing suites, scripts, and CI jobs are verified unaffected.
  • Non-blocking but worth addressing: the Node-version mismatch between the container and .nvmrc (drives both a CI-cost finding in Performance and an overstated-"identical-environment" claim in Architecture), plus a handful of [Low] polish items — see Top Recommendations.

1. Backward Compatibility — SAFE ✅

Purely additive test infrastructure; no published or consumed surface changes.

  • [Low] The Playwright version (v1.60.0-jammy) is duplicated across three sites (CI container.image, scripts/visual-tests-docker.sh, and implicitly @playwright/test in package.json) with comments demanding manual lockstep. Not a regression; documented intent. See Architecture for the scaling angle.
  • Positives: The added expect.toHaveScreenshot/snapshotPathTemplate config keys are inert for the existing accessibility/runtime suites (verified — neither calls toHaveScreenshot), the 4 new scripts don't collide with existing names, and the new CI job adds to no existing job's needs:. Nothing pre-existing is touched.

2. Code Quality — GOOD ✅

High-quality test code; coupling assumptions are documented and verified against the component source. Findings are all polish.

  • [Low] Visual test captures the full viewport (menu.spec.ts:89, expect(page).toHaveScreenshot()). The comment correctly explains the surface is a body portal so the example locator can't be used — but the conclusion doesn't follow: Playwright can screenshot a Locator, so expect(surface).toHaveScreenshot(...) would capture just the portal'd surface and be immune to unrelated chrome/layout shifts elsewhere in the viewport.
  • [Low] The keyboard test (menu.spec.ts:64-73) only asserts the surface opens; it doesn't exercise keyboard selection or the emitted-event contract. Accurate to its name, but the PR description's "also keyboard Enter" reads broader than the coverage.
  • [Low] README calls npm run test:examples:components a "Fast dev loop" (README.md:16), but the script runs docs:rebuild (full Stencil rebuild) every invocation — not fast on a cold/changed tree.
  • Positives: The trigger-locator scoping (inner native button of limel-button to dodge the double-role="button" hydration race), the toBeVisible/toBeHidden choice over attach/detach, and the animations: 'disabled' / caret: 'hide' flake controls all reflect real understanding of the component internals and Playwright semantics.

3. Architecture — GOOD ✅

The interaction-vs-visual separation and the gated-Docker baseline strategy are the right cuts. The findings are about how the pattern scales beyond one component.

  • [Medium] Local-vs-CI environment divergence and an overstated claim. CI runs set-up-node (Node 24, per .nvmrc) inside the container before npm ci, while scripts/visual-tests-docker.sh runs the image's bundled Node (20) directly and does no npm ci — it bind-mounts $PWD, relying on the host's node_modules. On a macOS dev box (the documented primary use case) those node_modules can contain macOS-native binaries that won't run in the Linux container. The script header and README claim the local path is the "exact same environment as CI" / produces "byte-identical pixels"; the pixel claim holds (rendering is browser-driven, identical in both paths) but the "same environment" claim is overstated. Worth a doc correction or adding npm ci inside the container. Not a regression (no prior tooling existed) and CI — the actual gate — is unaffected, hence Medium.
  • [Low] Three-way version coupling (see Backward Compat) has no enforcement; a Dependabot @playwright/test bump that auto-merges wouldn't touch the two string literals. A desync would most likely surface as a loud CI failure rather than silently, so this is a maintainability sharp-edge, not a correctness hole — but a single source of truth (derive the tag, or a CI assert that the literal matches package.json) is worth it as the pattern scales.
  • [Low] Suite separation relies on filename-prefix positional filters (playwright test ... components) rather than configured Playwright projects. Works and follows the established convention, but is load-bearing: a future runtime-*.spec.ts under components/ would be swept into the runtime filter too.
  • [Low] The new component-tests job is single, un-sharded, with no fail-fast: false, unlike its runtime/accessibility siblings. Correct for one spec, but adopting more components later means revisiting the job shape, not just dropping in files.
  • Positives: The single-Linux-baseline decision, the __screenshots__/-vs-tracked-dir distinction, the snapshotPathTemplate resolution math, and the per-component coupling contracts are all documented in-code, so the trade-offs are deliberate rather than accidental.

4. Security — SAFE ✅

Least-privilege CI job, no secret exposure, no injection surface.

  • [Low] The Docker image is tag-pinned (v1.60.0-jammy), not digest-pinned, in both pr-checks.yml:175 and visual-tests-docker.sh:12 — the one supply-chain input that escapes the SHA-pinning discipline every uses: in this file follows. Mitigated by it being an official Microsoft image and the job running contents: read only. Pinning by digest (tag in a comment) would close the gap.
  • [Low] visual-tests-docker.sh bind-mounts the full repo read-write and runs the container as root — intended (it regenerates baselines/docs), local-only, against a pinned first-party image; CI doesn't use this script. Noted, not actionable.
  • Positives: New job sets permissions: contents: read, runs on pull_request (not pull_request_target, so fork code never sees secrets), all uses: are SHA-pinned, and the "$@" passthrough is correctly quoted/argv-delimited with no eval/sh -c injection surface.

5. Observability — GOOD ✅

Failures are diagnosable; materially better than main (which had no component-test job). All findings are improvements, not regressions.

  • [Low] use sets only trace: 'retain-on-failure' (playwright.example-tests.config.ts:49-52) — no video or screenshot. Traces are the richest artifact and largely sufficient, but a video can occasionally beat a partial trace for a hang/timeout. This is the pre-existing project convention.
  • [Low] The job uploads test-results/ (which correctly includes the visual *-actual/*-expected/*-diff.png triplet and trace zips) but configures no html reporter, so a developer must open traces via npx playwright show-trace rather than a one-click diff viewer.
  • [Low] README documents the reproduce-locally path well but not how to consume a CI failure artifact (download → unzip → show-trace / open *-diff.png).
  • Positives: trace: 'retain-on-failure' + if: failure() upload (7-day retention, uniquely-named) gives a working trace-viewer path for both interaction and visual failures, and the default outputDir is left intact so the visual-diff triplet is captured.

6. Performance — GOOD ✅

Cache- and work-conscious by design; the one real inefficiency is the Node-version mismatch.

  • [Medium] set-up-node re-downloads a full Node toolchain inside the container. The image bundles Node 20, but set-up-node installs Node 24 (.nvmrc) on every run — pure overhead unique to this containerized job (host jobs hit a warm runner-tool-cache). Combined with this, the cache: 'npm' restore is unlikely to hit inside the container, so npm ci over the large devDependency tree (prosemirror/react/tabulator/codemirror) runs closer to cold each time. Real recurring cost, but small relative to npm ci itself and not a regression (no prior job), hence Medium. Same root cause as the Architecture [Medium].
  • [Low] retries: 1 in CI (playwright.example-tests.config.ts:45) doubles the cost of a flaky run; the visual toHaveScreenshot is the most flake-prone and most expensive to retry. Matches existing suite policy; fine within the 20-min budget.
  • Positives: Correctly needs: build-example-docs and downloads the prebuilt artifact instead of rebuilding; the :ci script skips both docs:rebuild and playwright install (browsers preinstalled in the image); no hardcoded sleeps — all assertions are auto-waiting. The 26 KB PNG baseline is negligible.

7. Lime Platform Issues — SAFE ✅

No findings. The lime-issue-detector catalog (fetched fresh) is overwhelmingly backend-Python-oriented (UoW hooks, events, tasks, DB); the only potentially-applicable group is LPID-FE-*, which targets shipped Stencil runtime code. This PR adds only test infrastructure and CI — no component runtime code, nothing reachable from production.

  • Cleared concretely: the for...of await loop in menu.spec.ts:49-53 is not LPID-FE-1 (3 hardcoded visibility assertions on an already-loaded page, not data-scaling network fetches); visual-tests-docker.sh is not LPID-SEC-13 (quoted argv passthrough, trusted invocation, no $GITHUB_ENV interpolation).
  • Positives: Defensively written, well-commented test code and solid CI hygiene (SHA-pinned actions, least-privilege permissions, failure-only artifact upload).

Overall Verdicts

Dimension Verdict
Backward Compatibility SAFE
Code Quality GOOD
Architecture GOOD
Security SAFE
Observability GOOD
Performance GOOD
Lime Platform Issues SAFE
Merge Readiness MERGE WITH CAVEATS

Top Recommendations

  1. [Medium from Architecture + Performance] Resolve the Node-version mismatch (container ships Node 20; .nvmrc pins Node 24). Either confirm Node 20 is acceptable for this job and drop set-up-node (removing the per-run Node re-download and the cache miss), or accept the cost and fix the docs — the local visual-tests-docker.sh does not reproduce CI's Node 24, so the "exact same environment as CI" claim in the script header/README should be softened to "same browser/OS, pixel-identical." Consider adding npm ci inside the container in the local script so it doesn't depend on macOS-built node_modules.
  2. [Low from Code Quality] Switch the visual capture to the surface locator — expect(surface).toHaveScreenshot('menu-open.png') — so the baseline tracks the menu, not unrelated example chrome. (Will require regenerating the baseline in Docker.)
  3. [Low from Architecture] Add a guard for the three-way version lockstep — a tiny CI step asserting the container tag matches @playwright/test in package.json, so a future (possibly auto-merged) Playwright bump can't desync the baseline environment.
  4. [Low from Security] Pin the Playwright Docker image by @sha256: digest (keep the human-readable tag in a comment), matching the SHA-pinning discipline the rest of the workflow already follows.
  5. [Low from Code Quality] Soften the README's "Fast dev loop" wording for test:examples:components since it runs a full docs:rebuild each time, and either broaden the keyboard test to assert selection/the emitted event or tighten the PR description's "also keyboard Enter".
  6. [Low from Observability] Optional: add an html reporter (and document the "download artifact → show-trace / open *-diff.png" path in the README) so CI failures are one click to diagnose.

@adrianschmidt

Copy link
Copy Markdown
Contributor Author

Addressed the review. Two fixup commits pushed (fixup! test(menu): add gated visual-regression baseline, fixup! ci: run component example-tests in the Playwright container).

Fixed

Node toolchain re-download (Medium). The review's premise was off: mcr.microsoft.com/playwright:v1.60.0-jammy actually ships Node 24.15.0 (verified with docker run ... node -v), not Node 20 — Microsoft moved these images to Node 24. That bundled version already satisfies .nvmrc (v24.13, same major), so the set-up-node step in component-tests was re-downloading an equivalent toolchain every run for no benefit. Dropped it; npm ci now runs under the bundled Node. The job only does npm ci + playwright test against a pre-built docs artifact (no Node-24-specific build, no install lifecycle scripts), so this is safe.

Local script now self-contained + "same environment" claim accurate. scripts/visual-tests-docker.sh now runs npm ci inside the container against an anonymous node_modules volume, so it no longer depends on the host's macOS-native node_modules and doesn't clobber it either. With set-up-node gone, CI and the local script both use the image-bundled Node 24 + a fresh npm ci — genuinely the same runtime now, so the header wording is honest. Verified end-to-end with Docker; host node_modules confirmed intact afterward.

Visual capture scope (Low). Switched expect(page)expect(surface) so the baseline tracks just the portal'd menu surface, immune to unrelated example-chrome shifts. Regenerated the committed baseline inside Docker (test:examples:visual:update) — the PNG dropped from 26 KB to 2.3 KB and now shows only the Copy/Cut/Paste surface. All 3 tests pass.

Digest-pin the image (Low). Both pr-checks.yml and the shell script now pin by @sha256: with the v1.60.0-jammy tag in a comment as the human-readable handle. README updated to match.

README "Fast dev loop" (Low). Reworded — test:examples:components runs a full docs:rebuild, so "fast" was misleading.

Deliberately not changed (thin-slice scope)

  • Version-lockstep CI guard / sharding / projects-based suite separation / fail-fast — scaling-oriented; correct for a single spec and out of scope for this slice. A tag/version desync fails CI loudly anyway.
  • Keyboard test breadth — kept as a scoped "Enter opens the surface" smoke check. The click test already exercises the full select → emitted-event → close contract, so duplicating that on the keyboard path adds little here.
  • html reporter / CI-artifact-consumption docs — pre-existing convention (trace: 'retain-on-failure' matches the runtime/accessibility siblings); not worth diverging in this PR.

@adrianschmidt

Copy link
Copy Markdown
Contributor Author

Consolidated PR Review — Round 2 (after fixups)

🤖 6-agent re-review (7 dimensions incl. Lime-platform). Posted by Claude Code on behalf of @adrianschmidt. Re-run after the two fixup commits addressing round 1.

PR Summary

Thin slice of browser-based component testing for lime-elements, proven on limel-menu: Playwright interaction tests plus a Docker-gated visual-regression screenshot, with a pinned-image CI job. 7 files, +213/−0. This round reviews the state after the round-1 fixups: set-up-node dropped from the CI job (the pinned image already bundles Node 24, matching .nvmrc), the Docker image digest-pinned in both the workflow and the local script, the local script made self-contained (npm ci in an anonymous node_modules volume), and the visual capture scoped to the menu surface.

Merge Readiness — READY TO MERGE ✅

This PR is now strictly better than main with no regressions and, after the fixups, no [Medium] findings remain. The two round-1 [Medium]s (local-vs-CI Node divergence; set-up-node re-downloading a redundant toolchain) and the round-1 security [Low] (tag-pinned image) are all resolved. Every remaining finding is [Low] — either explicitly deferred with sound thin-slice rationale or a nice-to-have.

  • Blockers: None.
  • Non-blocking but worth addressing: a few [Low] items, mostly carried-over documented deferrals (version-lockstep CI guard, sharding, projects-based suite separation, broader keyboard coverage) plus two minor new ones (CI comment asserts a Node version not checkable from the repo; the local Docker run doesn't set CI=1 so it can't reproduce CI's retry/worker scheduling). See Top Recommendations.

1. Backward Compatibility — SAFE ✅

No regression vs main from the fixups; purely additive.

  • [Low] (carried) Three/four-way version lockstep — the digest in pr-checks.yml and visual-tests-docker.sh, the v1.60.0-jammy tag in comments, @playwright/test, and .nvmrc — is documented but unenforced. Now slightly tighter (byte-identical digest in both run paths). Author deferred a CI guard; remains [Low].
  • Positives: The fixups strictly improve posture (digest pinning enforced identically in both paths, redundant toolchain download removed, local script decoupled from host-native modules, baseline scope tightened) without touching any consumer-facing API, the CI job dependency graph, or shared test config — verified component-tests is not in automerge's needs:.

2. Code Quality — GOOD ✅

Round-1 [Low]s correctly resolved; the fixups themselves are high-quality. Comments were verified against the actual component source.

  • [Low] (new) The CI comment asserts the image "already bundles Node 24 (matches .nvmrc)" (pr-checks.yml), which is true (.nvmrc is v24.13) but not checkable from the repo — if a future image bump changes the bundled major, the comment silently rots. Minor doc-rot risk, not blocking.
  • [Low] (carried, deferred) The keyboard test asserts only that the surface opens, not keyboard selection/the emitted-event contract. Explicit thin-slice deferral.
  • Resolved: visual capture is now expect(surface).toHaveScreenshot('menu-open.png') (the portal-on-body justification verified against menu.tsx/portal.tsx), baseline regenerated (2321 bytes); README's "Fast dev loop" wording removed and the Running table matches the real scripts.
  • Positives: The new bash -c 'npm ci && npm run …:ci -- "$@"' -- "$@" argv-passthrough was empirically confirmed correct under set -euo pipefail (no-arg, --update-snapshots, and args-with-spaces all forward intact); every load-bearing comment (portal, locator ambiguity, snapshot-path resolution, digest lockstep) checks out against source.

3. Architecture — GOOD ✅

The round-1 [Medium] is fully resolved at the source; no [High], no regression.

  • [Low] (new) The local Docker run sets RUN_VISUAL_SNAPSHOTS=1 but not CI=1, so it uses different retries/workers/reporter than CI. Does not affect the byte-identical-pixel guarantee (screenshots are deterministic and animations/caret are frozen), so it's purely a nuance — local can't reproduce CI's flaky-retry scheduling.
  • [Low] (carried, deferred) Version coupling unenforced; components is a positional filename filter rather than a Playwright project; the component-tests job is single/un-sharded unlike its sharded siblings. All documented thin-slice deferrals.
  • Resolved: Both runners now use the identical digest-pinned image and the identical npm script, so local and CI genuinely share Node/OS/Chromium — the round-1 "exact same environment" claim is now accurate. Dropping set-up-node is design-sound: an in-container npm ci against the committed lockfile is hermetic.
  • Positives: The fix collapses both runners onto one digest-pinned image rather than papering over the divergence; the anonymous-volume node_modules isolation is the right pattern; version-coupling intent is unusually well-documented across all three touch points.

4. Security — SAFE ✅

Round-1 [Low] resolved; no new concerns.

  • Resolved: Image is now digest-pinned (@sha256:e1529a04…) identically in pr-checks.yml and visual-tests-docker.sh, tag retained only as a comment/handle — closing the one supply-chain input that previously escaped the SHA-pinning discipline.
  • The new bash -c '…' wrapper introduces no injection surface: the payload is a single-quoted literal (no host-shell interpolation) and args flow through the canonical … -- "$@" positional pattern, never concatenated into the command string. The fixup actually narrowed the local RW-mount exposure by giving the container its own node_modules volume. Bare npm ci (no set-up-node) is lockfile-deterministic and changes no trust posture.
  • Positives: Digest pin consistent across both files; injection-safe argv idiom; contents: read on every container/test job under a pull_request (not pull_request_target) trigger; all third-party actions SHA-pinned.

5. Observability — GOOD ✅

No regression; the visual-capture change is artifact-neutral.

  • [Low] (carried) Only trace: 'retain-on-failure' (no video/screenshot — pre-existing convention); no html reporter; README doesn't document consuming a CI failure artifact. The latter two were explicitly deferred.
  • Verified the expect(surface) change does not affect failure attachments: a toHaveScreenshot mismatch still writes the menu-open-expected/-actual/-diff.png triplet into the default test-results/, which the job uploads on if: failure() (7-day retention).
  • Positives: Trace upload on failure across all test jobs with explanatory comments; the visual-diff triplet path is confirmed intact and uploaded; the skip message is precise and actionable; animations: 'disabled' + caret: 'hide' actively reduce flake.

6. Performance — GOOD ✅

Both round-1 [Medium]s effectively resolved; net CI cost is better than both main and round 1.

  • [Low] (downgraded from round-1 [Medium]) Dropping set-up-node also dropped its cache: 'npm', so the job now does npm ci with no GHA npm cache. Assessed as a wash-to-slight-win, not a regression: the npm cache only accelerates the tarball-download phase (npm ci rebuilds node_modules regardless), and in-container the round-1 cache was paying restore/save overhead for an unlikely hit. Optionally, actions/cache on ~/.npm keyed by the lockfile could shave registry-download time later — explicitly optional.
  • [Low] (note) The local script does a cold npm ci in a fresh anonymous volume each run — fine for an occasional opt-in visual workflow; a named volume could speed iteration if it ever gets painful.
  • [Low] (carried) retries: 1 in CI can ~2× a genuinely-failing run; bounded by timeout-minutes: 20 and standard Playwright practice.
  • Resolved: The Node re-download (round 1's larger cost) is genuinely eliminated.
  • Positives: Node re-download removed with clear rationale; tiny 2.3 KB surface-scoped baseline; poll-based web-first assertions with no fixed sleeps; single non-sharded job with a sane timeout backstop.

7. Lime Platform Issues — SAFE ✅

No findings (round 1 also clean). The lime-issue-detector catalog (fetched fresh) targets Stencil component runtime code and Python lime-core backends; this PR is purely additive test infrastructure with no component lifecycle, no state, no Python. The bash -c … -- "$@" script forwards only its own literal --update-snapshots flag (no external input), so LPID-SEC-13 does not apply; the 2321-byte tracked baseline is an intentional fixture, not LPID-RH build output.

  • Positives: Strong supply-chain hygiene (digest-pinned container, SHA-pinned actions), injection-safe argv idiom, correctly-gated and de-flaked visual tests, explicit fail-loud coupling to limel-example-menu-basic.

Overall Verdicts

Dimension Verdict
Backward Compatibility SAFE
Code Quality GOOD
Architecture GOOD
Security SAFE
Observability GOOD
Performance GOOD
Lime Platform Issues SAFE
Merge Readiness READY TO MERGE

Top Recommendations

All findings are [Low] — none gate merge. In rough priority:

  1. [Low from Architecture, optional] Add a tiny CI guard asserting the container digest/tag matches @playwright/test in package.json (and .nvmrc), so a future (possibly auto-merged) Playwright bump can't silently desync the baseline environment. This also addresses the Code-Quality doc-rot [Low] (the unverifiable "bundles Node 24" comment) by making the relationship machine-checked.
  2. [Low from Architecture, optional] If you want the local Docker run to mirror CI's retry/worker scheduling for debugging flake, pass -e CI=1 in visual-tests-docker.sh (pixels are unaffected either way, so leave it if local-fast is preferred).
  3. [Low from Performance, optional] If npm ci registry-download time on the component-tests job ever becomes noticeable, wire actions/cache on ~/.npm keyed by package-lock.json; not worth it at current scale.
  4. [Low from Code Quality / carried, deferred] When the pattern rolls out beyond this thin slice, broaden the keyboard test to assert selection + the emitted event, and revisit projects-based suite separation and CI sharding — all explicitly deferred here, which is reasonable.
  5. [Low from Observability / carried, deferred] Optionally add an html reporter and a short "download artifact → show-trace / open -diff.png" note to the README to make CI failures one-click to diagnose.

@adrianschmidt

Copy link
Copy Markdown
Contributor Author

Reviewed the round-2 feedback. All items are [Low] / optional / explicitly deferred, and the round-2 verdict was READY TO MERGE with no blockers. I'm declining/deferring all of them and pushing nothing — keeping this thin slice thin. Reasons per item:

  1. CI version-lockstep guard — declined (deferred to rollout). I prototyped the cheap, network-free version of this and it works: it can string-check # v1.60.0-jammy against @playwright/test in package.json and confirm the digest is identical in pr-checks.yml and scripts/visual-tests-docker.sh. But the assertions the guard would most want to make — that the digest actually resolves to the v1.60.0-jammy tag, and that the image bundles Node 24 matching .nvmrc — require docker pull + registry access, which a lightweight guard can't do. So a cheap guard would only police a 3-occurrence string surface while giving a false impression of "machine-checked Node 24". Not worth standing CI machinery on a 213-line, single-component thin slice. This belongs with the rollout work (item 4), when the digest surface and the number of consuming jobs actually grow.

  2. Local run CI=1 — declined. Confirmed in playwright.example-tests.config.ts that CI only flips retries (1 vs 0), workers (50% vs default), forbidOnly, reporter, and reuseExistingServer. None of these affect the captured pixels (screenshot comparison is deterministic). Keeping local fast (server reuse, fail-fast, no retries) is the better local-dev default; the byte-identical claim rests on the image/Chromium, not on scheduling.

  3. npm cache on component-tests job — declined. Single job, npm ci rebuilds node_modules regardless; not worth the cache wiring at current scale.

  4. Broader keyboard tests + projects-based suite separation + CI sharding — deferred. Explicitly post-thin-slice rollout work.

  5. html reporter + README "consume CI artifact" note — declined/deferred. Optional; the failure-path trace upload already covers the debugging need.

  6. "bundles Node 24" CI comment — kept as-is. Accurate today and low-risk. The doc-rot concern is subsumed by item 1, which I'm deferring; the comment correctly documents why we skip setup-node (the image already bundles a matching Node).

Consider this ready to merge.

@adrianschmidt

Copy link
Copy Markdown
Contributor Author

Visual baseline: reverted to a full-component capture

The round-2 fixups had narrowed the visual screenshot to just the menu surface (expect(surface).toHaveScreenshot(...)). I've reverted that back to a full capture (expect(page).toHaveScreenshot(...)).

Why: the purpose of this test is to verify the component as a whole. The menu's position relative to its trigger button is exactly the kind of thing a visual test should catch — and a surface-only capture drops it. The baseline now spans both the "Menu" button and the open menu beneath it.

Trade-off (intentional): the menu renders as a portal on document.body, and the #/debug/ route renders the full docs shell — so a viewport capture is the only thing that can span both the trigger and the surface. That means the baseline also includes the surrounding docs chrome (nav/sidebar). That chrome is static and deterministic, so it won't cause flake; if it ever legitimately changes, the baseline is regenerated like any other.

Baseline regenerated in the same digest-pinned Playwright image and verified locally (3/3 green); CI re-verifies it in that same image.

🤖 Posted by Claude Code on behalf of @adrianschmidt

@adrianschmidt adrianschmidt force-pushed the test/menu-visual-interaction-tests branch from b60895d to 08b4a33 Compare June 29, 2026 16:20
@adrianschmidt adrianschmidt marked this pull request as ready for review June 29, 2026 16:21
@adrianschmidt adrianschmidt requested review from a team and lime-opensource as code owners June 29, 2026 16:21

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/pr-checks.yml:
- Line 185: The checkout step in the workflow still uses the default credential
persistence, which leaves the job token available in the working copy. Update
the existing actions/checkout usage in this job to explicitly set
persist-credentials to false so the later npm ci and Playwright steps run with a
checked-out repository but without stored Git credentials.

In `@scripts/visual-tests-docker.sh`:
- Around line 21-27: The docker invocation in scripts/visual-tests-docker.sh is
running as the image default user, which can leave updated snapshots in the
bind-mounted /work checkout owned by root. Update the docker run setup to
execute the container with the host UID/GID or another ownership-matching
approach so files written by the visual test flow stay editable on the host.
Keep the change localized to the docker command used by the visual snapshot/test
script.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5d83670b-1093-4524-b50d-3e67a020aad8

📥 Commits

Reviewing files that changed from the base of the PR and between cdb1dde and 08b4a33.

⛔ Files ignored due to path filters (1)
  • example-tests/components/menu.spec.ts-snapshots/menu-open.png is excluded by !**/*.png
📒 Files selected for processing (6)
  • .github/workflows/pr-checks.yml
  • example-tests/components/README.md
  • example-tests/components/menu.spec.ts
  • package.json
  • playwright.example-tests.config.ts
  • scripts/visual-tests-docker.sh

Comment thread .github/workflows/pr-checks.yml
Comment thread scripts/visual-tests-docker.sh Outdated
adrianschmidt and others added 2 commits June 29, 2026 19:02
Add a toHaveScreenshot test for the open menu, gated behind
RUN_VISUAL_SNAPSHOTS and run only inside the pinned Playwright Docker image
so macOS dev and Linux CI compare byte-identical pixels. Includes the Docker
wrapper script, npm scripts, snapshot-path config, and the committed baseline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a PR job that runs the interaction + visual component example-tests
inside mcr.microsoft.com/playwright:v1.60.0-jammy, matching the environment
that generated the committed baselines, and uploads diff images/traces on
failure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@adrianschmidt adrianschmidt force-pushed the test/menu-visual-interaction-tests branch from ad4d4c2 to 33dcb29 Compare June 29, 2026 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant