
ci: use browser-style User-Agent for lychee link check #62

Merged
ajbozarth merged 2 commits into generative-computing:main from ajbozarth:ci/lychee-user-agent
Jun 11, 2026

ajbozarth commented Jun 10, 2026

Contributor

Summary

Lychee's default User-Agent (lychee/x.y.z) is rejected with 403 Forbidden by some hosts behind bot-detection layers (Cloudflare-fronted sites in particular), even when the URLs resolve correctly in a real browser. Set a Mozilla-compatible UA so lychee presents as a normal client.
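The effect of the User-Agent header can be reproduced outside lychee. The following is a minimal Python sketch, not the change made in this PR: the `BROWSER_STYLE_UA` value and both helper names are assumptions chosen for illustration, since the description does not quote the exact string that was configured.

```python
import urllib.error
import urllib.request

# Assumed, illustrative UA string; the PR does not state the exact value it sets.
BROWSER_STYLE_UA = "Mozilla/5.0 (compatible; link-check)"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a HEAD request that carries an explicit User-Agent header."""
    return urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )


def status_for(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status a host gives for this User-Agent (needs network)."""
    try:
        with urllib.request.urlopen(
            build_request(url, user_agent), timeout=timeout
        ) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses such as 403 arrive as exceptions; report the code.
        return err.code
```

Calling `status_for(url, "lychee/x.y.z")` and `status_for(url, BROWSER_STYLE_UA)` against a host that filters on User-Agent is one way to confirm whether the header alone explains a 403.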

This is the alternative to per-host exclusions like --exclude '.*substack\.com', which silently un-check entire domains sitewide and would let a real broken link slip through unnoticed.
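To see why a per-host exclusion hides real breakage, consider how the pattern behaves as a filter. This sketch assumes the exclusion is applied as an unanchored regular-expression search over each URL; the URLs are hypothetical and the `is_checked` helper is illustrative, not part of lychee.

```python
import re

# The exclusion pattern quoted in the PR description.
EXCLUDE = re.compile(r".*substack\.com")


def is_checked(url: str) -> bool:
    """Return True if the URL would still be checked (pattern does not match)."""
    return EXCLUDE.search(url) is None


# Hypothetical URLs, for illustration only.
urls = [
    "https://example.substack.com/p/real-post",
    "https://example.substack.com/p/typo-in-slug",
    "https://example.org/docs",
]

# Both Substack URLs are skipped, so a typo in the second would go unnoticed.
skipped = [u for u in urls if not is_checked(u)]
```

Under that assumption, a valid Substack link and a mistyped one are treated identically: neither is ever requested.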

Surfaced while reviewing #61 — that PR added a substack.com exclusion to get its link-check passing. With this change in place, the exclusion can be dropped from #61 and Substack links will be checked normally going forward.

Test plan

  • CI link-check job passes against the full repo (lychee runs on content/blogs/**/*.md *.md, not just changed files, so this PR's run is a dry run across every existing blog link)
  • No regressions on currently-passing hosts under the new UA

Lychee's default User-Agent ('lychee/x.y.z') is rejected with 403 by
some hosts behind bot-detection layers (e.g. Cloudflare-fronted sites),
even when the URLs resolve correctly in a real browser. Set a
Mozilla-compatible UA so lychee presents as a normal client.

Avoids per-host exclusions that silently un-check entire domains
sitewide and keeps the link checker actionable for catching typos and
rotted URLs.

Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
ajbozarth requested a review from a team as a code owner June 10, 2026 17:03
ajbozarth mentioned this pull request Jun 10, 2026
ajbozarth self-assigned this Jun 10, 2026
@planetf1

Collaborator

e2e tests appear to be flaking across multiple runs/PRs — not related to this change. Will keep the --exclude '.*substack\.com' workaround in #61 for now so the blog can merge; happy to drop it once this lands and stabilises.

Playwright 1.58.2 hangs in `playwright install chromium` after the
download reaches 100% on Node 24.16.0+ due to an extract-zip/yauzl
regression (microsoft/playwright#41000). CI's test-e2e job has been
timing out at the 6h limit since the runner picked up the new Node.
Fixed in Playwright 1.60.0.

Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
ajbozarth enabled auto-merge June 11, 2026 15:59
ajbozarth added this pull request to the merge queue Jun 11, 2026
Merged via the queue into generative-computing:main with commit ae01a4f Jun 11, 2026
7 checks passed
ajbozarth deleted the ci/lychee-user-agent branch June 11, 2026 16:04