refactor(seo): consolidate spec language overview into hub with ?language= filter #5297
The /{spec_id} and /{spec_id}/{language} pages rendered virtually identical
content with separate canonical URLs, causing duplicate-content issues for
search engines. Consolidate them into a single canonical hub page
(/{spec_id}), and serve language filtering as /{spec_id}?language={language}
— a client-side filter whose canonical tag still points at the unfiltered
hub. Same pattern already used for ?view=interactive on detail pages.
- Router: /:specId/:language redirects to /:specId?language=:language
- SpecPage: Mode reduced to hub|detail; hub grid filters from ?language=
when present, canonical always /{spec_id}
- Sitemap: stop emitting /{spec}/{language} URLs
- SEO proxy: /seo-proxy/{spec}/{language} returns 301 to /seo-proxy/{spec}
- nginx python.anyplot.ai: hub rewrites to /seo-proxy/{spec} (detail URLs
keep /python segment since those are content-unique)
- Docs + unit tests updated
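The sitemap change in the list above can be pictured with a small sketch. This is not the code in api/routers/seo.py: the `Spec` and `Impl` dataclasses, the `sitemap_urls` helper, and the `matplotlib` library name are hypothetical, and the detail URL shape `/{spec_id}/{language}/{library}` is an assumption inferred from the nginx comment that the language stays in the path for detail pages.

```python
from dataclasses import dataclass, field
from typing import List

BASE_URL = "https://anyplot.ai"


@dataclass
class Impl:
    language: str  # e.g. "python"
    library: str   # e.g. "matplotlib" (illustrative value)


@dataclass
class Spec:
    spec_id: str
    impls: List[Impl] = field(default_factory=list)


def sitemap_urls(specs: List[Spec]) -> List[str]:
    """Return hub and detail URLs only; the /{spec_id}/{language} tier is omitted."""
    urls: List[str] = []
    for spec in specs:
        # One canonical hub URL per spec.
        urls.append(f"{BASE_URL}/{spec.spec_id}")
        for impl in spec.impls:
            # Detail pages keep the language segment because their content is unique.
            urls.append(f"{BASE_URL}/{spec.spec_id}/{impl.language}/{impl.library}")
    return urls
```

With one spec and one implementation, the result contains the hub URL and the detail URL but no `/{spec_id}/{language}` entry.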
https://claude.ai/code/session_01Hiwzn5mc979FDGCHkW4os1
Codecov Report❌ Patch coverage is
    it — Google should consolidate the page, not a filtered variant.
    """
    del language  # referenced for route matching only; deliberately not forwarded
    return RedirectResponse(url=f"/seo-proxy/{spec_id}", status_code=301)
Pull request overview
This PR consolidates spec language overview pages into the cross-language hub to eliminate duplicate-content URLs for search engines, moving language selection to a ?language= filter while keeping a single canonical URL per spec.
Changes:
- Redirect SPA route `/:specId/:language` → `/:specId?language=:language`, and simplify `SpecPage` to hub/detail modes with hub filtering driven by `?language=`.
- Update sitemap generation/tests to stop emitting `/{spec_id}/{language}` URLs and keep only hub + implementation detail URLs.
- Change SEO proxy `GET /seo-proxy/{spec_id}/{language}` to a permanent 301 redirect to `/seo-proxy/{spec_id}`, and adjust python.anyplot.ai nginx proxying accordingly; update SEO docs.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/unit/api/test_seo_helpers.py | Updates sitemap helper expectations to exclude the per-language overview tier. |
| tests/unit/api/test_routers.py | Updates sitemap router test expectations to exclude /{spec_id}/{language} URLs. |
| docs/reference/seo.md | Documents the consolidated hub + ?language= filtering and updated sitemap behavior. |
| app/src/router.tsx | Adds a redirect component for legacy /:specId/:language SPA routes. |
| app/src/pages/SpecPage.tsx | Removes language mode, adds hub filtering from ?language=, updates canonical/title/links. |
| app/nginx.conf | Updates python.anyplot.ai bot proxying to use the hub proxy without injecting /python for hub routes. |
| api/routers/seo.py | Removes language-overview URLs from sitemap and converts /seo-proxy/{spec}/{language} to a 301 redirect. |
Comments suppressed due to low confidence (1)
app/nginx.conf:201
- On the python.anyplot.ai server block, a request like `/scatter-basic/python` will match this `/:specId/:library` location and proxy to `/seo-proxy/scatter-basic/python/python` (treating `python` as a library). That will likely 404 and prevents bots from following the intended `/seo-proxy/{spec_id}/{language}` -> 301 `/seo-proxy/{spec_id}` consolidation path. Add a higher-priority location for `^/{spec_id}/python/?$` (or a general `/{spec_id}/{language}` handler) that proxies to `/seo-proxy/$spec_id/python` (or directly to `/seo-proxy/$spec_id`).
# /:specId/:library -> detail on main domain (language stays in path)
location ~ "^/(?<spec_id>[A-Za-z0-9][A-Za-z0-9-]*)/(?<library>[A-Za-z0-9][A-Za-z0-9-]*)/?$" {
set $python_seo_uri /seo-proxy/$spec_id/python/$library;
error_page 418 = @seo_proxy_python;
if ($is_bot) { return 418; }
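To see why the suppressed comment expects a doubled `python` segment, the location regex quoted above can be replayed outside nginx. The sketch below is a Python transliteration only (nginx writes named groups as `(?<name>...)`, Python as `(?P<name>...)`); the `python_seo_uri` helper is hypothetical and simply mirrors the `set $python_seo_uri` line.

```python
import re
from typing import Optional

# Python transliteration of the nginx location regex for /:specId/:library.
DETAIL_LOCATION = re.compile(
    r"^/(?P<spec_id>[A-Za-z0-9][A-Za-z0-9-]*)/(?P<library>[A-Za-z0-9][A-Za-z0-9-]*)/?$"
)


def python_seo_uri(path: str) -> Optional[str]:
    """Mirror `set $python_seo_uri /seo-proxy/$spec_id/python/$library;`."""
    match = DETAIL_LOCATION.match(path)
    if match is None:
        return None  # the path does not hit this location block
    return f"/seo-proxy/{match['spec_id']}/python/{match['library']}"
```

Under this transliteration, `/scatter-basic/python` is captured with `library` equal to `python`, so the proxied URI ends in `/python/python`, which is the case the comment flags.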
  @router.get("/seo-proxy/{spec_id}/{language}")
- async def seo_spec_language(spec_id: str, language: str, db: AsyncSession | None = Depends(optional_db)):
-     """Bot-optimized language-specific spec overview."""
-     if db is None:
-         return HTMLResponse(
-             BOT_HTML_TEMPLATE.format(
-                 title=f"{html.escape(spec_id)} - {html.escape(language)} | anyplot.ai",
-                 description=DEFAULT_DESCRIPTION,
-                 image=DEFAULT_HOME_IMAGE,
-                 url=f"https://anyplot.ai/{html.escape(spec_id)}/{html.escape(language)}",
-             )
-         )
-
-     key = cache_key("seo", spec_id, language)
-     cached = get_cache(key)
-     if cached:
-         return HTMLResponse(cached)
-
-     repo = SpecRepository(db)
-     spec = await repo.get_by_id(spec_id)
-     if not spec:
-         raise HTTPException(status_code=404, detail="Spec not found")
-
-     lang_impls = [i for i in spec.impls if i.library and i.library.language == language]
-     has_previews = any(i.preview_url for i in lang_impls)
-     image = f"https://api.anyplot.ai/og/{spec_id}.png" if has_previews else DEFAULT_HOME_IMAGE
-
-     result = BOT_HTML_TEMPLATE.format(
-         title=f"{html.escape(spec.title)} - {html.escape(language)} | anyplot.ai",
-         description=html.escape(spec.description or DEFAULT_DESCRIPTION),
-         image=html.escape(image, quote=True),
-         url=f"https://anyplot.ai/{html.escape(spec_id)}/{html.escape(language)}",
-     )
-     set_cache(key, result)
-     return HTMLResponse(result)
+ async def seo_spec_language(spec_id: str, language: str):
+     """Permanent redirect: language-overview URLs now live on the hub with ?language=.
+
+     The /{spec_id}/{language} tier was consolidated into /{spec_id} to eliminate
+     duplicate content. Bots following this endpoint get a 301 to the hub proxy;
+     humans get the SPA redirect configured in app/src/router.tsx. The `language`
+     query parameter is dropped because the hub's canonical tag does not include
+     it — Google should consolidate the page, not a filtered variant.
+     """
+     del language  # referenced for route matching only; deliberately not forwarded
+     return RedirectResponse(url=f"/seo-proxy/{spec_id}", status_code=301)
This endpoint’s behavior changed from serving HTML to returning a permanent 301. There’s currently no unit test asserting the redirect status code and Location header (and that the language segment is intentionally dropped). Adding a test for GET /seo-proxy/{spec}/{language} would prevent regressions and ensure crawlers get the expected consolidation behavior.
if (mode === 'hub') {
  trackPageview(languageFilter ? `/${specId}?language=${languageFilter}` : `/${specId}`);
} else if (mode === 'detail' && selectedLibrary) {
In hub mode this calls trackPageview with a ?language= query string. useAnalytics().trackPageview currently validates urlOverride with /^\/([\w\-/])*$/ (no ?, =), so this override will be rejected and the pageview won’t be tracked when languageFilter is set. Consider either (a) encoding the filter into a path-only override that matches the allowed charset, or (b) not overriding and extending the analytics URL builder to incorporate language safely.
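The rejection described in this comment can be checked by replaying the quoted pattern. The sketch below is a Python transliteration of the JavaScript regex, not the actual `useAnalytics` code; `is_valid_override` is a hypothetical helper name.

```python
import re

# Python transliteration of the JavaScript regex /^\/([\w\-/])*$/.
# re.ASCII keeps \w limited to [A-Za-z0-9_], matching JavaScript's default \w.
URL_OVERRIDE_PATTERN = re.compile(r"^/([\w\-/])*$", re.ASCII)


def is_valid_override(url: str) -> bool:
    """Return True if the override uses only word characters, '-' and '/'."""
    return URL_OVERRIDE_PATTERN.match(url) is not None
```

Under this pattern a query-string override such as `/scatter-basic?language=python` fails, while a path-only form such as `/scatter-basic/language/python` passes, which is why option (a) in the comment would get through the validation.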
…path segment
Three fixes for PR #5297:
1. seo_spec_language (api/routers/seo.py): validate spec_id against the canonical `^[a-z0-9]+(-[a-z0-9]+)*$` pattern before embedding it in the Location header. Closes the CodeQL "Untrusted URL redirection" alert.
2. Add unit tests asserting the 301 + Location behaviour and the 404 response for malformed spec_ids.
3. Fix analytics tracking for the filtered hub: buildPlausibleUrl now includes `language` in its orderedKeys list, so ?language=python is converted to the /{spec}/language/python path-segment form that matches every other filter. SpecPage.tsx hub mode calls trackPageview() without an override so the new path-segment URL is picked up. This unblocks pageview tracking that was silently dropped by the urlOverride validation regex (which rejects ? and =).
https://claude.ai/code/session_01Hiwzn5mc979FDGCHkW4os1
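A minimal sketch of the validation described in the first fix, assuming the check is a full match against the quoted pattern. The `redirect_location` helper is hypothetical; the real endpoint presumably turns a `None` result into the 404 response mentioned in the second fix.

```python
import re
from typing import Optional

# Pattern quoted in the commit message; fullmatch() supplies the ^...$ anchoring.
SPEC_ID_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def redirect_location(spec_id: str) -> Optional[str]:
    """Return the hub proxy path for a well-formed spec_id, else None."""
    if SPEC_ID_PATTERN.fullmatch(spec_id) is None:
        return None  # the caller would answer 404 instead of redirecting
    return f"/seo-proxy/{spec_id}"
```

Because the pattern allows only lowercase alphanumerics joined by single hyphens, values containing `/`, `:` or `.` never reach the Location header, so the redirect cannot be steered to another host.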