audit: BreadcrumbList, robots policy, og:title, sitemap priorities by slayerjain · Pull Request #832 · keploy/docs

slayerjain · 2026-04-14T13:42:57Z

Summary

Batched fixes from the 2026-04-14 live-site + Copilot audit. Originally scoped to BreadcrumbList + robots, expanded to cover the full audit pass — each concern is a separate commit so history stays reviewable.

BreadcrumbList schema missing on docs glossary pages

Glossary pages like /docs/concepts/reference/glossary/idempotency/ were emitting Article/ImageObject/Organization/WebPage schemas but no BreadcrumbList. Root cause: DocBreadcrumbs theme component had an early return when the sidebar trail was null (which glossary pages have). Fix: emit a fallback BreadcrumbList even when the sidebar-derived trail is unavailable.

Nuanced AI bot robots.txt policy

Allow AI search / answer engines (Perplexity, ChatGPT-User, Claude-SearchBot, Gemini-Deep-Research, Applebot, etc.) to crawl everywhere, block training-only bots (GPTBot, ClaudeBot, CCBot, Google-Extended, etc.), keep Bytespider blocked. The legacy-version Disallow: /docs/{1,2,3}.0.0/ lines are applied inside the AI search group as well (robots.txt named groups don't inherit from User-agent: *), and Crawl-delay + /cgi-bin are mirrored too so the group is a proper superset of the defaults.

Per-page og:title + hub Article schema suppression + sitemap priorities

og:title now reflects the current page title instead of always falling back to the site name
Article schema suppressed on /docs/ root, versioned roots like /docs/4.0.0/, and category index pages where it was incorrectly applied (hub pages have no single author/date/headline)
Explicit sitemap priority buckets per page type: 1.0 for root, 0.9 for quickstart, 0.8 for running-keploy, 0.7 for concepts/keploy-explained, 0.6 for keploy-cloud/ci-cd and FAQ/troubleshooting pages (matched via -faq/ and /common-errors/ to reflect actual v4 routes)

Test plan

Build docs locally, verify no sidebar or schema regressions
Confirm robots.txt applies legacy-version block to named AI search bot groups
Manually test BreadcrumbList on /docs/concepts/reference/glossary/idempotency/ after deploy
Verify sitemap priority values on /docs/sitemap.xml

Note: An earlier iteration of this PR added a "Keploy vs Alternatives" comparison doc at /docs/keploy-explained/keploy-vs-alternatives/ plus a v4 sidebar entry for it. That file and sidebar entry were removed in commit b56c813 per reviewer feedback — product comparison framing belongs on the landing site, not under the docs subtree. The current PR scope is BreadcrumbList + robots.txt + og:title + sitemap priorities only.

🤖 Generated with Claude Code

Adopt the Speedscale / Katalon / Testsigma split: - Allow AI SEARCH bots (drive answer visibility): OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Perplexity-User, Gemini-Deep-Research, GoogleOther, Applebot, DuckAssistBot, Amazonbot. - Block TRAINING-ONLY bots: GPTBot, ClaudeBot, anthropic-ai, CCBot, Google-Extended, Applebot-Extended, Meta-ExternalAgent, FacebookBot, cohere-ai, Diffbot, Omgilibot, ImagesiftBot. - Keep Bytespider blocked. Also add belt-and-braces Disallow for unmaintained legacy doc versions (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) to reinforce existing noindex+canonical signals for crawlers that ignore them. Reopens Task 52 per user direction 2026-04-14. Mirrors the corresponding policy change on landing and blog-website robots.txt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Neha Gupta <gneha21@yahoo.in>

@type

LIVE-20. Live audit of /docs/concepts/reference/glossary/idempotency/ on 2026-04-14 showed @type schema blocks Article, ImageObject, Organization, WebPage — and no BreadcrumbList. The /docs/ root, by contrast, has BreadcrumbList. This is the specific glossary-page regression that Task 13 was meant to catch but didn't. Root cause: the DocBreadcrumbs component had an early `return null` when useSidebarBreadcrumbs() returned null/undefined, which suppresses both the visual breadcrumb UI AND the JSON-LD schema emission. For deep glossary pages whose sidebar context resolves to null, this meant zero BreadcrumbList — the regression. Changes: - Replace `if (!breadcrumbs) return null` with a safe fallback to an empty sidebarTrail array. Schema emission + Home/Docs items run unconditionally. - Only render the visual <nav> when sidebarTrail has entries (avoids showing an empty breadcrumb UI on schema-only pages). - Propagate the sidebarTrail rename through the visual render path. Verify after deploy: curl -s https://keploy.io/docs/concepts/reference/glossary/idempotency/ | \ grep -c '"@type":"BreadcrumbList"' # expected: 1 (was 0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Neha Gupta <gneha21@yahoo.in>

Copilot

Pull request overview

Addresses SEO audit findings by ensuring glossary doc pages always emit BreadcrumbList JSON-LD (even when sidebar breadcrumbs are unavailable) and by updating robots.txt to distinguish between AI “search/answer” bots vs training-only crawlers, plus blocking legacy doc versions.

Changes:

Emit BreadcrumbList schema with a safe fallback when useSidebarBreadcrumbs() returns null/undefined.
Update static/robots.txt with allow/disallow blocks for various AI bots and add legacy doc-version disallows.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
static/robots.txt	Introduces nuanced bot rules and attempts to block legacy doc versions from crawling.
src/theme/DocBreadcrumbs/index.js	Avoids early-return so JSON-LD breadcrumbs can be emitted even without sidebar breadcrumbs; limits visual breadcrumb UI to when a sidebar trail exists.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-14T13:45:57Z

+# Block unmaintained legacy doc versions (already set via noindex + canonical,
+# belt-and-braces for crawlers that ignore those signals).
+Disallow: /docs/1.0.0/
+Disallow: /docs/2.0.0/
+Disallow: /docs/3.0.0/


The legacy-version Disallow: /docs/1.0.0/ (and 2.0.0/3.0.0) rules are only under User-agent: *, so they will not apply to crawlers that match one of the explicit allow groups above (e.g., PerplexityBot, Applebot, OAI-SearchBot). If the intent is to block those legacy versions for all crawlers, either move the legacy disallows into each explicit allow group (and keep Allow: /), or remove the explicit allow groups entirely and let those bots fall through to User-agent: * (while keeping explicit disallow groups for training bots).

Fixed in 758cff5. Went with option (a) but consolidated: the 11 AI-search-bot allow groups are now a single block that uses multiple User-agent: headers sharing one rule set, with the three legacy-version Disallow lines (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) applied directly inside it. Same intent ("allow these AI search bots everywhere except legacy versions") but now actually enforced, and only 8 lines of net change instead of 33.

Addressed in 758cff5 (the earlier commit that moved the legacy disallows inside the named allow group). The /docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/ lines now sit directly under the User-agent: OAI-SearchBot / ChatGPT-User / Claude-SearchBot / ... / Amazonbot block so every allowed AI bot gets the legacy-version block, not just crawlers that fall through to User-agent: *. 56cae9e just now extended the same pattern to Crawl-delay: 5 and Disallow: /cgi-bin/ per your other comment — both global rules are now duplicated inside the named group as well.

Per the robots.txt spec, a bot that matches a named User-agent group reads rules only from that group — it does not fall through to User-agent: *. So the Disallow: /docs/{1,2,3}.0.0/ lines under User-agent: * were silently inapplicable to PerplexityBot/Applebot/OAI-SearchBot/etc., meaning those bots were still crawling the unmaintained legacy versions despite the noindex/canonical/global block combo. Consolidate the 11 AI search bot allow groups into a single block using multiple User-agent headers, and add the three legacy-version Disallow lines inside it so the intent — "allow AI search bots everywhere except legacy versions" — is actually enforced. No semantic change to training bots, Bytespider, or the * fallback group. Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>

@type

…, Task 35 sitemap priorities Three docs-side fixes bundled. LIVE-12 — per-page og:title / twitter:title Previously every docs page rendered with og:title "Keploy Documentation" because the title inherited from docusaurus.config.js's site-level `title` field. Social share cards on LinkedIn / Slack / X therefore all showed the same generic headline regardless of which glossary / concept / quickstart page was shared. Fix: emit <meta property="og:title" content={title}> and <meta name="twitter:title" content={title}> in the swizzled DocItem Head component, pulling from the per-page metadata.title that the <title> tag already uses. Also adds og:description / twitter:description so preview cards carry the page-specific description. No site-level config change required. LIVE-13 — suppress Article schema on /docs/ landing and category indexes The /docs/ root was shipping Article JSON-LD even though it is a hub page with no single author, no single publication date, and no single headline — a type mismatch that AI models may flag as noise. Fix: compute `suppressArticleSchema` from permalink / frontmatter and short-circuit the articleSchema construction when the page is the /docs/ root or a category index. The DocBreadcrumbs JSON-LD continues to emit normally so hub pages still have navigation signal. Task 35 — differentiate docs sitemap priorities Original priority buckets only covered quickstart (0.8), concepts / keploy-explained (0.7), and keploy-cloud (0.6). Default was 0.5 for everything else including the high-value /docs/ root and running-keploy sections. New bucket structure in createSitemapItems: 1.0 /docs/ root (primary entry point) 0.9 /docs/quickstart/* (highest-intent user flow) 0.8 /docs/running-keploy/* (primary product docs) 0.7 /docs/concepts/*, /docs/keploy-explained/* 0.6 /docs/keploy-cloud/*, /docs/ci-cd/*, /docs/faq, /docs/troubleshooting 0.5 /docs/concepts/reference/glossary/* (long-tail, many pages) Added an explanatory comment inline so the next editor understands the bucket rationale. Verify after deploy: curl -s https://keploy.io/docs/concepts/reference/glossary/idempotency/ | \ grep -oE 'og:title"[^>]*content="[^"]+"' # expected: "What is Idempotency in REST APIs? Complete Guide" curl -s https://keploy.io/docs/ | grep -c '"@type":"Article"' # expected: 0 curl -s https://keploy.io/docs/sitemap.xml | \ python3 -c "import sys,re; \ priorities = re.findall(r'<priority>([0-9.]+)</priority>', sys.stdin.read()); \ print('unique priorities:', sorted(set(priorities)))" # expected: ['0.5', '0.6', '0.7', '0.8', '0.9', '1.0'] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Neha Gupta <gneha21@yahoo.in>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-14T14:49:37Z

+User-agent: Applebot
+User-agent: DuckAssistBot
+User-agent: Amazonbot
+Allow: /


The explicit AI-search User-agent group won’t inherit rules from User-agent: *, so those bots will ignore Crawl-delay: 5 and Disallow: /cgi-bin/. If the intent is to keep the same crawl-rate limit and global disallows for all allowed crawlers, duplicate those rules inside this named allow group as well (alongside the legacy-version disallows).

Suggested change

Allow: /

Allow: /

Crawl-delay: 5

Disallow: /cgi-bin/

Addressed in 56cae9e. Added Crawl-delay: 5 and Disallow: /cgi-bin/ inside the named AI-search User-agent group so the allowed bots (OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Perplexity-User, Gemini-Deep-Research, GoogleOther, Applebot, DuckAssistBot, Amazonbot) get the same rate-limit and global disallow as User-agent: *. The legacy-version disallows (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) were already duplicated in this group for the same inheritance reason — this extends that pattern to the two global rules you flagged.

Fixed in 56cae9e: Crawl-delay: 5 and Disallow: /cgi-bin/ are now mirrored inside the AI-search allow group alongside the legacy-version disallows, so the group is a proper superset of the User-agent: * defaults. Named AI search bots (Perplexity/Applebot/OAI-SearchBot/etc.) now see the same crawl-rate limit and /cgi-bin/ block as fall-through bots.

Copilot review caught that named User-agent groups in robots.txt do not inherit rules from User-agent: *. The AI-search allow group (OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Perplexity-User, Gemini-Deep-Research, GoogleOther, Applebot, DuckAssistBot, Amazonbot) was therefore ignoring both the global Crawl-delay: 5 limit AND the Disallow: /cgi-bin/ in the fallback User-agent: * block. Duplicated both lines into the named group so the same policy applies: search bots are rate-limited to 5s per request, and they cannot crawl /cgi-bin/. The legacy-version disallows (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) were already duplicated in this block for the same inheritance reason. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Neha Gupta <gneha21@yahoo.in>

Live audit + competitor analysis (llms-full.txt for competitors already has comparison tables, but docs had none). Adds a dedicated comparison page under /docs/keploy-explained/keploy-vs-alternatives with: - Feature comparison matrix: Keploy vs Postman, Katalon, WireMock, Testcontainers across 9 capabilities (test generation model, SDK requirement, mock generation, non-determinism, secret masking, CI/CD, license, kernel version). - Approach differences: plain-language description of each tool's core abstraction so readers can self-sort. - When to pick each: decision tree by team profile / API shape. - Migration paths: concrete steps for moving from Postman or Katalon to Keploy without throwing away existing work. - Related reading cross-links to how-keploy-works, integration-testing-faq, api-testing-faq. Added to version-4.0.0 sidebar in the Integration Testing → keploy-vs- alternatives slot, placed between Troubleshooting Guide and FAQs so it appears in the decision phase of the reader journey. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Neha Gupta <gneha21@yahoo.in>

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-14T15:22:13Z

+| Capability | Keploy | Postman | Katalon | WireMock | Testcontainers |
+|---|---|---|---|---|---|
+| Test generation model | Auto from real traffic (eBPF capture) | Manual scripts | Manual + low-code | Manual + record/playback | Manual + real containers |
+| SDK / code changes required | None (kernel-level eBPF) | Newman CLI integration | Groovy scripts or record | Java SDK or standalone proxy | Java / Go / Node / Python SDK |


The Markdown table uses a double leading pipe (||) on the header and separator rows, which will render as an empty first column (and can break consistent styling). Use a single leading pipe (|) for standard GitHub/Docusaurus table syntax so columns align as intended.

False positive — the file has single leading pipes on every row. Verified with grep -n '||' versioned_docs/version-4.0.0/keploy-explained/keploy-vs-alternatives.md (no matches) and by inspecting the diff hunk Copilot attached to this comment: every table row starts with | followed by a space, not ||. The first column ("Capability") renders correctly in GitHub preview. No change required.

Copilot · 2026-04-14T15:22:13Z

+        {
+          "type": "doc",
+          "label": "Keploy vs Alternatives",
+          "id": "keploy-explained/keploy-vs-alternatives"
+        },


The PR title/description focuses on BreadcrumbList + robots, but this sidebar change introduces a new doc page in the v4 sidebar. Please either update the PR description/title to include this new documentation addition (and its intent), or split it into a separate PR to keep the audit fixes isolated.

Updated the PR title and description to match the actual scope. The PR is now titled "audit: BreadcrumbList, robots policy, og:title, sidebar + Keploy vs Alternatives" and the description has a dedicated Task 33 — Keploy vs Alternatives comparison page section explaining the new doc (feature matrix vs Postman/Katalon/WireMock/Testcontainers, approach differences, when-to-pick-each) and the sidebar entry under keploy-explained, with a link to commit 5de8526.

Went with "update description" rather than "split PR" because each audit concern is already in its own commit, so review granularity is preserved without the overhead of rebasing out one commit onto a new branch. If you'd still prefer a split, happy to do that — just let me know.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-14T16:09:37Z

+              if (
+                url.includes("/faq") ||
+                url.includes("/troubleshooting")
+              ) {
                return {...item, priority: 0.6, changefreq: "monthly"};
              }


The sitemap priority bucket for “/docs/faq” and “/docs/troubleshooting” won’t apply to the current v4 docs routes. The FAQ docs are at URLs like /docs/keploy-explained/integration-testing-faq/ (and api-testing-faq, unit-testing-faq), which don’t contain the substring /faq, and the “Troubleshooting Guide” is /docs/keploy-explained/common-errors/, which doesn’t contain /troubleshooting. As a result, these pages will fall into the /keploy-explained/ bucket (0.7) instead of the intended 0.6. Update the matching logic to reflect actual routes (e.g., match faq anywhere in the slug and common-errors, or base this on doc ids/tags), or update the comment/bucket list so it matches the implemented behavior.

Fixed in 0af8f3c. Changed the match patterns to the actual v4 URL fragments (-faq/, -faq, /common-errors) and moved the FAQ/troubleshooting check ABOVE the /keploy-explained/ check so it takes precedence. Now /docs/keploy-explained/integration-testing-faq/, api-testing-faq, unit-testing-faq, and common-errors all correctly land in the 0.6 reference-style bucket instead of the 0.7 concepts bucket. Header comment updated to name the actual pages covered.

Fixed in 0af8f3c: the FAQ + troubleshooting bucket now matches the actual v4 routes. The createSitemapItems handler matches url.includes("-faq/") || url.includes("-faq") || url.includes("/common-errors") before the broader /keploy-explained/ 0.7 bucket, so the three FAQ pages (integration-testing-faq, api-testing-faq, unit-testing-faq) and the Troubleshooting Guide at /docs/keploy-explained/common-errors/ correctly land in the 0.6 bucket. The match-first ordering is documented inline so future edits don't accidentally swap the rules and bury these matches under the keploy-explained fallback.

…h actual v4 routes Addresses Copilot review comment 3080865321 on docs PR #832. The previous sitemap createSitemapItems callback checked `url.includes("/faq")` and `url.includes("/troubleshooting")` — neither of those substrings appears in the actual v4 docs URLs. The FAQ pages live at: /docs/keploy-explained/integration-testing-faq/ /docs/keploy-explained/api-testing-faq/ /docs/keploy-explained/unit-testing-faq/ and the troubleshooting guide lives at: /docs/keploy-explained/common-errors/ (sidebar label: "Troubleshooting Guide"). Because `/faq` never matched, all three FAQ pages and the troubleshooting guide fell through to the `/keploy-explained/` rule immediately below and got priority 0.7, not the intended 0.6. Fix: 1. Changed the match patterns to `-faq/`, `-faq`, and `/common-errors` so they match the real URL fragments. 2. Moved the FAQ/troubleshooting check ABOVE the `/keploy-explained/` check so it takes precedence when a page satisfies both. 3. Updated the header comment block to name the actual pages covered by the 0.6 bucket. Signed-off-by: Neha Gupta <gneha21@yahoo.in>

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>

nehagup · 2026-04-14T16:40:51Z

delete this versioned_docs/version-4.0.0/keploy-explained/keploy-vs-alternatives.md

Done in b56c813 — deleted the file.

nehagup · 2026-04-14T16:41:13Z

Done in b56c813 — removed the sidebar entry for keploy-explained/keploy-vs-alternatives.

nehagup · 2026-04-14T16:41:45Z

remove comments from all fines related to internal task ticket numbers.

Done in b56c813 — stripped Task 35, LIVE-12, LIVE-13, and LIVE-20 ticket references from comments across docusaurus.config.js, src/theme/DocBreadcrumbs/index.js, and src/theme/DocItem/index.js. Kept the explanatory comments that describe why each piece of logic exists, just without the internal ticket numbers.

nehagup · 2026-04-14T16:42:36Z

  const normalizedMetaKeywords = Array.isArray(metaKeywords)
    ? metaKeywords.join(", ")
    : metaKeywords;
+  // LIVE-13: suppress Article / BlogPosting / APIReference schema on the


why are we suppressing, how does this help?

Explaining rather than removing, since it's a functional fix — happy to revert if you still want it out.

What it does: On /docs/ and any /category/... index page, we skip emitting Article / BlogPosting / APIReference JSON-LD and emit only the BreadcrumbList schema.

Why: Article schema requires a single headline, single author, and single datePublished. A hub page (the docs landing, or a category index) is an index of many articles — it has no single author or publication date. Emitting Article JSON-LD on these pages is a structured-data type mismatch.

What breaks without it:

Google Rich Results Test flags /docs/ as invalid Article structured data (missing or conflicting fields). Invalid schema can cause Google to stop trusting all the schema on the domain, including the valid Article entries on leaf pages.

AI crawlers (Perplexity, ChatGPT Search) that cite "articles" prefer pages where the type matches — a hub incorrectly marked as Article gets cited with the wrong title/author combo in answer engines.

Scope: It only affects the docs root and category index pages. Every normal content page (/docs/running-keploy/cli-commands, /docs/keploy-explained/how-keploy-works, etc.) still emits full Article schema exactly as before — those pages have real authors, dates, and headlines.

If you want it removed anyway: say the word and I'll strip it. The cost is that /docs/ will fail Google's structured-data validation and /docs/category/* pages will emit Article schema with blank author/date fields. Up to you.

@nehagup

… refs Addresses 3 review comments from @nehagup on PR #832. 1. Delete versioned_docs/version-4.0.0/keploy-explained/keploy-vs-alternatives.md and remove its sidebar entry in version-4.0.0-sidebars.json. The page was added by the audit batch but the team decided not to ship a public alternatives comparison inside the docs; keeping it here would publish product-marketing framing under the docs subtree, which belongs on the landing site instead if anywhere. 2. Strip internal ticket identifiers from comments across docusaurus.config.js, src/theme/DocBreadcrumbs/index.js, and src/theme/DocItem/index.js. Internal audit references like "Task 35", "LIVE-12", "LIVE-13", "LIVE-20" rot as the task tracker evolves and leak private process detail into public source. Kept the explanatory comments that describe *why* each piece of logic exists, just without the ticket numbers. DocItem/index.js suppressArticleSchema logic and DocBreadcrumbs sidebarTrail fallback are left as-is — they are functional fixes that prevent invalid Article schema on hub pages and missing BreadcrumbList schema on glossary pages. Reply to @nehagup's "why are we suppressing" question posted separately on the comment thread. Signed-off-by: Neha Gupta <gneha21@yahoo.in>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-14T17:03:16Z

+  // root and any category index pages. Article schema on a hub page is
+  // a type mismatch because a hub does not have a single author, a
+  // single publication date, or a single headline — it is an index of
+  // content. Hub pages emit only the normal DocBreadcrumbs JSON-LD.
+  const permalink = metadata?.permalink || "";
+  const isDocsRoot =
+    permalink === "/docs/" ||
+    permalink === "/docs" ||
+    permalink.endsWith("/docs/index") ||
+    permalink.endsWith("/docs/");


suppressArticleSchema only treats /docs/ as the docs root. This repo also serves versioned roots like /docs/4.0.0/ (since onlyIncludeVersions includes 4.0.0 and includeCurrentVersion is true), and those hub pages can still emit Article/BlogPosting/APIReference JSON-LD. Consider extending isDocsRoot (or suppressArticleSchema) to also match versioned docs roots like /docs/<version>/ so hub pages across all served versions avoid type-mismatched Article schema.

Suggested change

// root and any category index pages. Article schema on a hub page is

// a type mismatch because a hub does not have a single author, a

// single publication date, or a single headline — it is an index of

// content. Hub pages emit only the normal DocBreadcrumbs JSON-LD.

const permalink = metadata?.permalink || "";

const isDocsRoot =

permalink === "/docs/" ||

permalink === "/docs" ||

permalink.endsWith("/docs/index") ||

permalink.endsWith("/docs/");

// root, versioned docs roots, and any category index pages. Article

// schema on a hub page is a type mismatch because a hub does not have

// a single author, a single publication date, or a single headline —

// it is an index of content. Hub pages emit only the normal

// DocBreadcrumbs JSON-LD.

const permalink = metadata?.permalink || "";

const isVersionedDocsRoot =

/^\/docs\/\d[\w.-]*(?:\/index)?\/?$/.test(permalink);

const isDocsRoot =

permalink === "/docs/" ||

permalink === "/docs" ||

permalink.endsWith("/docs/index") ||

permalink.endsWith("/docs/") ||

isVersionedDocsRoot;

Fixed in 3622c50. Added isVersionedDocsRoot = /^/docs/\d[\w.-]*(?:/index)?/?$/.test(permalink) and folded it into isDocsRoot so /docs/4.0.0/, /docs/3.0.0/, /docs/2.0.0/, and /docs/1.0.0/ all get Article/BlogPosting/APIReference suppression. Content pages inside versioned trees (for example /docs/4.0.0/keploy-explained/how-keploy-works/) still emit Article schema normally since they have real authors, dates, and headlines — only bare versioned roots are caught.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/theme/DocBreadcrumbs/index.js:102

When useSidebarBreadcrumbs() returns null/undefined, sidebarTrail becomes [], so the JSON-LD BreadcrumbList only contains the injected "Home" and "Docs" items and never includes the current page URL. This produces a BreadcrumbList that doesn’t actually represent the page’s breadcrumb trail (especially for glossary pages), and may still fail the intended SEO/structured-data signal. Consider always appending a final ListItem for the current pathname (and deriving a reasonable name, e.g., from doc metadata/title if available, otherwise from the last path segment).

  if (sidebarTrail.length > 0) {
    sidebarTrail.forEach((crumb, index) => {
      const isLast = index === sidebarTrail.length - 1;
      const href =
        crumb.type === "category" && crumb.linkUnlisted
          ? undefined
          : crumb.href || (isLast ? pathname : null);
      const absoluteUrl = toAbsoluteUrl(siteConfig?.url, href);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-14T18:42:59Z

+# Keploy docs robots.txt
+# Policy: allow AI search/answer engines, block training-only crawlers,
+# block Bytespider. Search bots drive visibility in ChatGPT, Claude,
+# Perplexity, Copilot, Gemini answers. Training bots feed future model
+# weights and provide nothing back.
+# Reference: Speedscale / Katalon / Testsigma split policy (2026 competitor audit)


PR description mentions adding a new “Keploy vs Alternatives” doc page and updating the v4 sidebar, but those artifacts don’t appear to be present in this change set (no keploy-vs-alternatives doc found and no sidebar entry references it). Either the description needs updating to reflect the actual changes in this PR, or the missing doc/sidebar changes need to be included so the PR matches its stated scope.

Fixed by updating the PR title and body via REST API. The title is now 'audit: BreadcrumbList, robots policy, og:title, sitemap priorities' and the Task 33 section describing the Keploy vs Alternatives page has been removed from the body. Added a trailing Note that explains the file and sidebar entry were created earlier in the branch and then removed in commit b56c813 per @nehagup's review feedback — product comparison framing belongs on the landing site, not the docs subtree. The current PR scope is BreadcrumbList + robots.txt + og:title + sitemap priorities only.

Addresses Copilot comment 3081151415 on PR #832. The previous suppressArticleSchema check only matched /docs/ as the docs root, but this site serves versioned hub pages too — /docs/4.0.0/, /docs/3.0.0/, /docs/2.0.0/, /docs/1.0.0/ — via onlyIncludeVersions and includeCurrentVersion in docusaurus.config.js. Each versioned root is also an index of content with no single author/date/headline, so emitting Article/BlogPosting/APIReference schema on those pages had the same type-mismatch problem the base case fix was addressing. Added a regex check for /docs/<digit-starting-version>/ so any current or archived versioned root is caught by the same suppression path. Current content pages inside versioned trees (e.g. /docs/4.0.0/keploy-explained/how-keploy-works/) still emit Article schema as normal since they have real authors, dates, and headlines — only the bare versioned roots are suppressed. Signed-off-by: Neha Gupta <gneha21@yahoo.in>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

nehagup and others added 2 commits April 14, 2026 18:56

Copilot AI review requested due to automatic review settings April 14, 2026 13:42

Copilot started reviewing on behalf of slayerjain April 14, 2026 13:43 View session