audit: BreadcrumbList, robots policy, og:title, sitemap priorities#832
audit: BreadcrumbList, robots policy, og:title, sitemap priorities#832
Conversation
Adopt the Speedscale / Katalon / Testsigma split: - Allow AI SEARCH bots (drive answer visibility): OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Perplexity-User, Gemini-Deep-Research, GoogleOther, Applebot, DuckAssistBot, Amazonbot. - Block TRAINING-ONLY bots: GPTBot, ClaudeBot, anthropic-ai, CCBot, Google-Extended, Applebot-Extended, Meta-ExternalAgent, FacebookBot, cohere-ai, Diffbot, Omgilibot, ImagesiftBot. - Keep Bytespider blocked. Also add belt-and-braces Disallow for unmaintained legacy doc versions (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) to reinforce existing noindex+canonical signals for crawlers that ignore them. Reopens Task 52 per user direction 2026-04-14. Mirrors the corresponding policy change on landing and blog-website robots.txt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Neha Gupta <gneha21@yahoo.in>
LIVE-20. Live audit of /docs/concepts/reference/glossary/idempotency/ on 2026-04-14 showed @type schema blocks Article, ImageObject, Organization, WebPage — and no BreadcrumbList. The /docs/ root, by contrast, has BreadcrumbList. This is the specific glossary-page regression that Task 13 was meant to catch but didn't. Root cause: the DocBreadcrumbs component had an early `return null` when useSidebarBreadcrumbs() returned null/undefined, which suppresses both the visual breadcrumb UI AND the JSON-LD schema emission. For deep glossary pages whose sidebar context resolves to null, this meant zero BreadcrumbList — the regression. Changes: - Replace `if (!breadcrumbs) return null` with a safe fallback to an empty sidebarTrail array. Schema emission + Home/Docs items run unconditionally. - Only render the visual <nav> when sidebarTrail has entries (avoids showing an empty breadcrumb UI on schema-only pages). - Propagate the sidebarTrail rename through the visual render path. Verify after deploy: curl -s https://keploy.io/docs/concepts/reference/glossary/idempotency/ | \ grep -c '"@type":"BreadcrumbList"' # expected: 1 (was 0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Neha Gupta <gneha21@yahoo.in>
There was a problem hiding this comment.
Pull request overview
Addresses SEO audit findings by ensuring glossary doc pages always emit BreadcrumbList JSON-LD (even when sidebar breadcrumbs are unavailable) and by updating robots.txt to distinguish between AI “search/answer” bots vs training-only crawlers, plus blocking legacy doc versions.
Changes:
- Emit
BreadcrumbListschema with a safe fallback whenuseSidebarBreadcrumbs()returns null/undefined. - Update
static/robots.txtwith allow/disallow blocks for various AI bots and add legacy doc-version disallows.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| static/robots.txt | Introduces nuanced bot rules and attempts to block legacy doc versions from crawling. |
| src/theme/DocBreadcrumbs/index.js | Avoids early-return so JSON-LD breadcrumbs can be emitted even without sidebar breadcrumbs; limits visual breadcrumb UI to when a sidebar trail exists. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Block unmaintained legacy doc versions (already set via noindex + canonical, | ||
| # belt-and-braces for crawlers that ignore those signals). | ||
| Disallow: /docs/1.0.0/ | ||
| Disallow: /docs/2.0.0/ | ||
| Disallow: /docs/3.0.0/ |
There was a problem hiding this comment.
The legacy-version Disallow: /docs/1.0.0/ (and 2.0.0/3.0.0) rules are only under User-agent: *, so they will not apply to crawlers that match one of the explicit allow groups above (e.g., PerplexityBot, Applebot, OAI-SearchBot). If the intent is to block those legacy versions for all crawlers, either move the legacy disallows into each explicit allow group (and keep Allow: /), or remove the explicit allow groups entirely and let those bots fall through to User-agent: * (while keeping explicit disallow groups for training bots).
There was a problem hiding this comment.
Fixed in 758cff5. Went with option (a) but consolidated: the 11 AI-search-bot allow groups are now a single block that uses multiple User-agent: headers sharing one rule set, with the three legacy-version Disallow lines (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) applied directly inside it. Same intent ("allow these AI search bots everywhere except legacy versions") but now actually enforced, and only 8 lines of net change instead of 33.
There was a problem hiding this comment.
Addressed in 758cff5 (the earlier commit that moved the legacy disallows inside the named allow group). The /docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/ lines now sit directly under the User-agent: OAI-SearchBot / ChatGPT-User / Claude-SearchBot / ... / Amazonbot block so every allowed AI bot gets the legacy-version block, not just crawlers that fall through to User-agent: *. 56cae9e just now extended the same pattern to Crawl-delay: 5 and Disallow: /cgi-bin/ per your other comment — both global rules are now duplicated inside the named group as well.
Per the robots.txt spec, a bot that matches a named User-agent group reads
rules only from that group — it does not fall through to User-agent: *.
So the Disallow: /docs/{1,2,3}.0.0/ lines under User-agent: * were silently
inapplicable to PerplexityBot/Applebot/OAI-SearchBot/etc., meaning those
bots were still crawling the unmaintained legacy versions despite the
noindex/canonical/global block combo.
Consolidate the 11 AI search bot allow groups into a single block using
multiple User-agent headers, and add the three legacy-version Disallow
lines inside it so the intent — "allow AI search bots everywhere except
legacy versions" — is actually enforced. No semantic change to training
bots, Bytespider, or the * fallback group.
Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>
…, Task 35 sitemap priorities
Three docs-side fixes bundled.
LIVE-12 — per-page og:title / twitter:title
Previously every docs page rendered with og:title "Keploy Documentation"
because the title inherited from docusaurus.config.js's site-level
`title` field. Social share cards on LinkedIn / Slack / X therefore
all showed the same generic headline regardless of which glossary /
concept / quickstart page was shared.
Fix: emit <meta property="og:title" content={title}> and
<meta name="twitter:title" content={title}> in the swizzled DocItem
Head component, pulling from the per-page metadata.title that the
<title> tag already uses. Also adds og:description / twitter:description
so preview cards carry the page-specific description. No site-level
config change required.
LIVE-13 — suppress Article schema on /docs/ landing and category indexes
The /docs/ root was shipping Article JSON-LD even though it is a hub
page with no single author, no single publication date, and no single
headline — a type mismatch that AI models may flag as noise.
Fix: compute `suppressArticleSchema` from permalink / frontmatter and
short-circuit the articleSchema construction when the page is the /docs/
root or a category index. The DocBreadcrumbs JSON-LD continues to emit
normally so hub pages still have navigation signal.
Task 35 — differentiate docs sitemap priorities
Original priority buckets only covered quickstart (0.8), concepts /
keploy-explained (0.7), and keploy-cloud (0.6). Default was 0.5 for
everything else including the high-value /docs/ root and running-keploy
sections.
New bucket structure in createSitemapItems:
1.0 /docs/ root (primary entry point)
0.9 /docs/quickstart/* (highest-intent user flow)
0.8 /docs/running-keploy/* (primary product docs)
0.7 /docs/concepts/*, /docs/keploy-explained/*
0.6 /docs/keploy-cloud/*, /docs/ci-cd/*, /docs/faq, /docs/troubleshooting
0.5 /docs/concepts/reference/glossary/* (long-tail, many pages)
Added an explanatory comment inline so the next editor understands
the bucket rationale.
Verify after deploy:
curl -s https://keploy.io/docs/concepts/reference/glossary/idempotency/ | \
grep -oE 'og:title"[^>]*content="[^"]+"'
# expected: "What is Idempotency in REST APIs? Complete Guide"
curl -s https://keploy.io/docs/ | grep -c '"@type":"Article"'
# expected: 0
curl -s https://keploy.io/docs/sitemap.xml | \
python3 -c "import sys,re; \
priorities = re.findall(r'<priority>([0-9.]+)</priority>', sys.stdin.read()); \
print('unique priorities:', sorted(set(priorities)))"
# expected: ['0.5', '0.6', '0.7', '0.8', '0.9', '1.0']
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Neha Gupta <gneha21@yahoo.in>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| User-agent: Applebot | ||
| User-agent: DuckAssistBot | ||
| User-agent: Amazonbot | ||
| Allow: / |
There was a problem hiding this comment.
The explicit AI-search User-agent group won’t inherit rules from User-agent: *, so those bots will ignore Crawl-delay: 5 and Disallow: /cgi-bin/. If the intent is to keep the same crawl-rate limit and global disallows for all allowed crawlers, duplicate those rules inside this named allow group as well (alongside the legacy-version disallows).
| Allow: / | |
| Allow: / | |
| Crawl-delay: 5 | |
| Disallow: /cgi-bin/ |
There was a problem hiding this comment.
Addressed in 56cae9e. Added Crawl-delay: 5 and Disallow: /cgi-bin/ inside the named AI-search User-agent group so the allowed bots (OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Perplexity-User, Gemini-Deep-Research, GoogleOther, Applebot, DuckAssistBot, Amazonbot) get the same rate-limit and global disallow as User-agent: *. The legacy-version disallows (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) were already duplicated in this group for the same inheritance reason — this extends that pattern to the two global rules you flagged.
There was a problem hiding this comment.
Fixed in 56cae9e: Crawl-delay: 5 and Disallow: /cgi-bin/ are now mirrored inside the AI-search allow group alongside the legacy-version disallows, so the group is a proper superset of the User-agent: * defaults. Named AI search bots (Perplexity/Applebot/OAI-SearchBot/etc.) now see the same crawl-rate limit and /cgi-bin/ block as fall-through bots.
Copilot review caught that named User-agent groups in robots.txt do not inherit rules from User-agent: *. The AI-search allow group (OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Perplexity-User, Gemini-Deep-Research, GoogleOther, Applebot, DuckAssistBot, Amazonbot) was therefore ignoring both the global Crawl-delay: 5 limit AND the Disallow: /cgi-bin/ in the fallback User-agent: * block. Duplicated both lines into the named group so the same policy applies: search bots are rate-limited to 5s per request, and they cannot crawl /cgi-bin/. The legacy-version disallows (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) were already duplicated in this block for the same inheritance reason. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Neha Gupta <gneha21@yahoo.in>
Live audit + competitor analysis (llms-full.txt for competitors already has comparison tables, but docs had none). Adds a dedicated comparison page under /docs/keploy-explained/keploy-vs-alternatives with: - Feature comparison matrix: Keploy vs Postman, Katalon, WireMock, Testcontainers across 9 capabilities (test generation model, SDK requirement, mock generation, non-determinism, secret masking, CI/CD, license, kernel version). - Approach differences: plain-language description of each tool's core abstraction so readers can self-sort. - When to pick each: decision tree by team profile / API shape. - Migration paths: concrete steps for moving from Postman or Katalon to Keploy without throwing away existing work. - Related reading cross-links to how-keploy-works, integration-testing-faq, api-testing-faq. Added to version-4.0.0 sidebar in the Integration Testing → keploy-vs- alternatives slot, placed between Troubleshooting Guide and FAQs so it appears in the decision phase of the reader journey. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Neha Gupta <gneha21@yahoo.in>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| | Capability | Keploy | Postman | Katalon | WireMock | Testcontainers | | ||
| |---|---|---|---|---|---| | ||
| | Test generation model | Auto from real traffic (eBPF capture) | Manual scripts | Manual + low-code | Manual + record/playback | Manual + real containers | | ||
| | SDK / code changes required | None (kernel-level eBPF) | Newman CLI integration | Groovy scripts or record | Java SDK or standalone proxy | Java / Go / Node / Python SDK | |
There was a problem hiding this comment.
The Markdown table uses a double leading pipe (||) on the header and separator rows, which will render as an empty first column (and can break consistent styling). Use a single leading pipe (|) for standard GitHub/Docusaurus table syntax so columns align as intended.
There was a problem hiding this comment.
False positive — the file has single leading pipes on every row. Verified with grep -n '||' versioned_docs/version-4.0.0/keploy-explained/keploy-vs-alternatives.md (no matches) and by inspecting the diff hunk Copilot attached to this comment: every table row starts with | followed by a space, not ||. The first column ("Capability") renders correctly in GitHub preview. No change required.
| { | ||
| "type": "doc", | ||
| "label": "Keploy vs Alternatives", | ||
| "id": "keploy-explained/keploy-vs-alternatives" | ||
| }, |
There was a problem hiding this comment.
The PR title/description focuses on BreadcrumbList + robots, but this sidebar change introduces a new doc page in the v4 sidebar. Please either update the PR description/title to include this new documentation addition (and its intent), or split it into a separate PR to keep the audit fixes isolated.
There was a problem hiding this comment.
Updated the PR title and description to match the actual scope. The PR is now titled "audit: BreadcrumbList, robots policy, og:title, sidebar + Keploy vs Alternatives" and the description has a dedicated Task 33 — Keploy vs Alternatives comparison page section explaining the new doc (feature matrix vs Postman/Katalon/WireMock/Testcontainers, approach differences, when-to-pick-each) and the sidebar entry under keploy-explained, with a link to commit 5de8526.
Went with "update description" rather than "split PR" because each audit concern is already in its own commit, so review granularity is preserved without the overhead of rebasing out one commit onto a new branch. If you'd still prefer a split, happy to do that — just let me know.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if ( | ||
| url.includes("/faq") || | ||
| url.includes("/troubleshooting") | ||
| ) { | ||
| return {...item, priority: 0.6, changefreq: "monthly"}; | ||
| } |
There was a problem hiding this comment.
The sitemap priority bucket for “/docs/faq” and “/docs/troubleshooting” won’t apply to the current v4 docs routes. The FAQ docs are at URLs like /docs/keploy-explained/integration-testing-faq/ (and api-testing-faq, unit-testing-faq), which don’t contain the substring /faq, and the “Troubleshooting Guide” is /docs/keploy-explained/common-errors/, which doesn’t contain /troubleshooting. As a result, these pages will fall into the /keploy-explained/ bucket (0.7) instead of the intended 0.6. Update the matching logic to reflect actual routes (e.g., match faq anywhere in the slug and common-errors, or base this on doc ids/tags), or update the comment/bucket list so it matches the implemented behavior.
There was a problem hiding this comment.
Fixed in 0af8f3c. Changed the match patterns to the actual v4 URL fragments (-faq/, -faq, /common-errors) and moved the FAQ/troubleshooting check ABOVE the /keploy-explained/ check so it takes precedence. Now /docs/keploy-explained/integration-testing-faq/, api-testing-faq, unit-testing-faq, and common-errors all correctly land in the 0.6 reference-style bucket instead of the 0.7 concepts bucket. Header comment updated to name the actual pages covered.
There was a problem hiding this comment.
Fixed in 0af8f3c: the FAQ + troubleshooting bucket now matches the actual v4 routes. The createSitemapItems handler matches url.includes("-faq/") || url.includes("-faq") || url.includes("/common-errors") before the broader /keploy-explained/ 0.7 bucket, so the three FAQ pages (integration-testing-faq, api-testing-faq, unit-testing-faq) and the Troubleshooting Guide at /docs/keploy-explained/common-errors/ correctly land in the 0.6 bucket. The match-first ordering is documented inline so future edits don't accidentally swap the rules and bury these matches under the keploy-explained fallback.
…h actual v4 routes Addresses Copilot review comment 3080865321 on docs PR #832. The previous sitemap createSitemapItems callback checked `url.includes("/faq")` and `url.includes("/troubleshooting")` — neither of those substrings appears in the actual v4 docs URLs. The FAQ pages live at: /docs/keploy-explained/integration-testing-faq/ /docs/keploy-explained/api-testing-faq/ /docs/keploy-explained/unit-testing-faq/ and the troubleshooting guide lives at: /docs/keploy-explained/common-errors/ (sidebar label: "Troubleshooting Guide"). Because `/faq` never matched, all three FAQ pages and the troubleshooting guide fell through to the `/keploy-explained/` rule immediately below and got priority 0.7, not the intended 0.6. Fix: 1. Changed the match patterns to `-faq/`, `-faq`, and `/common-errors` so they match the real URL fragments. 2. Moved the FAQ/troubleshooting check ABOVE the `/keploy-explained/` check so it takes precedence when a page satisfies both. 3. Updated the header comment block to name the actual pages covered by the 0.6 bucket. Signed-off-by: Neha Gupta <gneha21@yahoo.in>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>
There was a problem hiding this comment.
delete this versioned_docs/version-4.0.0/keploy-explained/keploy-vs-alternatives.md
There was a problem hiding this comment.
Done in b56c813 — removed the sidebar entry for keploy-explained/keploy-vs-alternatives.
There was a problem hiding this comment.
remove comments from all fines related to internal task ticket numbers.
There was a problem hiding this comment.
Done in b56c813 — stripped Task 35, LIVE-12, LIVE-13, and LIVE-20 ticket references from comments across docusaurus.config.js, src/theme/DocBreadcrumbs/index.js, and src/theme/DocItem/index.js. Kept the explanatory comments that describe why each piece of logic exists, just without the internal ticket numbers.
| const normalizedMetaKeywords = Array.isArray(metaKeywords) | ||
| ? metaKeywords.join(", ") | ||
| : metaKeywords; | ||
| // LIVE-13: suppress Article / BlogPosting / APIReference schema on the |
There was a problem hiding this comment.
why are we suppressing, how does this help?
There was a problem hiding this comment.
Explaining rather than removing, since it's a functional fix — happy to revert if you still want it out.
What it does: On /docs/ and any /category/... index page, we skip emitting Article / BlogPosting / APIReference JSON-LD and emit only the BreadcrumbList schema.
Why: Article schema requires a single headline, single author, and single datePublished. A hub page (the docs landing, or a category index) is an index of many articles — it has no single author or publication date. Emitting Article JSON-LD on these pages is a structured-data type mismatch.
What breaks without it:
- Google Rich Results Test flags
/docs/as invalidArticlestructured data (missing or conflicting fields). Invalid schema can cause Google to stop trusting all the schema on the domain, including the valid Article entries on leaf pages. - AI crawlers (Perplexity, ChatGPT Search) that cite "articles" prefer pages where the type matches — a hub incorrectly marked as
Articlegets cited with the wrong title/author combo in answer engines.
Scope: It only affects the docs root and category index pages. Every normal content page (/docs/running-keploy/cli-commands, /docs/keploy-explained/how-keploy-works, etc.) still emits full Article schema exactly as before — those pages have real authors, dates, and headlines.
If you want it removed anyway: say the word and I'll strip it. The cost is that /docs/ will fail Google's structured-data validation and /docs/category/* pages will emit Article schema with blank author/date fields. Up to you.
… refs Addresses 3 review comments from @nehagup on PR #832. 1. Delete versioned_docs/version-4.0.0/keploy-explained/keploy-vs-alternatives.md and remove its sidebar entry in version-4.0.0-sidebars.json. The page was added by the audit batch but the team decided not to ship a public alternatives comparison inside the docs; keeping it here would publish product-marketing framing under the docs subtree, which belongs on the landing site instead if anywhere. 2. Strip internal ticket identifiers from comments across docusaurus.config.js, src/theme/DocBreadcrumbs/index.js, and src/theme/DocItem/index.js. Internal audit references like "Task 35", "LIVE-12", "LIVE-13", "LIVE-20" rot as the task tracker evolves and leak private process detail into public source. Kept the explanatory comments that describe *why* each piece of logic exists, just without the ticket numbers. DocItem/index.js suppressArticleSchema logic and DocBreadcrumbs sidebarTrail fallback are left as-is — they are functional fixes that prevent invalid Article schema on hub pages and missing BreadcrumbList schema on glossary pages. Reply to @nehagup's "why are we suppressing" question posted separately on the comment thread. Signed-off-by: Neha Gupta <gneha21@yahoo.in>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // root and any category index pages. Article schema on a hub page is | ||
| // a type mismatch because a hub does not have a single author, a | ||
| // single publication date, or a single headline — it is an index of | ||
| // content. Hub pages emit only the normal DocBreadcrumbs JSON-LD. | ||
| const permalink = metadata?.permalink || ""; | ||
| const isDocsRoot = | ||
| permalink === "/docs/" || | ||
| permalink === "/docs" || | ||
| permalink.endsWith("/docs/index") || | ||
| permalink.endsWith("/docs/"); |
There was a problem hiding this comment.
suppressArticleSchema only treats /docs/ as the docs root. This repo also serves versioned roots like /docs/4.0.0/ (since onlyIncludeVersions includes 4.0.0 and includeCurrentVersion is true), and those hub pages can still emit Article/BlogPosting/APIReference JSON-LD. Consider extending isDocsRoot (or suppressArticleSchema) to also match versioned docs roots like /docs/<version>/ so hub pages across all served versions avoid type-mismatched Article schema.
| // root and any category index pages. Article schema on a hub page is | |
| // a type mismatch because a hub does not have a single author, a | |
| // single publication date, or a single headline — it is an index of | |
| // content. Hub pages emit only the normal DocBreadcrumbs JSON-LD. | |
| const permalink = metadata?.permalink || ""; | |
| const isDocsRoot = | |
| permalink === "/docs/" || | |
| permalink === "/docs" || | |
| permalink.endsWith("/docs/index") || | |
| permalink.endsWith("/docs/"); | |
| // root, versioned docs roots, and any category index pages. Article | |
| // schema on a hub page is a type mismatch because a hub does not have | |
| // a single author, a single publication date, or a single headline — | |
| // it is an index of content. Hub pages emit only the normal | |
| // DocBreadcrumbs JSON-LD. | |
| const permalink = metadata?.permalink || ""; | |
| const isVersionedDocsRoot = | |
| /^\/docs\/\d[\w.-]*(?:\/index)?\/?$/.test(permalink); | |
| const isDocsRoot = | |
| permalink === "/docs/" || | |
| permalink === "/docs" || | |
| permalink.endsWith("/docs/index") || | |
| permalink.endsWith("/docs/") || | |
| isVersionedDocsRoot; |
There was a problem hiding this comment.
Fixed in 3622c50. Added isVersionedDocsRoot = /^/docs/\d[\w.-]*(?:/index)?/?$/.test(permalink) and folded it into isDocsRoot so /docs/4.0.0/, /docs/3.0.0/, /docs/2.0.0/, and /docs/1.0.0/ all get Article/BlogPosting/APIReference suppression. Content pages inside versioned trees (for example /docs/4.0.0/keploy-explained/how-keploy-works/) still emit Article schema normally since they have real authors, dates, and headlines — only bare versioned roots are caught.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/theme/DocBreadcrumbs/index.js:102
- When
useSidebarBreadcrumbs()returns null/undefined,sidebarTrailbecomes[], so the JSON-LD BreadcrumbList only contains the injected "Home" and "Docs" items and never includes the current page URL. This produces a BreadcrumbList that doesn’t actually represent the page’s breadcrumb trail (especially for glossary pages), and may still fail the intended SEO/structured-data signal. Consider always appending a final ListItem for the currentpathname(and deriving a reasonablename, e.g., from doc metadata/title if available, otherwise from the last path segment).
if (sidebarTrail.length > 0) {
sidebarTrail.forEach((crumb, index) => {
const isLast = index === sidebarTrail.length - 1;
const href =
crumb.type === "category" && crumb.linkUnlisted
? undefined
: crumb.href || (isLast ? pathname : null);
const absoluteUrl = toAbsoluteUrl(siteConfig?.url, href);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Keploy docs robots.txt | ||
| # Policy: allow AI search/answer engines, block training-only crawlers, | ||
| # block Bytespider. Search bots drive visibility in ChatGPT, Claude, | ||
| # Perplexity, Copilot, Gemini answers. Training bots feed future model | ||
| # weights and provide nothing back. | ||
| # Reference: Speedscale / Katalon / Testsigma split policy (2026 competitor audit) |
There was a problem hiding this comment.
PR description mentions adding a new “Keploy vs Alternatives” doc page and updating the v4 sidebar, but those artifacts don’t appear to be present in this change set (no keploy-vs-alternatives doc found and no sidebar entry references it). Either the description needs updating to reflect the actual changes in this PR, or the missing doc/sidebar changes need to be included so the PR matches its stated scope.
There was a problem hiding this comment.
Fixed by updating the PR title and body via REST API. The title is now 'audit: BreadcrumbList, robots policy, og:title, sitemap priorities' and the Task 33 section describing the Keploy vs Alternatives page has been removed from the body. Added a trailing Note that explains the file and sidebar entry were created earlier in the branch and then removed in commit b56c813 per @nehagup's review feedback — product comparison framing belongs on the landing site, not the docs subtree. The current PR scope is BreadcrumbList + robots.txt + og:title + sitemap priorities only.
Addresses Copilot comment 3081151415 on PR #832. The previous suppressArticleSchema check only matched /docs/ as the docs root, but this site serves versioned hub pages too — /docs/4.0.0/, /docs/3.0.0/, /docs/2.0.0/, /docs/1.0.0/ — via onlyIncludeVersions and includeCurrentVersion in docusaurus.config.js. Each versioned root is also an index of content with no single author/date/headline, so emitting Article/BlogPosting/APIReference schema on those pages had the same type-mismatch problem the base case fix was addressing. Added a regex check for /docs/<digit-starting-version>/ so any current or archived versioned root is caught by the same suppression path. Current content pages inside versioned trees (e.g. /docs/4.0.0/keploy-explained/how-keploy-works/) still emit Article schema as normal since they have real authors, dates, and headlines — only the bare versioned roots are suppressed. Signed-off-by: Neha Gupta <gneha21@yahoo.in>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary
Batched fixes from the 2026-04-14 live-site + Copilot audit. Originally scoped to BreadcrumbList + robots, expanded to cover the full audit pass — each concern is a separate commit so history stays reviewable.
BreadcrumbList schema missing on docs glossary pages
Glossary pages like
/docs/concepts/reference/glossary/idempotency/were emitting Article/ImageObject/Organization/WebPage schemas but no BreadcrumbList. Root cause:DocBreadcrumbstheme component had an early return when the sidebar trail was null (which glossary pages have). Fix: emit a fallback BreadcrumbList even when the sidebar-derived trail is unavailable.Nuanced AI bot robots.txt policy
Allow AI search / answer engines (Perplexity, ChatGPT-User, Claude-SearchBot, Gemini-Deep-Research, Applebot, etc.) to crawl everywhere, block training-only bots (GPTBot, ClaudeBot, CCBot, Google-Extended, etc.), keep Bytespider blocked. The legacy-version
Disallow: /docs/{1,2,3}.0.0/lines are applied inside the AI search group as well (robots.txt named groups don't inherit fromUser-agent: *), and Crawl-delay + /cgi-bin are mirrored too so the group is a proper superset of the defaults.Per-page og:title + hub Article schema suppression + sitemap priorities
/docs/root, versioned roots like/docs/4.0.0/, and category index pages where it was incorrectly applied (hub pages have no single author/date/headline)-faq/and/common-errors/to reflect actual v4 routes)Test plan
Note: An earlier iteration of this PR added a "Keploy vs Alternatives" comparison doc at /docs/keploy-explained/keploy-vs-alternatives/ plus a v4 sidebar entry for it. That file and sidebar entry were removed in commit b56c813 per reviewer feedback — product comparison framing belongs on the landing site, not under the docs subtree. The current PR scope is BreadcrumbList + robots.txt + og:title + sitemap priorities only.
🤖 Generated with Claude Code