fix(docs): export clean selected-language markdown from Copy page button #460

Open
g-despot wants to merge 1 commit into main from fix/copy-page-markdown-export

@g-despot

Problem

The docs "Copy page" button copied page chrome and didn't produce real markdown. Reported by the team:

  • copies headers / other page content that shouldn't be included
  • the "markdown" output includes HTML tags / doesn't look right

Root cause: copyPageAsMarkdown scraped document.querySelector("article").innerText. The <article> wrapper includes breadcrumbs, the version badge, the mobile TOC, and the footer ("Edit this page", tags), and innerText is rendered text, not markdown (code fences, heading #, and link syntax are all lost).

Fix

Rewrote copyPageAsMarkdown to produce clean markdown:

  • Scope extraction to .theme-doc-markdown (the content body; page chrome is a sibling outside it).
  • Convert HTML → Markdown with turndown + turndown-plugin-gfm, dynamically imported inside the handler so the site still server-renders at build time (turndown touches the DOM).
  • Capture only the currently-selected code language. The custom code Tabs render every language into the DOM and hide the non-selected ones, so the copy strips [aria-hidden="true"].
  • Strip the remaining in-content chrome: default-tab label strips + inactive panels ([role="tablist"], [hidden]), decorative icons (img[alt=""]), the code-tab header controls (the .code container header — language selector, "API docs" link, "More info"), and Cloud/Academy badges.
  • Preserve Prism code blocks as fenced blocks with the correct language tag and line breaks.
  • Robustness: a new error copy state ("Copy failed"), a navigator.clipboard guard, and aria-live="polite" on the status label.
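
The conversion step in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `loadTurndown` and `htmlToMarkdown` are hypothetical names, and the option values are assumptions. It only shows the idea of importing turndown and its GFM plugin inside the handler, so nothing DOM-dependent is evaluated during the server-side build.

```javascript
// Sketch only: `loadTurndown` and `htmlToMarkdown` are hypothetical names,
// and the option values below are assumptions, not the PR's settings.
async function loadTurndown() {
  // Dynamic import: the libraries are fetched only when the handler runs
  // in the browser, so the server-side build never evaluates them.
  const [{ default: TurndownService }, { gfm }] = await Promise.all([
    import("turndown"),
    import("turndown-plugin-gfm"),
  ]);
  return { TurndownService, gfm };
}

// `load` is injectable so the function can be exercised without the packages.
async function htmlToMarkdown(html, load = loadTurndown) {
  const { TurndownService, gfm } = await load();
  const service = new TurndownService({
    headingStyle: "atx", // "# Heading" instead of underlined headings
    codeBlockStyle: "fenced", // fenced blocks instead of indented code
  });
  service.use(gfm); // GFM extensions such as tables and strikethrough
  return service.turndown(html);
}
```

In a click handler this might be called as `await htmlToMarkdown(cleanedRoot.innerHTML)`, where `cleanedRoot` stands for the content body after the chrome has been stripped.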

A reusable data-copy-exclude opt-out marker was added to CloudOnlyBadge and AcademyBadge so their text no longer leaks into the copied markdown.
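
A sketch of how the stripping and the `data-copy-exclude` opt-out could fit together is shown below. The selectors come from the description above; the helper name `cloneWithoutChrome` and the choice to work on a deep clone (so the live page is left untouched) are assumptions. The code-tab header selector is left out because the description does not spell it out.

```javascript
// Selectors taken from the PR description; the list is illustrative and
// omits the code-tab header, whose exact selector is not given.
const COPY_EXCLUDE_SELECTORS = [
  '[aria-hidden="true"]', // non-selected language panels
  '[role="tablist"]', // default-tab label strips
  "[hidden]", // inactive default-tab panels
  'img[alt=""]', // decorative icons
  "[data-copy-exclude]", // explicit opt-out (Cloud/Academy badges)
];

// Hypothetical helper: returns a cleaned deep clone of the content root.
function cloneWithoutChrome(contentRoot) {
  const clone = contentRoot.cloneNode(true);
  clone
    .querySelectorAll(COPY_EXCLUDE_SELECTORS.join(","))
    .forEach((node) => node.remove());
  return clone;
}
```

Under these assumptions the handler would call `cloneWithoutChrome(document.querySelector(".theme-doc-markdown"))`, and any component can opt out of the copy by rendering a `data-copy-exclude` attribute on its root element.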

Verification

  • yarn build passes (exit 0).
  • The turndown pipeline was validated against the real built DOM: page chrome excluded, selected-language-only confirmed by element counts (non-selected language code blocks drop to zero), code blocks fenced with language + line breaks, headings/links/lists correct, and Cloud/Academy badge text stripped.
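
The "fenced with language + line breaks" property checked above can be illustrated with a small pure function. This is a hypothetical helper, not the PR's code: it assumes the language is carried by a Prism-style `language-xxx` class and that each rendered line is available as a separate string.

```javascript
// Hypothetical helper: build a fenced block from a Prism-style class name
// and the text of each rendered code line.
function toFencedBlock(className, lines) {
  // Prism marks the language with a "language-xxx" class.
  const match = /(?:^|\s)language-([\w+#-]+)/.exec(className || "");
  const language = match ? match[1] : "";
  // Join lines explicitly so line breaks survive the conversion.
  const code = lines.join("\n");
  // Use a fence longer than any backtick run inside the code itself.
  const runs = code.match(/`+/g) || [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  const fence = "`".repeat(Math.max(3, longest + 1));
  return `${fence}${language}\n${code}\n${fence}`;
}
```

Joining the lines explicitly matters because reading the text of a highlighted block in one go can lose the breaks between per-line elements, depending on how the highlighter renders them.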

Out of scope / follow-ups

  • The llms-txt plugin path was intentionally not used, to keep this change self-contained.
  • For a later PR: "View as markdown" links to a 404 when a page has no source file, and it always points at main (version drift); the prompts-variant copy doesn't use the new error state.

Files

  • src/components/ContextualMenu/index.js: the fix
  • src/components/CloudOnlyBadge/index.jsx, src/components/AcademyBadge/index.jsx: data-copy-exclude markers
  • package.json, yarn.lock: add turndown + turndown-plugin-gfm

The "Copy page" button scraped `article.innerText`, capturing page chrome
(breadcrumbs, version badge, mobile TOC, footer "Edit this page"/tags) and
emitting rendered text instead of markdown. Rewrite `copyPageAsMarkdown` to
scope to `.theme-doc-markdown` and convert with turndown (dynamically
imported for SSR safety), capturing only the currently-selected code language.

- Strip hidden language panels, default-tab chrome (role=tablist / hidden),
  decorative icons, the code-tab header controls, and Cloud/Academy badges
- Preserve Prism code blocks as fenced blocks with language + line breaks
- Add an `error` copy state, a clipboard guard, and aria-live on the status
- Add a `data-copy-exclude` opt-out marker (CloudOnlyBadge, AcademyBadge)
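
The `error` state and clipboard guard from the list above could look roughly like this. It is a sketch under assumptions: `copyWithStatus`, `setStatus`, and the `"copied"` state name are illustrative, and only the `error` state is named in the commit message.

```javascript
// Hypothetical sketch: names other than the "error" state are assumptions.
async function copyWithStatus(text, clipboard, setStatus) {
  // Guard: navigator.clipboard is undefined in insecure contexts and in
  // browsers without the async Clipboard API.
  if (!clipboard || typeof clipboard.writeText !== "function") {
    setStatus("error");
    return false;
  }
  try {
    await clipboard.writeText(text);
    setStatus("copied");
    return true;
  } catch {
    // For example: permission denied or the document is not focused.
    setStatus("error");
    return false;
  }
}
```

A caller would pass `navigator.clipboard` as the second argument. The status label driven by `setStatus` is the element that carries `aria-live="polite"`, so a screen reader announces "Copy failed" without moving focus.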

@orca-security-eu (bot) left a comment

Orca Security Scan Summary

| Status | Check | Issues by priority |
| --- | --- | --- |
| Passed | Infrastructure as Code | high 0, medium 0, low 0, info 0 |
| Passed | SAST | high 0, medium 0, low 0, info 0 |
| Passed | Secrets | high 0, medium 0, low 0, info 0 |
| Passed | Vulnerabilities | high 0, medium 0, low 0, info 0 |
