28 changes: 28 additions & 0 deletions src/languages/lib/correct-translation-content.ts
@@ -884,6 +884,34 @@ export function correctTranslatedContentStrings(
}
}

// Unescape HTML entity-encoded tags (`&lt;tag&gt;` → `<tag>`) that Crowdin
// introduces when the English source uses inline raw HTML — e.g.
// `<code><a href="...">label</a></code>` inside table `<td>` cells.
// Without this fix, those tags render as literal `<code>` text on translated
// pages rather than as styled code elements.
// Only unescape tag names present as raw HTML in the English source to avoid
// incorrectly expanding intentional `&lt;` entity sequences.
if (englishContent && content.includes('&lt;')) {
const englishTagNames = new Set(
[...englishContent.matchAll(/<([a-z][a-z0-9]*)/gi)].map((m) => m[1].toLowerCase()),
)
if (englishTagNames.size > 0) {
content = content.replace(
/&lt;(\/?[a-z][a-z0-9]*)(\s[^<>]*?)?&gt;/gi,
(match, tag: string, attrs = '') => {
const baseName = tag.replace(/^\//, '').toLowerCase()
return englishTagNames.has(baseName) ? `<${tag}${attrs}>` : match
},
Comment on lines +899 to +904
Copilot AI Apr 20, 2026

The `attrs` capture in the entity-tag regex (`(\s[^<>]*?)?`) will stop at the first `&gt;` it sees. If an attribute value itself contains an encoded `&gt;` or `&lt;` (e.g. `title="a &gt; b"`), the match can terminate early and the replacement will corrupt the tag or content. Consider switching to an attribute pattern that respects quoted strings, or using an HTML parser to unescape tags safely.
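
A minimal sketch of the quoted-string approach suggested above. The names `QUOTE_AWARE_ENTITY_TAG` and `unescapeQuoteAware` are hypothetical and not part of the PR, and the set of allowed tag names is passed in directly instead of being derived from the English source. It assumes attribute names consist of letters, digits, and hyphens only.

```typescript
// Hypothetical pattern: attribute values in quotes are consumed as a unit, so an
// encoded `&gt;` inside a quoted value no longer ends the tag match.
const QUOTE_AWARE_ENTITY_TAG =
  /&lt;(\/?[a-z][a-z0-9]*)((?:\s+[a-z][a-z0-9-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'<>&]+))?)*)(\s*\/?)&gt;/gi

// Hypothetical helper: unescape only tags whose base name is in `allowedTags`.
function unescapeQuoteAware(content: string, allowedTags: Set<string>): string {
  return content.replace(
    QUOTE_AWARE_ENTITY_TAG,
    (match: string, tag: string, attrs: string, tail: string) => {
      const baseName = tag.replace(/^\//, '').toLowerCase()
      // `tail` preserves an optional self-closing ` /` before the final `&gt;`.
      return allowedTags.has(baseName) ? `<${tag}${attrs}${tail}>` : match
    },
  )
}

console.log(unescapeQuoteAware('&lt;a title="a &gt; b"&gt;link&lt;/a&gt;', new Set(['a'])))
// <a title="a &gt; b">link</a>
```

Tags whose attributes do not fit the pattern, such as an unquoted value containing `&`, are left escaped instead of being partially rewritten.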

)
Comment on lines +894 to +905
Copilot AI Apr 20, 2026

This unescapes and re-enables arbitrary attributes from translated content for any tag name found in the English source (including event-handler attributes like `onload`/`onclick`). Since the markdown pipeline allows dangerous HTML (`rehype-raw`) and the UI renders HTML via `dangerouslySetInnerHTML`, it would be safer to restrict which tags and attributes can be unescaped (for example, allowlist tags and allow only safe attributes such as `href` on `<a>`).
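
One way the allowlist idea could look, as a sketch under stated assumptions. `SAFE_ATTRS`, `keepSafeAttrs`, and `unescapeAllowlisted` are hypothetical names, the choice of tags and attributes is illustrative only, and the sketch handles double-quoted attribute values only.

```typescript
// Hypothetical allowlist: tag name -> attributes that may survive unescaping.
// Which tags and attributes belong here is a project decision, not taken from the PR.
const SAFE_ATTRS = new Map<string, ReadonlySet<string>>([
  ['a', new Set<string>(['href'])],
  ['code', new Set<string>()],
  ['td', new Set<string>()],
])

// Matches one double-quoted attribute, e.g. ` href="https://example.com"`.
const QUOTED_ATTR = /\s+([a-z][a-z0-9-]*)\s*=\s*"([^"]*)"/gi

// Returns the filtered attribute string, or null when the tag is not allowlisted.
function keepSafeAttrs(tagName: string, rawAttrs: string): string | null {
  const allowed = SAFE_ATTRS.get(tagName)
  if (!allowed) return null
  return rawAttrs.replace(QUOTED_ATTR, (_m: string, name: string, value: string) => {
    const attr = name.toLowerCase()
    if (!allowed.has(attr)) return '' // drops onclick, onload, style, ...
    // Assumption: only http(s), root-relative, and fragment links are acceptable.
    if (attr === 'href' && !/^(?:https?:\/\/|\/|#)/i.test(value)) return ''
    return ` ${attr}="${value}"`
  })
}

function unescapeAllowlisted(content: string, englishTagNames: Set<string>): string {
  return content.replace(
    /&lt;(\/?)([a-z][a-z0-9]*)((?:\s+[a-z][a-z0-9-]*\s*=\s*"[^"]*")*)\s*&gt;/gi,
    (match: string, slash: string, tag: string, attrs: string) => {
      const name = tag.toLowerCase()
      if (!englishTagNames.has(name)) return match
      const safe = keepSafeAttrs(name, attrs)
      if (safe === null) return match // tag not allowlisted: leave it escaped
      return slash ? `</${tag}>` : `<${tag}${safe}>`
    },
  )
}

console.log(
  unescapeAllowlisted(
    '&lt;a href="https://example.com" onclick="alert(1)"&gt;x&lt;/a&gt;',
    new Set(['a']),
  ),
)
// <a href="https://example.com">x</a>
```

A tag that appears in the English source but not in the allowlist, such as `script`, stays entity-encoded in this sketch.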

}
}

// Remove bare code-fence wrapping from bold heading lines. Translation pipelines
// sometimes wrap `**heading**` lines in bare (no-language) fenced code blocks,
// causing them to render as code instead of bold text. Strip the fences and
// restore the heading as plain Markdown.
content = content.replace(/^```\s*\n(\*\*[^\n]+\*\*)\s*\n```/gm, '$1')

// Collapsed Markdown table rows — restore linebreaks between `|` cells.
content = content.replaceAll(' | | ', ' |\n| ')
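
The two replacements above can be tried in isolation. This is a small sketch: `applyFenceAndRowFixes` is a hypothetical wrapper, and `split`/`join` stands in for the `replaceAll` call in the diff with the same effect.

```typescript
// Hypothetical wrapper around the two fixes shown in the diff.
function applyFenceAndRowFixes(content: string): string {
  // A bare fence, one bold line, and a closing fence collapse to the bold line.
  const unfenced = content.replace(/^```\s*\n(\*\*[^\n]+\*\*)\s*\n```/gm, '$1')
  // ' | | ' marks a lost row break in a collapsed Markdown table.
  return unfenced.split(' | | ').join(' |\n| ')
}

console.log(applyFenceAndRowFixes('```\n**Heading**\n```'))
// **Heading**
console.log(applyFenceAndRowFixes('| a | b | | c | d |'))
// | a | b |
// | c | d |
```

A fence with a language tag, such as one opened with `shell`, does not match the pattern and is left unchanged.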

74 changes: 74 additions & 0 deletions src/languages/tests/correct-translation-content.ts
@@ -1355,6 +1355,80 @@ describe('correctTranslatedContentStrings', () => {
expect(fix('{{%raw %}', 'es')).toBe('{% raw %}')
expect(fix('{{% raw %}', 'es')).toBe('{% raw %}')
})

test('unescapes entity-encoded HTML tags when English source has matching raw HTML', () => {
const english =
'<td><code><a href="https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2404-Readme.md">ubuntu-latest</a></code></td>'

expect(fix('&lt;code&gt;ubuntu-latest&lt;/code&gt;', 'ko', english)).toBe(
'<code>ubuntu-latest</code>',
)
expect(
fix('&lt;a href="https://example.com"&gt;ubuntu-latest&lt;/a&gt;', 'ko', english),
).toBe('<a href="https://example.com">ubuntu-latest</a>')
expect(
fix(
'&lt;code&gt;&lt;a href="https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2404-Readme.md"&gt;ubuntu-latest&lt;/a&gt;&lt;/code&gt;',
'ko',
english,
),
).toBe(
'<code><a href="https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2404-Readme.md">ubuntu-latest</a></code>',
)
})
Comment on lines +1359 to +1378
Copilot AI Apr 20, 2026

The new HTML-unescape behavior has good basic coverage, but it doesn't exercise attributes that contain encoded `&lt;`/`&gt;` sequences (which can break naive tag-matching regexes). Adding a test like `&lt;a title="a &gt; b"&gt;...&lt;/a&gt;` would help prevent regressions and validate the intended parsing behavior.
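
To see what such a test would currently observe, the regex from the diff can be run on that input in isolation. In this sketch `unescapeLikeDiff` is a hypothetical stand-in for the unescape step, with the English tag set passed in directly instead of going through `fix`.

```typescript
// The tag-matching regex copied from the diff.
const ENTITY_TAG_FROM_DIFF = /&lt;(\/?[a-z][a-z0-9]*)(\s[^<>]*?)?&gt;/gi

// Hypothetical stand-in for the unescape step in correctTranslatedContentStrings.
function unescapeLikeDiff(content: string, englishTagNames: Set<string>): string {
  return content.replace(
    ENTITY_TAG_FROM_DIFF,
    (match: string, tag: string, attrs: string = '') => {
      const baseName = tag.replace(/^\//, '').toLowerCase()
      return englishTagNames.has(baseName) ? `<${tag}${attrs}>` : match
    },
  )
}

const titleSample = '&lt;a title="a &gt; b"&gt;link&lt;/a&gt;'
const titleResult = unescapeLikeDiff(titleSample, new Set(['a']))
console.log(titleResult)
// The opening tag is cut at the first `&gt;`, inside the quoted title:
// <a title="a > b"&gt;link</a>
```

A regression test could assert whatever output the team decides is correct for this input, for example the tag unescaped with the quoted value left intact.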


test('does not unescape entity-encoded tags absent from English source', () => {
const english = '<p>Simple paragraph without code elements</p>'
const input = '&lt;code&gt;text&lt;/code&gt;'
expect(fix(input, 'ko', english)).toBe(input)
})

test('does not unescape entity-encoded tags when no English content provided', () => {
const input = '&lt;code&gt;ubuntu-latest&lt;/code&gt;'
expect(fix(input, 'ko')).toBe(input)
})

test('removes bare code-fence wrapping from bold heading lines', () => {
const input = '```\n**다음은 작업을 다운로드하는 데 필요합니다.**\n```'
expect(fix(input, 'ko')).toBe('**다음은 작업을 다운로드하는 데 필요합니다.**')
})

test('removes bare code-fence wrapping from bold headings between real code blocks', () => {
const input = [
'```shell copy',
'github.com',
'api.github.com',
'```',
'',
'```',
'**다음은 작업을 다운로드하는 데 필요합니다.**',
'```',
'',
'```shell copy',
'codeload.github.com',
'```',
].join('\n')

const expected = [
'```shell copy',
'github.com',
'api.github.com',
'```',
'',
'**다음은 작업을 다운로드하는 데 필요합니다.**',
'',
'```shell copy',
'codeload.github.com',
'```',
].join('\n')

expect(fix(input, 'ko')).toBe(expected)
})

test('does not strip language-specified code fences with bold content', () => {
const input = '```shell\n**not a heading**\n```'
expect(fix(input, 'ko')).toBe(input)
})
})

// ─── EDGE CASES ────────────────────────────────────────────────────