Feat english braille rules#170

Open
owjs3901 wants to merge 13 commits into main from feat-english-braille-rules

Conversation

@owjs3901

Contributor

No description provided.

@codecov

codecov Bot commented Jun 19, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


@github-actions

Contributor

Changepacks

owjs3901 added 5 commits June 19, 2026 20:41
Adds an opt-in Unified English Braille Grade-2 encoder, off by default so the stable Korean/Math pipeline is untouched (default build testcases unchanged at english 616 / korean 1527 / math 892).

english_ueb (base feature):
- §3 symbols, §6 numbers (incl. §6.3 numeric-mode continuation across digit separators), §7 punctuation, §8 capitals (§8.4 passages, §8.7 grade-1 shortform-collision guard)
- §2.6 standing-alone (§2.6.2 leading-punctuation asymmetry)
- §10.1-10.5 word/groupsigns, §10.7 initial-letter (rare-sequence safe subset only), §10.8 final-letter groupsigns (tion/ness/ment/ity/...)
- §4.2 accented letters (lowercase): accent indicator + base letter
- math-expression preflight (is_math_owned) so the math engine keeps ownership of sin/3ab/f(x-1)/log2/...
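
To illustrate the §6 numeric-mode behaviour listed above, here is a minimal Python sketch, not the code from this PR. It assumes only the standard UEB conventions: the numeric indicator ⠼ opens numeric mode, digits 1-9 and 0 reuse the cells for a-j, and a comma or full stop sitting between two digits keeps numeric mode alive. The name `encode_number` is hypothetical.

```python
# Illustrative sketch only; names are hypothetical and not taken from the PR.
NUMERIC_INDICATOR = "\u283c"  # ⠼ (dots 3456)
# Digits 1..9,0 reuse the braille cells for the letters a..j.
DIGITS = dict(zip("1234567890", "⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚"))
# Separators that continue numeric mode when they sit between two digits.
SEPARATORS = {",": "\u2802", ".": "\u2832"}  # ⠂ comma, ⠲ decimal point


def encode_number(text: str) -> str:
    """Encode a plain number such as '1,000' or '3.14' into UEB cells."""
    out = []
    numeric = False
    for i, ch in enumerate(text):
        if ch in DIGITS:
            if not numeric:
                # Emit the indicator once, when numeric mode starts.
                out.append(NUMERIC_INDICATOR)
                numeric = True
            out.append(DIGITS[ch])
        elif (
            numeric
            and ch in SEPARATORS
            and i + 1 < len(text)
            and text[i + 1] in DIGITS
        ):
            # Separator between digits: numeric mode continues, no new indicator.
            out.append(SEPARATORS[ch])
        else:
            raise ValueError(f"unsupported character {ch!r} in number")
    return "".join(out)


if __name__ == "__main__":
    print(encode_number("1,000"))
    print(encode_number("3.14"))
```

The point of the sketch is that the indicator appears once per number, so `1,000` needs a single ⠼ rather than one per digit group.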

english_ueb_cmudict (optional feature, bundles CMUdict, Simplified BSD):
- §10.6 restricted be/con lower groupsigns via a pronunciation-based syllable-boundary classifier (become vs beckon, benefit vs beneficent)
- conservative: spells out on any ambiguity, never guesses
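
The "spell out on any ambiguity" policy above can be sketched as follows. This is a toy model under stated assumptions, not the classifier in this PR: the real feature derives syllable boundaries from CMUdict pronunciations, whereas the hand-written `TOY_FIRST_SYLLABLES` table and the function `may_use_lower_groupsign` are hypothetical stand-ins that only show the decision rule.

```python
# Illustrative sketch only. The table is hand-written for the four words named
# in the commit message; it is NOT CMUdict data and the names are hypothetical.
TOY_FIRST_SYLLABLES = {
    "become": {"be"},      # be-come
    "beckon": {"beck"},    # beck-on
    "benefit": {"ben"},    # ben-e-fit
    "beneficent": {"be"},  # be-nef-i-cent
}


def may_use_lower_groupsign(word: str, prefix: str, table=None) -> bool:
    """Return True only when every known reading agrees that `prefix`
    is exactly the first syllable of `word`. Unknown or ambiguous words
    return False, which means "spell the letters out"."""
    if table is None:
        table = TOY_FIRST_SYLLABLES
    word = word.lower()
    if not word.startswith(prefix) or len(word) <= len(prefix):
        return False
    readings = table.get(word)
    if not readings:
        # Not in the dictionary: never guess.
        return False
    # All readings must agree; a single dissenting reading blocks the groupsign.
    return readings == {prefix}


if __name__ == "__main__":
    for w in ("become", "beckon", "benefit", "beneficent", "bewilder"):
        print(w, may_use_lower_groupsign(w, "be"))
```

Returning `False` for unknown words costs some contractions but never produces a wrong one, which matches the trade-off the commit message describes.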

english testcases: 616 -> 1074 (base) / 1099 (cmudict). Pronunciation/morphology-dependent contractions (risky §10.7 set, §10.11 bridging) intentionally deferred rather than mis-generalized.

Adds the dollar sign as a UEB §3.10 currency sign (\$ -> ⠈⠎). A balanced \$...\$ LaTeX math span is kept with the math engine via is_math_owned, so only a lone/trailing \$ (currency: US\$, A\, \) reaches the English path.

english testcases +4 (base 1074 -> 1078); korean/math and the default build unchanged. Bare currency-only inputs (\, \) stay with the legacy path: a lone currency symbol is language-context ambiguous (Korean §65 wants the ⠴ prefix), so routing it to UEB is deferred to an explicit-mode pass.
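
The split between a balanced dollar-delimited math span and a lone currency sign can be sketched as below. This is a simplified illustration, not the actual `is_math_owned` logic: `has_balanced_math_span` and `route` are hypothetical names, and the regex is a deliberately naive assumption. In particular it would also treat two prices in one sentence as a span, and it does not model the legacy path for bare currency-only inputs.

```python
import re

# Illustrative sketch only; names are hypothetical and not taken from the PR.
DOLLAR_SIGN = "\u2808\u280e"  # ⠈⠎ (dot 4 followed by dots 234)

# Naive assumption: a math span is a dollar sign, at least one non-dollar
# character, and a closing dollar sign.
_MATH_SPAN = re.compile(r"\$[^$]+\$")


def has_balanced_math_span(text: str) -> bool:
    """True when the text contains a dollar-delimited span."""
    return _MATH_SPAN.search(text) is not None


def route(text: str) -> str:
    """Pick which engine would own the input under the rule described above."""
    if has_balanced_math_span(text):
        return "math"      # balanced span stays with the math engine
    if "$" in text:
        return "english"   # lone or trailing dollar sign is treated as currency
    return "other"


if __name__ == "__main__":
    for sample in ("$x+1$", "US$", "hello"):
        print(repr(sample), "->", route(sample))
```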


3 participants