Add hi_en Code Switched by RajanPutty · Pull Request #415 · NVIDIA/NeMo-text-processing

RajanPutty · 2026-04-17T14:20:35Z

What does this PR do?

Add a one-line overview of what this PR aims to accomplish.

Before your PR is "Ready for review"

Pre checks:

PR Type:

New Feature
Bugfix
Documentation
Test

If you haven't finished some of the above items, you can still open a "Draft" PR.

Signed-off-by: RajanPutty <rputty@nvidia.com>

for more information, see https://pre-commit.ci

mgrafu · 2026-04-27T16:51:14Z

@@ -0,0 +1,416 @@
+10K	ten k


do we need the data files here or can we just refer to the ones in the original languages? is there a difference between the two?
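One way to answer "is there a difference between the two?" is to compare the copied file with the original-language file entry by entry. The sketch below assumes both files are two-column, tab-separated whitelists like the `10K` / `ten k` entry in the hunk. The function names are hypothetical and are not part of NeMo-text-processing.

```python
import csv


def parse_pairs(lines):
    """Parse tab-separated lines into a set of (first column, second column) pairs."""
    # QUOTE_NONE keeps quote characters in entries as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Blank or one-column lines produce short rows and are skipped.
    return {(row[0], row[1]) for row in reader if len(row) >= 2}


def load_pairs(path):
    """Read a two-column TSV file from disk (UTF-8, so Devanagari entries survive)."""
    with open(path, encoding="utf-8", newline="") as f:
        return parse_pairs(f)


def diff_pairs(copied, original):
    """Return (entries only in the copy, entries only in the original), both sorted."""
    return sorted(copied - original), sorted(original - copied)
```

If both returned lists are empty for, say, `diff_pairs(load_pairs("path/to/hi_en/en_whitelist.tsv"), load_pairs("path/to/en/whitelist.tsv"))` (placeholder paths), the copy adds nothing. In that case the grammar could likely refer to the original file instead.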

mgrafu · 2026-04-27T16:51:31Z

@@ -0,0 +1,7 @@
+१/४	पाव


same question as for en_whitelist

mgrafu · 2026-04-27T17:13:53Z

            from nemo_text_processing.inverse_text_normalization.he.verbalizers.verbalize_final import (
                VerbalizeFinalFst,
            )
-        elif lang == 'ko':  # Korean


let's rebase so we don't delete existing languages

mgrafu · 2026-04-27T17:14:51Z

    parser.add_argument(
        "--lang",
        help="language",
-        choices=["ar", "de", "en", "es", "es_en", "fr", "hi", "hy", "ko", "mr", "pt", "ru", "sv", "vi", "zh", 'ja'],


this should also get resolved with rebasing
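After rebasing, the `--lang` choices should keep every language from the target branch, as listed in the removed line above, and only append the new code-switched one. The following is a minimal sketch. The code `"hi_en"` is an assumption based on the PR title and the existing `es_en` naming; the diff does not show the exact value added.

```python
import argparse

# Choices taken from the upstream line shown in the review diff.
UPSTREAM_LANGS = ["ar", "de", "en", "es", "es_en", "fr", "hi", "hy", "ko", "mr", "pt", "ru", "sv", "vi", "zh", "ja"]
# Hypothetical: the code-switched language code this PR adds.
NEW_LANGS = ["hi_en"]


def build_parser():
    parser = argparse.ArgumentParser()
    # Extend the upstream list rather than replacing it, so "ko" and the others survive.
    parser.add_argument("--lang", help="language", choices=UPSTREAM_LANGS + NEW_LANGS)
    return parser
```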

mgrafu · 2026-04-27T17:15:22Z

            'hy',
            'mr',
            'ja',
-            'ko',


this should also get resolved with rebasing
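A quick guard against this kind of regression is to compare the language list on the branch with the one on the target branch before pushing. `removed_languages` is a hypothetical helper. The lists would come from whichever files the diff touches.

```python
def removed_languages(upstream, branch):
    """Return languages present upstream but missing from the branch, in upstream order."""
    branch_set = set(branch)
    return [lang for lang in upstream if lang not in branch_set]


# Fragment of the list from the review diff: the branch dropped 'ko'.
upstream = ["hy", "mr", "ja", "ko"]
branch = ["hy", "mr", "ja"]
print(removed_languages(upstream, branch))  # ['ko']
```

An empty result means that the rebase kept every upstream language.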

mgrafu · 2026-04-27T17:15:57Z

@@ -0,0 +1,30 @@
+दिल्ली एक एक शून्य शून्य शून्य एक~दिल्ली ११०००१


let's add English address test cases too
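The test files in this PR put the spoken input and the expected written output on one line, separated by `~`, as in the Delhi address case above. The following sketch reads such a file, which can help when adding the English address cases. `parse_test_cases` is hypothetical and is not the repository's own test helper.

```python
def parse_test_cases(lines):
    """Split each non-empty 'spoken~written' line into a (spoken, written) pair."""
    cases = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first '~' only; the spoken side is not expected to contain one.
        spoken, sep, written = line.partition("~")
        if not sep:
            raise ValueError(f"missing '~' separator: {line!r}")
        cases.append((spoken, written))
    return cases
```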

mgrafu · 2026-04-27T17:17:53Z

@@ -0,0 +1,39 @@
+आठ बटा तीन~८/३


let's add English test cases too
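The Hindi expectations in these files use Devanagari digits (`८/३`, `११०००१`). When writing the English counterparts, it may help to confirm that both scripts encode the same value. `str.translate` can map ० (U+0966) through ९ (U+096F) to ASCII `0`–`9`. The PR does not state whether English cases in hi_en should expect ASCII or Devanagari digits. The helper below is only a hypothetical test utility.

```python
# Map Devanagari digits U+0966..U+096F to ASCII "0".."9".
DEVANAGARI_TO_ASCII = str.maketrans({chr(0x0966 + i): str(i) for i in range(10)})


def to_ascii_digits(text):
    """Replace Devanagari digits with ASCII digits; all other characters are unchanged."""
    return text.translate(DEVANAGARI_TO_ASCII)
```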

mgrafu · 2026-04-27T17:22:38Z

            'mr',
            'ja',
            'rw',
-            'ko',


this should also get resolved with rebasing

mgrafu · 2026-04-27T17:22:57Z

            ClassifyFst as TNClassifyFst,
        )
        from nemo_text_processing.text_normalization.rw.verbalizers.verbalize import VerbalizeFst as TNVerbalizeFst
-    elif args.language == 'ko':


this should also get resolved with rebasing

RajanPutty and others added 2 commits April 17, 2026 19:47

Add hi_en Code Switched

5e42e2b

Signed-off-by: RajanPutty <rputty@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

f735184

for more information, see https://pre-commit.ci

RajanPutty marked this pull request as ready for review April 17, 2026 14:56

mgrafu reviewed Apr 27, 2026

RajanPutty wants to merge 2 commits into NVIDIA:staging/hi_en_itn_codeswitched from RajanPutty:hi_en_itn_codeswitched
