hi_itn_electronic#437
Signed-off-by: Mayuri S <mayuris@nvidia.com>
Signed-off-by: mayuris-00 <mayuris@nvidia.com>
3e004de to 01b2fc4
| @@ -0,0 +1,21 @@ | |||
| ग्लूकोज C6H12O6 | |||
please refer to the changes we made to chemical formulas for TN. we do not want hardcoded formulas; only the base elements should be hardcoded, with rules that build on them. if this is not possible, it's better to state it as a limitation than to have support only for certain elements.
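To illustrate the "base elements plus rules" idea, here is a minimal plain-Python sketch, not the TN implementation. The `ELEMENTS` and `COUNTS` tables and the function name are hypothetical; in the grammar the tables would presumably come from TSV files and the rule would be an FST. The rule is simply "an element symbol, optionally followed by a count, repeated".

```python
# Hypothetical base tables; in the grammar these would live in TSV files.
ELEMENTS = {"सी": "C", "एच": "H", "ओ": "O"}
COUNTS = {"दो": "2", "तीन": "3", "छह": "6"}


def spoken_formula_to_written(spoken):
    """Convert a spoken formula to written form, or return None if it does not parse."""
    written = []
    count_allowed = False  # a count may only follow an element symbol
    for token in spoken.split():
        if token in ELEMENTS:
            written.append(ELEMENTS[token])
            count_allowed = True
        elif token in COUNTS and count_allowed:
            written.append(COUNTS[token])
            count_allowed = False
        else:
            # Unknown token, leading count, or two counts in a row.
            return None
    return "".join(written) or None
```

With this shape, supporting a new compound means adding its elements to the base table, not adding a row per formula.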
| venv वेन्व | ||
| SAMPLE एस ए एम पी एल ई | ||
| hotmail हॉटमेल | ||
| ExpressScribeTranscriptionSoftware ई एक्स पी आर ई एस एस एस सी आर आई बी ई टी आर ए एन एस सी आर आई पी टी आई ओ एन एस ओ एफ टी डब्ल्यू ए आर ई |
how is this a common word?
| hotmail हॉटमेल | ||
| ExpressScribeTranscriptionSoftware ई एक्स पी आर ई एस एस एस सी आर आई बी ई टी आर ए एन एस सी आर आई पी टी आई ओ एन एस ओ एफ टी डब्ल्यू ए आर ई | ||
| Phones पी एच ओ एन ई एस | ||
| TXR20820d90fb1d3327447009e701166f29 टी एक्स आर दो शून्य आठ दो शून्य डी नौ शून्य एफ बी एक डी तीन तीन दो सात चार चार सात शून्य शून्य नौ ई सात शून्य एक एक छह छह एफ दो नौ |
how is this a common word?
| @@ -0,0 +1,10 @@ | |||
| 1 १ | |||
are these digits any different from the ones available for cardinals?
| @@ -0,0 +1,10 @@ | |||
| एक 1 | |||
are these digits any different from the ones available for cardinals?
| @@ -0,0 +1,54 @@ | |||
| a ए | |||
instead of hardcoding both upper- and lowercase letters, let's handle capitalization in the script
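One possible shape for this, as a plain-Python sketch: keep only lowercase rows in the data file and derive the uppercase variants in code. The function name `expand_case` and the row layout (written form, spoken form) are assumptions for illustration, not existing helpers in the repository.

```python
def expand_case(rows):
    """Given (written, spoken) pairs with lowercase written forms,
    return the same pairs plus an uppercase variant of each."""
    expanded = []
    for written, spoken in rows:
        expanded.append((written, spoken))
        upper = written.upper()
        if upper != written:  # skip entries that have no distinct uppercase form
            expanded.append((upper, spoken))
    return expanded


# Example with one row taken from the letter table in this PR.
letter_rows = expand_case([("a", "ए")])
```

This halves the size of the letter table and removes the risk of the two cases drifting apart.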
| nic एन आई सी | ||
| sims सिम्स | ||
| pope पोप | ||
| Zoom ज़ेड ओ ओ एम |
is it necessary to have overlap between domain and server name?
| sharda शारदा | ||
| universities यूनिवर्सिटीज़ | ||
| mcdonald मैक्डॉनल्ड | ||
| southmountaincc साउथ माउन्टेन सी सी |
let's trim this list to only be the most common cases
| @@ -0,0 +1,24 @@ | |||
| ज़ेड एक्स आठ शून्य एक नौ आठ शून्य ZX80 1980 | |||
let's have a serial class instead of hardcoding certain alphanumeric cases (you can look at the TN implementation for this)
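As a rough illustration of what a rule-based serial class could do (a plain-Python sketch, not the TN implementation), the idea is to convert any run of spoken letters and digits token by token instead of listing specific strings. The tables below are hypothetical and cover only a few tokens that appear in this PR's test cases; requiring at least one letter and one digit is also an assumption, made so that pure numbers stay with the cardinal grammar.

```python
# Hypothetical base tables; the real ones would be loaded from TSV files.
LETTERS = {"टी": "T", "एक्स": "X", "आर": "R"}
DIGITS = {"शून्य": "0", "एक": "1", "दो": "2", "आठ": "8", "नौ": "9"}


def spoken_serial_to_written(spoken):
    """Join spoken letters and digits into a serial, or return None if not a serial."""
    written = []
    has_letter = False
    has_digit = False
    for token in spoken.split():
        if token in LETTERS:
            written.append(LETTERS[token])
            has_letter = True
        elif token in DIGITS:
            written.append(DIGITS[token])
            has_digit = True
        else:
            return None
    # Only mixed letter/digit runs count as serials in this sketch.
    if not (has_letter and has_digit):
        return None
    return "".join(written)
```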
| -> tokens { electronic { path: "/home/user/documents" } } | ||
| IP address: | ||
| e.g. एक नौ दो डॉट एक छह आठ डॉट एक डॉट एक | ||
| -> tokens { electronic { ip: "192.168.1.1" } } |
please only use the tags defined in the semiotic classes proto; 'ip' is not one of them
| special_codes_map = pynini.string_file(get_abs_path("data/electronic/special_codes.tsv")).optimize() | ||
| to_lower = pynini.cdrewrite( |
please use … instead
| ) | ||
| latin_run_lower = make_lower(latin_run) | ||
| _drive_chars = pynini.union("C", "D", "E", "F", "G", "H", "I", "J") |
this should be a tsv instead of hardcoded here
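A minimal sketch of the data-driven version, in plain Python so it runs without pynini. The file name `data/electronic/drive_letters.tsv` is hypothetical, and an in-memory string stands in for it here; in the grammar itself the file could presumably be loaded with `pynini.string_file`, as is already done for `special_codes.tsv`.

```python
import csv
import io


def load_tsv_rows(handle):
    """Read tab-separated rows from an open text handle, skipping blank lines."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


# Stand-in for a hypothetical data/electronic/drive_letters.tsv, one letter per line.
sample_tsv = io.StringIO("C\nD\nE\n")
drive_letters = [row[0] for row in load_tsv_rows(sample_tsv)]
```

Adding or removing a drive letter then becomes a data change with no edit to the grammar code.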
| drive_letter = pynini.compose(letter_map_upper, _drive_chars) | ||
| def _backslash(): | ||
| return pynutil.delete("बैकवर्ड") + delete_space + pynutil.delete("स्लैश") + pynutil.insert("\\\\") |
in general, let's put all transformations in tsv files