Migrate obsoleted editdistance to rapidfuzz by excosy · Pull Request #3105 · modelscope/FunASR

excosy · 2026-07-02T14:49:35Z

Summary

Because editdistance has been archived and fresh installs fail on Windows, this PR replaces it with rapidfuzz.

Type of change

Validation

python -m compileall funasr examples tests
Docs or links checked
Runtime/deployment command tested

gemini-code-assist

Code Review

This pull request replaces the editdistance dependency with rapidfuzz to calculate Character Error Rate (CER) and Word Error Rate (WER) in funasr/metrics/common.py, updating setup.py accordingly. The review feedback suggests addressing potential ZeroDivisionError bugs in calculate_cer and calculate_wer when reference sequences are empty. Additionally, it recommends reordering imports in common.py to adhere to PEP 8 standards and specifying a minimum version constraint of rapidfuzz>=3.0.0 in setup.py to prevent import errors.

gemini-code-assist · 2026-07-02T14:51:36Z

        char_eds, char_ref_lens = [], []
        for i, seq_hat_text in enumerate(seqs_hat):
            seq_true_text = seqs_true[i]
            hyp_chars = seq_hat_text.replace(" ", "")
            ref_chars = seq_true_text.replace(" ", "")
-            char_eds.append(editdistance.eval(hyp_chars, ref_chars))
+            char_eds.append(Levenshtein.distance(hyp_chars, ref_chars))
            char_ref_lens.append(len(ref_chars))
        return float(sum(char_eds)) / sum(char_ref_lens)


If seqs_true contains only empty strings or strings with only spaces, char_ref_lens will sum to 0, leading to a ZeroDivisionError. Adding a guard check to ensure the reference length is greater than 0 before division prevents potential runtime crashes.

Suggested change

        char_eds, char_ref_lens = [], []
        for i, seq_hat_text in enumerate(seqs_hat):
            seq_true_text = seqs_true[i]
            hyp_chars = seq_hat_text.replace(" ", "")
            ref_chars = seq_true_text.replace(" ", "")
            char_eds.append(Levenshtein.distance(hyp_chars, ref_chars))
            char_ref_lens.append(len(ref_chars))
-        return float(sum(char_eds)) / sum(char_ref_lens)
+        ref_len = sum(char_ref_lens)
+        return float(sum(char_eds)) / ref_len if ref_len > 0 else 0.0
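
The guarded division discussed in this comment can be sketched as a standalone function. This is a minimal illustration, not FunASR's actual code: `calculate_cer` is written here as a free function rather than the method in funasr/metrics/common.py, and `levenshtein` is a hypothetical pure-Python stand-in for `rapidfuzz.distance.Levenshtein.distance` so that the snippet runs without third-party packages. Returning 0.0 for an empty reference follows the reviewer's suggestion; other conventions are possible.

```python
from typing import Hashable, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Hypothetical stand-in for rapidfuzz.distance.Levenshtein.distance."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,              # delete x from a
                    curr[j - 1] + 1,          # insert y into a
                    prev[j - 1] + (x != y),   # substitute, or match at no cost
                )
            )
        prev = curr
    return prev[-1]


def calculate_cer(seqs_hat, seqs_true):
    """Corpus-level character error rate with a guard for empty references."""
    char_eds, char_ref_lens = [], []
    for seq_hat_text, seq_true_text in zip(seqs_hat, seqs_true):
        hyp_chars = seq_hat_text.replace(" ", "")
        ref_chars = seq_true_text.replace(" ", "")
        char_eds.append(levenshtein(hyp_chars, ref_chars))
        char_ref_lens.append(len(ref_chars))
    ref_len = sum(char_ref_lens)
    # Without this guard, an all-empty reference set raises ZeroDivisionError.
    return float(sum(char_eds)) / ref_len if ref_len > 0 else 0.0
```

One design consequence worth noting: when every reference is empty but a hypothesis is not, this convention reports 0.0 even though the hypothesis contains insertions, so callers that care about that case may prefer a different sentinel.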

gemini-code-assist · 2026-07-02T14:51:36Z

        word_eds, word_ref_lens = [], []
        for i, seq_hat_text in enumerate(seqs_hat):
            seq_true_text = seqs_true[i]
            hyp_words = seq_hat_text.split()
            ref_words = seq_true_text.split()
-            word_eds.append(editdistance.eval(hyp_words, ref_words))
+            word_eds.append(Levenshtein.distance(hyp_words, ref_words))
            word_ref_lens.append(len(ref_words))
        return float(sum(word_eds)) / sum(word_ref_lens)


If seqs_true contains only empty strings or strings with only spaces, word_ref_lens will sum to 0, leading to a ZeroDivisionError. Adding a guard check to ensure the reference length is greater than 0 before division prevents potential runtime crashes.

Suggested change

        word_eds, word_ref_lens = [], []
        for i, seq_hat_text in enumerate(seqs_hat):
            seq_true_text = seqs_true[i]
            hyp_words = seq_hat_text.split()
            ref_words = seq_true_text.split()
            word_eds.append(Levenshtein.distance(hyp_words, ref_words))
            word_ref_lens.append(len(ref_words))
-        return float(sum(word_eds)) / sum(word_ref_lens)
+        ref_len = sum(word_ref_lens)
+        return float(sum(word_eds)) / ref_len if ref_len > 0 else 0.0
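
The word-level case works the same way, except that the edit distance is computed over lists of tokens rather than strings. The sketch below assumes that `Levenshtein.distance` accepts arbitrary sequences of hashable items, which is how the PR itself uses it. The `edit_distance` alias and the pure-Python fallback are assumptions of this example, added only so it runs when rapidfuzz is not installed; `calculate_wer` is again a free function rather than the FunASR method.

```python
try:
    from rapidfuzz.distance import Levenshtein

    edit_distance = Levenshtein.distance
except ImportError:

    def edit_distance(a, b):
        """Pure-Python fallback (an assumption of this sketch, not part of the PR)."""
        prev = list(range(len(b) + 1))
        for i, x in enumerate(a, start=1):
            curr = [i]
            for j, y in enumerate(b, start=1):
                curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
            prev = curr
        return prev[-1]


def calculate_wer(seqs_hat, seqs_true):
    """Corpus-level word error rate with a guard for empty references."""
    word_eds, word_ref_lens = [], []
    for seq_hat_text, seq_true_text in zip(seqs_hat, seqs_true):
        hyp_words = seq_hat_text.split()
        ref_words = seq_true_text.split()
        # Each word is one symbol, so one wrong word costs one edit.
        word_eds.append(edit_distance(hyp_words, ref_words))
        word_ref_lens.append(len(ref_words))
    ref_len = sum(word_ref_lens)
    return float(sum(word_eds)) / ref_len if ref_len > 0 else 0.0
```

Note that `str.split()` with no arguments already discards leading, trailing, and repeated whitespace, so a reference made only of spaces yields an empty token list and hits the guard.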

gemini-code-assist · 2026-07-02T14:51:36Z

+from rapidfuzz.distance import Levenshtein
 from itertools import groupby


According to PEP 8, standard library imports (like itertools) should be grouped together and placed before third-party imports (like rapidfuzz). Please reorder these imports to adhere to PEP 8 guidelines.

Suggested change

-from rapidfuzz.distance import Levenshtein
-from itertools import groupby
+from itertools import groupby
+from rapidfuzz.distance import Levenshtein

References

PEP 8 states that imports should be grouped in the following order: 1. Standard library imports, 2. Related third party imports, 3. Local application/library specific imports.

gemini-code-assist · 2026-07-02T14:51:36Z

        # Speaker & evaluation
        "umap_learn",
-        "editdistance>=0.5.2",
+        "rapidfuzz",


The rapidfuzz.distance module (specifically Levenshtein) was introduced in rapidfuzz version 2.0.0. To prevent potential ImportError issues on environments with older versions of rapidfuzz installed, please specify a minimum version constraint such as rapidfuzz>=3.0.0.
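
For environments where the installed version is uncertain, the constraint can be verified at runtime. This is a minimal sketch under stated assumptions: `release_tuple` and `meets_minimum` are hypothetical helpers, and the comparison looks only at the leading numeric part of the version string, ignoring pre-release tags and epochs. `importlib.metadata.version` is a standard library function.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string: str) -> tuple:
    """Parse the leading dotted-number part of a version, padded to three fields."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    parts = [int(p) for p in match.group(0).split(".")] if match else []
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(package: str, minimum: str) -> bool:
    """Return True if `package` is installed at version `minimum` or newer."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False  # not installed at all
    return release_tuple(installed) >= release_tuple(minimum)
```

A call such as `meets_minimum("rapidfuzz", "3.0.0")` would then report whether the environment satisfies the constraint proposed in this comment. For anything beyond a quick check, a full version parser is the more robust choice.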

Suggested change

-        "rapidfuzz",
+        "rapidfuzz>=3.0.0",

gemini-code-assist · 2026-07-02T14:51:37Z

    # train: The modules invoked when training only.
    "train": [
-        "editdistance",
+        "rapidfuzz",


To remain consistent with the main requirements, please specify the same minimum version constraint for rapidfuzz in the train extra requirements.

Suggested change

-        "rapidfuzz",
+        "rapidfuzz>=3.0.0",

Migrate obsoleted editdistance to rapidfuzz

dad70eb

gemini-code-assist Bot reviewed Jul 2, 2026

Adopt gemini suggestions

71af8eb

excosy wants to merge 2 commits into modelscope:main from excosy:main
