fix(chunking): preserve sentence order in NlpSentenceChunking #1910

Open
kuishou68 wants to merge 1 commit into unclecode:main from kuishou68:fix/nlp-sentence-chunking-order

@kuishou68

Summary

Fixes #1909

Bug

NlpSentenceChunking.chunk() returned list(set(sens)). Because Python sets are unordered, this discards the document order of the sentences. It also silently drops any sentence that legitimately appears more than once in the text.
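
For illustration, here is a minimal sketch of the behavior described above. The `sens` list is a hand-written stand-in for tokenizer output (an assumption for this example, not data from the linked issue), so the snippet runs without NLTK installed.

```python
# Hypothetical sample standing in for the sentence list built inside chunk().
sens = [
    "First point.",
    "Second point.",
    "First point.",  # intentional repeat that should survive chunking
    "Third point.",
]

# The pattern the PR removes.
buggy = list(set(sens))

# Four sentences go in, but the repeated one is collapsed.
print(len(sens), len(buggy))

# The order of a set of strings is arbitrary and can change between
# interpreter runs because of string hash randomization.
print(buggy)
```

Note that the reordering may not show up on every run, which is part of what makes this kind of bug easy to miss in a quick manual check.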

Fix

Return sens directly, since nltk.sent_tokenize() already returns sentences in document order.
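
A rough sketch of the corrected behavior follows. It is not the actual method body from the repository: the standalone `chunk` function and the `naive_sent_tokenize` helper are hypothetical names introduced here, and the regex splitter is only a stand-in for `nltk.sent_tokenize()` so the example runs without NLTK or its tokenizer data.

```python
import re
from typing import Callable, List


def naive_sent_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for nltk.sent_tokenize(): split after ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk(
    text: str,
    tokenize: Callable[[str], List[str]] = naive_sent_tokenize,
) -> List[str]:
    """Return sentences in document order, keeping repeated sentences."""
    sens = tokenize(text)
    # Return the list as produced; no set() round trip.
    return sens


if __name__ == "__main__":
    print(chunk("A cat sat. It purred. A cat sat."))
```

Passing the tokenizer as a parameter is a choice made only for this sketch, so that a real tokenizer such as `nltk.sent_tokenize` could be swapped in where it is available.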

Using list(set(sens)) destroys sentence order and incorrectly deduplicates.
Fix: return the sentences list directly.
Development

Successfully merging this pull request may close these issues.

bug: NlpSentenceChunking.chunk() uses list(set(sens)) which destroys sentence order
