Use java.text.BreakIterator in DefaultTextDoubleClickStrategy#3990
Use java.text.BreakIterator in DefaultTextDoubleClickStrategy#3990vogella wants to merge 2 commits into
Conversation
e9be22e to
40485b6
Compare
|
I'm looking forward to have icu entirely removed. |
|
I don't have a reference but the behavior is partially covered by unit tests which I extended. This was therefore not a drop in replacement. |
|
This pull request changes some projects for the first time in this development cycle. An additional commit containing all the necessary changes was pushed to the top of this PR's branch. To obtain these changes (for example if you want to push more changes) either fetch from your fork or apply the git patch. Git patchFurther information are available in Common Build Issues - Missing version increments. |
I'm not deeply familiar with that part of the code, but it might be that The ICU Of course I'm not sure how correct that is. But if it's true, we risk breaking the existing translation e.g. for Asian languages. |
|
The Copilot dump you pasted is generic and over-broad for this specific change.
So the JDK iterator is already the de-facto choice in the editor stack. After this PR, DefaultTextDoubleClickStrategy is the last com.ibm.icu.text.BreakIterator reference in org.eclipse.jface.text. Happy to add a CJK/Thai sample to the test to document the behavior if that helps. |
|
Also note the scope here: this change only affects double-click word selection |
Yes, of course. But my intend was to understand if it generally is a safe replacement or if one has to pay more attention.
Don't get me wrong, I'm glad when ICU is finally removed as dependency, but we should take care if that severely breaks existing users or not and should assert the risk and then decide if it's acceptable or not.
Sure, that would help. But I don't have one at hand, do you? If not I can ask around if somebody else can provide one. |
|
I will use these: こんにちは世界 Japanese "Hello, world" |
Replaces com.ibm.icu.text.BreakIterator with java.text.BreakIterator in DefaultTextDoubleClickStrategy and drops the matching Import-Package: com.ibm.icu.text from the bundle manifest. The JDK BreakIterator exposes the same API (getWordInstance, preceding, following, isBoundary, setText, DONE) and the existing POSIX-locale workaround for '.' not being treated as a word boundary continues to work with java.text. Removes the last com.ibm.icu reference from org.eclipse.jface.text.
|
The scope here is narrow: this only uses getWordInstance() for double-click word selection, and the new identifier fast-path means the locale-aware iterator is only a fallback for non-ASCII text. I added CJK/Thai tests to make it concrete. |
|
It's nice to see the added test cases! |
HannesWell
left a comment
There was a problem hiding this comment.
The scope here is narrow: this only uses getWordInstance() for double-click word selection, and the new identifier fast-path means the locale-aware iterator is only a fallback for non-ASCII text.
I added CJK/Thai tests to make it concrete.
Thanks for the update. Then this is probably fine or at least save enough from my POV.
| private static boolean isIdentifierPart(char c) { | ||
| return c == '_' || (c < 128 && (c >= '0' && c <= '9' || c >= 'A' && c <= 'Z' || c >= 'a' && c <= 'z')); | ||
| } |
There was a problem hiding this comment.
Is this related to Character.isJavaIdentifierPart(char) ? Or would it make sense to reuse it, although it's not identical atm.
Replaces
com.ibm.icu.text.BreakIteratorwithjava.text.BreakIteratorinDefaultTextDoubleClickStrategyand drops the matchingImport-Package: com.ibm.icu.textfrom the bundle manifest. The JDKBreakIteratorexposes the same API (getWordInstance,preceding,following,isBoundary,setText,DONE) and the existing POSIX-locale workaround for.not being treated as a word boundary continues to work. This removes the lastcom.ibm.icureference fromorg.eclipse.jface.text.Planned for 4.41