string performance optimisation #102

Open
sergeevik wants to merge 1 commit into dashjoin:main from sergeevik:performance-string

Conversation

@sergeevik
Contributor

@sergeevik sergeevik commented Mar 24, 2026

String and regex optimisation

For strings:

  • Creating a string with ""+char allocates a new string every time. With String.valueOf, some strings are taken from the Java cache (memory optimisation).
  • Replace string concatenation in loops with StringBuilder.
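
To illustrate the second point, here is a minimal sketch comparing the two styles. The class and method names are hypothetical and are not taken from this PR's diff.

```java
final class JoinDemo {
    // Concatenation in a loop: every += builds a new String that
    // copies everything accumulated so far.
    static String joinWithConcat(char[] chars) {
        String out = "";
        for (char c : chars) {
            out += c;
        }
        return out;
    }

    // StringBuilder: one growing buffer, one String created at the end.
    static String joinWithBuilder(char[] chars) {
        StringBuilder sb = new StringBuilder(chars.length);
        for (char c : chars) {
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Both methods return the same text; only the number of intermediate objects differs.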

For regex:

  • Compile the pattern once and reuse it, instead of compiling it on every call.
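
A minimal sketch of the regex point, assuming a simple comma-separated input. The regex, class, and method names are made up for illustration and do not come from the changed files.

```java
import java.util.regex.Pattern;

final class RegexReuseDemo {
    // Compiled once at class initialisation. Pattern instances are
    // immutable and safe to share between threads.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // String.split compiles this regex again on every call.
    static String[] splitRecompiling(String s) {
        return s.split("\\s*,\\s*");
    }

    // Reuses the precompiled pattern.
    static String[] splitPrecompiled(String s) {
        return COMMA.split(s);
    }
}
```

The results are identical; the precompiled version just skips the repeated Pattern.compile work.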

JsonParser.java
Replace the readEscape logic with direct parsing of the hex digits (no intermediate string is created and then parsed).
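
The idea can be sketched roughly as follows. This is not the actual readEscape code from JsonParser.java; the class, the method names, and the (source, position) signature are assumptions made for the example.

```java
final class HexEscapeDemo {
    // String-based approach: allocates a 4-character substring, then parses it.
    static char viaSubstring(String src, int pos) {
        return (char) Integer.parseInt(src.substring(pos, pos + 4), 16);
    }

    // Direct approach: accumulate the four hex digits without allocating.
    static char viaDigits(String src, int pos) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            int digit = Character.digit(src.charAt(pos + i), 16);
            if (digit < 0) {
                throw new IllegalArgumentException("invalid hex digit at index " + (pos + i));
            }
            value = (value << 4) | digit;
        }
        return (char) value;
    }
}
```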

@sergeevik sergeevik force-pushed the performance-string branch 2 times, most recently from 9fb13e2 to 267a13d Compare March 24, 2026 23:12
@sergeevik sergeevik force-pushed the performance-string branch from 267a13d to f5a2474 Compare March 24, 2026 23:18
@aeberhart
Contributor

Thanks for the PR @sergeevik. Have you run some benchmarks on this? I suspect that there won't be much impact since the JVM does a lot of these optimizations already.

@sergeevik
Contributor Author

@aeberhart Benchmarks are a complex thing. I tried them on simple examples and stumbled upon many JVM optimizations (due to the simplicity of the scenarios).

This is more of a general practice (my experience).

Regarding strings and creating new strings by concatenating with a char, the JVM might optimize them, but there's no point in waiting for JIT to notice.

String appending in a loop is never optimized by JIT.

Regex compilation is useful because JIT doesn't do it itself. The JVM might cache some things, but not much.

@sergeevik
Contributor Author

For example, look at the Signature.validate method.

JetBrains performance analytics shows the memory allocation.
The biggest is 'match.split', at 5 GB:
[image]

Going deeper into String.split:
[image]
4.8 GB is allocated by 'pattern.split'.

Going deeper into Pattern.split:
[image]
3.94 GB is on the matcher invocation.

Going deeper into the matcher call:
[image]
2.56 GB is on the compile invocation.

But we split on the empty string, which just means getting the individual characters. If we replace this logic with a precompiled Pattern (Pattern.compile), we save about 50% of that RAM. If we replace it with toCharArray, we save close to 100%.

This also means fewer GC calls.
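
A small sketch of the two ways to get the characters, assuming plain ASCII input such as a signature string. The class and method names are hypothetical; this is not the code of Signature.validate.

```java
final class SplitCharsDemo {
    // split("") compiles a regex, runs a matcher, and allocates
    // one String per character.
    static String[] viaSplit(String s) {
        return s.split("");
    }

    // toCharArray makes a single copy of the characters, no regex involved.
    static char[] viaCharArray(String s) {
        return s.toCharArray();
    }

    // Checks that both approaches yield the same characters in the same order.
    static boolean sameCharacters(String s) {
        String[] parts = viaSplit(s);
        char[] chars = viaCharArray(s);
        if (parts.length != chars.length) {
            return false;
        }
        for (int i = 0; i < chars.length; i++) {
            if (parts[i].length() != 1 || parts[i].charAt(0) != chars[i]) {
                return false;
            }
        }
        return true;
    }
}
```

One edge case to keep in mind when swapping one for the other: for an empty input, split("") returns an array holding a single empty string, while toCharArray returns an empty array.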

This is just one example. I will try to collect more performance measurements on my data.

@sergeevik
Contributor Author

In my case, I tried two versions.
The first, without my changes: max used RAM 5.5 GB
[image]

The second, with my changes: max used RAM 4.1 GB
[image]

There seems to be no difference in the RAM used once all the data has been calculated, but usage along the way differs, and there is less GC activity.

But I'm starting to doubt the usefulness of using String.valueOf for char:

    public static String valueOf(char c) {
        if (COMPACT_STRINGS && StringLatin1.canEncode(c)) {
            return new String(StringLatin1.toBytes(c), LATIN1);
        }
        return new String(StringUTF16.toBytes(c), UTF16);
    }

It seems like it doesn't create fewer objects. So, I'll make a couple of changes and run some more measurements.
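
A quick way to check this is to compare references. The sketch below uses hypothetical names and relies only on the fact that `new String(...)` and non-constant string concatenation both produce a fresh object.

```java
final class ValueOfDemo {
    // == compares references here, so this is true only if both calls
    // return the very same String object.
    static boolean valueOfReturnsSameInstance(char c) {
        return String.valueOf(c) == String.valueOf(c);
    }

    // Same check for the ""+char form; c is not a compile-time constant,
    // so each concatenation creates a new String at run time.
    static boolean concatReturnsSameInstance(char c) {
        return ("" + c) == ("" + c);
    }

    // The contents are equal either way.
    static boolean contentsEqual(char c) {
        return String.valueOf(c).equals("" + c);
    }
}
```

Both identity checks come out false, which matches the valueOf source above: neither form reuses a cached instance.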
