perf: improve wildcard query perf with predicate and contains-check pushdown #397
Codecov Report

Coverage diff:

| | main | #397 | +/- |
|---|---|---|---|
| Coverage | 71.28% | 71.51% | +0.22% |
| Files | 210 | 210 | |
| Lines | 15579 | 15662 | +83 |
| Hits | 11105 | 11200 | +95 |
| Misses | 3673 | 3663 | -10 |
| Partials | 801 | 799 | -2 |
    	return b.Payload[offset : offset+l]
    }

    func (b *Block) FindContains(from, to int, needle []byte) ([]int, error) {
We've discussed that you can perform bytes.Contains on the block payload before checking each token individually. Have you measured the performance of such an optimization?
> We've discussed that you can perform bytes.Contains on the block payload before checking each token individually.
Yes, I tried calling bytes.Index on the entire payload. It speeds things up even further compared to this PR:
message:foobar
35 ms => 9 ms
However, this means that when bytes.Index returns a match position, we need to do a binary search on Offsets to find the token index and then check for a false positive. It also comes with a neat property: we can call Unpack (build offsets) lazily, which boosts cold query performance (by roughly an extra 20%).
I've put a task in the backlog; I decided it's too much for a single PR.
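To make the trade-off above concrete, here is a minimal sketch of the whole-payload approach. It is not the code from this PR or the backlog task: `payloadBlock`, its back-to-back token layout, and `findContainsPayload` are hypothetical stand-ins, and the real `Block` payload layout may differ. It runs `bytes.Index` over the payload, maps each hit to a token with a binary search over the offsets, and rejects hits that straddle a token boundary.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// payloadBlock is a hypothetical stand-in for Block. Assumption: tokens are
// stored back to back, token i occupies payload[offsets[i]:offsets[i+1]].
type payloadBlock struct {
	payload []byte
	offsets []int // len(offsets) == number of tokens + 1
}

func newPayloadBlock(tokens ...string) *payloadBlock {
	b := &payloadBlock{offsets: []int{0}}
	for _, t := range tokens {
		b.payload = append(b.payload, t...)
		b.offsets = append(b.offsets, len(b.payload))
	}
	return b
}

// findContainsPayload returns indices of tokens containing needle by scanning
// the payload as a whole instead of token by token.
func (b *payloadBlock) findContainsPayload(needle []byte) []int {
	var indices []int
	if len(needle) == 0 {
		return indices // simplification: an empty needle is not handled here
	}
	pos := 0
	for pos < len(b.payload) {
		rel := bytes.Index(b.payload[pos:], needle)
		if rel < 0 {
			break
		}
		start := pos + rel
		// Binary search for the token i with offsets[i] <= start < offsets[i+1].
		i := sort.SearchInts(b.offsets, start+1) - 1
		end := b.offsets[i+1]
		if start+len(needle) <= end {
			indices = append(indices, i)
			pos = end // token already matched, continue with the next one
		} else {
			pos = start + 1 // false positive: hit crosses a token boundary
		}
	}
	return indices
}

func main() {
	b := newPayloadBlock("foo", "bar", "foobar")
	// "oba" first hits across the foo|bar boundary, which must be rejected.
	fmt.Println(b.findContainsPayload([]byte("oba"))) // [2]
}
```

In this sketch the offsets are only consulted after a hit, which is where building them lazily would pay off when most blocks contain no match.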
    }

    func (b *Block) FindContains(from, to int, needle []byte) ([]int, error) {
    	indices := make([]int, 0)
I guess you could pass a slice of needles here as well to handle queries like message:*foo*bar* with multiple needles. Or is there something that blocks such an improvement?
No, I think it's doable. Maybe I'll do it.
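As a rough illustration of the multi-needle idea, the check for a pattern like `*foo*bar*` can be reduced to finding each needle in order, each one after the end of the previous match. The helper names `containsInOrder` and `findContainsAll` are hypothetical and not part of this PR; the sketch assumes plain glob semantics where the needles must not overlap.

```go
package main

import (
	"bytes"
	"fmt"
)

// containsInOrder reports whether every needle occurs in token, in the given
// order and without overlapping. Taking the leftmost match for each needle
// leaves the longest possible remainder for the needles that follow.
func containsInOrder(token []byte, needles [][]byte) bool {
	for _, n := range needles {
		i := bytes.Index(token, n)
		if i < 0 {
			return false
		}
		token = token[i+len(n):]
	}
	return true
}

// findContainsAll returns indices of tokens that match all needles in order.
func findContainsAll(tokens [][]byte, needles [][]byte) []int {
	var indices []int
	for i, t := range tokens {
		if containsInOrder(t, needles) {
			indices = append(indices, i)
		}
	}
	return indices
}

func main() {
	tokens := [][]byte{
		[]byte("a foo then bar"),
		[]byte("bar before foo"),
		[]byte("foobar"),
	}
	needles := [][]byte{[]byte("foo"), []byte("bar")}
	fmt.Println(findContainsAll(tokens, needles)) // [0 2]
}
```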
Description
Currently we spend only a fraction of time calling `bytes.Index`. This PR partially addresses that.

This PR pushes `pattern.Searcher` to the `Block` level, so that `Block` is able to stream tokens through the searcher. For ordinary wildcards like `*error*` there is a direct `FindContains` method, which is even faster.

For example, query `message:*foobarf*`:
- main: 86 ms
- using `FindToken`: 50 ms
- using `FindContains`: 37 ms

So, `FindContains` just throws out costly abstractions to get additional performance. We could also provide a dedicated func like `FindSuffix`, for example. This is a typical example of performance requiring additional code.
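The idea behind a direct contains path can be sketched as follows. This is an illustration only, not the implementation from this PR: `tokenBlock`, its back-to-back token layout, and the lowercase `findContains` are hypothetical, and the real `FindContains` also returns an error. The point is that the loop calls `bytes.Contains` on each token slice directly, with no searcher interface in between.

```go
package main

import (
	"bytes"
	"fmt"
)

// tokenBlock is a hypothetical stand-in for Block. Assumption: tokens are
// stored back to back, token i occupies payload[offsets[i]:offsets[i+1]].
type tokenBlock struct {
	payload []byte
	offsets []int // len(offsets) == number of tokens + 1
}

func newTokenBlock(tokens ...string) *tokenBlock {
	b := &tokenBlock{offsets: []int{0}}
	for _, t := range tokens {
		b.payload = append(b.payload, t...)
		b.offsets = append(b.offsets, len(b.payload))
	}
	return b
}

// token returns a view of token i without copying.
func (b *tokenBlock) token(i int) []byte {
	return b.payload[b.offsets[i]:b.offsets[i+1]]
}

// findContains returns indices in [from, to) of tokens that contain needle.
func (b *tokenBlock) findContains(from, to int, needle []byte) []int {
	var indices []int
	for i := from; i < to; i++ {
		if bytes.Contains(b.token(i), needle) {
			indices = append(indices, i)
		}
	}
	return indices
}

func main() {
	b := newTokenBlock("error", "info", "fatal error", "warn")
	fmt.Println(b.findContains(0, 4, []byte("err"))) // [0 2]
}
```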
- `trace_id:*foobar`
- `k8s_pod:*6`
- `message:*err*`
- `message:*foo*`
- `message:*request*`
- `message:*foobar*foobar*`
- `message:*foobarfoobar*`
- `message:*very_very_message_aggregator_events*`

Next steps:
- `bytes.Index` over `Block` payload - already shows good results