feat(search): cache never-match queries via the predicate cache#6556
Open
PSeitz wants to merge 2 commits into
Repeated super-selective or non-existent term queries over many splits re-ran warmup on every request.

When warmup proves a split empty (a required term has an empty posting list), record a fake empty entry in the existing predicate cache, and consult it before warmup so later requests with the same predicate short-circuit with no storage reads.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment on lines +634 to +640
    fn negative_cache_key(query_ast: &QueryAst) -> Option<String> {
        let inner = match query_ast {
            QueryAst::Cache(cache_node) => cache_node.inner.as_ref(),
            other => other,
        };
        serde_json::to_string(inner).ok()
    }
Contributor
I think it would be interesting for this to also extract the Must/Filter arms of a BoolQuery (and possibly a lonely Should): if any "must" part of the query doesn't match, the whole query necessarily doesn't match.
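The suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Ast` is a hypothetical, simplified stand-in for the real `QueryAst` (it omits `must_not`, `minimum_should_match`, and every other node type), and `required_terms` is a made-up helper name, not a function from this PR.

```rust
/// Hypothetical, simplified query AST (NOT the real `QueryAst`).
#[derive(Debug, Clone, PartialEq)]
enum Ast {
    Term { field: String, value: String },
    Bool { must: Vec<Ast>, filter: Vec<Ast>, should: Vec<Ast> },
    Cache(Box<Ast>),
}

/// Collects (field, value) pairs for terms that every matching document
/// must contain. If any one of them is absent from a split, the whole
/// query cannot match in that split.
fn required_terms(ast: &Ast, out: &mut Vec<(String, String)>) {
    match ast {
        Ast::Term { field, value } => out.push((field.clone(), value.clone())),
        // Look through the cache wrapper, as `negative_cache_key` does.
        Ast::Cache(inner) => required_terms(inner, out),
        Ast::Bool { must, filter, should } => {
            // Must and Filter arms are both required.
            for child in must.iter().chain(filter.iter()) {
                required_terms(child, out);
            }
            // Assumption: a lone Should with no Must/Filter is effectively
            // required, because at least one Should clause has to match.
            if must.is_empty() && filter.is_empty() && should.len() == 1 {
                required_terms(&should[0], out);
            }
        }
    }
}
```

With more than one Should clause nothing is collected from that arm, since no single clause is required on its own.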
Collaborator
Author
In its current form it doesn't work the way I want yet. It doesn't match non-existent terms when they are part of a longer chain, e.g. "fielda:doesnotexist" followed by "fielda:doesnotexist fieldb:asdf".
The negative cache keyed each provably-empty split on the whole query AST, so any added/removed filter or different time window produced a new key. In production this gave near-zero reuse: every filter permutation re-opened and re-probed all splits.

A required term's absence in a split is immutable and independent of the rest of the query and of the time window. Key the cache on (split, term) instead: warmup reports each required term it proves absent via an `on_absent` callback, and a query short-circuits before warmup when any of its required terms is already known absent. Adding required terms can only make a query emptier, so cached absences keep pruning as filters change.
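A minimal sketch of the (split, term) keying described above, assuming an in-memory set. `AbsentTermCache`, `can_skip`, and the `SplitId`/`Term` aliases are hypothetical names for illustration; only the `on_absent` name comes from the commit message, and the actual change stores its entries in the existing predicate cache, whose API is not shown in this thread.

```rust
use std::collections::HashSet;

// Hypothetical identifiers; the PR's actual types are not shown here.
type SplitId = String;
type Term = (String, String); // (field, value)

/// Remembers which required terms were proven absent from which split.
#[derive(Default)]
struct AbsentTermCache {
    absent: HashSet<(SplitId, Term)>,
}

impl AbsentTermCache {
    /// Called by warmup when a required term has an empty posting list.
    fn on_absent(&mut self, split: &str, term: &Term) {
        self.absent.insert((split.to_string(), term.clone()));
    }

    /// Returns true if any required term of the query is already known
    /// to be absent from the split, so warmup can be skipped for it.
    fn can_skip(&self, split: &str, required: &[Term]) -> bool {
        required
            .iter()
            .any(|term| self.absent.contains(&(split.to_string(), term.clone())))
    }
}
```

Because the key no longer includes the rest of the query, recording `fielda:doesnotexist` as absent for a split also prunes `fielda:doesnotexist fieldb:asdf` on that split, which is the reuse the whole-AST key could not provide.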