[BugFix] Key date_time pushdown on field type, not literal UDT (#5481) #5515
Draft
RyanL1997 wants to merge 1 commit into
…earch-project#5481)

When a timestamp comparison is AND'd with a SARG-eligible sibling (e.g. severityText IN ('ERROR','WARN') or a chained OR of equalities), Calcite's RexSimplify folds the sibling into a Sarg and re-types the timestamp literal: DATE_SUB(NOW(), INTERVAL 5 MINUTE) loses its EXPR_TIMESTAMP UDT and arrives as VARCHAR. That breaks the DSL emission downstream:

- LiteralExpression.value() takes the isString() branch and returns the raw RexLiteral.stringValue(), the space-separated "2026-05-28 16:18:43" form, never normalized to ISO-8601.
- addFormatIfNecessary skips .format("date_time") because literal.isDateTime() reads false.

Both checks read the literal's surviving UDT, which is unreliable after RexSimplify. The shard receives "2026-05-28 16:18:43" against the default strict_date_optional_time||epoch_millis parser and returns 500.

Fix the six non-Sarg comparison paths in SimpleQueryExpression (equals, notEquals, gt, gte, lt, lte) to additionally check rel.isTimeStampType(), the same defensive pattern the Sarg path in constructQueryExpressionForSearch already uses. When the field is a timestamp, the value is routed through timestampValueForPushDown to canonicalize to ISO-8601 regardless of the literal's surviving type, and the range query always carries format("date_time"). The non-timestamp-field branch is unchanged: termQuery / boolQuery behavior for keyword/text/numeric fields is preserved exactly.

Verified end-to-end on a live node:

- Before: range emits "2026-06-04 16:53:50" with no format attr; the shard rejects the parse and the query returns HTTP 500.
- After: range emits "2026-06-04T17:14:01.000Z" with format "date_time"; the query returns the expected ERROR + WARN rows.

Resolves opensearch-project#5481

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Description
When a timestamp comparison is AND'd with a SARG-eligible sibling (IN, chained OR of equalities), Calcite's RexSimplify folds the sibling into a Sarg and re-types the timestamp literal during simplification. DATE_SUB(NOW(), INTERVAL 5 MINUTE) loses its EXPR_TIMESTAMP UDT and arrives at the predicate analyzer as VARCHAR.

That breaks the DSL emission downstream in SimpleQueryExpression:

- LiteralExpression.value() takes the isString() branch and returns the raw RexLiteral.stringValue(), the space-separated "2026-05-28 16:18:43" form, never normalized to ISO-8601.
- addFormatIfNecessary skips .format("date_time") because literal.isDateTime() reads false.

Both checks key off the literal's surviving UDT, which is unreliable after RexSimplify. The shard receives "2026-05-28 16:18:43" against the default strict_date_optional_time||epoch_millis parser and returns HTTP 500.

Fix
The six non-Sarg comparison paths in SimpleQueryExpression (equals, notEquals, gt, gte, lt, lte) now key off rel.isTimeStampType(), the field's type via NamedFieldExpression. This is the same defensive pattern the Sarg path in constructQueryExpressionForSearch (PredicateAnalyzer.java:794-796) already uses.

When the field is a timestamp:

- The value is routed through timestampValueForPushDown to canonicalize to ISO-8601 regardless of the literal's surviving type.
- The range query always carries format("date_time").

The non-timestamp-field branch is unchanged: termQuery / boolQuery behavior for keyword/text/numeric fields is preserved exactly.

Verification
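Independent of a live node, the rule from the Fix section can be modeled in isolation. The sketch below is not the plugin's code: FieldType, canonicalize, pushdownValue and needsDateTimeFormat are hypothetical names, and it assumes the re-typed literal is a UTC wall-clock string in "yyyy-MM-dd HH:mm:ss" form. It only illustrates the idea of deciding on the field's type rather than on the literal's surviving type.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class FieldTypePushdownSketch {
    // Hypothetical stand-in for the field's mapped type; not the plugin's type model.
    enum FieldType { TIMESTAMP, KEYWORD }

    private static final DateTimeFormatter SPACE_SEPARATED =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final DateTimeFormatter ISO_MILLIS_UTC =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");

    // Assumption: the literal is a UTC wall-clock time, so appending 'Z' is valid.
    static String canonicalize(String literal) {
        return LocalDateTime.parse(literal, SPACE_SEPARATED).format(ISO_MILLIS_UTC);
    }

    // The decision depends only on the field type, never on how the literal is typed.
    static String pushdownValue(FieldType fieldType, String literal) {
        return fieldType == FieldType.TIMESTAMP ? canonicalize(literal) : literal;
    }

    // A timestamp field always gets the date_time format attribute in this model.
    static boolean needsDateTimeFormat(FieldType fieldType) {
        return fieldType == FieldType.TIMESTAMP;
    }

    public static void main(String[] args) {
        // prints 2026-05-28T16:18:43.000Z
        System.out.println(pushdownValue(FieldType.TIMESTAMP, "2026-05-28 16:18:43"));
        // prints ERROR (non-timestamp values pass through untouched)
        System.out.println(pushdownValue(FieldType.KEYWORD, "ERROR"));
    }
}
```

In this model a VARCHAR-typed literal compared against a timestamp field is still canonicalized, while keyword values are left exactly as they were.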
End-to-end on a live local node with the index template and sample data from the issue:
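Before: the range query emits "2026-06-04 16:53:50" with no format attribute; the shard rejects the parse and the query returns HTTP 500.
After: the range query emits "2026-06-04T17:14:01.000Z" with format "date_time"; the query returns the expected ERROR + WARN rows.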
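To see why the two emitted forms behave differently, the values from the commit message ("2026-06-04 16:53:50" before, "2026-06-04T17:14:01.000Z" after) can be fed to a strict ISO-8601 parser. The sketch below uses java.time.Instant.parse as a stand-in; it is not the shard's strict_date_optional_time parser, and parsesAsIsoInstant is a hypothetical helper name.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

public class StrictIsoParseSketch {
    // Returns true when the value is a valid ISO-8601 instant such as 2026-06-04T17:14:01.000Z.
    // Instant.parse is only a stand-in for a strict ISO-8601 date parser.
    static boolean parsesAsIsoInstant(String value) {
        try {
            Instant.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Space-separated form, as emitted before the fix: prints false
        System.out.println(parsesAsIsoInstant("2026-06-04 16:53:50"));
        // Canonical form, as emitted after the fix: prints true
        System.out.println(parsesAsIsoInstant("2026-06-04T17:14:01.000Z"));
    }
}
```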
:opensearch:test passes with no changes to existing assertions; the existing date_time format assertions in PredicateAnalyzerTest.java continue to hold for the literal-typed-correctly path.

Related Issues
Resolves #5481
Check List
- Commits are signed per the DCO using --signoff or -s.