Skip to content

[BugFix] Key date_time pushdown on field type, not literal UDT (#5481)#5515

Draft
RyanL1997 wants to merge 1 commit into
opensearch-project:mainfrom
RyanL1997:worktree-timestamp-format-pushdown-5481
Draft

[BugFix] Key date_time pushdown on field type, not literal UDT (#5481)#5515
RyanL1997 wants to merge 1 commit into
opensearch-project:mainfrom
RyanL1997:worktree-timestamp-format-pushdown-5481

Conversation

@RyanL1997
Copy link
Copy Markdown
Collaborator

@RyanL1997 RyanL1997 commented Jun 4, 2026

Description

When a timestamp comparison is AND'd with a SARG-eligible sibling (IN, chained OR of equalities), Calcite's RexSimplify folds the sibling into a Sarg and re-types the timestamp literal during simplification. DATE_SUB(NOW(), INTERVAL 5 MINUTE) loses its EXPR_TIMESTAMP UDT and arrives at the predicate analyzer as VARCHAR.

That breaks the DSL emission downstream in SimpleQueryExpression:

  1. LiteralExpression.value() takes the isString() branch and returns the raw RexLiteral.stringValue() — the space-separated "2026-05-28 16:18:43" form, never normalized to ISO-8601.
  2. addFormatIfNecessary skips .format("date_time") because literal.isDateTime() reads false.

Both checks key off the literal's surviving UDT, which is unreliable after RexSimplify. The shard receives "2026-05-28 16:18:43" against the default strict_date_optional_time||epoch_millis parser and returns HTTP 500.

Fix

The five non-Sarg comparison paths in SimpleQueryExpression (equals, notEquals, gt, gte, lt, lte) now key off rel.isTimeStampType() — the field's type via NamedFieldExpression. This is the same defensive pattern the Sarg path in constructQueryExpressionForSearch (PredicateAnalyzer.java:794-796) already uses.

When the field is a timestamp:

  • The endpoint value is routed through timestampValueForPushDown to canonicalize to ISO-8601 regardless of the literal's surviving type.
  • The range query always carries format("date_time").

The non-timestamp-field branch is unchanged — termQuery / boolQuery behavior for keyword/text/numeric fields is preserved exactly.

Verification

End-to-end on a live local node with the index template and sample data from the issue:

Before:

"range":{"@timestamp":{"from":"2026-06-04 16:53:50",...}}    // no format, raw string
→ failed to parse date field [2026-06-04 16:53:50] with format [strict_date_optional_time||epoch_millis]
→ HTTP 500

After:

"range":{"@timestamp":{"from":"2026-06-04T17:14:01.000Z",..., "format":"date_time", ...}}
→ STATUS: OK — 2 rows returned (ERROR + WARN; INFO filtered out)

:opensearch:test passes with no changes to existing assertions; the existing date_time format assertions in PredicateAnalyzerTest.java continue to hold for the literal-typed-correctly path.

Related Issues

Resolves #5481

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

…earch-project#5481)

When a timestamp comparison is AND'd with a SARG-eligible sibling (e.g.
severityText IN ('ERROR','WARN') or a chained OR of equalities), Calcite's
RexSimplify folds the sibling into a Sarg and re-types the timestamp
literal — DATE_SUB(NOW(), INTERVAL 5 MINUTE) loses its EXPR_TIMESTAMP UDT
and arrives as VARCHAR.

That break the DSL emission downstream:
- LiteralExpression.value() takes the isString() branch and returns the
  raw RexLiteral.stringValue() — the space-separated "2026-05-28 16:18:43"
  form, never normalized to ISO-8601.
- addFormatIfNecessary skips .format("date_time") because
  literal.isDateTime() reads false.

Both checks read the literal's surviving UDT, which is unreliable after
RexSimplify. The shard receives "2026-05-28 16:18:43" against the default
strict_date_optional_time||epoch_millis parser and returns 500.

Fix the five non-Sarg comparison paths in SimpleQueryExpression (equals,
notEquals, gt, gte, lt, lte) to additionally check rel.isTimeStampType()
— the same defensive pattern the Sarg path in
constructQueryExpressionForSearch already uses. When the field is a
timestamp, the value is routed through timestampValueForPushDown to
canonicalize to ISO-8601 regardless of the literal's surviving type, and
the range query always carries format("date_time").

The non-timestamp-field branch is unchanged — termQuery / boolQuery
behavior for keyword/text/numeric fields is preserved exactly.

Verified end-to-end on a live node:
- Before: range emits "2026-06-04 16:53:50" with no format attr → shard
  rejects the parse, query returns HTTP 500.
- After:  range emits "2026-06-04T17:14:01.000Z" with format "date_time"
  → query returns the expected ERROR + WARN rows.

Resolves opensearch-project#5481

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Null Pointer Risk

isFieldOrLiteralDateTime accesses rel.isTimeStampType() when rel != null, but endpointValue calls literal.value().toString() at line 1478 without checking if literal.value() is null. If a literal with a null value reaches this path (field is timestamp, literal is not DateTime), toString() will throw NullPointerException. This can occur if Calcite produces a null-valued VARCHAR literal after type stripping.

private Object endpointValue(LiteralExpression literal, boolean isTimeStamp) {
  if (!isTimeStamp) {
    return literal.value();
  }
  if (literal.isDateTime()) {
    // literal.value() already normalizes for EXPR_TIMESTAMP / EXPR_DATE literals.
    return literal.value();
  }
  // Field is timestamp but literal was re-typed to VARCHAR — normalize the raw value.
  return timestampValueForPushDown(literal.value().toString());
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Timestamp range pushdown emits unparseable date format when AND'd with IN/OR clause

1 participant