Skip to content

Add SQL histogram and date_histogram bucket functions#5514

Open
varun-st wants to merge 1 commit into
opensearch-project:mainfrom
varun-st:sql-histogram-bucket-functions
Open

Add SQL histogram and date_histogram bucket functions#5514
varun-st wants to merge 1 commit into
opensearch-project:mainfrom
varun-st:sql-histogram-bucket-functions

Conversation

@varun-st
Copy link
Copy Markdown

@varun-st varun-st commented Jun 4, 2026

Summary

Adds parse-time support for histogram and date_histogram in V2 SQL with named-argument invocation. Each call lowers to existing AST primitives (Span, COALESCE, DATE_FORMAT, TIMESTAMPADD).

Supported parameters

  • histogram: field, interval, offset, missing
  • date_histogram: field, interval / fixed_interval / calendar_interval, format, time_zone, missing

Deferred to follow-up PR

  • min_doc_count, order, alias — need parser-side plumbing to mutate the surrounding query (HAVING / ORDER BY / SELECT-list alias).
  • date_histogram offset — needs a duration-string parser distinct from time_zone's ZoneOffset format.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The time_zone parameter is validated at parse time as a ZoneOffset (e.g., '+05:30', 'Z'), but named time zones like 'America/New_York' are not supported. If users expect full time zone support (as the parameter name suggests), this will fail at parse time with a confusing error. The error message mentions "offset like '+05:30' or 'Z'" but does not clarify that named zones are unsupported.

private static UnresolvedExpression applyTimeZoneShift(
    UnresolvedExpression field, Literal timeZoneLiteral) {
  String tzString = timeZoneLiteral.getValue().toString();
  int offsetSeconds;
  try {
    offsetSeconds = ZoneOffset.of(tzString).getTotalSeconds();
  } catch (RuntimeException ex) {
    throw new SemanticCheckException(
        "time_zone must be a valid offset like '+05:30' or 'Z'; got '" + tzString + "'");
  }
Possible Issue

The original buildFunction method call was removed without verifying that all its logic is preserved. If buildFunction performed additional validation or transformation beyond constructing a Function object, that logic is now lost. This could break existing scalar function calls if they relied on that behavior.

public UnresolvedExpression visitScalarFunctionCall(ScalarFunctionCallContext ctx) {
  String functionName = ctx.scalarFunctionName().getText();
  List<UnresolvedExpression> args =
      ctx.functionArgs().functionArg().stream()
          .map(this::visitFunctionArg)
          .collect(Collectors.toList());

  Optional<BucketFunctionExpander> bucketExpander = BucketFunctionRegistry.lookup(functionName);
  if (bucketExpander.isPresent()) {
    return bucketExpander.get().expand(args);
  }

  return new Function(functionName, args);
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Catch specific exception type

The catch block catches all RuntimeException types, which may mask unexpected
errors. Consider catching only DateTimeException to specifically handle invalid
timezone offset parsing failures while allowing other runtime exceptions to
propagate.

sql/src/main/java/org/opensearch/sql/sql/parser/bucket/DateHistogramExpander.java [118-131]

 private static UnresolvedExpression applyTimeZoneShift(
     UnresolvedExpression field, Literal timeZoneLiteral) {
   String tzString = timeZoneLiteral.getValue().toString();
   int offsetSeconds;
   try {
     offsetSeconds = ZoneOffset.of(tzString).getTotalSeconds();
-  } catch (RuntimeException ex) {
+  } catch (DateTimeException ex) {
     throw new SemanticCheckException(
         "time_zone must be a valid offset like '+05:30' or 'Z'; got '" + tzString + "'");
   }
   return new Function(
       "timestampadd",
       List.of(AstDSL.stringLiteral("SECOND"), AstDSL.intLiteral(offsetSeconds), field));
 }
Suggestion importance[1-10]: 7

__

Why: Catching DateTimeException instead of RuntimeException is more precise and prevents masking unexpected errors. However, this is a minor improvement in exception handling that doesn't address a critical bug.

Medium

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant