Add parser_override to eliminate SEMANTIC prefix requirement#30
Add parser_override to eliminate SEMANTIC prefix requirement#30
Conversation
Use DuckDB's parser_override API (v1.5+) to intercept queries before the native parser, allowing AGGREGATE() to work in all statement types (SELECT, CTAS, INSERT...SELECT) without the SEMANTIC prefix. The existing parse_function fallback and yardstick_bind operator extension are kept for backwards compatibility. Also filters AGGREGATE() argument extraction to simple identifiers so DuckDB's built-in list aggregate() function is not intercepted.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dbb7c0afb0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (result.error) { | ||
| // Expansion failed: this might not be a yardstick AGGREGATE() call | ||
| // (e.g. DuckDB's built-in list aggregate function). Fall through to | ||
| // the native parser in case it can handle the query. | ||
| yardstick_free_aggregate_result(result); | ||
| return ParserOverrideResult(); |
There was a problem hiding this comment.
Keep AGGREGATE rewrite path for CTAS/INSERT errors
When yardstick_expand_aggregate returns an error, this branch always falls back to DuckDB's native parser, which leaves the original SQL unchanged. That breaks the new non-SEMANTIC path for statement types like CREATE TABLE ... AS SELECT and INSERT ... SELECT: expand_aggregate still relies on parse_select, and yardstick_parse_select explicitly rejects non-SELECT top-level statements (src/yardstick_parser_ffi.cpp around lines 1098-1100), so these queries are never rewritten and end up failing on DuckDB's built-in aggregate function instead of executing Yardstick semantics.
Useful? React with 👍 / 👎.
| // Only include calls WITHOUT AT modifier, and only when the argument | ||
| // is a simple identifier (measure name). This avoids intercepting | ||
| // DuckDB's built-in aggregate([list], 'fn') list function. | ||
| if modifiers.is_empty() && parse_simple_measure_ref(measure).is_some() { |
There was a problem hiding this comment.
Accept quoted measure names in aggregate extraction
The new simple-reference gate causes extract_all_aggregate_calls to skip valid SQL identifiers that require quoting (for example AGGREGATE("total revenue")), because parse_simple_measure_ref rejects spaces and other quoted-identifier characters. Before this change, plain AGGREGATE(...) calls were still extracted; now these quoted measure names are not rewritten and will fall through to DuckDB's unrelated aggregate function path.
Useful? React with 👍 / 👎.
Fix parse_simple_measure_ref to allow any characters inside matched
quotes (double-quotes, backticks, brackets) so AGGREGATE("col_name")
is correctly recognized as a measure reference. Add test for quoted
measure names.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cfd15a3a9f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/yardstick_extension.cpp
Outdated
| // Wrap in table function call and parse with DuckDB's native parser | ||
| string wrapper_sql = "SELECT * FROM yardstick('" + escaped_sql + "')"; | ||
|
|
There was a problem hiding this comment.
Preserve statement context when rewriting non-SELECT queries
This override always rewrites successful AGGREGATE() expansions into SELECT * FROM yardstick('...'), even when the original statement was CREATE TABLE ... AS SELECT or INSERT ... SELECT. That routes execution through YardstickQueryBind, which runs con.Query(...) on a new Connection during bind time, so side effects occur outside the caller's transaction/temp-schema context (e.g., rollback does not reliably undo the write). For non-SELECT statements, return parsed expanded_sql directly instead of wrapping it in the table function.
Useful? React with 👍 / 👎.
| // Only include calls WITHOUT AT modifier, and only when the argument | ||
| // is a simple identifier (measure name). This avoids intercepting | ||
| // DuckDB's built-in aggregate([list], 'fn') list function. | ||
| if modifiers.is_empty() && parse_simple_measure_ref(measure).is_some() { |
There was a problem hiding this comment.
Allow 3+ part qualified measure refs in AGGREGATE extraction
extract_all_aggregate_calls now drops plain AGGREGATE(...) calls unless parse_simple_measure_ref succeeds, but that parser only accepts one- or two-part identifiers. Valid SQL references like AGGREGATE(schema.view.measure) (or catalog-qualified variants) are therefore skipped, so the rewrite leaves AGGREGATE intact and the query falls through to DuckDB's native aggregate path instead of Yardstick semantics.
Useful? React with 👍 / 👎.
…ion context For CTAS and INSERT...SELECT, return the expanded SQL as parsed statements instead of wrapping in yardstick() table function. The table function executes via con.Query() on a fresh Connection, which runs outside the caller's transaction context. SELECT statements still use the table function wrapper for the second expansion pass.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bfafe8d1dc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Only include calls WITHOUT AT modifier, and only when the argument | ||
| // is a simple identifier (measure name). This avoids intercepting | ||
| // DuckDB's built-in aggregate([list], 'fn') list function. | ||
| if modifiers.is_empty() && parse_simple_measure_ref(measure).is_some() { |
There was a problem hiding this comment.
Accept spaced qualified refs when extracting AGGREGATE calls
The new simple-reference filter now rejects any whitespace outside quoted identifiers, so valid SQL like AGGREGATE(o . sumRevenue) (or AGGREGATE(o. sumRevenue)) is skipped by extract_all_aggregate_calls. When this happens, Yardstick leaves the call unreplaced and execution falls through to DuckDB's built-in aggregate function (which expects a list + function name), causing these previously-rewritable measure queries to fail.
Useful? React with 👍 / 👎.
parse_simple_measure_ref now accepts spaces around the dot in qualified identifiers (e.g. "s . revenue", "s. revenue") and trims parts before normalization. Add tests for spaced qualified refs.
Summary
Uses DuckDB's
parser_overrideAPI (v1.5+) to intercept all queries before the native parser runs, eliminating the need for theSEMANTICprefix onAGGREGATE()calls. This enablesAGGREGATE()in statement types the bind-time fallback couldn't handle (CTAS, INSERT...SELECT). The existingparse_functionfallback andyardstick_bindoperator extension are preserved for backwards compatibility.parser_overrideset tofallbackmode: yardstick gets first crack at every query, non-yardstick SQL falls through to the native parserAGGREGATE()argument extraction to simple identifiers so DuckDB's built-inaggregate([list], 'fn')list function is not interceptedSEMANTICprefix, plus CTAS/INSERT/list-aggregate compatibility tests (2204 total assertions)