Skip to content

Allow safe Cypher reserved keywords in alias positions (#2355)#2415

Merged
jrgemignani merged 1 commit intoapache:masterfrom
crprashant:fix/2355-reserved-keywords-as-aliases
Apr 29, 2026
Merged

Allow safe Cypher reserved keywords in alias positions (#2355)#2415
jrgemignani merged 1 commit intoapache:masterfrom
crprashant:fix/2355-reserved-keywords-as-aliases

Conversation

@crprashant
Copy link
Copy Markdown
Contributor

@crprashant crprashant commented Apr 27, 2026

PR: Allow safe Cypher reserved keywords in alias positions (#2355)

Branch: crprashant:fix/2355-reserved-keywords-as-aliasesapache:master
Open URL: https://github.com/crprashant/age/pull/new/fix/2355-reserved-keywords-as-aliases
Closes: #2355


Summary

RETURN 1 AS count (and equivalents using any of 49 non-conflicting reserved
keywords) failed with syntax error at or near "count". This PR fixes it by
introducing a tightly-scoped var_name_alias grammar rule used only at the
alias-binding sites of RETURN, WITH, YIELD, and UNWIND. The set of
keywords newly accepted as aliases is exactly safe_keywords — the same set
already accepted in func_name and schema_name.

Reproducer (from the issue)

LOAD 'age';
SET search_path = ag_catalog;
SELECT * FROM create_graph('g');
SELECT * FROM cypher('g', $$ RETURN 1 AS count $$) AS (a agtype);
Before After
Result ERROR: syntax error at or near "count" 1 (1 row)

Root cause

In src/backend/parser/cypher_gram.y the productions

yield_item   : expr AS var_name
return_item  : expr AS var_name
unwind       : UNWIND expr AS var_name

referenced var_name, which expands only to symbolic_name (i.e. plain
IDENTIFIER). Reserved keywords — even those that pose no parsing ambiguity —
were therefore rejected as aliases, although schema_name already includes
reserved_keyword and func_name already includes safe_keywords.

Fix

A new non-terminal var_name_alias is introduced and used only in the three
alias-binding sites above:

var_name_alias:
    var_name
    | safe_keywords
    {
        $$ = pstrdup((char *) $1);
    }
    ;

var_name itself is not broadened, intentionally. Allowing
safe_keywords everywhere var_name is used produces 156 shift/reduce
conflicts
in bison (verified locally) because keyword tokens collide with
their syntactic roles inside expressions and patterns. Restricting the
broadening to AS-bound alias positions removes that ambiguity entirely —
the build is conflict-free.

conflicted_keywords (END, NULL, TRUE, FALSE) remain rejected
everywhere; they are genuinely ambiguous with literal / CASE productions.

Behavior matrix

Form Before After Why
RETURN 1 AS count error ✅ ok count ∈ safe_keywords
RETURN 1 AS exists / coalesce / match / where / order / limit / distinct / optional / detach / contains / starts / ends / in / is / not / yield / call / ... error ✅ ok safe_keywords
RETURN 1 AS count, 2 AS exists, 3 AS where error ✅ ok multiple aliases
WITH 1 AS count RETURN 1 AS x error ✅ ok WITH alias binding
UNWIND [1,2,3] AS row RETURN 1 AS x error ✅ ok UNWIND alias binding
RETURN 1 AS null / true / false / end error ❌ still error conflicted_keywords (intentional)
MATCH (count) RETURN 1 AS x error ❌ still error pattern bindings unchanged (intentional)
RETURN 1 AS my_alias ✅ ok ✅ ok unaffected

Known limitation (intentional, scope-bounded)

Even with this PR, referencing an alias whose name is a keyword (e.g.
WITH 1 AS count RETURN count) still fails, because expr_var reads through
var_name, which is unchanged. Broadening expr_var to accept keywords
re-introduces the 156 grammar conflicts. The literal repro from the issue
(RETURN 1 AS count) is resolved; reading keyword-named aliases back is a
deeper grammar refactor and out of scope here. This is documented inline
in the regression test.

Testing

  • Build: make PG_CONFIG=/usr/lib/postgresql/18/bin/pg_config -j$(nproc)
    clean, no new bison warnings, no shift/reduce conflicts.
  • Regression suite: make installcheck on PostgreSQL 18.
    Result: 32/34 tests pass, including the new reserved_keyword_alias test.
    The remaining failure (age_upgrade) is pre-existing on master and
    unrelated to this change.
  • PostgreSQL version coverage: the change is parser-only and
    PG-version-agnostic. The fix lives entirely in cypher_gram.y and the
    generated cypher_gram.c; no PG-version-specific APIs are touched. The
    reporter's environment was AGE 1.16.0 on PostgreSQL 16.13. Building
    master against PG 16 currently fails for unrelated reasons
    (cypher_set.c/age.c use PG18-only APIs such as index_close and
    age_shmem_startup_hook); this is independent of Usind reserved keywords as aliases in Cypher queries #2355 and is not
    addressed here.

New regression test

regress/sql/reserved_keyword_alias.sql covers:

  1. The exact reproducer from the issue.
  2. 18 representative safe_keywords as RETURN ... AS <kw>.
  3. Multiple keyword aliases in one projection.
  4. WITH ... AS <kw> alias binding.
  5. UNWIND ... AS <kw> alias binding.
  6. Negative tests for END / NULL / TRUE / FALSE (must still error).
  7. Negative test for keyword in pattern-variable position (must still error).
  8. A normal-identifier sanity check.

Risk analysis

  • Grammar conflicts: zero new conflicts (verified — bison was clean,
    build had no -Wconflicts-sr warning).
  • Call-site impact: only three productions reference var_name_alias
    (yield_item, return_item, unwind). All other 6 use sites of var_name
    (named paths, pattern variables, edge bindings, expr_var,
    var_name_opt) are untouched.
  • Output / semantics: only the parser accepts a strictly larger input
    language. Accepted-before queries continue to behave identically.

Files changed

File Notes
src/backend/parser/cypher_gram.y New var_name_alias rule; alias-binding productions switched to it.
Makefile New test reserved_keyword_alias added to REGRESS between security and drop.
regress/sql/reserved_keyword_alias.sql New regression test.
regress/expected/reserved_keyword_alias.out Expected output.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands the Cypher parser to allow a curated set of “safe” reserved keywords (the existing safe_keywords set) to be used as aliases only in RETURN, WITH, YIELD, and UNWIND ... AS <alias> positions, addressing #2355 (RETURN 1 AS count previously errored).

Changes:

  • Introduces a new var_name_alias grammar rule and uses it at alias-binding sites (RETURN/WITH/YIELD/UNWIND).
  • Adds a new regression test (reserved_keyword_alias) plus expected output to validate accepted/rejected alias cases.
  • Registers the new regression test in the top-level Makefile’s REGRESS list.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
src/backend/parser/cypher_gram.y Adds var_name_alias and switches alias-binding productions to accept safe_keywords in ... AS <alias> positions.
regress/sql/reserved_keyword_alias.sql New regression test covering keyword aliases in RETURN/WITH/UNWIND plus negative cases.
regress/expected/reserved_keyword_alias.out Expected output for the new regression test.
Makefile Adds reserved_keyword_alias to REGRESS test ordering.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread regress/expected/reserved_keyword_alias.out Outdated
Comment thread regress/sql/reserved_keyword_alias.sql Outdated
Comment thread src/backend/parser/cypher_gram.y Outdated
Comment thread regress/sql/reserved_keyword_alias.sql Outdated
@jrgemignani
Copy link
Copy Markdown
Contributor

@crprashant Please address these minor issues -

Recommendation

  1. Drop the redundant pstrdup in the new action body.
    The safe_keywords rule already runs KEYWORD_STRDUP($1) on every alternative, which is pnstrdup-backed. The extra pstrdup((char *) $1) in var_name_alias is a second heap allocation per parse with no purpose. Replace it with $$ = (char *) $1; to match the established pattern in schema_name:

    schema_name:
        symbolic_name
        | reserved_keyword
            {
                /* we don't need to copy it, as it already has been */
                $$ = (char *) $1;
            }
        ;

    This is a one-line cosmetic improvement that also makes the code symmetric with neighbouring rules and slightly reduces parser allocations on alias-heavy queries.

  2. Add an inline comment in var_name_alias documenting the unreadable-alias limitation, with an issue link.
    The PR description correctly notes that WITH 1 AS count RETURN count still fails because expr_var reads through var_name, which is unchanged. That asymmetry (writable but not readable) will surprise the next maintainer who reads only the grammar file. Add 2–3 lines pointing to the follow-up issue, e.g.:

    /*
     * NOTE: Reading a keyword-named alias back (e.g. WITH 1 AS count RETURN count)
     * is intentionally still rejected — broadening expr_var to accept safe_keywords
     * reintroduces ~156 shift/reduce conflicts. Tracked in issue #NNNN.
     */

    File the follow-up issue and reference its number here so the limitation is discoverable from the source, not just the merged PR description.

  3. Add two regression cases to round out coverage.

    • Backtick-quoted alias positive case: RETURN 1 AS \count`— confirms that quoting forces theIDENTIFIER` token path and continues to work, so future grammar refactors don't accidentally regress quoted identifiers.
    • Known-limitation negative case: WITH 1 AS count RETURN count — currently fails. Adding it with the failing expected output gives the next contributor a precise file to update when they fix expr_var, and serves as permanent self-documentation in the regress tree. Mark it clearly in the test file with a comment like -- Known limitation: see issue #NNNN.
  4. Fix the doc nit in the var_name_alias comment.
    The current comment says the new rule permits "the same set of non-conflicting reserved keywords accepted by safe_keywords (and already accepted in func_name / schema_name)." That is slightly inaccurate: func_name accepts safe_keywords, but schema_name accepts the broader reserved_keyword (which includes conflicted_keywords). Tighten the wording to "(already accepted in func_name; schema_name accepts the broader reserved_keyword)" so a future reader doesn't infer that schema_name is a 1:1 precedent for the new rule.

@crprashant
Copy link
Copy Markdown
Contributor Author

@jrgemignani thanks for the careful review — pushed 49a424e addressing all four points:

  1. Redundant pstrdup removed. safe_keywords already heap-copies via the KEYWORD_STRDUP macro (pnstrdup((kw), strlen(kw))), so the action is now $$ = (char *) $1; — the same pattern schema_name uses for its reserved_keyword branch. An inline comment explains why no further copy is needed.
  2. Read-back limitation documented in code. Added a NOTE: block above var_name_alias explaining that WITH 1 AS count RETURN count is intentionally still rejected because broadening expr_var/var_name reintroduces ~156 shift/reduce conflicts, and linked it to the new follow-up issue Cypher keyword-named aliases cannot be referenced back as variables #2416.
  3. Doc comment tightened. Replaced "(and already accepted in func_name / schema_name)" with "(already accepted in func_name; schema_name accepts the broader reserved_keyword)" — accurately reflects that schema_name is a strict superset of what var_name_alias accepts.
  4. Regression coverage extended. regress/sql/reserved_keyword_alias.sql now also exercises:
    • Positive: RETURN 1 AS `count` and WITH 1 AS `count` RETURN `count` — the backtick-quoted IDENTIFIER path, so future grammar refactors can't silently regress quoted identifiers when the unquoted form is also a keyword.
    • Negative known-limitation: WITH 1 AS count RETURN count is captured with its syntax error at end of input output and a comment pointing at Cypher keyword-named aliases cannot be referenced back as variables #2416, giving the next contributor who tackles expr_var a precise file to update.

Validation re-run on the amended commit:

  • Build/Regression (PG18, COPT=-Werror, EXTRA_TESTS="pgvector fuzzystrmatch pg_trgm"): 36/37 — only the pre-existing age_upgrade failure remains (same as master).
  • Python driver: all 4 suites green.
  • Go driver: all tests green.
  • Node.js driver: 31/31 green.
  • JDBC driver (gradle clean test): BUILD SUCCESSFUL, every *Test PASSED.
  • No new bison shift/reduce conflicts.

Happy to keep iterating if anything else stands out.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread regress/sql/reserved_keyword_alias.sql Outdated
Comment thread regress/expected/reserved_keyword_alias.out Outdated
@jrgemignani
Copy link
Copy Markdown
Contributor

@crprashant Can you address the copilot messages above about CRLF? These files are for linux and need to conform to that standard.

This new expected output file appears to be committed with CRLF (Windows) line endings (existing regress/expected/*.out files use LF). Please convert it to LF-only line endings so pg_regress comparisons are stable across platforms.

Cypher productions in `cypher_gram.y` that bind an alias via the AS
keyword (RETURN/WITH/YIELD ... AS x and UNWIND ... AS x) only accepted
plain identifiers. As a result, completely valid Cypher such as

    SELECT * FROM cypher('g', $$ RETURN 1 AS count $$) AS (a agtype);

failed with `syntax error at or near "count"`, even though `count` is
already accepted in other identifier positions (it appears in the
existing `safe_keywords` list and is permitted in `func_name`;
`schema_name` accepts the broader `reserved_keyword` set).

This patch introduces a dedicated `var_name_alias` non-terminal used
only in the three alias-binding sites (yield_item, return_item,
unwind). It accepts everything `var_name` accepts, plus the entire
`safe_keywords` set, so the 49 non-conflicting reserved keywords
(count, exists, coalesce, match, return, where, order, limit, distinct,
optional, detach, contains, starts, ends, in, is, not, ...) are now
usable as aliases.

The change is intentionally scoped to alias positions:

  * `var_name` itself (used by pattern-variable bindings like
    `(x:Label)`, edge bindings, named paths, and `expr_var` references)
    is unchanged. Allowing safe_keywords there triggers 156 shift/reduce
    conflicts because keyword tokens collide with their roles inside
    expressions and patterns.
  * `conflicted_keywords` (END, NULL, TRUE, FALSE) remain rejected in
    every position; they are genuinely ambiguous with literal/CASE
    productions.

Reading a keyword-named alias back through `expr_var` still fails (e.g.
`WITH 1 AS count RETURN count`) because `expr_var` reads through
`var_name`. That asymmetry is captured as a known limitation in the
regression suite and tracked separately in apache#2416.

Regression coverage lives in `regress/sql/reserved_keyword_alias.sql`
and `regress/expected/reserved_keyword_alias.out`, exercising:

  * the original repro,
  * representative safe_keywords across RETURN/WITH/UNWIND,
  * multiple keyword aliases in one projection,
  * a backtick-quoted alias positive case,
  * the known read-back limitation as a negative test, and
  * explicit negatives proving END/NULL/TRUE/FALSE and pattern-position
    keywords still error out.

Closes apache#2355.
@crprashant crprashant force-pushed the fix/2355-reserved-keywords-as-aliases branch from 49a424e to f686e93 Compare April 28, 2026 21:54
@crprashant
Copy link
Copy Markdown
Contributor Author

@jrgemignani good catch — thanks! Pushed f686e93 with both new regression files normalized to LF-only line endings, matching the rest of the regress/sql/*.sql and regress/expected/*.out tree. The earlier amend accidentally retained CRLF from a Windows-side scratch file.

Verified:

$ file regress/sql/reserved_keyword_alias.sql regress/expected/reserved_keyword_alias.out
regress/sql/reserved_keyword_alias.sql:      ASCII text
regress/expected/reserved_keyword_alias.out: ASCII text

(file reports with CRLF line terminators when CR is present; both now report plain ASCII text ⇒ LF-only.)

The expected output was regenerated from a fresh installcheck run against the LF .sql, so the captured SET search_path, LOAD, and other echoed lines are all LF-only — no mixed endings remain. Build/Regression on PG18 with COPT=-Werror EXTRA_TESTS="pgvector fuzzystrmatch pg_trgm" is still 36/37 (only the pre-existing age_upgrade fails). No driver test impact since no driver code or fixtures changed.

@jrgemignani jrgemignani merged commit ff47828 into apache:master Apr 29, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants