Skip to content

Propagate null through unnest and single() three-valued logic#2406

Open
MuhammadTahaNaveed wants to merge 1 commit intoapache:masterfrom
MuhammadTahaNaveed:i2393
Open

Propagate null through unnest and single() three-valued logic#2406
MuhammadTahaNaveed wants to merge 1 commit intoapache:masterfrom
MuhammadTahaNaveed:i2393

Conversation

@MuhammadTahaNaveed
Copy link
Copy Markdown
Member

Two related defects in how AGE handled cypher null inside list-iterating constructs.

age_unnest packaged every iterated element as a non-SQL-NULL agtype datum, even AGTV_NULL scalars. SQL IS NULL / IS NOT NULL then couldn't see those nulls, so [x IN [null, 1] WHERE x IS NULL] dropped the null it was meant to keep, and WHERE x IS NOT NULL kept the null it was meant to drop. The same mismatch surfaced in UNWIND. AGE already treats SQL NULL as the row-level representation of cypher null elsewhere (RETURN null AS v yields SQL NULL, strict operators short-circuit on it); age_unnest now does the same by emitting the row with nulls[0] = true when the element is AGTV_NULL.

single() previously transformed to SELECT count(*) FROM unnest(list) AS x WHERE pred IS TRUE, with the grammar wrapping the result as (subquery) = 1. With the unnest fix, [null, 5] WHERE x > 0 left one definite true after the WHERE filter -> count = 1 -> true. Neo4j returns null because the unknown predicate could itself be a second match. Rewritten to a CASE built on count(*) FILTER (WHERE pred IS TRUE) and bool_or(pred IS NULL):

CASE WHEN count() FILTER (WHERE pred IS TRUE) >= 2 THEN false
WHEN bool_or(pred IS NULL) THEN NULL
WHEN count(
) FILTER (WHERE pred IS TRUE) = 1 THEN true
ELSE false END

The >=2 arm runs first so two definite trues dominate any unknowns. Fits inside the existing make_predicate_case_expr helper alongside all/any/none, removes the special-case transform branch and the grammar = 1 wrap. A small make_count_star_filter_agg helper mirrors the existing make_bool_or_agg. Verified against Neo4j for the new edge cases (one-true-plus-null, two-trues-plus-null, all-nulls, mixed-true-false-null).

The predicate_functions regression also picks up the corrected behavior of any/all/none over null elements: null > 0 now yields SQL NULL instead of being silently treated as true, so the three-valued combinators in those functions produce the openCypher results the comments previously documented as buggy.

Two related defects in how AGE handled cypher null inside list-iterating
constructs.

age_unnest packaged every iterated element as a non-SQL-NULL agtype
datum, even AGTV_NULL scalars. SQL `IS NULL` / `IS NOT NULL` then
couldn't see those nulls, so `[x IN [null, 1] WHERE x IS NULL]` dropped
the null it was meant to keep, and `WHERE x IS NOT NULL` kept the null
it was meant to drop. The same mismatch surfaced in UNWIND. AGE already
treats SQL NULL as the row-level representation of cypher null
elsewhere (`RETURN null AS v` yields SQL NULL, strict operators
short-circuit on it); age_unnest now does the same by emitting the row
with `nulls[0] = true` when the element is AGTV_NULL.

single() previously transformed to `SELECT count(*) FROM unnest(list)
AS x WHERE pred IS TRUE`, with the grammar wrapping the result as
`(subquery) = 1`. With the unnest fix, `[null, 5] WHERE x > 0` left
one definite true after the WHERE filter -> count = 1 -> true. Neo4j
returns null because the unknown predicate could itself be a second
match. Rewritten to a CASE built on `count(*) FILTER (WHERE pred IS
TRUE)` and `bool_or(pred IS NULL)`:

  CASE WHEN count(*) FILTER (WHERE pred IS TRUE) >= 2 THEN false
       WHEN bool_or(pred IS NULL)                     THEN NULL
       WHEN count(*) FILTER (WHERE pred IS TRUE) =  1 THEN true
       ELSE false END

The >=2 arm runs first so two definite trues dominate any unknowns.
Fits inside the existing make_predicate_case_expr helper alongside
all/any/none, removes the special-case transform branch and the
grammar `= 1` wrap. A small `make_count_star_filter_agg` helper
mirrors the existing `make_bool_or_agg`. Verified against Neo4j for
the new edge cases (one-true-plus-null, two-trues-plus-null,
all-nulls, mixed-true-false-null).

The predicate_functions regression also picks up the corrected
behavior of any/all/none over null elements: `null > 0` now yields
SQL NULL instead of being silently treated as true, so the
three-valued combinators in those functions produce the openCypher
results the comments previously documented as buggy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes Cypher null handling in list-iteration constructs and aligns predicate functions (notably single()) with openCypher/Neo4j three-valued logic by propagating SQL NULL correctly through UNNEST/UNWIND and updating transformed predicate-function subqueries.

Changes:

  • Update age_unnest to emit SQL NULL rows for AGTV_NULL elements so IS NULL / IS NOT NULL behave correctly.
  • Remove the grammar-level (subquery) = 1 wrapping for single() and instead return a boolean directly from a CASE-based aggregate expression.
  • Extend regression coverage for null-related edge cases in predicate functions, list comprehension filters, and UNWIND.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/backend/utils/adt/agtype.c Emit SQL NULL for AGTV_NULL elements during unnest/unwind iteration.
src/backend/parser/cypher_gram.y Simplify predicate-function node building now that single() returns boolean directly.
src/backend/parser/cypher_clause.c Add count(*) FILTER aggregate helper and implement three-valued CASE logic for single().
regress/sql/predicate_functions.sql Add/adjust regression queries to validate three-valued semantics across any/all/none/single.
regress/sql/list_comprehension.sql Add regression cases for IS NULL / IS NOT NULL behavior over null list elements and UNWIND.
regress/expected/predicate_functions.out Update expected outputs for new three-valued/null propagation behavior.
regress/expected/list_comprehension.out Update expected outputs for list-comprehension and UNWIND null filtering behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1834 to +1836
Int64GetDatum(2), false, FLOAT8PASSBYVAL);
Node *int_1 = (Node *) makeConst(INT8OID, -1, InvalidOid, sizeof(int64),
Int64GetDatum(1), false, FLOAT8PASSBYVAL);
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makeConst() is creating INT8OID constants but uses FLOAT8PASSBYVAL for the constbyval flag. This should use the byval macro appropriate for int8 (e.g., INT8PASSBYVAL, or derive it via type metadata) to avoid incorrect Datum handling on platforms where pass-by-value differs.

Suggested change
Int64GetDatum(2), false, FLOAT8PASSBYVAL);
Node *int_1 = (Node *) makeConst(INT8OID, -1, InvalidOid, sizeof(int64),
Int64GetDatum(1), false, FLOAT8PASSBYVAL);
Int64GetDatum(2), false, INT8PASSBYVAL);
Node *int_1 = (Node *) makeConst(INT8OID, -1, InvalidOid, sizeof(int64),
Int64GetDatum(1), false, INT8PASSBYVAL);

Copilot uses AI. Check for mistakes.
Comment on lines +1741 to +1743
* Empty list: all aggregates return NULL on zero rows, so the CASE
* falls through to the default: false for any()/single(), true for
* all()/none(). This matches Cypher's vacuous truth semantics.
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is inaccurate for single(): count(*) (including count(*) FILTER (...)) returns 0 on zero rows, not NULL. The logic still appears to produce the intended default, but the comment should be corrected (e.g., note that bool_or(...) yields NULL on zero rows while count(*) yields 0).

Suggested change
* Empty list: all aggregates return NULL on zero rows, so the CASE
* falls through to the default: false for any()/single(), true for
* all()/none(). This matches Cypher's vacuous truth semantics.
* Empty list: bool_or(...) returns NULL on zero rows, while count(*)
* (including count(*) FILTER (...)) returns 0, so the CASE still falls
* through to the default: false for any()/single(), true for all()/none().
* This matches Cypher's vacuous truth semantics.

Copilot uses AI. Check for mistakes.
Comment on lines +1693 to +1695
agg = makeNode(Aggref);
agg->aggfnoid = count_oid;
agg->aggtype = INT8OID;
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make_count_star_filter_agg hard-codes the aggregate return type as INT8OID. While count(*) returns bigint in standard PostgreSQL, it’s more robust to derive the return type from count_oid (as the parser/transform layer typically does) to avoid subtle issues if function resolution changes or if this helper is reused for other aggregates in the future.

Copilot uses AI. Check for mistakes.
@jrgemignani
Copy link
Copy Markdown
Contributor

@MuhammadTahaNaveed It looks like you need to rebase this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

List-comprehension WHERE filters may mishandle null elements.

3 participants