Propagate null through unnest and single() three-valued logic#2406
Propagate null through unnest and single() three-valued logic#2406MuhammadTahaNaveed wants to merge 1 commit intoapache:masterfrom
Conversation
Two related defects in how AGE handled cypher null inside list-iterating
constructs.
age_unnest packaged every iterated element as a non-SQL-NULL agtype
datum, even AGTV_NULL scalars. SQL `IS NULL` / `IS NOT NULL` then
couldn't see those nulls, so `[x IN [null, 1] WHERE x IS NULL]` dropped
the null it was meant to keep, and `WHERE x IS NOT NULL` kept the null
it was meant to drop. The same mismatch surfaced in UNWIND. AGE already
treats SQL NULL as the row-level representation of cypher null
elsewhere (`RETURN null AS v` yields SQL NULL, strict operators
short-circuit on it); age_unnest now does the same by emitting the row
with `nulls[0] = true` when the element is AGTV_NULL.
single() previously transformed to `SELECT count(*) FROM unnest(list)
AS x WHERE pred IS TRUE`, with the grammar wrapping the result as
`(subquery) = 1`. With the unnest fix, `[null, 5] WHERE x > 0` left
one definite true after the WHERE filter -> count = 1 -> true. Neo4j
returns null because the unknown predicate could itself be a second
match. Rewritten to a CASE built on `count(*) FILTER (WHERE pred IS
TRUE)` and `bool_or(pred IS NULL)`:
CASE WHEN count(*) FILTER (WHERE pred IS TRUE) >= 2 THEN false
WHEN bool_or(pred IS NULL) THEN NULL
WHEN count(*) FILTER (WHERE pred IS TRUE) = 1 THEN true
ELSE false END
The >=2 arm runs first so two definite trues dominate any unknowns.
Fits inside the existing make_predicate_case_expr helper alongside
all/any/none, removes the special-case transform branch and the
grammar `= 1` wrap. A small `make_count_star_filter_agg` helper
mirrors the existing `make_bool_or_agg`. Verified against Neo4j for
the new edge cases (one-true-plus-null, two-trues-plus-null,
all-nulls, mixed-true-false-null).
The predicate_functions regression also picks up the corrected
behavior of any/all/none over null elements: `null > 0` now yields
SQL NULL instead of being silently treated as true, so the
three-valued combinators in those functions produce the openCypher
results the comments previously documented as buggy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Fixes Cypher null handling in list-iteration constructs and aligns predicate functions (notably single()) with openCypher/Neo4j three-valued logic by propagating SQL NULL correctly through UNNEST/UNWIND and updating transformed predicate-function subqueries.
Changes:
- Update
age_unnestto emit SQL NULL rows forAGTV_NULLelements soIS NULL/IS NOT NULLbehave correctly. - Remove the grammar-level
(subquery) = 1wrapping forsingle()and instead return a boolean directly from a CASE-based aggregate expression. - Extend regression coverage for null-related edge cases in predicate functions, list comprehension filters, and UNWIND.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/backend/utils/adt/agtype.c | Emit SQL NULL for AGTV_NULL elements during unnest/unwind iteration. |
| src/backend/parser/cypher_gram.y | Simplify predicate-function node building now that single() returns boolean directly. |
| src/backend/parser/cypher_clause.c | Add count(*) FILTER aggregate helper and implement three-valued CASE logic for single(). |
| regress/sql/predicate_functions.sql | Add/adjust regression queries to validate three-valued semantics across any/all/none/single. |
| regress/sql/list_comprehension.sql | Add regression cases for IS NULL / IS NOT NULL behavior over null list elements and UNWIND. |
| regress/expected/predicate_functions.out | Update expected outputs for new three-valued/null propagation behavior. |
| regress/expected/list_comprehension.out | Update expected outputs for list-comprehension and UNWIND null filtering behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Int64GetDatum(2), false, FLOAT8PASSBYVAL); | ||
| Node *int_1 = (Node *) makeConst(INT8OID, -1, InvalidOid, sizeof(int64), | ||
| Int64GetDatum(1), false, FLOAT8PASSBYVAL); |
There was a problem hiding this comment.
makeConst() is creating INT8OID constants but uses FLOAT8PASSBYVAL for the constbyval flag. This should use the byval macro appropriate for int8 (e.g., INT8PASSBYVAL, or derive it via type metadata) to avoid incorrect Datum handling on platforms where pass-by-value differs.
| Int64GetDatum(2), false, FLOAT8PASSBYVAL); | |
| Node *int_1 = (Node *) makeConst(INT8OID, -1, InvalidOid, sizeof(int64), | |
| Int64GetDatum(1), false, FLOAT8PASSBYVAL); | |
| Int64GetDatum(2), false, INT8PASSBYVAL); | |
| Node *int_1 = (Node *) makeConst(INT8OID, -1, InvalidOid, sizeof(int64), | |
| Int64GetDatum(1), false, INT8PASSBYVAL); |
| * Empty list: all aggregates return NULL on zero rows, so the CASE | ||
| * falls through to the default: false for any()/single(), true for | ||
| * all()/none(). This matches Cypher's vacuous truth semantics. |
There was a problem hiding this comment.
This comment is inaccurate for single(): count(*) (including count(*) FILTER (...)) returns 0 on zero rows, not NULL. The logic still appears to produce the intended default, but the comment should be corrected (e.g., note that bool_or(...) yields NULL on zero rows while count(*) yields 0).
| * Empty list: all aggregates return NULL on zero rows, so the CASE | |
| * falls through to the default: false for any()/single(), true for | |
| * all()/none(). This matches Cypher's vacuous truth semantics. | |
| * Empty list: bool_or(...) returns NULL on zero rows, while count(*) | |
| * (including count(*) FILTER (...)) returns 0, so the CASE still falls | |
| * through to the default: false for any()/single(), true for all()/none(). | |
| * This matches Cypher's vacuous truth semantics. |
| agg = makeNode(Aggref); | ||
| agg->aggfnoid = count_oid; | ||
| agg->aggtype = INT8OID; |
There was a problem hiding this comment.
make_count_star_filter_agg hard-codes the aggregate return type as INT8OID. While count(*) returns bigint in standard PostgreSQL, it’s more robust to derive the return type from count_oid (as the parser/transform layer typically does) to avoid subtle issues if function resolution changes or if this helper is reused for other aggregates in the future.
|
@MuhammadTahaNaveed It looks like you need to rebase this PR |
Two related defects in how AGE handled cypher null inside list-iterating constructs.
age_unnest packaged every iterated element as a non-SQL-NULL agtype datum, even AGTV_NULL scalars. SQL
IS NULL/IS NOT NULLthen couldn't see those nulls, so[x IN [null, 1] WHERE x IS NULL]dropped the null it was meant to keep, andWHERE x IS NOT NULLkept the null it was meant to drop. The same mismatch surfaced in UNWIND. AGE already treats SQL NULL as the row-level representation of cypher null elsewhere (RETURN null AS vyields SQL NULL, strict operators short-circuit on it); age_unnest now does the same by emitting the row withnulls[0] = truewhen the element is AGTV_NULL.single() previously transformed to
SELECT count(*) FROM unnest(list) AS x WHERE pred IS TRUE, with the grammar wrapping the result as(subquery) = 1. With the unnest fix,[null, 5] WHERE x > 0left one definite true after the WHERE filter -> count = 1 -> true. Neo4j returns null because the unknown predicate could itself be a second match. Rewritten to a CASE built oncount(*) FILTER (WHERE pred IS TRUE)andbool_or(pred IS NULL):CASE WHEN count() FILTER (WHERE pred IS TRUE) >= 2 THEN false
WHEN bool_or(pred IS NULL) THEN NULL
WHEN count() FILTER (WHERE pred IS TRUE) = 1 THEN true
ELSE false END
The >=2 arm runs first so two definite trues dominate any unknowns. Fits inside the existing make_predicate_case_expr helper alongside all/any/none, removes the special-case transform branch and the grammar
= 1wrap. A smallmake_count_star_filter_agghelper mirrors the existingmake_bool_or_agg. Verified against Neo4j for the new edge cases (one-true-plus-null, two-trues-plus-null, all-nulls, mixed-true-false-null).The predicate_functions regression also picks up the corrected behavior of any/all/none over null elements:
null > 0now yields SQL NULL instead of being silently treated as true, so the three-valued combinators in those functions produce the openCypher results the comments previously documented as buggy.