Skip to content

Fix VLE [*0..N] zero-hop self-binding when edge label is missing (#2382)#2419

Open
crprashant wants to merge 1 commit intoapache:masterfrom
crprashant:fix/2382-vle-zero-hop
Open

Fix VLE [*0..N] zero-hop self-binding when edge label is missing (#2382)#2419
crprashant wants to merge 1 commit intoapache:masterfrom
crprashant:fix/2382-vle-zero-hop

Conversation

@crprashant
Copy link
Copy Markdown
Contributor

Summary

Fixes #2382: variable-length relationship patterns with a zero lower bound ([*0..N]) failed to produce the zero-hop self-binding when the edge label did not exist in the graph.

Per Neo4j/openCypher semantics, a pattern like (p)-[:FRIEND*0..1]-(f) must always include the zero-length match where f binds to the same node as p, regardless of whether any FRIEND edge exists. Apache AGE was treating any missing edge label as fatal for the entire MATCH.

Repro (before this PR)

SELECT * FROM cypher('fuzz_graph', $$ CREATE (:Person {name:'Alice'}) $$) AS (v agtype);

SELECT * FROM cypher('fuzz_graph', $$
  MATCH (p:Person {name: 'Alice'})
  OPTIONAL MATCH (p)-[:FRIEND*0..1]-(f:Person)
  RETURN p.name AS person, f.name AS friend
$$) AS (person agtype, friend agtype);

-- Before:  Alice | NULL
-- After:   Alice | Alice   (matches Neo4j)

Root cause

Two separate label-validity gates short-circuited the MATCH when an edge label was not in the cache:

  • match_check_valid_label() injected a WHERE false (or volatile equivalent) at the MATCH level.
  • path_check_valid_label() made the planner emit NULL::agtype constants in place of vertex bindings even when the vertex labels themselves were valid.

Neither check distinguished fixed-length patterns (which legitimately require an edge of the label) from VLE patterns with a lower bound of 0 (which don't).

There was also a latent runtime bug in is_an_edge_match(): when the user requested a label that did not exist, the InvalidOid check fell through the existing "no constraints → match all" fast-path. Once the parser-side gate was relaxed, [:NOEXIST*0..N] would have begun traversing arbitrary other-label edges. This PR fixes that path too so the zero-hop case is the only thing that matches.

Fix

  1. src/backend/parser/cypher_clause.c — new helper is_zero_lower_bound_vle() that defensively inspects the vle() FuncCall built by build_VLE_relation() (in cypher_gram.y). Both match_check_valid_label() and path_check_valid_label() use it to skip the false-where short-circuit only for relationships that are zero-bound VLEs. Mixed patterns where another segment is genuinely unsatisfiable still short-circuit correctly.
  2. src/backend/utils/adt/age_vle.cis_an_edge_match() returns false early when the requested label does not exist. The zero-hop self-binding is generated by build_VLE_zero_container() and never calls is_an_edge_match(), so this fix does not affect the zero-hop semantics.
  3. regress/sql/cypher_vle.sql — seven regression cases:
    • plain MATCH [:NOEXIST*0..1] → zero-hop only (no leakage onto other-label edges in the graph)
    • OPTIONAL MATCH form (issue's exact repro shape)
    • [*0..0] → zero-hop only
    • fixed-length [:NOEXIST*1..1] → 0 rows (preserved)
    • OPTIONAL fixed-length missing → outer row preserved with NULL
    • mixed pattern (a)-[:NOEXIST*0..1]-(b)-[:STILL_MISSING]-(c) → 0 rows
    • sanity: zero-bound on an existing label (unchanged behaviour)

Validation

  • make installcheck EXTRA_TESTS="pgvector fuzzystrmatch pg_trgm": 36/37 pass on PG18. The single failure (age_upgrade) is pre-existing on master at 54e19fac and is unrelated to this change.
  • All 7 new regression cases pass.
  • cypher_vle (which is the suite the new tests live in) passes cleanly.
  • Manual rebuild + EXPLAIN (VERBOSE, ANALYZE) confirms the planner no longer constant-folds f.name to NULL::agtype and the query returns exactly the expected zero-hop row.

Notes for reviewers

  • The helper validates that rel->varlen is a FuncCall named "vle" with at least 5 args before reading args[3]; any deviation falls back to the existing short-circuit, so future parser refactors will be safe.
  • A_Const integer inspection follows the same direct field access (val.ival.type / val.ival.ival) used elsewhere in cypher_gram.y (e.g. line 696).
  • No bison or grammar changes; no new shift/reduce conflicts.

Closes #2382.

…che#2382)

A variable-length relationship pattern with a zero lower bound, e.g.
`(p)-[:LABEL*0..N]-(f)`, must produce the zero-hop self-binding row
(`f` = `p`) regardless of whether any edge of `LABEL` exists in the
graph. This matches Neo4j/openCypher semantics. Previously, when the
edge label did not exist in the label cache, AGE short-circuited the
entire MATCH to zero rows (or NULL-extended rows for OPTIONAL MATCH).

The fix has three parts:

1. parser/cypher_clause.c: A new helper `is_zero_lower_bound_vle()`
   inspects the FuncCall produced by `build_VLE_relation()` and reports
   whether the relationship is a zero-bound VLE. It is intentionally
   defensive about the FuncCall shape so that any future parser changes
   fall back to the existing short-circuit safely.

   `match_check_valid_label()` and `path_check_valid_label()` now treat
   a missing edge label as fatal only when the relationship requires at
   least one edge of that label. Patterns mixing a zero-bound segment
   with another impossible segment (e.g. `(a)-[:NOEXIST*0..1]-(b)-[:STILL_MISSING]-(c)`)
   still correctly resolve to zero rows because the second segment
   independently fails the label check.

2. utils/adt/age_vle.c: `is_an_edge_match()` now returns false early
   when the user requested a specific label that does not exist
   (`edge_label_name != NULL && edge_label_name_oid == InvalidOid`).
   This prevents a zero-bound traversal of `[:NOEXIST*0..N]` from
   incorrectly walking arbitrary other-label edges via the existing
   "no constraints -> match all" fast path. The zero-hop case itself
   is unaffected because it is generated by `build_VLE_zero_container()`
   without ever consulting `is_an_edge_match()`.

3. regress/sql/cypher_vle.sql: Adds seven regression cases that lock
   in the new behaviour, including the rubber-duck scenarios where
   another label exists in the graph (must NOT be matched by the
   missing-label VLE), where another segment is unsatisfiable
   (must still produce zero rows), and where the label exists
   (sanity check, unchanged behaviour).
@crprashant crprashant force-pushed the fix/2382-vle-zero-hop branch from 5e34871 to cefdd1a Compare May 1, 2026 00:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Variable-length relationship patterns with a zero lower bound may fail to produce the expected zero-hop self-binding.

1 participant