Fix VLE [*0..N] zero-hop self-binding when edge label is missing (#2382)#2419
Open
crprashant wants to merge 1 commit intoapache:masterfrom
Open
Fix VLE [*0..N] zero-hop self-binding when edge label is missing (#2382)#2419crprashant wants to merge 1 commit intoapache:masterfrom
crprashant wants to merge 1 commit intoapache:masterfrom
Conversation
…che#2382) A variable-length relationship pattern with a zero lower bound, e.g. `(p)-[:LABEL*0..N]-(f)`, must produce the zero-hop self-binding row (`f` = `p`) regardless of whether any edge of `LABEL` exists in the graph. This matches Neo4j/openCypher semantics. Previously, when the edge label did not exist in the label cache, AGE short-circuited the entire MATCH to zero rows (or NULL-extended rows for OPTIONAL MATCH). The fix has three parts: 1. parser/cypher_clause.c: A new helper `is_zero_lower_bound_vle()` inspects the FuncCall produced by `build_VLE_relation()` and reports whether the relationship is a zero-bound VLE. It is intentionally defensive about the FuncCall shape so that any future parser changes fall back to the existing short-circuit safely. `match_check_valid_label()` and `path_check_valid_label()` now treat a missing edge label as fatal only when the relationship requires at least one edge of that label. Patterns mixing a zero-bound segment with another impossible segment (e.g. `(a)-[:NOEXIST*0..1]-(b)-[:STILL_MISSING]-(c)`) still correctly resolve to zero rows because the second segment independently fails the label check. 2. utils/adt/age_vle.c: `is_an_edge_match()` now returns false early when the user requested a specific label that does not exist (`edge_label_name != NULL && edge_label_name_oid == InvalidOid`). This prevents a zero-bound traversal of `[:NOEXIST*0..N]` from incorrectly walking arbitrary other-label edges via the existing "no constraints -> match all" fast path. The zero-hop case itself is unaffected because it is generated by `build_VLE_zero_container()` without ever consulting `is_an_edge_match()`. 3. regress/sql/cypher_vle.sql: Adds seven regression cases that lock in the new behaviour, including the rubber-duck scenarios where another label exists in the graph (must NOT be matched by the missing-label VLE), where another segment is unsatisfiable (must still produce zero rows), and where the label exists (sanity check, unchanged behaviour).
5e34871 to
cefdd1a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #2382: variable-length relationship patterns with a zero lower bound (
[*0..N]) failed to produce the zero-hop self-binding when the edge label did not exist in the graph.Per Neo4j/openCypher semantics, a pattern like
(p)-[:FRIEND*0..1]-(f)must always include the zero-length match wherefbinds to the same node asp, regardless of whether anyFRIENDedge exists. Apache AGE was treating any missing edge label as fatal for the entire MATCH.Repro (before this PR)
Root cause
Two separate label-validity gates short-circuited the MATCH when an edge label was not in the cache:
match_check_valid_label()injected aWHERE false(or volatile equivalent) at the MATCH level.path_check_valid_label()made the planner emitNULL::agtypeconstants in place of vertex bindings even when the vertex labels themselves were valid.Neither check distinguished fixed-length patterns (which legitimately require an edge of the label) from VLE patterns with a lower bound of 0 (which don't).
There was also a latent runtime bug in
is_an_edge_match(): when the user requested a label that did not exist, the InvalidOid check fell through the existing "no constraints → match all" fast-path. Once the parser-side gate was relaxed,[:NOEXIST*0..N]would have begun traversing arbitrary other-label edges. This PR fixes that path too so the zero-hop case is the only thing that matches.Fix
src/backend/parser/cypher_clause.c— new helperis_zero_lower_bound_vle()that defensively inspects thevle()FuncCall built bybuild_VLE_relation()(incypher_gram.y). Bothmatch_check_valid_label()andpath_check_valid_label()use it to skip the false-where short-circuit only for relationships that are zero-bound VLEs. Mixed patterns where another segment is genuinely unsatisfiable still short-circuit correctly.src/backend/utils/adt/age_vle.c—is_an_edge_match()returns false early when the requested label does not exist. The zero-hop self-binding is generated bybuild_VLE_zero_container()and never callsis_an_edge_match(), so this fix does not affect the zero-hop semantics.regress/sql/cypher_vle.sql— seven regression cases:[:NOEXIST*0..1]→ zero-hop only (no leakage onto other-label edges in the graph)[*0..0]→ zero-hop only[:NOEXIST*1..1]→ 0 rows (preserved)(a)-[:NOEXIST*0..1]-(b)-[:STILL_MISSING]-(c)→ 0 rowsValidation
make installcheck EXTRA_TESTS="pgvector fuzzystrmatch pg_trgm": 36/37 pass on PG18. The single failure (age_upgrade) is pre-existing onmasterat54e19facand is unrelated to this change.cypher_vle(which is the suite the new tests live in) passes cleanly.EXPLAIN (VERBOSE, ANALYZE)confirms the planner no longer constant-foldsf.nametoNULL::agtypeand the query returns exactly the expected zero-hop row.Notes for reviewers
rel->varlenis aFuncCallnamed"vle"with at least 5 args before readingargs[3]; any deviation falls back to the existing short-circuit, so future parser refactors will be safe.A_Constinteger inspection follows the same direct field access (val.ival.type/val.ival.ival) used elsewhere incypher_gram.y(e.g. line 696).Closes #2382.