Restore contsel/contjoinsel for containment & key-existence operators (#2356)#2417
Open
crprashant wants to merge 1 commit intoapache:masterfrom
Open
Restore contsel/contjoinsel for containment & key-existence operators (#2356)#2417crprashant wants to merge 1 commit intoapache:masterfrom
crprashant wants to merge 1 commit intoapache:masterfrom
Conversation
…apache#2356) The containment (`@>`, `<@`, `@>>`, `<<@`) and key-existence (`?`, `?|`, `?&`) operators on `agtype` were bound to `matchingsel`/`matchingjoinsel` on the PG14+ source tree. `matchingsel` is built for pattern operators (LIKE/regex) and during planning invokes the operator's underlying function (`agtype_contains`) once per `pg_statistic` MCV. With realistic statistics targets that produces a planner-time regression that dominates simple OLTP-style point queries. Restore the lighter `contsel`/`contjoinsel` helpers used by PostgreSQL core's jsonb operators (`@>`, `<@`, `?` on jsonb), which matches upstream's long-standing precedent for the same operator family. Changes: * `sql/agtype_operators.sql`, `sql/agtype_exists.sql`: 10 operators flipped from `matchingsel`/`matchingjoinsel` to `contsel`/`contjoinsel`. * `age--1.7.0--y.y.y.sql`: appended `ALTER OPERATOR ... SET (RESTRICT, JOIN)` for all 10 operators so existing installs flip on `ALTER EXTENSION age UPDATE`. * `regress/sql/containment_selectivity.sql` (+ `expected/.out`): pin the bindings via `pg_operator`, plus a "no leaked matchingsel" aggregate guard and functional smoke for all 10 operators. The guard catches future regressions if a new operator is added without the right selectivity helper. * `regress/expected/cypher_match.out`, `regress/expected/cypher_vle.out`: refresh expected to reflect new (and better) plan shapes that the lower-selectivity helper produces — `test_enable_containment` now picks Nested Loop + Index Only Scans over a Seq Scan/Hash Join, and two `MATCH p=...` and `show_list_use_vle` queries flip row order (queries had no `ORDER BY`; result set is unchanged, only ordering). * `Makefile`: register `containment_selectivity` in `REGRESS`. Validation: * Build: clean, `-Werror`. * Regression: 36/37 tests pass under `EXTRA_TESTS="pgvector fuzzystrmatch pg_trgm"`. Only `age_upgrade` fails — pre-existing on master at 774e781 (verified by `git stash && installcheck`). * Reporter's exact methodology (LDBC-SNB-style snb_graph + pgbench on `bench_message_content`) reproduces the regression and the fix: | Metric | matchingsel | contsel | Delta | |----------------------------|-------------|---------|-------| | EXPLAIN planning time (ms) | 1.42 | 0.97 | -32% | | EXPLAIN execution time (ms)| 0.34 | 0.31 | ~equal| | pgbench TPS (8c x 30s) | 5247 | 7378 | +40.6%| Run with `default_statistics_target = 1000` to populate MCV lists, matching the reporter's analyzed-graph conditions. * Upgrade path: validated end-to-end during the benchmark — operator bindings were flipped from `matchingsel` -> `contsel` via the same `ALTER OPERATOR` statements the upgrade SQL ships, while operators remained functional throughout. Driver workflows (python/go/node/jdbc) intentionally not run: this PR only adjusts pg_operator selectivity metadata. There is no C, type, or protocol change that drivers could observe. Closes apache#2356.
9b27aec to
50beb10
Compare
Contributor
Author
|
Self-review caught a bug in the smoke test: Force-pushed amend |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Restore contsel/contjoinsel for containment & key-existence operators
Closes #2356.
Why
The containment (
@>,<@,@>>,<<@) and key-existence (?,?|,?&) operators onagtypeare bound tomatchingsel/matchingjoinsel.matchingselis intended for pattern operators (LIKE/regex) — during planning it invokes the operator's underlying function (agtype_contains) once perpg_statisticMCV entry. With realistic statistics targets that produces a planner-time regression that dominates simple OLTP-style point queries.PostgreSQL core binds jsonb's analogous operators (
@>,<@,?onjsonb) tocontsel/contjoinselfor exactly this reason. This PR restores that precedent foragtype.What
sql/agtype_operators.sql,sql/agtype_exists.sql: 10 operators flipped tocontsel/contjoinsel.age--1.7.0--y.y.y.sql: appendedALTER OPERATOR ... SET (RESTRICT, JOIN)for all 10 operators so existing installs flip onALTER EXTENSION age UPDATE.regress/sql/containment_selectivity.sql: new test that pins the bindings viapg_operator, plus a no-leaked-matchingsel aggregate guard and functional smoke for all 10 operators.regress/expected/cypher_match.out,regress/expected/cypher_vle.out: refresh expected —test_enable_containmentnow picks Nested Loop + Index Only Scans over Seq Scan/Hash Join (a direct, better plan), and twoMATCH p=...andshow_list_use_vlequeries flip row order (they have noORDER BY; result set unchanged).Makefile: register the new test inREGRESS.Validation
Regression suite
36/37 pass with
EXTRA_TESTS="pgvector fuzzystrmatch pg_trgm". Onlyage_upgradefails — pre-existing on master at774e781b(verified bygit stash && installcheckbaseline of 32/33 with the sameage_upgradefailure).Reporter's exact methodology — reproduced
Reporter's three scripts (
generate_graph.sql,setup_func_for_workload.sql,workload_select.sql) used unchanged. Run on PG18 withdefault_statistics_target = 1000to populate MCV lists, matching the reporter's analyzed-graph conditions:The −32% planning-time delta lines up with the reporter's "~30% of execution time spent in agtype_contains during planning" observation; the +40.6% TPS gain matches the "severe TPS drop in OLTP-style workloads" they reported.
Upgrade path
Validated end-to-end during the benchmark: operator bindings were flipped from
matchingsel→contselvia the sameALTER OPERATORstatements the upgrade SQL ships, while operators remained functional throughout.pg_operatorsnapshotsDriver workflows
Intentionally not run: this PR only adjusts
pg_operatorselectivity metadata. There is no C code, type, or wire-protocol change that python / go / node / JDBC drivers could observe.Notes for reviewers
matchingseldoes provide better estimates when good statistics exist on heavily-analyzedagtypecolumns. PostgreSQL core accepts the same trade-off for jsonb. A future improvement (out of scope here) would be a customagtype_contains_selectivitymirroringjsonb_sel; happy to file as a follow-up if there's interest.containment_selectivityregression test is intentionally minimal so the diff is loud and precise if anyone re-introducesmatchingselhere. The aggregate guard catches future operator additions that forget the right helper.