[fix](mtmv) Infer null-reject from INNER JoinEdge for multi-hop outer join MV rewrite#62492
seawinde wants to merge 12 commits into apache:master from
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
…oin MV rewrite
### What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
In multi-hop LEFT JOIN MV rewrite (e.g., fact LEFT JOIN dim1 LEFT JOIN dim2), when the query has a WHERE clause that null-rejects the outermost table (dim2), EliminateOuterJoin converts all LEFT JOINs to INNER. However, containsNullRejectSlot checked only filter predicates for NOT NULL proof, and those cover only the outermost table's slots. The intermediate table (dim1) slots had no NOT NULL evidence, causing "Predicate compensate fail".
The fix reads INNER JoinEdge conditions from the query HyperGraph. After EliminateOuterJoin converts LEFT→INNER, JoinEdge objects retain their INNER type and join condition expressions even though EliminateNotNull removes filter-level NOT NULL predicates. ExpressionUtils.inferNotNullSlots extracts NOT NULL slots from these INNER join conditions, covering all intermediate join tables.
### Release note
Fix multi-hop LEFT JOIN materialized view transparent rewrite failure when the WHERE clause only references the outermost dimension table.
### Check List (For Author)
- Test: Unit Test (NullRejectInferenceTest) / Regression test (outer_join_two_hop_null_reject)
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
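The following minimal sketch makes the failure mode concrete under simplifying assumptions. Null-rejection is tracked per table rather than per slot. `NullRejectSketch`, `Edge`, `rewrite`, and `withInnerEdges` are hypothetical names, not Doris classes.

The sketch does three things:
- It replays the cascading LEFT→INNER conversion driven by the WHERE clause on dim2.
- It drops the generated NOT NULL filters the way EliminateNotNull does.
- It compares a filter-only check with a check that also reads the INNER join edges.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical table-level model of the 2-hop case; not the Nereids HyperGraph API.
final class NullRejectSketch {
    enum EdgeType { LEFT_OUTER, INNER }

    static final class Edge {
        final String left;   // preserved side of a LEFT_OUTER edge
        final String right;  // nullable side of a LEFT_OUTER edge
        EdgeType type;

        Edge(String left, String right, EdgeType type) {
            this.left = left;
            this.right = right;
            this.type = type;
        }
    }

    /**
     * Simulates repeated EliminateOuterJoin + InferJoinNotNull passes, then EliminateNotNull.
     * Returns the tables that are still null-rejected by surviving filter predicates.
     */
    static Set<String> rewrite(List<Edge> edges, Set<String> userFilterTables) {
        Set<String> nullRejected = new HashSet<>(userFilterTables);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Edge e : edges) {
                if (e.type == EdgeType.LEFT_OUTER && nullRejected.contains(e.right)) {
                    e.type = EdgeType.INNER; // nullable side is null-rejected: LEFT -> INNER
                    changed = true;
                }
                if (e.type == EdgeType.INNER) {
                    // InferJoinNotNull: generated IS NOT NULL on both join keys
                    changed |= nullRejected.add(e.left);
                    changed |= nullRejected.add(e.right);
                }
            }
        }
        // EliminateNotNull: generated predicates are removed, only the user's WHERE survives
        return new HashSet<>(userFilterTables);
    }

    /** The idea of the fix: both sides of every INNER join edge also count as null-rejected. */
    static Set<String> withInnerEdges(List<Edge> edges, Set<String> filterTables) {
        Set<String> result = new HashSet<>(filterTables);
        for (Edge e : edges) {
            if (e.type == EdgeType.INNER) {
                result.add(e.left);
                result.add(e.right);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // fact LEFT JOIN dim1 LEFT JOIN dim2 WHERE dim2.region_name = 'West'
        List<Edge> edges = List.of(
                new Edge("fact", "dim1", EdgeType.LEFT_OUTER),
                new Edge("dim1", "dim2", EdgeType.LEFT_OUTER));
        Set<String> filterTables = rewrite(edges, Set.of("dim2"));
        Set<String> mvNullableTables = Set.of("dim1", "dim2"); // nullable sides of the LEFT JOIN MV

        System.out.println("filter only: " + filterTables.containsAll(mvNullableTables));
        System.out.println("with INNER edges: " + withInnerEdges(edges, filterTables).containsAll(mvNullableTables));
    }
}
```

Running `main` prints `filter only: false` and `with INNER edges: true`. The real rule works on slots mapped through lineage, which this table-level model deliberately ignores.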
Force-pushed from f2f6c8a to 488c34d
run buildall
FE Regression Coverage Report: Increment line coverage
…ll-reject inference
### What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
The fix in containsNullRejectSlot that infers null-rejection from INNER JoinEdge conditions enables 4 new MV rewrite success cases in the dimension_2_left_join test:
- (i=0,j=8) and (i=2,j=8): View LEFT_OUTER(lineitem,orders) with query "orders LEFT JOIN lineitem WHERE l_shipdate" -> INNER. Previously, the filter null-rejected only lineitem, missing orders. Now the INNER JOIN condition l_orderkey=o_orderkey proves o_orderkey IS NOT NULL, null-rejecting orders.
- (i=7,j=3) and (i=9,j=3): View LEFT_OUTER(orders,lineitem) with query "lineitem LEFT JOIN orders WHERE o_orderdate" -> INNER. Previously, the filter null-rejected only orders, missing lineitem. Now the INNER JOIN condition proves l_orderkey IS NOT NULL, null-rejecting lineitem.
### Release note
None
### Check List (For Author)
- Test: Regression test (dimension_2_left_join expectations updated)
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
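A small, hypothetical expression model can illustrate the step "the INNER JOIN condition l_orderkey=o_orderkey proves o_orderkey IS NOT NULL". The `Expr` records and this `inferNotNullSlots` are written for illustration only. They are not the signature or implementation of Doris's ExpressionUtils.inferNotNullSlots, and the model covers only `=`, AND and OR over bare slots and literals.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical mini expression tree; not the Nereids Expression classes.
final class NotNullInference {
    interface Expr {}
    record Slot(String name) implements Expr {}
    record Literal(Object value) implements Expr {}
    record Eq(Expr left, Expr right) implements Expr {}
    record And(Expr left, Expr right) implements Expr {}
    record Or(Expr left, Expr right) implements Expr {}

    /** Slots that must be non-null for the predicate to evaluate to TRUE. */
    static Set<String> inferNotNullSlots(Expr e) {
        if (e instanceof Eq eq) {
            // NULL = x yields NULL (never TRUE), so every slot operand of '=' is NOT NULL
            Set<String> slots = new HashSet<>(slotOf(eq.left()));
            slots.addAll(slotOf(eq.right()));
            return slots;
        }
        if (e instanceof And conj) {
            // both conjuncts must be TRUE: union of their NOT NULL slots
            Set<String> slots = new HashSet<>(inferNotNullSlots(conj.left()));
            slots.addAll(inferNotNullSlots(conj.right()));
            return slots;
        }
        if (e instanceof Or disj) {
            // only one disjunct needs to be TRUE: keep only slots implied by both
            Set<String> slots = new HashSet<>(inferNotNullSlots(disj.left()));
            slots.retainAll(inferNotNullSlots(disj.right()));
            return slots;
        }
        return Set.of();
    }

    // Operands in this model are bare slots or literals.
    private static Set<String> slotOf(Expr operand) {
        return operand instanceof Slot s ? Set.of(s.name()) : Set.of();
    }

    public static void main(String[] args) {
        // The INNER JOIN condition from the commit message above
        Expr joinCond = new Eq(new Slot("l_orderkey"), new Slot("o_orderkey"));
        System.out.println(inferNotNullSlots(joinCond)); // both keys; HashSet order is unspecified
        Expr disjunction = new Or(new Eq(new Slot("a"), new Literal(1)), new Eq(new Slot("b"), new Literal(2)));
        System.out.println(inferNotNullSlots(disjunction)); // []
    }
}
```

A null-safe comparison such as `<=>` would not belong in the `Eq` branch, because it evaluates to TRUE when both operands are NULL.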
…ctations
### What problem does this PR solve?
Problem Summary: INNER JoinEdge null-reject inference now allows more MV rewrites to succeed. Update success lists: i=0,2 add j=8; i=7,9 add j=3.
### Release note
None
### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tions
### What problem does this PR solve?
Problem Summary: INNER JoinEdge null-reject inference allows an INNER query to be rewritten with a LEFT/RIGHT/FULL MV. Update join_type else block: add success when j=1 (INNER) and i in [0,2,3] (LEFT/RIGHT/FULL MV).
### Release note
None
### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Problem Summary: Same pattern as dimension_self_conn: INNER query now succeeds with OUTER MV.
### Release note
None
### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ow succeed
### What problem does this PR solve?
Problem Summary: All 14 INNER queries now succeed against LEFT/RIGHT/FULL OUTER MVs. INNER JoinEdge provides null-reject on both sides, resolving all type mismatches.
### Release note
None
### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
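The "null-reject on both sides" argument can be sketched as a coverage check. The sketch assumes that an INNER query can reuse an OUTER MV only if the query null-rejects every nullable side of the MV's join. `MvJoinCompensation` and its methods are hypothetical; the table names come from the commit messages above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the join-type mismatch check; names are illustrative only.
final class MvJoinCompensation {
    enum MvJoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /** Sides of the MV join that may contain NULL-extended rows. */
    static Set<String> nullableSides(MvJoinType type, String left, String right) {
        return switch (type) {
            case INNER -> Set.of();
            case LEFT_OUTER -> Set.of(right);
            case RIGHT_OUTER -> Set.of(left);
            // HashSet tolerates left == right (a self join); Set.of would throw on duplicates
            case FULL_OUTER -> new HashSet<>(List.of(left, right));
        };
    }

    /** Assumed rule: every nullable MV side must be null-rejected by the INNER query. */
    static boolean canRewriteInnerQuery(MvJoinType mvType, String left, String right, Set<String> nullRejected) {
        return nullRejected.containsAll(nullableSides(mvType, left, right));
    }

    public static void main(String[] args) {
        Set<String> filterOnly = Set.of("lineitem");               // WHERE on lineitem only
        Set<String> withInnerEdge = Set.of("lineitem", "orders");  // plus l_orderkey = o_orderkey
        for (MvJoinType t : MvJoinType.values()) {
            System.out.println(t + ": filter only=" + canRewriteInnerQuery(t, "lineitem", "orders", filterOnly)
                    + ", with INNER edge=" + canRewriteInnerQuery(t, "lineitem", "orders", withInnerEdge));
        }
    }
}
```

With only the lineitem filter, the LEFT_OUTER(lineitem, orders) and FULL_OUTER cases fail. Adding both sides of the INNER edge makes every MV join type pass, which is the pattern these expectation updates describe.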
…,6,8,11]->[1,6]
### What problem does this PR solve?
Problem Summary: WHERE clause null-rejects the nullable side, converting LEFT->INNER. INNER JoinEdge then provides null-reject for FULL MV type mismatch resolution. Remaining failures j=1,6: couldNotPulledUp predicate mismatch.
### Release note
None
### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…5,7,9,10]->[0,7]
### What problem does this PR solve?
Problem Summary: Mirror of left_join_infer_and_derive changes. Remaining failures j=0,7: couldNotPulledUp predicate mismatch.
### Release note
None
### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Problem Summary: i=2 add j=7 success; i=8 add j=3 success. INNER JoinEdge null-reject resolves LEFT MV type mismatch.
### Release note
None
### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Problem Summary: i=3 add j=8 success; i=7 add j=2 success. Mirror of left_join_filter changes.
### Release note
None
### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… gain success
### What problem does this PR solve?
Problem Summary: Star topology (lineitem center -> orders, partsupp). When the query has an INNER JOIN on an edge where the MV has a LEFT JOIN, INNER JoinEdge null-reject now resolves the type mismatch. All 8 LEFT MV sections updated with additional successful query indices.
### Release note
None
### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… gain success
### What problem does this PR solve?
Problem Summary: Line topology (orders->lineitem->partsupp). Section 6 and 16 LEFT MVs: after partsupp elimination, the orders-lineitem INNER vs LEFT mismatch is resolved by null-reject. j=2,4,6 added to success lists.
### Release note
None
### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
run buildall
FE Regression Coverage Report: Increment line coverage
/review
Findings:
- Blocking: containsNullRejectSlot() now harvests null-reject evidence from JoinEdge, but only for INNER_JOIN. This rule already supports ASOF_LEFT/RIGHT_{INNER,OUTER}_JOIN, EliminateOuterJoin can convert ASOF outer joins to ASOF inner joins, and HyperGraphComparator already models that inference. As written, the same root cause remains for the parallel ASOF rewrite path.
- Test gap: the new end-to-end regression uses mv_rewrite_success_without_check_chosen(...), so it can pass even when the memo plan says .mv_name not chose and both executions read the base tables. That means the only new regression case does not prove that the fixed rewrite is actually selected.
Checkpoint conclusions:
- Goal of the task: Fix multi-hop outer-join MV rewrite null-reject inference after join elimination. The ordinary INNER path is addressed, but the supported ASOF path is still missing, so the goal is only partially met.
- Scope/focus: The code change is small and focused on MV null-reject inference plus tests.
- Concurrency/locking: No new concurrency, locking, or lifecycle risks were introduced here.
- Config/compatibility/persistence/data writes: None involved.
- Parallel code paths: Not fully covered; ASOF join inference is the missing sibling path.
- Special conditions: The new join-type check is too narrow (isInnerJoin() only).
- Test coverage: There is positive coverage for the ordinary INNER case, but no ASOF coverage, and the new regression does not assert that the MV is actually chosen.
- Observability/performance: No obvious new observability or performance concerns in this path.
I did not run FE unit/regression suites in this review environment.
// INNER JOIN conditions guarantee NOT NULL on join-key slots.
// After EliminateOuterJoin converts LEFT→INNER, the JoinEdge objects in the HyperGraph
// retain the INNER type even though EliminateNotNull removes filter-level NOT NULL predicates.
for (JoinEdge joinEdge : queryStructInfo.getHyperGraph().getJoinEdges()) {
containsNullRejectSlot() now relies on join-edge-derived null-reject slots, but this condition only accepts INNER_JOIN. That leaves the parallel ASOF path unfixed: this rule explicitly supports ASOF_LEFT/RIGHT_{INNER,OUTER}_JOIN, EliminateOuterJoin can convert ASOF outer joins to ASOF inner joins, and HyperGraphComparator already models that inference. In those cases EliminateNotNull can still erase the filter-level IS NOT NULL predicates, and this branch will ignore the surviving ASOF inner JoinEdge conditions. Please include isAsofInnerJoin() here (and add coverage) so the bug is fixed consistently for all supported join families.
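The following sketch shows the widened condition the reviewer asks for. It uses a hypothetical `JoinType` enum that mirrors only the join-type names quoted in the comment. This is not Doris's real enum, and `isAsofInnerJoin()` is defined locally here. The assumption, per the review, is that an ASOF inner join also discards unmatched rows, so its join keys are provably NOT NULL just as with a plain INNER join.

```java
// Hypothetical join-type enum mirroring only the names used in the review; not Doris's JoinType.
final class JoinEdgeFilter {
    enum JoinType {
        INNER_JOIN, LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN, FULL_OUTER_JOIN,
        ASOF_LEFT_INNER_JOIN, ASOF_RIGHT_INNER_JOIN, ASOF_LEFT_OUTER_JOIN, ASOF_RIGHT_OUTER_JOIN;

        boolean isInnerJoin() {
            return this == INNER_JOIN;
        }

        boolean isAsofInnerJoin() {
            return this == ASOF_LEFT_INNER_JOIN || this == ASOF_RIGHT_INNER_JOIN;
        }
    }

    /** Check as written in the PR: only plain INNER edges contribute NOT NULL slots. */
    static boolean contributesCurrent(JoinType type) {
        return type.isInnerJoin();
    }

    /** Check suggested in the review: ASOF inner edges contribute NOT NULL slots as well. */
    static boolean contributesSuggested(JoinType type) {
        return type.isInnerJoin() || type.isAsofInnerJoin();
    }

    public static void main(String[] args) {
        for (JoinType t : JoinType.values()) {
            System.out.println(t + " current=" + contributesCurrent(t) + " suggested=" + contributesSuggested(t));
        }
    }
}
```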
| """ | ||
|
|
||
| create_async_mv(db, mvName, mvSql) | ||
| mv_rewrite_success_without_check_chosen(querySql, mvName) |
This new regression is supposed to guard the end-to-end rewrite, but mv_rewrite_success_without_check_chosen(...) also passes when the memo plan says .mv_orders_2hop_null_reject not chose. In that case compare_res(querySql) can still go green because both executions read the base tables. Can we tighten this to mv_rewrite_success(...) (or otherwise assert the MV is actually chosen), so the test really exercises the fixed rewrite path?
What problem does this PR solve?
Issue Number: N/A
Related PR: #30374
Problem Summary:
In multi-hop LEFT JOIN materialized view transparent rewrite (e.g., fact LEFT JOIN dim1 LEFT JOIN dim2), when the query has a WHERE clause that null-rejects only the outermost dimension table (e.g., WHERE dim2.col = 'value'), the MV rewrite fails with "Predicate compensate fail".
Root cause: In AbstractMaterializedViewRule.containsNullRejectSlot(), the original code checked only filter predicates (queryPredicates) for NOT NULL evidence. After the Nereids rewrite pipeline runs:
1. EliminateOuterJoin converts all eligible LEFT JOINs to INNER (cascading through InferJoinNotNull across multiple passes).
2. EliminateNotNull unconditionally removes all generated NOT NULL predicates (isGeneratedIsNotNull=true).
By the time MV rewrite (exploration phase) runs, the query plan has INNER JOINs but zero NOT NULL filter predicates. The only surviving predicate is the user's WHERE clause (e.g., dim2.region_name = 'West'), which can only prove NOT NULL for the outermost dim2 slots, leaving the intermediate dim1 slots uncovered.
Fix: Read INNER JoinEdge conditions directly from the query HyperGraph. After EliminateOuterJoin converts LEFT→INNER, JoinEdge objects retain their INNER type and join condition expressions even though EliminateNotNull removes filter-level NOT NULL predicates. ExpressionUtils.inferNotNullSlots() extracts NOT NULL slots from these INNER join conditions, covering all intermediate join tables.
- AbstractMaterializedViewRule.java: in containsNullRejectSlot(), add a loop over INNER JoinEdges to collect NOT NULL slots from join conditions via inferNotNullSlots. Also add shuttleExpressionWithLineage for correct slot-level mapping.
- NullRejectInferenceTest.java (new): predicatesCompensate succeeds
- outer_join_two_hop_null_reject.groovy (new)
2-hop example walkthrough:
Release note
Fix multi-hop LEFT JOIN materialized view transparent rewrite failure when the WHERE clause only references the outermost dimension table.
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merges this PR)