
[fix](mtmv) Infer null-reject from INNER JoinEdge for multi-hop outer join MV rewrite#62492

Open
seawinde wants to merge 12 commits into apache:master from seawinde:fix-mtmv-2hop-null-reject

Conversation

@seawinde
Member

What problem does this PR solve?

Issue Number: N/A

Related PR: #30374

Problem Summary:

In multi-hop LEFT JOIN transparent rewrite with materialized views (e.g., fact LEFT JOIN dim1 LEFT JOIN dim2), when the query's WHERE clause null-rejects only the outermost dimension table (e.g., WHERE dim2.col = 'value'), the rewrite fails with "Predicate compensate fail".

Root cause: the original AbstractMaterializedViewRule.containsNullRejectSlot() looked for NOT NULL evidence only in the filter predicates (queryPredicates). Before MV rewrite runs, the Nereids rewrite pipeline performs two steps:

  1. EliminateOuterJoin converts all eligible LEFT JOINs → INNER (cascading through InferJoinNotNull across multiple passes)
  2. EliminateNotNull unconditionally removes all generated NOT NULL predicates (isGeneratedIsNotNull=true)
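The cascade in the two steps above can be sketched with a toy model. This is a hypothetical simplification, not Doris code: `OuterJoinPipelineSketch`, `Edge`, and the "alias.column" slot strings are illustrative, and the real EliminateOuterJoin/InferJoinNotNull rules operate on plan trees rather than an edge list. The sketch shows that converting the first LEFT JOIN needs a second pass, and that after EliminateNotNull only the user's `r.region_name` predicate survives, so no `d.*` slot keeps filter-level NOT NULL evidence.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy model of EliminateOuterJoin + EliminateNotNull; not Doris code. */
final class OuterJoinPipelineSketch {
    enum JoinType { LEFT_OUTER, INNER }

    /** Equi-join edge leftKey = rightKey; rightTable is the nullable side while LEFT_OUTER. */
    static final class Edge {
        final String leftKey;
        final String rightKey;
        final String rightTable;
        JoinType type = JoinType.LEFT_OUTER;

        Edge(String leftKey, String rightKey, String rightTable) {
            this.leftKey = leftKey;
            this.rightKey = rightKey;
            this.rightTable = rightTable;
        }
    }

    /** fact o LEFT JOIN dim1 d ON o.store_id = d.id LEFT JOIN dim2 r ON d.id = r.store_id */
    static List<Edge> twoHopChain() {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("o.store_id", "d.id", "d"));
        edges.add(new Edge("d.id", "r.store_id", "r"));
        return edges;
    }

    /** A table is null-rejected if any of its "alias.column" slots is known NOT NULL. */
    static boolean nullRejects(String table, Set<String> notNullSlots) {
        return notNullSlots.stream().anyMatch(s -> s.startsWith(table + "."));
    }

    /**
     * Step 1: flip LEFT to INNER when the nullable table is null-rejected; each INNER
     * equi-join then generates IS NOT NULL on both keys (the InferJoinNotNull role).
     * Repeats until nothing changes; returns how many passes made progress.
     */
    static int eliminateOuterJoin(List<Edge> edges, Set<String> whereSlots, Set<String> generated) {
        int passes = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Edge e : edges) {
                Set<String> notNull = new HashSet<>(whereSlots);
                notNull.addAll(generated);
                if (e.type == JoinType.LEFT_OUTER && nullRejects(e.rightTable, notNull)) {
                    e.type = JoinType.INNER;
                    changed = true;
                }
                if (e.type == JoinType.INNER) {
                    changed |= generated.add(e.leftKey);
                    changed |= generated.add(e.rightKey);
                }
            }
            if (changed) {
                passes++;
            }
        }
        return passes;
    }

    /** Step 2: EliminateNotNull drops every generated IS NOT NULL; only the user's WHERE survives. */
    static Set<String> survivingFilterSlots(Set<String> whereSlots) {
        return new HashSet<>(whereSlots);
    }

    public static void main(String[] args) {
        List<Edge> edges = twoHopChain();
        Set<String> where = Set.of("r.region_name");
        Set<String> generated = new HashSet<>();
        int passes = eliminateOuterJoin(edges, where, generated);
        System.out.println("passes=" + passes + ", types=" + edges.get(0).type + "," + edges.get(1).type);
        System.out.println("surviving filter slots=" + survivingFilterSlots(where));
    }
}
```

In this model the dim2 edge flips on the first pass (its nullable side is hit by the WHERE clause), and the dim1 edge flips only on the second pass, once `d.id IS NOT NULL` has been generated from the dim2 join condition.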

By the time MV rewrite (exploration phase) runs, the query plan has INNER JOINs but no NOT NULL filter predicates. The only surviving predicate is the user's WHERE clause (e.g., dim2.region_name = 'West'), which proves NOT NULL only for the outermost dim2 slots, leaving the intermediate dim1 slots uncovered.

Fix: Read INNER JoinEdge conditions directly from the query HyperGraph. After EliminateOuterJoin converts LEFT→INNER, JoinEdge objects retain their INNER type and join condition expressions even though EliminateNotNull removes filter-level NOT NULL predicates. ExpressionUtils.inferNotNullSlots() extracts NOT NULL slots from these INNER join conditions, covering all intermediate join tables.
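A minimal sketch of the added loop, assuming a simplified model in which each join edge carries a join type and a list of plain equalities. The records and method names here are hypothetical stand-ins, not Doris's `JoinEdge` or `ExpressionUtils.inferNotNullSlots()`; the real helper handles general null-rejecting expressions, while this version handles only `a = b` (not the null-safe `<=>`).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical simplified model of the INNER JoinEdge loop; not the Doris classes. */
final class NullRejectCollectorSketch {
    enum JoinType { INNER_JOIN, LEFT_OUTER_JOIN }

    /** A plain equality `left = right` between two slots; null-safe `<=>` is not modeled. */
    record EqualTo(String left, String right) {}

    record JoinEdge(JoinType type, List<EqualTo> conditions) {}

    static Set<String> queryNullRejectSlots(List<JoinEdge> joinEdges, Set<String> filterNotNullSlots) {
        // Start from what the surviving WHERE predicates already prove.
        Set<String> result = new LinkedHashSet<>(filterNotNullSlots);
        for (JoinEdge edge : joinEdges) {
            if (edge.type() != JoinType.INNER_JOIN) {
                continue; // an outer join keeps NULL-extended rows, so its keys prove nothing
            }
            for (EqualTo eq : edge.conditions()) {
                // `a = b` is NULL rather than TRUE when either side is NULL, so every row
                // that survives an INNER equi-join has both keys NOT NULL.
                result.add(eq.left());
                result.add(eq.right());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<JoinEdge> edges = List.of(
                new JoinEdge(JoinType.INNER_JOIN, List.of(new EqualTo("o.store_id", "d.id"))),
                new JoinEdge(JoinType.INNER_JOIN, List.of(new EqualTo("d.id", "r.store_id"))));
        System.out.println(queryNullRejectSlots(edges, Set.of("r.region_name")));
    }
}
```

With the two INNER edges from the walkthrough and the filter slot `r.region_name`, the result is the four-slot set `{o.store_id, d.id, r.store_id, r.region_name}`.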

| File | Change Description |
|---|---|
| AbstractMaterializedViewRule.java | containsNullRejectSlot(): adds a loop over INNER JoinEdges that collects NOT NULL slots from join conditions via inferNotNullSlots. Also adds shuttleExpressionWithLineage for correct slot-level mapping. |
| NullRejectInferenceTest.java (new) | FE unit test: query = 2-hop INNER JOIN vs. view = 2-hop LEFT JOIN; verifies that predicatesCompensate succeeds. |
| outer_join_two_hop_null_reject.groovy (new) | Regression test: 3 tables, async MV with 2-hop LEFT JOIN + WHERE + aggregate rollup; verifies rewrite success and result correctness. |

2-hop example walkthrough:

Query HyperGraph (after EliminateOuterJoin):
  JoinEdge 1 (INNER): o.store_id = d.id    → {o.store_id, d.id} NOT NULL
  JoinEdge 2 (INNER): d.id = r.store_id    → {d.id, r.store_id} NOT NULL
  FilterEdge:         r.region_name = 'West' → {r.region_name} NOT NULL

queryNullRejectSlots = {o.store_id, d.id, r.store_id, r.region_name}

requireNoNullableViewSlot (view has LEFT JOINs):
  Set 1: {d.id, d.store_name} ∩ queryNullRejectSlots → {d.id} ≠ ∅ ✓
  Set 2: {r.store_id, r.region_name} ∩ queryNullRejectSlots → {r.store_id, r.region_name} ≠ ∅ ✓
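The `requireNoNullableViewSlot` check in the walkthrough can be sketched as a per-set intersection test. This assumes query and view slots have already been mapped into one namespace (the role `shuttleExpressionWithLineage` plays in this change); the class and method names are hypothetical. Running it with only the filter-derived slot `{r.region_name}` reproduces the old failure on Set 1, while the walkthrough's full `queryNullRejectSlots` passes.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the requireNoNullableViewSlot check; not the Doris implementation. */
final class ViewNullableCheckSketch {
    /**
     * Each set holds the slots of one nullable (outer-joined) side of the view. The query must
     * null-reject every such side; otherwise the NULL-extended rows stored in the view could
     * not be filtered out by compensation, and the rewrite is rejected.
     */
    static boolean allNullableSidesRejected(List<Set<String>> requireNoNullableViewSlot,
                                            Set<String> queryNullRejectSlots) {
        for (Set<String> nullableSide : requireNoNullableViewSlot) {
            if (Collections.disjoint(nullableSide, queryNullRejectSlots)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Set<String>> viewSides = List.of(
                Set.of("d.id", "d.store_name"),
                Set.of("r.store_id", "r.region_name"));
        // Before the fix: only the WHERE clause provides evidence.
        System.out.println(allNullableSidesRejected(viewSides, Set.of("r.region_name"))); // false
        // After the fix: INNER JoinEdge keys are included.
        System.out.println(allNullableSidesRejected(viewSides,
                Set.of("o.store_id", "d.id", "r.store_id", "r.region_name"))); // true
    }
}
```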

Release note

Fix multi-hop LEFT JOIN materialized view transparent rewrite failure when the WHERE clause only references the outermost dimension table.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
  • Behavior changed:

    • No.
  • Does this need documentation?

    • No.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@seawinde
Member Author

run buildall

@seawinde seawinde changed the title [fix](fe) Infer null-reject from INNER JoinEdge for multi-hop outer join MV rewrite [fix](mtmv) Infer null-reject from INNER JoinEdge for multi-hop outer join MV rewrite Apr 14, 2026
…oin MV rewrite

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: In multi-hop LEFT JOIN MV rewrite (e.g.,
fact LEFT JOIN dim1 LEFT JOIN dim2), when the query has a WHERE clause
that null-rejects the outermost table (dim2), EliminateOuterJoin
converts all LEFT JOINs to INNER. However, containsNullRejectSlot only
checked filter predicates for NOT NULL proof, which only covers the
outermost table slots. The intermediate table (dim1) slots had no
NOT NULL evidence, causing "Predicate compensate fail".

The fix reads INNER JoinEdge conditions from the query HyperGraph.
After EliminateOuterJoin converts LEFT→INNER, JoinEdge objects retain
their INNER type and join condition expressions even though
EliminateNotNull removes filter-level NOT NULL predicates.
ExpressionUtils.inferNotNullSlots extracts NOT NULL slots from these
INNER join conditions, covering all intermediate join tables.

### Release note

Fix multi-hop LEFT JOIN materialized view transparent rewrite failure
when the WHERE clause only references the outermost dimension table.

### Check List (For Author)

- Test: Unit Test (NullRejectInferenceTest) / Regression test (outer_join_two_hop_null_reject)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@seawinde seawinde force-pushed the fix-mtmv-2hop-null-reject branch from f2f6c8a to 488c34d April 14, 2026 14:45
@seawinde
Member Author

run buildall

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (20/20) 🎉
Increment coverage report
Complete coverage report

seawinde and others added 11 commits April 15, 2026 14:44
…ll-reject inference

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: The fix in containsNullRejectSlot that infers null-rejection
from INNER JoinEdge conditions enables 4 new MV rewrite success cases in the
dimension_2_left_join test:

- (i=0,j=8) and (i=2,j=8): View LEFT_OUTER(lineitem,orders) with query
  "orders LEFT JOIN lineitem WHERE l_shipdate" -> INNER. Previously, filter
  only null-rejected lineitem, missing orders. Now INNER JOIN condition
  l_orderkey=o_orderkey proves o_orderkey IS NOT NULL, null-rejecting orders.

- (i=7,j=3) and (i=9,j=3): View LEFT_OUTER(orders,lineitem) with query
  "lineitem LEFT JOIN orders WHERE o_orderdate" -> INNER. Previously, filter
  only null-rejected orders, missing lineitem. Now INNER JOIN condition
  proves l_orderkey IS NOT NULL, null-rejecting lineitem.
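Both mirrored cases reduce to the same test: a single INNER equi-join condition proves NOT NULL on both of its keys, so it covers whichever side the WHERE clause does not. The sketch below is hypothetical: it uses unqualified TPC-H column names, reads LEFT_OUTER(a,b) as "a LEFT JOIN b" (so b is the view's nullable side), lists only a few representative slots per side, and uses a `useInnerJoinEdges` flag purely to compare the old and new behavior.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the two mirrored single-hop cases; not Doris code. */
final class MirroredCaseSketch {
    // Slots proven NOT NULL by the INNER condition l_orderkey = o_orderkey.
    static final Set<String> INNER_JOIN_KEYS = Set.of("l_orderkey", "o_orderkey");

    /** True if the query proves NOT NULL on at least one slot of the view's nullable side. */
    static boolean rewriteAllowed(Set<String> viewNullableSide, Set<String> whereSlots,
                                  boolean useInnerJoinEdges) {
        Set<String> proven = new HashSet<>(whereSlots);
        if (useInnerJoinEdges) {
            proven.addAll(INNER_JOIN_KEYS); // behavior after this PR
        }
        return !Collections.disjoint(viewNullableSide, proven);
    }

    public static void main(String[] args) {
        // (i=0,j=8): view lineitem LEFT JOIN orders, so orders is nullable; WHERE filters l_shipdate.
        Set<String> ordersSide = Set.of("o_orderkey", "o_orderdate");
        System.out.println(rewriteAllowed(ordersSide, Set.of("l_shipdate"), false)); // false
        System.out.println(rewriteAllowed(ordersSide, Set.of("l_shipdate"), true));  // true
        // (i=7,j=3): view orders LEFT JOIN lineitem, so lineitem is nullable; WHERE filters o_orderdate.
        Set<String> lineitemSide = Set.of("l_orderkey", "l_shipdate");
        System.out.println(rewriteAllowed(lineitemSide, Set.of("o_orderdate"), false)); // false
        System.out.println(rewriteAllowed(lineitemSide, Set.of("o_orderdate"), true));  // true
    }
}
```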

### Release note

None

### Check List (For Author)

- Test: Regression test (dimension_2_left_join expectations updated)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ctations

### What problem does this PR solve?

Problem Summary: INNER JoinEdge null-reject inference now allows more MV rewrites to succeed.
Update success lists: i=0,2 add j=8; i=7,9 add j=3.

### Release note
None

### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tions

### What problem does this PR solve?

Problem Summary: INNER JoinEdge null-reject inference allows INNER query to rewrite with LEFT/RIGHT/FULL MV.
Update join_type else block: add success when j=1 (INNER) and i in [0,2,3] (LEFT/RIGHT/FULL MV).

### Release note
None

### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Problem Summary: Same pattern as dimension_self_conn: INNER queries now succeed against OUTER MVs.

### Release note
None

### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ow succeed

### What problem does this PR solve?

Problem Summary: All 14 INNER queries now succeed against LEFT/RIGHT/FULL OUTER MVs.
INNER JoinEdge provides null-reject on both sides, resolving all type mismatches.

### Release note
None

### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…,6,8,11]->[1,6]

### What problem does this PR solve?

Problem Summary: WHERE clause null-rejects nullable side, converting LEFT->INNER.
INNER JoinEdge then provides null-reject for FULL MV type mismatch resolution.
Remaining failures j=1,6: couldNotPulledUp predicate mismatch.

### Release note
None

### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…5,7,9,10]->[0,7]

### What problem does this PR solve?

Problem Summary: Mirror of left_join_infer_and_derive changes.
Remaining failures j=0,7: couldNotPulledUp predicate mismatch.

### Release note
None

### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Problem Summary: i=2 add j=7 success; i=8 add j=3 success.
INNER JoinEdge null-reject resolves LEFT MV type mismatch.

### Release note
None

### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Problem Summary: i=3 add j=8 success; i=7 add j=2 success.
Mirror of left_join_filter changes.

### Release note
None

### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… gain success

### What problem does this PR solve?

Problem Summary: Star topology (lineitem center -> orders, partsupp). When query has INNER JOIN
on an edge where MV has LEFT JOIN, INNER JoinEdge null-reject now resolves the type mismatch.
All 8 LEFT MV sections updated with additional successful query indices.

### Release note
None

### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… gain success

### What problem does this PR solve?

Problem Summary: Line topology (orders->lineitem->partsupp). Section 6 and 16 LEFT MVs:
after partsupp elimination, orders-lineitem INNER vs LEFT mismatch resolved by null-reject.
j=2,4,6 added to success lists.

### Release note
None

### Check List (For Author)
- Test: Regression test expectation update
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@seawinde
Member Author

run buildall

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (20/20) 🎉
Increment coverage report
Complete coverage report

@morrySnow
Contributor

/review

Contributor

@github-actions github-actions bot left a comment


Findings:

  1. Blocking: containsNullRejectSlot() now harvests null-reject evidence from JoinEdge, but only for INNER_JOIN. This rule already supports ASOF_LEFT/RIGHT_{INNER,OUTER}_JOIN, EliminateOuterJoin can convert ASOF outer joins to ASOF inner joins, and HyperGraphComparator already models that inference. As written, the same root cause remains for the parallel ASOF rewrite path.
  2. Test gap: the new end-to-end regression uses mv_rewrite_success_without_check_chosen(...), so it can pass even when the memo plan says .mv_name not chose and both executions read the base tables. That means the only new regression case does not prove the fixed rewrite is actually selected.

Checkpoint conclusions:

  • Goal of the task: Fix multi-hop outer-join MV rewrite null-reject inference after join elimination. The ordinary INNER path is addressed, but the supported ASOF path is still missing, so the goal is only partially met.
  • Scope/focus: The code change is small and focused on MV null-reject inference plus tests.
  • Concurrency/locking: No new concurrency, locking, or lifecycle risks were introduced here.
  • Config/compatibility/persistence/data writes: None involved.
  • Parallel code paths: Not fully covered; ASOF join inference is the missing sibling path.
  • Special conditions: The new join-type check is too narrow (isInnerJoin() only).
  • Test coverage: There is positive coverage for the ordinary INNER case, but no ASOF coverage, and the new regression does not assert that the MV is actually chosen.
  • Observability/performance: No obvious new observability or performance concerns in this path.

I did not run FE unit/regression suites in this review environment.

// INNER JOIN conditions guarantee NOT NULL on join-key slots.
// After EliminateOuterJoin converts LEFT→INNER, the JoinEdge objects in the HyperGraph
// retain the INNER type even though EliminateNotNull removes filter-level NOT NULL predicates.
for (JoinEdge joinEdge : queryStructInfo.getHyperGraph().getJoinEdges()) {
Contributor


containsNullRejectSlot() now relies on join-edge-derived null-reject slots, but this condition only accepts INNER_JOIN. That leaves the parallel ASOF path unfixed: this rule explicitly supports ASOF_LEFT/RIGHT_{INNER,OUTER}_JOIN, EliminateOuterJoin can convert ASOF outer joins to ASOF inner joins, and HyperGraphComparator already models that inference. In those cases EliminateNotNull can still erase the filter-level IS NOT NULL predicates, and this branch will ignore the surviving ASOF inner JoinEdge conditions. Please include isAsofInnerJoin() here (and add coverage) so the bug is fixed consistently for all supported join families.
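Following the reviewer's suggestion, the edge filter could also accept ASOF inner joins. A hypothetical sketch: the enum below is a stand-in modeled on the join type names quoted in this review, not Doris's actual `JoinType`, and the premise that ASOF inner join conditions prove NOT NULL on both sides is taken from the review.

```java
/** Hypothetical stand-in modeled on the join type names quoted in the review; not Doris's JoinType. */
final class AsofJoinTypeSketch {
    enum JoinType {
        INNER_JOIN, LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN, FULL_OUTER_JOIN,
        ASOF_LEFT_INNER_JOIN, ASOF_RIGHT_INNER_JOIN, ASOF_LEFT_OUTER_JOIN, ASOF_RIGHT_OUTER_JOIN;

        boolean isInnerJoin() {
            return this == INNER_JOIN;
        }

        boolean isAsofInnerJoin() {
            return this == ASOF_LEFT_INNER_JOIN || this == ASOF_RIGHT_INNER_JOIN;
        }
    }

    /** Edges whose conditions may contribute null-reject slots: plain INNER plus the ASOF inner family. */
    static boolean contributesNullRejectSlots(JoinType type) {
        return type.isInnerJoin() || type.isAsofInnerJoin();
    }

    public static void main(String[] args) {
        for (JoinType t : JoinType.values()) {
            System.out.println(t + " -> " + contributesNullRejectSlots(t));
        }
    }
}
```

Widening the predicate in one place keeps the plain and ASOF paths consistent; outer variants of either family still contribute nothing.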

"""

create_async_mv(db, mvName, mvSql)
mv_rewrite_success_without_check_chosen(querySql, mvName)
Contributor


This new regression is supposed to guard the end-to-end rewrite, but mv_rewrite_success_without_check_chosen(...) also passes when the memo plan says .mv_orders_2hop_null_reject not chose. In that case compare_res(querySql) can still go green because both executions read the base tables. Can we tighten this to mv_rewrite_success(...) (or otherwise assert the MV is actually chosen), so the test really exercises the fixed rewrite path?
