[improvement](fe) TopN lazy materialization support struct/variant nested column pruning#63736
englefly wants to merge 31 commits into
…sted column pruning

Core changes:
- LazyMaterializeTopN: find projects below TopN with PreferPushDownProject or variant subColPath expressions, simplify them to expose base columns as lazy candidates. Pull up expressions above Materialize for lazy mat.
- LazySlotPruning: extract shouldPruneChild() for safe override when logical properties are stale after plan restructuring
- OperativeColumnDerive: skip PreferPushDownProject input slots from operative propagation so struct/variant columns can be lazy
- PhysicalLazyMaterialize: propagate access paths to lazy output slots
- MaterializationNode: use materializeTupleDescriptor for nested column info display (was outputTupleDesc, never set)
- PlanNode: add subColLables to printNestedColumns for variant

Tests cover struct (struct_element + pruned type) and variant (element_at + sub path) lazy materialization with nested column pruning.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
/review
run buildall
run buildall
I found one blocking correctness issue that should be fixed before merge.
Critical checkpoint conclusions:
- Goal/test coverage: The PR aims to enable TopN lazy materialization for nested/PPD expressions. It adds regression coverage for several `id, nested_expr` shapes, but misses output-order permutations where the pulled-up expression appears before other selected columns.
- Scope/focus: The feature is generally focused, but the final TopN projection reconstruction changes user-visible column order in the new pulled-up expression path.
- Concurrency/lifecycle: No new shared mutable concurrency or non-obvious lifecycle issue found in the reviewed FE paths.
- Config/compatibility: No new config or incompatible storage/protocol format change found.
- Parallel paths: Normal and `using_index` TopN lazy materialization cases were reviewed; the reported issue is specific to the new nested/PPD pull-up path.
- Special checks/error handling: No ignored Status or FE exception-boundary issue found in the actual PR diff.
- Tests/results: Regression tests are deterministic, but they do not cover pulled-up nested expressions in non-tail output positions.
- Observability/performance/transactions: No additional observability, performance, transaction, or persistence blocker found for this PR.
User focus: No additional user-provided review focus was specified.
TPC-H: Total hot run time: 32544 ms
TPC-DS: Total hot run time: 172873 ms
FE Regression Coverage Report: Increment line coverage
…rojection

The final PhysicalProject above Materialize appended all pulled-up expressions at the end regardless of their original position in userVisibleOutput. When a nested expression (struct_element, element_at, etc.) was not the last SELECT column, the result schema was swapped.

Fix by building outputExprs in userVisibleOutput order, replacing each pulled-up slot in-place with its corresponding expression via pulledUpExprMap, so output columns match the SQL SELECT order.

Add regression tests (struct/variant/map nested expr before id).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
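The ordering fix described in this commit can be sketched in a few lines. This is a toy model, not the Doris code: slots and expressions are plain strings, and the two function names are hypothetical. Only `userVisibleOutput`, `pulledUpExprMap`, and `outputExprs` come from the commit message.

```python
from typing import Dict, List


def build_output_exprs_appending(user_visible_output: List[str],
                                 pulled_up_expr_map: Dict[str, str]) -> List[str]:
    """Old behaviour as described: pulled-up expressions are appended last."""
    passthrough = [s for s in user_visible_output if s not in pulled_up_expr_map]
    pulled = [pulled_up_expr_map[s] for s in user_visible_output
              if s in pulled_up_expr_map]
    return passthrough + pulled


def build_output_exprs_in_order(user_visible_output: List[str],
                                pulled_up_expr_map: Dict[str, str]) -> List[str]:
    """Fixed behaviour: replace each pulled-up slot in place, keeping SELECT order."""
    return [pulled_up_expr_map.get(slot, slot) for slot in user_visible_output]


# SELECT struct_element(s, 'name') AS name, id ... ORDER BY id LIMIT n
user_visible_output = ["name", "id"]
pulled_up_expr_map = {"name": "struct_element(s, 'name') AS name"}

swapped = build_output_exprs_appending(user_visible_output, pulled_up_expr_map)
ordered = build_output_exprs_in_order(user_visible_output, pulled_up_expr_map)
```

With the nested expression in the first SELECT position, the appending variant yields `id` first, while the in-place variant keeps the expression first.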
run buildall
/review
TPC-H: Total hot run time: 31456 ms
TPC-DS: Total hot run time: 172620 ms
Summary:
This PR is not ready as-is. I found one additional test issue, and the already-open inline thread about final projection ordering still appears applicable at the current head: the final projection still appends pulled-up expressions after non-pulled slots, so nested expressions before later selected columns can return columns in the wrong order. I did not duplicate that existing inline comment.
Critical checkpoint conclusions:
- Goal and proof: The goal is to extend TopN lazy materialization to nested/PPD expressions and verify nested pruning. The implementation partially targets that, but current evidence is insufficient because the added column-order `qt_` checks are missing expected output, and the existing projection-order correctness issue remains.
- Scope and clarity: The change is focused on TopN lazy materialization and explain/test coverage, but the plan restructuring path is complex and must preserve output ordering exactly.
- Concurrency: No new shared mutable concurrent state or FE locking path was introduced in the reviewed changes.
- Lifecycle/static initialization: No new special lifecycle or cross-TU/static initialization concerns found.
- Configuration: No new configuration item was added. Existing session variables are used.
- Compatibility: No storage format or FE-BE protocol incompatibility found in the reviewed diff.
- Parallel code paths: The change targets the Nereids TopN lazy-materialization path; I did not find another directly equivalent modified path requiring the same patch.
- Conditional checks: The added shape checks skip unsupported plan shapes; the risky final-output condition is already covered by the existing review thread.
- Test coverage: Added regression coverage exists, but the new result blocks are incomplete in the `.out` file, so the suite will fail or not validate the intended cases correctly.
- Test results: The `.out` file is missing expected results for the newly added column-order `qt_` labels.
- Observability: Existing explain output is extended for nested columns; no additional runtime observability requirement found.
- Transaction/persistence/data writes: Not applicable to this optimizer/post-processing change.
- FE-BE variable passing: No new transmitted variable requiring scattered send-path updates found.
- Performance: No obvious hot-path performance regression beyond the existing complexity of the post-processing rewrite.
- User focus: No additional user-provided review focus was specified.
run buildall
run buildall
run buildall
/review
Automated review summary:
Requesting changes for one new blocking regression-test issue. I read the existing inline review threads first and did not re-submit the already-known topn_lazy_nested_column_pruning output mismatch or the earlier column-order issue.
Critical checkpoint conclusions:
- Goal/test proof: the PR aims to expand TopN lazy materialization by pulling projection expressions above TopN. The implementation is accompanied by regression coverage, but one new suite cannot pass because its expected-result file is missing.
- Scope/focus: the code changes are mostly focused on Nereids TopN expression pull-up and lazy materialization.
- Concurrency/lifecycle: no new concurrent state, locking path, or special lifecycle concern was found in the reviewed changes.
- Configuration: a new experimental session variable is added and forwarded; no dynamic config issue found.
- Compatibility/persistence: no storage format, EditLog, or FE-BE thrift compatibility issue found beyond existing lazy materialization protocol usage.
- Parallel code paths: Nereids logical rewrite and post-physical lazy materialization paths were both considered; no additional parallel-path blocker found.
- Test coverage/results: blocking issue below; also, the existing thread already covers missing labels in `topn_lazy_nested_column_pruning.out`.
- Observability/performance/data correctness: no additional blocker found in these areas during review.
User focus: no additional user-provided review focus was specified.
TPC-H: Total hot run time: 32411 ms
TPC-DS: Total hot run time: 171814 ms
TPC-H: Total hot run time: 31458 ms
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Add a focused unit test for the minimal TopN pull-up pattern where an upper Project forwards a slot produced by a lower Project. The test documents that the forwarding Project must pass through the expression input slot after the lower Project removes the pulled-up expression.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- mvn -f fe/pom.xml test -pl fe-core -am -Dmaven.build.cache.enabled=false -Dtest=PullUpProjectExprUnderTopNTest#testPullUpThroughForwardedSlotFromLowerProject -DfailIfNoTests=false -DskipITs -Dcheckstyle.skip -Dspotless.check.skip=true
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Add an explanatory comment for simplifyProject with the forwarded-slot pattern it handles after TopN expression pull-up.
### Release note
None
### Check List (For Author)
- Test: Manual test
- cd fe && mvn checkstyle:check -pl fe-core -DskipTests
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: PullUpProjectExprUnderTopN could restore a pulled-up expression above TopN without preserving the expression's input slots through the TopN. When combined with TopN pushdown through Project and Join, required base columns such as k2 could be pruned from the join child output while an upper Project still referenced them, producing an invalid plan. This change keeps pulled-up expression inputs as pass-through slots and makes TopN-through-Project-Join pushdown preserve Project input slots on the pushed side.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- mvn -f fe/pom.xml test -pl fe-core -am -Dmaven.build.cache.enabled=false -Dtest=PullUpProjectExprUnderTopNTest,PushDownTopNThroughJoinTest -DfailIfNoTests=false -DskipITs -Dcheckstyle.skip -Dspotless.check.skip=true
- cd fe && mvn checkstyle:check -pl fe-core -DskipTests
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Fix several TopN lazy materialization and TopN expression pull-up regressions found by review10. Lazy TopN no longer used OperativeColumnDerive, so MOR/AGG/PRIMARY key columns could be lazily materialized incorrectly; the key-column requirement is now collected in LazyMaterializeTopN and removed from OperativeColumnDerive. TopN through Join now preserves join condition slots when pushing TopN to one side. TopN expression pull-up now avoids score() and variant-slot expressions, and resolves aliases that reference unavailable pulled-up slots. The topn_expr_pullup expected output is regenerated for the added join cases.
### Release note
None
### Check List (For Author)
- Test: Regression test / Unit Test
- ./run-regression-test.sh --run -d query_p0/topn_lazy -s topn_lazy_on_data_model
- ./run-regression-test.sh --run -f regression-test/suites/variant_p0/rqg/rqg5.sql
- ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s null_column_pruning
- ./run-regression-test.sh --run -d inverted_index_p0 -s test_bm25_score
- ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s topn_expr_pullup
- ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PullUpProjectExprUnderTopNTest
- cd fe && mvn checkstyle:check -pl fe-core
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
/review
run buildall
I found a blocking correctness regression in the latest TopN expression pull-up changes. The rule can now expose internal pass-through base slots as columns of the upper project, so a query that originally returns N columns can be rewritten to return N+M columns.
Critical checkpoints:
- Goal/test: the PR aims to pull eligible project expressions above TopN while preserving query semantics and enabling lazy materialization. Current code does not fully accomplish this because the restored project can change output arity; existing tests even assert the extra output in some cases, so coverage needs correction.
- Scope/focus: the change is focused, but the pass-through-slot fix leaks implementation slots into user-visible output.
- Concurrency/lifecycle/config: no relevant concurrency, lifecycle, or config concerns found in the reviewed FE optimizer paths.
- Compatibility/persistence/transactions/data writes: no storage-format, EditLog, transaction, or data-write compatibility concerns found.
- Parallel paths: the issue is specific to the new `PullUpProjectExprUnderTopN` upper-project reconstruction; previously reported LazyMaterializeTopN output-order and ordinal-mapping issues are separate known threads and were not duplicated.
- Test coverage: tests need to assert the rewritten root output still matches the original query output count/order, and regression output should cover this.
- Observability/performance: no additional observability requirement; performance benefits are secondary until semantic preservation is fixed.
User focus: no additional user-provided review focus was specified.
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: PullUpProjectExprUnderTopN restored pulled expressions above TopN, but also appended the input slots used only to compute those expressions into the restored upper Project output. For a root Project such as [x = a + 1, b, id], this exposed internal pass-through slot a and changed the user-visible schema to [x, a, b, id]. The fix keeps required input slots available below TopN while restoring only the original TopN output columns in the upper Project.
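The arity fix can be illustrated with a small model of the `[x = a + 1, b, id]` example from the summary above. This is a sketch under simplifying assumptions: expressions are `(output name, expression text, input slots)` tuples, and `split_around_topn` is a hypothetical helper, not a function in the rule.

```python
from typing import List, Tuple

# (output name, expression text, base slots the expression reads)
NamedExpr = Tuple[str, str, List[str]]


def split_around_topn(root_project: List[NamedExpr]):
    """Return (slots kept below TopN, upper Project output) for a root Project."""
    below_topn: List[str] = []
    upper_project: List[str] = []
    for name, expr, inputs in root_project:
        # The upper Project restores exactly one output per original column.
        is_plain_slot = expr == name
        upper_project.append(name if is_plain_slot else f"{expr} AS {name}")
        # Input slots stay available below TopN but are not exposed above it.
        for slot in inputs:
            if slot not in below_topn:
                below_topn.append(slot)
    return below_topn, upper_project


root = [("x", "a + 1", ["a"]), ("b", "b", ["b"]), ("id", "id", ["id"])]
below, upper = split_around_topn(root)
```

In this model slot `a` survives below TopN so that `a + 1` can still be computed, while the upper Project keeps the original three columns.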
### Release note
None
### Check List (For Author)
- Test: Unit Test
- ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PullUpProjectExprUnderTopNTest
- cd fe && mvn checkstyle:check -pl fe-core
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
run buildall
/review
Review summary:
I found one additional blocking issue beyond the already-known TopN output-shape/order threads: the new logical TopN expression pull-up can move volatile/unique functions across TopN.
Critical checkpoint conclusions:
- Goal/test: the PR aims to expand TopN lazy materialization. There are tests for many deterministic expression cases, but no volatile-expression negative test for the new pull-up rule.
- Scope/focus: the implementation is broad but mostly focused on TopN lazy materialization. The issue is in the core new rewrite rule and affects default-on behavior.
- Concurrency/lifecycle/config: no new concurrency or lifecycle blocker found. The new session variable is forwardable and gated, but default-on behavior increases the impact of the rewrite bug.
- Compatibility/persistence/transactions: no storage-format, EditLog, or transaction persistence changes found.
- Parallel paths: volatile-expression safeguards should match other Nereids rewrite rules that avoid moving `containsVolatileExpression()` expressions.
- Tests/results: regression `.out` files are now present for the earlier missing-output concerns, but coverage is still missing for the issue in this comment.
- Observability/performance: no additional observability blocker found. Performance motivation is clear, but correctness must take precedence.
User focus: no additional user-provided review focus was supplied.
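A minimal sketch of the kind of guard this review asks for, assuming a toy expression tree. The `Expr` class, the `VOLATILE_FUNCTIONS` set, and both helper names are illustrative assumptions; only `containsVolatileExpression()` is named in the review. The concern is that an expression evaluated above TopN runs on a different set of rows than one evaluated below it, which is observable for volatile functions.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative set only; the real list of volatile functions lives in the engine.
VOLATILE_FUNCTIONS = {"random", "uuid"}


@dataclass
class Expr:
    name: str
    children: List["Expr"] = field(default_factory=list)


def contains_volatile(expr: Expr) -> bool:
    """True if the expression or any descendant is a volatile function."""
    if expr.name in VOLATILE_FUNCTIONS:
        return True
    return any(contains_volatile(child) for child in expr.children)


def pullable_above_topn(exprs: List[Expr]) -> List[Expr]:
    """Keep only expressions that are safe to move across TopN in this model."""
    return [e for e in exprs if not contains_volatile(e)]


deterministic = Expr("element_at", [Expr("payload"), Expr("'name'")])
volatile = Expr("add", [Expr("id"), Expr("random")])
candidates = pullable_above_topn([deterministic, volatile])
```

A negative regression test for the rule could follow the same shape: a projection containing a volatile call below TopN should stay where it is.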
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: TopN lazy materialization can wrap the original physical olap scan with PhysicalLazyMaterializeOlapScan. The topn-filter target relation may be recorded with the wrapper while the legacy OlapScanNode is generated from the inner scan, or vice versa, so relation-key lookup can miss the target and the legacy scan is not registered for TOPN OPT. Physical result sinks also expose final select-list output expressions, which can make select * mark every scan output as an operative slot if OperativeColumnDerive is run on the physical sink tree. This change skips PhysicalResultSink output expressions when deriving operative slots and makes TopnFilterContext translate lazy materialize olap scan targets through the inner scan as well.
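The lookup miss can be pictured with a toy registry keyed by relation id. Everything here is hypothetical (class names, `relation_id`, `find_legacy_scan`), and it covers only one direction of the mismatch: the target is recorded as the wrapper while the legacy scan is registered under the inner scan.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class OlapScan:
    relation_id: int


@dataclass(frozen=True)
class LazyMaterializeOlapScan:
    relation_id: int
    inner: OlapScan


def candidate_keys(scan) -> List[int]:
    """Keys to try: the scan itself, then the wrapped inner scan if present."""
    keys = [scan.relation_id]
    inner = getattr(scan, "inner", None)
    if inner is not None:
        keys.append(inner.relation_id)
    return keys


def find_legacy_scan(target, legacy_scans: Dict[int, str]) -> Optional[str]:
    """Resolve a TopN-filter target to a registered legacy scan, if any."""
    for key in candidate_keys(target):
        if key in legacy_scans:
            return legacy_scans[key]
    return None


inner = OlapScan(relation_id=7)
wrapper = LazyMaterializeOlapScan(relation_id=42, inner=inner)
legacy_scans = {7: "OlapScanNode#7"}  # registered from the inner scan

direct_hit = legacy_scans.get(wrapper.relation_id)   # wrapper key alone misses
resolved = find_legacy_scan(wrapper, legacy_scans)   # falls back to the inner scan
```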
### Release note
None
### Check List (For Author)
- Test: Unit Test
- ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.OperativeColumnDeriveTest,org.apache.doris.nereids.postprocess.TopNRuntimeFilterTest
- cd fe && mvn checkstyle:check -pl fe-core
- Manual test: explain select * from orders order by o_orderkey limit 10 shows TOPN OPT:1 on local cluster
- Regression test attempted: ./run-regression-test.sh --run -d nereids_tpch_p0/tpch -s topn-filter; explain checks passed but qt_complexTopn failed because current cluster returned empty results
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
FE UT Coverage Report: Increment line coverage
TPC-H: Total hot run time: 29408 ms
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: TopN lazy materialization should allow payload sub-path expressions such as substring(element_at(payload, 'name'), 1) to be pulled above TopN so the scan can keep nested column pruning and defer the expression to VMaterializeNode. A previous conservative guard rejected every expression containing a VARIANT slot, which forced payload.name to be computed in the scan final projections and broke topn_lazy_nested_column_pruning. Remove that broad VARIANT guard while keeping the existing non-movable function and score() protections.
### Release note
None
### Check List (For Author)
- Test: Regression test / Unit Test
- ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PullUpProjectExprUnderTopNTest
- cd fe && mvn checkstyle:check -pl fe-core
- ./build.sh --fe
- ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s topn_lazy_nested_column_pruning
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TPC-H: Total hot run time: 29058 ms
TPC-DS: Total hot run time: 169813 ms
TPC-DS: Total hot run time: 169019 ms
FE Regression Coverage Report: Increment line coverage
…ubquery

When a correlated scalar subquery appears in a LEFT OUTER JOIN ON clause, a nested TopN is created by LIMIT pushdown. The inner TopN's collector used fresh blockedExprIds that did not inherit outer join condition slots, allowing expressions like element_at(var) to be pulled up from projects below the inner TopN. This triggered a cascade where downstream projects were simplified, intermediate Joins were rebuilt, and eventually the aggregate result slot (AVG) from the correlated subquery was lost during project reconstruction, crashing addUpperProject.

Fix:
1. (root cause) CollectorContext now tracks outer blockedExprIds for nested TopNs via collectFromNode. The inner TopN's visitLogicalTopN merges them so join condition slots correctly block pull-up.
2. (safety net) addUpperProject handles missing slots gracefully by passing them through directly instead of crashing.

Add regression test testCorrelatedSubqueryWithNestedTopN.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
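The root-cause fix can be sketched with a toy plan walker. This is a simplified model, not the collector itself: the `Plan` class, its `slots` field, and the `inherit` flag are assumptions made for illustration. A Project's `slots` stand for its computed output slots, a TopN's for its order keys, and a Join's for its condition slots.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Plan:
    kind: str                              # "topn", "join" or "project"
    slots: FrozenSet[str] = frozenset()    # order keys, join-condition slots, or computed outputs
    children: List["Plan"] = field(default_factory=list)


def collect_pullable(node: Plan, outer_blocked: FrozenSet[str] = frozenset(),
                     inherit: bool = True) -> List[str]:
    """Return computed Project outputs that no enclosing operator blocks."""
    if node.kind == "project":
        return sorted(node.slots - outer_blocked)
    if node.kind == "topn":
        # inherit=False models the bug: a nested TopN starts from a fresh set.
        base = outer_blocked if inherit else frozenset()
        blocked = base | node.slots
    else:  # join: its condition slots must stay available below it
        blocked = outer_blocked | node.slots
    found: List[str] = []
    for child in node.children:
        found.extend(collect_pullable(child, blocked, inherit))
    return found


plan = Plan("topn", frozenset({"id"}), [
    Plan("join", frozenset({"v_name"}), [
        Plan("topn", frozenset({"k"}), [
            Plan("project", frozenset({"v_name", "len"})),
        ]),
    ]),
])

with_fresh_set = collect_pullable(plan, inherit=False)
with_inherited_set = collect_pullable(plan, inherit=True)
```

With a fresh set at the inner TopN, `v_name` is reported as pullable even though the outer join condition reads it; merging the outer set keeps it blocked.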
run buildall
/review
run buildall
run buildall
/review
FE UT Coverage Report: Increment line coverage
Extend TopN lazy materialization to defer reading complex-type base columns
(struct, variant, map, array) until after TopN filtering, and expand the
scope to all non-trivial projection expressions.
Core Changes
PullUpProjectExprUnderTopN (new CustomRewriter):
TopNs, then walks into descendants (through Join/Filter) to find Projects
with pull-able expressions. Replacer simplifies found Projects bottom-up
and adds upper Projects to restore pulled-up expressions.
not referenced by TopN order keys, no NoneMovableFunction.
that reference pulled-up outputs (cleared/removed).
LazyMaterializeTopN (simplified):
CustomRewriter, eliminating hard-coded `MERGE_SORT→Distribute→LOCAL_SORT→Project` shape walking. Now only handles MaterializeNode insertion.
OperativeColumnDerive:
complex-type base columns can be lazy.
Other:
- PhysicalLazyMaterialize: propagate access paths to lazy output slots for nested column/subPath pruning on BE.
- MaterializationNode/PlanNode: fix nested column display in EXPLAIN.
- NoneMovableFunction: fix missing interface name.
- `enable_topn_expr_pullup` for rollback.

Tests:
- topn_expr_pullup: 15 test cases covering struct/variant/map/array, non-PPD expressions, joins, column order preservation, negative cases.
- topn_lazy_nested_column_pruning: 17 test cases for struct/variant nested pruning + map/array lazy mat + multi-level variant nesting.