[fix](be) Fix TopN runtime filter activation #63969

Open
BiteTheDDDDt wants to merge 2 commits into apache:master from BiteTheDDDDt:fix-topn-runtime-filter-activation

@BiteTheDDDDt BiteTheDDDDt commented Jun 1, 2026

What problem does this PR solve?

Problem Summary: #59088 changed TopN runtime predicate target initialization to rely on a storage column id. For targets that cannot create a storage column predicate, such as non-pushdown TopN predicates or unsupported storage columns, init_target returned before marking the target as detected. That left RuntimePredicate disabled, so the scan side ignored the TopN source even though FE had sent the source id. This PR keeps the target detected when no storage predicate is created, removes obsolete compatibility skips for missing runtime predicate descs, and adds FE/BE coverage for the source marking and no-column target paths.
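The early return described above can be modeled with a small sketch. Everything here is a hypothetical simplification: `RuntimePredicateSketch` and its two `init_target_*` variants are illustrative names rather than the Doris classes, and the rule that the predicate is enabled only when both source and target are detected is an assumption made for the example.

```cpp
// Hypothetical, heavily simplified stand-in for the BE runtime predicate.
// Only the "target detected" bookkeeping discussed in this PR is modeled.
class RuntimePredicateSketch {
public:
    // The source side (the TopN sort node) announces itself.
    void set_detected_source() { _detected_source = true; }

    // Behavior described for the regression: a target without a usable
    // storage column id returns before it is marked as detected.
    void init_target_before_fix(int column_id) {
        if (column_id < 0) {
            return; // target never marked, so the predicate stays disabled
        }
        _detected_target = true;
        _has_storage_predicate = true;
    }

    // Behavior described for this PR: the target is still marked as
    // detected; only the storage-column predicate is skipped.
    void init_target_after_fix(int column_id) {
        _detected_target = true;
        if (column_id < 0) {
            return;
        }
        _has_storage_predicate = true;
    }

    // Assumption: the predicate is usable only when both sides are detected.
    bool enable() const { return _detected_source && _detected_target; }
    bool has_storage_predicate() const { return _has_storage_predicate; }

private:
    bool _detected_source = false;
    bool _detected_target = false;
    bool _has_storage_predicate = false;
};
```

With a negative column id, the "before" variant leaves `enable()` false even though the source is known, while the "after" variant enables the predicate without creating a storage predicate.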

Release note

None

Check List (For Author)

  • Test: Unit Test
    • ./run-fe-ut.sh --run org.apache.doris.qe.runtime.ThriftPlansBuilderTest,org.apache.doris.qe.CoordinatorTest
    • ./run-be-ut.sh --run --filter=RuntimePredicateTest.*
    • cd fe && mvn checkstyle:check -pl fe-core -q
    • build-support/check-format.sh
  • Behavior changed: No
  • Does this need documentation: No

BiteTheDDDDt and others added 2 commits June 1, 2026 19:43
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: TopN runtime filters record their source node ids on the target scan node thrift, but the coordinator only marked OlapScan-related source SortNodes as runtime-predicate sources before serializing the plan. For non-Olap scans such as external file scans, the source SortNode could keep a non-runtime-predicate sort algorithm for larger limits, so BE received the ids but filtered them out because the runtime predicate source was never detected. This change marks every scan node's TopN filter source SortNode consistently in both coordinator thrift paths.

### Release note

Fix TopN runtime filter activation for non-Olap scan targets.

### Check List (For Author)

- Test: Unit Test
    - ./run-fe-ut.sh --run org.apache.doris.qe.runtime.ThriftPlansBuilderTest,org.apache.doris.qe.CoordinatorTest#testTopnFilterDescsSharedAmongInstances
    - ./build.sh --fe
- Behavior changed: Yes. TopN runtime filters on non-Olap scan targets now activate consistently with their serialized source node ids.
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#59088

Problem Summary: apache#59088 changed TopN runtime predicate target initialization to rely on a storage column id. For targets that cannot create a storage column predicate, such as non-pushdown TopN predicates or unsupported storage columns, init_target returned before marking the target as detected. That left RuntimePredicate disabled, so the scan side ignored the TopN source even though FE had sent the source id. This change keeps the target detected when no storage predicate is created, removes obsolete compatibility skips for missing runtime predicate descs, and adds FE/BE coverage for the source marking and no-column target paths.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - ./run-fe-ut.sh --run org.apache.doris.qe.runtime.ThriftPlansBuilderTest,org.apache.doris.qe.CoordinatorTest
    - ./run-be-ut.sh --run --filter=RuntimePredicateTest.*
    - cd fe && mvn checkstyle:check -pl fe-core -q
    - build-support/check-format.sh
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 1, 2026 13:07
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes TopN runtime predicate activation when the target cannot produce a storage-column predicate (e.g. non-pushdown TopN predicates or unsupported storage columns), ensuring the scan side still honors the TopN source id sent by FE. It also broadens FE-side “has runtime predicate” marking beyond OlapScanNode and adds FE/BE unit tests to cover the relevant paths.

Changes:

  • BE: mark TopN runtime predicate target as detected even when column_id < 0 (no storage predicate created), so RuntimePredicate can be enabled.
  • FE: mark TopN runtime predicate sources for all scan nodes (not just OlapScanNode) during plan-to-thrift.
  • Tests: add FE unit tests for source marking and BE unit tests for “no-column target” enablement.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file

| File | Description |
| --- | --- |
| fe/fe-core/src/test/java/org/apache/doris/qe/runtime/ThriftPlansBuilderTest.java | Adds a unit test ensuring runtime predicate marking occurs for non-Olap scan nodes. |
| fe/fe-core/src/test/java/org/apache/doris/qe/CoordinatorTest.java | Adds a unit test ensuring fragment thrift conversion marks TopN filter sources for non-Olap scans. |
| fe/fe-core/src/main/java/org/apache/doris/qe/runtime/ThriftPlansBuilder.java | Removes OlapScanNode restriction and exposes helper for test coverage. |
| fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java | Removes OlapScanNode restriction when marking TopN runtime predicate presence. |
| be/test/runtime/runtime_predicate_test.cpp | Adds BE unit tests for init-target behavior with/without a valid column id. |
| be/src/runtime/runtime_predicate.cpp | Ensures targets are detected even when no column predicate can be created. |
| be/src/exec/operator/scan_operator.h | Removes legacy skip for missing runtime predicate descs when reading TopN sources. |
| be/src/exec/operator/scan_operator.cpp | Removes legacy skip for missing runtime predicate descs when initializing targets. |

Comment on lines 58 to 64
Status RuntimePredicate::init_target(
        int32_t target_node_id, phmap::flat_hash_map<int, SlotDescriptor*> slot_id_to_slot_desc,
        const int column_id) {
    if (column_id < 0) {
        _detected_target = true;
        return Status::OK();
    }
Comment on lines 258 to 262
std::vector<int> get_topn_filter_source_node_ids(RuntimeState* state, bool push_down) {
    std::vector<int> result;
    for (int id : _parent->cast<typename Derived::Parent>()._topn_filter_source_node_ids) {
        if (!state->get_query_ctx()->has_runtime_predicate(id)) {
            // compatible with older versions fe
            continue;
        }

        const auto& pred = state->get_query_ctx()->get_runtime_predicate(id);
        if (!pred.enable()) {
Comment on lines 1252 to 1255
for (auto id : _topn_filter_source_node_ids) {
    if (!state->get_query_ctx()->has_runtime_predicate(id)) {
        // compatible with older versions fe
        continue;
    }

    int cid = -1;
    if (state->get_query_ctx()->get_runtime_predicate(id).target_is_slot(node_id())) {
        auto s = _slot_id_to_slot_desc[state->get_query_ctx()
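
To illustrate what the `has_runtime_predicate(id)` guard in the snippets above protects against, here is a minimal sketch. `QueryContextSketch`, `PredicateSketch`, and `usable_source_ids` are hypothetical names, and the lookup throwing on a missing id is an assumption of this example, not a statement about how the real `get_runtime_predicate` behaves.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical per-source predicate state.
struct PredicateSketch {
    bool enabled;
};

// Hypothetical query context that owns one predicate per source node id.
class QueryContextSketch {
public:
    void add(int source_node_id, bool enabled) {
        _predicates[source_node_id] = PredicateSketch{enabled};
    }

    bool has_runtime_predicate(int id) const { return _predicates.count(id) > 0; }

    // In this sketch a missing id throws std::out_of_range.
    const PredicateSketch& get_runtime_predicate(int id) const { return _predicates.at(id); }

private:
    std::unordered_map<int, PredicateSketch> _predicates;
};

// Keeps only ids that have a registered and enabled predicate. Ids sent by the
// plan but never registered (the "missing desc" case) are skipped instead of
// reaching the throwing lookup.
std::vector<int> usable_source_ids(const QueryContextSketch& ctx, const std::vector<int>& ids) {
    std::vector<int> result;
    for (int id : ids) {
        if (!ctx.has_runtime_predicate(id)) {
            continue; // guard: no predicate registered for this source id
        }
        if (!ctx.get_runtime_predicate(id).enabled) {
            continue;
        }
        result.push_back(id);
    }
    return result;
}
```

Dropping the first check makes the result depend on every id in the list having been registered beforehand, which is the compatibility question raised about removing the guard.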
@BiteTheDDDDt
Contributor Author

run buildall

@BiteTheDDDDt
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment

Reviewed the full PR against the Doris code-review checklist. I found one additional distinct issue and also confirmed the existing review threads are still blocking.

Critical checkpoint conclusions:

  • Goal and proof: The PR aims to activate TopN runtime predicates for non-Olap/no-column-predicate cases. It partially achieves this, with FE and BE unit coverage, but still misses a slot-target no-storage-predicate case for VARBINARY.
  • Scope and clarity: The FE changes are small and focused; the BE change is small but incomplete around unsupported storage-predicate columns.
  • Concurrency: RuntimePredicate is shared between sort/source update and scan/target reads. Existing inline threads already cover the unlocked _detected_target write in init_target(), which remains a blocker.
  • Lifecycle/static initialization: No new static/global lifecycle hazards found.
  • Configuration/compatibility: No new configs. Existing inline threads already cover removal of mixed-version/missing-desc compatibility guards before get_runtime_predicate(), which remains a blocker.
  • Parallel paths: Both Coordinator and ThriftPlansBuilder paths are updated for non-Olap source marking. The BE fallback path is not consistently applied to VARBINARY slot targets.
  • Tests: Unit tests were added, but they do not cover the VARBINARY slot path or missing-desc compatibility.
  • Observability/performance/transactions/persistence: No new concerns found for this PR.

User focus: No additional user-provided review focus was specified.

Because the PR still has blocking correctness/concurrency concerns, including the existing already-raised threads plus the additional VARBINARY fallback issue below, I am requesting changes.
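
One possible shape for addressing the unlocked `_detected_target` write mentioned under Concurrency is sketched below. This is not the author's fix: `LockedRuntimePredicateSketch` is a hypothetical class, and the use of `std::mutex` is an assumption, since the lock type used by the real `RuntimePredicate` is not shown in this thread.

```cpp
#include <mutex>

// Hypothetical sketch: every read and write of the detection flags goes
// through the same mutex, including the early-return path for column_id < 0.
class LockedRuntimePredicateSketch {
public:
    void set_detected_source() {
        std::lock_guard<std::mutex> guard(_lock);
        _detected_source = true;
    }

    void init_target(int column_id) {
        std::lock_guard<std::mutex> guard(_lock);
        _detected_target = true; // written under the lock on both paths
        if (column_id < 0) {
            return; // no storage predicate, but the target is still detected
        }
        _storage_column_id = column_id;
    }

    bool enable() const {
        std::lock_guard<std::mutex> guard(_lock);
        return _detected_source && _detected_target;
    }

    int storage_column_id() const {
        std::lock_guard<std::mutex> guard(_lock);
        return _storage_column_id;
    }

private:
    mutable std::mutex _lock;
    bool _detected_source = false;
    bool _detected_target = false;
    int _storage_column_id = -1;
};
```

Taking the lock before the `column_id < 0` branch keeps the fallback path consistent with the path that creates a storage predicate.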

        continue;
    }

    int cid = -1;
Contributor

The new column_id < 0 fallback does not cover slot targets whose storage predicate is skipped for TYPE_VARBINARY. In this loop cid starts at -1, but the later VARBINARY branch still skips the iteration (continue) before init_target() is called, so _detected_target remains false and get_topn_filter_source_node_ids() filters the runtime predicate out. RuntimePredicate::_init() and VTopNPred both support varbinary comparisons, so this branch should fall back to init_target(node_id(), ..., -1) instead of leaving the TopN runtime filter disabled for ORDER BY varbinary_col LIMIT ....
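
The suggested fallback can be sketched as follows. All names are hypothetical (`ColumnTypeSketch`, `TargetSketch`, and both `init_slot_target_*` helpers), and the real loop in scan_operator.cpp carries more state than is modeled here.

```cpp
// Hypothetical column types; only VARBINARY matters for this sketch.
enum class ColumnTypeSketch { INT, STRING, VARBINARY };

// Hypothetical target state: whether it was detected and which storage
// column id, if any, it was initialized with (-1 means none).
struct TargetSketch {
    bool detected = false;
    int column_id = -1;

    void init_target(int cid) {
        detected = true;
        column_id = cid;
    }
};

// Models the behavior criticized in the review: the VARBINARY branch leaves
// the loop iteration early, so init_target() is never reached.
void init_slot_target_skipping(TargetSketch& target, ColumnTypeSketch type, int cid) {
    if (type == ColumnTypeSketch::VARBINARY) {
        return; // stands in for `continue` in the real loop
    }
    target.init_target(cid);
}

// Models the suggested change: VARBINARY falls back to column id -1, so the
// target is still detected without a storage predicate.
void init_slot_target_with_fallback(TargetSketch& target, ColumnTypeSketch type, int cid) {
    const int effective_cid = (type == ColumnTypeSketch::VARBINARY) ? -1 : cid;
    target.init_target(effective_cid);
}
```

In the first helper a VARBINARY target stays undetected. In the second it is detected with no storage column, which matches the `column_id < 0` path this PR adds to `init_target`.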

@hello-stephen
Contributor

TPC-H: Total hot run time: 29433 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 38a8f546ca3208baedebe951897213adc20f8bf3, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17611	4018	4098	4018
q2	q3	10749	1424	805	805
q4	4685	483	350	350
q5	7549	966	587	587
q6	179	175	136	136
q7	788	849	656	656
q8	9411	1606	1575	1575
q9	5856	4616	4502	4502
q10	6670	1801	1563	1563
q11	445	276	252	252
q12	631	424	289	289
q13	18120	3373	2778	2778
q14	265	255	239	239
q15	q16	814	779	709	709
q17	1012	932	1037	932
q18	6993	5752	5581	5581
q19	1286	1357	1114	1114
q20	526	395	281	281
q21	6285	2822	2731	2731
q22	471	383	335	335
Total cold run time: 100346 ms
Total hot run time: 29433 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5134	4773	4824	4773
q2	q3	4863	5331	4698	4698
q4	2100	2195	1388	1388
q5	4891	4875	4688	4688
q6	232	173	127	127
q7	1887	1794	1608	1608
q8	2408	2125	2108	2108
q9	7950	7789	7422	7422
q10	4771	4692	4212	4212
q11	534	429	365	365
q12	735	735	529	529
q13	3046	3378	2841	2841
q14	265	278	253	253
q15	q16	680	691	604	604
q17	1297	1255	1258	1255
q18	7213	6902	6875	6875
q19	1095	1120	1135	1120
q20	2302	2232	1930	1930
q21	5274	4710	4388	4388
q22	532	455	399	399
Total cold run time: 57209 ms
Total hot run time: 51583 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 171009 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 38a8f546ca3208baedebe951897213adc20f8bf3, data reload: false

query5	4352	653	507	507
query6	344	217	200	200
query7	4226	545	299	299
query8	348	240	221	221
query9	8791	4112	4081	4081
query10	450	358	306	306
query11	5808	2362	2200	2200
query12	183	135	126	126
query13	1265	613	438	438
query14	6118	5459	5194	5194
query14_1	4442	4390	4419	4390
query15	213	207	184	184
query16	1008	465	424	424
query17	1115	715	581	581
query18	2449	497	352	352
query19	220	206	162	162
query20	143	132	131	131
query21	225	135	113	113
query22	13752	13591	13451	13451
query23	17456	16496	16291	16291
query23_1	16364	16382	16268	16268
query24	7566	1771	1312	1312
query24_1	1324	1297	1313	1297
query25	538	501	420	420
query26	1308	312	174	174
query27	2715	552	344	344
query28	4447	2031	2038	2031
query29	1046	647	505	505
query30	314	227	193	193
query31	1153	1076	962	962
query32	89	77	76	76
query33	552	361	298	298
query34	1204	1109	638	638
query35	786	802	710	710
query36	1379	1458	1246	1246
query37	159	104	89	89
query38	3227	3186	3076	3076
query39	945	928	907	907
query39_1	876	918	857	857
query40	233	150	127	127
query41	66	62	63	62
query42	114	111	113	111
query43	334	341	288	288
query44	
query45	212	207	195	195
query46	1053	1208	739	739
query47	2303	2333	2336	2333
query48	401	406	311	311
query49	652	497	387	387
query50	1002	348	260	260
query51	4309	4457	4310	4310
query52	109	108	95	95
query53	255	301	205	205
query54	320	285	286	285
query55	95	92	86	86
query56	312	302	309	302
query57	1453	1448	1364	1364
query58	314	283	279	279
query59	1594	1665	1475	1475
query60	331	313	320	313
query61	163	162	179	162
query62	713	670	598	598
query63	251	215	217	215
query64	2492	849	695	695
query65	
query66	1742	503	382	382
query67	29660	29764	29649	29649
query68	
query69	477	351	310	310
query70	1014	975	983	975
query71	324	291	290	290
query72	3209	3142	2402	2402
query73	855	778	447	447
query74	5117	4925	4788	4788
query75	2720	2614	2273	2273
query76	2262	1198	796	796
query77	401	410	334	334
query78	12430	12453	11995	11995
query79	1488	1060	739	739
query80	652	533	462	462
query81	449	283	241	241
query82	1397	165	120	120
query83	353	274	251	251
query84	265	140	114	114
query85	882	580	458	458
query86	399	349	324	324
query87	3429	3370	3265	3265
query88	3682	2787	2758	2758
query89	449	396	347	347
query90	2035	186	175	175
query91	182	166	139	139
query92	84	77	78	77
query93	1468	1527	924	924
query94	537	308	324	308
query95	683	386	437	386
query96	1013	761	345	345
query97	2737	2722	2634	2634
query98	247	234	245	234
query99	1152	1187	1037	1037
Total cold run time: 254719 ms
Total hot run time: 171009 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 54.13% (21071/38925) |
| Line Coverage | 37.63% (199599/530429) |
| Region Coverage | 33.92% (156468/461248) |
| Branch Coverage | 34.86% (67968/194961) |

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 50.00% (2/4) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 72.13% (27492/38114) |
| Line Coverage | 55.52% (293723/529052) |
| Region Coverage | 52.61% (244991/465675) |
| Branch Coverage | 53.82% (105336/195711) |
