[feature](cloud) Add table-level event-driven warm up #63832

Open
bobhan1 wants to merge 7 commits into apache:master from bobhan1:pick-table-level-warmup

@bobhan1
Contributor

@bobhan1 bobhan1 commented May 28, 2026

What problem does this PR solve?

Issue Number: None

Problem Summary:

This PR adds table-level event-driven cloud warm-up support and improves active incremental warm-up progress observability.

Before this change, event-driven warm-up was only controlled at compute-group granularity. Once a load-event warm-up job was enabled for a source and target compute group pair, all source-side table writes could trigger warm-up to the target compute group. That is inefficient for workloads where only selected core tables, high-frequency query tables, or selected async materialized views need to stay warm.

This PR lets users define the warm-up scope with ON TABLES when creating an event-driven load warm-up job. FE persists the normalized table filter in the warm-up job, resolves matched table ids dynamically, sends the table ids to BE, and lets BE filter warm-up rowsets by table id.

User-visible behavior:

  • WARM UP ... ON TABLES supports table-level event-driven warm-up.
  • Table filters support INCLUDE and EXCLUDE rules.
  • Rules support * and ? wildcards, for example db.table, db.*, *.orders_*, and log_db.log_?.
  • INCLUDE defines the candidate warm-up scope, and EXCLUDE removes tables from that included scope.
  • Rules are canonicalized before duplicate checks, so semantically equivalent filters do not create duplicate jobs just because rule order differs.
  • Matching covers both regular OLAP tables and async materialized views.
  • Matched table ids are refreshed as tables or async materialized views are created, dropped, or renamed.
  • The same source compute group can create independent table-level warm-up jobs to different target compute groups with different table filters.
  • SHOW WARM UP JOB exposes the table-level job type, table filter, matched tables, and SyncStats.
  • SHOW WARM UP JOB list output keeps compact SyncStats, while single-job lookup keeps detailed windowed SyncStats.
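
As a rough illustration of the INCLUDE/EXCLUDE semantics listed above, the following Python sketch matches db.table names against wildcard rules. It is not the FE implementation: the helper names (glob_to_regex, rule_matches, is_warmed, canonical_filter) are hypothetical, and it assumes that each rule splits at the first dot into a database pattern and a table pattern, that matching is case-sensitive, and that canonicalization amounts to de-duplicating and sorting the rules.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a rule fragment: '*' = any run of characters, '?' = exactly one."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def rule_matches(rule: str, db: str, table: str) -> bool:
    """Assumption: a rule is '<db pattern>.<table pattern>', split at the first dot."""
    db_pat, _, tbl_pat = rule.partition(".")
    return (glob_to_regex(db_pat).fullmatch(db) is not None
            and glob_to_regex(tbl_pat).fullmatch(table) is not None)


def is_warmed(db: str, table: str, includes, excludes) -> bool:
    """INCLUDE defines the candidate scope; EXCLUDE removes tables from it."""
    included = any(rule_matches(r, db, table) for r in includes)
    excluded = any(rule_matches(r, db, table) for r in excludes)
    return included and not excluded


def canonical_filter(includes, excludes):
    """Assumed canonical form: de-duplicated, sorted rules, so rule order is irrelevant."""
    return (tuple(sorted(set(includes))), tuple(sorted(set(excludes))))


if __name__ == "__main__":
    includes = ["core_db.config", "report_db.monthly_*", "*.sales_*"]
    excludes = ["*.*_archive"]
    for db, table in [("core_db", "config"), ("shop", "sales_2024"),
                      ("shop", "sales_archive"), ("report_db", "daily_x")]:
        print(db, table, is_warmed(db, table, includes, excludes))
```

Under these assumptions a table such as shop.sales_archive is matched by an INCLUDE rule but still dropped by the EXCLUDE rule, which is the behavior the bullets describe.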

Example:

WARM UP COMPUTE GROUP query_cg WITH COMPUTE GROUP write_cg
ON TABLES (
    INCLUDE 'core_db.config',
    INCLUDE 'report_db.monthly_*',
    INCLUDE '*.sales_*',
    EXCLUDE '*.*_archive'
)
PROPERTIES (
    "sync_mode" = "event_driven",
    "sync_event" = "load"
);

Conflict and virtual compute group behavior:

  • Table-level load-event warm-up and cluster-level load-event warm-up are mutually exclusive for the same source and target compute group pair.
  • If a conflicting job already exists, creation returns an error that includes the conflicting job id; table-level conflicts also include the table filter.
  • Duplicate checks within the same job type still follow the existing duplicate-check logic.
  • VCG-managed cluster-level load-event warm-up creation does not fail on conflict. Because VCG jobs are created by the MS HTTP API path, FE cancels existing table-level load-event warm-up jobs with the same source and target first, then recreates the VCG-managed cluster-level job.
  • Manually creating a table-level load-event warm-up job is rejected only when both source and target compute groups are owned by the same VCG.
  • SQL still cannot use a virtual compute group directly as the source or target compute group.
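
The mutual-exclusion rule above can be pictured with a small Python sketch. Everything here is illustrative: LoadEventJob and find_conflict are hypothetical names, the error text is invented for the example, and the VCG-specific paths are left out.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LoadEventJob:
    job_id: int
    src: str                            # source compute group
    dst: str                            # target compute group
    table_filter: Optional[str] = None  # None means a cluster-level job


def find_conflict(existing: Iterable[LoadEventJob], new_job: LoadEventJob) -> Optional[str]:
    """Return an error message if a job of the other level exists for the same pair."""
    for job in existing:
        same_pair = (job.src, job.dst) == (new_job.src, new_job.dst)
        other_level = (job.table_filter is None) != (new_job.table_filter is None)
        if same_pair and other_level:
            msg = f"conflicts with existing warm-up job {job.job_id}"
            if job.table_filter is not None:
                # Table-level conflicts also report the table filter.
                msg += f" (table filter: {job.table_filter})"
            return msg
    return None  # same-level duplicates are left to the existing duplicate check


if __name__ == "__main__":
    jobs = [LoadEventJob(1, "write_cg", "query_cg")]
    print(find_conflict(jobs, LoadEventJob(2, "write_cg", "query_cg", "INCLUDE 'db.*'")))
    print(find_conflict(jobs, LoadEventJob(3, "write_cg", "other_cg", "INCLUDE 'db.*'")))
```

A table-level job to a different target compute group does not conflict, which matches the statement that one source can feed independent table-level jobs to different targets.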

Warm-up progress observation:

  • BE records per-job windowed requested, finished, and failed warm-up statistics.
  • BE exposes per-job warm-up statistics through /api/warmup_event_driven_stats.
  • FE aggregates BE statistics and caches the aggregated result in the warm-up job.
  • SyncStats includes source-side and target-side warm-up size/count progress across windows.
  • SyncStats includes trigger-time progress, so users can observe whether the target compute group is behind the latest source-side warm-up trigger.
  • FE /metrics exposes per-job active warm-up metadata, synchronized size, and trigger gap metrics for cloud event-driven warm-up jobs.
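
A hedged sketch of the aggregation idea, in Python. The response format of /api/warmup_event_driven_stats is not shown in this description, so the counter keys (requested, finished, failed) and the definition of the trigger gap as "latest source trigger time minus latest trigger time already synced on the target" are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_be_stats(per_be_stats: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Sum the per-BE counters reported for one warm-up job (keys are hypothetical)."""
    total: Counter = Counter()
    for stats in per_be_stats:
        total.update(stats)  # Counter.update adds values key by key
    return dict(total)


def trigger_gap_ms(latest_source_trigger_ms: int, latest_target_synced_trigger_ms: int) -> int:
    """Assumed definition: how far the target lags behind the newest source trigger."""
    return max(0, latest_source_trigger_ms - latest_target_synced_trigger_ms)


if __name__ == "__main__":
    be_reports = [
        {"requested": 3, "finished": 2, "failed": 1},
        {"requested": 5, "finished": 5},
    ]
    print(aggregate_be_stats(be_reports))
    print(trigger_gap_ms(1_000, 400))
```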

Release note

Support table-level event-driven cloud warm-up with ON TABLES filters and per-job warm-up sync statistics.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. WARM UP supports table-level ON TABLES filters for event-driven load warm-up, and warm-up job output/metrics expose table filter, matched tables, SyncStats, and trigger-gap information.
  • Does this need documentation?

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@bobhan1 bobhan1 requested a review from gavinchou as a code owner May 28, 2026 09:28
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1 bobhan1 force-pushed the pick-table-level-warmup branch from 65920e0 to b67c9f7 Compare May 28, 2026 10:37
@bobhan1
Contributor Author

bobhan1 commented May 28, 2026

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 31875 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b67c9f73c11b7a8e7fa7f2ba1eab5feb84fbd9ed, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17737	4084	4152	4084
q2	q3	10995	1468	826	826
q4	4827	487	349	349
q5	10674	2321	2093	2093
q6	389	189	140	140
q7	983	789	655	655
q8	9591	1796	1599	1599
q9	7101	5033	5078	5033
q10	6505	2244	1897	1897
q11	438	289	251	251
q12	655	436	313	313
q13	18214	3481	2843	2843
q14	270	263	244	244
q15	q16	826	780	714	714
q17	1005	888	1001	888
q18	7050	5670	6279	5670
q19	1246	1303	1071	1071
q20	515	427	278	278
q21	5995	2809	2622	2622
q22	460	370	305	305
Total cold run time: 105476 ms
Total hot run time: 31875 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5051	4825	4795	4795
q2	q3	5000	5258	4703	4703
q4	2193	2254	1423	1423
q5	4804	4653	4842	4653
q6	232	187	132	132
q7	1909	1643	1408	1408
q8	2243	1983	1963	1963
q9	7483	7489	7484	7484
q10	4779	4702	4225	4225
q11	540	383	353	353
q12	733	747	538	538
q13	3027	3347	2835	2835
q14	282	287	259	259
q15	q16	696	700	613	613
q17	1302	1265	1255	1255
q18	7405	7002	6860	6860
q19	1121	1130	1134	1130
q20	2246	2252	1949	1949
q21	5354	4648	4514	4514
q22	541	471	435	435
Total cold run time: 56941 ms
Total hot run time: 51527 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172324 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b67c9f73c11b7a8e7fa7f2ba1eab5feb84fbd9ed, data reload: false

query5	4306	703	524	524
query6	337	230	210	210
query7	4230	600	307	307
query8	320	238	224	224
query9	8814	4119	4132	4119
query10	459	349	310	310
query11	5793	2413	2228	2228
query12	181	137	126	126
query13	1283	590	459	459
query14	6132	5521	5179	5179
query14_1	4562	4546	4570	4546
query15	241	209	188	188
query16	1008	450	349	349
query17	1133	722	588	588
query18	2535	495	346	346
query19	210	199	159	159
query20	138	129	132	129
query21	217	143	115	115
query22	13648	13522	13336	13336
query23	17276	16547	16272	16272
query23_1	16401	16410	16421	16410
query24	7384	1802	1343	1343
query24_1	1345	1352	1338	1338
query25	554	487	422	422
query26	1300	325	174	174
query27	2743	546	338	338
query28	4465	2005	1988	1988
query29	1002	623	511	511
query30	314	245	201	201
query31	1140	1095	944	944
query32	88	76	74	74
query33	542	349	286	286
query34	1184	1181	666	666
query35	787	813	702	702
query36	1409	1413	1235	1235
query37	158	109	97	97
query38	3210	3183	3076	3076
query39	951	935	922	922
query39_1	883	910	886	886
query40	232	154	131	131
query41	72	69	69	69
query42	119	110	113	110
query43	345	346	305	305
query44	
query45	218	213	199	199
query46	1107	1245	772	772
query47	2385	2412	2262	2262
query48	399	422	320	320
query49	644	511	396	396
query50	1031	361	257	257
query51	4368	4296	4307	4296
query52	110	109	97	97
query53	271	287	211	211
query54	334	294	286	286
query55	98	93	86	86
query56	323	350	323	323
query57	1426	1411	1302	1302
query58	303	284	268	268
query59	1624	1709	1472	1472
query60	339	345	321	321
query61	179	176	181	176
query62	700	665	604	604
query63	252	213	213	213
query64	2482	868	695	695
query65	
query66	1705	479	362	362
query67	30155	29757	29615	29615
query68	
query69	474	342	307	307
query70	1033	1034	976	976
query71	318	273	267	267
query72	2981	2690	2346	2346
query73	891	816	427	427
query74	5122	4968	4821	4821
query75	2702	2620	2271	2271
query76	2298	1200	808	808
query77	418	419	345	345
query78	12459	12560	11900	11900
query79	1470	1068	740	740
query80	1178	548	461	461
query81	506	284	238	238
query82	1346	163	125	125
query83	350	283	252	252
query84	260	148	149	148
query85	951	539	474	474
query86	457	329	318	318
query87	3457	3395	3258	3258
query88	3597	2746	2720	2720
query89	465	385	346	346
query90	1811	193	176	176
query91	189	180	140	140
query92	82	81	72	72
query93	1521	1519	870	870
query94	641	358	339	339
query95	693	495	351	351
query96	1078	791	363	363
query97	2737	2762	2594	2594
query98	242	231	247	231
query99	1192	1173	1017	1017
Total cold run time: 255869 ms
Total hot run time: 172324 ms

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 4.47% (40/894) 🎉
Increment coverage report
Complete coverage report

bobhan1 added a commit to bobhan1/doris that referenced this pull request May 29, 2026
### What problem does this PR solve?

Issue Number: None

Related PR: apache#63832

Problem Summary: The table-level warm-up change adds a table_id argument before sync_wait_timeout_ms in CloudWarmUpManager::warm_up_rowset. After rebasing onto the latest master, the existing CloudWarmUpManagerTest calls still used the old two-argument form, so the positive-timeout test passed 1000 as table_id and left sync_wait_timeout_ms at its default -1. That made the test take the async non-positive-timeout branch, so the before-wait sync point was never reached and the spurious notify assertion failed. Update the test calls to pass table_id and sync_wait_timeout_ms explicitly.

### Release note

None

### Check List (For Author)

- Test:
    - Unit Test: ./run-be-ut.sh --run --filter=CloudWarmUpManagerTest.* -j100
- Behavior changed: No.
- Does this need documentation: No.
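
The failure mode described in this commit message can be reproduced in miniature. The Python sketch below is an analogy, not the BE code: the function body and return value are invented, and only the parameter order (table_id inserted before a defaulted sync_wait_timeout_ms) follows the description above.

```python
def warm_up_rowset(rowset_meta, table_id, sync_wait_timeout_ms=-1):
    """Stand-in for the new signature: table_id now precedes the defaulted timeout."""
    mode = "sync" if sync_wait_timeout_ms > 0 else "async"
    return {"table_id": table_id, "mode": mode}


# Stale call written for the old (rowset_meta, sync_wait_timeout_ms) form:
# 1000 silently binds to table_id and the timeout keeps its default of -1.
stale = warm_up_rowset("rs_meta", 1000)

# Fixed call: both values are passed explicitly (42 is an arbitrary table id).
fixed = warm_up_rowset("rs_meta", table_id=42, sync_wait_timeout_ms=1000)

if __name__ == "__main__":
    print(stale)
    print(fixed)
```

The stale call still runs, which is why the problem only showed up as a failed assertion on the synchronous wait path.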
@bobhan1
Contributor Author

bobhan1 commented May 29, 2026

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 31958 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ebec831e0da33fb6cc3c0b2899c553d7d928afde, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17731	4070	4024	4024
q2	q3	10846	1417	785	785
q4	4697	483	346	346
q5	7631	2285	2157	2157
q6	234	174	139	139
q7	983	797	648	648
q8	9354	1749	1810	1749
q9	5146	4985	4980	4980
q10	6409	2229	1883	1883
q11	431	266	242	242
q12	633	429	295	295
q13	18077	3440	2783	2783
q14	272	262	248	248
q15	q16	825	781	711	711
q17	1017	941	1058	941
q18	6954	5813	5632	5632
q19	1174	1139	1078	1078
q20	571	432	320	320
q21	5857	2901	2690	2690
q22	591	364	307	307
Total cold run time: 99433 ms
Total hot run time: 31958 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4755	4892	4706	4706
q2	q3	4999	5319	4662	4662
q4	2113	2209	1396	1396
q5	5006	4794	4799	4794
q6	229	178	132	132
q7	1885	1782	1559	1559
q8	2424	2191	2116	2116
q9	7958	7481	7409	7409
q10	4750	4705	4252	4252
q11	533	382	355	355
q12	722	738	515	515
q13	3026	3328	2849	2849
q14	270	279	249	249
q15	q16	668	698	616	616
q17	1286	1256	1255	1255
q18	7289	6851	6944	6851
q19	1110	1095	1097	1095
q20	2251	2298	1942	1942
q21	5293	4548	4462	4462
q22	537	449	398	398
Total cold run time: 57104 ms
Total hot run time: 51613 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172417 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ebec831e0da33fb6cc3c0b2899c553d7d928afde, data reload: false

query5	4320	672	517	517
query6	345	229	210	210
query7	4257	564	338	338
query8	328	244	225	225
query9	8808	4018	4055	4018
query10	446	361	302	302
query11	5789	2595	2239	2239
query12	180	131	126	126
query13	1291	623	467	467
query14	6094	5485	5181	5181
query14_1	4492	4504	4453	4453
query15	215	208	187	187
query16	997	503	463	463
query17	1155	765	618	618
query18	2538	513	376	376
query19	238	234	174	174
query20	141	136	134	134
query21	226	142	124	124
query22	13630	13634	13328	13328
query23	17399	16582	16309	16309
query23_1	16336	16311	16482	16311
query24	7418	1769	1304	1304
query24_1	1304	1320	1336	1320
query25	546	466	421	421
query26	1320	322	175	175
query27	2695	543	346	346
query28	4413	2001	2006	2001
query29	978	628	502	502
query30	311	231	203	203
query31	1116	1081	960	960
query32	96	75	77	75
query33	574	355	292	292
query34	1224	1133	648	648
query35	763	796	731	731
query36	1440	1448	1272	1272
query37	150	103	88	88
query38	3220	3162	3107	3107
query39	931	923	893	893
query39_1	886	870	886	870
query40	223	149	128	128
query41	66	63	62	62
query42	109	110	109	109
query43	329	332	290	290
query44	
query45	209	204	201	201
query46	1087	1204	723	723
query47	2406	2414	2315	2315
query48	404	407	299	299
query49	631	510	413	413
query50	940	351	249	249
query51	4395	4308	4309	4308
query52	105	105	95	95
query53	254	280	202	202
query54	327	275	259	259
query55	93	100	85	85
query56	303	316	304	304
query57	1443	1448	1360	1360
query58	296	271	272	271
query59	1585	1657	1441	1441
query60	322	336	315	315
query61	164	155	161	155
query62	697	654	592	592
query63	244	206	209	206
query64	2416	821	629	629
query65	
query66	1744	469	356	356
query67	29847	29813	29548	29548
query68	
query69	460	351	311	311
query70	1005	980	974	974
query71	304	279	266	266
query72	3058	2712	2444	2444
query73	898	764	422	422
query74	5102	4978	4818	4818
query75	2705	2632	2237	2237
query76	2266	1141	762	762
query77	406	416	341	341
query78	12547	12473	11857	11857
query79	1481	999	783	783
query80	636	544	464	464
query81	454	275	244	244
query82	1369	158	126	126
query83	356	280	252	252
query84	262	146	141	141
query85	957	546	462	462
query86	408	349	352	349
query87	3439	3373	3226	3226
query88	3626	2716	2706	2706
query89	441	389	349	349
query90	2059	192	179	179
query91	185	174	142	142
query92	78	79	78	78
query93	1527	1479	862	862
query94	559	376	306	306
query95	694	471	348	348
query96	1051	814	347	347
query97	2767	2762	2655	2655
query98	235	235	242	235
query99	1211	1167	1044	1044
Total cold run time: 254755 ms
Total hot run time: 172417 ms

bobhan1 added a commit to bobhan1/doris that referenced this pull request May 29, 2026
### What problem does this PR solve?

Issue Number: None

Related PR: apache#63832

Problem Summary: The table-level warm-up table filter performance tests used tight wall-clock thresholds for the 200K and 500K wildcard match-all cases. CI machines can run these scale tests slightly slower than local runs even though the matching implementation remains efficient. Relax the 200K threshold from 1s to 1.5s and the 500K threshold from 2s to 3s while keeping the existing functional assertions and smaller or more selective performance checks.

### Release note

None

### Check List (For Author)

- Test:
    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.cloud.CacheHotspotManagerTableFilterTest
- Behavior changed: No.
- Does this need documentation: No.
@bobhan1
Contributor Author

bobhan1 commented May 29, 2026

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.06% (1854/2375)
Line Coverage 64.54% (33342/51663)
Region Coverage 65.24% (16533/25343)
Branch Coverage 55.77% (8841/15854)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 77.03% (664/862) 🎉
Increment coverage report
Complete coverage report

bobhan1 added a commit to bobhan1/doris that referenced this pull request May 29, 2026
### What problem does this PR solve?

Issue Number: None

Related PR: apache#63832

Problem Summary: The table-level warm-up table filter performance test for 200K tables with 15 include/exclude rules still used a tight 2s wall-clock threshold. CI can exceed that threshold under load while the matcher remains functionally correct. Relax the threshold to 3s and keep the matched-table assertion unchanged.

### Release note

None

### Check List (For Author)

- Test:
    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.cloud.CacheHotspotManagerTableFilterTest
- Behavior changed: No.
- Does this need documentation: No.
@bobhan1
Contributor Author

bobhan1 commented May 29, 2026

run buildall

gavinchou previously approved these changes May 29, 2026
bobhan1 added 5 commits May 29, 2026 14:42
Issue Number: None

Related PR: None

Problem Summary: Add table-level event-driven warm-up support for cloud warm-up jobs. The change extends WARM UP ... ON TABLES parsing and validation, persists normalized include and exclude table filters, resolves matching table ids dynamically, prevents conflicting cluster-level and table-level load-event jobs, propagates table ids through BE warm-up requests, records per-job source and target warm-up progress metrics, and exposes compact and detailed SyncStats through SHOW WARM UP JOB and FE metrics. Virtual compute group rebuilds cancel existing table-level load-event jobs before recreating managed cluster-level jobs.

Support table-level event-driven cloud warm-up with ON TABLES filters and warm-up sync statistics.

- Test:
    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.cloud.OnTablesFilterTest,org.apache.doris.cloud.CloudWarmUpJobTableFilterTest,org.apache.doris.cloud.CacheHotspotManagerTableFilterTest,org.apache.doris.cloud.WarmUpStatsTest,org.apache.doris.cloud.WarmUpClusterOnTablesParseTest,org.apache.doris.cloud.catalog.CloudInstanceStatusCheckerTest,org.apache.doris.metric.MetricsTest#testCloudWarmUpSyncJobMetricsReadStatsDirectlyFromJob+testEventDrivenCloudWarmUpSyncJobTriggerGapMetric
    - Unit Test: ./run-be-ut.sh --run --filter=CloudWarmUpManagerFilterTest.*:MBvarWindowedAdderTest.* -j100
    - Manual test: build-support/check-format.sh
    - Manual test: ./build.sh --be --fe --cloud -j100
    - Manual test: docker build -f docker/runtime/doris-compose/Dockerfile -t bh-cluster-2 .
    - Manual test: ./run-regression-test.sh --clean --compile
    - Regression test: env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy -u ALL_PROXY -u all_proxy ./run-regression-test.sh --run -d regression-test/suites/cloud_p0/cache/multi_cluster/warm_up/on_tables -runMode=cloud -image bh-cluster-2 -dockerSuiteParallel 1 (18/19 passed; test_warm_up_event_on_tables_overlap_and_mv failed due to a duplicate MV column name in the test SQL, before the test was fixed)
    - Regression test: env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy -u ALL_PROXY -u all_proxy ./run-regression-test.sh --run -d regression-test/suites/cloud_p0/cache/multi_cluster/warm_up/on_tables -s test_warm_up_event_on_tables_overlap_and_mv -runMode=cloud -image bh-cluster-2 -dockerSuiteParallel 1
- Behavior changed: Yes. WARM UP supports ON TABLES filters for event-driven load warm-up and SHOW WARM UP JOB exposes table filter, matched tables, and sync stats.
- Does this need documentation: Yes. Documentation for the new ON TABLES syntax and metrics should be added separately.
static constexpr int WINDOW_30M = 1800;
static constexpr int WINDOW_1H = 3600;

MBvarWindowedAdder g_warmup_ed_finish_segment_num("warmup_ed_finish_segment_num", {"job_id"},
Contributor

Are there any memory issues if there are many jobs?
How does bvar implement "windows"? Does it record every sample of the adder every second?

Contributor

what does "ed" mean?

Contributor Author

I checked the bvar implementation again.

bvar::Window does not record every update written to the Adder. For bvar::Adder, the underlying sampler samples the cumulative adder value roughly once per second, and the window value is calculated from the difference between the latest sampled cumulative value and the oldest sampled cumulative value in the requested window.

The 5m/30m/1h windows created for the same Adder also share the same underlying sampler. The sampler queue is sized by the largest window, so here it keeps about 3600 + 1 samples, not 300 + 1800 + 3600 samples and not one sample per warm-up event.
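
A minimal Python sketch of that sampling model, for illustration only. This is not bvar code: the class and method names are made up, and the once-per-second sampler tick is driven by hand instead of by a background thread.

```python
from collections import deque


class WindowedAdderSketch:
    """One cumulative adder plus one shared sample queue for all of its windows."""

    def __init__(self, window_sizes_s):
        self.value = 0  # cumulative adder value
        # The queue is sized by the largest window (+1 for the baseline sample),
        # not by the sum of all windows and not by the number of add() calls.
        self.samples = deque(maxlen=max(window_sizes_s) + 1)

    def add(self, delta: int) -> None:
        self.value += delta  # updates never touch the sample queue

    def take_sample(self, now_s: int) -> None:
        """Called roughly once per second by the (here simulated) sampler."""
        self.samples.append((now_s, self.value))

    def window_value(self, window_s: int) -> int:
        """Latest cumulative sample minus the oldest one inside the window."""
        if not self.samples:
            return 0
        oldest_idx = max(0, len(self.samples) - 1 - window_s)
        return self.samples[-1][1] - self.samples[oldest_idx][1]


if __name__ == "__main__":
    adder = WindowedAdderSketch([5, 10])
    for second in range(20):
        adder.add(1)
        adder.take_sample(second)
    print(len(adder.samples), adder.window_value(5), adder.window_value(10))
```

The point of the sketch is that memory depends on the largest window length only, while the number of add() calls has no effect on the queue size.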

Rough estimate:

  • One Sample<int64_t> stores data and time_us, so it is about 16 bytes.
  • The largest window is 1h, so one sampler queue is about (3600 + 1) * 16 ~= 56KB.
  • Source-side stats have 4 windowed adders, about 4 * 56KB ~= 224KB/job for sampler queues.
  • Target-side stats have 8 windowed adders, about 8 * 56KB ~= 448KB/job for sampler queues.
  • If the same BE process observes both sides, the sampler queue storage is roughly (4 + 8) * 56KB ~= 672KB/job, plus small object/map/string overhead.
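
The same arithmetic, written out as a small Python check. The constants come from the estimate above and the function names are illustrative; the exact byte counts come out slightly above the figures derived from the rounded 56KB value.

```python
SAMPLE_BYTES = 16        # data (int64) + time_us (int64), per the estimate above
LARGEST_WINDOW_S = 3600  # the 1h window sizes the shared sampler queue


def sampler_queue_bytes(window_s: int = LARGEST_WINDOW_S) -> int:
    """Bytes held by one sampler queue: (window + 1) samples of 16 bytes each."""
    return (window_s + 1) * SAMPLE_BYTES


def per_job_bytes(num_windowed_adders: int) -> int:
    """Sampler-queue bytes for one job_id with the given number of windowed adders."""
    return num_windowed_adders * sampler_queue_bytes()


if __name__ == "__main__":
    for label, adders in (("source side", 4), ("target side", 8), ("both sides", 12)):
        print(f"{label}: {per_job_bytes(adders) / 1024:.1f} KiB per job")
```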

So this is proportional to the number of job_id dimensions seen by a BE process, not proportional to the number of rowsets/segments/events. The overall memory usage should be small for the expected number of warm-up jobs. This state is also BE-process-local memory only; it is not persisted and will be released after BE restart.

Comment thread be/src/cloud/cloud_warm_up_manager.cpp
@hello-stephen
Contributor

TPC-H: Total hot run time: 31398 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6c32a2f1e75c81c0cea00fbeaee02321c690459b, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17659	4016	3981	3981
q2	q3	10914	1484	831	831
q4	4766	487	353	353
q5	10288	2293	2166	2166
q6	390	177	137	137
q7	996	781	653	653
q8	9660	1690	1626	1626
q9	7003	4990	5022	4990
q10	6448	2248	1905	1905
q11	449	276	252	252
q12	643	431	299	299
q13	18119	3522	2807	2807
q14	271	262	247	247
q15	q16	831	795	712	712
q17	970	860	980	860
q18	6986	5859	5582	5582
q19	1191	1264	1089	1089
q20	525	407	258	258
q21	5537	2635	2337	2337
q22	439	356	313	313
Total cold run time: 104085 ms
Total hot run time: 31398 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4395	4306	4328	4306
q2	q3	4561	4958	4340	4340
q4	2080	2198	1396	1396
q5	4437	4537	5284	4537
q6	256	197	145	145
q7	1954	1833	1710	1710
q8	2539	2273	2349	2273
q9	8152	8054	7878	7878
q10	4808	4785	4302	4302
q11	589	436	383	383
q12	732	756	574	574
q13	3254	3676	2927	2927
q14	295	300	273	273
q15	q16	693	733	662	662
q17	1443	1377	1325	1325
q18	7839	7326	7402	7326
q19	1124	1119	1128	1119
q20	2231	2228	1939	1939
q21	5349	4605	4401	4401
q22	522	454	417	417
Total cold run time: 57253 ms
Total hot run time: 52233 ms

@hello-stephen
Contributor

TPC-H: Total hot run time: 31974 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a67fe9761c9259e6df78005ce6f432eda5dfba74, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17668	4062	3998	3998
q2	q3	10656	1448	860	860
q4	4696	475	349	349
q5	7708	2333	2132	2132
q6	253	177	140	140
q7	950	791	650	650
q8	9452	1724	1770	1724
q9	5479	4986	4946	4946
q10	6447	2207	1884	1884
q11	452	279	251	251
q12	676	430	295	295
q13	18166	3383	2742	2742
q14	272	257	242	242
q15	q16	848	778	721	721
q17	935	995	918	918
q18	7076	5696	5615	5615
q19	1176	1280	1205	1205
q20	568	437	293	293
q21	5890	2889	2691	2691
q22	449	365	318	318
Total cold run time: 99817 ms
Total hot run time: 31974 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4806	4745	4745	4745
q2	q3	4986	5333	4604	4604
q4	2183	2208	1380	1380
q5	4910	4658	4671	4658
q6	243	178	128	128
q7	1896	1716	1543	1543
q8	2439	2131	2126	2126
q9	7712	7325	7369	7325
q10	4759	4698	4235	4235
q11	524	388	355	355
q12	728	749	526	526
q13	2983	3411	2839	2839
q14	271	274	251	251
q15	q16	677	695	607	607
q17	1298	1257	1253	1253
q18	7295	6965	7102	6965
q19	1142	1075	1094	1075
q20	2210	2213	1937	1937
q21	5276	4555	4365	4365
q22	514	472	409	409
Total cold run time: 56852 ms
Total hot run time: 51326 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172895 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6c32a2f1e75c81c0cea00fbeaee02321c690459b, data reload: false

query5	4323	665	548	548
query6	344	223	200	200
query7	4225	567	316	316
query8	323	237	233	233
query9	8821	4154	4074	4074
query10	445	346	307	307
query11	5834	2461	2267	2267
query12	187	132	125	125
query13	1286	598	437	437
query14	6079	5489	5197	5197
query14_1	4501	4514	4473	4473
query15	217	207	192	192
query16	1050	491	476	476
query17	1171	741	631	631
query18	2610	513	367	367
query19	216	223	171	171
query20	145	137	132	132
query21	223	142	130	130
query22	13717	13654	13443	13443
query23	17323	16700	16238	16238
query23_1	16517	16416	16307	16307
query24	7530	1776	1331	1331
query24_1	1314	1346	1340	1340
query25	569	484	425	425
query26	1314	326	180	180
query27	2704	523	337	337
query28	4422	2033	2020	2020
query29	962	607	504	504
query30	309	236	204	204
query31	1126	1087	964	964
query32	87	80	74	74
query33	532	355	299	299
query34	1176	1144	653	653
query35	776	813	677	677
query36	1394	1398	1257	1257
query37	152	102	88	88
query38	3247	3199	3076	3076
query39	938	920	935	920
query39_1	897	870	892	870
query40	228	146	125	125
query41	71	65	62	62
query42	111	113	110	110
query43	335	344	308	308
query44	
query45	213	209	200	200
query46	1092	1178	756	756
query47	2351	2426	2206	2206
query48	393	438	313	313
query49	641	530	384	384
query50	964	365	249	249
query51	4364	4356	4291	4291
query52	105	108	95	95
query53	256	287	208	208
query54	308	267	251	251
query55	97	94	91	91
query56	314	314	303	303
query57	1445	1433	1341	1341
query58	309	271	281	271
query59	1568	1713	1494	1494
query60	325	324	317	317
query61	160	153	156	153
query62	714	652	576	576
query63	259	202	210	202
query64	2384	822	650	650
query65	
query66	1677	488	393	393
query67	29871	29750	29657	29657
query68	
query69	466	348	318	318
query70	1056	1023	996	996
query71	310	281	285	281
query72	3035	2688	2416	2416
query73	881	760	437	437
query74	5132	4994	4802	4802
query75	2687	2612	2285	2285
query76	2300	1148	803	803
query77	406	419	344	344
query78	12537	12457	11919	11919
query79	1421	1044	801	801
query80	656	561	461	461
query81	450	293	250	250
query82	1381	154	122	122
query83	364	282	256	256
query84	258	137	112	112
query85	911	614	554	554
query86	400	358	357	357
query87	3410	3372	3250	3250
query88	3651	2796	2803	2796
query89	455	395	351	351
query90	1940	188	205	188
query91	199	189	186	186
query92	84	82	79	79
query93	1540	1563	923	923
query94	572	383	333	333
query95	721	499	369	369
query96	1080	797	360	360
query97	2733	2744	2594	2594
query98	277	232	235	232
query99	1151	1156	1024	1024
Total cold run time: 254903 ms
Total hot run time: 172895 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 171939 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a67fe9761c9259e6df78005ce6f432eda5dfba74, data reload: false

query5	4325	653	509	509
query6	334	224	199	199
query7	4229	572	321	321
query8	322	227	218	218
query9	8786	4066	4052	4052
query10	457	356	296	296
query11	5791	2397	2207	2207
query12	176	131	127	127
query13	1271	614	456	456
query14	6127	5498	5157	5157
query14_1	4460	4467	4487	4467
query15	218	206	187	187
query16	1023	451	449	449
query17	1172	765	617	617
query18	2729	498	375	375
query19	227	206	168	168
query20	135	139	136	136
query21	216	137	120	120
query22	13706	13607	13308	13308
query23	17338	16574	16294	16294
query23_1	16488	16320	16430	16320
query24	7414	1789	1315	1315
query24_1	1352	1313	1325	1313
query25	566	478	414	414
query26	1318	319	185	185
query27	2654	533	355	355
query28	4317	1990	2001	1990
query29	981	632	487	487
query30	305	228	196	196
query31	1126	1079	952	952
query32	87	76	71	71
query33	538	345	297	297
query34	1184	1121	664	664
query35	771	800	703	703
query36	1387	1370	1239	1239
query37	158	105	95	95
query38	3195	3188	3092	3092
query39	925	918	894	894
query39_1	903	884	907	884
query40	227	151	122	122
query41	65	64	63	63
query42	109	112	112	112
query43	325	337	288	288
query44	
query45	214	199	196	196
query46	1100	1248	764	764
query47	2370	2410	2306	2306
query48	402	407	299	299
query49	641	496	398	398
query50	963	353	250	250
query51	4378	4332	4326	4326
query52	103	104	93	93
query53	259	281	202	202
query54	308	277	265	265
query55	91	89	85	85
query56	299	311	317	311
query57	1450	1425	1319	1319
query58	301	263	273	263
query59	1561	1650	1414	1414
query60	318	326	331	326
query61	157	154	156	154
query62	702	646	592	592
query63	238	201	201	201
query64	2380	785	640	640
query65	
query66	1667	494	354	354
query67	29691	29758	29418	29418
query68	
query69	471	340	303	303
query70	960	989	1003	989
query71	305	274	264	264
query72	3133	2910	2424	2424
query73	816	759	457	457
query74	5119	4979	4821	4821
query75	2708	2607	2289	2289
query76	2251	1161	802	802
query77	425	409	338	338
query78	12423	12511	11954	11954
query79	1467	1000	712	712
query80	631	551	473	473
query81	452	280	242	242
query82	1371	155	122	122
query83	360	276	245	245
query84	271	142	108	108
query85	899	538	467	467
query86	383	352	323	323
query87	3425	3354	3234	3234
query88	3653	2743	2759	2743
query89	438	400	346	346
query90	1860	175	178	175
query91	178	169	147	147
query92	84	79	69	69
query93	1533	1479	901	901
query94	553	343	266	266
query95	696	474	339	339
query96	1020	770	331	331
query97	2741	2728	2625	2625
query98	233	228	225	225
query99	1193	1152	1026	1026
Total cold run time: 253967 ms
Total hot run time: 171939 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 39.27% (161/410) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.96% (21011/38939)
Line Coverage 37.50% (199213/531219)
Region Coverage 33.74% (156056/462466)
Branch Coverage 34.77% (67952/195417)

### What problem does this PR solve?

Issue Number: None

Related PR: apache#63832

Problem Summary: The aggregated warm-up rowset failure message included the tablet id and rowset id but omitted the table id, making table-level event-driven warm-up failures harder to diagnose. Pass table_id into the aggregated failure builder and include it in the error text. Extend the helper unit tests to assert the table id is reported.
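As a rough illustration of the change described above, a failure-message builder that carries the table id alongside the tablet id and rowset id could look like the following sketch. The function name `build_warm_up_failure_message`, its parameters, and the message layout are hypothetical and are not taken from the PR's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: the name, parameters, and message layout are illustrative only.
// It shows the idea of reporting table_id together with tablet_id and rowset_id.
std::string build_warm_up_failure_message(int64_t table_id, int64_t tablet_id,
                                          const std::string& rowset_id,
                                          int failed_segments, int total_segments) {
    // Sanity check on the (assumed) segment counters.
    assert(failed_segments >= 0 && failed_segments <= total_segments);
    std::ostringstream oss;
    oss << "warm up rowset failed: table_id=" << table_id
        << ", tablet_id=" << tablet_id
        << ", rowset_id=" << rowset_id
        << ", failed_segments=" << failed_segments << "/" << total_segments;
    return oss.str();
}
```

A unit test in this style can then assert that the text contains the `table_id=` fragment, which is the kind of check the extended helper tests are described as adding.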

### Release note

None

### Check List (For Author)

- Test:
    - Unit Test: ./run-be-ut.sh --run --filter=CloudWarmUpManagerFilterTest.* -j100
- Behavior changed: No.
- Does this need documentation: No.
@bobhan1
Contributor Author

bobhan1 commented Jun 1, 2026

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.08% (1856/2377)
Line Coverage 64.54% (33361/51689)
Region Coverage 65.22% (16535/25354)
Branch Coverage 55.77% (8846/15862)

### What problem does this PR solve?

Issue Number: None

Related PR: apache#63832

Problem Summary: The 500K table-filter performance unit test can exceed the previous 3s threshold under CI load even though the matcher behavior remains correct. Relax the assertion to 4s to avoid treating small runtime variance as a test failure.
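For readers unfamiliar with this kind of test, the shape of a time-bounded matcher check can be sketched as below. This is not the FE test itself (that one is Java, in `CacheHotspotManagerTableFilterTest`). The names `glob_match`, `time_table_filter`, and `TimedFilterResult` are hypothetical, and the sketch assumes rules are matched against a whole `db.table` string, which may differ from how the real matcher splits database and table parts.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Iterative glob match: '*' matches any run of characters, '?' matches exactly one.
bool glob_match(const std::string& pattern, const std::string& text) {
    const std::size_t npos = std::string::npos;
    std::size_t p = 0, t = 0;
    std::size_t star = npos;  // position of the most recent '*' in the pattern
    std::size_t mark = 0;     // text position that '*' is currently assumed to cover up to
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;
            mark = t;
        } else if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == text[t])) {
            ++p;
            ++t;
        } else if (star != npos) {
            // Backtrack: let the last '*' absorb one more character.
            p = star + 1;
            t = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}

struct TimedFilterResult {
    std::size_t matched;
    std::chrono::milliseconds elapsed;
};

// Builds synthetic table names, then times only the matching loop.
TimedFilterResult time_table_filter(const std::string& pattern, std::size_t table_count) {
    std::vector<std::string> names;
    names.reserve(table_count);
    for (std::size_t i = 0; i < table_count; ++i) {
        names.push_back("db" + std::to_string(i % 100) + ".orders_" + std::to_string(i));
    }
    const auto start = std::chrono::steady_clock::now();
    std::size_t matched = 0;
    for (const auto& name : names) {
        if (glob_match(pattern, name)) ++matched;
    }
    const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - start);
    return {matched, elapsed};
}
```

A caller would compare `elapsed` against a fixed bound such as `std::chrono::seconds(4)`. Wall-clock bounds of this kind are sensitive to machine load, which is why a modest margin over the typical runtime tends to be less flaky than a tight one.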

### Release note

None

### Check List (For Author)

- Test:
    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.cloud.CacheHotspotManagerTableFilterTest
- Behavior changed: No.
- Does this need documentation: No.
@bobhan1
Contributor Author

bobhan1 commented Jun 1, 2026

run buildall

@bobhan1 bobhan1 requested a review from gavinchou June 1, 2026 04:18
@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.32% (1904/2431)
Line Coverage 64.75% (33957/52442)
Region Coverage 65.27% (17495/26803)
Branch Coverage 53.89% (9272/17206)

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 39.42% (162/411) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.99% (21039/38968)
Line Coverage 37.53% (199419/531295)
Region Coverage 33.78% (156144/462268)
Branch Coverage 34.80% (67952/195269)

@hello-stephen
Contributor

TPC-H: Total hot run time: 32235 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 40f2522b7ee691077b8986e09e63852b436a6d57, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17766	4192	4113	4113
q2	
q3	10833	1432	818	818
q4	4773	490	349	349
q5	9912	2380	2184	2184
q6	408	188	143	143
q7	949	802	642	642
q8	9585	1746	1642	1642
q9	7112	5017	5009	5009
q10	6492	2269	1881	1881
q11	432	272	244	244
q12	690	434	312	312
q13	18181	3472	2762	2762
q14	273	262	249	249
q15	
q16	817	797	712	712
q17	1016	988	1016	988
q18	7231	5801	6317	5801
q19	1239	1324	1129	1129
q20	538	435	267	267
q21	6229	2830	2677	2677
q22	450	381	313	313
Total cold run time: 104926 ms
Total hot run time: 32235 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4924	4912	4953	4912
q2	
q3	5034	5262	4632	4632
q4	2156	2246	1418	1418
q5	4893	4850	4700	4700
q6	237	182	142	142
q7	1874	1820	1615	1615
q8	2297	2013	2019	2013
q9	7481	7490	7487	7487
q10	4723	4713	4304	4304
q11	558	397	382	382
q12	745	747	526	526
q13	3044	3366	2736	2736
q14	275	285	257	257
q15	
q16	686	700	611	611
q17	1320	1292	1281	1281
q18	7411	6884	7139	6884
q19	1188	1148	1115	1115
q20	2249	2217	1970	1970
q21	5410	4774	4617	4617
q22	517	481	413	413
Total cold run time: 57022 ms
Total hot run time: 52015 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172315 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 40f2522b7ee691077b8986e09e63852b436a6d57, data reload: false

query5	4306	668	541	541
query6	338	225	203	203
query7	4220	577	313	313
query8	323	240	247	240
query9	8830	4106	4095	4095
query10	452	356	300	300
query11	5769	2335	2159	2159
query12	191	134	125	125
query13	1286	618	451	451
query14	6150	5569	5288	5288
query14_1	4576	4560	4533	4533
query15	217	208	186	186
query16	1045	462	475	462
query17	1163	747	636	636
query18	2787	502	374	374
query19	230	223	181	181
query20	150	136	130	130
query21	220	147	136	136
query22	13800	13603	13593	13593
query23	17411	16583	16276	16276
query23_1	16299	16522	16388	16388
query24	7627	1827	1356	1356
query24_1	1340	1353	1340	1340
query25	583	505	461	461
query26	1319	332	184	184
query27	2693	586	352	352
query28	4418	2082	2028	2028
query29	1036	669	543	543
query30	317	231	205	205
query31	1134	1096	991	991
query32	93	86	80	80
query33	573	372	316	316
query34	1180	1143	669	669
query35	814	805	722	722
query36	1387	1384	1242	1242
query37	156	108	97	97
query38	3239	3197	3100	3100
query39	924	920	899	899
query39_1	886	884	898	884
query40	225	150	130	130
query41	70	63	63	63
query42	114	112	111	111
query43	351	361	299	299
query44	
query45	213	208	200	200
query46	1109	1234	757	757
query47	2323	2384	2207	2207
query48	421	380	288	288
query49	642	499	395	395
query50	1001	359	260	260
query51	4358	4355	4302	4302
query52	107	111	99	99
query53	258	277	207	207
query54	332	272	261	261
query55	96	93	90	90
query56	302	316	317	316
query57	1433	1414	1319	1319
query58	308	281	278	278
query59	1639	1750	1463	1463
query60	326	338	323	323
query61	161	165	163	163
query62	696	650	581	581
query63	242	200	204	200
query64	2452	799	634	634
query65	
query66	1686	472	377	377
query67	29832	29788	29648	29648
query68	
query69	471	352	313	313
query70	1040	1075	1020	1020
query71	319	282	275	275
query72	3021	2721	2427	2427
query73	886	786	443	443
query74	5132	4993	4791	4791
query75	2709	2632	2284	2284
query76	2323	1181	814	814
query77	431	433	354	354
query78	12569	12536	11902	11902
query79	1537	1087	761	761
query80	825	552	453	453
query81	480	279	240	240
query82	1303	164	124	124
query83	365	292	259	259
query84	263	145	113	113
query85	926	548	468	468
query86	452	357	342	342
query87	3473	3389	3231	3231
query88	3692	2764	2786	2764
query89	457	391	342	342
query90	1792	193	187	187
query91	183	174	137	137
query92	85	78	76	76
query93	1546	1505	914	914
query94	610	356	314	314
query95	699	394	350	350
query96	1065	821	356	356
query97	2766	2740	2615	2615
query98	240	228	228	228
query99	1198	1152	1027	1027
Total cold run time: 255992 ms
Total hot run time: 172315 ms

@bobhan1
Contributor Author

bobhan1 commented Jun 1, 2026

run cloud_p0

@bobhan1
Contributor Author

bobhan1 commented Jun 1, 2026

/review

Contributor

@github-actions github-actions Bot left a comment

I found two correctness issues in the table-level event-driven warm-up flow. The new ON TABLES semantics can be bypassed during rolling upgrades because old source BEs ignore the optional table_ids thrift field and treat the job as cluster-level, and recycle-cache events currently pass table_id=0 so they bypass the same filter even on upgraded BEs.

Critical checkpoint conclusions:

  • Goal/tests: The PR implements table-filtered event-driven cloud warm-up with stats and broad unit/regression coverage, but the two paths below still violate the table filter contract.
  • Scope: The implementation is focused on warm-up filtering/stats, but it touches FE parser/job lifecycle, BE warm-up dispatch, thrift/proto, metrics, and tests.
  • Concurrency/lifecycle: Existing job daemons, BE warm-up thread pool, and BE metric globals are involved. I did not find a new lock-order deadlock, but the recycle path intentionally disables filtering.
  • Configs: New refresh/display configs are mutable and observed by the relevant daemons/display logic.
  • Compatibility: The new optional FE-to-BE table_ids field needs explicit mixed-version handling before table-level jobs can be safely created in rolling upgrades.
  • Parallel paths: warm_up_rowset applies table_id filtering, but recycle_cache does not.
  • Tests: Coverage is extensive, but it does not cover mixed-version behavior or recycle-cache filtering for unmatched tables.
  • Observability: New stats/logging are generally sufficient; existing prior comments already cover bvar memory/naming and table_id logging.
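The compatibility point can be illustrated with a small sketch. It assumes the request carries an optional list of table ids with an "is set" flag (similar in spirit to how optional thrift fields are exposed), and that a BE which predates the field simply never looks at it. All names here (`WarmUpEventRequest`, `table_ids_set`, `BackendInfo`, `supports_table_filter`, `can_create_table_level_job`) are hypothetical, and the guard is only one possible form of mixed-version handling, not something the PR is known to implement.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical request shape: table_ids is optional, tracked by an explicit flag.
struct WarmUpEventRequest {
    int64_t job_id = 0;
    bool table_ids_set = false;      // false means "no table filter was sent"
    std::vector<int64_t> table_ids;  // only meaningful when table_ids_set is true
};

// An upgraded BE honors the filter when it is present.
bool upgraded_be_should_warm_up(const WarmUpEventRequest& req, int64_t table_id) {
    if (!req.table_ids_set) return true;  // cluster-level job
    return std::find(req.table_ids.begin(), req.table_ids.end(), table_id) !=
           req.table_ids.end();
}

// An old BE does not know the field exists, so every job looks cluster-level to it.
bool old_be_should_warm_up(const WarmUpEventRequest&, int64_t) {
    return true;
}

// Hypothetical capability record for a source BE.
struct BackendInfo {
    int64_t id;
    bool supports_table_filter;
};

// One possible guard: only allow table-level jobs once every source BE understands the field.
bool can_create_table_level_job(const std::vector<BackendInfo>& source_bes) {
    return !source_bes.empty() &&
           std::all_of(source_bes.begin(), source_bes.end(),
                       [](const BackendInfo& be) { return be.supports_table_filter; });
}
```

With a request filtered to tables 10 and 20, the upgraded path rejects table 30 while the old path still accepts it, which is the bypass described in the compatibility bullet.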

User focus: No additional user-provided review focus was specified.

Comment thread fe/fe-core/src/main/java/org/apache/doris/cloud/CloudWarmUpJob.java
@@ -802,18 +929,18 @@ void CloudWarmUpManager::_recycle_cache(int64_t tablet_id,
auto dns_cache = ExecEnv::GetInstance()->dns_cache();
Contributor

Passing table_id=0 disables the new table-level filter for recycle-cache events. When rowsets from an unmatched table are GC'd or compacted on the source, every table-level warm-up job can still receive PRecycleCacheRequest and evict target cache for tables outside its ON TABLES filter. The callers are CloudTablet methods and can use the tablet's table id, so please propagate it through recycle_cache/_recycle_cache and apply the same filtering as warm_up_rowset.
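A minimal sketch of the behavior being described, under the assumption that a non-positive table id is treated as "unknown" and therefore skips the table filter. The types and functions (`WarmUpJobFilter`, `job_accepts`, `jobs_to_notify`) are hypothetical stand-ins and do not mirror the actual `CloudWarmUpManager` code.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical per-job filter state on the source BE.
struct WarmUpJobFilter {
    int64_t job_id;
    bool table_level;                       // false means a cluster-level job
    std::unordered_set<int64_t> table_ids;  // matched tables for a table-level job
};

// Assumed semantics: table_id <= 0 means the caller did not supply a table id,
// so the filter cannot be applied and the event is accepted.
bool job_accepts(const WarmUpJobFilter& job, int64_t table_id) {
    if (!job.table_level || table_id <= 0) return true;
    return job.table_ids.count(table_id) > 0;
}

// Returns the ids of the jobs that would receive an event for the given table.
std::vector<int64_t> jobs_to_notify(const std::vector<WarmUpJobFilter>& jobs,
                                    int64_t table_id) {
    std::vector<int64_t> out;
    for (const auto& job : jobs) {
        if (job_accepts(job, table_id)) out.push_back(job.job_id);
    }
    return out;
}
```

Under these assumptions, calling `jobs_to_notify(jobs, 0)` returns every job, while passing the tablet's real table id returns only the cluster-level jobs and the table-level jobs whose filter matches. That difference is what propagating the table id through the recycle path would restore.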

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 3.77% (39/1035) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

TPC-H: Total hot run time: 32281 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit e9ef9da91a90b5477b1c626ea0a4e5a22347ad5e, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17723	4109	4079	4079
q2	
q3	10913	1394	851	851
q4	4753	488	348	348
q5	9445	2360	2174	2174
q6	346	180	140	140
q7	955	789	640	640
q8	9655	1752	1706	1706
q9	7025	5026	4965	4965
q10	6498	2288	1878	1878
q11	452	288	260	260
q12	694	421	308	308
q13	18167	3981	2811	2811
q14	266	256	233	233
q15	
q16	822	778	713	713
q17	975	874	967	874
q18	6906	5903	6144	5903
q19	1257	1316	1125	1125
q20	553	419	283	283
q21	5920	2855	2683	2683
q22	457	381	307	307
Total cold run time: 103782 ms
Total hot run time: 32281 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4794	4783	4725	4725
q2	
q3	4934	5166	4809	4809
q4	2118	2219	1443	1443
q5	4796	4705	4672	4672
q6	243	184	133	133
q7	1865	1755	1597	1597
q8	2491	2184	2034	2034
q9	7455	7420	7415	7415
q10	4784	4718	4255	4255
q11	540	392	361	361
q12	740	746	530	530
q13	3042	3409	2767	2767
q14	289	288	255	255
q15	
q16	690	716	621	621
q17	1307	1273	1268	1268
q18	7312	6817	7071	6817
q19	1135	1141	1083	1083
q20	2232	2233	1958	1958
q21	5311	4652	4496	4496
q22	518	475	426	426
Total cold run time: 56596 ms
Total hot run time: 51665 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 171370 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e9ef9da91a90b5477b1c626ea0a4e5a22347ad5e, data reload: false

query5	4342	659	510	510
query6	331	240	204	204
query7	4224	574	315	315
query8	336	243	220	220
query9	8791	4147	4117	4117
query10	457	351	290	290
query11	5821	2351	2152	2152
query12	208	136	130	130
query13	1299	613	458	458
query14	6171	5503	5185	5185
query14_1	4485	4529	4447	4447
query15	226	219	189	189
query16	1046	482	445	445
query17	1173	752	630	630
query18	2737	500	367	367
query19	244	214	182	182
query20	149	138	139	138
query21	234	144	119	119
query22	13602	13509	13545	13509
query23	17522	16663	16316	16316
query23_1	16396	16352	16299	16299
query24	7460	1768	1306	1306
query24_1	1330	1280	1313	1280
query25	549	498	424	424
query26	1363	321	171	171
query27	2653	587	346	346
query28	4422	2038	2000	2000
query29	989	643	506	506
query30	313	235	200	200
query31	1144	1108	952	952
query32	87	76	75	75
query33	545	353	291	291
query34	1216	1141	653	653
query35	771	801	712	712
query36	1393	1471	1288	1288
query37	162	110	91	91
query38	3206	3175	3067	3067
query39	939	916	894	894
query39_1	872	879	915	879
query40	228	156	131	131
query41	67	62	63	62
query42	118	109	109	109
query43	334	341	290	290
query44	
query45	212	206	200	200
query46	1113	1207	745	745
query47	2421	2458	2246	2246
query48	403	426	322	322
query49	654	504	391	391
query50	992	343	252	252
query51	4331	4355	4460	4355
query52	109	106	98	98
query53	266	287	212	212
query54	318	270	261	261
query55	95	92	83	83
query56	306	318	319	318
query57	1470	1415	1319	1319
query58	315	281	279	279
query59	1615	1786	1465	1465
query60	326	346	327	327
query61	168	162	163	162
query62	724	653	598	598
query63	250	208	212	208
query64	2371	818	647	647
query65	
query66	1683	488	358	358
query67	29794	29654	29568	29568
query68	
query69	456	346	297	297
query70	1005	1017	991	991
query71	310	283	274	274
query72	3039	2782	2532	2532
query73	869	820	441	441
query74	5089	4934	4772	4772
query75	2742	2627	2271	2271
query76	2294	1145	768	768
query77	411	425	344	344
query78	12575	12296	11840	11840
query79	1406	1061	797	797
query80	679	571	512	512
query81	452	284	249	249
query82	1411	164	127	127
query83	377	290	256	256
query84	274	151	122	122
query85	1001	624	559	559
query86	415	343	323	323
query87	3390	3372	3207	3207
query88	3640	2757	2723	2723
query89	452	398	345	345
query90	1930	192	183	183
query91	177	165	145	145
query92	81	88	71	71
query93	1536	1580	821	821
query94	542	360	335	335
query95	704	403	463	403
query96	1022	787	356	356
query97	2737	2737	2627	2627
query98	242	232	231	231
query99	1192	1145	1032	1032
Total cold run time: 255342 ms
Total hot run time: 171370 ms
