[fix](variant) Avoid mutating shared variant columns #64092

Merged
yiguolei merged 1 commit into apache:master from eldenmoon:fix-local-shuffle-shared-columns
Jun 5, 2026

@eldenmoon
Member

@eldenmoon eldenmoon commented Jun 3, 2026

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Queries that evaluate VARIANT expressions after a local exchange can share the same input blocks across multiple downstream pipeline tasks. Variant casts and Variant serialization finalized source columns in place, so one consumer could mutate a shared input column while another consumer still expected the original column shape and row count. As a result, local-shuffle query results were unstable, and later operators could observe changed Variant column contents or sizes. This change confines the fix to Variant handling: the cast and serialization paths now finalize a private copy of the Variant column instead of mutating the source column.
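To make the failure mode concrete, here is a minimal, single-threaded C++ sketch of the "mutate the shared source" pattern versus the "finalize a private copy" pattern. `ToyVariantColumn`, `finalize()`, `cast_mutating_source()` and `cast_on_private_copy()` are hypothetical names invented for illustration; they are not Doris APIs, and the real ColumnVariant finalization is far more involved. The `std::shared_ptr<const ...>` stands in for a block column that a local exchange hands to more than one downstream task.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model of a VARIANT column (NOT Doris's ColumnVariant):
// rows stay in `raw_rows` until finalize() moves them into `finalized_rows`.
struct ToyVariantColumn {
    std::vector<std::string> raw_rows;        // un-finalized input rows
    std::vector<std::string> finalized_rows;  // rows after finalization
    bool is_finalized = false;

    // Finalizes *this in place; the un-finalized representation disappears.
    void finalize() {
        if (is_finalized) return;
        finalized_rows = std::move(raw_rows);
        raw_rows.clear();  // moved-from state is unspecified; make it empty
        is_finalized = true;
    }
};

// A block column handed to several downstream tasks after a local exchange.
using SharedColumn = std::shared_ptr<const ToyVariantColumn>;

// Hazardous pattern: a "read-only" consumer finalizes the shared source in
// place, so every other holder of the same pointer observes the mutation.
std::vector<std::string> cast_mutating_source(const SharedColumn& col) {
    auto& mutable_col = const_cast<ToyVariantColumn&>(*col);
    mutable_col.finalize();
    return mutable_col.finalized_rows;
}

// Safer pattern: finalize a private copy and leave the shared source as-is.
// Already-finalized inputs are read directly, so no copy is made for them.
std::vector<std::string> cast_on_private_copy(const SharedColumn& col) {
    if (col->is_finalized) return col->finalized_rows;
    ToyVariantColumn copy = *col;
    copy.finalize();
    return copy.finalized_rows;
}
```

In the toy, a second task that still reads `raw_rows` after `cast_mutating_source()` has run sees zero rows, which mirrors the "changed contents or row count" symptom described above; after `cast_on_private_copy()` it still sees every row. The trade-off is one extra copy per un-finalized input.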

Release note

None

Check List (For Author)

  • Test:
    • PATH=/tmp/doris-clang-format-bin:$PATH build-support/clang-format.sh
    • git diff --check HEAD^
    • ./build.sh --be
    • ./run-be-ut.sh --run --filter='ColumnVariantTest.serialize_does_not_finalize_source_column:ColumnVariantTest.block_serialize_does_not_finalize_source_column:FunctionVariantCast.CastFromVariantDoesNotFinalizeSourceColumn:FunctionVariantCast.CastFromVariant'
    • Manual test: with column_nullable.cpp and column_nullable_test.cpp reverted from this PR, local 1FE+4BE first local-shuffle repro passed 16x100 concurrent executions on BE version doris-0.0.0-5078f25a971fc
    • Manual test: with column_nullable.cpp and column_nullable_test.cpp reverted from this PR, local 1FE+4BE second local-shuffle repro matched local-off baseline for 100 iterations on BE version doris-0.0.0-5078f25a971fc
  • Behavior changed: No
  • Does this need documentation: No

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally with the specific error message) and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was changed, and what the possible impacts are.
  3. What features were added, and why they were added.
  4. Which code was refactored, and why.
  5. Which functions were optimized, and what the difference is before and after the optimization.

@eldenmoon
Member Author

run buildall

@eldenmoon eldenmoon force-pushed the fix-local-shuffle-shared-columns branch from 9566c5e to ff10f78 June 3, 2026 18:08
@eldenmoon
Member Author

run buildall

@eldenmoon eldenmoon force-pushed the fix-local-shuffle-shared-columns branch from ff10f78 to 2269005 June 3, 2026 19:05
Copilot AI review requested due to automatic review settings June 3, 2026 19:05
@eldenmoon
Member Author

run buildall

Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@eldenmoon eldenmoon force-pushed the fix-local-shuffle-shared-columns branch from 2269005 to 00d2917 June 3, 2026 19:37
@eldenmoon
Copy link
Copy Markdown
Member Author

run buildall

@eldenmoon
Member Author

run external

@hello-stephen
Contributor

TPC-H: Total hot run time: 29484 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2269005d14331dd173361027407dbeecb5d92c36, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17765	4205	4103	4103
q2
q3	10775	1399	803	803
q4	4687	478	347	347
q5	7540	864	586	586
q6	187	174	137	137
q7	783	848	657	657
q8	9693	1626	1682	1626
q9	6632	4568	4581	4568
q10	6777	1828	1520	1520
q11	447	285	253	253
q12	679	446	306	306
q13	18147	3419	2799	2799
q14	280	263	254	254
q15
q16	827	767	703	703
q17	1267	1032	843	843
q18	7030	5654	5665	5654
q19	1513	1419	1101	1101
q20	535	404	269	269
q21	6174	2827	2649	2649
q22	482	393	306	306
Total cold run time: 102220 ms
Total hot run time: 29484 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5165	4864	4983	4864
q2
q3	4895	5281	4620	4620
q4	2147	2201	1437	1437
q5	4914	4836	4687	4687
q6	243	174	134	134
q7	1854	1759	1579	1579
q8	2511	2008	1981	1981
q9	7396	7466	7417	7417
q10	4760	4675	4189	4189
q11	565	392	362	362
q12	759	741	540	540
q13	3041	3481	2816	2816
q14	279	279	257	257
q15
q16	677	703	615	615
q17	1302	1266	1265	1265
q18	7499	6814	6739	6739
q19	1154	1092	1106	1092
q20	2225	2229	1956	1956
q21	5352	4614	4521	4521
q22	545	481	428	428
Total cold run time: 57283 ms
Total hot run time: 51499 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169084 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2269005d14331dd173361027407dbeecb5d92c36, data reload: false

query5	4313	657	483	483
query6	442	195	180	180
query7	4819	584	313	313
query8	382	216	199	199
query9	8778	4009	3960	3960
query10	473	302	259	259
query11	5872	2357	2198	2198
query12	153	108	96	96
query13	1257	585	435	435
query14	6471	5410	5027	5027
query14_1	4364	4364	4377	4364
query15	207	195	180	180
query16	1017	454	426	426
query17	1107	692	575	575
query18	2527	473	338	338
query19	199	178	140	140
query20	110	104	106	104
query21	216	137	115	115
query22	13564	13515	13323	13323
query23	17245	16495	16154	16154
query23_1	16328	16367	16332	16332
query24	7513	1761	1286	1286
query24_1	1333	1335	1278	1278
query25	542	451	380	380
query26	1292	316	169	169
query27	2690	532	337	337
query28	4449	2015	1985	1985
query29	1091	627	479	479
query30	315	234	203	203
query31	1126	1085	957	957
query32	104	59	57	57
query33	523	310	252	252
query34	1183	1090	662	662
query35	767	783	695	695
query36	1418	1408	1238	1238
query37	156	105	90	90
query38	3226	3168	3042	3042
query39	925	934	904	904
query39_1	875	893	900	893
query40	246	123	103	103
query41	65	63	62	62
query42	95	94	95	94
query43	317	320	275	275
query44	
query45	201	189	180	180
query46	1065	1217	744	744
query47	2429	2422	2243	2243
query48	425	413	309	309
query49	640	495	365	365
query50	981	352	272	272
query51	4390	4286	4347	4286
query52	92	92	79	79
query53	247	274	204	204
query54	312	230	214	214
query55	80	77	72	72
query56	266	256	237	237
query57	1455	1404	1325	1325
query58	251	230	223	223
query59	1597	1725	1481	1481
query60	294	263	245	245
query61	210	157	158	157
query62	700	647	580	580
query63	233	184	190	184
query64	2536	787	615	615
query65	
query66	1764	458	338	338
query67	29681	29724	29554	29554
query68	
query69	429	304	258	258
query70	978	969	930	930
query71	289	218	212	212
query72	2884	2689	2425	2425
query73	839	757	419	419
query74	5144	4937	4779	4779
query75	2670	2593	2299	2299
query76	2357	1153	769	769
query77	360	387	307	307
query78	12407	12309	11841	11841
query79	1435	1009	794	794
query80	622	506	408	408
query81	452	284	256	256
query82	561	197	122	122
query83	349	273	254	254
query84	258	143	111	111
query85	897	539	441	441
query86	375	294	295	294
query87	3377	3374	3224	3224
query88	3630	2747	2756	2747
query89	433	381	327	327
query90	1943	185	182	182
query91	181	174	139	139
query92	64	63	57	57
query93	1511	1419	926	926
query94	516	391	311	311
query95	662	388	351	351
query96	1084	765	344	344
query97	2695	2698	2567	2567
query98	210	204	221	204
query99	1180	1183	1036	1036
Total cold run time: 250877 ms
Total hot run time: 169084 ms

@eldenmoon
Member Author

/review

@hello-stephen
Contributor

TPC-H: Total hot run time: 29117 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 00d2917e6fd89a22088bad1c91971078d849e513, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17665	4075	4068	4068
q2
q3	10770	1380	795	795
q4	4684	472	343	343
q5	7559	913	597	597
q6	196	180	144	144
q7	775	874	636	636
q8	9364	1561	1609	1561
q9	5914	4457	4504	4457
q10	6734	1827	1552	1552
q11	427	285	260	260
q12	621	419	288	288
q13	18145	3393	2766	2766
q14	273	262	240	240
q15
q16	813	785	713	713
q17	998	879	874	874
q18	6712	5758	5491	5491
q19	1321	1345	1059	1059
q20	516	413	266	266
q21	6246	2865	2688	2688
q22	474	400	319	319
Total cold run time: 100207 ms
Total hot run time: 29117 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5132	4802	4837	4802
q2
q3	4842	5298	4744	4744
q4	2174	2224	1402	1402
q5	4835	4909	4684	4684
q6	244	176	129	129
q7	1843	1793	1575	1575
q8	2434	2149	2238	2149
q9	7988	7707	7393	7393
q10	4739	4671	4227	4227
q11	545	383	364	364
q12	725	734	528	528
q13	2998	3372	2823	2823
q14	281	276	263	263
q15
q16	673	699	617	617
q17	1307	1258	1274	1258
q18	7308	6649	6883	6649
q19	1115	1097	1081	1081
q20	2224	2213	1935	1935
q21	5298	4649	4491	4491
q22	534	453	413	413
Total cold run time: 57239 ms
Total hot run time: 51527 ms

@eldenmoon
Copy link
Copy Markdown
Member Author

run p0

@eldenmoon
Member Author

run cloud_p0

@eldenmoon
Member Author

run nonConcurrent

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169178 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 00d2917e6fd89a22088bad1c91971078d849e513, data reload: false

query5	4317	629	483	483
query6	453	203	181	181
query7	4807	518	306	306
query8	370	219	210	210
query9	8761	4062	4055	4055
query10	450	333	265	265
query11	5893	2376	2149	2149
query12	161	104	100	100
query13	1280	612	428	428
query14	6400	5473	5151	5151
query14_1	4484	4492	4450	4450
query15	203	199	174	174
query16	964	435	412	412
query17	917	680	560	560
query18	2484	491	340	340
query19	203	186	144	144
query20	108	107	104	104
query21	223	145	126	126
query22	13643	13563	13435	13435
query23	17295	16558	16219	16219
query23_1	16263	16347	16366	16347
query24	7539	1772	1326	1326
query24_1	1345	1324	1328	1324
query25	550	467	371	371
query26	1323	328	168	168
query27	2702	571	358	358
query28	4554	2010	2013	2010
query29	1078	637	476	476
query30	293	224	200	200
query31	1110	1074	948	948
query32	108	64	61	61
query33	546	313	266	266
query34	1178	1152	677	677
query35	784	789	684	684
query36	1363	1371	1229	1229
query37	153	111	90	90
query38	3218	3133	3043	3043
query39	932	918	884	884
query39_1	875	867	868	867
query40	222	124	102	102
query41	71	63	63	63
query42	94	95	91	91
query43	329	335	286	286
query44	
query45	194	188	183	183
query46	1068	1203	745	745
query47	2397	2363	2203	2203
query48	413	418	303	303
query49	621	482	342	342
query50	990	364	262	262
query51	4361	4310	4284	4284
query52	88	97	80	80
query53	248	268	189	189
query54	268	219	194	194
query55	78	77	71	71
query56	231	248	211	211
query57	1411	1392	1326	1326
query58	247	219	213	213
query59	1690	1754	1441	1441
query60	292	250	236	236
query61	165	176	173	173
query62	710	663	594	594
query63	236	189	185	185
query64	2633	858	683	683
query65	
query66	1826	469	351	351
query67	29756	29088	29501	29088
query68	
query69	432	323	282	282
query70	991	1014	954	954
query71	302	232	227	227
query72	3206	2927	2470	2470
query73	872	730	435	435
query74	5136	4997	4772	4772
query75	2649	2583	2229	2229
query76	2340	1169	794	794
query77	356	392	297	297
query78	12247	12422	11868	11868
query79	1396	1058	765	765
query80	1286	475	384	384
query81	515	278	238	238
query82	643	161	122	122
query83	346	274	253	253
query84	310	141	116	116
query85	918	550	442	442
query86	423	291	271	271
query87	3362	3364	3212	3212
query88	3644	2743	2750	2743
query89	431	379	331	331
query90	1890	187	179	179
query91	178	162	139	139
query92	66	62	58	58
query93	1577	1544	916	916
query94	751	357	327	327
query95	669	389	426	389
query96	1046	801	361	361
query97	2701	2686	2558	2558
query98	218	207	214	207
query99	1162	1170	1055	1055
Total cold run time: 252391 ms
Total hot run time: 169178 ms

@hello-stephen
Contributor

TPC-H: Total hot run time: 28323 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 00d2917e6fd89a22088bad1c91971078d849e513, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17687	4026	3963	3963
q2
q3	10791	1366	807	807
q4	4683	476	346	346
q5	7540	867	575	575
q6	178	171	138	138
q7	771	855	625	625
q8	9454	1488	1553	1488
q9	6354	4452	4418	4418
q10	6862	1804	1529	1529
q11	447	274	249	249
q12	662	420	290	290
q13	18192	3415	2755	2755
q14	272	254	236	236
q15
q16	820	773	705	705
q17	1292	1090	809	809
q18	6780	5846	5494	5494
q19	1467	1267	957	957
q20	516	390	260	260
q21	6071	2524	2376	2376
q22	433	356	303	303
Total cold run time: 101272 ms
Total hot run time: 28323 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4304	4276	4250	4250
q2
q3	4492	4882	4319	4319
q4	2083	2163	1392	1392
q5	4407	4290	4317	4290
q6	221	170	127	127
q7	1731	1599	1737	1599
q8	2609	2112	2187	2112
q9	7886	7949	8051	7949
q10	4822	4786	4289	4289
q11	576	444	408	408
q12	761	748	530	530
q13	3394	3575	3004	3004
q14	307	336	276	276
q15
q16	731	714	648	648
q17	1328	1329	1300	1300
q18	8025	7253	7089	7089
q19	1109	1088	1106	1088
q20	2216	2198	1946	1946
q21	5242	4518	4424	4424
q22	527	472	410	410
Total cold run time: 56771 ms
Total hot run time: 51450 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169386 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 00d2917e6fd89a22088bad1c91971078d849e513, data reload: false

query5	4319	646	488	488
query6	456	220	183	183
query7	4861	550	294	294
query8	371	210	197	197
query9	8751	3998	3995	3995
query10	442	314	265	265
query11	5932	2343	2181	2181
query12	166	104	124	104
query13	1255	583	429	429
query14	6357	5333	5042	5042
query14_1	4327	4358	4356	4356
query15	206	192	177	177
query16	1035	438	416	416
query17	1099	696	573	573
query18	2503	468	334	334
query19	207	188	150	150
query20	114	117	105	105
query21	220	139	115	115
query22	13672	13599	13376	13376
query23	17254	16445	16225	16225
query23_1	16199	16323	16273	16273
query24	7606	1773	1315	1315
query24_1	1342	1310	1317	1310
query25	575	493	415	415
query26	1346	322	178	178
query27	2662	577	347	347
query28	4506	2041	2064	2041
query29	1139	623	499	499
query30	318	238	203	203
query31	1126	1078	949	949
query32	112	62	68	62
query33	524	327	262	262
query34	1182	1126	635	635
query35	775	782	684	684
query36	1425	1394	1246	1246
query37	158	112	93	93
query38	3188	3133	3044	3044
query39	932	934	914	914
query39_1	888	883	879	879
query40	226	127	105	105
query41	71	71	71	71
query42	97	98	94	94
query43	315	320	281	281
query44	
query45	196	186	182	182
query46	1102	1227	747	747
query47	2376	2321	2235	2235
query48	405	434	282	282
query49	663	482	379	379
query50	995	361	244	244
query51	4309	4314	4209	4209
query52	90	91	80	80
query53	264	273	191	191
query54	292	234	209	209
query55	79	76	71	71
query56	265	247	226	226
query57	1505	1441	1370	1370
query58	258	227	229	227
query59	1637	1717	1493	1493
query60	302	258	229	229
query61	160	152	156	152
query62	709	671	599	599
query63	235	190	187	187
query64	2604	798	616	616
query65	
query66	1807	475	343	343
query67	29175	29653	29431	29431
query68	
query69	421	298	281	281
query70	1000	982	954	954
query71	288	223	208	208
query72	3208	2609	2272	2272
query73	842	829	430	430
query74	5109	4911	4795	4795
query75	2635	2573	2248	2248
query76	2341	1143	793	793
query77	352	380	272	272
query78	12318	12548	11983	11983
query79	1428	1005	756	756
query80	1268	460	407	407
query81	524	284	246	246
query82	631	159	122	122
query83	310	275	244	244
query84	258	140	108	108
query85	903	564	447	447
query86	422	308	301	301
query87	3356	3349	3200	3200
query88	3646	2778	2733	2733
query89	444	370	332	332
query90	1917	180	176	176
query91	175	163	138	138
query92	62	60	57	57
query93	1563	1388	875	875
query94	707	344	313	313
query95	694	395	354	354
query96	1085	785	355	355
query97	2710	2711	2566	2566
query98	215	207	204	204
query99	1179	1174	1053	1053
Total cold run time: 251987 ms
Total hot run time: 169386 ms

Contributor

@github-actions github-actions Bot left a comment

Code review completed for PR #64092.

No blocking issues found. The changes are focused on avoiding in-place mutation of shared BE column trees after local exchange, and the implementation consistently uses private finalized/deep-cloned copies for Variant serialization/casts and a detached probe block for lazy nested-loop join materialization.

Critical checkpoint conclusions:

  • Goal and tests: The code addresses the stated shared-column mutation problem. Added/updated unit coverage exercises nullable CRC hashing, Variant cast/serialization non-mutation, finalized Variant deep copy, and the relevant lazy NLJ nullable append behavior described in the PR.
  • Scope/focus: The change is reasonably small for the affected paths and avoids broad behavioral changes or storage/protocol format changes.
  • Concurrency/COW: No new thread or dependency behavior is introduced. The reviewed paths avoid mutating columns that may be shared across downstream pipeline tasks; recursive/deep cloning and lazy probe-block detachment look appropriate.
  • Lifecycle/static initialization: No new static initialization or unusual lifecycle hazard found.
  • Configuration/compatibility: No new config items and no incompatible serialization layout changes found.
  • Parallel paths: Block-level Variant serialization pre-finalizes unfinalized top-level Variant columns, while datatype-level Variant serialization still protects nested/fallback callers. Cast and lazy NLJ paths are covered separately.
  • Error handling: New Status-returning paths propagate errors with RETURN_IF_ERROR/RETURN_IF_CATCH_EXCEPTION. No ignored Status found in the diff.
  • Memory/performance: The fixes intentionally trade extra cloning for correctness on shared columns. Block serialization avoids cloning the same top-level Variant twice; nested Variant datatype callers may still clone independently for sizing/serialization, but I did not find a correctness issue.
  • Observability/transactions/persistence: Not applicable; no transaction, persistence, or user-facing observability changes.
  • User focus points: No additional user-provided review focus was present.
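The "parallel paths" and "memory/performance" points above (block-level pre-finalization, datatype-level protection for nested or fallback callers, and avoiding a double clone for sizing plus serialization) can be sketched as follows. Every name here (`ToyVariant`, `make_finalized_copy`, `serialized_size`, `serialize`, `serialize_block`, `g_clone_count`) is a hypothetical simplification, not the Doris DataTypeVariant or ColumnVariant signature; the sketch only shows why pre-finalizing a top-level column once lets both helpers skip their own copy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy VARIANT column; not Doris's ColumnVariant/DataTypeVariant.
struct ToyVariant {
    std::vector<std::string> rows;
    bool finalized = false;
};

// Counts private copies so the sketch can show how many clones each path makes.
int g_clone_count = 0;

ToyVariant make_finalized_copy(const ToyVariant& src) {
    ++g_clone_count;
    ToyVariant copy = src;
    copy.finalized = true;  // stands in for real finalization work on the copy
    return copy;
}

// Datatype-level helpers: each protects direct (nested/fallback) callers by
// finalizing a private copy when handed an un-finalized column.
std::size_t serialized_size(const ToyVariant& col) {
    if (!col.finalized) return serialized_size(make_finalized_copy(col));
    std::size_t n = 0;
    for (const auto& r : col.rows) n += r.size() + 1;  // payload + ';' separator
    return n;
}

std::string serialize(const ToyVariant& col) {
    if (!col.finalized) return serialize(make_finalized_copy(col));
    std::string out;
    for (const auto& r : col.rows) out += r + ';';
    return out;
}

struct SerializedColumn {
    std::size_t size;
    std::string bytes;
};

using Block = std::vector<std::shared_ptr<const ToyVariant>>;

// Block-level path: pre-finalize each un-finalized top-level column once, then
// pass that copy to both helpers, which take their no-copy branch.
std::vector<SerializedColumn> serialize_block(const Block& block) {
    std::vector<SerializedColumn> out;
    for (const auto& col : block) {
        const ToyVariant* src = col.get();
        ToyVariant local;
        if (!src->finalized) {
            local = make_finalized_copy(*src);
            src = &local;
        }
        out.push_back({serialized_size(*src), serialize(*src)});
    }
    return out;
}
```

With one un-finalized column, `serialize_block()` makes a single copy, whereas calling `serialized_size()` and `serialize()` directly on the same column makes two; in both cases the shared source stays un-finalized.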

Review note: I inspected the diff and related call paths and ran git diff --check for the PR diff; I did not rerun the full BE unit/build suite in this review runner.

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 94.90% (186/196) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.95% (27509/38231)
Line Coverage 55.48% (294416/530680)
Region Coverage 51.97% (244398/470296)
Branch Coverage 53.33% (106050/198871)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 94.90% (186/196) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.95% (27509/38231)
Line Coverage 55.48% (294407/530680)
Region Coverage 51.96% (244345/470296)
Branch Coverage 53.32% (106037/198871)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 94.90% (186/196) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.96% (27510/38231)
Line Coverage 55.49% (294463/530680)
Region Coverage 51.98% (244469/470296)
Branch Coverage 53.34% (106072/198871)

@eldenmoon
Member Author

run external

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 94.90% (186/196) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.96% (27510/38231)
Line Coverage 55.49% (294462/530680)
Region Coverage 52.00% (244553/470296)
Branch Coverage 53.34% (106075/198871)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 94.90% (186/196) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.02% (27535/38231)
Line Coverage 55.53% (294684/530680)
Region Coverage 52.02% (244631/470296)
Branch Coverage 53.38% (106154/198871)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 94.90% (186/196) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.03% (27536/38231)
Line Coverage 55.53% (294704/530680)
Region Coverage 52.02% (244658/470296)
Branch Coverage 53.38% (106161/198871)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 94.90% (186/196) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.95% (27508/38231)
Line Coverage 55.47% (294389/530680)
Region Coverage 51.94% (244288/470296)
Branch Coverage 53.32% (106038/198871)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 94.90% (186/196) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.95% (27508/38231)
Line Coverage 55.47% (294387/530680)
Region Coverage 51.96% (244369/470296)
Branch Coverage 53.32% (106041/198871)

@eldenmoon eldenmoon force-pushed the fix-local-shuffle-shared-columns branch from 00d2917 to 9eb7865 June 4, 2026 13:09
@eldenmoon
Member Author

eldenmoon commented Jun 4, 2026

Latest local verification after updating head to b5fde9f3e01e3:

  • Minimized the fix again and removed the production ColumnVariant::clone_finalized() rewrite from this PR. The current PR has no production diff in be/src/core/column/column_variant.cpp or be/src/core/column/column_variant.h; Variant safety is handled at cast/serialization call sites by using private finalized copies.
  • Verified the ColumnVariant production change is not required: with column_variant.cpp/h reverted, these tests pass: ColumnVariantTest.clone_finalized_deep_copies_columns, ColumnVariantTest.serialize_does_not_finalize_source_column, ColumnVariantTest.block_serialize_does_not_finalize_source_column, FunctionVariantCast.CastFromVariantDoesNotFinalizeSourceColumn, FunctionVariantCast.CastFromVariant.
  • Red/green E2E results are unchanged: on a local 1FE+4BE cluster, the local-shuffle query fails on the reverted parent and passes with this fix, and the nullable nested-loop join query likewise fails on the reverted parent and passes with this fix.
  • Formatting/checks rerun: build-support/clang-format.sh ..., git diff --check, git diff --cached --check.
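A "private finalized copy" only isolates the source if the copy is deep, which is presumably the property a test named `ColumnVariantTest.clone_finalized_deep_copies_columns` checks (its actual contents are not shown here). The toy below is a hypothetical illustration, not Doris's ColumnVariant layout: its subcolumns are held by `std::shared_ptr`, so finalizing a shallow clone still mutates the source through the shared subcolumn objects, while a deep clone leaves the source intact. Note that constness does not propagate through `std::shared_ptr`, so even a `const` tree would expose mutable subcolumns.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical toy subcolumn: appended values wait in `pending` until finalize().
struct ToySubcolumn {
    std::vector<int> data;
    std::vector<int> pending;

    void finalize() {
        data.insert(data.end(), pending.begin(), pending.end());
        pending.clear();
    }
};

// Hypothetical toy tree column whose subcolumns are held by shared_ptr.
// This is NOT Doris's ColumnVariant layout; it only illustrates copy depth.
struct ToyVariantTree {
    std::vector<std::shared_ptr<ToySubcolumn>> subcolumns;

    // Shallow copy: a new vector, but the same subcolumn objects.
    ToyVariantTree clone_shallow() const { return *this; }

    // Deep copy: every subcolumn object is duplicated.
    ToyVariantTree clone_deep() const {
        ToyVariantTree out;
        out.subcolumns.reserve(subcolumns.size());
        for (const auto& sc : subcolumns) {
            out.subcolumns.push_back(std::make_shared<ToySubcolumn>(*sc));
        }
        return out;
    }

    // Finalizes every subcolumn in place.
    void finalize() {
        for (auto& sc : subcolumns) sc->finalize();
    }
};
```

Finalizing `clone_deep()` leaves the source's `pending` values untouched; finalizing `clone_shallow()` empties them, which is exactly the shared-state leak the fix has to avoid.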

@eldenmoon eldenmoon force-pushed the fix-local-shuffle-shared-columns branch 3 times, most recently from b5fde9f to 761a984 June 4, 2026 14:25
@eldenmoon eldenmoon changed the title from [fix](be) Avoid mutating shared local shuffle columns to [fix](be) Avoid mutating shared variant columns on Jun 4, 2026
@eldenmoon eldenmoon force-pushed the fix-local-shuffle-shared-columns branch from 761a984 to 5078f25 June 4, 2026 14:38
@eldenmoon
Member Author

run buildall

@eldenmoon eldenmoon changed the title from [fix](be) Avoid mutating shared variant columns to [fix](variant) Avoid mutating shared variant columns on Jun 5, 2026
@eldenmoon
Member Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 29109 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5078f25a971fce7ce62b76ebd8dcc312ef81378e, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17585	3978	3956	3956
q2
q3	10767	1360	835	835
q4	4684	473	342	342
q5	7513	865	596	596
q6	186	170	136	136
q7	774	856	654	654
q8	9379	1553	1562	1553
q9	5882	4443	4515	4443
q10	6779	1827	1526	1526
q11	424	264	248	248
q12	637	422	290	290
q13	18217	3342	2789	2789
q14	264	270	246	246
q15
q16	795	765	717	717
q17	995	983	1055	983
q18	6818	5732	5582	5582
q19	1358	1268	1071	1071
q20	493	410	257	257
q21	6443	2857	2573	2573
q22	457	366	312	312
Total cold run time: 100450 ms
Total hot run time: 29109 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5119	4735	4819	4735
q2
q3	4913	5261	4857	4857
q4	2112	2192	1407	1407
q5	4874	4861	4700	4700
q6	229	176	124	124
q7	1868	1789	1550	1550
q8	2389	2120	2093	2093
q9	7834	7622	7439	7439
q10	4763	4685	4218	4218
q11	543	386	355	355
q12	725	741	524	524
q13	2978	3343	2785	2785
q14	286	278	255	255
q15
q16	676	699	628	628
q17	1278	1253	1249	1249
q18	7293	6979	6742	6742
q19	1144	1117	1085	1085
q20	2234	2221	1939	1939
q21	5277	4533	4389	4389
q22	501	454	409	409
Total cold run time: 57036 ms
Total hot run time: 51483 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 85.96% (49/57) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.93% (21055/39043)
Line Coverage 37.60% (200208/532494)
Region Coverage 33.69% (157056/466190)
Branch Coverage 34.63% (68683/198309)

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169655 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5078f25a971fce7ce62b76ebd8dcc312ef81378e, data reload: false

query5	4316	644	486	486
query6	450	197	179	179
query7	4808	580	299	299
query8	367	232	217	217
query9	8791	4051	4023	4023
query10	458	316	266	266
query11	5928	2330	2169	2169
query12	158	103	99	99
query13	1267	623	428	428
query14	6389	5372	5074	5074
query14_1	4392	4398	4401	4398
query15	212	197	178	178
query16	1013	449	425	425
query17	934	710	623	623
query18	2455	494	356	356
query19	202	193	146	146
query20	112	109	110	109
query21	218	144	121	121
query22	13677	13554	13327	13327
query23	17394	16453	16150	16150
query23_1	16257	16397	16313	16313
query24	7438	1774	1310	1310
query24_1	1344	1332	1351	1332
query25	595	495	421	421
query26	1322	338	175	175
query27	2649	537	332	332
query28	4502	2018	2021	2018
query29	1091	647	517	517
query30	322	251	204	204
query31	1118	1089	973	973
query32	108	65	61	61
query33	539	326	260	260
query34	1183	1152	689	689
query35	766	781	680	680
query36	1367	1376	1231	1231
query37	147	98	92	92
query38	3223	3194	3073	3073
query39	925	914	905	905
query39_1	891	874	889	874
query40	229	140	103	103
query41	67	62	62	62
query42	96	94	92	92
query43	313	324	275	275
query44	
query45	200	186	189	186
query46	1068	1173	745	745
query47	2366	2389	2290	2290
query48	406	374	308	308
query49	656	483	357	357
query50	959	355	265	265
query51	4352	4337	4267	4267
query52	88	89	77	77
query53	245	272	188	188
query54	269	216	207	207
query55	78	77	71	71
query56	243	252	226	226
query57	1432	1433	1327	1327
query58	255	214	200	200
query59	1608	1649	1434	1434
query60	295	250	236	236
query61	166	155	160	155
query62	696	663	586	586
query63	241	188	186	186
query64	2575	815	637	637
query65	
query66	1821	470	346	346
query67	29892	29766	29682	29682
query68	
query69	417	318	264	264
query70	973	989	934	934
query71	290	220	221	220
query72	2950	2706	2368	2368
query73	854	796	438	438
query74	5113	4947	4760	4760
query75	2663	2574	2247	2247
query76	2357	1175	787	787
query77	358	381	296	296
query78	12387	12419	11878	11878
query79	1438	1098	751	751
query80	698	481	416	416
query81	478	284	246	246
query82	580	157	122	122
query83	365	278	254	254
query84	308	142	112	112
query85	913	553	448	448
query86	407	302	300	300
query87	3365	3382	3201	3201
query88	3631	2738	2725	2725
query89	437	384	334	334
query90	1839	191	182	182
query91	176	173	138	138
query92	62	61	55	55
query93	1517	1464	822	822
query94	599	360	301	301
query95	683	382	439	382
query96	1049	851	346	346
query97	2718	2669	2574	2574
query98	221	211	208	208
query99	1166	1167	1021	1021
Total cold run time: 251396 ms
Total hot run time: 169655 ms

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 92.98% (53/57) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.84% (27483/38257)
Line Coverage 55.43% (294443/531239)
Region Coverage 52.14% (245519/470850)
Branch Coverage 53.33% (106212/199166)

@morrySnow
Contributor

/review

Contributor

@github-actions github-actions Bot left a comment

Review summary: I found no blocking or inline-worthy issues in this PR. The change is focused on avoiding in-place finalization of shared Variant columns during serialization and cast paths, and the implementation uses private finalized clones without changing the serialized format or cast semantics.

Critical checkpoint conclusions:

  • Goal/test proof: The PR addresses shared-block mutation in Variant serialization/casts. Added BE unit tests cover serialization, block serialization, clone detachment, and Variant-to-int/string casts not finalizing the source column.
  • Scope: The modification is small and limited to Variant serialization/cast handling plus targeted tests.
  • Concurrency: No new threads or locks are introduced. The relevant concurrency issue is shared input block reuse after local exchange, and the clone-before-finalize approach avoids mutating shared columns.
  • Lifecycle/static initialization: No new static/global lifecycle concerns found.
  • Configuration/compatibility: No config items, protocol changes, or storage/serialization format changes found.
  • Parallel paths: Both DataTypeVariant block serialization and Variant cast paths are updated. String/JSONB delegated casts run on a copied block with finalized input; root-value casts use the finalized clone root while preserving original null propagation.
  • Conditional checks: The new finalization checks are tied to the concrete non-finalized Variant state and match the stated failure mode.
  • Tests/results: Added unit tests are relevant. I did not run the full BE test suite in this review runner.
  • Observability: No additional observability appears necessary for this internal correctness fix.
  • Transactions/persistence/data writes: Not applicable.
  • FE/BE variable passing: Not applicable.
  • Performance: The change can clone/finalize non-finalized Variant columns where previous code mutated in place, but this is the intended safety tradeoff for shared blocks and is limited to non-finalized inputs.
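The "parallel paths" note says root-value casts use the finalized clone's root while preserving the original null propagation. A minimal sketch of that shape follows. The types and function (`ToyNullableVariant`, `NullableInt64`, `cast_root_to_int64`) are hypothetical, and the rule that an unparsable root value becomes NULL is an assumption made for the sketch; it may not match Doris's actual cast semantics.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical toy nullable VARIANT column: an outer null map plus the root
// values (as text) that finalization would materialize.
struct ToyNullableVariant {
    std::vector<std::uint8_t> null_map;  // 1 = row is SQL NULL
    std::vector<std::string> root;       // root value of each row
    bool finalized = false;
};

struct NullableInt64 {
    std::vector<std::int64_t> values;
    std::vector<std::uint8_t> null_map;
};

// Casts root values to BIGINT without touching the (possibly shared) source.
// Assumption for this sketch: unparsable root values become NULL.
NullableInt64 cast_root_to_int64(const ToyNullableVariant& src) {
    ToyNullableVariant work = src;  // private copy; only this one is finalized
    work.finalized = true;          // stands in for real finalization

    NullableInt64 out;
    out.values.assign(work.root.size(), 0);
    out.null_map = src.null_map;  // propagate the ORIGINAL outer nulls

    for (std::size_t i = 0; i < work.root.size(); ++i) {
        if (out.null_map[i]) continue;  // NULL in, NULL out
        const std::string& s = work.root[i];
        std::int64_t v = 0;
        auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), v);
        if (ec != std::errc() || ptr != s.data() + s.size()) {
            out.null_map[i] = 1;  // assumed: parse failure yields NULL
        } else {
            out.values[i] = v;
        }
    }
    return out;
}
```

Rows that were NULL in the source stay NULL regardless of their root value, and the source column is neither finalized nor has its null map modified.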

User focus: No additional user-provided review focus was specified.

@yiguolei yiguolei merged commit 6e37a86 into apache:master Jun 5, 2026
36 of 37 checks passed