[feat](spill) spill and reserve #47462

Open
wants to merge 2 commits into
base: master

mrhhsg
Member

@mrhhsg mrhhsg commented Jan 26, 2025

What problem does this PR solve?

Problem Summary:

A new spill-triggering strategy:

  1. Use workload groups to control and manage the memory usage of queries.
  2. Trigger spilling when a memory reservation attempt fails.
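
The two points above can be illustrated with a toy model. This is only a minimal sketch of the reserve-then-spill idea, not the code in this PR: `WorkloadGroup`, `Operator`, `try_reserve`, `release`, `spill`, and `add_block` are hypothetical names invented for the illustration, and the real Doris classes and sizes will differ.

```python
from dataclasses import dataclass


@dataclass
class WorkloadGroup:
    """Hypothetical memory budget shared by the queries of one group."""
    limit_bytes: int
    reserved_bytes: int = 0

    def try_reserve(self, size: int) -> bool:
        # Grant the reservation only if it still fits under the group limit.
        if self.reserved_bytes + size > self.limit_bytes:
            return False
        self.reserved_bytes += size
        return True

    def release(self, size: int) -> None:
        self.reserved_bytes = max(0, self.reserved_bytes - size)


@dataclass
class Operator:
    """Hypothetical operator that buffers data in memory and can spill it."""
    group: WorkloadGroup
    in_memory_bytes: int = 0
    spilled_bytes: int = 0

    def spill(self) -> None:
        # Move everything buffered so far to disk and return the memory.
        self.group.release(self.in_memory_bytes)
        self.spilled_bytes += self.in_memory_bytes
        self.in_memory_bytes = 0

    def add_block(self, size: int) -> None:
        # Reserve before allocating; a failed reservation triggers a spill.
        if not self.group.try_reserve(size):
            self.spill()
            if not self.group.try_reserve(size):
                raise MemoryError("block does not fit even after spilling")
        self.in_memory_bytes += size


group = WorkloadGroup(limit_bytes=100)
op = Operator(group)
for _ in range(5):
    op.add_block(30)
# prints: spilled=90 in_memory=60 reserved=60
print(f"spilled={op.spilled_bytes} in_memory={op.in_memory_bytes} "
      f"reserved={group.reserved_bytes}")
```

In this sketch the fourth block fails to reserve (90 + 30 exceeds the limit of 100), so the operator spills its 90 buffered bytes and retries. The point of reserving first is that the decision to spill is made before the allocation happens rather than after memory is already exhausted.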

Release note: None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jan 26, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mrhhsg marked this pull request as draft on January 26, 2025 10:20
@mrhhsg
Member Author

mrhhsg commented Jan 26, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32548 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bbb25f1b0e644ce1b6368ccaba3ed10d144321ce, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17797	5532	5403	5403
q2	2073	305	174	174
q3	10877	1213	726	726
q4	10219	961	546	546
q5	7652	2390	2168	2168
q6	200	172	134	134
q7	931	763	600	600
q8	9225	1383	1171	1171
q9	5228	4953	4898	4898
q10	6854	2344	1901	1901
q11	473	282	260	260
q12	346	365	215	215
q13	17759	3731	3151	3151
q14	230	229	216	216
q15	547	461	469	461
q16	638	627	598	598
q17	555	879	333	333
q18	7246	6703	6558	6558
q19	2001	965	539	539
q20	303	320	190	190
q21	2870	2166	1983	1983
q22	377	341	323	323
Total cold run time: 104401 ms
Total hot run time: 32548 ms
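
The two totals above can be reproduced from the rows. The sketch below assumes the four numeric columns are the cold run, two hot runs, and the faster of the two hot runs; that reading is inferred from the numbers, the bot does not state it. `ROUND1` and `totals` are names made up for this example.

```python
# Round 1 rows copied from the table above (whitespace-separated here).
ROUND1 = """\
q1 17797 5532 5403 5403
q2 2073 305 174 174
q3 10877 1213 726 726
q4 10219 961 546 546
q5 7652 2390 2168 2168
q6 200 172 134 134
q7 931 763 600 600
q8 9225 1383 1171 1171
q9 5228 4953 4898 4898
q10 6854 2344 1901 1901
q11 473 282 260 260
q12 346 365 215 215
q13 17759 3731 3151 3151
q14 230 229 216 216
q15 547 461 469 461
q16 638 627 598 598
q17 555 879 333 333
q18 7246 6703 6558 6558
q19 2001 965 539 539
q20 303 320 190 190
q21 2870 2166 1983 1983
q22 377 341 323 323
"""


def totals(table: str) -> tuple:
    """Return (cold total, hot total) in ms for a 5-column result table."""
    cold_total = 0
    hot_total = 0
    for line in table.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip headers, separators, and total lines
        cold, hot1, hot2 = int(fields[1]), int(fields[2]), int(fields[3])
        cold_total += cold
        # The hot total counts only the faster of the two hot runs.
        hot_total += min(hot1, hot2)
    return cold_total, hot_total


# prints: (104401, 32548)
print(totals(ROUND1))
```

The same function applies to the Round 2 table and to the TPC-DS tables, which share the five-column layout.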

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5571	5487	5449	5449
q2	242	339	243	243
q3	2267	2659	2350	2350
q4	1445	1811	1418	1418
q5	4341	4762	4714	4714
q6	170	165	126	126
q7	2077	2007	1857	1857
q8	2659	2890	2689	2689
q9	7349	7233	7324	7233
q10	3065	3337	2763	2763
q11	609	518	487	487
q12	657	739	612	612
q13	3606	4025	3296	3296
q14	293	296	282	282
q15	535	480	480	480
q16	660	714	667	667
q17	1263	1741	1263	1263
q18	7794	7586	7479	7479
q19	800	851	1069	851
q20	2027	2102	1920	1920
q21	5876	5238	5022	5022
q22	634	626	587	587
Total cold run time: 53940 ms
Total hot run time: 51788 ms

@doris-robot

TPC-DS: Total hot run time: 193144 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bbb25f1b0e644ce1b6368ccaba3ed10d144321ce, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1309	933	949	933
query2	6247	2093	2102	2093
query3	11101	4725	4800	4725
query4	32690	23660	23070	23070
query5	4694	593	458	458
query6	304	199	195	195
query7	3994	493	320	320
query8	300	252	237	237
query9	9296	2656	2635	2635
query10	473	307	277	277
query11	17877	15362	14970	14970
query12	171	116	106	106
query13	1575	528	408	408
query14	9577	7163	7283	7163
query15	243	210	185	185
query16	7572	672	517	517
query17	1550	754	572	572
query18	1449	387	313	313
query19	202	219	174	174
query20	120	114	117	114
query21	214	133	108	108
query22	4606	4550	4389	4389
query23	34401	33484	33503	33484
query24	6430	2361	2369	2361
query25	497	469	407	407
query26	741	279	160	160
query27	2143	505	343	343
query28	5274	2506	2471	2471
query29	550	564	430	430
query30	221	189	161	161
query31	956	895	819	819
query32	84	58	59	58
query33	479	360	316	316
query34	786	899	529	529
query35	832	875	771	771
query36	1020	1060	974	974
query37	127	106	88	88
query38	4405	4158	4247	4158
query39	1492	1455	1456	1455
query40	215	127	110	110
query41	60	57	56	56
query42	124	109	105	105
query43	524	541	499	499
query44	1352	851	865	851
query45	183	186	172	172
query46	879	1066	666	666
query47	1901	1948	1871	1871
query48	374	423	333	333
query49	715	551	436	436
query50	642	699	409	409
query51	4337	4355	4311	4311
query52	104	100	96	96
query53	248	278	192	192
query54	508	529	448	448
query55	89	79	88	79
query56	283	287	251	251
query57	1188	1184	1139	1139
query58	252	236	249	236
query59	3236	3204	3027	3027
query60	289	286	265	265
query61	126	124	121	121
query62	808	736	692	692
query63	236	205	192	192
query64	2993	1057	693	693
query65	3473	3272	3299	3272
query66	759	397	295	295
query67	16158	15588	15530	15530
query68	8486	902	555	555
query69	477	288	257	257
query70	1185	1170	1105	1105
query71	438	279	256	256
query72	5830	3961	4086	3961
query73	655	756	370	370
query74	10293	9180	8833	8833
query75	3975	3170	2656	2656
query76	3552	1177	781	781
query77	762	434	275	275
query78	9942	10230	9266	9266
query79	3655	843	588	588
query80	692	516	450	450
query81	507	276	242	242
query82	656	148	125	125
query83	200	183	154	154
query84	285	100	71	71
query85	767	363	311	311
query86	364	317	277	277
query87	4412	4441	4391	4391
query88	4789	2181	2141	2141
query89	417	335	287	287
query90	1805	191	191	191
query91	135	141	106	106
query92	65	57	50	50
query93	2322	928	546	546
query94	662	417	298	298
query95	338	263	261	261
query96	493	626	284	284
query97	3382	3396	3310	3310
query98	217	200	191	191
query99	1506	1405	1261	1261
Total cold run time: 291322 ms
Total hot run time: 193144 ms

@doris-robot

ClickBench: Total hot run time: 31.37 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bbb25f1b0e644ce1b6368ccaba3ed10d144321ce, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.04	0.03
query2	0.08	0.05	0.05
query3	0.23	0.05	0.06
query4	1.65	0.08	0.09
query5	0.55	0.55	0.54
query6	1.19	0.72	0.74
query7	0.02	0.01	0.02
query8	0.07	0.05	0.05
query9	0.56	0.50	0.51
query10	0.56	0.56	0.56
query11	0.17	0.12	0.13
query12	0.15	0.12	0.13
query13	0.61	0.61	0.59
query14	2.71	2.86	2.88
query15	0.92	0.86	0.83
query16	0.38	0.39	0.40
query17	1.04	1.04	1.07
query18	0.19	0.18	0.19
query19	1.94	1.88	1.97
query20	0.01	0.01	0.01
query21	15.41	0.98	0.65
query22	0.77	0.80	0.72
query23	14.96	1.50	0.68
query24	2.22	0.36	0.23
query25	0.15	0.09	0.08
query26	0.27	0.19	0.17
query27	0.08	0.08	0.08
query28	13.42	1.29	0.55
query29	12.66	4.11	3.41
query30	0.24	0.08	0.07
query31	2.85	0.62	0.40
query32	3.23	0.59	0.49
query33	3.01	3.04	3.05
query34	16.43	5.20	4.52
query35	4.64	4.68	4.58
query36	0.62	0.49	0.48
query37	0.20	0.16	0.16
query38	0.15	0.15	0.15
query39	0.05	0.05	0.04
query40	0.18	0.14	0.12
query41	0.09	0.05	0.05
query42	0.07	0.05	0.05
query43	0.05	0.04	0.05
Total cold run time: 104.81 s
Total hot run time: 31.37 s

@mrhhsg force-pushed the spill_rebased branch 2 times, most recently from 9dce485 to ccc257c on February 14, 2025 08:08
@mrhhsg
Member Author

mrhhsg commented Feb 14, 2025

run buildall

@mrhhsg
Member Author

mrhhsg commented Feb 14, 2025

run buildall

@doris-robot

TeamCity cloud ut coverage result:
Function Coverage: 82.25% (1061/1290)
Line Coverage: 65.79% (17575/26713)
Region Coverage: 65.34% (8662/13257)
Branch Coverage: 55.21% (4666/8452)
Coverage Report: http://coverage.selectdb-in.cc/coverage/6613edc989fbdcfcdc3ba88dc6198c37980bf465_6613edc989fbdcfcdc3ba88dc6198c37980bf465_cloud/report/index.html
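
Each percentage in the coverage summary above is the covered count divided by the total count. As a small sanity check, the sketch below parses the four summary lines and recomputes the ratios; `COVERAGE_LINE`, `parse_coverage`, and `REPORT` are names invented for this example and are not part of the TeamCity tooling.

```python
import re

# Matches lines such as "Line Coverage: 65.79% (17575/26713)".
COVERAGE_LINE = re.compile(
    r"^(?P<kind>\w+) Coverage: (?P<pct>[\d.]+)% \((?P<hit>\d+)/(?P<total>\d+)\)$"
)

REPORT = """\
Function Coverage: 82.25% (1061/1290)
Line Coverage: 65.79% (17575/26713)
Region Coverage: 65.34% (8662/13257)
Branch Coverage: 55.21% (4666/8452)
"""


def parse_coverage(report: str) -> dict:
    """Map coverage kind to (reported percent, covered count, total count)."""
    result = {}
    for line in report.splitlines():
        match = COVERAGE_LINE.match(line.strip())
        if match:
            result[match["kind"]] = (
                float(match["pct"]),
                int(match["hit"]),
                int(match["total"]),
            )
    return result


for kind, (pct, hit, total) in parse_coverage(REPORT).items():
    recomputed = 100.0 * hit / total
    print(f"{kind}: reported {pct:.2f}%, recomputed {recomputed:.2f}%")
```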

@doris-robot

TPC-H: Total hot run time: 31551 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6613edc989fbdcfcdc3ba88dc6198c37980bf465, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17623	5279	5148	5148
q2	2047	305	171	171
q3	10390	1283	749	749
q4	10209	1021	521	521
q5	7539	2319	2361	2319
q6	193	175	137	137
q7	903	756	613	613
q8	9340	1372	1151	1151
q9	4957	4651	4760	4651
q10	6828	2292	1875	1875
q11	462	279	262	262
q12	344	354	219	219
q13	17765	3713	3095	3095
q14	221	227	224	224
q15	523	471	457	457
q16	655	619	580	580
q17	597	874	341	341
q18	7003	6207	6080	6080
q19	1439	967	538	538
q20	311	332	194	194
q21	2825	2246	1926	1926
q22	370	334	300	300
Total cold run time: 102544 ms
Total hot run time: 31551 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5190	5131	5161	5131
q2	239	335	226	226
q3	2159	2721	2328	2328
q4	1483	1817	1363	1363
q5	4299	4157	4185	4157
q6	209	165	126	126
q7	1900	1841	1740	1740
q8	2631	2643	2549	2549
q9	7247	7187	7133	7133
q10	3009	3273	2801	2801
q11	585	539	485	485
q12	701	788	657	657
q13	3477	3930	3270	3270
q14	270	307	266	266
q15	511	476	446	446
q16	667	698	652	652
q17	1135	1590	1369	1369
q18	7509	7357	7403	7357
q19	807	868	939	868
q20	1984	2009	1873	1873
q21	5540	5064	4745	4745
q22	618	579	537	537
Total cold run time: 52170 ms
Total hot run time: 50079 ms

@doris-robot

TPC-DS: Total hot run time: 190786 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6613edc989fbdcfcdc3ba88dc6198c37980bf465, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1308	976	931	931
query2	6249	1885	1960	1885
query3	10964	4425	4594	4425
query4	56741	25630	23329	23329
query5	5212	496	477	477
query6	383	194	187	187
query7	5142	503	281	281
query8	331	242	236	236
query9	6586	2627	2620	2620
query10	451	313	266	266
query11	15100	15137	14891	14891
query12	156	107	102	102
query13	1214	521	408	408
query14	10124	6895	6266	6266
query15	221	200	177	177
query16	7137	624	470	470
query17	1088	742	580	580
query18	1601	422	322	322
query19	206	209	177	177
query20	126	130	126	126
query21	214	130	117	117
query22	4360	4633	4371	4371
query23	34241	33551	33297	33297
query24	5817	2466	2431	2431
query25	467	502	395	395
query26	697	279	152	152
query27	1821	481	337	337
query28	2896	2528	2490	2490
query29	585	565	462	462
query30	226	203	171	171
query31	901	900	790	790
query32	72	61	62	61
query33	485	370	295	295
query34	761	859	510	510
query35	804	817	751	751
query36	952	1010	906	906
query37	117	102	76	76
query38	4204	4387	4153	4153
query39	1478	1483	1450	1450
query40	228	121	105	105
query41	55	52	51	51
query42	123	112	99	99
query43	511	533	497	497
query44	1319	823	829	823
query45	178	175	170	170
query46	865	1066	655	655
query47	1816	1874	1833	1833
query48	388	419	309	309
query49	705	531	429	429
query50	726	760	420	420
query51	4261	4284	4245	4245
query52	108	105	97	97
query53	235	272	201	201
query54	481	509	427	427
query55	88	80	82	80
query56	314	273	269	269
query57	1174	1179	1165	1165
query58	248	244	249	244
query59	2959	3047	2852	2852
query60	281	284	272	272
query61	128	123	124	123
query62	745	743	713	713
query63	226	193	195	193
query64	1771	1065	699	699
query65	3355	3161	3149	3149
query66	782	395	298	298
query67	15884	15456	15246	15246
query68	7510	866	501	501
query69	536	288	264	264
query70	1224	1121	1083	1083
query71	489	300	260	260
query72	5770	3666	3782	3666
query73	1317	734	359	359
query74	8988	9099	8946	8946
query75	3705	3160	2698	2698
query76	4169	1167	730	730
query77	613	358	270	270
query78	10148	10086	9345	9345
query79	2949	816	582	582
query80	709	516	476	476
query81	522	280	244	244
query82	528	123	96	96
query83	312	171	211	171
query84	290	98	75	75
query85	801	349	312	312
query86	415	306	303	303
query87	4531	4463	4420	4420
query88	3793	2207	2184	2184
query89	404	317	288	288
query90	1822	194	191	191
query91	139	144	109	109
query92	69	58	56	56
query93	2113	1049	575	575
query94	662	401	303	303
query95	349	264	247	247
query96	476	557	268	268
query97	3329	3431	3259	3259
query98	229	209	204	204
query99	1442	1419	1266	1266
Total cold run time: 301745 ms
Total hot run time: 190786 ms

@doris-robot

ClickBench: Total hot run time: 31.46 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6613edc989fbdcfcdc3ba88dc6198c37980bf465, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.03
query2	0.10	0.05	0.05
query3	0.28	0.05	0.06
query4	1.61	0.07	0.08
query5	0.55	0.55	0.54
query6	1.20	0.73	0.74
query7	0.02	0.02	0.02
query8	0.05	0.05	0.05
query9	0.63	0.53	0.53
query10	0.58	0.58	0.58
query11	0.26	0.12	0.12
query12	0.24	0.12	0.13
query13	0.62	0.62	0.61
query14	2.72	2.80	2.81
query15	0.98	0.88	0.86
query16	0.37	0.37	0.39
query17	1.07	1.00	1.03
query18	0.18	0.17	0.19
query19	1.93	1.92	1.99
query20	0.02	0.01	0.02
query21	15.36	0.97	0.66
query22	0.92	1.04	0.75
query23	14.71	1.46	0.73
query24	7.49	0.86	0.38
query25	0.18	0.09	0.09
query26	0.62	0.23	0.18
query27	0.09	0.08	0.09
query28	11.07	1.13	0.54
query29	12.57	4.13	3.42
query30	0.27	0.08	0.06
query31	2.81	0.60	0.41
query32	3.22	0.58	0.50
query33	3.02	3.06	3.06
query34	16.56	5.08	4.40
query35	4.52	4.42	4.53
query36	0.63	0.51	0.50
query37	0.21	0.17	0.17
query38	0.17	0.16	0.15
query39	0.05	0.04	0.04
query40	0.20	0.15	0.15
query41	0.10	0.06	0.05
query42	0.06	0.04	0.05
query43	0.05	0.05	0.05
Total cold run time: 108.33 s
Total hot run time: 31.46 s

@mrhhsg
Member Author

mrhhsg commented Feb 15, 2025

run buildall

@doris-robot

TeamCity cloud ut coverage result:
Function Coverage: 82.25% (1061/1290)
Line Coverage: 65.73% (17579/26746)
Region Coverage: 65.27% (8662/13271)
Branch Coverage: 55.15% (4667/8462)
Coverage Report: http://coverage.selectdb-in.cc/coverage/85b6160dc75487d3d60304943d05388aa020513b_85b6160dc75487d3d60304943d05388aa020513b_cloud/report/index.html

@doris-robot

TPC-H: Total hot run time: 31894 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 85b6160dc75487d3d60304943d05388aa020513b, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17683	5208	5524	5208
q2	2055	310	168	168
q3	10802	1254	747	747
q4	10276	1013	539	539
q5	8177	2387	2361	2361
q6	189	173	133	133
q7	896	755	607	607
q8	9320	1305	1097	1097
q9	4888	4739	4866	4739
q10	6817	2322	1886	1886
q11	478	282	262	262
q12	362	389	218	218
q13	17777	3707	3090	3090
q14	247	242	212	212
q15	513	478	472	472
q16	629	605	585	585
q17	588	863	349	349
q18	6856	6162	6235	6162
q19	1896	965	552	552
q20	314	322	195	195
q21	2852	2227	1988	1988
q22	370	337	324	324
Total cold run time: 103985 ms
Total hot run time: 31894 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5183	5104	5129	5104
q2	236	335	230	230
q3	2150	2666	2322	2322
q4	1482	1825	1371	1371
q5	4240	4143	4188	4143
q6	254	167	126	126
q7	1902	1877	1751	1751
q8	2617	2680	2586	2586
q9	7204	7146	7173	7146
q10	3033	3224	2797	2797
q11	572	511	495	495
q12	698	788	596	596
q13	3321	4008	3315	3315
q14	291	304	270	270
q15	509	469	450	450
q16	622	703	653	653
q17	1159	1635	1310	1310
q18	7653	7420	7355	7355
q19	800	779	836	779
q20	1984	2052	1866	1866
q21	5582	5059	4778	4778
q22	642	571	534	534
Total cold run time: 52134 ms
Total hot run time: 49977 ms

@doris-robot

TPC-DS: Total hot run time: 190712 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 85b6160dc75487d3d60304943d05388aa020513b, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1324	948	949	948
query2	6155	1907	1847	1847
query3	11134	4759	4628	4628
query4	54449	25816	23279	23279
query5	5161	554	469	469
query6	328	194	207	194
query7	4883	507	293	293
query8	319	254	228	228
query9	5801	2652	2645	2645
query10	434	311	246	246
query11	15537	15038	14910	14910
query12	162	111	110	110
query13	1045	511	395	395
query14	10523	6345	6785	6345
query15	208	204	188	188
query16	7103	689	499	499
query17	1093	718	587	587
query18	1577	429	326	326
query19	211	196	219	196
query20	121	126	121	121
query21	216	126	109	109
query22	4546	4408	4307	4307
query23	34005	33274	33393	33274
query24	5897	2455	2415	2415
query25	466	458	402	402
query26	728	277	153	153
query27	1840	491	331	331
query28	2778	2531	2492	2492
query29	570	559	427	427
query30	214	187	157	157
query31	888	867	816	816
query32	78	64	63	63
query33	450	372	303	303
query34	773	874	492	492
query35	821	819	731	731
query36	951	1004	902	902
query37	121	96	68	68
query38	4175	4303	4113	4113
query39	1516	1418	1466	1418
query40	227	116	110	110
query41	52	49	49	49
query42	121	106	107	106
query43	504	525	482	482
query44	1387	810	817	810
query45	179	178	173	173
query46	886	1078	672	672
query47	1847	1874	1805	1805
query48	406	426	315	315
query49	704	538	415	415
query50	721	769	437	437
query51	4369	4274	4286	4274
query52	103	102	94	94
query53	229	262	196	196
query54	492	487	420	420
query55	87	82	80	80
query56	275	281	267	267
query57	1183	1191	1122	1122
query58	253	239	242	239
query59	2764	2914	2662	2662
query60	291	269	272	269
query61	122	126	120	120
query62	744	720	689	689
query63	233	188	188	188
query64	2019	1085	663	663
query65	3283	3234	3214	3214
query66	724	397	298	298
query67	16149	15681	15460	15460
query68	7075	897	510	510
query69	531	293	270	270
query70	1191	1148	1122	1122
query71	495	294	260	260
query72	6099	3600	3737	3600
query73	1457	736	355	355
query74	8952	8979	9062	8979
query75	3782	3144	2697	2697
query76	4206	1182	741	741
query77	602	360	282	282
query78	10158	10133	9294	9294
query79	2241	832	584	584
query80	632	574	436	436
query81	503	279	237	237
query82	613	124	107	107
query83	171	177	148	148
query84	282	98	75	75
query85	757	350	300	300
query86	378	290	301	290
query87	4383	4403	4351	4351
query88	3635	2211	2199	2199
query89	405	318	287	287
query90	1890	192	191	191
query91	135	144	107	107
query92	69	60	62	60
query93	1574	1054	585	585
query94	646	384	297	297
query95	346	263	252	252
query96	476	557	268	268
query97	3329	3410	3336	3336
query98	224	214	204	204
query99	1455	1360	1232	1232
Total cold run time: 297647 ms
Total hot run time: 190712 ms

@doris-robot

ClickBench: Total hot run time: 31.48 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 85b6160dc75487d3d60304943d05388aa020513b, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.03
query2	0.10	0.04	0.04
query3	0.28	0.06	0.06
query4	1.61	0.08	0.07
query5	0.55	0.57	0.56
query6	1.20	0.73	0.72
query7	0.02	0.01	0.02
query8	0.05	0.05	0.04
query9	0.62	0.52	0.52
query10	0.57	0.58	0.56
query11	0.25	0.12	0.13
query12	0.25	0.13	0.13
query13	0.63	0.62	0.61
query14	2.70	2.67	2.67
query15	1.00	0.87	0.88
query16	0.38	0.38	0.37
query17	1.03	1.05	1.05
query18	0.18	0.19	0.17
query19	1.90	1.78	2.02
query20	0.01	0.01	0.01
query21	15.35	0.97	0.65
query22	0.93	1.00	0.80
query23	14.70	1.50	0.72
query24	7.62	0.88	0.39
query25	0.17	0.10	0.09
query26	0.61	0.23	0.18
query27	0.08	0.08	0.08
query28	11.05	1.18	0.56
query29	12.60	4.07	3.46
query30	0.28	0.08	0.07
query31	2.81	0.60	0.41
query32	3.23	0.58	0.50
query33	2.99	3.07	3.07
query34	16.56	5.14	4.46
query35	4.51	4.51	4.50
query36	0.63	0.50	0.49
query37	0.21	0.18	0.16
query38	0.17	0.16	0.16
query39	0.05	0.04	0.05
query40	0.20	0.16	0.16
query41	0.10	0.05	0.05
query42	0.07	0.06	0.05
query43	0.05	0.04	0.05
Total cold run time: 108.34 s
Total hot run time: 31.48 s

@mrhhsg
Member Author

mrhhsg commented Feb 15, 2025

run buildall

@doris-robot

TeamCity cloud ut coverage result:
Function Coverage: 82.25% (1061/1290)
Line Coverage: 65.77% (17590/26746)
Region Coverage: 65.32% (8668/13271)
Branch Coverage: 55.25% (4675/8462)
Coverage Report: http://coverage.selectdb-in.cc/coverage/2b1cf1157d3b6e4b28be72dd5337d6ab6905237e_2b1cf1157d3b6e4b28be72dd5337d6ab6905237e_cloud/report/index.html

@doris-robot

TPC-H: Total hot run time: 31885 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2b1cf1157d3b6e4b28be72dd5337d6ab6905237e, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17575	5235	5180	5180
q2	2050	300	178	178
q3	10824	1240	781	781
q4	10351	1011	544	544
q5	9154	2370	2386	2370
q6	196	167	133	133
q7	888	759	610	610
q8	9321	1342	1125	1125
q9	4891	4829	4799	4799
q10	6832	2308	1886	1886
q11	467	294	263	263
q12	347	355	221	221
q13	17774	3706	3043	3043
q14	225	223	208	208
q15	522	474	465	465
q16	646	628	581	581
q17	590	849	347	347
q18	6746	6160	6118	6118
q19	1444	945	543	543
q20	320	343	192	192
q21	2926	2198	1986	1986
q22	371	339	312	312
Total cold run time: 104460 ms
Total hot run time: 31885 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5665	5170	5172	5170
q2	236	324	230	230
q3	2153	2675	2315	2315
q4	1456	1819	1403	1403
q5	4206	4101	4168	4101
q6	210	166	124	124
q7	1866	1918	1757	1757
q8	2615	2701	2597	2597
q9	7200	7216	7126	7126
q10	3052	3232	2766	2766
q11	588	507	488	488
q12	692	766	602	602
q13	3377	3931	3254	3254
q14	268	308	276	276
q15	515	467	469	467
q16	644	689	654	654
q17	1141	1584	1357	1357
q18	7514	7470	7456	7456
q19	825	806	875	806
q20	1996	2011	1869	1869
q21	5553	5097	4683	4683
q22	667	589	543	543
Total cold run time: 52439 ms
Total hot run time: 50044 ms

@doris-robot

TPC-DS: Total hot run time: 190515 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2b1cf1157d3b6e4b28be72dd5337d6ab6905237e, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1301	948	947	947
query2	6096	1860	1874	1860
query3	11144	4453	4536	4453
query4	55170	24632	22971	22971
query5	5291	513	491	491
query6	390	208	193	193
query7	5273	534	292	292
query8	335	264	238	238
query9	7272	2641	2635	2635
query10	458	314	261	261
query11	15359	15241	14904	14904
query12	153	107	106	106
query13	1257	522	392	392
query14	10699	6645	6558	6558
query15	196	190	183	183
query16	6972	692	475	475
query17	1062	701	566	566
query18	1525	409	310	310
query19	191	187	153	153
query20	123	131	120	120
query21	209	124	105	105
query22	4611	4547	4423	4423
query23	34054	33410	33462	33410
query24	5608	2461	2409	2409
query25	451	460	391	391
query26	649	283	161	161
query27	1747	510	346	346
query28	2765	2519	2459	2459
query29	613	570	451	451
query30	214	198	162	162
query31	891	880	797	797
query32	76	63	68	63
query33	464	355	316	316
query34	748	870	512	512
query35	814	837	763	763
query36	941	992	914	914
query37	119	108	75	75
query38	4238	4193	4124	4124
query39	1523	1474	1518	1474
query40	222	125	110	110
query41	57	60	56	56
query42	121	106	108	106
query43	516	517	496	496
query44	1290	814	808	808
query45	193	176	169	169
query46	912	1057	653	653
query47	1846	1886	1792	1792
query48	390	412	324	324
query49	697	531	447	447
query50	750	749	417	417
query51	4342	4322	4294	4294
query52	111	106	97	97
query53	241	257	191	191
query54	487	487	415	415
query55	78	80	81	80
query56	262	286	257	257
query57	1178	1184	1146	1146
query58	278	246	249	246
query59	2751	2923	2767	2767
query60	286	273	272	272
query61	120	119	121	119
query62	702	737	677	677
query63	235	193	187	187
query64	1402	1056	681	681
query65	3291	3236	3233	3233
query66	760	392	296	296
query67	15866	15536	15203	15203
query68	6662	878	502	502
query69	549	342	269	269
query70	1203	1094	1116	1094
query71	487	297	262	262
query72	5955	3637	3654	3637
query73	1181	741	354	354
query74	9199	9148	8971	8971
query75	3677	3158	2712	2712
query76	4246	1193	749	749
query77	615	355	284	284
query78	10035	9978	9241	9241
query79	3725	822	570	570
query80	699	537	446	446
query81	510	274	238	238
query82	658	126	95	95
query83	314	171	151	151
query84	291	95	72	72
query85	795	348	384	348
query86	368	297	278	278
query87	4502	4580	4218	4218
query88	2833	2248	2223	2223
query89	434	314	289	289
query90	1968	190	191	190
query91	135	136	105	105
query92	77	60	56	56
query93	2204	1051	578	578
query94	683	410	302	302
query95	341	269	257	257
query96	483	561	276	276
query97	3336	3414	3265	3265
query98	214	212	205	205
query99	1452	1411	1276	1276
Total cold run time: 300217 ms
Total hot run time: 190515 ms

@mrhhsg
Member Author

mrhhsg commented Feb 16, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 31730 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 97d1d4dfd5676b60718c5ef3049376e767cec281, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17603	5471	5104	5104
q2	2063	302	171	171
q3	10474	1249	760	760
q4	10219	1010	544	544
q5	7570	2469	2301	2301
q6	201	169	139	139
q7	923	770	632	632
q8	9306	1352	1170	1170
q9	4944	4711	4680	4680
q10	6843	2335	1916	1916
q11	464	277	254	254
q12	354	365	232	232
q13	17765	3776	3096	3096
q14	233	239	223	223
q15	527	470	472	470
q16	630	628	579	579
q17	585	855	347	347
q18	7023	6225	6138	6138
q19	1592	952	541	541
q20	323	326	196	196
q21	2948	2107	1934	1934
q22	380	341	303	303
Total cold run time: 102970 ms
Total hot run time: 31730 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5158	5122	5117	5117
q2	247	330	232	232
q3	2170	2692	2364	2364
q4	1451	1854	1371	1371
q5	4245	4114	4166	4114
q6	209	171	130	130
q7	1874	1832	1793	1793
q8	2612	2582	2599	2582
q9	7223	7185	7197	7185
q10	3020	3189	2803	2803
q11	606	516	510	510
q12	707	796	641	641
q13	3365	3992	4072	3992
q14	283	299	274	274
q15	511	482	470	470
q16	660	682	666	666
q17	1157	1645	1325	1325
q18	7593	7367	7350	7350
q19	801	821	920	821
q20	1965	2021	1881	1881
q21	5580	5089	4821	4821
q22	648	604	555	555
Total cold run time: 52085 ms
Total hot run time: 50997 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 43.93% (11775/26804)
Line Coverage: 33.81% (98697/291920)
Region Coverage: 32.96% (50418/152948)
Branch Coverage: 28.57% (25290/88520)
Coverage Report: http://coverage.selectdb-in.cc/coverage/97d1d4dfd5676b60718c5ef3049376e767cec281_97d1d4dfd5676b60718c5ef3049376e767cec281/report/index.html

@doris-robot

TPC-DS: Total hot run time: 191307 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 97d1d4dfd5676b60718c5ef3049376e767cec281, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1285	940	939	939
query2	6234	1968	1981	1968
query3	11165	4480	4618	4480
query4	56700	25340	23466	23466
query5	5102	546	523	523
query6	394	210	188	188
query7	5377	508	295	295
query8	328	244	231	231
query9	7056	2620	2638	2620
query10	435	323	262	262
query11	15308	15133	14861	14861
query12	156	103	104	103
query13	1274	520	383	383
query14	10554	6700	6634	6634
query15	200	191	182	182
query16	7052	670	504	504
query17	1097	688	563	563
query18	1505	403	328	328
query19	212	194	171	171
query20	134	124	120	120
query21	208	122	110	110
query22	4348	4451	4424	4424
query23	33911	33367	33395	33367
query24	5707	2426	2435	2426
query25	501	498	443	443
query26	654	304	168	168
query27	1623	514	352	352
query28	2903	2511	2489	2489
query29	621	578	469	469
query30	216	195	160	160
query31	899	875	825	825
query32	74	74	62	62
query33	493	373	316	316
query34	728	864	505	505
query35	827	851	768	768
query36	975	996	888	888
query37	124	101	79	79
query38	4172	4213	4239	4213
query39	1464	1443	1473	1443
query40	208	117	105	105
query41	52	51	53	51
query42	121	105	112	105
query43	510	537	511	511
query44	1306	835	818	818
query45	184	178	163	163
query46	910	1095	659	659
query47	1895	1846	1799	1799
query48	408	434	326	326
query49	704	524	436	436
query50	708	731	417	417
query51	4282	4407	4268	4268
query52	106	103	101	101
query53	238	258	190	190
query54	485	489	428	428
query55	85	81	82	81
query56	270	273	263	263
query57	1155	1200	1113	1113
query58	246	238	241	238
query59	2720	2830	2681	2681
query60	280	264	256	256
query61	122	121	120	120
query62	766	717	710	710
query63	238	189	201	189
query64	1410	1047	696	696
query65	3253	3121	3141	3121
query66	785	427	306	306
query67	15930	15595	15179	15179
query68	8073	911	532	532
query69	563	294	264	264
query70	1241	1136	1147	1136
query71	503	311	263	263
query72	5673	3573	3769	3573
query73	1120	751	369	369
query74	9159	8866	8867	8866
query75	3769	3213	2700	2700
query76	4378	1165	742	742
query77	599	378	288	288
query78	9989	10257	9324	9324
query79	2211	887	623	623
query80	643	525	453	453
query81	492	274	233	233
query82	670	127	96	96
query83	175	174	165	165
query84	285	95	74	74
query85	789	350	303	303
query86	372	338	300	300
query87	4432	4465	4315	4315
query88	3407	2236	2190	2190
query89	436	329	305	305
query90	1982	200	192	192
query91	138	145	111	111
query92	77	62	56	56
query93	1156	1024	592	592
query94	675	395	306	306
query95	349	269	262	262
query96	507	548	281	281
query97	3331	3412	3242	3242
query98	242	201	196	196
query99	1543	1383	1297	1297
Total cold run time: 300372 ms
Total hot run time: 191307 ms

@doris-robot

ClickBench: Total hot run time: 31.08 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 97d1d4dfd5676b60718c5ef3049376e767cec281, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.03
query2	0.10	0.04	0.05
query3	0.28	0.05	0.05
query4	1.61	0.07	0.07
query5	0.55	0.53	0.54
query6	1.20	0.72	0.73
query7	0.02	0.02	0.01
query8	0.06	0.05	0.04
query9	0.62	0.51	0.52
query10	0.60	0.58	0.57
query11	0.26	0.12	0.12
query12	0.25	0.13	0.13
query13	0.63	0.61	0.62
query14	2.64	2.70	2.69
query15	1.01	0.87	0.88
query16	0.38	0.37	0.38
query17	1.04	1.04	1.03
query18	0.19	0.18	0.20
query19	1.93	1.78	1.98
query20	0.01	0.01	0.01
query21	15.37	0.94	0.65
query22	0.92	1.03	0.79
query23	14.70	1.46	0.72
query24	5.53	0.54	0.27
query25	0.17	0.09	0.08
query26	0.56	0.22	0.18
query27	0.09	0.08	0.08
query28	10.98	1.20	0.54
query29	12.57	4.07	3.35
query30	0.28	0.09	0.06
query31	2.84	0.61	0.41
query32	3.22	0.58	0.50
query33	3.13	3.01	3.24
query34	16.54	5.11	4.46
query35	4.57	4.47	4.54
query36	0.63	0.51	0.50
query37	0.20	0.17	0.16
query38	0.17	0.15	0.14
query39	0.04	0.04	0.04
query40	0.19	0.17	0.17
query41	0.11	0.06	0.05
query42	0.07	0.05	0.05
query43	0.05	0.05	0.04
Total cold run time: 106.35 s
Total hot run time: 31.08 s

@mrhhsg
Member Author

mrhhsg commented Feb 21, 2025

run buildall

@doris-robot

TeamCity cloud ut coverage result:
Function Coverage: 82.25% (1061/1290)
Line Coverage: 65.74% (17584/26746)
Region Coverage: 65.28% (8663/13271)
Branch Coverage: 55.21% (4672/8462)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c76b210304f78c8cfc67e907ece3ce30526044b6_c76b210304f78c8cfc67e907ece3ce30526044b6_cloud/report/index.html

@doris-robot

TPC-H: Total hot run time: 31701 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c76b210304f78c8cfc67e907ece3ce30526044b6, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17600	5314	5094	5094
q2	2057	302	181	181
q3	10475	1305	734	734
q4	10244	1053	547	547
q5	7610	2523	2311	2311
q6	196	167	130	130
q7	914	741	623	623
q8	9314	1357	1190	1190
q9	4940	4631	4659	4631
q10	6840	2322	1894	1894
q11	486	275	260	260
q12	356	359	218	218
q13	17764	3663	3058	3058
q14	235	228	217	217
q15	507	475	475	475
q16	637	615	590	590
q17	602	887	356	356
q18	6531	6221	6175	6175
q19	2000	980	564	564
q20	308	324	183	183
q21	2837	2218	1968	1968
q22	360	333	302	302
Total cold run time: 102813 ms
Total hot run time: 31701 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5239	5177	5100	5100
q2	247	337	230	230
q3	2155	2687	2316	2316
q4	1438	1870	1384	1384
q5	4271	4175	4250	4175
q6	206	167	122	122
q7	1872	1796	1762	1762
q8	2645	2697	2590	2590
q9	7333	7140	7221	7140
q10	2998	3176	2784	2784
q11	579	515	480	480
q12	677	764	639	639
q13	3518	3938	3240	3240
q14	296	292	264	264
q15	499	462	460	460
q16	630	686	622	622
q17	1129	1583	1360	1360
q18	7738	7475	7392	7392
q19	805	887	902	887
q20	1993	2029	1867	1867
q21	5394	5071	4847	4847
q22	633	599	535	535
Total cold run time: 52295 ms
Total hot run time: 50196 ms

@doris-robot

TPC-DS: Total hot run time: 183923 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c76b210304f78c8cfc67e907ece3ce30526044b6, data reload: false

query1	977	381	389	381
query2	6566	1909	1852	1852
query3	6794	223	213	213
query4	26463	23403	23478	23403
query5	4358	632	514	514
query6	289	193	186	186
query7	4595	509	299	299
query8	276	235	217	217
query9	8586	2516	2524	2516
query10	460	311	264	264
query11	15634	15114	14966	14966
query12	154	111	104	104
query13	1646	519	379	379
query14	9382	6391	6175	6175
query15	211	198	180	180
query16	7207	644	489	489
query17	1212	737	569	569
query18	1964	401	305	305
query19	200	191	161	161
query20	123	115	120	115
query21	208	127	106	106
query22	4201	4210	4209	4209
query23	33941	32815	33044	32815
query24	7922	2381	2418	2381
query25	564	482	420	420
query26	1264	290	165	165
query27	2476	487	351	351
query28	4193	2437	2411	2411
query29	820	572	487	487
query30	235	186	159	159
query31	919	832	767	767
query32	75	65	60	60
query33	565	363	304	304
query34	771	868	494	494
query35	810	821	749	749
query36	956	986	891	891
query37	119	104	73	73
query38	4054	4181	4042	4042
query39	1424	1377	1395	1377
query40	213	123	107	107
query41	54	51	50	50
query42	121	108	103	103
query43	490	494	469	469
query44	1276	797	796	796
query45	174	166	167	166
query46	861	1029	632	632
query47	1755	1809	1738	1738
query48	389	416	325	325
query49	780	520	437	437
query50	691	736	425	425
query51	4151	4149	4117	4117
query52	107	107	96	96
query53	228	258	184	184
query54	483	487	417	417
query55	83	78	78	78
query56	264	268	242	242
query57	1132	1146	1065	1065
query58	252	233	268	233
query59	2608	2741	2477	2477
query60	278	272	263	263
query61	118	119	121	119
query62	774	767	650	650
query63	237	194	190	190
query64	4513	1025	693	693
query65	3238	3126	3143	3126
query66	1137	405	308	308
query67	15887	15651	15198	15198
query68	8365	883	514	514
query69	457	295	265	265
query70	1238	1099	1120	1099
query71	452	292	286	286
query72	5213	3604	3658	3604
query73	742	807	348	348
query74	9207	9097	9008	9008
query75	3590	3199	2750	2750
query76	3643	1167	744	744
query77	813	365	288	288
query78	10004	10132	9235	9235
query79	3287	854	606	606
query80	716	530	453	453
query81	502	274	245	245
query82	663	130	94	94
query83	212	187	164	164
query84	294	100	74	74
query85	802	354	309	309
query86	339	313	290	290
query87	4511	4474	4312	4312
query88	3350	2182	2165	2165
query89	439	319	290	290
query90	1926	193	193	193
query91	134	147	119	119
query92	79	65	57	57
query93	2063	1021	583	583
query94	677	423	308	308
query95	353	271	259	259
query96	502	544	285	285
query97	3294	3336	3273	3273
query98	229	212	206	206
query99	1437	1401	1244	1244
Total cold run time: 274441 ms
Total hot run time: 183923 ms

@doris-robot

ClickBench: Total hot run time: 31.11 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c76b210304f78c8cfc67e907ece3ce30526044b6, data reload: false

query1	0.03	0.04	0.04
query2	0.07	0.03	0.03
query3	0.24	0.07	0.07
query4	1.61	0.10	0.10
query5	0.56	0.55	0.54
query6	1.21	0.72	0.73
query7	0.02	0.02	0.01
query8	0.04	0.03	0.04
query9	0.58	0.54	0.53
query10	0.57	0.57	0.56
query11	0.15	0.11	0.10
query12	0.14	0.11	0.11
query13	0.62	0.59	0.60
query14	2.67	2.80	2.67
query15	0.93	0.88	0.85
query16	0.38	0.38	0.41
query17	1.03	1.04	1.01
query18	0.22	0.20	0.19
query19	1.88	2.00	1.81
query20	0.01	0.01	0.01
query21	15.39	0.87	0.55
query22	0.75	1.21	0.67
query23	14.93	1.39	0.61
query24	7.45	1.21	1.24
query25	0.46	0.26	0.09
query26	0.48	0.16	0.13
query27	0.05	0.04	0.05
query28	10.05	0.81	0.44
query29	12.55	4.06	3.29
query30	0.24	0.09	0.06
query31	2.82	0.58	0.38
query32	3.23	0.55	0.47
query33	3.03	3.04	3.02
query34	15.70	5.11	4.50
query35	4.56	4.49	4.60
query36	0.66	0.49	0.50
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.02	0.02	0.02
query40	0.17	0.13	0.13
query41	0.08	0.02	0.02
query42	0.03	0.03	0.02
query43	0.04	0.04	0.03
Total cold run time: 105.79 s
Total hot run time: 31.11 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 44.56% (11879/26658)
Line Coverage: 34.07% (99401/291788)
Region Coverage: 33.19% (50779/152996)
Branch Coverage: 28.75% (25494/88676)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c76b210304f78c8cfc67e907ece3ce30526044b6_c76b210304f78c8cfc67e907ece3ce30526044b6/report/index.html

"{} exceed limit {} or {} less than low water mark {}", process_memory_used_str(),
MemInfo::mem_limit_str(), sys_mem_available_str(),
"{} exceed limit {} or {} less than low water mark {}",
process_memory_used_details_str(), MemInfo::mem_limit_str(),
Contributor

Using the detailed memory info here may not be appropriate, for two reasons: 1. process_limit_exceeded_errmsg_str is sent to the MySQL client and shown to the user, so printing the details makes the message harder for users to understand (though this may not be a big problem…). 2. The MySQL client limits the length of a line, so a message that is too long gets truncated.

@yiguolei, could you confirm whether we should print the details?

Contributor

Let's print the details for now and trim the message down once spill is stable.

Contributor

TODO:
After printing the details, check whether the MySQL client truncates the error message.

@@ -81,6 +81,11 @@ std::shared_ptr<MemTrackerLimiter> MemTrackerLimiter::create_shared(MemTrackerLi
const std::string& label,
int64_t byte_limit) {
auto tracker = std::make_shared<MemTrackerLimiter>(type, label, byte_limit);
// Write tracker is only used to tracker the size, so limit == -1
auto write_tracker = std::make_shared<MemTrackerLimiter>(type, "Memtable" + label, -1);
Contributor

A single Load task will now have two Type::Load trackers in the global map, which causes a problem in the current Memory GC logic for cancelling a query/load (it is fault-tolerant, so only the logs are affected and the final behavior is correct). The problem goes away once Memory GC is changed to depend on the resource ctx.

Also, since a Load task now has two trackers, MemoryProfile should distinguish Fragment memory from Memtable memory under Load Memory.

Contributor

For now the two are treated as unrelated; when we cancel, the memtable buffer is also not counted as part of the query.

Contributor

TODO:
MemoryProfile should distinguish Fragment memory from Memtable memory under Load Memory.

@@ -278,40 +279,57 @@ inline void ThreadMemTrackerMgr::flush_untracked_mem() {
_stop_consume = false;
}

inline doris::Status ThreadMemTrackerMgr::try_reserve(int64_t size) {
inline doris::Status ThreadMemTrackerMgr::try_reserve(int64_t size,
bool only_check_process_memory) {
Contributor

Why does FlushToken::_try_reserve_memory need only_check_process_memory=true?

If the tracker scoped by FlushToken::_try_reserve_memory has limit = -1 and the workload group ptr is null, try reserve will naturally check only the process memory.

As it stands, supporting only_check_process_memory required adding a new reserve method to both the memory tracker and the workload group, which makes the code look messier.

Contributor

@jacktengg, please take a look.

Contributor

The memtable flush executor is shared by all workload groups on the BE, so memory reserved in it is counted only against the whole BE process, not against any particular workload group or query.
Switching to tracker limit = -1 or workload group ptr = null is fairly risky right now, so let's keep the current approach and optimize it later.
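
To make the two options in this thread concrete, here is a minimal sketch of a reservation check with an `only_check_process_memory` switch. `Budget` and `ReserveContext` are hypothetical stand-ins invented for illustration, not the actual Doris classes, and the real `try_reserve` may differ in both checks and bookkeeping.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for any memory budget (process, workload group, or
// query tracker). A limit of -1 means "unlimited", following the
// `limit == -1` convention mentioned in the thread.
struct Budget {
    int64_t limit = -1;
    int64_t used = 0;

    bool can_reserve(int64_t size) const { return limit < 0 || used + size <= limit; }
    void add(int64_t size) { used += size; }
};

// Hypothetical thread-level reservation context; not Doris's real class.
struct ReserveContext {
    Budget* process = nullptr;         // always checked
    Budget* workload_group = nullptr;  // may be null
    Budget* query_tracker = nullptr;   // may be null

    // Returns true and records the reservation only if every applicable
    // budget has room; on failure no budget is modified.
    bool try_reserve(int64_t size, bool only_check_process_memory) {
        assert(size >= 0);
        assert(process != nullptr);
        // With the flag set, the group and query budgets are skipped entirely.
        Budget* wg = only_check_process_memory ? nullptr : workload_group;
        Budget* qt = only_check_process_memory ? nullptr : query_tracker;

        if (!process->can_reserve(size)) return false;
        if (wg != nullptr && !wg->can_reserve(size)) return false;
        if (qt != nullptr && !qt->can_reserve(size)) return false;

        process->add(size);
        if (wg != nullptr) wg->add(size);
        if (qt != nullptr) qt->add(size);
        return true;
    }
};
```

In this sketch, a context whose tracker has `limit == -1` and whose `workload_group` is null passes the same checks as a call with `only_check_process_memory == true`, which is the equivalence the reviewer points out. One difference remains: the unlimited tracker still has the reservation added to its `used` total, whereas the flag skips it.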

}
// If the query is a pure load task(streamload, routine load, group commit), then it should not use
// memlimit per query to limit their memory usage.
if (is_pure_load_task()) {
Contributor

Limiting the memtable memory of a single Load task is not supported either, so the only way to limit the memory of Load tasks is the Workload Group, right?
In production, users often see load tasks of various types fill up memory.

}

size_t minimum_operator_memory_required_bytes() const {
if (_query_options.__isset.minimum_operator_memory_required_kb) {
Contributor

For operators other than join, agg, and sort, does reserve memory also need to check that this minimum is met? Wouldn't that add complexity? Would it be enough to have only these three operators check it?

return thread_mem_tracker_mgr->try_reserve(size, false);
}

void release_reserved_memory() const { thread_mem_tracker_mgr->shrink_reserved(); }
Contributor

Let's consistently call it shrink_reserved_memory.

@@ -223,6 +224,21 @@ class ThreadContext {
// to nullptr, but the object it points to is not initialized. At this time, when the memory
// is released somewhere, the hook is triggered to cause the crash.
std::unique_ptr<ThreadMemTrackerMgr> thread_mem_tracker_mgr;

[[nodiscard]] std::shared_ptr<MemTrackerLimiter> thread_mem_tracker() const {
return thread_mem_tracker_mgr->limiter_mem_tracker();
Contributor

thread_mem_tracker_mgr was made a public member precisely to avoid adding methods to ThreadContext; external callers currently use thread_context()->thread_mem_tracker_mgr->limiter_mem_tracker() directly. The goal is to keep the ThreadContext class as simple as possible so that it needs no further changes, although this design is open to debate.

@@ -58,6 +61,10 @@ class IOContext : public std::enable_shared_from_this<IOContext> {
shuffle_send_bytes_counter_ = ADD_COUNTER(profile_, "ShuffleSendBytes", TUnit::BYTES);
shuffle_send_rows_counter_ =
ADD_COUNTER(profile_, "ShuffleSendRowsCounter_", TUnit::UNIT);
spill_write_bytes_to_local_storage_counter_ =
ADD_COUNTER(profile_, "SpillWriteBytesToLocalStorage", TUnit::UNIT);
Contributor

The unit should be TUnit::BYTES.

spill_write_bytes_to_local_storage_counter_ =
ADD_COUNTER(profile_, "SpillWriteBytesToLocalStorage", TUnit::UNIT);
spill_read_bytes_from_local_storage_counter_ =
ADD_COUNTER(profile_, "SpillReadBytesFromLocalStorage", TUnit::UNIT);
Contributor

Same as above: the unit should be TUnit::BYTES.
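
As an illustration of why the unit matters for these two counters, the sketch below assumes a profile renderer that picks its formatting from the counter's unit. The `Unit` enum and `pretty_print` function are hypothetical and simplified; they are not the Doris RuntimeProfile API, and the real output format may differ.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical, simplified stand-in for a profile counter unit.
enum class Unit { UNIT, BYTES };

// Formats a counter value for display according to its unit: plain counts
// are printed as-is, byte counts are scaled to a binary suffix.
std::string pretty_print(int64_t value, Unit unit) {
    if (unit == Unit::UNIT) {
        return std::to_string(value);
    }
    static const char* const kSuffixes[] = {"B", "KB", "MB", "GB", "TB"};
    double scaled = static_cast<double>(value);
    int idx = 0;
    while (scaled >= 1024.0 && idx < 4) {
        scaled /= 1024.0;
        ++idx;
    }
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.2f %s", scaled, kSuffixes[idx]);
    return buf;
}
```

With this renderer, a spill counter holding 3145728 shows up as `3145728` when declared with the count unit and as `3.00 MB` when declared with the byte unit, so a byte counter registered with the wrong unit is easy to misread.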

mrhhsg marked this pull request as ready for review February 24, 2025 07:06
@mrhhsg
Member Author

mrhhsg commented Feb 24, 2025

run buildall

@doris-robot

TeamCity cloud ut coverage result:
Function Coverage: 82.25% (1061/1290)
Line Coverage: 65.78% (17593/26746)
Region Coverage: 65.32% (8669/13271)
Branch Coverage: 55.24% (4674/8462)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c76b210304f78c8cfc67e907ece3ce30526044b6_c76b210304f78c8cfc67e907ece3ce30526044b6_cloud/report/index.html

@doris-robot

TPC-H: Total hot run time: 31602 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c76b210304f78c8cfc67e907ece3ce30526044b6, data reload: false

------ Round 1 ----------------------------------
q1	17678	5350	5171	5171
q2	2050	295	177	177
q3	10626	1329	725	725
q4	10325	1026	542	542
q5	8649	2518	2294	2294
q6	206	175	137	137
q7	917	757	595	595
q8	9310	1290	1240	1240
q9	4965	4568	4556	4556
q10	6850	2311	1890	1890
q11	482	277	258	258
q12	353	351	217	217
q13	17780	3677	3121	3121
q14	226	226	209	209
q15	526	459	456	456
q16	640	608	590	590
q17	584	852	339	339
q18	6844	6226	6173	6173
q19	1296	957	541	541
q20	305	323	192	192
q21	2835	2118	1883	1883
q22	376	327	296	296
Total cold run time: 103823 ms
Total hot run time: 31602 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5141	5096	5104	5096
q2	257	334	228	228
q3	2146	2665	2260	2260
q4	1461	1821	1370	1370
q5	4240	4097	4180	4097
q6	206	164	124	124
q7	1875	1836	1779	1779
q8	2620	2581	2540	2540
q9	7259	6986	7153	6986
q10	2984	3225	2828	2828
q11	578	510	476	476
q12	683	742	584	584
q13	3501	3888	3263	3263
q14	277	292	265	265
q15	500	466	464	464
q16	628	682	654	654
q17	1123	1528	1384	1384
q18	7640	7354	7288	7288
q19	796	949	1094	949
q20	2011	1999	1892	1892
q21	5600	5010	4785	4785
q22	613	624	559	559
Total cold run time: 52139 ms
Total hot run time: 49871 ms

1. improve exchange sink low mem mode;
2. improve exchange memory usage counter
3. fix string hash table memory usage;
4. fix deadlock if disable_memory_gc;
5. improve some log prints.
@jacktengg
Contributor

run buildall

@doris-robot

TPC-DS: Total hot run time: 183548 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c76b210304f78c8cfc67e907ece3ce30526044b6, data reload: false

query1	991	396	382	382
query2	6520	1894	1845	1845
query3	6794	220	215	215
query4	26551	23675	23372	23372
query5	4325	673	481	481
query6	313	198	185	185
query7	4610	506	299	299
query8	304	244	235	235
query9	8614	2536	2544	2536
query10	474	328	252	252
query11	15637	15190	14870	14870
query12	161	108	110	108
query13	1674	511	403	403
query14	9854	6665	6266	6266
query15	211	191	174	174
query16	7342	624	491	491
query17	1202	701	570	570
query18	1812	398	321	321
query19	230	191	162	162
query20	124	115	115	115
query21	210	131	110	110
query22	4110	4295	4024	4024
query23	33922	32987	32799	32799
query24	7713	2349	2427	2349
query25	570	454	367	367
query26	1235	272	156	156
query27	2220	501	356	356
query28	3923	2451	2413	2413
query29	722	533	432	432
query30	236	182	169	169
query31	934	839	752	752
query32	74	63	63	63
query33	561	353	307	307
query34	785	849	491	491
query35	783	809	772	772
query36	938	958	891	891
query37	113	96	75	75
query38	4070	4261	4079	4079
query39	1437	1402	1405	1402
query40	201	115	101	101
query41	53	51	51	51
query42	122	104	106	104
query43	498	507	479	479
query44	1263	788	780	780
query45	175	166	159	159
query46	864	1052	643	643
query47	1725	1745	1683	1683
query48	384	419	320	320
query49	770	513	426	426
query50	662	749	404	404
query51	4171	4219	4083	4083
query52	105	107	95	95
query53	224	257	186	186
query54	488	495	409	409
query55	80	78	81	78
query56	260	287	235	235
query57	1105	1113	1059	1059
query58	241	250	235	235
query59	2591	2558	2554	2554
query60	275	284	253	253
query61	119	117	121	117
query62	783	719	651	651
query63	236	186	189	186
query64	4423	1002	712	712
query65	3210	3105	3144	3105
query66	1129	420	313	313
query67	15610	15492	15258	15258
query68	8330	877	522	522
query69	467	293	263	263
query70	1231	1117	1102	1102
query71	466	284	263	263
query72	5391	3592	3671	3592
query73	721	730	339	339
query74	8985	9195	8920	8920
query75	3703	3185	2698	2698
query76	3685	1161	723	723
query77	799	378	275	275
query78	9886	10232	9313	9313
query79	1695	898	615	615
query80	677	562	454	454
query81	485	275	244	244
query82	202	126	104	104
query83	213	182	154	154
query84	294	107	77	77
query85	809	352	309	309
query86	345	315	288	288
query87	4665	4519	4322	4322
query88	2872	2249	2182	2182
query89	403	314	286	286
query90	2076	200	201	200
query91	136	139	107	107
query92	79	59	54	54
query93	1130	1053	580	580
query94	640	419	302	302
query95	352	273	264	264
query96	502	536	276	276
query97	3292	3427	3256	3256
query98	221	204	199	199
query99	1448	1418	1297	1297
Total cold run time: 270266 ms
Total hot run time: 183548 ms

@doris-robot

TeamCity cloud ut coverage result:
Function Coverage: 82.25% (1061/1290)
Line Coverage: 65.78% (17593/26746)
Region Coverage: 65.32% (8668/13271)
Branch Coverage: 55.20% (4671/8462)
Coverage Report: http://coverage.selectdb-in.cc/coverage/ee4b33e6b5a433f1a8f3b9cdcc4b2c8898fe06fa_ee4b33e6b5a433f1a8f3b9cdcc4b2c8898fe06fa_cloud/report/index.html

RETURN_IF_ERROR(disable_runtime_filters(state));
}
}
// if (state->fuzzy_disable_runtime_filter_in_be()) {
Contributor

just delete this

LOG(WARNING) << "PipelineFragmentContext is cancelled due to timeout:";
while (pos < total_size) {
tmp_size = std::min(max_log_size, total_size - pos);
LOG(WARNING) << "===" << std::string(dbg_str.data() + pos, tmp_size);
Contributor

This debug log needs to be deleted.

Contributor

This log is printed only when a query times out, to help debug queries that get stuck. We can keep it for now.
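
For reference, the chunking loop in the quoted diff can be sketched in isolation as follows. This assumes the goal is to keep each log line under a size cap so that a long debug string is not cut off; `split_for_logging` is a hypothetical helper, not a function from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits `text` into consecutive pieces of at most `max_chunk_size` bytes,
// so that each piece can be emitted as its own log line.
std::vector<std::string> split_for_logging(const std::string& text,
                                           std::size_t max_chunk_size) {
    assert(max_chunk_size > 0);
    std::vector<std::string> chunks;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t len = std::min(max_chunk_size, text.size() - pos);
        chunks.emplace_back(text, pos, len);  // copy of text[pos, pos + len)
        pos += len;
    }
    return chunks;
}
```

A caller would loop over the returned pieces and log each one with a common prefix, like the `"==="` marker in the diff, so the fragments can be reassembled when reading the log.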

@doris-robot

TPC-H: Total hot run time: 31860 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ee4b33e6b5a433f1a8f3b9cdcc4b2c8898fe06fa, data reload: false

------ Round 1 ----------------------------------
q1	17666	5353	5084	5084
q2	2044	296	180	180
q3	10402	1297	737	737
q4	10217	1023	540	540
q5	7524	2474	2313	2313
q6	190	166	132	132
q7	906	760	634	634
q8	9320	1338	1237	1237
q9	4897	4689	4645	4645
q10	6841	2326	1906	1906
q11	465	286	268	268
q12	364	365	221	221
q13	17781	3670	3094	3094
q14	223	237	220	220
q15	523	486	465	465
q16	631	613	584	584
q17	595	874	346	346
q18	6916	6287	6223	6223
q19	1211	940	569	569
q20	316	321	206	206
q21	2922	2304	1955	1955
q22	376	339	301	301
Total cold run time: 102330 ms
Total hot run time: 31860 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5124	5115	5100	5100
q2	235	327	235	235
q3	2149	2712	2316	2316
q4	1427	1852	1348	1348
q5	4284	4133	4153	4133
q6	207	160	125	125
q7	1876	1819	1787	1787
q8	2616	2700	2574	2574
q9	7252	7109	7196	7109
q10	2999	3199	2773	2773
q11	579	508	492	492
q12	719	776	638	638
q13	3556	3898	3264	3264
q14	298	306	268	268
q15	518	467	478	467
q16	664	703	648	648
q17	1144	1615	1373	1373
q18	7653	7270	7315	7270
q19	862	848	944	848
q20	1954	2013	1902	1902
q21	5466	5058	4954	4954
q22	619	579	553	553
Total cold run time: 52201 ms
Total hot run time: 50177 ms

@doris-robot

TPC-DS: Total hot run time: 184587 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ee4b33e6b5a433f1a8f3b9cdcc4b2c8898fe06fa, data reload: false

query1	999	381	397	381
query2	6574	1875	1847	1847
query3	6798	221	211	211
query4	25989	23657	23320	23320
query5	4323	670	497	497
query6	310	195	201	195
query7	4612	499	294	294
query8	308	235	219	219
query9	8614	2511	2489	2489
query10	466	300	248	248
query11	15701	15089	14957	14957
query12	152	108	103	103
query13	1647	515	378	378
query14	9462	6321	6609	6321
query15	214	203	176	176
query16	7152	617	497	497
query17	1179	717	575	575
query18	1961	409	302	302
query19	196	190	161	161
query20	119	120	120	120
query21	208	135	103	103
query22	4311	4253	4270	4253
query23	33991	33032	33070	33032
query24	7803	2391	2386	2386
query25	552	488	463	463
query26	1237	280	158	158
query27	2118	501	320	320
query28	3886	2439	2422	2422
query29	750	542	419	419
query30	236	190	160	160
query31	928	875	794	794
query32	73	68	65	65
query33	555	355	304	304
query34	776	855	514	514
query35	813	822	776	776
query36	946	977	878	878
query37	160	97	75	75
query38	4254	4108	4085	4085
query39	1449	1397	1385	1385
query40	212	113	104	104
query41	62	61	61	61
query42	123	109	97	97
query43	486	512	474	474
query44	1317	796	788	788
query45	181	168	162	162
query46	873	1035	633	633
query47	1756	1777	1698	1698
query48	388	421	304	304
query49	803	509	449	449
query50	678	714	439	439
query51	4179	4161	4161	4161
query52	110	106	94	94
query53	223	248	187	187
query54	495	472	427	427
query55	78	78	81	78
query56	259	256	269	256
query57	1132	1125	1065	1065
query58	260	228	256	228
query59	2617	2718	2458	2458
query60	278	322	262	262
query61	131	121	119	119
query62	792	727	666	666
query63	227	201	195	195
query64	4364	1006	673	673
query65	3187	3153	3121	3121
query66	1128	399	301	301
query67	15914	15632	15351	15351
query68	8613	877	516	516
query69	460	293	259	259
query70	1207	1133	1100	1100
query71	449	278	282	278
query72	5423	3640	3750	3640
query73	757	746	363	363
query74	9222	9244	8890	8890
query75	3744	3135	2686	2686
query76	3608	1175	737	737
query77	798	353	287	287
query78	9869	10260	9432	9432
query79	2337	835	610	610
query80	616	532	448	448
query81	485	283	254	254
query82	666	124	98	98
query83	174	174	166	166
query84	247	101	77	77
query85	783	395	308	308
query86	339	299	273	273
query87	4465	4447	4432	4432
query88	3236	2175	2152	2152
query89	405	325	292	292
query90	1931	195	194	194
query91	135	138	114	114
query92	75	63	66	63
query93	1108	1081	579	579
query94	658	469	298	298
query95	347	263	266	263
query96	486	550	262	262
query97	3297	3485	3304	3304
query98	222	206	203	203
query99	1449	1399	1275	1275
Total cold run time: 271686 ms
Total hot run time: 184587 ms

@doris-robot

ClickBench: Total hot run time: 31.4 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ee4b33e6b5a433f1a8f3b9cdcc4b2c8898fe06fa, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.04	0.03
query3	0.23	0.06	0.06
query4	1.63	0.10	0.10
query5	0.56	0.55	0.55
query6	1.20	0.73	0.72
query7	0.02	0.01	0.01
query8	0.05	0.04	0.03
query9	0.58	0.54	0.53
query10	0.56	0.58	0.59
query11	0.15	0.11	0.11
query12	0.15	0.11	0.11
query13	0.61	0.60	0.60
query14	2.78	2.71	2.80
query15	0.92	0.86	0.85
query16	0.38	0.38	0.37
query17	1.02	1.04	1.02
query18	0.22	0.20	0.20
query19	1.89	1.97	1.87
query20	0.01	0.02	0.01
query21	15.36	0.90	0.55
query22	0.78	1.17	0.72
query23	15.09	1.38	0.67
query24	7.29	1.45	1.06
query25	0.51	0.25	0.12
query26	0.50	0.16	0.15
query27	0.05	0.05	0.06
query28	9.22	0.88	0.45
query29	12.61	4.05	3.35
query30	0.27	0.09	0.07
query31	2.81	0.58	0.40
query32	3.25	0.55	0.47
query33	3.01	3.00	2.97
query34	15.72	5.15	4.56
query35	4.53	4.57	4.52
query36	0.67	0.50	0.49
query37	0.10	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.03
query40	0.16	0.14	0.13
query41	0.08	0.02	0.03
query42	0.03	0.02	0.03
query43	0.04	0.04	0.03
Total cold run time: 105.23 s
Total hot run time: 31.4 s

@jacktengg
Contributor

run performance
