[feat](spill) spill and reserve #47462
base: master
Conversation
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall

TPC-H: Total hot run time: 32548 ms
TPC-DS: Total hot run time: 193144 ms
ClickBench: Total hot run time: 31.37 s

Force-pushed from 9dce485 to ccc257c
run buildall

Force-pushed from ccc257c to 6613edc
run buildall

TeamCity cloud ut coverage result:

TPC-H: Total hot run time: 31551 ms
TPC-DS: Total hot run time: 190786 ms
ClickBench: Total hot run time: 31.46 s

Force-pushed from 6613edc to 85b6160
run buildall

TeamCity cloud ut coverage result:

TPC-H: Total hot run time: 31894 ms
TPC-DS: Total hot run time: 190712 ms
ClickBench: Total hot run time: 31.48 s

Force-pushed from 85b6160 to 2b1cf11
run buildall

TeamCity cloud ut coverage result:

TPC-H: Total hot run time: 31885 ms
TPC-DS: Total hot run time: 190515 ms

Force-pushed from 2b1cf11 to 1187a61
run buildall

TPC-H: Total hot run time: 31730 ms
TPC-DS: Total hot run time: 191307 ms
ClickBench: Total hot run time: 31.08 s

TeamCity be ut coverage result:

Force-pushed from 97d1d4d to c76b210
run buildall

TeamCity cloud ut coverage result:

TPC-H: Total hot run time: 31701 ms
TPC-DS: Total hot run time: 183923 ms
ClickBench: Total hot run time: 31.11 s

TeamCity be ut coverage result:
"{} exceed limit {} or {} less than low water mark {}", process_memory_used_str(), | ||
MemInfo::mem_limit_str(), sys_mem_available_str(), | ||
"{} exceed limit {} or {} less than low water mark {}", | ||
process_memory_used_details_str(), MemInfo::mem_limit_str(), |
Using the memory details here may not be appropriate, for two reasons: 1. process_limit_exceeded_errmsg_str is returned to the mysql client and shown to users, and printing the details makes the message harder for users to understand (though that may not be a big problem...). 2. A single line in the mysql client has a length limit, and anything too long will be truncated.
Whether to print the details — @yiguolei could you please confirm?
Let's print it for now; once spill is stable, we can trim it down.
TODO:
After printing the details, check whether the mysql client truncates the error message.
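To make the truncation concern concrete, here is a minimal standalone sketch (not Doris code; the helper name and the 1024-byte cap are assumptions) of capping an error message before it reaches the mysql client:

```cpp
// Illustrative only: cap a long error message before returning it to the client.
// The 1024-byte limit and helper name are assumptions, not actual Doris behavior.
#include <iostream>
#include <string>

// Hypothetical helper: keep the head of a long message and note the truncation.
std::string cap_errmsg_for_client(const std::string& msg, size_t max_len = 1024) {
    if (msg.size() <= max_len) {
        return msg;
    }
    return msg.substr(0, max_len) + " ...(truncated, see be.INFO for full details)";
}

int main() {
    std::string details(4096, 'x'); // stand-in for process_memory_used_details_str()
    std::string errmsg = "process memory " + details + " exceed limit 8GB";
    std::cout << cap_errmsg_for_client(errmsg).size() << "\n"; // bounded length
    return 0;
}
```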
@@ -81,6 +81,11 @@ std::shared_ptr<MemTrackerLimiter> MemTrackerLimiter::create_shared(MemTrackerLi
                                                               const std::string& label,
                                                               int64_t byte_limit) {
    auto tracker = std::make_shared<MemTrackerLimiter>(type, label, byte_limit);
    // The write tracker is only used to track the size, so limit == -1
    auto write_tracker = std::make_shared<MemTrackerLimiter>(type, "Memtable" + label, -1);
A single load task will end up with two Type::Load trackers in the global map. The cancel query/load logic in Memory GC currently has an issue with this (there is fault tolerance, so it only affects the logs and the final behavior is correct). Once Memory GC is switched to rely on the resource ctx, this will no longer be a problem.
Also, since one load task has two trackers, MemoryProfile should distinguish Fragment and Memtable memory under Load Memory.
Right now the assumption is that these two are unrelated; when we cancel, the memtable buffer portion is also not counted as part of the query.
TODO:
MemoryProfile should distinguish Fragment and Memtable memory under Load Memory.
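As a rough illustration of the two-trackers-per-load-task setup discussed above, here is a standalone sketch (SimpleTracker is a stand-in, not the real MemTrackerLimiter) showing a limited fragment tracker alongside a limit == -1 "Memtable" write tracker that only records size:

```cpp
// Illustrative only: a simplified model of "two trackers per load task" —
// a fragment tracker with a byte limit and a write tracker with limit == -1
// that tracks size but never enforces it.
#include <cstdint>
#include <iostream>
#include <memory>
#include <string>

struct SimpleTracker {
    std::string label;
    int64_t limit;          // -1 means "track size only, never enforce"
    int64_t consumed = 0;

    bool try_consume(int64_t bytes) {
        if (limit >= 0 && consumed + bytes > limit) return false;
        consumed += bytes;
        return true;
    }
};

int main() {
    auto load_tracker = std::make_shared<SimpleTracker>(SimpleTracker{"Load#q1", 2LL << 30});
    auto write_tracker = std::make_shared<SimpleTracker>(SimpleTracker{"MemtableLoad#q1", -1});

    std::cout << load_tracker->try_consume(3LL << 30) << "\n";  // 0: over the 2GB limit
    std::cout << write_tracker->try_consume(3LL << 30) << "\n"; // 1: only records the size
    return 0;
}
```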
@@ -278,40 +279,57 @@ inline void ThreadMemTrackerMgr::flush_untracked_mem() {
    _stop_consume = false;
}

inline doris::Status ThreadMemTrackerMgr::try_reserve(int64_t size) {
inline doris::Status ThreadMemTrackerMgr::try_reserve(int64_t size,
                                                       bool only_check_process_memory) {
Why does FlushToken::_try_reserve_memory need only_check_process_memory=true?
If the tracker scoped in FlushToken::_try_reserve_memory simply had tracker limit = -1 and workload group ptr = null, try reserve would naturally check only process memory.
As it stands, to support only_check_process_memory, a new reserve method was added to both the memory tracker and the workload group, which makes the code look messier.
@jacktengg please take a look.
Because the memtable flush executor is shared by all workload groups on a BE, memory reserved there is only counted against the whole BE process and is not attributed to any individual workload group or query.
Changing the tracker limit to -1 or the workload group ptr to null is fairly risky at the moment; let's keep the current approach and optimize it later.
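A standalone sketch of the trade-off discussed in this thread, with all names and limits assumed for illustration: an explicit only_check_process_memory flag versus simply reserving against a tracker whose limit is -1 with no workload group:

```cpp
// Illustrative only: two ways to make a reservation check only process memory.
// FakeProcess / FakeQueryTracker are stand-ins, not the Doris classes.
#include <cstdint>
#include <iostream>

struct FakeQueryTracker { int64_t limit; int64_t used; }; // limit < 0 => unlimited
struct FakeProcess { int64_t limit; int64_t used; };

// Approach in the PR: an explicit flag skips the query/workload-group checks.
bool try_reserve(FakeProcess& proc, FakeQueryTracker* query, int64_t size,
                 bool only_check_process_memory) {
    if (proc.used + size > proc.limit) return false;
    if (!only_check_process_memory && query != nullptr && query->limit >= 0 &&
        query->used + size > query->limit) {
        return false;
    }
    proc.used += size;
    if (query != nullptr) query->used += size;
    return true;
}

int main() {
    FakeProcess proc{8LL << 30, 0};
    FakeQueryTracker query{1LL << 30, 0};
    // Reviewer's alternative: a tracker with limit = -1 (and no workload group)
    // would make the same call check only process memory, without the flag.
    FakeQueryTracker unlimited{-1, 0};
    std::cout << try_reserve(proc, &query, 2LL << 30, true) << "\n";      // flag bypasses the query limit
    std::cout << try_reserve(proc, &unlimited, 2LL << 30, false) << "\n"; // limit=-1 achieves the same
    return 0;
}
```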
}
// If the query is a pure load task (stream load, routine load, group commit), it should not use
// the per-query mem limit to cap its memory usage.
if (is_pure_load_task()) {
Limiting the memtable memory size of a single load task is also not supported right now, so the only way to limit a load task's memory is a Workload Group, right?
In production we frequently see various kinds of load tasks filling up memory.
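For illustration, a minimal sketch (the types and the -1 sentinel are assumptions) of the gate quoted above: a pure load task bypasses the per-query mem limit, so only the workload group or process limit can cap it:

```cpp
// Illustrative only: pure load tasks skip the per-query mem limit.
#include <cstdint>
#include <iostream>

enum class TaskType { Query, StreamLoad, RoutineLoad, GroupCommit };

bool is_pure_load_task(TaskType t) {
    return t == TaskType::StreamLoad || t == TaskType::RoutineLoad ||
           t == TaskType::GroupCommit;
}

// Returns the per-task limit to enforce; -1 means "no per-query limit, rely on
// the workload group / process limits instead".
int64_t effective_per_task_limit(TaskType t, int64_t mem_limit_per_query) {
    if (is_pure_load_task(t)) return -1;
    return mem_limit_per_query;
}

int main() {
    std::cout << effective_per_task_limit(TaskType::Query, 2LL << 30) << "\n";      // 2147483648
    std::cout << effective_per_task_limit(TaskType::StreamLoad, 2LL << 30) << "\n"; // -1
    return 0;
}
```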
}

size_t minimum_operator_memory_required_bytes() const {
    if (_query_options.__isset.minimum_operator_memory_required_kb) {
For operators other than join, agg, and sort, does reserve memory also need to check that this minimum is met? Doesn't that add complexity? Wouldn't it be enough to have only these 3 operators do the check?
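A small standalone sketch of the reviewer's suggestion, with the names and the 100 MB floor assumed: only join, agg, and sort check the minimum-operator-memory floor when reserving:

```cpp
// Illustrative only: check the minimum-memory floor only for memory-hungry operators.
#include <cstdint>
#include <iostream>

enum class OpType { Scan, Exchange, HashJoin, Agg, Sort };

bool needs_minimum_memory_check(OpType op) {
    return op == OpType::HashJoin || op == OpType::Agg || op == OpType::Sort;
}

bool reserve_ok(OpType op, int64_t reserve_bytes,
                int64_t minimum_operator_memory_required_bytes = 100LL << 20) {
    if (needs_minimum_memory_check(op) &&
        reserve_bytes < minimum_operator_memory_required_bytes) {
        return false; // not enough to make progress; caller may spill or fail early
    }
    return true; // other operators skip the floor check entirely
}

int main() {
    std::cout << reserve_ok(OpType::Scan, 1 << 20) << "\n";     // 1: scan skips the check
    std::cout << reserve_ok(OpType::HashJoin, 1 << 20) << "\n"; // 0: below the assumed floor
    return 0;
}
```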
be/src/runtime/thread_context.h
Outdated
    return thread_mem_tracker_mgr->try_reserve(size, false);
}

void release_reserved_memory() const { thread_mem_tracker_mgr->shrink_reserved(); }
Let's name it shrink_reserved_memory consistently.
@@ -223,6 +224,21 @@ class ThreadContext {
    // to nullptr, but the object it points to is not initialized. At this time, when the memory
    // is released somewhere, the hook is triggered to cause the crash.
    std::unique_ptr<ThreadMemTrackerMgr> thread_mem_tracker_mgr;

    [[nodiscard]] std::shared_ptr<MemTrackerLimiter> thread_mem_tracker() const {
        return thread_mem_tracker_mgr->limiter_mem_tracker();
The reason thread_mem_tracker_mgr is a public member is precisely to avoid adding new methods to ThreadContext; external code currently calls thread_context()->thread_mem_tracker_mgr->limiter_mem_tracker() directly. The goal is to keep the ThreadContext class simple and avoid further changes to it, although this design pattern is itself debatable.
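A tiny standalone contrast of the two access styles being debated, using stand-in types rather than the real ThreadContext:

```cpp
// Illustrative only: public-member chaining vs. a thin accessor on the context.
#include <iostream>
#include <memory>

struct FakeLimiter { const char* label = "Query#q1"; };

struct FakeMgr {
    std::shared_ptr<FakeLimiter> limiter_mem_tracker() { return _limiter; }
    std::shared_ptr<FakeLimiter> _limiter = std::make_shared<FakeLimiter>();
};

struct FakeThreadContext {
    // Existing style: public member, callers chain through it.
    std::unique_ptr<FakeMgr> thread_mem_tracker_mgr = std::make_unique<FakeMgr>();
    // Style added in the PR: a thin accessor that keeps call sites shorter.
    std::shared_ptr<FakeLimiter> thread_mem_tracker() const {
        return thread_mem_tracker_mgr->limiter_mem_tracker();
    }
};

int main() {
    FakeThreadContext ctx;
    std::cout << ctx.thread_mem_tracker_mgr->limiter_mem_tracker()->label << "\n"; // existing style
    std::cout << ctx.thread_mem_tracker()->label << "\n";                          // accessor style
    return 0;
}
```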
@@ -58,6 +61,10 @@ class IOContext : public std::enable_shared_from_this<IOContext> {
        shuffle_send_bytes_counter_ = ADD_COUNTER(profile_, "ShuffleSendBytes", TUnit::BYTES);
        shuffle_send_rows_counter_ =
                ADD_COUNTER(profile_, "ShuffleSendRowsCounter_", TUnit::UNIT);
        spill_write_bytes_to_local_storage_counter_ =
                ADD_COUNTER(profile_, "SpillWriteBytesToLocalStorage", TUnit::UNIT);
The unit should be TUnit::BYTES.
        spill_write_bytes_to_local_storage_counter_ =
                ADD_COUNTER(profile_, "SpillWriteBytesToLocalStorage", TUnit::UNIT);
        spill_read_bytes_from_local_storage_counter_ =
                ADD_COUNTER(profile_, "SpillReadBytesFromLocalStorage", TUnit::UNIT);
Same as above: the unit should be TUnit::BYTES.
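To show why the unit matters, here is a standalone sketch (the mini profile below is a stand-in for RuntimeProfile / ADD_COUNTER / TUnit, not the Doris API) of how a byte-valued spill counter renders under a plain UNIT versus a BYTES unit:

```cpp
// Illustrative only: the counter's unit drives how the value is rendered in the
// profile, so byte-valued spill counters should be registered with a BYTES unit.
#include <cstdint>
#include <iostream>
#include <string>

enum class Unit { UNIT, BYTES };

std::string render(int64_t v, Unit u) {
    if (u == Unit::BYTES) return std::to_string(v / (1024.0 * 1024.0)) + " MB";
    return std::to_string(v); // plain count, no size formatting
}

int main() {
    int64_t spill_write_bytes = 512LL * 1024 * 1024;
    // With a UNIT the profile prints a bare number; with BYTES it is shown as a
    // size, which is what "SpillWriteBytesToLocalStorage" actually measures.
    std::cout << render(spill_write_bytes, Unit::UNIT) << "\n";
    std::cout << render(spill_write_bytes, Unit::BYTES) << "\n";
    return 0;
}
```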
run buildall

TeamCity cloud ut coverage result:

TPC-H: Total hot run time: 31602 ms
1. improve exchange sink low mem mode; 2. improve exchange memory usage counter; 3. fix string hash table memory usage; 4. fix deadlock if disable_memory_gc; 5. improve some log prints.
Force-pushed from c249993 to ee4b33e
run buildall

TPC-DS: Total hot run time: 183548 ms

TeamCity cloud ut coverage result:
        RETURN_IF_ERROR(disable_runtime_filters(state));
    }
}
// if (state->fuzzy_disable_runtime_filter_in_be()) {
just delete this
LOG(WARNING) << "PipelineFragmentContext is cancelled due to timeout:"; | ||
while (pos < total_size) { | ||
tmp_size = std::min(max_log_size, total_size - pos); | ||
LOG(WARNING) << "===" << std::string(dbg_str.data() + pos, tmp_size); |
This debug log needs to be deleted.
This log is only printed when the query times out, to help debug possible query-stuck problems. We can keep it for now.
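For reference, a standalone sketch of the chunked-logging pattern in the snippet above (std::cerr stands in for LOG(WARNING); the 4096-byte cap is an assumption):

```cpp
// Illustrative only: emit a long debug string in fixed-size pieces so no single
// log line exceeds the logger's per-line limit.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>

void log_in_chunks(const std::string& dbg_str, size_t max_log_size = 4096) {
    const size_t total_size = dbg_str.size();
    size_t pos = 0;
    while (pos < total_size) {
        const size_t tmp_size = std::min(max_log_size, total_size - pos);
        std::cerr << "===" << std::string(dbg_str.data() + pos, tmp_size) << "\n";
        pos += tmp_size;
    }
}

int main() {
    std::cerr << "PipelineFragmentContext is cancelled due to timeout:\n";
    log_in_chunks(std::string(10000, 'x')); // 3 chunks with the assumed 4096 cap
    return 0;
}
```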
TPC-H: Total hot run time: 31860 ms
TPC-DS: Total hot run time: 184587 ms
ClickBench: Total hot run time: 31.4 s

run performance
What problem does this PR solve?
Problem Summary:
A brand-new spilling triggering strategy:
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merges this PR)