feat: ensure reserved memory for computing tasks on compute node starting (#7670)
The total memory of a compute node (CN) consists of:
1. computing memory (both stream & batch)
2. storage memory (block cache, meta cache, etc.)
3. memory for system usage
That is to say, **_CN total memory_ = _computing memory_ + _storage memory_ + _system memory_**. Currently, both _CN total memory_ and _storage memory_ are configured by the user. This PR ensures that _computing memory_ and _system memory_ are correctly reserved, i.e. **_computing memory_ + _system memory_ = _CN total memory_ - _storage memory_ > a given amount of memory**. For now, this amount is set to 1 GB (512 MB for computing and 512 MB for system usage). The check is performed when the CN starts.
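As a worked instance of this budget: with 8192 MB of CN memory and 6144 MB configured for storage, 2048 MB remain for computing and system usage, which clears the 1 GB minimum; with 4096 MB total and 3584 MB of storage, only 512 MB remain and the node would be rejected. The Rust sketch below encodes that arithmetic in MB. The constant names are borrowed from the diff, but their values (taken from the 512M figures above), the `u64` types, and the helpers `non_storage_memory_mb` and `has_enough_reserved_memory` are hypothetical, not RisingWave's actual code.

```rust
/// Minimum memory (MB) kept for computing tasks (assumed value: 512 MB).
const MIN_COMPUTE_MEMORY_MB: u64 = 512;
/// Memory (MB) kept for other system usage (assumed value: 512 MB).
const SYSTEM_RESERVED_MEMORY_MB: u64 = 512;

/// Memory left for computing plus system usage, i.e.
/// `CN total memory - storage memory`, or `None` if storage alone
/// already exceeds the total (instead of underflowing).
fn non_storage_memory_mb(cn_total_mb: u64, storage_mb: u64) -> Option<u64> {
    cn_total_mb.checked_sub(storage_mb)
}

/// True only when the leftover memory is strictly greater than the
/// 1 GB (512 MB + 512 MB) that must be reserved apart from storage.
fn has_enough_reserved_memory(cn_total_mb: u64, storage_mb: u64) -> bool {
    match non_storage_memory_mb(cn_total_mb, storage_mb) {
        Some(left) => left > MIN_COMPUTE_MEMORY_MB + SYSTEM_RESERVED_MEMORY_MB,
        None => false,
    }
}
```

Using `checked_sub` keeps the "storage is larger than the whole node" case distinct from the "too little left over" case rather than wrapping around.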
Approved-By: fuyufjh
Approved-By: hzxa21
```rust
/// Check whether the compute node has enough memory to perform computing tasks. Apart from storage,
/// it must reserve at least `MIN_COMPUTE_MEMORY_MB` for computing and `SYSTEM_RESERVED_MEMORY_MB`
/// for other system usage. Otherwise, it is not allowed to start.
fn validate_compute_node_memory_config(
    cn_total_memory_bytes: usize,
    storage_config: &StorageConfig,
) {
    let storage_memory_mb = storage_config.total_storage_memory_limit_mb();
    if storage_memory_mb << 20 > cn_total_memory_bytes {
        panic!(
            "The storage memory exceeds the total compute node memory:\nTotal compute node memory: {}\nStorage memory: {}\nAt least 1 GB memory should be reserved apart from the storage memory. Please increase the total compute node memory or decrease the storage memory in configurations and restart the compute node.",
            // …
            "No enough memory for computing and other system usage:\nTotal compute node memory: {}\nStorage memory: {}\nAt least 1 GB memory should be reserved apart from the storage memory. Please increase the total compute node memory or decrease the storage memory in configurations and restart the compute node.",
            // …
```
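The check above compares `storage_memory_mb << 20` against a byte count; shifting left by 20 multiplies by 2^20, converting MB to bytes. The second branch is cut off in this view, so its exact condition is not shown. The following is a minimal sketch, assuming it rejects any configuration that leaves 1 GB or less after storage (matching the strict `>` in the commit message). Unlike the original, which panics, this hypothetical `check_memory_config` returns a `Result` with an illustrative `MemoryConfigError` enum, uses `u64` instead of `usize`, and takes the storage size in MB directly instead of a `StorageConfig`; the constant values are the assumed 512 MB figures.

```rust
const MIN_COMPUTE_MEMORY_MB: u64 = 512; // assumed value
const SYSTEM_RESERVED_MEMORY_MB: u64 = 512; // assumed value

/// Hypothetical error type standing in for the two `panic!` messages.
#[derive(Debug, PartialEq, Eq)]
enum MemoryConfigError {
    /// Storage memory alone is larger than the whole compute node.
    StorageExceedsTotal { total_bytes: u64, storage_bytes: u64 },
    /// Storage fits, but 1 GB or less is left for computing and the system.
    NotEnoughForComputeAndSystem { total_bytes: u64, storage_bytes: u64 },
}

fn check_memory_config(
    cn_total_memory_bytes: u64,
    storage_memory_mb: u64,
) -> Result<(), MemoryConfigError> {
    // `<< 20` converts MB to bytes (1 MB = 2^20 bytes).
    let storage_bytes = storage_memory_mb << 20;
    let reserved_bytes = (MIN_COMPUTE_MEMORY_MB + SYSTEM_RESERVED_MEMORY_MB) << 20;

    if storage_bytes > cn_total_memory_bytes {
        Err(MemoryConfigError::StorageExceedsTotal {
            total_bytes: cn_total_memory_bytes,
            storage_bytes,
        })
    } else if cn_total_memory_bytes - storage_bytes <= reserved_bytes {
        // Safe subtraction: the first branch ruled out storage > total.
        Err(MemoryConfigError::NotEnoughForComputeAndSystem {
            total_bytes: cn_total_memory_bytes,
            storage_bytes,
        })
    } else {
        Ok(())
    }
}
```

Returning an error keeps the decision easy to unit-test; a startup routine could still turn it into a `panic!` or a clean exit with a message like the ones in the diff.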