Description
Is your feature request related to a problem? Please describe
I want to propose this idea and am looking for opinions from the community. Hopefully it doesn't sound terrible.
Context:
As of now, search and indexing workloads (and others as well) run on the same node in the same process, each with its own threadpool. Often, certain "expensive" search queries cause CPU to spike to 100% on some or all nodes in a cluster, which causes cluster instability. This can happen when search threads occupy most or all CPU cores, especially since we define more search threads than there are available CPU cores on a node for performance reasons (formula: ((allocatedProcessors * 3) / 2) + 1).
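To make the oversubscription concrete, here is a minimal sketch of the formula quoted above using Java integer arithmetic. The class and method names are hypothetical and only illustrate the calculation; they are not taken from the codebase.

```java
// Sketch only: evaluates the search threadpool sizing formula quoted above.
// SearchPoolSizing and searchThreads are hypothetical names for illustration.
final class SearchPoolSizing {

    // Integer division, so the result is truncated: ((n * 3) / 2) + 1.
    static int searchThreads(int allocatedProcessors) {
        return ((allocatedProcessors * 3) / 2) + 1;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(cores + " cores -> " + searchThreads(cores) + " search threads");
    }
}
```

For example, an 8-core node gets 13 search threads, so the search pool alone can keep every core busy.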
Today we don't have a good way for users to isolate a workload, for example to keep search from taking more than, say, 70% of CPU, which would improve resiliency and help avoid node drops.
We have discussed defining "search"-only nodes, but that is complex and might be expensive for users, who would have to maintain a different node type. The existing Search backpressure feature also doesn't solve this holistically, and it relies on a cancellation mechanism (for search tasks once CPU goes beyond X%), which has its own limitations and may not always work.
Unfortunately there is no way in Java to isolate heap for a specific group of threads; otherwise this could have been extended to JVM heap as well.
Use case:
A user can say, via a cluster setting, that they want to allocate 70% of CPU to search workloads. We would then map the search threadpool to a corresponding share of the node's CPU cores, or something along those lines.
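One possible way to turn such a percentage into a core count is sketched below. The helper name, the rounding-down behaviour, and the minimum of one core are all assumptions made for illustration; the actual mapping would need to be decided as part of the design.

```java
// Sketch only: maps a CPU percentage to a number of cores.
// CpuShare and coresForPercent are hypothetical; rounding down and the
// one-core minimum are assumptions, not an agreed design.
final class CpuShare {

    static int coresForPercent(int totalCores, int percent) {
        if (totalCores < 1 || percent < 1 || percent > 100) {
            throw new IllegalArgumentException("totalCores must be >= 1 and percent in [1, 100]");
        }
        // Round down, but always leave the workload at least one core.
        return Math.max(1, (totalCores * percent) / 100);
    }

    public static void main(String[] args) {
        int total = Runtime.getRuntime().availableProcessors();
        System.out.println("70% of " + total + " cores -> " + coresForPercent(total, 70) + " cores");
    }
}
```

With these assumptions, 70% of a 16-core node maps to 11 cores, leaving 5 cores for indexing and everything else.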
Describe the solution you'd like
In Java, a concept called Java affinity exists where you can map a group of threads to certain CPU cores. It is mostly used in cases where the OS's thread scheduling does not provide optimal performance, but we could use it to restrict a group of threads to run only on X CPU cores of a Y-core node (for example).
There is also a library for this which provides a way to assign a group of threads to certain CPU cores (by reading /proc/cpuinfo) using a ThreadFactory.
I haven't done a POC yet, as I'm not sure of the feasibility.
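A rough sketch of what the ThreadFactory approach could look like follows. It deliberately does not use any real affinity library's API: `CoreBinder` is a hypothetical hook that a real implementation would back with a native call (for example `sched_setaffinity` on Linux) or with the library mentioned above. Everything named here is an assumption for illustration.

```java
import java.util.BitSet;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical hook: a real implementation would pin the calling thread to the
// given cores through the OS or an affinity library. Not an existing API.
interface CoreBinder {
    void bindCurrentThread(BitSet allowedCores);
}

// Sketch only: a ThreadFactory whose threads bind themselves to a fixed set of
// cores before running their task.
final class PinnedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final BitSet allowedCores;
    private final CoreBinder binder;
    private final AtomicInteger counter = new AtomicInteger();

    PinnedThreadFactory(String prefix, BitSet allowedCores, CoreBinder binder) {
        this.prefix = prefix;
        this.allowedCores = (BitSet) allowedCores.clone(); // defensive copy
        this.binder = binder;
    }

    @Override
    public Thread newThread(Runnable task) {
        Runnable pinned = () -> {
            // Affinity is per thread, so binding happens on the new thread itself.
            binder.bindCurrentThread((BitSet) allowedCores.clone());
            task.run();
        };
        Thread t = new Thread(pinned, prefix + "-" + counter.incrementAndGet());
        t.setDaemon(true);
        return t;
    }
}
```

Such a factory could be handed to the executor that backs the search threadpool, so every search thread is restricted to the configured cores without changing the tasks themselves.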
Related component
Search:Resiliency
Describe alternatives you've considered
Run search and indexing in different processes, but this seems to be far more complex.
Additional context
No response