High CPU usage in scheduler #2033
Comments
When you say 'CPU in the scheduler', I assume you mean the host Linux kernel scheduler? Which platform (or platforms) do you see this behavior on? |
I mean scheduler in a broader sense. The example above uses the futex syscall to wait/wake up threads, so it involves (1) the futex implementation in the sentry; (2) the golang scheduler; (3) maybe also the host Linux CFS.
kvm platform. |
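For context on why the futex path pulls in the Go scheduler, here is a minimal sketch (not the actual gVisor code) of a sentry-style futex table where FUTEX_WAIT parks the task goroutine on a channel receive and FUTEX_WAKE sends on that channel, so every guest futex wait/wake becomes a goroutine park/ready in the Go runtime on top of whatever the host CFS does with the underlying threads:

```go
package futexsketch

import "sync"

// waiter parks one task goroutine. The buffered channel lets a wake
// arrive before the waiter has actually started receiving.
type waiter struct {
	wake chan struct{}
}

// Manager is a toy futex table keyed by a userspace address.
type Manager struct {
	mu      sync.Mutex
	waiters map[uintptr][]*waiter
}

func NewManager() *Manager {
	return &Manager{waiters: make(map[uintptr][]*waiter)}
}

// Wait parks the calling task goroutine until a Wake on addr. The real
// sentry also re-checks the futex word and handles timeouts and
// interrupts; both are omitted here.
func (m *Manager) Wait(addr uintptr) {
	w := &waiter{wake: make(chan struct{}, 1)}
	m.mu.Lock()
	m.waiters[addr] = append(m.waiters[addr], w)
	m.mu.Unlock()

	<-w.wake // this receive parks the goroutine in the Go scheduler
}

// Wake wakes up to n waiters queued on addr and reports how many it woke.
func (m *Manager) Wake(addr uintptr, n int) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	q := m.waiters[addr]
	woken := 0
	for woken < n && len(q) > 0 {
		q[0].wake <- struct{}{} // a send here is effectively goready()
		q = q[1:]
		woken++
	}
	m.waiters[addr] = q
	return woken
}
```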
For reference, this workload tends to suffer from golang/go#43997. |
Background: a channel is designed as a data-transfer pipe between goroutines. A common scenario in gVisor is blocking/waking a G (the goroutine backing a task) via a channel when there is no data to transfer, and a channel is relatively heavy for that case. A channel manages Gs in a wait list and has a capacity: when the channel is full the sender blocks, and when the channel is empty the receiver blocks. A blocked receiver is pushed onto the channel's wait list and the scheduler runs the next G; the sender then uses goready to wake the Gs on that list.

Summary of our idea: introduce a new set of APIs for goroutine block/wake.

Details: we propose three new APIs:
- GetG() - returns the address of the current goroutine, so the G can be located easily from the Go program.
- WakeG() - wakes one G, which may be in running or blocked status.
- BlockG() - blocks the calling goroutine.

How we use this in gVisor: GuhuangLS/gvisor@97e0e6c. In futex()/epoll_wait(), we can switch to the new mechanism for blocking and waking. Between the sentry and the Go runtime we maintain the status of task Gs. Taking futex as an example, we add a per-G state in the runtime: NoWake, Waked, Blocked. In the sentry, a task/G blocks itself with BlockG(), like <-chan, and other tasks/Gs wake the blocked task/G with WakeG(), like chan <-.

Based on a basic prototype of Go and gVisor, we used the program in google/gvisor#2033 (comment) as the test program and saw a 22% improvement on that test case: google/gvisor#2033. Signed-off-by: liushi <[email protected]>
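To make the proposal concrete, below is a minimal sketch contrasting the two block/wake patterns for a futex-style waiter. The GetG/BlockG/WakeG names come from the proposal above, but their signatures and the package-level hook variables are assumptions for illustration only; upstream Go does not provide these APIs.

```go
package blockwake

// Hypothetical hooks into a patched Go runtime, following the proposal
// above. The names come from the proposal; the signatures are assumptions,
// and they must be wired up to a modified runtime before use.
var (
	GetG   func() uintptr  // address of the current G
	BlockG func()          // park the current G until it is woken
	WakeG  func(g uintptr) // make the G at address g runnable again
)

// chanWaiter is today's pattern: block/wake through a channel, going
// through chansend/chanrecv and the sudog machinery even though no data
// is transferred.
type chanWaiter struct {
	wake chan struct{}
}

func newChanWaiter() *chanWaiter { return &chanWaiter{wake: make(chan struct{}, 1)} }
func (w *chanWaiter) block()     { <-w.wake }
func (w *chanWaiter) wakeUp()    { w.wake <- struct{}{} }

// gWaiter is the proposed pattern: block/wake directly on the G,
// skipping the channel entirely.
type gWaiter struct {
	g uintptr // recorded by the waiter before it parks
}

func (w *gWaiter) block() {
	w.g = GetG() // remember which G is waiting
	BlockG()     // park this goroutine in the runtime, like <-chan
}

func (w *gWaiter) wakeUp() {
	WakeG(w.g) // wake the recorded G, like chan <-
}
```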
A friendly reminder that this issue had no activity for 120 days. |
golang/go@ecfce58 must have helped with this. Is this still an issue? |
A friendly reminder that this issue had no activity for 120 days. |
This issue has been closed due to lack of activity. |
With the program below, we find that the scheduler shows ~380% CPU usage under gVisor vs ~200% in runc.
How to test:
$ docker run -it --cpu-period=1000 --cpu-quota=8000 ...
$ gcc -o threads thread.c -lpthread
$ ./threads 1024 100000