
Cpu of workers not taken into account #3688


Closed
abergmeier-dsfishlabs opened this issue Sep 5, 2017 · 8 comments
Labels
duplicate · P2 (We'll consider working on this in future; assignee optional) · team-Local-Exec (Issues and PRs for the Execution (Local) team) · type: feature request · under investigation

Comments

@abergmeier-dsfishlabs
Contributor

abergmeier-dsfishlabs commented Sep 5, 2017

Description of the problem / feature request / question:

I have 2 types of workers, which I specify as:

ctx.action(
    execution_requirements = {
        "supports-workers": "1",
        "cpu:3": "", # Foo uses up to 3 threads
    },
    mnemonic = "Foo",

and

ctx.action(
    execution_requirements = {
        "supports-workers": "1",
        "cpu:1": "", # Bar is single threaded
    },
    mnemonic = "Bar",

My machine has 4 cores, so I would expect Bazel to keep the load average very close to (possibly a bit below) 4.
When I trigger more than 4 targets of Foo plus more than 4 targets of Bar, what I actually see is a load average of ~12 (consistent with 4 Foo workers each running 3 threads), even though only 4 workers are running at a time. This seems to indicate that the cpu value is either not taken into account at all or accounted for incorrectly.
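
To make the accounting I expect explicit, here is a sketch (plain Python, not Bazel code; it assumes cpu:n counts as n units against the core budget):

# Sketch of the expected scheduling arithmetic, not actual Bazel code.
CORES = 4

def fits(thread_counts):
    # A set of concurrently scheduled workers fits if their declared
    # thread counts stay within the machine's core budget.
    return sum(thread_counts) <= CORES

print(fits([3, 1]))        # True: one Foo (3 threads) + one Bar (1 thread)
print(fits([3, 3, 3, 3]))  # False: yet 4 concurrent Foo workers match the ~12 load I see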


Environment info

  • Operating System: Ubuntu 16.04.3
  • Bazel version (output of bazel info release): 0.5.4
@meteorcloudy meteorcloudy added the category: performance, P2 (We'll consider working on this in future; assignee optional), and under investigation labels Sep 6, 2017
@meteorcloudy
Member

@philwo, you are the worker expert, could you take a look at this?

@philwo
Member

philwo commented Sep 6, 2017

Yes, the "cpu" tag is not supported / used for anything in Bazel at the moment.

I agree that it would be a nice thing to add.

@abergmeier-dsfishlabs
Contributor Author

Yes, the "cpu" tag is not supported / used for anything in Bazel at the moment.
I agree that it would be a nice thing to add.

What strategies are you using, then, to prevent overtaxing the CPU?

@philwo
Member

philwo commented Sep 6, 2017

Buy even bigger workstations 😆 No, this is actually a problem for us as well. Internally there was a request to add a flag like --worker_max_instances=Javac=2. In other words, people wanted to tweak the limit of workers per mnemonic, not only in general.
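
For illustration, the existing flag versus the requested per-mnemonic form (the second line is the hypothetical part):

# Content of .bazelrc
build --worker_max_instances=4        # exists today: one limit applied to every kind of worker
build --worker_max_instances=Javac=2  # requested: a different limit per mnemonic (hypothetical)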

What you propose here sounds useful too, but it is actually not worker specific: there is just no way to specify the resource usage of Skylark actions. They always use the default set of resources (public static final ResourceSet DEFAULT_RESOURCE_SET = ... in Bazel's source). If the action had a higher CPU resource demand, Bazel would already do the right thing, without further changes to the worker strategy code.
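
To sketch what that could look like from the Skylark side (a hypothetical API for illustration; neither the resources parameter nor these keys exist today, and foo_compiler is a placeholder):

# Content of .bzl (hypothetical API, for illustration only)
ctx.action(
    executable = foo_compiler,  # placeholder for the worker binary
    execution_requirements = {
        "supports-workers": "1",
    },
    # Hypothetical: declared demand that Bazel's resource accounting would honor.
    resources = {"cpu": 3, "memory_mb": 512},
    mnemonic = "Foo",
)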

Adding @laurentlb for the Skylark side.

@abergmeier-dsfishlabs
Contributor Author

abergmeier-dsfishlabs commented Sep 7, 2017

> Internally there was a request to add a flag like --worker_max_instances=Javac=2

We currently use worker_max_instances to at least "prevent" overtaxing. It is horrible, though: sometimes only Javac tasks remain to be executed, so Javac could use all resources, yet the build takes forever because out of 20 cores only 2 are used by the 2 permitted workers.

Even worse for us is our offline image compression, where a single action can take up to 20 minutes.

If it was up to me I would deprecate worker_max_instances.

> What you propose here sounds useful too, but it is actually not worker specific: there is just no way to specify the resource usage of Skylark actions.

Being able to specify a resource set from Skylark would be really good, @laurentlb.
In the long run even that is insufficient, though, because CPU percentages do not map well between, e.g., peak Hz on ARM and AMD, or between heterogeneous mobile processors and (for now) homogeneous desktop processors.

On top of that, I currently have tests that need to reserve a Vulkan context.

So I would like to be able to do something like:

# Content of .bazelrc
startup --resource_available=Vulkan=2 # We have 2 Vulkan capable GPUs available on that machine
startup --resource_available=Ram=1024MB # Executables may take up to 1024MB of RAM
startup --resource_available=Cpu=1.4GHz*2 # 2 Slow ARM cores available
startup --resource_available=Cpu=2.2GHz*2 # 2 Fast ARM cores available
startup --resource_available=Network=1GB/s*2 # 2 Network cards with 1GB each

and:

# Content of .bzl
ctx.action(
    executable = image_compressor,
    execution_requirements = {
        "//resources:Vulkan": 1,
        "//resources:Ram": "512MB",
        "//resources:Cpu": "0.6GHz*1",
    },
)

or maybe better:

# Content of .bzl
needed_resources = ctx.actions.resources()
needed_resources.put("Vulkan", 1)
needed_resources.put("Ram", ctx.resources.mb(512))
needed_resources.put("Cpu", ctx.resources.ghz(0.6) * 1) # "Multiplication" should be optional

ctx.action(
    executable = image_compressor,
    execution_requirements = {
    } + needed_resources,
)

@jin jin added the team-Local-Exec (Issues and PRs for the Execution (Local) team) label and removed team-Execution Jan 14, 2019
@EricCousineau-TRI
Contributor

Dumb question:
Test rules now seem to respect the cpu:n tags, per the docs and from testing it directly.
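
For example (target and sources are illustrative), the tag form that works for locally executed tests:

# Content of a BUILD file
cc_test(
    name = "heavy_test",
    srcs = ["heavy_test.cc"],
    tags = ["cpu:3"],  # local scheduling reserves 3 cores for this test
)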

Is there any update on supporting cpu:n on actions / genrules? Is it still a no-op?

Also, I assume this issue is related: #6477

@kastiglione
Contributor

kastiglione commented Dec 30, 2019

@EricCousineau-TRI Last night I did some tests of our build with this patch that adds cpu:n support to Starlark actions: kastiglione@0a1e78b

The build times weren't better, so I don't know whether I'll continue with it.

@jmmv
Contributor

jmmv commented May 14, 2020

I think we can say that this is a dup of #6477, which tracks the more general request of exposing resource sets to Starlark.

@jmmv jmmv closed this as completed May 14, 2020
@jmmv jmmv added the duplicate label May 14, 2020