
Cpu of workers not taken into account #3688


Closed
abergmeier-dsfishlabs opened this issue Sep 5, 2017 · 8 comments
Labels
duplicate · P2 (We'll consider working on this in future; assignee optional) · team-Local-Exec (Issues and PRs for the Execution (Local) team) · type: feature request · under investigation

Comments

@abergmeier-dsfishlabs
Contributor

abergmeier-dsfishlabs commented Sep 5, 2017

Description of the problem / feature request / question:

I have 2 types of workers, which I specify as:

ctx.action(
    execution_requirements = {
        "supports-workers": "1",
        "cpu:3": "", # Foo uses up to 3 threads
    },
    mnemonic = "Foo",

and

ctx.action(
    execution_requirements = {
        "supports-workers": "1",
        "cpu:1": "", # Bar is single threaded
    },
    mnemonic = "Bar",

My machine has 4 cores, so I would expect Bazel to keep the load average very close to (possibly a bit below) 4.
When I trigger more than 4 targets of Foo plus more than 4 targets of Bar, what I actually see is a load average of ~12 (consistent with 4 Foo workers each running 3 threads), even though only 4 workers are running at a time. This seems to indicate that the cpu value is either not taken into account at all or accounted for incorrectly.
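
To make the accounting I expect explicit, here is a sketch (plain Python, not Bazel code; it assumes cpu:n counts as n units against the core budget):

# Sketch of the expected scheduling arithmetic, not actual Bazel code.
CORES = 4

def fits(thread_counts):
    # A set of concurrently scheduled workers fits if their declared
    # thread counts stay within the machine's core budget.
    return sum(thread_counts) <= CORES

print(fits([3, 1]))        # True: one Foo (3 threads) + one Bar (1 thread)
print(fits([3, 3, 3, 3]))  # False: yet 4 concurrent Foo workers match the ~12 load I see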


Environment info

  • Operating System: Ubuntu 16.04.3
  • Bazel version (output of bazel info release): 0.5.4
@meteorcloudy meteorcloudy added the category: performance, P2 (We'll consider working on this in future; assignee optional), and under investigation labels Sep 6, 2017
@meteorcloudy
Member

@philwo, you are the worker expert, could you take a look at this?

@philwo
Member

philwo commented Sep 6, 2017

Yes, the "cpu" tag is not supported / used for anything in Bazel at the moment.

I agree that it would be a nice thing to add.

@abergmeier-dsfishlabs
Contributor Author

Yes, the "cpu" tag is not supported / used for anything in Bazel at the moment.
I agree that it would be a nice thing to add.

What strategies are you using, then, to prevent overtaxing the CPU?

@philwo
Member

philwo commented Sep 6, 2017

Buy even bigger workstations 😆 No, this is actually a problem for us as well. Internally there was a request to add a flag like --worker_max_instances=Javac=2. In other words, people wanted to tweak the limit of workers per mnemonic, not only in general.
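
For illustration, the existing flag versus the requested per-mnemonic form (the second line is the hypothetical part):

# Content of .bazelrc
build --worker_max_instances=4        # exists today: one limit applied to every kind of worker
build --worker_max_instances=Javac=2  # requested: a different limit per mnemonic (hypothetical)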

What you propose here sounds useful too, but it is actually not worker specific: there is just no way to specify the resource usage of Skylark actions. They always use the default set of resources (public static final ResourceSet DEFAULT_RESOURCE_SET = ... in Bazel's source). If the action had a higher CPU resource demand, Bazel would already do the right thing, without further changes to the worker strategy code.
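
To sketch what that could look like from the Skylark side (a hypothetical API for illustration; neither the resources parameter nor these keys exist today, and foo_compiler is a placeholder):

# Content of .bzl (hypothetical API, for illustration only)
ctx.action(
    executable = foo_compiler,  # placeholder for the worker binary
    execution_requirements = {
        "supports-workers": "1",
    },
    # Hypothetical: declared demand that Bazel's resource accounting would honor.
    resources = {"cpu": 3, "memory_mb": 512},
    mnemonic = "Foo",
)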

Adding @laurentlb for the Skylark side.

@abergmeier-dsfishlabs
Contributor Author

abergmeier-dsfishlabs commented Sep 7, 2017

> Internally there was a request to add a flag like --worker_max_instances=Javac=2

We currently use worker_max_instances to at least "prevent" overtaxing. It is horrible, though: sometimes only Javac tasks remain to be executed, so Javac could use all resources, yet the build takes forever because out of 20 cores only 2 are used by the 2 permitted workers.

Even worse for us is our offline image compression, where a single action can take up to 20 minutes.

If it was up to me I would deprecate worker_max_instances.

> What you propose here sounds useful too, but it is actually not worker specific: there is just no way to specify the resource usage of Skylark actions.

Being able to specify a resource set from Skylark would be really good, @laurentlb.
In the long run even that is insufficient, though, because CPU percentages do not map well between, e.g., peak Hz on ARM and AMD, or between heterogeneous mobile processors and (for now) homogeneous desktop processors.

On top of that, I currently have tests that need to reserve a Vulkan context.

So I would like to be able to do something like:

# Content of .bazelrc
startup --resource_available=Vulkan=2 # We have 2 Vulkan capable GPUs available on that machine
startup --resource_available=Ram=1024MB # Executables may take up to 1024MB of RAM
startup --resource_available=Cpu=1.4GHz*2 # 2 Slow ARM cores available
startup --resource_available=Cpu=2.2GHz*2 # 2 Fast ARM cores available
startup --resource_available=Network=1GB/s*2 # 2 Network cards with 1GB each

and:

# Content of .bzl
ctx.action(
    executable = image_compressor,
    execution_requirements = {
        "//resources:Vulkan": 1,
        "//resources:Ram": "512MB",
        "//resources:Cpu": "0.6GHz*1",
    },
)

or maybe better:

# Content of .bzl
needed_resources = ctx.actions.resources()
needed_resources.put("Vulkan", 1)
needed_resources.put("Ram", ctx.resources.mb(512))
needed_resources.put("Cpu", ctx.resources.ghz(0.6) * 1) # "Multiplication" should be optional

ctx.action(
    executable = image_compressor,
    execution_requirements = {
    } + needed_resources,
)

@jin jin added the team-Local-Exec (Issues and PRs for the Execution (Local) team) label and removed team-Execution Jan 14, 2019
@EricCousineau-TRI
Contributor

Dumb question:
Test rules now seem to respect the cpu:n tags, per the docs and from testing it directly.
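
For example (target and sources are illustrative), the tag form that works for locally executed tests:

# Content of a BUILD file
cc_test(
    name = "heavy_test",
    srcs = ["heavy_test.cc"],
    tags = ["cpu:3"],  # local scheduling reserves 3 cores for this test
)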

Is there any update on supporting cpu:n on actions / genrules? Is it still a no-op?

Also, I assume this issue is related: #6477

@kastiglione
Contributor

kastiglione commented Dec 30, 2019

@EricCousineau-TRI Last night I did some tests of our build with this patch that adds cpu:n support to Starlark actions: kastiglione@0a1e78b

The build times weren't better, so I don't know whether I'll continue with it.

@jmmv
Contributor

jmmv commented May 14, 2020

I think we can say that this is a dup of #6477, which tracks the more general request of exposing resource sets to Starlark.

@jmmv jmmv closed this as completed May 14, 2020
@jmmv jmmv added the duplicate label May 14, 2020