Skip to content

Implement extra resources for test actions #11963

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

exoson
Copy link
Contributor

@exoson exoson commented Aug 18, 2020

Adds support for specifying extra resources available for the bazel
invocation and requirement for test actions. Using
--local_extra_resources=<resource name>=<amount of resource> one can
specify the resources available for tests. Specifying a tag
"resources:<resource name>:<amount of resources>" for a test action will
tell bazel how much of the said resource this test will require. Bazel
will then limit concurrently running test actions using the same
resource if there isn't enough of said resource available.

This can be used to arbitrate access to additional resources such as GPUs or embedded developer boards.

@googlebot
Copy link

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

Adds support for specifying extra resources available for the bazel
invocation and requirement for test actions. Using
--local_extra_resources=<resource name>=<amount of resource> one can
specify the resources available for tests. Specifying a tag
"resources:<resource name>:<amount of resources>" for a test action will
tell bazel how much of the said resource this test will require. Bazel
will then limit concurrently running test actions using the same
resource if there isn't enough of said resource available.
@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@philwo philwo requested a review from gregestren October 21, 2020 12:08
@philwo philwo added team-Configurability platforms, toolchains, cquery, select(), config transitions type: feature request labels Oct 21, 2020
@philwo
Copy link
Member

philwo commented Oct 21, 2020

I think this is pretty cool. :)

@EricCousineau-TRI
Copy link
Contributor

EricCousineau-TRI commented Feb 26, 2021

Does this relate #6477 at all?
EDIT: Ah, prolly not. That issue's for genrule things.

@exoson
Copy link
Contributor Author

exoson commented Feb 26, 2021

Does this relate #6477 at all?
EDIT: Ah, prolly not. That issue's for genrule things.

It seems to me that these are quite similar in the things they achieve. Or at least some of the proposed things.

@gregestren
Copy link
Contributor

This PR has been sitting around for a while and deserves proper attention, apologies.

I don't have enough expertise to feel I know how to review this well. But I should be able to route it to the right place. I'm going to chat with people to get this moving. Stay tuned.

effectTags = {OptionEffectTag.UNKNOWN},
allowMultiple = true,
help =
"Explicitly set the number of extra recourses available to Bazel. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: typo in resources

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

+ "types of extra resources. Bazel will limit concurrently running test actions "
+ "based on the available extra resources and the extra resources required "
+ "by the test actions. "
+ "Note: This is a no-op if --local_resources is set.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether Bazel should fail when both are specified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't matter, --local_resources has been removed since the revision this is based on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@meisterT meisterT requested review from larsrc-google and wilwell May 14, 2021 06:44
/** How many extra resources an action requires for execution. */
public static final ParseableRequirement RESOURCES =
ParseableRequirement.create(
"resources:<str>:<int>",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why an int rather than a float? CPU resources are already floats.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to match the cpu:<int> code above. Looks like there is some inconsistency around what can be a float and what can be an int:

  • Execution requirements: src/main/java/com/google/devtools/build/lib/actions/ExecutionRequirements.java parses int.
  • Test tags: src/main/java/com/google/devtools/build/lib/analysis/test/TestTargetProperties.java parses Float.
  • Command line flags: src/main/java/com/google/devtools/build/lib/util/ResourceConverter.java parses int or a Float in the expression multiplier:
  /** Description of the accepted inputs to the converter. */
  public static final String FLAG_SYNTAX =
      "an integer, or a keyword (\"auto\", \"HOST_CPUS\", \"HOST_RAM\"), optionally followed by "
          + "an operation ([-|*]<float>) eg. \"auto\", \"HOST_CPUS*.5\"";

In its current form, this change is matching the existing code: integers in execution requirements, floats in test tags, and integer on the command line. But I'm open to suggestions if we want to use slightly different conventions than what is used for CPU resources.


// De-and-reserialize & compare to only allow canonical integer formats.
if (!Integer.toString(value).equals(resourceCount)) {
return "must be in canonical format (e.g. '4' instead of '+04')";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why must it be in canonical format?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were just trying to match behavior of the cpu:N tag validation above. Perhaps there's a reason why that validation was originally added for the cpu:N tag...?

return "can't be zero or negative";
}

return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to return value here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a validator lambda that should return null if validation passes.

@@ -322,6 +343,13 @@ private synchronized CountDownLatch acquire(ResourceSet resources) {
private synchronized boolean release(ResourceSet resources) {
usedCpu -= resources.getCpuUsage();
usedRam -= resources.getMemoryMb();

for (Map.Entry<String, Float> resource : resources.getExtraResourceUsage().entrySet()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would look nicer as a forEach().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -333,6 +361,19 @@ private synchronized boolean release(ResourceSet resources) {
if (usedRam < epsilon) {
usedRam = 0;
}

Set<String> toRemove = new HashSet<>();
for (Map.Entry<String, Float> resource : usedExtraResources.entrySet()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also use forEach here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -150,23 +151,21 @@ public ResourceSet getLocalResourceUsage(Label label, boolean usingLocalTestJobs
ResourceSet testResourcesFromSize = TestTargetProperties.getResourceSetFromSize(size);

// Tests can override their CPU reservation with a "cpus:<n>" tag.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments are a sign that the relevant portions would prefer to live in separate functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


@Override
public Map.Entry<String, Float> convert(String input) throws OptionsParsingException {
int pos = input.indexOf("=");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use a static Splitter instead of hand-coding. Alternatively, make this a subclass of AssignmentConverter. Or even better, make AssignmentConverter a generic method with a converter lambda.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

+ "types of extra resources. Bazel will limit concurrently running test actions "
+ "based on the available extra resources and the extra resources required "
+ "by the test actions. "
+ "Note: This is a no-op if --local_resources is set.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't matter, --local_resources has been removed since the revision this is based on.

effectTags = {OptionEffectTag.UNKNOWN},
allowMultiple = true,
help =
"Explicitly set the number of extra recourses available to Bazel. "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to mention that CPU, RAM, and tests cannot be specified here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -419,7 +419,7 @@ private Spawn newCustomSpawn(String mnemonic, ImmutableMap<String, String> execu
executionInfo,
EmptyRunfilesSupplier.INSTANCE,
action,
ResourceSet.create(1, 0, 0));
ResourceSet.create(1, 0, null, 0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A number of these tests could use the two-parameter overload.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like many actually can't, because they also specify a nonzero localTestCount. In fact, looks like this was the only place where I could use ResourceSet.createWithRamCpu instead.

@katre katre added team-Local-Exec Issues and PRs for the Execution (Local) team and removed team-Configurability platforms, toolchains, cquery, select(), config transitions labels Sep 8, 2021
@katre
Copy link
Member

katre commented Sep 8, 2021

Is this still under review? What's the status?

@gregestren gregestren removed their assignment Sep 8, 2021
Copy link
Contributor

@scele scele left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@exoson is no longer working with us, I'm picking this up.

A new PR is here: #13996 - I think this one could be closed.
I've tried to address all comments from @larsrc-google in the new PR.

return "can't be zero or negative";
}

return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a validator lambda that should return null if validation passes.


// De-and-reserialize & compare to only allow canonical integer formats.
if (!Integer.toString(value).equals(resourceCount)) {
return "must be in canonical format (e.g. '4' instead of '+04')";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were just trying to match behavior of the cpu:N tag validation above. Perhaps there's a reason why that validation was originally added for the cpu:N tag...?

/** How many extra resources an action requires for execution. */
public static final ParseableRequirement RESOURCES =
ParseableRequirement.create(
"resources:<str>:<int>",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to match the cpu:<int> code above. Looks like there is some inconsistency around what can be a float and what can be an int:

  • Execution requirements: src/main/java/com/google/devtools/build/lib/actions/ExecutionRequirements.java parses int.
  • Test tags: src/main/java/com/google/devtools/build/lib/analysis/test/TestTargetProperties.java parses Float.
  • Command line flags: src/main/java/com/google/devtools/build/lib/util/ResourceConverter.java parses int or a Float in the expression multiplier:
  /** Description of the accepted inputs to the converter. */
  public static final String FLAG_SYNTAX =
      "an integer, or a keyword (\"auto\", \"HOST_CPUS\", \"HOST_RAM\"), optionally followed by "
          + "an operation ([-|*]<float>) eg. \"auto\", \"HOST_CPUS*.5\"";

In its current form, this change is matching the existing code: integers in execution requirements, floats in test tags, and integer on the command line. But I'm open to suggestions if we want to use slightly different conventions than what is used for CPU resources.

@@ -135,7 +159,7 @@ public ResourceSet convert(String input) throws OptionsParsingException {
if (memoryMb <= 0.0 || cpuUsage <= 0.0) {
throw new OptionsParsingException("All resource values must be positive");
}
return create(memoryMb, cpuUsage, Integer.MAX_VALUE);
return create(memoryMb, cpuUsage, null, Integer.MAX_VALUE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean supporting --local_extra_resources=foo=2,bar=4? If so, can't we already get the same behavior just with --local_extra_resources=foo=2 --local_extra_resources=bar=4?

effectTags = {OptionEffectTag.UNKNOWN},
allowMultiple = true,
help =
"Explicitly set the number of extra recourses available to Bazel. "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -333,6 +361,19 @@ private synchronized boolean release(ResourceSet resources) {
if (usedRam < epsilon) {
usedRam = 0;
}

Set<String> toRemove = new HashSet<>();
for (Map.Entry<String, Float> resource : usedExtraResources.entrySet()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

this.memoryMb = memoryMb;
this.cpuUsage = cpuUsage;
if (extraResourceUsage == null) {
this.extraResourceUsage = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the latest version I converted the extraResourceUsage from Map to an ImmutableMap, and using ImmutableMap.of() here.

/** Returns the local test count used. */
public int getLocalTestCount() {
return localTestCount;
}

@Override
public String toString() {
StringBuilder sb = new StringBuilder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -150,23 +151,21 @@ public ResourceSet getLocalResourceUsage(Label label, boolean usingLocalTestJobs
ResourceSet testResourcesFromSize = TestTargetProperties.getResourceSetFromSize(size);

// Tests can override their CPU reservation with a "cpus:<n>" tag.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -419,7 +419,7 @@ private Spawn newCustomSpawn(String mnemonic, ImmutableMap<String, String> execu
executionInfo,
EmptyRunfilesSupplier.INSTANCE,
action,
ResourceSet.create(1, 0, 0));
ResourceSet.create(1, 0, null, 0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like many actually can't, because they also specify a nonzero localTestCount. In fact, looks like this was the only place where I could use ResourceSet.createWithRamCpu instead.

@sgowroji
Copy link
Member

We are closing this PR with respect to the comment mentioned over here #11963 (review) and also the new reference PR (#13996) . Please feel free to reach to reopen or for any further update. Thanks!

@sgowroji sgowroji closed this Apr 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes team-Local-Exec Issues and PRs for the Execution (Local) team type: feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants