Skip to content

Enable Requester Pays support #5140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 4, 2018
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,9 @@ public abstract class CommandLineProgram implements CommandLinePluginProvider {
@Argument(fullName = StandardArgumentDefinitions.NIO_MAX_REOPENS_LONG_NAME, shortName = StandardArgumentDefinitions.NIO_MAX_REOPENS_SHORT_NAME, doc = "If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection", optional = true)
public int NIO_MAX_REOPENS = ConfigFactory.getInstance().getGATKConfig().gcsMaxRetries();

@Argument(fullName = StandardArgumentDefinitions.NIO_PROJECT_FOR_REQUESTER_PAYS_LONG_NAME, shortName = StandardArgumentDefinitions.NIO_PROJECT_FOR_REQUESTER_PAYS_SHORT_NAME, doc = "Project to bill when accessing \"requester pays\" buckets. If unset, these buckets cannot be accessed.", optional = true)
public String NIO_PROJECT_FOR_REQUESTER_PAYS = ConfigFactory.getInstance().getGATKConfig().gcsProjectForRequesterPays();

// This option is here for documentation completeness.
// This is actually parsed out in Main to initialize configuration files because
// we need to have the configuration completely set up before we create our CommandLinePrograms.
Expand Down Expand Up @@ -176,7 +179,7 @@ public Object instanceMainPostParseArgs() {
BlockGunzipper.setDefaultInflaterFactory(new IntelInflaterFactory());
}

BucketUtils.setGlobalNIODefaultOptions(NIO_MAX_REOPENS);
BucketUtils.setGlobalNIODefaultOptions(NIO_MAX_REOPENS, NIO_PROJECT_FOR_REQUESTER_PAYS);

if (!QUIET) {
printStartupMessage(startDateTime);
Expand Down Expand Up @@ -435,7 +438,7 @@ protected void printSettings() {
final boolean usingIntelInflater = (BlockGunzipper.getDefaultInflaterFactory() instanceof IntelInflaterFactory && ((IntelInflaterFactory)BlockGunzipper.getDefaultInflaterFactory()).usingIntelInflater());
logger.info("Inflater: " + (usingIntelInflater ? "IntelInflater": "JdkInflater"));

logger.info("GCS max retries/reopens: " + BucketUtils.getCloudStorageConfiguration(NIO_MAX_REOPENS).maxChannelReopens());
logger.info("GCS max retries/reopens: " + BucketUtils.getCloudStorageConfiguration(NIO_MAX_REOPENS, "").maxChannelReopens());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a second logger message here indicating whether requester pays is enabled/disabled, and if it's enabled the project that will be billed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -92,4 +92,6 @@ private StandardArgumentDefinitions(){}
public static final String USE_JDK_INFLATER_SHORT_NAME = "jdk-inflater";
public static final String NIO_MAX_REOPENS_LONG_NAME = "gcs-max-retries";
public static final String NIO_MAX_REOPENS_SHORT_NAME = "gcs-retries";
public static final String NIO_PROJECT_FOR_REQUESTER_PAYS_LONG_NAME = "project-for-requester-pays";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think gcs-project-for-requester-pays would be better, and more consistent with the existing retry args.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done.

public static final String NIO_PROJECT_FOR_REQUESTER_PAYS_SHORT_NAME = "requester-project";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would eliminate the short name, and just always require the long name for this argument. I think it's important for the word "pays" to always be present :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
Original file line number Diff line number Diff line change
Expand Up @@ -157,6 +157,9 @@ public interface GATKConfig extends Mutable, Accessible {
@DefaultValue("20")
int gcsMaxRetries();

@DefaultValue("")
String gcsProjectForRequesterPays();

@DefaultValue("true")
boolean createOutputBamIndex();
}
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,13 @@
import com.google.cloud.http.HttpTransportOptions;
import com.google.cloud.storage.StorageOptions;
import com.google.cloud.storage.contrib.nio.CloudStorageConfiguration;
import com.google.cloud.storage.contrib.nio.CloudStorageConfiguration.Builder;
import com.google.cloud.storage.contrib.nio.CloudStorageFileSystem;
import com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider;
import com.google.common.base.Strings;
import com.google.common.io.ByteStreams;
import htsjdk.samtools.util.IOUtil;
import htsjdk.samtools.util.RuntimeIOException;
import htsjdk.tribble.AbstractFeatureReader;
import htsjdk.tribble.Tribble;
import htsjdk.tribble.util.TabixUtils;
import org.apache.hadoop.conf.Configuration;
Expand All @@ -19,7 +20,6 @@
import org.apache.logging.log4j.Logger;
import org.broadinstitute.hellbender.exceptions.UserException;
import org.broadinstitute.hellbender.utils.Utils;
import org.broadinstitute.hellbender.utils.config.ConfigFactory;
import org.broadinstitute.hellbender.utils.io.IOUtils;
import shaded.cloud_nio.com.google.api.gax.retrying.RetrySettings;
import shaded.cloud_nio.com.google.auth.oauth2.GoogleCredentials;
Expand Down Expand Up @@ -343,20 +343,16 @@ public static String getBucket(String path) {
*/
public static String getPathWithoutBucket(String path) {
final String[] split = path.split("/");
final String BUCKET = split[2];
return String.join("/", Arrays.copyOfRange(split, 3, split.length));

}

/**
* Sets NIO_MAX_REOPENS and generous timeouts as the global default.
* Sets max_reopens, requester_pays, and generous timeouts as the global default.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add javadoc for the maxReopens and requesterProject args.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* These will apply even to library code that creates its own paths to access with NIO.
*/
public static void setGlobalNIODefaultOptions() {
setGlobalNIODefaultOptions(ConfigFactory.getInstance().getGATKConfig().gcsMaxRetries());
}
public static void setGlobalNIODefaultOptions(int maxReopens) {
CloudStorageFileSystemProvider.setDefaultCloudStorageConfiguration(getCloudStorageConfiguration(maxReopens));
public static void setGlobalNIODefaultOptions(int maxReopens, String requesterProject) {
CloudStorageFileSystemProvider.setDefaultCloudStorageConfiguration(getCloudStorageConfiguration(maxReopens, requesterProject));
CloudStorageFileSystemProvider.setStorageOptions(setGenerousTimeouts(StorageOptions.newBuilder()).build());
}

Expand All @@ -374,11 +370,15 @@ public static java.nio.file.Path getPathOnGcs(String gcsUrl) {
}

/** The config we want to use. **/
public static CloudStorageConfiguration getCloudStorageConfiguration(int maxReopens) {
return CloudStorageConfiguration.builder()
public static CloudStorageConfiguration getCloudStorageConfiguration(int maxReopens, String requesterProject) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadoc for the two method args

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Builder builder = CloudStorageConfiguration.builder()
// if the channel errors out, re-open up to this many times
.maxChannelReopens(maxReopens)
.build();
.maxChannelReopens(maxReopens);
if (!Strings.isNullOrEmpty(requesterProject)) {
// enable requester pays and indicate who pays
builder = builder.autoDetectRequesterPays(true).userProject(requesterProject);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will a sensible exception be thrown if the requesterProject is invalid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not there, no. Only when trying to access the bucket.

}
return builder.build();
}

private static StorageOptions.Builder setGenerousTimeouts(StorageOptions.Builder builder) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
public final class BucketUtilsTest extends GATKBaseTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some sensible way to unit test the requester-pays support (perhaps using a mock object)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well what we have here is parsing the command-line arguments, and setting the cloud storage configuration. The former I pretty much already trust. The latter I added a little test for.


static {
BucketUtils.setGlobalNIODefaultOptions();
BucketUtils.setGlobalNIODefaultOptions(20,"");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you get the defaults from the GATKConfig instead of hardcoding them here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done.

}

@Test(groups={"bucket"})
Expand Down