-
Notifications
You must be signed in to change notification settings - Fork 602
Enable Requester Pays support #5140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Google Cloud Storage has "requester pays" buckets. When reading from those buckets, the requester is billed for the bandwidth (normally, it's the bucket owner who is billed). This pull request enables this feature with GATK. By default it is turned off (so there are no unexpected charges). To turn it on, use the command line argument: "--project-for-requester-pays" (or "--requester-project"). Example: $ ./gatk PrintReads --input $INPUT --output=/tmp/reads.bam fails with: com.google.cloud.storage.StorageException: Bucket is requester pays bucket but no user project provided. $ ./gatk PrintReads --input $INPUT --output=/tmp/reads.bam --requester-project=$PROJECT works This PR also removes the argumentless version of setGlobalNIODefaultOptions() because it was confusing (it uses the default values, not those indicated by the user - usually we'd expect the user's values to be taken into account).
This will address issue #4828. |
Codecov Report
@@ Coverage Diff @@
## master #5140 +/- ##
===============================================
+ Coverage 85.568% 86.814% +1.247%
- Complexity 28983 29935 +952
===============================================
Files 1810 1813 +3
Lines 135251 137671 +2420
Branches 15000 15393 +393
===============================================
+ Hits 115731 119518 +3787
+ Misses 14170 12682 -1488
- Partials 5350 5471 +121
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review complete, back to @jean-philippe-martin with some comments.
@@ -435,7 +438,7 @@ protected void printSettings() { | |||
final boolean usingIntelInflater = (BlockGunzipper.getDefaultInflaterFactory() instanceof IntelInflaterFactory && ((IntelInflaterFactory)BlockGunzipper.getDefaultInflaterFactory()).usingIntelInflater()); | |||
logger.info("Inflater: " + (usingIntelInflater ? "IntelInflater": "JdkInflater")); | |||
|
|||
logger.info("GCS max retries/reopens: " + BucketUtils.getCloudStorageConfiguration(NIO_MAX_REOPENS).maxChannelReopens()); | |||
logger.info("GCS max retries/reopens: " + BucketUtils.getCloudStorageConfiguration(NIO_MAX_REOPENS, "").maxChannelReopens()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a second logger
message here indicating whether requester pays is enabled/disabled, and if it's enabled the project that will be billed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -92,4 +92,6 @@ private StandardArgumentDefinitions(){} | |||
public static final String USE_JDK_INFLATER_SHORT_NAME = "jdk-inflater"; | |||
public static final String NIO_MAX_REOPENS_LONG_NAME = "gcs-max-retries"; | |||
public static final String NIO_MAX_REOPENS_SHORT_NAME = "gcs-retries"; | |||
public static final String NIO_PROJECT_FOR_REQUESTER_PAYS_LONG_NAME = "project-for-requester-pays"; | |||
public static final String NIO_PROJECT_FOR_REQUESTER_PAYS_SHORT_NAME = "requester-project"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would eliminate the short name, and just always require the long name for this argument. I think it's important for the word "pays" to always be present :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return String.join("/", Arrays.copyOfRange(split, 3, split.length)); | ||
|
||
} | ||
|
||
/** | ||
* Sets NIO_MAX_REOPENS and generous timeouts as the global default. | ||
* Sets max_reopens, requester_pays, and generous timeouts as the global default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add javadoc for the maxReopens
and requesterProject
args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
.maxChannelReopens(maxReopens); | ||
if (!Strings.isNullOrEmpty(requesterProject)) { | ||
// enable requester pays and indicate who pays | ||
builder = builder.autoDetectRequesterPays(true).userProject(requesterProject); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will a sensible exception be thrown if the requesterProject
is invalid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not there, no. Only when trying to access the bucket.
@@ -374,11 +370,15 @@ public static void setGlobalNIODefaultOptions(int maxReopens) { | |||
} | |||
|
|||
/** The config we want to use. **/ | |||
public static CloudStorageConfiguration getCloudStorageConfiguration(int maxReopens) { | |||
return CloudStorageConfiguration.builder() | |||
public static CloudStorageConfiguration getCloudStorageConfiguration(int maxReopens, String requesterProject) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
javadoc for the two method args
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -17,7 +17,7 @@ | |||
public final class BucketUtilsTest extends GATKBaseTest { | |||
|
|||
static { | |||
BucketUtils.setGlobalNIODefaultOptions(); | |||
BucketUtils.setGlobalNIODefaultOptions(20,""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you get the defaults from the GATKConfig instead of hardcoding them here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done.
@@ -92,4 +92,6 @@ private StandardArgumentDefinitions(){} | |||
public static final String USE_JDK_INFLATER_SHORT_NAME = "jdk-inflater"; | |||
public static final String NIO_MAX_REOPENS_LONG_NAME = "gcs-max-retries"; | |||
public static final String NIO_MAX_REOPENS_SHORT_NAME = "gcs-retries"; | |||
public static final String NIO_PROJECT_FOR_REQUESTER_PAYS_LONG_NAME = "project-for-requester-pays"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think gcs-project-for-requester-pays
would be better, and more consistent with the existing retry args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, done.
@@ -17,7 +17,7 @@ | |||
public final class BucketUtilsTest extends GATKBaseTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some sensible way to unit test the requester-pays support (perhaps using a mock object)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well what we have here is parsing the command-line arguments, and setting the cloud storage configuration. The former I pretty much already trust. The latter I added a little test for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 looks good, thanks @jean-philippe-martin !
Thank you @droazen for the review! |
Google Cloud Storage has "requester pays" buckets. When reading from
those buckets, the requester is billed for the bandwidth (normally, it's
the bucket owner who is billed).
This pull request enables this feature with GATK. By default it is
turned off (so there are no unexpected charges). To turn it on, use the
command line argument "--project-for-requester-pays" (or
"--requester-project") to indicate which project to bill.
Example:
$ ./gatk PrintReads --input $INPUT --output=/tmp/reads.bam
fails with: com.google.cloud.storage.StorageException: Bucket is requester pays bucket but no user project provided.
$ ./gatk PrintReads --input $INPUT --output=/tmp/reads.bam --requester-project=$PROJECT
works
This PR also removes the argumentless version of
setGlobalNIODefaultOptions() because it was confusing (it uses the
default values, not those indicated by the user - usually we'd expect
the user's values to be taken into account).