-
Notifications
You must be signed in to change notification settings - Fork 242
Wire up CRAM 3.1 codecs for reading. #1736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
773b235
to
c6c7ce5
Compare
fd4e834
to
68c3b8a
Compare
f012878
to
d89abba
Compare
1a5dc3a
to
79792f2
Compare
840fc9d
to
c8ba753
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmnbroad Some comments. One potential bug and a request to rename the new build commands so it's clear they're for a special test build.
Dockerfile
Outdated
@@ -0,0 +1,13 @@ | |||
# This docker build is here to enable Carrot tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if there's a way to have carrot build a non-default docker image, but if that's possible I would call this something like CarrotDockerFile so people don't naively try to use it for non-carrot things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit removed, to be revived on the write branch.
build.gradle
Outdated
@@ -79,6 +79,21 @@ jar { | |||
} | |||
} | |||
|
|||
// A shadow jar that includes all dependencies PLUS test code, for use with carrot tests | |||
shadowJar { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets make separate carrotShadowJar task instead of using the default. Again, people might just run shadowjar and assume it's the standard htsjdk thing.
* the newest version, since it has no write implementation yet. | ||
*/ | ||
@Override | ||
protected List<ReadsCodec> filterByVersion(final List<ReadsCodec> candidateCodecs, final HtsVersion htsVersion) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR, but I wonder if we should make this something that exists generically.
return buffer.array(); | ||
} | ||
|
||
final byte[] bytes = new byte[buffer.remaining()]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably am misunderstanding, but doesn't remaining return the length between the current position and the end? Isn't what you want the length from arrayOffset() to limit()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, very good catch. Fixed.
@@ -57,6 +64,9 @@ public ExternalCompressor getCompressorForMethod( | |||
final int compressorSpecificArg) { | |||
switch (compressionMethod) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a switch expression which lets you drop the default condition. It's fine though.
sharedRANSNx16Decode = new RANSNx16Decode(); | ||
} | ||
compressorCache.put( | ||
new Tuple(BlockCompressionMethod.RANSNx16, ransArg), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Tuple<>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be the sort of place where a private CompressorConfig record type could clarify things instead of the awkward tuples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -56,6 +55,8 @@ | |||
* on the corresponding reference contig. | |||
*/ | |||
public class Slice { | |||
private final CRAMVersion cramVersion; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, can we have crams that vary version by slice?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us hope not. Its just for convenience - there have always been cases where slice needs to discriminate by version when reading a slice, but for some reason I thought I needed it when writing now too - although so far there are no references to it.
// reads file 1 (bam or cram) | ||
// reads file 2 (bam or cram) | ||
// reference | ||
public static void main(final String[] args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels awkwardly placed but doing it differently seems like more of a hassle. I think you've come up with a pretty good solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually reverting the carrot changes in this branch (reverting the one commit) since its not complete/working anyway. I'll revive it for the write branch though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmnbroad Some comments. One potential bug and a request to rename the new build commands so it's clear they're for a special test build.
The release notes for the recent 4.2.0 release note that it includes CRAM 3.1 codecs due to #1714. However it appears that, without this PR's addition to CramVersions.java in particular, the recent release remains unable to read CRAM v3.1 files. |
@jmarshall Ah, sorry for the confusion. It's literally just the codecs in that PR. The wiring isn't in yet. It's coming soon. |
No confusion 😄 I just wanted to make sure that this PR hadn't been inadvertently omitted from the release or anything similar. Particularly in the light of samtools/htslib#1900 and the related release notes for the (imminent?) upcoming htslib/samtools releases — cf samtools/htslib#1898 first paragraph of the addition. |
c8ba753
to
3362a10
Compare
This reverts commit c8ba753.
3362a10
to
144f5f8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. I'm merging!
A few notes:
FASTACodecV1_0
changes are to fix an issue that came up when I added the (beta api) CRAM 3.1 codec and a corresponding test; the codec did not properly handle gzipped fasta files.NameTokenisationEncoder
andEncodeToken
class changes are to fix an issue discovered on the separate CRAM 3.1 writer branch.