Milestone 2.2.0 #66

Merged
merged 15 commits into from
May 23, 2025

Conversation

charles-plessy
Collaborator

This PR adds BAM and CRAM to the list of supported export formats. A new subworkflow ensures that the genome file is appropriately compressed and indexed, both for CRAM encoding and so that the user can run BAM/CRAM indexing commands later. A sequence dictionary is also computed; it will be useful in a future update of last/mafconvert (PR under review).

Other updates preparing the 2.2.0 release will follow, but I thought that it would be useful to review this bunch of commits separately. In particular I welcome critical comments on how I manage the optional run of the subworkflow in workflows/pairgenomealign.nf.
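For reviewers who want a concrete picture of what the genome preparation amounts to, here is a minimal Python sketch. It is not the pipeline's Nextflow code. The function name, the `target` prefix, the output file names, and the rule that only `bam` and `cram` trigger the preparation are all assumptions made for illustration. The function only builds command strings and does not run them.

```python
import shlex

# Assumption: only these export formats need the prepared genome.
FORMATS_NEEDING_GENOME = {"bam", "cram"}


def genome_prep_commands(fasta: str, export_format: str, prefix: str = "target") -> list[str]:
    """Return shell commands (as strings) that prepare `fasta` for BAM/CRAM export.

    Returns an empty list when the export format does not need the genome,
    which mirrors the idea of running the subworkflow only optionally.
    """
    if export_format.lower() not in FORMATS_NEEDING_GENOME:
        return []

    src = shlex.quote(fasta)
    bgz = shlex.quote(f"{prefix}.fa.gz")        # hypothetical output name
    dict_file = shlex.quote(f"{prefix}.dict")   # hypothetical output name

    # `gzip -dc` reads both gzip and BGZF input; plain FASTA is compressed directly.
    if fasta.endswith(".gz"):
        recompress = f"gzip -dc {src} | bgzip -c > {bgz}"
    else:
        recompress = f"bgzip -c {src} > {bgz}"

    return [
        recompress,
        f"samtools faidx {bgz}",                # writes the .fai and .gzi indexes
        f"samtools dict -o {dict_file} {bgz}",  # writes the sequence dictionary
    ]
```

Calling `genome_prep_commands("genome.fa", "cram")` yields three commands (recompress, index, dictionary), while a format outside the assumed set yields an empty list, so nothing would be run.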

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/pairgenomealign branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

This is in preparation for CRAM support.
FASTA_BGZIP_INDEX_DICT_SAMTOOLS is a subworkflow that takes a FASTA file
regardless of its compression, and returns it BGZIPped together with
index files needed to sort the alignments and a sequence dictionary
needed to ensure that alignments of different _queries_ to the same
_target_ can be merged later.
Pushing this commit now to trigger a new CI run of the nf-core branch
protection.

This said, multiqc_assemblyscan_plot_data combines one file per _query_
genome, and removing the `tag` ensures that the list of file names does
not clutter the screen when monitoring the pipeline run. The nf-core
MultiQC module also does not have a `tag`.

Closes #64
Member

@jfy133 jfy133 left a comment

General minor things:

  • Missing citations.md entry for SAMTOOLS
  • Possibly missing diagram update to add SAMTOOLS
  • Missing reference to SAMTOOLS on README

But otherwise the code is nice and clean as always, so I will give you a preemptive approval :)

@charles-plessy
Collaborator Author

Thanks a lot for the very useful comments. I have added credit to Samtools and opened an issue about the diagram (#68).

@charles-plessy charles-plessy merged commit 0374473 into dev May 23, 2025
5 checks passed