Skip to content

Remove usage of aria2 module #811

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 28, 2025
Merged

Conversation

dialvarezs
Copy link
Contributor

Remove a overly specific and a bit ugly patch of aria2 module to download and untar the CheckM db.
Updates the module.

And also update other "util" modules:

  • cat/fastq
  • gunzip
  • untar

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@dialvarezs dialvarezs changed the base branch from main to dev May 25, 2025 20:30
@nf-core nf-core deleted a comment from github-actions bot May 25, 2025
@jfy133
Copy link
Member

jfy133 commented May 26, 2025

question
Can you explain the benefit of the aria2 patch removal bit of this PR?

(Unless I misunderstand it) You say 'unnecessary' patch but now you just move a patch to UNTAR instead.
Wouldn't it make more sense to just keep that patch in the one module to reduce the number of small jobs submitted to a cluster in this case?

@dialvarezs
Copy link
Contributor Author

dialvarezs commented May 26, 2025

IMHO, modules should do one thing, as they do in their pure form on nf-core, unless there is a clear advantage to coupling things together (for example, using a mapper like minimap2 and piping the output to samtools to get BAM or CRAM output instantly and reduce storage usage).

And if a module needs to be patched, (IMO) it should preserve its original functionality as closely as possible. This way, it can be reused in other parts of the pipeline just like the unpatched one, since we can only keep one copy in the repository. If I see "aria2", I would expect it to let me download a file, but if it also assumes that the downloaded file is a tarball and unpacks it, that breaks any use case where the downloaded file is not a tarball.

In mag we don't have many patched modules either, only this one and UNTAR (which I patched to be able to produce the BUSCO db directory without introducing another process, not in this PR). The aria2 patch in particular was very specific, it even had the directory names hardcoded, so it wouldn't be possible to use the module for anything else.

Wouldn't it make more sense to just keep that patch in the one module to reduce the number of small jobs submitted to a cluster in this case?

That's a fair point, but in this case it's only one extra job, and only if you use CheckM. I guess there are other parts of MAG where this could be significantly improved. For example, GUNZIP_BINS is run per bin and produces a lot of small jobs, we could reduce to one per sample or something like that.

@dialvarezs
Copy link
Contributor Author

But, on second thought, maybe we could just use UNTAR(file(url)) and say goodbye to the aria2 module. It's a <300 MB file, so I guess there's no drawback in letting Nextflow handle the download.

@jfy133
Copy link
Member

jfy133 commented May 26, 2025

OK fair point @dialvarezs !

I think maybe we should go with your UNTAR(file(url)) method.

I'm trying to remember why I made the standalone aria2c module... I think it was due to downloads from the server was very flaky... but generally letting NExtflow download for you remains the recommeneded method via the Nextflow devs...

@dialvarezs
Copy link
Contributor Author

@nf-core-bot fix linting

@dialvarezs dialvarezs changed the title Remove unnecessary patch for aria2 module Remove usage of aria2 module May 27, 2025
@jfy133
Copy link
Member

jfy133 commented May 28, 2025

Thanks again @dialvarezs !

@dialvarezs dialvarezs merged commit 77a6cef into nf-core:dev May 28, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants