Commit 9c8cf81

perf: Use samtools collate in fastq separate wrapper (snakemake#2960)
<!-- Ensure that the PR title follows conventional commit style (<type>: <description>)-->
<!-- Possible types are here: https://github.com/commitizen/conventional-commit-types/blob/master/index.json -->

<!-- Add a description of your PR here-->

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] I confirm that:

For all wrappers added by this PR,

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed arbitrarily,
* either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tools purpose or name (e.g., `map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071),
* the `environment.yaml` pinning has been updated by running `snakedeploy pin-conda-envs environment.yaml` on a linux machine,
* wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool,
* temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works),
* the `meta.yaml` contains a link to the documentation of the respective tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with [black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays).
1 parent 3c86bae commit 9c8cf81

2 files changed: +18 -26 lines changed

bio/samtools/fastq/separate/test/Snakefile

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ rule samtools_fastq_separate:
     log:
         "{sample}.separate.log",
     params:
-        sort="-m 4G",
+        collate="",
         fastq="-n",
     # Remember, this is the number of samtools' additional threads. At least 2 threads have to be requested on cluster sumbission. This value - 2 will be sent to samtools sort -@ argument.
     threads: 3
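
The thread comment in this rule can be illustrated with a small sketch. It mirrors the expression `0 if snakemake.threads <= 2 else snakemake.threads - 2` from the wrapper; the function name `additional_samtools_threads` is hypothetical and does not exist in the wrapper itself.

```python
def additional_samtools_threads(rule_threads: int) -> int:
    """Return the number of extra threads passed to samtools.

    Two threads are reserved, one for each samtools process in the pipe,
    so only the remainder is handed over as additional threads.
    The function name is an assumption made for this illustration.
    """
    return 0 if rule_threads <= 2 else rule_threads - 2


if __name__ == "__main__":
    # Show how the rule-level `threads:` value maps to additional threads.
    for requested in (1, 2, 3, 8):
        print(requested, additional_samtools_threads(requested))
```

With `threads: 3`, as in the test rule, a single additional thread is requested from samtools.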

bio/samtools/fastq/separate/wrapper.py

Lines changed: 17 additions & 25 deletions
@@ -10,37 +10,29 @@
 from snakemake.shell import shell
 from snakemake_wrapper_utils.snakemake import get_mem
 
-params_sort = snakemake.params.get("sort", "")
+params_collate = snakemake.params.get("collate", "")
 params_fastq = snakemake.params.get("fastq", "")
 log = snakemake.log_fmt_shell(stdout=True, stderr=True)
 
 # Samtools takes additional threads through its option -@
-# One thread is used bu Samtools sort
+# One thread is used by Samtools collate
 # One thread is used by Samtools fastq
 # So snakemake.threads has to take them into account
 # before allowing additional threads through samtools sort -@
 threads = 0 if snakemake.threads <= 2 else snakemake.threads - 2
 
-mem = get_mem(snakemake, "MiB")
-mem = "-m {0:.0f}M".format(mem / threads) if mem and threads else ""
-
-with tempfile.TemporaryDirectory() as tmpdir:
-    tmp_prefix = Path(tmpdir) / "samtools_fastq.sort"
-
-    shell(
-        "(samtools sort -n"
-        " --threads {threads}"
-        " {mem}"
-        " -T {tmp_prefix}"
-        " {params_sort}"
-        " {snakemake.input[0]} | "
-        "samtools fastq"
-        " {params_fastq}"
-        " -1 {snakemake.output[0]}"
-        " -2 {snakemake.output[1]}"
-        " -0 /dev/null"
-        " -s /dev/null"
-        " -F 0x900"
-        " - "
-        ") {log}"
-    )
+shell(
+    "(samtools collate -u -O"
+    " --threads {threads}"
+    " {params_collate}"
+    " {snakemake.input[0]} | "
+    "samtools fastq"
+    " {params_fastq}"
+    " -1 {snakemake.output[0]}"
+    " -2 {snakemake.output[1]}"
+    " -0 /dev/null"
+    " -s /dev/null"
+    " -F 0x900"
+    " - "
+    ") {log}"
+)
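
To make the resulting pipeline easier to read outside of the `shell()` template, here is a minimal sketch that assembles the same command as a plain string. It is not part of the wrapper: the function `build_separate_command` and the example file names are hypothetical, and the sketch only builds the string without running samtools or handling the `{log}` redirection.

```python
import shlex


def build_separate_command(
    bam: str,
    fq1: str,
    fq2: str,
    extra_threads: int = 0,
    collate_extra: str = "",
    fastq_extra: str = "",
) -> str:
    """Assemble the collate | fastq pipeline as a shell string.

    Hypothetical helper for illustration; the flags are the ones
    shown in the diff above.
    """
    # collate groups reads by name; -u writes uncompressed BAM, -O writes to stdout
    collate = ["samtools", "collate", "-u", "-O", "--threads", str(extra_threads)]
    collate += shlex.split(collate_extra)
    collate.append(bam)

    # fastq reads from stdin ("-"), discards unpaired and singleton reads,
    # and skips secondary and supplementary alignments (-F 0x900)
    fastq = ["samtools", "fastq"] + shlex.split(fastq_extra)
    fastq += ["-1", fq1, "-2", fq2]
    fastq += ["-0", "/dev/null", "-s", "/dev/null", "-F", "0x900", "-"]

    return shlex.join(collate) + " | " + shlex.join(fastq)


if __name__ == "__main__":
    # File names below are placeholders, not taken from the test case.
    print(build_separate_command("a.bam", "a.1.fq", "a.2.fq", 1, "", "-n"))
```

Compared with the removed `samtools sort -n` call, this pipeline needs no `-m` memory setting and no temporary prefix, which is why the `get_mem` computation and the `tempfile` block disappear from the wrapper.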
