Skip to content

Commit 3c86bae

Browse files
fgvieiracmeesters
andauthored
feat: add samtools markdup wrapper (#2926)
<!-- Ensure that the PR title follows conventional commit style (<type>: <description>)--> <!-- Possible types are here: https://github.com/commitizen/conventional-commit-types/blob/master/index.json --> <!-- Add a description of your PR here--> ### QC <!-- Make sure that you can tick the boxes below. --> * [x] I confirm that: For all wrappers added by this PR, * there is a test case which covers any introduced changes, * `input:` and `output:` file paths in the resulting rule can be changed arbitrarily, * either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default, * rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tools purpose or name (e.g., `map_reads` for a step that maps reads), * all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071), * the `environment.yaml` pinning has been updated by running `snakedeploy pin-conda-envs environment.yaml` on a linux machine, * wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`), * all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.), * `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool, * temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works), * the `meta.yaml` contains a link to the documentation of the respective tool or command, * `Snakefile`s pass the linting (`snakemake --lint`), * `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt), * Python wrapper scripts are formatted with [black](https://black.readthedocs.io). * Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays). --------- Co-authored-by: Christian Meesters <[email protected]>
1 parent e10ab57 commit 3c86bae

File tree

8 files changed

+114
-3
lines changed

8 files changed

+114
-3
lines changed

bio/samtools/calmd/meta.yaml

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,12 @@
11
name: samtools calmd
22
description: Calculates MD and NM tags.
3+
url: http://www.htslib.org/doc/samtools-calmd.html
34
authors:
45
- Filipe G. Vieira
5-
notes: |
6-
* The `extra` param allows for additional program arguments (not `-@/--threads` or `-O/--output-fmt`).
7-
* For more information see, http://www.htslib.org/doc/samtools-calmd.html
6+
input:
7+
- SAM/BAM/CRAM file
8+
output:
9+
- SAM/BAM/CRAM file
10+
- SAM/BAM/CRAM index file (optional)
11+
params:
12+
- extra: additional program arguments (not `-@/--threads`, `--write-index`, `-m`, `-o` or `-O/--output-fmt`).
Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
# This file may be used to create an environment using:
2+
# $ conda create --name <env> --file <this file>
3+
# platform: linux-64
4+
@EXPLICIT
5+
https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2#d7c89558ba9fa0495403155b64376d81
6+
https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2024.2.2-hbcca054_0.conda#2f4327a1cbe7f022401b236e915a5fef
7+
https://conda.anaconda.org/conda-forge/linux-64/ld_impl_linux-64-2.40-h41732ed_0.conda#7aca3059a1729aa76c597603f10b0dd3
8+
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-13.2.0-h7e041cc_5.conda#f6f6600d18a4047b54f803cf708b868a
9+
https://conda.anaconda.org/conda-forge/noarch/tzdata-2024a-h0c530f3_0.conda#161081fc7cec0bfda0d86d7cb595f8d8
10+
https://conda.anaconda.org/conda-forge/linux-64/libgomp-13.2.0-h807b86a_5.conda#d211c42b9ce49aee3734fdc828731689
11+
https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-2_gnu.tar.bz2#73aaf86a425cc6e73fcf236a5a46396d
12+
https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-13.2.0-h807b86a_5.conda#d4ff227c46917d3b4565302a2bbb276b
13+
https://conda.anaconda.org/conda-forge/linux-64/bzip2-1.0.8-hd590300_5.conda#69b8b6202a07720f448be700e300ccf4
14+
https://conda.anaconda.org/conda-forge/linux-64/c-ares-1.28.1-hd590300_0.conda#dcde58ff9a1f30b0037a2315d1846d1f
15+
https://conda.anaconda.org/conda-forge/linux-64/keyutils-1.6.1-h166bdaf_0.tar.bz2#30186d27e2c9fa62b45fb1476b7200e3
16+
https://conda.anaconda.org/conda-forge/linux-64/libdeflate-1.20-hd590300_0.conda#8e88f9389f1165d7c0936fe40d9a9a79
17+
https://conda.anaconda.org/conda-forge/linux-64/libev-4.33-hd590300_2.conda#172bf1cd1ff8629f2b1179945ed45055
18+
https://conda.anaconda.org/conda-forge/linux-64/libexpat-2.6.2-h59595ed_0.conda#e7ba12deb7020dd080c6c70e7b6f6a3d
19+
https://conda.anaconda.org/conda-forge/linux-64/libffi-3.4.2-h7f98852_5.tar.bz2#d645c6d2ac96843a2bfaccd2d62b3ac3
20+
https://conda.anaconda.org/conda-forge/linux-64/libnsl-2.0.1-hd590300_0.conda#30fd6e37fe21f86f4bd26d6ee73eeec7
21+
https://conda.anaconda.org/conda-forge/linux-64/libuuid-2.38.1-h0b41bf4_0.conda#40b61aab5c7ba9ff276c41cfffe6b80b
22+
https://conda.anaconda.org/conda-forge/linux-64/libxcrypt-4.4.36-hd590300_1.conda#5aa797f8787fe7a17d1b0821485b5adc
23+
https://conda.anaconda.org/conda-forge/linux-64/libzlib-1.2.13-hd590300_5.conda#f36c115f1ee199da648e0597ec2047ad
24+
https://conda.anaconda.org/conda-forge/linux-64/ncurses-6.4.20240210-h59595ed_0.conda#97da8860a0da5413c7c98a3b3838a645
25+
https://conda.anaconda.org/conda-forge/linux-64/openssl-3.2.1-hd590300_1.conda#9d731343cff6ee2e5a25c4a091bf8e2a
26+
https://conda.anaconda.org/conda-forge/linux-64/xz-5.2.6-h166bdaf_0.tar.bz2#2161070d867d1b1204ea749c8eec4ef0
27+
https://conda.anaconda.org/conda-forge/linux-64/libedit-3.1.20191231-he28a2e2_2.tar.bz2#4d331e44109e3f0e19b4cb8f9b82f3e1
28+
https://conda.anaconda.org/conda-forge/linux-64/libnghttp2-1.58.0-h47da74e_1.conda#700ac6ea6d53d5510591c4344d5c989a
29+
https://conda.anaconda.org/conda-forge/linux-64/libsqlite-3.45.3-h2797004_0.conda#b3316cbe90249da4f8e84cd66e1cc55b
30+
https://conda.anaconda.org/conda-forge/linux-64/libssh2-1.11.0-h0841786_0.conda#1f5a58e686b13bcfde88b93f547d23fe
31+
https://conda.anaconda.org/conda-forge/linux-64/readline-8.2-h8228510_1.conda#47d31b792659ce70f470b5c82fdfb7a4
32+
https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_h4845f30_101.conda#d453b98d9c83e71da0741bb0ff4d76bc
33+
https://conda.anaconda.org/conda-forge/linux-64/zstd-1.5.5-hfc55251_0.conda#04b88013080254850d6c01ed54810589
34+
https://conda.anaconda.org/conda-forge/linux-64/krb5-1.21.2-h659d440_0.conda#cd95826dbd331ed1be26bdf401432844
35+
https://conda.anaconda.org/conda-forge/linux-64/python-3.12.3-hab00c5b_0_cpython.conda#2540b74d304f71d3e89c81209db4db84
36+
https://conda.anaconda.org/conda-forge/linux-64/libcurl-8.7.1-hca28451_0.conda#755c7f876815003337d2c61ff5d047e5
37+
https://conda.anaconda.org/conda-forge/noarch/setuptools-69.5.1-pyhd8ed1ab_0.conda#7462280d81f639363e6e63c81276bd9e
38+
https://conda.anaconda.org/bioconda/noarch/snakemake-wrapper-utils-0.6.2-pyhdfd78af_0.tar.bz2#fd8759bbd04116eace828c4fab906096
39+
https://conda.anaconda.org/conda-forge/noarch/wheel-0.43.0-pyhd8ed1ab_1.conda#0b5293a157c2b5cd513dd1b03d8d3aae
40+
https://conda.anaconda.org/bioconda/linux-64/htslib-1.20-h81da01d_0.tar.bz2#1084947eefd2bbe9c1f84ca24061a9d5
41+
https://conda.anaconda.org/conda-forge/noarch/pip-24.0-pyhd8ed1ab_0.conda#f586ac1e56c8638b64f9c8122a7b8a67
42+
https://conda.anaconda.org/bioconda/linux-64/samtools-1.20-h50ea8bc_0.tar.bz2#7b3b1d0feea64e7e211ae24e7cd126d8

bio/samtools/markdup/environment.yaml

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
channels:
2+
- conda-forge
3+
- bioconda
4+
- nodefaults
5+
dependencies:
6+
- samtools =1.20
7+
- snakemake-wrapper-utils =0.6.2

bio/samtools/markdup/meta.yaml

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
name: samtools markdup
2+
description: Mark duplicate alignments in a coordinate sorted file .
3+
url: http://www.htslib.org/doc/samtools-markdup.html
4+
authors:
5+
- Filipe G. Vieira
6+
input:
7+
- SAM/BAM/CRAM file
8+
output:
9+
- SAM/BAM/CRAM file
10+
- SAM/BAM/CRAM index file (optional)
11+
params:
12+
- extra: additional program arguments (not `-@/--threads`, `--write-index`, `-m`, `-T`, `-f`, `-o` or `-O/--output-fmt`).

bio/samtools/markdup/test/Snakefile

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
rule samtools_markdup:
2+
input:
3+
aln="{sample}.bam",
4+
output:
5+
bam="{sample}.markdup.bam",
6+
idx="{sample}.markdup.bam.csi",
7+
log:
8+
"{sample}.markdup.log",
9+
params:
10+
extra="-c --no-PG",
11+
threads: 2
12+
wrapper:
13+
"master/bio/samtools/markdup"

bio/samtools/markdup/test/a.bam

64 KB
Binary file not shown.

bio/samtools/markdup/wrapper.py

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
__author__ = "Filipe G. Vieira"
2+
__copyright__ = "Copyright 2024, Filipe G. Vieira"
3+
__license__ = "MIT"
4+
5+
6+
import tempfile
7+
from pathlib import Path
8+
from snakemake.shell import shell
9+
from snakemake_wrapper_utils.samtools import get_samtools_opts
10+
11+
samtools_opts = get_samtools_opts(snakemake, parse_output=False)
12+
extra = snakemake.params.get("extra", "")
13+
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
14+
15+
metrics = snakemake.output.get("metrics", "")
16+
if metrics:
17+
metrics = f"-f {metrics}"
18+
19+
20+
with tempfile.TemporaryDirectory() as tmpdir:
21+
tmp_prefix = Path(tmpdir) / "samtools_markdup"
22+
shell(
23+
"samtools markdup {samtools_opts} {extra} -T {tmp_prefix} {metrics} {snakemake.input[0]} {snakemake.output[0]} {log}"
24+
)

test.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4097,6 +4097,14 @@ def test_samtools_mpileup():
40974097
)
40984098

40994099

4100+
@skip_if_not_modified
4101+
def test_samtools_mpileup():
4102+
run(
4103+
"bio/samtools/markdup",
4104+
["snakemake", "--cores", "1", "a.markdup.bam", "--use-conda", "-F"],
4105+
)
4106+
4107+
41004108
@skip_if_not_modified
41014109
def test_samtools_stats():
41024110
run(

0 commit comments

Comments
 (0)