
Commit 445b35f

fgvieira and cmeesters authored
feat: Add support for regions file and arbitrary FAI/GZI paths (#2936)
### QC

* [x] I confirm that:

For all wrappers added by this PR,

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed arbitrarily,
* either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tools purpose or name (e.g., `map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071),
* the `environment.yaml` pinning has been updated by running `snakedeploy pin-conda-envs environment.yaml` on a linux machine,
* wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool,
* temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works),
* the `meta.yaml` contains a link to the documentation of the respective tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with [black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays).

---------

Co-authored-by: Christian Meesters <[email protected]>
1 parent 0c7ae27 commit 445b35f

8 files changed: +113 -8 lines changed


bio/samtools/faidx/meta.yaml

Lines changed: 9 additions & 3 deletions
@@ -1,12 +1,18 @@
 name: samtools faidx
 description: index reference sequence in FASTA format from reference sequence.
+url: http://www.htslib.org/doc/samtools-faidx.html
 authors:
   - Michael Chambers
   - Filipe G. Vieira
 input:
   - reference sequence file (.fa)
+  - regions: file with regions
+  - fai: index for reference file (optional)
+  - gzi: index for BGZip'ed reference file (optional)
 output:
   - indexed reference sequence file (.fai)
-notes: |
-  * The `extra` param allows for additional program arguments (not `-o`).
-  * For more information see, http://www.htslib.org/doc/samtools-faidx.html
+  - fai: index for reference file (optional)
+  - gzi: index for BGZip'ed reference file (optional)
+params:
+  - region: region to extract from input file (optional)
+  - extra: additional program arguments (not `-o`).

bio/samtools/faidx/test/Snakefile

Lines changed: 61 additions & 3 deletions
@@ -1,11 +1,69 @@
-rule samtools_index:
+rule samtools_faidx:
     input:
         "{sample}.fa",
     output:
-        "{sample}.fa.fai",
+        "out/{sample}.fa.fai",
     log:
         "{sample}.log",
     params:
-        extra="",  # optional params string
+        extra="",
+    wrapper:
+        "master/bio/samtools/faidx"
+
+
+rule samtools_faidx_named:
+    input:
+        "{sample}.fa",
+    output:
+        fai="out/{sample}.named.fa.fai",
+    log:
+        "{sample}.named.log",
+    params:
+        extra="",
+    wrapper:
+        "master/bio/samtools/faidx"
+
+
+rule samtools_faidx_bgzip:
+    input:
+        "{sample}.fa.bgz",
+    output:
+        fai="out/{sample}.fas.bgz.fai",
+        gzi="out/{sample}.fas.bgz.gzi",
+    log:
+        "{sample}.bzgip.log",
+    params:
+        extra="",
+    wrapper:
+        "master/bio/samtools/faidx"
+
+
+rule samtools_faidx_region:
+    input:
+        "{sample}.fa",
+        fai="idx/{sample}.fa.fai",
+    output:
+        "out/{sample}.fas",
+    log:
+        "{sample}.region.log",
+    params:
+        region="ref",
+        extra="--length 5",
+    wrapper:
+        "master/bio/samtools/faidx"
+
+
+rule samtools_faidx_bgzip_region:
+    input:
+        "{sample}.fa.bgz",
+        fai="idx/{sample}.fa.bgz.fai",
+        gzi="idx/{sample}.fa.bgz.gzi",
+    output:
+        "out/{sample}.bgz.fas",
+    log:
+        "{sample}.bgzip_region.log",
+    params:
+        region="ref",
+        extra="--length 5",
     wrapper:
         "master/bio/samtools/faidx"

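The two region rules pass `extra="--length 5"`. In the samtools faidx manual, `--length` (short form `-n`) sets the line length used when wrapping the extracted FASTA sequence. The following is a minimal Python sketch of that wrapping behavior only, not of samtools itself; the function name `wrap_fasta_record` and the 12-base sequence are hypothetical and chosen purely for illustration.

```python
def wrap_fasta_record(name: str, sequence: str, length: int) -> str:
    """Format one FASTA record, wrapping sequence lines at `length` characters.

    A length of 0 is treated here as "do not wrap". This helper is an
    illustration only and is not part of the wrapper or of samtools.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if length == 0:
        lines = [sequence]
    else:
        # Slice the sequence into consecutive chunks of at most `length` bases.
        lines = [sequence[i : i + length] for i in range(0, len(sequence), length)]
    return "\n".join([f">{name}", *lines]) + "\n"


# Hypothetical 12-base sequence; the real test data lives in the test FASTA file.
record = wrap_fasta_record("ref", "ACGTACGTACGT", 5)
print(record, end="")
```

With a wrap length of 5, every sequence line except possibly the last holds exactly five bases.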
bio/samtools/faidx/test/genome.fa.bgz

122 Bytes
Binary file not shown.
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+ref 45 5 45 46
+ref2 40 57 40 41
8 Bytes
Binary file not shown.
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+ref 45 5 45 46
+ref2 40 57 40 41
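
The two-line index files added above follow the FAI layout: sequence name, sequence length, byte offset of the first base, bases per line, and bytes per line including the newline. As a rough sketch of how such an index supports random access, the Python below parses those lines and computes the byte offset of a given base. The names `FaiRecord`, `parse_fai`, and `base_offset` are hypothetical helpers, not part of the wrapper, and the sketch assumes an uncompressed FASTA file.

```python
from typing import Dict, NamedTuple


class FaiRecord(NamedTuple):
    name: str       # sequence name
    length: int     # number of bases in the sequence
    offset: int     # byte offset of the first base in the FASTA file
    linebases: int  # bases per sequence line
    linewidth: int  # bytes per sequence line, including the newline


def parse_fai(text: str) -> Dict[str, FaiRecord]:
    """Parse FAI index text into records keyed by sequence name."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Real FAI files are tab-separated; split() accepts tabs or spaces.
        name, length, offset, linebases, linewidth = line.split()[:5]
        records[name] = FaiRecord(
            name, int(length), int(offset), int(linebases), int(linewidth)
        )
    return records


def base_offset(rec: FaiRecord, pos: int) -> int:
    """Byte offset of the 0-based base `pos` in an uncompressed FASTA file."""
    if not 0 <= pos < rec.length:
        raise IndexError(f"position {pos} outside sequence {rec.name}")
    # Skip whole lines first, then step into the line holding the base.
    return rec.offset + (pos // rec.linebases) * rec.linewidth + pos % rec.linebases


# The index content shown in the diff above.
FAI_TEXT = "ref 45 5 45 46\nref2 40 57 40 41\n"
index = parse_fai(FAI_TEXT)
print(base_offset(index["ref"], 0))
print(base_offset(index["ref2"], 39))
```

The offsets are self-consistent: `ref` starts at byte 5 (after `>ref` and a newline), occupies one 46-byte line, and the 6-byte `>ref2` header line then places the first base of `ref2` at byte 57.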

bio/samtools/faidx/wrapper.py

Lines changed: 25 additions & 1 deletion
@@ -13,4 +13,28 @@
 extra = snakemake.params.get("extra", "")
 log = snakemake.log_fmt_shell(stdout=True, stderr=True)
 
-shell("samtools faidx {samtools_opts} {extra} {snakemake.input[0]} {log}")
+
+# Get regions (if present)
+regions = snakemake.input.get("regions", "")
+if regions:
+    regions = f"--regions-file {regions}"
+
+region = snakemake.params.get("region", "")
+
+# Get FAI and GZI files
+if region or regions:
+    fai = snakemake.input.get("fai", "")
+    gzi = snakemake.input.get("gzi", "")
+else:
+    fai = snakemake.output.get("fai", "")
+    gzi = snakemake.output.get("gzi", "")
+
+if fai:
+    fai = f"--fai-idx {fai}"
+if gzi:
+    gzi = f"--gzi-idx {gzi}"
+
+
+shell(
+    "samtools faidx {fai} {gzi} {regions} {samtools_opts} {extra} {snakemake.input[0]} {region:q} {log}"
+)
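
The key decision in this hunk is where the index paths come from: when a region or a regions file is given, the wrapper is extracting sequence and takes `fai`/`gzi` from the inputs; otherwise it is indexing and takes them from the outputs. The sketch below restates that logic as a standalone Python function so it can be run outside Snakemake. The function `build_faidx_command` and its plain-dict arguments are hypothetical stand-ins for the `snakemake` object, and the `samtools_opts` and `log` parts, which are defined outside the shown hunk, are deliberately left out.

```python
import shlex
from typing import Mapping


def build_faidx_command(
    fasta: str,
    inputs: Mapping[str, str],
    outputs: Mapping[str, str],
    region: str = "",
    extra: str = "",
) -> str:
    """Assemble a samtools faidx command line (illustrative sketch only)."""
    regions = inputs.get("regions", "")

    # Extraction mode reads existing indices (inputs);
    # indexing mode writes them (outputs).
    source = inputs if (region or regions) else outputs
    fai = source.get("fai", "")
    gzi = source.get("gzi", "")

    parts = ["samtools", "faidx"]
    if fai:
        parts += ["--fai-idx", fai]
    if gzi:
        parts += ["--gzi-idx", gzi]
    if regions:
        parts += ["--regions-file", regions]
    if extra:
        parts += shlex.split(extra)
    parts.append(fasta)
    if region:
        parts.append(region)
    # shlex.join quotes each argument where needed, similar in spirit to {region:q}.
    return shlex.join(parts)


# Indexing mode, mirroring the samtools_faidx_bgzip test rule.
index_cmd = build_faidx_command(
    "genome.fa.bgz",
    inputs={},
    outputs={"fai": "out/genome.fas.bgz.fai", "gzi": "out/genome.fas.bgz.gzi"},
)

# Extraction mode, mirroring the samtools_faidx_region test rule.
region_cmd = build_faidx_command(
    "genome.fa",
    inputs={"fai": "idx/genome.fa.fai"},
    outputs={},
    region="ref",
    extra="--length 5",
)
print(index_cmd)
print(region_cmd)
```

Building the command as a list and quoting at the end avoids the stray double spaces that empty format fields leave in a template string, at the cost of diverging slightly from the wrapper's actual formatting.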

test.py

Lines changed: 14 additions & 1 deletion
@@ -1453,6 +1453,7 @@ def test_goleft_indexcov():
         ["snakemake", "--cores", "1", "--use-conda", "-Fp"],
     )
 
+
 @skip_if_not_modified
 def test_gridss_call():
     run(
@@ -4171,7 +4172,19 @@ def test_samtools_fastq_separate():
 def test_samtools_faidx():
     run(
         "bio/samtools/faidx",
-        ["snakemake", "--cores", "1", "genome.fa.fai", "--use-conda", "-F"],
+        [
+            "snakemake",
+            "--cores",
+            "1",
+            "out/genome.fa.fai",
+            "out/genome.named.fa.fai",
+            "out/genome.fas.bgz.fai",
+            "out/genome.fas.bgz.gzi",
+            "out/genome.fas",
+            "out/genome.bgz.fas",
+            "--use-conda",
+            "-F",
+        ],
     )
 
 