Commit 1c1c9ae

Merge pull request #268 from nf-core/dev
Release - v1.1.0 - British Beans on Toast
2 parents: 98a0815 + af57abe

55 files changed (+2450, -480 lines)

.github/workflows/ci.yml (+27 -26)

@@ -29,7 +29,7 @@ jobs:
         parameters:
           - "--annotation_tool prodigal"
           - "--annotation_tool prokka"
-          ## Warning: we can't test Bakta as uses more memory than available on GHA CIs
+          - "--annotation_tool bakta --annotation_bakta_db_downloadtype light"

     steps:
       - name: Check out pipeline code
@@ -57,6 +57,7 @@ jobs:
         parameters:
           - "--annotation_tool prodigal"
           - "--annotation_tool prokka"
+          - "--annotation_tool bakta --annotation_bakta_db_downloadtype light"

     steps:
       - name: Check out pipeline code
@@ -71,31 +72,31 @@ jobs:
         run: |
           nextflow run ${GITHUB_WORKSPACE} -profile test_bgc,docker --outdir ./results ${{ matrix.parameters }}

-  ## DEACTIVATE CURRENTLY DUE TO EXTENDED DATABASE SERVER FAILURE
-  ## CAN REACTIVATE ONCE WORKING AGAIN
-  # test_deeparg:
-  #   name: Run pipeline with test data (DeepARG only workflow)
-  #   # Only run on push if this is the nf-core dev branch (merged PRs)
-  #   if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/funcscan') }}"
-  #   runs-on: ubuntu-latest
-  #   strategy:
-  #     matrix:
-  #       NXF_VER:
-  #         - "22.10.1"
-  #         - "latest-everything"
-  #       parameters:
-  #         - "--annotation_tool prodigal"
-  #         - "--annotation_tool prokka"
+  test_deeparg:
+    name: Run pipeline with test data (DeepARG only workflow)
+    # Only run on push if this is the nf-core dev branch (merged PRs)
+    if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/funcscan') }}"
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        NXF_VER:
+          - "22.10.1"
+          - "latest-everything"
+        parameters:
+          - "--annotation_tool bakta --annotation_bakta_db_downloadtype light"
+          - "--annotation_tool prodigal"
+          - "--annotation_tool prokka"
+          - "--annotation_tool pyrodigal"

-  #   steps:
-  #     - name: Check out pipeline code
-  #       uses: actions/checkout@v2
+    steps:
+      - name: Check out pipeline code
+        uses: actions/checkout@v2

-  #     - name: Install Nextflow
-  #       uses: nf-core/setup-nextflow@v1
-  #       with:
-  #         version: "${{ matrix.NXF_VER }}"
+      - name: Install Nextflow
+        uses: nf-core/setup-nextflow@v1
+        with:
+          version: "${{ matrix.NXF_VER }}"

-  #     - name: Run pipeline with test data (DeepARG workflow)
-  #       run: |
-  #         nextflow run ${GITHUB_WORKSPACE} -profile test_deeparg,docker --outdir ./results ${{ matrix.parameters }}
+      - name: Run pipeline with test data (DeepARG workflow)
+        run: |
+          nextflow run ${GITHUB_WORKSPACE} -profile test_deeparg,docker --outdir ./results ${{ matrix.parameters }}
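
The re-enabled test_deeparg job runs the same `nextflow run ... -profile test_deeparg,docker` command once per entry of the `parameters` matrix. Below is a minimal sketch of replaying that matrix locally; it is illustrative only and not part of the commit, and it assumes Nextflow (>=22.10.1) and Docker are installed and that the current directory is a checkout of nf-core/funcscan.

# Hypothetical local reproduction of the test_deeparg CI matrix (not part of the commit).
import subprocess

parameters = [
    "--annotation_tool bakta --annotation_bakta_db_downloadtype light",
    "--annotation_tool prodigal",
    "--annotation_tool prokka",
    "--annotation_tool pyrodigal",
]

for params in parameters:
    # Mirrors the CI step: nextflow run <pipeline> -profile test_deeparg,docker --outdir ./results <params>
    cmd = f"nextflow run . -profile test_deeparg,docker --outdir ./results {params}"
    subprocess.run(cmd, shell=True, check=True)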

CHANGELOG.md (+33)

@@ -3,6 +3,39 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## v1.1.0 - British Beans on Toast - [2023-04-26]
+
+### `Added`
+
+- [#238](https://github.com/nf-core/funcscan/pull/238) Added dedicated DRAMP database downloading step for AMPcombi to prevent parallel downloads when no database provided by user. (by @jfy133)
+- [#235](https://github.com/nf-core/funcscan/pull/235) Added parameter `annotation_bakta_db_downloadtype` to be able to switch between downloading either full (33.1 GB) or light (1.3 GB excluding UPS, IPS, PSC, see parameter description) versions of the Bakta database. (by @jasmezz)
+- [#249](https://github.com/nf-core/funcscan/pull/249) Added bakta annotation to CI tests. (by @jasmezz)
+- [#251](https://github.com/nf-core/funcscan/pull/251) Added annotation tool: Pyrodigal. (by @jasmezz)
+- [#252](https://github.com/nf-core/funcscan/pull/252) Added a new parameter `-arg_rgi_savejson` that saves the file `<samplename>.json` in the RGI directory. The default output for RGI is now only `<samplename>.txt`. (by @darcy220606)
+- [#253](https://github.com/nf-core/funcscan/pull/253) Updated Prodigal to have compressed output files. (by @jasmezz)
+- [#262](https://github.com/nf-core/funcscan/pull/262) Added comBGC function to screen whole directory of antiSMASH output (one subfolder per sample). (by @jasmezz)
+- [#263](https://github.com/nf-core/funcscan/pull/263) Removed `AMPlify` from test_full.config. (by @jasmezz)
+- [#266](https://github.com/nf-core/funcscan/pull/266) Updated README.md with Pyrodigal. (by @jasmezz)
+
+### `Fixed`
+
+- [#243](https://github.com/nf-core/funcscan/pull/243) Compress the ampcombi_complete_summary.csv in the output directory. (by @louperelo)
+- [#237](https://github.com/nf-core/funcscan/pull/237) Reactivate DeepARG automatic database downloading and CI tests as server is now back up. (by @jfy133)
+- [#235](https://github.com/nf-core/funcscan/pull/235) Improved annotation speed by switching off pipeline-irrelevant Bakta annotation steps by default. (by @jasmezz)
+- [#235](https://github.com/nf-core/funcscan/pull/235) Renamed parameter `annotation_bakta_db` to `annotation_bakta_db_localpath`. (by @jasmezz)
+- [#242](https://github.com/nf-core/funcscan/pull/242) Fixed MACREL '.faa' issue that was generated when it was run on its own and upgraded MACREL from version `1.1.0` to `1.2.0` (by @Darcy220606)
+- [#248](https://github.com/nf-core/funcscan/pull/248) Applied best-practice `error("message")` to all (sub)workflow files. (by @jasmezz)
+- [#254](https://github.com/nf-core/funcscan/pull/254) Further resource optimisation based on feedback from 'real world' datasets. (ongoing, reported by @alexhbnr and @Darcy220606, fix by @jfy133)
+- [#266](https://github.com/nf-core/funcscan/pull/266) Fixed wrong process name in base.config. (reported by @Darcy220606, fix by @jasmezz)
+
+### `Dependencies`
+
+| Tool  | Previous version | New version |
+| ----- | ---------------- | ----------- |
+| Bakta | 1.6.1            | 1.7.0       |
+
+### `Deprecated`
+
 ## v1.0.1 - [2023-02-27]

 ### `Added`
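
Two of the changelog entries above change how a local Bakta database is passed to the pipeline: `annotation_bakta_db` is now `annotation_bakta_db_localpath`, and the new `annotation_bakta_db_downloadtype` selects the full or light download. The sketch below is a hedged illustration of an invocation using these options; the samplesheet, output path, and release tag are placeholders and not part of the commit.

# Hypothetical invocation showing the renamed/added Bakta parameters (not part of the commit).
import subprocess

cmd = [
    "nextflow", "run", "nf-core/funcscan", "-r", "1.1.0",
    "-profile", "docker",
    "--input", "samplesheet.csv",                     # placeholder samplesheet
    "--outdir", "./results",                          # placeholder output directory
    "--annotation_tool", "bakta",
    "--annotation_bakta_db_downloadtype", "light",    # or: --annotation_bakta_db_localpath /path/to/db
]
subprocess.run(cmd, check=True)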

CITATIONS.md (+8 -4)

@@ -12,7 +12,7 @@

 - [ABRicate](https://github.com/tseemann/abricate)

-  > Seemann T. (2020). ABRicate. Github [https://github.com/tseemann/abricate](https://github.com/tseemann/abricate).
+  > Seemann, T. (2020). ABRicate. Github [https://github.com/tseemann/abricate](https://github.com/tseemann/abricate).

 - [AMPir](https://doi.org/10.1093/bioinformatics/btaa653)

@@ -48,15 +48,15 @@

 - [GECCO](https://gecco.embl.de)

-  > Carroll, L.M. , Larralde, M., Fleck, J. S., Ponnudurai, R., Milanese, A., Cappio Barazzone, E. & Zeller, G. (2021). Accurate de novo identification of biosynthetic gene clusters with GECCO. bioRxiv [DOI: 10.1101/2021.05.03.442509](https://doi.org/10.1101/2021.05.03.442509)
+  > Carroll, L. M. , Larralde, M., Fleck, J. S., Ponnudurai, R., Milanese, A., Cappio Barazzone, E. & Zeller, G. (2021). Accurate de novo identification of biosynthetic gene clusters with GECCO. bioRxiv [DOI: 10.1101/2021.05.03.442509](https://doi.org/10.1101/2021.05.03.442509)

 - [hAMRonization](https://github.com/pha4ge/hAMRonization)

   > Public Health Alliance for Genomic Epidemiology (pha4ge). (2022). Parse multiple Antimicrobial Resistance Analysis Reports into a common data structure. Github. Retrieved October 5, 2022, from [https://github.com/pha4ge/hAMRonization](https://github.com/pha4ge/hAMRonization)

 - [AMPcombi](https://github.com/Darcy220606/AMPcombi)

-  > Anan Ibrahim, & Louisa Perelo. (2023). Darcy220606/AMPcombi. [DOI: 10.5281/zenodo.7639121](https://doi.org/10.5281/zenodo.7639121).
+  > Ibrahim, A. & Perelo, L. (2023). Darcy220606/AMPcombi. [DOI: 10.5281/zenodo.7639121](https://doi.org/10.5281/zenodo.7639121).

 - [HMMER](https://doi.org/10.1371/journal.pcbi.1002195.)

@@ -72,7 +72,11 @@

 - [PROKKA](https://doi.org/10.1093/bioinformatics/btu153)

-  > Seemann T. (2014). Prokka: rapid prokaryotic genome annotation. Bioinformatics (Oxford, England), 30(14), 2068–2069. [DOI: 10.1093/bioinformatics/btu153](https://doi.org/10.1093/bioinformatics/btu153)
+  > Seemann, T. (2014). Prokka: rapid prokaryotic genome annotation. Bioinformatics (Oxford, England), 30(14), 2068–2069. [DOI: 10.1093/bioinformatics/btu153](https://doi.org/10.1093/bioinformatics/btu153)
+
+- [Pyrodigal](https://doi.org/10.1186/1471-2105-11-119)
+
+  > Larralde, M. (2022). Pyrodigal: Python bindings and interface to Prodigal, an efficient method for gene prediction in prokaryotes. Journal of Open Source Software, 7(72), 4296. [DOI: 10.21105/joss.04296](https://doi.org/10.21105/joss.04296)

 - [RGI](https://doi.org/10.1093/nar/gkz935)

LICENSE (+1 -1)

@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) Jasmin Frangenberg, Anan Ibrahim, James A. Fellows Yates
+Copyright (c) Jasmin Frangenberg, Anan Ibrahim, Louisa Perelo, Moritz E. Beber, James A. Fellows Yates

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README.md (+1 -1)

@@ -21,7 +21,7 @@ The nf-core/funcscan AWS full test dataset are contigs generated by the MGnify s

 ## Pipeline summary

-1. Annotation of assembled prokaryotic contigs with [`Prodigal`](https://github.com/hyattpd/Prodigal), [`Prokka`](https://github.com/tseemann/prokka), or [`Bakta`](https://github.com/oschwengers/bakta)
+1. Annotation of assembled prokaryotic contigs with [`Prodigal`](https://github.com/hyattpd/Prodigal), [`Pyrodigal`](https://github.com/althonos/pyrodigal), [`Prokka`](https://github.com/tseemann/prokka), or [`Bakta`](https://github.com/oschwengers/bakta)
 2. Screening contigs for antimicrobial peptide-like sequences with [`ampir`](https://cran.r-project.org/web/packages/ampir/index.html), [`Macrel`](https://github.com/BigDataBiology/macrel), [`HMMER`](http://hmmer.org/), [`AMPlify`](https://github.com/bcgsc/AMPlify)
 3. Screening contigs for antibiotic resistant gene-like sequences with [`ABRicate`](https://github.com/tseemann/abricate), [`AMRFinderPlus`](https://github.com/ncbi/amr), [`fARGene`](https://github.com/fannyhb/fargene), [`RGI`](https://card.mcmaster.ca/analyze/rgi), [`DeepARG`](https://bench.cs.vt.edu/deeparg)
 4. Screening contigs for biosynthetic gene cluster-like sequences with [`antiSMASH`](https://antismash.secondarymetabolites.org), [`DeepBGC`](https://github.com/Merck/deepbgc), [`GECCO`](https://gecco.embl.de/), [`HMMER`](http://hmmer.org/)

bin/ampcombi_download.py (new file, +78 lines)

#!/usr/bin/env python3

#########################################
# Authors: [Anan Ibrahim](https://github.com/brianjohnhaas), [Louisa Perelo](https://github.com/louperelo)
# File: amp_database.py
# Source: https://github.com/Darcy220606/AMPcombi/blob/main/ampcombi/amp_database.py
# Source+commit: https://github.com/Darcy220606/AMPcombi/commit/a75bc00c32ecf873a133b18cf01f172ad9cf0d2d/ampcombi/amp_database.py
# Download Date: 2023-03-08, commit: a75bc00c
# This source code is licensed under the MIT license
#########################################

# TITLE: Download the DRAMP database if input db empty AND and make database compatible for diamond

import pandas as pd
import requests
import os
from datetime import datetime
import subprocess
from Bio import SeqIO
import tempfile
import shutil


########################################
# FUNCTION: DOWNLOAD DRAMP DATABASE AND CLEAN IT
#########################################
def download_DRAMP(db):
    ##Download the (table) file and store it in a results directory
    url = "http://dramp.cpu-bioinfor.org/downloads/download.php?filename=download_data/DRAMP3.0_new/general_amps.xlsx"
    r = requests.get(url, allow_redirects=True)
    with open(db + "/" + "general_amps.xlsx", "wb") as f:
        f.write(r.content)
    ##Convert excel to tab sep file and write it to a file in the DRAMP_db directly with the date its downloaded
    date = datetime.now().strftime("%Y_%m_%d")
    ref_amps = pd.read_excel(db + "/" + r"general_amps.xlsx")
    ref_amps.to_csv(db + "/" + f"general_amps_{date}.tsv", index=None, header=True, sep="\t")
    ##Download the (fasta) file and store it in a results directory
    urlfasta = (
        "http://dramp.cpu-bioinfor.org/downloads/download.php?filename=download_data/DRAMP3.0_new/general_amps.fasta"
    )
    z = requests.get(urlfasta)
    fasta_path = os.path.join(db + "/" + f"general_amps_{date}.fasta")
    with open(fasta_path, "wb") as f:
        f.write(z.content)
    ##Cleaning step to remove ambigous aminoacids from sequences in the database (e.g. zeros and brackets)
    new_fasta = db + "/" + f"general_amps_{date}_clean.fasta"
    seq_record = SeqIO.parse(open(fasta_path), "fasta")
    with open(new_fasta, "w") as f:
        for record in seq_record:
            id, sequence = record.id, str(record.seq)
            letters = [
                "A",
                "C",
                "D",
                "E",
                "F",
                "G",
                "H",
                "I",
                "K",
                "L",
                "M",
                "N",
                "P",
                "Q",
                "R",
                "S",
                "T",
                "V",
                "W",
                "Y",
            ]
            new = "".join(i for i in sequence if i in letters)
            f.write(">" + id + "\n" + new + "\n")
    return os.remove(fasta_path), os.remove(db + "/" + r"general_amps.xlsx")


download_DRAMP("amp_ref_database")
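
The helper is self-executing: the module-level call at the bottom writes the DRAMP table and a cleaned FASTA into a hardcoded `amp_ref_database/` folder, which must exist before the download starts. The following is a minimal invocation sketch, not part of the commit; it assumes the script is run from the pipeline root with network access to dramp.cpu-bioinfor.org.

# Hypothetical wrapper around bin/ampcombi_download.py (not part of the commit).
import os
import subprocess

db_dir = "amp_ref_database"         # directory name hardcoded in the script's final call
os.makedirs(db_dir, exist_ok=True)  # download_DRAMP() opens files inside this directory
subprocess.run(["python3", "bin/ampcombi_download.py"], check=True)
# Afterwards db_dir holds general_amps_<date>.tsv and general_amps_<date>_clean.fasta;
# the intermediate .xlsx and raw .fasta are deleted by the script itself.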

bin/comBGC.py (+56 -8)

@@ -32,7 +32,7 @@
 SOFTWARE.
 """

-tool_version = "0.5"
+tool_version = "0.6.0"
 welcome = """\
 ........................
 * comBGC v.{version} *
@@ -61,7 +61,9 @@
 these can be:
 - antiSMASH: <sample name>.gbk and (optional) knownclusterblast/ directory
 - DeepBGC: <sample name>.bgc.tsv
-- GECCO: <sample name>.clusters.tsv""",
+- GECCO: <sample name>.clusters.tsv
+Note: Please provide files from a single sample only. If you would like to
+summarize multiple samples, please see the --antismash_multiple_samples flag.""",
 )
 parser.add_argument(
     "-o",
@@ -73,6 +75,16 @@
     type=str,
     default=".",
 )
+parser.add_argument(
+    "-a",
+    "--antismash_multiple_samples",
+    metavar="PATH",
+    dest="antismash_multiple_samples",
+    nargs="?",
+    help="""directory of antiSMASH output. Should contain subfolders (one per
+sample). Can only be used if --input is not specified.""",
+    type=str,
+)
 parser.add_argument("-vv", "--verbose", help="increase output verbosity", action="store_true")
 parser.add_argument("-v", "--version", help="show version number and exit", action="store_true")

@@ -81,6 +93,7 @@

 # Assign input arguments to variables
 input = args.input
+dir_antismash = args.antismash_multiple_samples
 outdir = args.outdir
 verbose = args.verbose
 version = args.version
@@ -111,15 +124,38 @@
         elif path.endswith("knownclusterblast/"):
             input_antismash.append(path)

+if input and dir_antismash:
+    exit(
+        "The flags --input and --antismash_multiple_samples are mutually exclusive.\nPlease use only one of them (or see --help for how to use)."
+    )
+
 # Make sure that at least one input argument is given
-if not (input_antismash or input_gecco or input_deepbgc):
+if not (input_antismash or input_gecco or input_deepbgc or dir_antismash):
     exit("Please specify at least one input file (i.e. output from antismash, deepbgc, or gecco) or see --help")

 ########################
 # ANTISMASH FUNCTIONS
 ########################


+def prepare_multisample_input_antismash(antismash_dir):
+    """
+    Prepare string of input paths of a given antiSMASH output folder (with sample subdirectories)
+    """
+    sample_paths = []
+    for root, subdirs, files in os.walk(antismash_dir):
+        antismash_file = "/".join([root, "index.html"])
+        if os.path.exists(antismash_file):
+            sample = root.split("/")[-1]
+            gbk_path = "/".join([root, sample]) + ".gbk"
+            kkb_path = "/".join([root, "knownclusterblast"])
+            if os.path.exists(kkb_path):
+                sample_paths.append([gbk_path, kkb_path])
+            else:
+                sample_paths.append([gbk_path])
+    return sample_paths
+
+
 def parse_knownclusterblast(kcb_file_path):
     """
     Extract MIBiG IDs from knownclusterblast TXT file.
@@ -148,9 +184,6 @@ def antismash_workflow(antismash_paths):
     - Return data frame with aggregated info.
     """

-    if verbose:
-        print("\nParsing antiSMASH files\n... ", end="")
-
     antismash_sum_cols = [
         "Sample_ID",
         "Prediction_tool",
@@ -186,6 +219,9 @@ def antismash_workflow(antismash_paths):

         # Aggregate information
         Sample_ID = gbk_path.split("/")[-1].split(".gbk")[-2]  # Assuming file name equals sample name
+        if verbose:
+            print("\nParsing antiSMASH file(s): " + Sample_ID + "\n... ", end="")
+
         with open(gbk_path) as gbk:
             for record in SeqIO.parse(gbk, "genbank"):  # GBK records are contigs in this case
                 # Initiate variables per contig
@@ -514,7 +550,13 @@ def gecco_workflow(gecco_paths):
 ########################

 if __name__ == "__main__":
-    tools = {"antiSMASH": input_antismash, "deepBGC": input_deepbgc, "GECCO": input_gecco}
+    if input_antismash:
+        tools = {"antiSMASH": input_antismash, "deepBGC": input_deepbgc, "GECCO": input_gecco}
+    elif dir_antismash:
+        tools = {"antiSMASH": dir_antismash}
+    else:
+        tools = {"deepBGC": input_deepbgc, "GECCO": input_gecco}
+
     tools_provided = {}

     for tool in tools.keys():
@@ -532,7 +574,13 @@ def gecco_workflow(gecco_paths):

     for tool in tools_provided.keys():
         if tool == "antiSMASH":
-            summary_antismash = antismash_workflow(input_antismash)
+            if dir_antismash:
+                antismash_paths = prepare_multisample_input_antismash(dir_antismash)
+                for input_antismash in antismash_paths:
+                    summary_antismash_temp = antismash_workflow(input_antismash)
+                    summary_antismash = pd.concat([summary_antismash, summary_antismash_temp])
+            else:
+                summary_antismash = antismash_workflow(input_antismash)
         elif tool == "deepBGC":
             summary_deepbgc = deepbgc_workflow(input_deepbgc)
         elif tool == "GECCO":
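
The new `-a/--antismash_multiple_samples` mode expects one antiSMASH output subfolder per sample, each containing `index.html`, `<sample>.gbk`, and optionally a `knownclusterblast/` directory; `prepare_multisample_input_antismash()` walks that tree and returns one path list per sample, which the main block then feeds through `antismash_workflow()` sample by sample. Below is a small standalone sketch of the same directory walk; the directory name is illustrative and not from the commit.

# Illustrative re-statement of the multi-sample discovery logic (not part of the commit).
import os

antismash_dir = "antismash_results"  # hypothetical value of -a/--antismash_multiple_samples
for root, subdirs, files in os.walk(antismash_dir):
    if os.path.exists(os.path.join(root, "index.html")):  # marks a finished antiSMASH run
        sample = os.path.basename(root)
        gbk = os.path.join(root, sample + ".gbk")          # per-sample GenBank output
        kcb = os.path.join(root, "knownclusterblast")      # optional MIBiG hit reports
        print(sample, gbk, "with knownclusterblast" if os.path.isdir(kcb) else "gbk only")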
