This README describes how to launch the Bacannot
pipeline on the MAF AWS Infrastructure.
For information about the original pipeline and all the tools used by the analysis, please refer to the Bacannot README file.
- A Contig ID is defined as the sequence header before the first space; please make sure that each ID is unique within the fasta file. For example, in the header `>contig_1 length=45231`, the Contig ID is `contig_1`.
- Make sure that each Contig ID is less than 37 characters (before the first space). This is a hard limit set by the `prokka` pipeline. You may use the very basic helper script `renameFastaHeaders.py` for this (see USAGE below).
- For simple use cases of this pipeline, where you only have a genome that needs annotation, there is a helper script `createSubmissionYaml.py` that accepts a local folder of fasta files, an S3 path, and an output YAML file name (see USAGE below).
- The `createSubmissionYaml.py` script will also print a suggested pipeline submission command that you may use to launch the pipeline with the submission files you've just created.
Example submission commands:
aws batch submit-job \
--job-name nf-bacannot-mrsa \
--job-queue priority-maf-pipelines \
--job-definition nextflow-production \
--container-overrides command=FischbachLab/nf-bacannot,\
"-profile","maf",\
"--input","s3://genomics-workflow-core/Results/Bacannot/MRSA/20221102/MRSA.yaml",\
"--output","s3://genomics-workflow-core/Results/Bacannot/00_TEST/MRSA/20230407"
aws batch submit-job \
--job-name nf-bacannot-hCom2 \
--job-queue priority-maf-pipelines \
--job-definition nextflow-production \
--container-overrides command=FischbachLab/nf-bacannot,\
"-profile","maf",\
"--input","s3://genomics-workflow-core/Results/Bacannot/hCom2/20221102/inputs/hCom2.yaml"
"--output","s3://genomics-workflow-core/Results/Bacannot/hCom2/20221102"
USAGE for renameFastaHeaders.py:
python renameFastaHeaders.py <ORIGINAL_FASTA_FILE> <RENAMED_FASTA_FILE>
python renameFastaHeaders.py fasta_folder/genome.fasta renamed_fasta_folder/genome.fasta
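If you just want to verify that your headers already satisfy the constraints above, a quick check with standard command-line tools is sketched below (the fasta path is a placeholder):

```bash
# Flag contig IDs (header text before the first space) that are 37+ characters
# long or that appear more than once in the file.
awk '/^>/ {
  id = substr($1, 2)
  if (length(id) >= 37) print "too long:  " id
  if (seen[id]++)       print "duplicate: " id
}' genome.fasta
```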
Create a submission YAML file for the bacannot pipeline.
Install dependencies:
conda create -n bacannot python=3.11
conda activate bacannot
pip install -U ruamel.yaml 'cloudpathlib[s3]'
Run the script:
python createSubmissionYaml.py \
-g <Local or S3 Path to Genome(s) directory> \
-project <Name of the project that this data belongs to> \
-prefix <Subset of the data in this Project; or date in YYYYMMDD format> \
-s <Output YAML file name> \
--extension fna (Optional: use a different extension for the fasta files; default is fasta) \
--copy-genomes (Optional: copy the input genomes to the output directory; default is False) \
--use-bakta (Optional: use Bakta instead of the standard Prokka; most people SHOULD NOT use this flag, default is False)
python createSubmissionYaml.py \
-g s3://genomics-workflow-core/Results/BinQC/MITI-MCB/20230324/fasta/ \
-project MITI-MCB \
-prefix 20230411 \
-s test.yaml
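For reference, the generated file follows the Bacannot samplesheet format; for assembly-only annotation it is roughly of this shape (the sample name and S3 path below are illustrative, and the field names should be checked against the Bacannot README):

```yaml
samplesheet:
  - id: Example-Genome-1   # one entry per fasta file found under -g
    assembly: s3://genomics-workflow-core/Results/Bacannot/MITI-MCB/20230411/inputs/Example-Genome-1.fasta
```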
The aggregateGFFs.py script copies GFF files from each sample folder to the aggregate folder.
python aggregateGFFs.py \
-p s3://genomics-workflow-core/Results/Bacannot/MITI-MCB/20230515 \
-s s3://genomics-workflow-core/Results/Bacannot/MITI-MCB/20230515/inputs/DELETE_ME.yaml
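To confirm the copies afterwards, you can list the GFF objects under the results prefix (standard AWS CLI; the exact name of the aggregate folder depends on the script):

```bash
aws s3 ls s3://genomics-workflow-core/Results/Bacannot/MITI-MCB/20230515/ --recursive | grep '\.gff'
```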
This pipeline generates A LOT of data per genome. Each genome's output follows the directory structure described here. The easiest way to explore this data interactively is by using `docker`. Make sure you have `docker` installed (see instructions here). Once `docker` is installed and running, sync the genome directory that is of interest to you by using the `aws s3 sync` command. The following commands explain the process using the annotation outputs of the Slackia-exigua-ATCC-700122-MAF-2 genome, present on S3 at s3://genomics-workflow-core/Results/Bacannot/00_TEST/20221031/.
aws s3 sync s3://genomics-workflow-core/Results/Bacannot/00_TEST/20221031/Slackia-exigua-ATCC-700122-MAF-2/ Slackia-exigua-ATCC-700122-MAF-2
This command will download all the data into a local folder called Slackia-exigua-ATCC-700122-MAF-2.
cd Slackia-exigua-ATCC-700122-MAF-2
docker run -v $(pwd):/work -d --rm --platform linux/amd64 -p 3838:3838 -p 4567:4567 --name ServerBacannot fmalmeida/bacannot:server
If this is your first time running this viewer, you might see docker downloading a lot of data. This is normal and can take some time depending on your internet speed. Once complete, you're ready to interact with your data. Simply open your favorite web browser and go to http://localhost:3838/. Note the use of http and not https; some browsers may automatically make this change. If you are unable to see the webpage, copy and paste the URL into your web browser rather than clicking on the link.
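If the page does not load right away, you can check on the container with standard docker commands:

```bash
docker ps --filter name=ServerBacannot   # confirm the container is running
docker logs -f ServerBacannot            # follow the server's startup logs
```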
If you're using an EC2 instance, go to the AWS EC2 console by logging into your AWS account, identify your instance, and note its public IP address. Open your favorite web browser and go to http://Public.IP.Address:3838/. Note the use of http and not https; some browsers may automatically make this change. If you are unable to see the webpage, copy and paste the URL into your web browser rather than clicking on the link.
Et voila! You can now explore your data!
All great things must come to an end. Use the following command to stop and remove the docker container, which will in turn shut down the data explorer webpage.
docker rm -f ServerBacannot