(fork of midas_nextflow)
Running Midas using Nextflow Optimized for Running on AWS Batch
Beta This workflow is designed to help streamline bacterial metagenomic and metatranscriptomic data analysis using the Nextflow workflow manager. Trimmed, aligned, and filtered reads are passed on to MIDAS in order to assign taxonomy using the Phy-Eco single-copy marker gene set and pangenome database (v1.2) which provides species taxonomic level assignment, gene content, and strain level variation via single-nucleotide-polymorphism (SNP's) analysis for each metagenome.
This minimal test_data set was developed using Saccharibacteria Nanosynbacter lyticus HMT 952, formerly TM7x, a egnimatic member of the candidate phyla radiation (CPR), further emphazing the application of this workflow.
aws batch submit-job \
--job-name nf-midas-test-0302-2 \
--job-queue priority-maf-pipelines \
--job-definition nextflow-production \
--container-overrides command=FischbachLab/nf-midas,\
"--manifest","s3://genomics-workflow-core/Results/midas/00_TEST/00_seedfile/test.seedfile.csv"
aws batch submit-job \
--job-name nf-midas-hComB6Gen \
--job-queue priority-maf-pipelines \
--job-definition nextflow-production \
--container-overrides command=FischbachLab/nf-midas,\
"--project","hCom-B6Gen",\
"--manifest","s3://genomics-workflow-core/Results/midas/hCom-B6Gen/00_seedfile/hCom-B6Gen.seedfile.csv"
- Manifest file (S3://)
- Path to paried
or single endfastq or fastq.gz (R1 and R2) (S3://) - Metadata additional Columns with simple header format
- Path to paried
- Work directory (S3://)
- Path to Midas Database (S3://)
By default, the outdir is s3://genomics-workflow-core/Results/midas/<PROJECT>/
- Species Analysis from MiDAS
- Gene Analysis from MiDAS
- SNP Analysis from MiDAS
- Merged files for Species analysis
- Optional merged files for Genes and SNP analysis
- --outdir Folder to place analysis outputs (default ./midas)
- --species_cov Coverage (depth) threshold for species inclusion (default: 3.0)
- --merge_sample_depth Corresponds to the --sample_depth parameter in the merge_midas.py command (default: 1.0)
- The manifest is a CSV with a header indicating which samples correspond to which files.
- The file must contain a column
sampleName
. This value can not be repeated. - Reads are specified by columns,
R1
andR2
. - Below is an example manifest file
sampleName,R1,R2
EHECv3-121823-deltaFirmi-C1M1,s3://maf-sequencing/Illumina/20240626_LH00572_0050_A22MHTTLT3/AC-Metagenomics/MAF_20240411_AGC-mNGS_A01_P4_S286_R1_001.fastq.gz,s3://maf-sequencing/Illumina/20240626_LH00572_0050_A22MHTTLT3/AC-Metagenomics/MAF_20240411_AGC-mNGS_A01_P4_S286_R2_001.fastq.gz
EHECv3-121823-deltaFirmi-C1M2,s3://maf-sequencing/Illumina/20240626_LH00572_0050_A22MHTTLT3/AC-Metagenomics/MAF_20240411_AGC-mNGS_A02_P4_S287_R1_001.fastq.gz,s3://maf-sequencing/Illumina/20240626_LH00572_0050_A22MHTTLT3/AC-Metagenomics/MAF_20240411_AGC-mNGS_A02_P4_S287_R2_001.fastq.gz
- Sunit Jain
- Xiandong Meng