Add support for presupplied annotation files (FAA + GFF or FAA + GBK) #340

jfy133 · 2024-02-14T11:07:11Z

github-actions · 2024-02-14T11:08:44Z

`nf-core lint` overall result: Passed ✅

Posted for pipeline commit 2d8b238

✅ 315 tests passed

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-funcscan_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-funcscan_logo_light.png
files_exist - File found: docs/images/nf-core-funcscan_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-funcscan_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowFuncscan.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 1.2.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.contig_qc_lengththreshold= 3000.0
nextflow_config - Config default value correct: params.taxa_classification_tool= mmseqs2
nextflow_config - Config default value correct: params.taxa_classification_mmseqs_databases_id= Kalamari
nextflow_config - Config default value correct: params.taxa_classification_mmseqs_taxonomy_searchtype= 2
nextflow_config - Config default value correct: params.taxa_classification_mmseqs_taxonomy_lcaranks= kingdom,phylum,class,order,family,genus,species
nextflow_config - Config default value correct: params.taxa_classification_mmseqs_taxonomy_taxlineage= 1
nextflow_config - Config default value correct: params.taxa_classification_mmseqs_taxonomy_sensitivity= 5.0
nextflow_config - Config default value correct: params.taxa_classification_mmseqs_taxonomy_orffilters= 2.0
nextflow_config - Config default value correct: params.taxa_classification_mmseqs_taxonomy_lcamode= 3
nextflow_config - Config default value correct: params.taxa_classification_mmseqs_taxonomy_votemode= 1
nextflow_config - Config default value correct: params.annotation_tool= pyrodigal
nextflow_config - Config default value correct: params.annotation_bakta_db_downloadtype= full
nextflow_config - Config default value correct: params.annotation_bakta_mincontiglen= 1
nextflow_config - Config default value correct: params.annotation_bakta_translationtable= 11
nextflow_config - Config default value correct: params.annotation_bakta_gram= ?
nextflow_config - Config default value correct: params.annotation_prokka_kingdom= Bacteria
nextflow_config - Config default value correct: params.annotation_prokka_gcode= 11
nextflow_config - Config default value correct: params.annotation_prokka_mincontiglen= 1
nextflow_config - Config default value correct: params.annotation_prokka_evalue= 1e-06
nextflow_config - Config default value correct: params.annotation_prokka_coverage= 80
nextflow_config - Config default value correct: params.annotation_prokka_compliant= true
nextflow_config - Config default value correct: params.annotation_prodigal_transtable= 11
nextflow_config - Config default value correct: params.annotation_pyrodigal_transtable= 11
nextflow_config - Config default value correct: params.amp_ampir_model= precursor
nextflow_config - Config default value correct: params.amp_ampir_minlength= 10
nextflow_config - Config default value correct: params.amp_ampcombi_cutoff= 0.0
nextflow_config - Config default value correct: params.arg_amrfinderplus_identmin= -1.0
nextflow_config - Config default value correct: params.arg_amrfinderplus_coveragemin= 0.5
nextflow_config - Config default value correct: params.arg_amrfinderplus_translationtable= 11
nextflow_config - Config default value correct: params.arg_deeparg_data_version= 2
nextflow_config - Config default value correct: params.arg_deeparg_model= LS
nextflow_config - Config default value correct: params.arg_deeparg_minprob= 0.8
nextflow_config - Config default value correct: params.arg_deeparg_alignmentevalue= 1e-10
nextflow_config - Config default value correct: params.arg_deeparg_alignmentidentity= 50
nextflow_config - Config default value correct: params.arg_deeparg_alignmentoverlap= 0.8
nextflow_config - Config default value correct: params.arg_deeparg_numalignmentsperentry= 1000
nextflow_config - Config default value correct: params.arg_fargene_hmmmodel= class_a,class_b_1_2,class_b_3,class_c,class_d_1,class_d_2,qnr,tet_efflux,tet_rpg,tet_enzyme
nextflow_config - Config default value correct: params.arg_fargene_minorflength= 90
nextflow_config - Config default value correct: params.arg_fargene_translationformat= pearson
nextflow_config - Config default value correct: params.arg_rgi_savejson= false
nextflow_config - Config default value correct: params.arg_rgi_savetmpfiles= false
nextflow_config - Config default value correct: params.arg_rgi_alignmenttool= BLAST
nextflow_config - Config default value correct: params.arg_rgi_includeloose= false
nextflow_config - Config default value correct: params.arg_rgi_includenudge= false
nextflow_config - Config default value correct: params.arg_rgi_lowquality= false
nextflow_config - Config default value correct: params.arg_rgi_data= NA
nextflow_config - Config default value correct: params.arg_rgi_split_prodigal_jobs= true
nextflow_config - Config default value correct: params.arg_abricate_db= ncbi
nextflow_config - Config default value correct: params.arg_abricate_minid= 80
nextflow_config - Config default value correct: params.arg_abricate_mincov= 80
nextflow_config - Config default value correct: params.bgc_antismash_contigminlength= 1000
nextflow_config - Config default value correct: params.bgc_antismash_hmmdetectionstrictness= relaxed
nextflow_config - Config default value correct: params.bgc_antismash_taxon= bacteria
nextflow_config - Config default value correct: params.bgc_deepbgc_score= 0.5
nextflow_config - Config default value correct: params.bgc_deepbgc_mergemaxproteingap= 0
nextflow_config - Config default value correct: params.bgc_deepbgc_mergemaxnuclgap= 0
nextflow_config - Config default value correct: params.bgc_deepbgc_minnucl= 1
nextflow_config - Config default value correct: params.bgc_deepbgc_minproteins= 1
nextflow_config - Config default value correct: params.bgc_deepbgc_mindomains= 1
nextflow_config - Config default value correct: params.bgc_deepbgc_minbiodomains= 0
nextflow_config - Config default value correct: params.bgc_deepbgc_classifierscore= 0.5
nextflow_config - Config default value correct: params.bgc_gecco_cds= 3
nextflow_config - Config default value correct: params.bgc_gecco_pfilter= 1e-09
nextflow_config - Config default value correct: params.bgc_gecco_threshold= 0.8
nextflow_config - Config default value correct: params.bgc_gecco_edgedistance= 0
nextflow_config - Config default value correct: params.arg_hamronization_summarizeformat= tsv
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.max_cpus= 16
nextflow_config - Config default value correct: params.max_memory= 128.GB
nextflow_config - Config default value correct: params.max_time= 240.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-funcscan_logo_light.png matches the template
files_unchanged - docs/images/nf-core-funcscan_logo_light.png matches the template
files_unchanged - docs/images/nf-core-funcscan_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
readme - README Zenodo placeholder was replaced with DOI.
pipeline_todos - No TODO strings found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (313 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains a matching 'report_comment'.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
base_config - GUNZIP found in conf/base.config and Nextflow scripts.
base_config - UNTAR found in conf/base.config and Nextflow scripts.
base_config - PROKKA found in conf/base.config and Nextflow scripts.
base_config - PRODIGAL_GBK found in conf/base.config and Nextflow scripts.
base_config - BAKTA_BAKTA found in conf/base.config and Nextflow scripts.
base_config - ABRICATE_RUN found in conf/base.config and Nextflow scripts.
base_config - AMRFINDERPLUS_RUN found in conf/base.config and Nextflow scripts.
base_config - DEEPARG_DOWNLOADDATA found in conf/base.config and Nextflow scripts.
base_config - DEEPARG_PREDICT found in conf/base.config and Nextflow scripts.
base_config - FARGENE found in conf/base.config and Nextflow scripts.
base_config - RGI_MAIN found in conf/base.config and Nextflow scripts.
base_config - AMPIR found in conf/base.config and Nextflow scripts.
base_config - AMPLIFY_PREDICT found in conf/base.config and Nextflow scripts.
base_config - AMP_HMMER_HMMSEARCH found in conf/base.config and Nextflow scripts.
base_config - MACREL_CONTIGS found in conf/base.config and Nextflow scripts.
base_config - BGC_HMMER_HMMSEARCH found in conf/base.config and Nextflow scripts.
base_config - ANTISMASH_ANTISMASHLITE found in conf/base.config and Nextflow scripts.
base_config - ANTISMASH_ANTISMASHLITEDOWNLOADDATABASES found in conf/base.config and Nextflow scripts.
base_config - DEEPBGC_DOWNLOAD found in conf/base.config and Nextflow scripts.
base_config - DEEPBGC_PIPELINE found in conf/base.config and Nextflow scripts.
base_config - GECCO_RUN found in conf/base.config and Nextflow scripts.
base_config - HAMRONIZATION_ABRICATE found in conf/base.config and Nextflow scripts.
base_config - HAMRONIZATION_AMRFINDERPLUS found in conf/base.config and Nextflow scripts.
base_config - HAMRONIZATION_DEEPARG found in conf/base.config and Nextflow scripts.
base_config - HAMRONIZATION_RGI found in conf/base.config and Nextflow scripts.
base_config - HAMRONIZATION_FARGENE found in conf/base.config and Nextflow scripts.
base_config - HAMRONIZATION_SUMMARIZE found in conf/base.config and Nextflow scripts.
base_config - AMPCOMBI found in conf/base.config and Nextflow scripts.
modules_config - conf/modules.config found and not ignored.
modules_config - MULTIQC found in conf/modules.config and Nextflow scripts.
modules_config - GUNZIP found in conf/modules.config and Nextflow scripts.
modules_config - SEQKIT_SEQ_LONG found in conf/modules.config and Nextflow scripts.
modules_config - SEQKIT_SEQ_SHORT found in conf/modules.config and Nextflow scripts.
modules_config - MMSEQS_DATABASES found in conf/modules.config and Nextflow scripts.
modules_config - MMSEQS_CREATEDB found in conf/modules.config and Nextflow scripts.
modules_config - MMSEQS_TAXONOMY found in conf/modules.config and Nextflow scripts.
modules_config - MMSEQS_CREATETSV found in conf/modules.config and Nextflow scripts.
modules_config - PROKKA found in conf/modules.config and Nextflow scripts.
modules_config - BAKTA_BAKTADBDOWNLOAD found in conf/modules.config and Nextflow scripts.
modules_config - BAKTA_BAKTA found in conf/modules.config and Nextflow scripts.
modules_config - PRODIGAL found in conf/modules.config and Nextflow scripts.
modules_config - PYRODIGAL found in conf/modules.config and Nextflow scripts.
modules_config - ABRICATE_RUN found in conf/modules.config and Nextflow scripts.
modules_config - AMRFINDERPLUS_UPDATE found in conf/modules.config and Nextflow scripts.
modules_config - AMRFINDERPLUS_RUN found in conf/modules.config and Nextflow scripts.
modules_config - DEEPARG_DOWNLOADDATA found in conf/modules.config and Nextflow scripts.
modules_config - DEEPARG_PREDICT found in conf/modules.config and Nextflow scripts.
modules_config - FARGENE found in conf/modules.config and Nextflow scripts.
modules_config - UNTAR_CARD found in conf/modules.config and Nextflow scripts.
modules_config - RGI_CARDANNOTATION found in conf/modules.config and Nextflow scripts.
modules_config - RGI_MAIN found in conf/modules.config and Nextflow scripts.
modules_config - AMPIR found in conf/modules.config and Nextflow scripts.
modules_config - AMPLIFY_PREDICT found in conf/modules.config and Nextflow scripts.
modules_config - AMP_HMMER_HMMSEARCH found in conf/modules.config and Nextflow scripts.
modules_config - MACREL_CONTIGS found in conf/modules.config and Nextflow scripts.
modules_config - BGC_HMMER_HMMSEARCH found in conf/modules.config and Nextflow scripts.
modules_config - ANTISMASH_ANTISMASHLITE found in conf/modules.config and Nextflow scripts.
modules_config - ANTISMASH_ANTISMASHLITEDOWNLOADDATABASES found in conf/modules.config and Nextflow scripts.
modules_config - DEEPBGC_DOWNLOAD found in conf/modules.config and Nextflow scripts.
modules_config - DEEPBGC_PIPELINE found in conf/modules.config and Nextflow scripts.
modules_config - GECCO_RUN found in conf/modules.config and Nextflow scripts.
modules_config - HAMRONIZATION_ABRICATE found in conf/modules.config and Nextflow scripts.
modules_config - HAMRONIZATION_AMRFINDERPLUS found in conf/modules.config and Nextflow scripts.
modules_config - HAMRONIZATION_DEEPARG found in conf/modules.config and Nextflow scripts.
modules_config - HAMRONIZATION_RGI found in conf/modules.config and Nextflow scripts.
modules_config - HAMRONIZATION_FARGENE found in conf/modules.config and Nextflow scripts.
modules_config - HAMRONIZATION_SUMMARIZE found in conf/modules.config and Nextflow scripts.
modules_config - MERGE_TAXONOMY_HAMRONIZATION found in conf/modules.config and Nextflow scripts.
modules_config - ARG_TABIX_BGZIP found in conf/modules.config and Nextflow scripts.
modules_config - AMPCOMBI found in conf/modules.config and Nextflow scripts.
modules_config - MERGE_TAXONOMY_AMPCOMBI found in conf/modules.config and Nextflow scripts.
modules_config - AMP_TABIX_BGZIP found in conf/modules.config and Nextflow scripts.
modules_config - COMBGC found in conf/modules.config and Nextflow scripts.
modules_config - MERGE_TAXONOMY_COMBGC found in conf/modules.config and Nextflow scripts.
modules_config - BGC_TABIX_BGZIP found in conf/modules.config and Nextflow scripts.
modules_config - DRAMP_DOWNLOAD found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 2.14.1

Run details

nf-core/tools version 2.14.1
Run at 2024-05-22 14:02:15

jfy133

You need to change feature -> gbk everywhere if we want to go that route, including in the schema_input.tsv etc.; we would also have to update the test-data samplesheet files 😬

Darcy220606

Just a few comments to consider :) but good job 💯

Darcy220606 · 2024-04-25T08:12:16Z

workflows/funcscan.nf

+                                meta, files ->
+                                    def fasta_found   = files.find{it.toString().tokenize('.').last().matches('fasta|fas|fna|fa')}
+                                    def faa_found     = files.find{it.toString().endsWith('.faa')}
+                                    def gbk_found     = files.find{it.toString().tokenize('.').last().matches('gbk')}


This should also match 'gbk|gbff', as .gbff is the GBK extension that Bakta outputs.
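
To illustrate the extension check being discussed, here is a minimal plain-Groovy sketch (no Nextflow required). The closure names and file names are made up for illustration; only the `tokenize('.').last().matches(...)` expression and the proposed 'gbk|gbff' pattern come from the thread, and this is not necessarily what was merged.

```groovy
// Hypothetical helper: returns the text after the last '.' in a file name.
def lastExtension = { String name -> name.tokenize('.').last() }

// Pattern proposed in the review; String.matches() requires a full match.
def isGbk = { String name -> lastExtension(name).matches('gbk|gbff') }

// Made-up file names standing in for one sample's input files.
def files = ['sample_1.fasta', 'sample_1.faa', 'sample_1.gbff']

def gbkFound = files.find { isGbk(it) }

assert gbkFound == 'sample_1.gbff'
assert isGbk('sample_1.gbk')
// Only the last token is inspected, so a gzipped GenBank file is not matched.
assert lastExtension('sample_1.gbk.gz') == 'gz'
assert !isGbk('sample_1.gbk.gz')
```

One consequence of inspecting only the last token is that compressed inputs would need separate handling under this pattern.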

I'll add it here, as I couldn't comment on it for some reason :D
So in line 188

// TODO: Only NT at the moment. AA tax. classification will be added only when its PR is merged.

remove this, because now that the user always supplies both FASTA and GBK files for the preannotated track, we don't need to update the taxonomy workflow ;)

Wait, the user doesn't always supply GBK, I don't understand now...?

Darcy220606 · 2024-04-25T08:20:38Z

workflows/funcscan.nf

    ch_versions = ch_versions.mix( SEQKIT_SEQ_LONG.out.versions )
    ch_versions = ch_versions.mix( SEQKIT_SEQ_SHORT.out.versions )

-    ch_prepped_input_long = SEQKIT_SEQ_LONG.out.fastx
+    ch_intermediate_input_long = SEQKIT_SEQ_LONG.out.fastx
                                .map{ meta, file -> [ meta + [id: meta.id + '_long', length: "long" ], file ] }


Did you check the actual final output files (.tsv) with test_taxonomy.config? Please correct me if I'm misunderstanding the workflow: we are now renaming the meta.ids with the suffixes _long and _short. Won't that interfere with adding the taxonomy to the right files, since taxonomy is added according to not only the contig but also the sample_id, which comes from meta.id?

jfy133 · 2024-04-26T07:54:16Z

workflows/funcscan.nf

No, which is why I asked you to test 😆

If I understand your question correctly:

meta.id indeed comes from sample, so updating meta replaces the sample_id with the suffixed version.

But the taxonomy workflow takes its input from after this renaming, so it should be OK, I hope? Please run the branch and check 😬
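
The renaming in question can be sketched in plain Groovy. The `meta + [id: ..., length: ...]` expression is taken from the diff; the sample name, the lookup table, and the `lookupTaxonomy` closure are hypothetical and only illustrate why both sides of a merge need to see the same (suffixed) id. This is not a description of how the pipeline actually merges taxonomy.

```groovy
// Made-up meta map standing in for one samplesheet row.
def meta = [id: 'sample_1']

// Map + Map returns a new map; keys from the right-hand map win.
def metaLong = meta + [id: meta.id + '_long', length: 'long']

assert meta.id == 'sample_1'              // the original map is untouched
assert metaLong.id == 'sample_1_long'
assert metaLong['length'] == 'long'

// Hypothetical taxonomy table keyed by the id seen AFTER the renaming.
def taxonomyById = [(metaLong.id): 'placeholder_lineage']
def lookupTaxonomy = { String sampleId -> taxonomyById[sampleId] }

// A lookup with the suffixed id succeeds; the unsuffixed id finds nothing.
assert lookupTaxonomy(metaLong.id) == 'placeholder_lineage'
assert lookupTaxonomy(meta.id) == null
```

Under these assumptions the merge is only at risk if one side still carries the original id while the other carries the suffixed one, which is what running the branch would confirm or rule out.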

jfy133 · 2024-04-26T07:57:53Z

workflows/funcscan.nf

FFFFF bakta!? WHY!?

jfy133 · 2024-04-26T08:00:38Z

workflows/funcscan.nf

Oooh that's a good question.. currently only FASTAs go to taxonomy... I can't remember if that's preanno ones too

Darcy220606 · 2024-05-01T12:59:06Z

workflows/funcscan.nf

    if ( params.run_taxa_classification ) {
-            TAXA_CLASS ( ch_prepped_input )
+            TAXA_CLASS ( ch_prepped_input.fastas )
            ch_versions     = ch_versions.mix( TAXA_CLASS.out.versions )


Suggested change

-    if ( params.run_taxa_classification ) {
-            TAXA_CLASS ( ch_prepped_input )
-            TAXA_CLASS ( ch_prepped_input.fastas )
-            ch_versions = ch_versions.mix( TAXA_CLASS.out.versions )
+    ch_intermediate_fasta_for_taxa = ch_intermediate_input.fastas.map{ meta, fasta, faa, gbk -> [ meta, fasta ] }
+                                        .mix(ch_intermediate_input.preannotated.map{ meta, fasta, faa, gbk -> [ meta, fasta ] })
+    if ( params.run_taxa_classification ) {
+            TAXA_CLASS ( ch_intermediate_fasta_for_taxa )
+            ch_versions = ch_versions.mix( TAXA_CLASS.out.versions )
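
The effect of the suggested `map` + `mix` can be approximated in plain Groovy, with lists standing in for Nextflow channels. The tuple shape `[ meta, fasta, faa, gbk ]` follows the closure parameters in the suggestion; the sample names, file names, and variable names below are made up, and list concatenation is only a stand-in for `mix`.

```groovy
// Plain lists stand in for the two channel branches; all tuples are made up.
def fastas       = [ [ [id: 'sample_3'], 'sample_3.fasta', null, null ] ]
def preannotated = [ [ [id: 'sample_1'], 'sample_1.fasta', 'sample_1.faa', 'sample_1.gbk' ] ]

// Keep only [ meta, fasta ] from each four-element tuple.
// Groovy spreads a list element across a multi-parameter closure.
def toMetaFasta = { meta, fasta, faa, gbk -> [ meta, fasta ] }

// Concatenation approximates mix(); a real Nextflow mix() does not
// guarantee the order in which items from the two sources are emitted.
def fastaForTaxa = fastas.collect(toMetaFasta) + preannotated.collect(toMetaFasta)

assert fastaForTaxa.size() == 2
assert fastaForTaxa.every { it.size() == 2 }
assert fastaForTaxa.collect { it[1] }.toSet() == ['sample_1.fasta', 'sample_3.fasta'].toSet()
```

The point of the sketch is that both branches end up in one stream of `[ meta, fasta ]` pairs, so preannotated samples would also reach taxonomic classification.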

Darcy220606 · 2024-05-01T14:38:54Z

workflows/funcscan.nf

With regards to the summary and taxonomy:

ampcombi: it doesn't output all samples (but that will be fixed when I put in the new ampcombi submodules). Taxonomy is merged.

combgc: in the final report it doesn't consider samples_1 and 2, which are preannotated; it only takes those from the annotation step, and I don't understand why. Taxonomy is merged.

hamronization: works perfectly.

So either there's a problem with the collectFile() function for both ampcombi/combgc, or something's up with the publishDir path, such that every time the pipeline stops and resumes it rewrites the final report.

jfy133 · 2024-05-15T09:02:11Z

jasmezz · 2024-06-12T12:23:36Z

Superseded by #381 🙃

jfy133 and others added 18 commits April 26, 2023 11:08

Start modifying samplesheet check (untested)

e280092

Made the samplesheet work if columns are existing (python error if th…

9fbf52c

…e new columns not present)

Continue work

cf2b8b7

Apply suggestions from code review

80106d8

Co-authored-by: Moritz E. Beber <[email protected]>

Get most of the log working, needs more testing (particularly non FAA…

1b60b24

… subwkflws)

Merge branch 'dev' into presupplied-orfs

348b16c

Sync latest dev changes from annotation into workflow

43ee615

Update all modules to get right container version and also pyRodigal …

183b81a

…with gzip support

Merge branch 'presupplied-orfs' of github.com:nf-core/funcscan into p…

258a49a

…resupplied-orfs

Add test nothing config

a7b03a7

Merge branch 'dev' into presupplied-orfs

29670ab

Get back to previous starting point before bad merge removed old changes

ac0a25d

Merge branch 'dev' into presupplied-orfs

da993dd

Refactor - have amp/arg working. Includes better fargene tagging

4370106

Have it working

7eb5bed

Add test profile and docs

28a50ee

Include preanntotaed files in one of the CI runs

9caae3e

Fix prettier linting

23e3929

jfy133 and others added 11 commits February 14, 2024 12:09

Fix ci command

663a7e9

Fix BAKTA to multiqc channel name

6177c90

Add a preannotated test to BGC workflows

bf0f572

Make preannotated bgc config accesible

ea555ab

Install newer version of antismash to see if it'll work with the GFF …

80c5b0c

…verison (as reported on antiSMASH github)

Use correct dummy files

106b76e

Add warning about Prokka GBK/GFF

c25cab1

Merge branch 'dev' into presupplied-orfs

89fdf9a

Wrapping my head around it

f9f808d

Excluded GFF support, fixed multiqc report, update variables etc.

9b483ac

Merge branch 'dev' into presupplied-orfs

55224f1

jasmezz reviewed Apr 10, 2024

docs/usage.md (outdated, resolved)

jasmezz and others added 2 commits April 10, 2024 11:47

Apply suggestions from code review

a8716f8

Co-authored-by: James A. Fellows Yates <[email protected]>

Apply suggestions from code review, fix linting

fd52fee

jfy133 commented Apr 10, 2024

jasmezz and others added 9 commits April 10, 2024 15:57

Change feature to gbk, remove gff from docs

4e0a61f

Merge branch 'dev' into presupplied-orfs

de6bd40

Fix "feature" renaming to "gbk"

b5fc8f4

Fix linting

95f8fb5

Merge branch 'dev' into presupplied-orfs

de0a7bf

Fix variables

bf43049

Fix channels, missing warnin/docs about no splitting for preanno

0d2ef7c

Use correct GBK channel

7ed3594

Add log warning when BGC and preannotated input

619479a

Darcy220606 reviewed Apr 25, 2024

jfy133 commented Apr 26, 2024

Darcy220606 reviewed May 1, 2024

Start trying to fix taxonomy, not working yet as MMSEQS_TAXONOMYDB not executing

85c4359

jfy133 added 8 commits May 15, 2024 11:02

Add more GBK/GBFF updates

b11bed1

Remove dumps

800dff9

Merge branch 'dev' into presupplied-orfs

58787f6

Only do splitting when BGC workflow executed

50aa076

Fix taxonomy workflow from possibly getting async between two input channels for CREATETSV

c419ce9

Fix prokka annotation MQC collection

d1d0177

Fix linting

8f1c7ba

Make it so deepBGC actually produces otutput, and START send only long fastas taxonom results to BGC

2d8b238

jfy133 marked this pull request as draft May 29, 2024 06:17

jasmezz closed this Jun 12, 2024

jasmezz deleted the presupplied-orfs branch June 12, 2024 12:23
