Releases: CFIA-NCFAD/nf-flu
3.9.0
This minor release adds Nextclade analysis of assembled Influenza genome sequences against 30 Nextclade Influenza-related datasets by default and updates the Influenza sequences used by nf-flu (downloaded from NCBI 2025-04-04; 809,739 unique sequences and metadata).
The specific Nextclade datasets, and optionally their versions (tags), can be configured with a headerless CSV file. Nextclade results are aggregated across samples and datasets and filtered for positive results into a single Nextclade TSV (tab-separated values) report. The report adds fields for sample, dataset name and dataset version/tag alongside the Nextclade and dataset-specific results.
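The dataset CSV and the aggregation step described above can be sketched roughly as follows. This is a minimal illustration, not nf-flu's actual code: the function names, the output column names (sample, dataset, dataset_tag) and the rule used for a "positive" result (a non-empty clade field) are all assumptions made for the example.

```python
import csv
import io


def read_dataset_csv(text):
    """Parse a headerless CSV: dataset name, then an optional tag."""
    datasets = []
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        name = row[0].strip()
        tag = row[1].strip() if len(row) > 1 and row[1].strip() else None
        datasets.append((name, tag))
    return datasets


def is_positive(row):
    """Hypothetical rule: a row counts as positive if it has a clade."""
    return bool(row.get("clade"))


def aggregate(results):
    """Merge per-sample, per-dataset rows into one list of dicts.

    `results` yields (sample, dataset, tag, rows), where `rows` are dicts
    parsed from a single Nextclade TSV.
    """
    merged = []
    for sample, dataset, tag, rows in results:
        for row in rows:
            if is_positive(row):
                extra = {"sample": sample, "dataset": dataset, "dataset_tag": tag or ""}
                merged.append({**extra, **row})
    return merged


def to_tsv(rows):
    """Write the merged rows as TSV, keeping columns in first-seen order."""
    fields = []
    for row in rows:
        for key in row:
            if key not in fields:
                fields.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fields, delimiter="\t", lineterminator="\n", restval=""
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The sample, dataset and tag columns are placed first so that a single report can be filtered by any of them.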
Changes
- update: Influenza sequences and metadata from NCBI (2025-04-04). 809,739 non-redundant, unique sequences were retrieved along with their metadata. Added documentation for how to update Influenza sequences for use with nf-flu (see docs/update_seqs_db.md)
- feat: added Nextclade (v3.12.0) analysis subworkflow against 30 Influenza-related Nextclade datasets with a convenient aggregation and summarization of useful results into a single Nextclade TSV report.
- update: GenoFLU 1.05 -> 1.06 (#112)
What's Changed
Full Changelog: 3.8.1...3.9.0
3.8.1
This patch release updates Clair3, the variant caller for Nanopore sequence data, to 1.0.11. Clair3 1.0.11 adds an option to enable variant calling at the head and tail 16bp of each sequence (--enable_variant_calling_at_sequence_head_and_tail). This option is enabled by default in the nf-flu workflow to ensure that variants are called in the 16bp at the start and end of each of the 8 segments of IAV and IBV. The Clair3 developers advise that results in these regions be used "with caution because alignments are less reliable in the regions, and there would be insufficient context to be fed to the neural network for reliable calling".
A minor issue with the MultiQC report was also fixed where sample names were not cleaned properly. The .bcftools_filt extension was added to extra_fn_clean_exts in assets/multiqc_config.yaml.
Changes
- fix: MultiQC report sample name cleaning. Added .bcftools_filt to extra_fn_clean_exts in assets/multiqc_config.yaml.
- update: Clair3 1.0.10 -> 1.0.11
- fix: Clair3 not variant calling the ends of each segment; enabled variant calling at the head and tail 16bp of each sequence (--enable_variant_calling_at_sequence_head_and_tail) (#61)
- dev: move Clair3 arguments and options to conf/modules_nanopore.config. This should allow users to change Clair3 options more easily using custom Nextflow config files (e.g. nextflow run CFIA-NCFAD/nf-flu -c clair3-custom.config ...).
- test: added nf-test for clair3.nf with simulated test data for head and tail variant calling with the --enable_variant_calling_at_sequence_head_and_tail option.
What's Changed
Full Changelog: 3.8.0...3.8.1
3.8.0
This release adds the --platform assemblies mode for analysis of FASTA sequences along with --input /path/to/fasta-dir/ to specify the directory containing the FASTA sequences.
Changes
- feat: analysis of previously assembled IAV FASTA sequences with the addition of a new analysis mode via --platform assemblies. Use along with --input /path/to/fasta-dir/ to specify the directory containing the FASTA sequences.
- fix: bin/cleavage_site.py short cleavage site index access error (#106)
- fix: cleavage_site.nf version output issue (#105)
- fix: low abundance indels appearing in consensus sequences despite major/minor allele frequency thresholds. Non-SNP variants below the major allele fraction are now explicitly excluded prior to consensus sequence generation with Bcftools consensus.
- fix: subtyping report issue with some poor quality IBV sequences (#107)
- dev: add nf-test for VCF filtering and consensus sequence generation from VCF with low AF indels.
- dev: replaced vcf_filter_frameshift.py with Bcftools filter commands.
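The indel fix in this release can be illustrated with a small sketch of the filtering rule. nf-flu applies the rule with Bcftools filter commands; the Python below only mirrors the logic. The Variant class, the function names and the threshold values (0.75 and 0.25) are hypothetical, and keeping SNPs down to the minor allele fraction is an assumption about how the two thresholds interact.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    pos: int
    ref: str
    alt: str
    af: float  # alternate allele frequency, 0.0 to 1.0


def is_snp(v):
    """A single-base substitution: one reference base, one alternate base."""
    return len(v.ref) == 1 and len(v.alt) == 1


def keep_for_consensus(v, major_af=0.75, minor_af=0.25):
    """Decide whether a variant is passed on to consensus generation.

    Assumed rule: SNPs are kept at or above the minor allele fraction,
    while indels and other non-SNP variants must reach the major allele
    fraction, so low abundance indels never reach the consensus.
    """
    if is_snp(v):
        return v.af >= minor_af
    return v.af >= major_af


def filter_variants(variants, major_af=0.75, minor_af=0.25):
    return [v for v in variants if keep_for_consensus(v, major_af, minor_af)]
```

With these example thresholds, a deletion at 30% allele frequency is dropped while a SNP at 30% is retained.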
What's Changed
Full Changelog: 3.7.0...3.8.0
3.7.0
This minor release adds GenoFLU for H5 genotyping and HA cleavage site output with VADR annotations. This release also adds a script to classify HA cleavage sites based on mono-/multibasicity and low/high pathogenicity.
Changes
- feat: GenoFLU v1.05 for H5 genotyping.
- feat: Added --custom_flu_minfo option to specify custom flu.minfo for VADR. The default flu.minfo is the same as the VADR flu v1.6.3-2 model except that it includes cleavage site info. Feature table, GenBank and GFF files should now have a misc_feature for HA cleavage site info.
- feat: bin/cleavage_site.py to classify HA cleavage sites.
- feat: Added VADR subtype prediction into subtyping report. VADR subtype predictions are pulled from the output .mdl files.
- feat: Added subtyping report output directory containing CSV for each sheet in the Excel report.
- fix: MultiQC converts the general info table into a violin plot if there are more than 500 rows in the table by default. Added max_table_rows: 1000000 to multiqc_config.yaml to avoid this conversion in most cases.
What's Changed
- Add GenoFLU and HA cleavage site prediction by @cerdelyan in #103
- Release 3.7.0 by @peterk87 in #104
New Contributors
- @cerdelyan made their first contribution in #103
Full Changelog: 3.6.2...3.7.0
3.6.2
This patch release fixes issues relating to subtype prediction (N5) (#100), Apptainer usage (#95) and IRMA read length threshold (#99).
Changes
- chore: renamed: bin/parse_influenza_blast_results.py -> bin/subtyping_report.py
- fix: N5 sequences being typed as N1 due to the high proportion of lower % identity hits to N1 sequences (#100). In bin/subtyping_report.py, the H and N subtypes are predicted from the BLAST analysis results, starting at a % identity threshold of 99% and decrementing by 1% until a subtype is determined or the minimum % identity is reached (default: 85%). At least 3 hits are required to determine a subtype at a particular threshold. If no subtype is determined, the subtype is set to "N/A".
- fix: added back missing results columns to subtyping report H and N subtyping sheets.
- fix: Added workflow parameter --irma_min_len to change the minimum read length threshold for IRMA assembly (MIN_LEN) and set the default to 50 instead of 125. nf-flu should now be compatible with BGI sequencing data producing shorter paired-end reads by running with --platform illumina (#99).
- fix: -profile apptainer is functionally the same as -profile singularity. The same configuration is set for the Apptainer profile as for the Singularity one. If a user has Apptainer installed, running $ singularity ... and $ apptainer ... should be equivalent, e.g. both $ apptainer --version and $ singularity --version produce apptainer version 1.3.6. (#95)
- ci: updated ci.yml for better cache handling and inter-job caching of VADR flu model tar.gz. Updated latest version of Nextflow to test from 24.04.4 to 24.10.3.
- config: --max_top_blastn default changed 3 -> 5. Top 5 BLASTN hits will be shown for each segment for each sample in subtyping report.
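The decrementing-threshold subtype prediction described in the N5 fix can be sketched as below. This is an illustration under stated assumptions rather than the code in bin/subtyping_report.py: the function name, the (subtype, percent identity) input shape, and the reading that the most frequent subtype at a threshold must itself have at least 3 hits are all assumptions.

```python
from collections import Counter


def predict_subtype(hits, start=99, minimum=85, step=1, min_hits=3):
    """Predict a subtype from BLAST hits given as (subtype, pident) pairs.

    Starting at `start` % identity, count hits at or above the threshold.
    If the most frequent subtype has at least `min_hits` hits, return it;
    otherwise lower the threshold by `step` and try again, down to
    `minimum`. Returns "N/A" when no threshold yields a subtype.
    """
    threshold = start
    while threshold >= minimum:
        counts = Counter(subtype for subtype, pident in hits if pident >= threshold)
        if counts:
            subtype, n = counts.most_common(1)[0]
            if n >= min_hits:
                return subtype
        threshold -= step
    return "N/A"


# A few high identity N5 hits now win over many lower identity N1 hits.
example_hits = [("N5", 98.2), ("N5", 97.5), ("N5", 97.1)] + [("N1", 88.0)] * 20
print(predict_subtype(example_hits))
```

Because the search stops at the first threshold that yields a subtype, the large number of lower identity N1 hits in the example is never considered.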
What's Changed
Full Changelog: 3.6.1...3.6.2
3.6.1
This patch release fixes an issue with Clair3 not producing variant calls for some regions due to full-alignment not being triggered. This issue was resolved by adding --var_pct_phasing=1, --var_pct_full=1 and --ref_pct_full=1 to the Clair3 command line.
Changes
- fix: Added --var_pct_phasing=1, --var_pct_full=1 and --ref_pct_full=1 to Clair3 command line to ensure full-alignment is triggered for all reads to avoid missing variant calls in some regions.
- fix: Added stageAs: "input*/*" to CAT_NANOPORE_FASTQ process input channels to ensure that input files are not concatenated with themselves in an infinite loop until disk space is exhausted in rare cases.
- feat: Don't save NCBI Influenza reference sequences, metadata CSV and BLAST DB to the output directory by default. Added --save_ncbi_db and --save_blastdb workflow params to save these files to the output directory if desired.
- docs: Updated README.md to mention Apptainer. Updated usage.md to describe new workflow params. Updated output.md to better describe BLAST subtyping results.
What's Changed
Full Changelog: 3.6.0...3.6.1
3.6.0
This minor release adds FluMut "to search for molecular markers with potential impact on the biological characteristics of Influenza A viruses of the A(H5N1) subtype."
Changes
- feat: Added FluMut (v0.6.3)
What's Changed
Full Changelog: 3.5.3...3.6.0
3.5.3
This patch release fixes an issue (#22) with Illumina paired-end read analysis by IRMA producing empty consensus sequences when the forward and reverse reads do not contain "1:N:0:." or "2:N:0:." in the FASTQ header lines.
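To make the header issue concrete, the sketch below checks a FASTQ header line for the read-pair tag and appends one when it is missing. It shows one possible workaround, not necessarily the fix applied in nf-flu; the function names are hypothetical, and treating the trailing "." in "1:N:0:." as a literal placeholder character is an assumption.

```python
def has_read_tag(header, read_number):
    """Return True if the header carries a tag such as '1:N:0:' or '2:N:0:'."""
    return f"{read_number}:N:0:" in header


def ensure_read_tag(header, read_number):
    """Append a read-pair tag to a FASTQ header line when it lacks one.

    `read_number` is 1 for forward reads and 2 for reverse reads. Headers
    that already carry the tag are returned unchanged.
    """
    if has_read_tag(header, read_number):
        return header
    read_id = header.split()[0]  # drop any other trailing description
    return f"{read_id} {read_number}:N:0:."


print(ensure_read_tag("@read_001", 1))
print(ensure_read_tag("@read_001 2:N:0:ACGT", 2))
```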
What's Changed
- Fix issue with Illumina PE read headers for IRMA analysis by @peterk87 in #90
- Release 3.5.3 by @peterk87 in #91
Full Changelog: 3.5.2...3.5.3
3.5.2
This patch release fixes a few issues when running the pipeline.
Changes
- fix: better handling of empty IRMA consensus sequences to avoid downstream analysis errors with VADR and BLASTN (peterk87/nf-flu #22)
- fix: Clair3 versions.yml indentation issue (#87)
- fix: removed capturing of cat and gzip versions in CAT_ILLUMINA_FASTQ process (#46) to avoid issue in some execution environments.
- docs: update README.md
What's Changed
Full Changelog: 3.5.1...3.5.2
3.5.1
This patch release fixes an issue (#84) with long sample names (over 50 characters) causing VADR to fail. --noseqnamemax has been added to the default arguments for VADR to avoid this issue.
Changes
- fix: Added --noseqnamemax to VADR default arguments to avoid issues with long sample names causing VADR to fail.
- config: Output directory paths for IRMA and Bcftools consensus VADR annotation results were made more explicit and clear for the Illumina workflow.
What's Changed
Full Changelog: 3.5.0...3.5.1