
5. Output files

Håkon Kaspersen edited this page Mar 29, 2023 · 5 revisions

Output file descriptions

Three output directories are generated: versions, logs, and results. The versions directory contains the version of each tool, while the logs directory contains the Nextflow run report and the logs from each tool. The files listed below are placed in the results directory. Each track also writes its own HTML report file to the results directory.

ANI track
FASTANI_results.txt:                    The raw FastANI results
FASTANI_cleaned.txt:                    Filtered and de-duplicated FastANI results
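
As an illustration of what filtering and de-duplicating FastANI results can look like, here is a minimal sketch. It assumes the standard FastANI tab-separated layout (query, reference, ANI, matched fragments, total fragments). The function name `dedupe_ani` and the exact rules used here (drop self-comparisons, keep the first row of each unordered pair) are assumptions for illustration and may differ from what the pipeline actually does.

```python
import csv
import io


def dedupe_ani(handle):
    """Keep one row per unordered genome pair and drop self-comparisons.

    Assumes FastANI-style tab-separated rows: query, reference, ANI, ...
    """
    seen = set()
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        query, ref, ani = row[0], row[1], float(row[2])
        if query == ref:
            continue  # self-comparison
        pair = frozenset((query, ref))
        if pair in seen:
            continue  # reciprocal comparison already recorded
        seen.add(pair)
        kept.append((query, ref, ani))
    return kept


# Hypothetical example data, not real pipeline output.
example = (
    "a.fasta\ta.fasta\t100.0\t10\t10\n"
    "a.fasta\tb.fasta\t98.5\t9\t10\n"
    "b.fasta\ta.fasta\t98.4\t9\t10\n"
)
rows = dedupe_ani(io.StringIO(example))
```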

cgMLST track
MLST_report.txt:                        MLST results, including sequence type and individual allele results
CHEWBBACA_schema:                       Prepped or downloaded schema
CHEWBBACA_schema_evaluation:            Schema evaluation results
CHEWBBACA_results_statistics.tsv:       Overview of allele calling stats
CHEWBBACA_results_alleles.tsv:          Main results file from chewBBACA AlleleCall
CHEWBBACA_loci_stats.tsv:               Allele calling stats for each locus in the schema
CHEWBBACA_unclassified_sequences.fasta: Distinct CDSs that were not classified
CHEWBBACA_filtered_allele_results.tsv:  Filtered allele results based on --max_missing
R_dissimilarity_matrix.tsv:             Dissimilarity matrix based on the filtered allele results
R_hamming_distances.tsv:                Hamming distances, number of called and compared alleles, and number of missing alleles for each pairwise comparison
R_dendrogram.phylo:                     A dendrogram of the dissimilarity matrix based on the user-selected clustering method
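
To make the columns of R_hamming_distances.tsv more concrete, the sketch below counts differing, compared, and missing alleles for one pair of allele profiles. It is not the pipeline's R code. The function name `pairwise_hamming` and the choice of `"0"` and the empty string as missing-allele markers are assumptions for illustration.

```python
def pairwise_hamming(profile_a, profile_b, missing=("0", "")):
    """Compare two allele profiles locus by locus.

    A locus is skipped (and counted as missing) when either sample
    lacks a called allele; otherwise it is compared.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    compared = differing = n_missing = 0
    for a, b in zip(profile_a, profile_b):
        if a in missing or b in missing:
            n_missing += 1
            continue
        compared += 1
        if a != b:
            differing += 1
    return {"hamming": differing, "compared": compared, "missing": n_missing}


# Hypothetical profiles over four loci.
result = pairwise_hamming(["1", "2", "0", "4"], ["1", "3", "5", "4"])
```

Skipping loci that are missing in either sample keeps the distance from being inflated by uncalled alleles, which is why the number of compared alleles is worth reporting alongside the distance itself.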

Core gene track
PANAROO_mashdist.txt:                   Mash distances, from PANAROO QC.
MDS_mash_plot.png:                      MDS plot of the Mash distances, from PANAROO QC.
ncontigs_barplot.png:                   Number of contigs for each included genome, from PANAROO QC.
ngenes_barplot.png:                     Number of genes per included genome, from PANAROO PANGENOME.
PANAROO_pangenome_results.txt:          Number of core genes, accessory genes and total genes, from PANAROO PANGENOME.
PANAROO_core_gene_alignment.aln:        The concatenated core gene alignment of all included genomes in FASTA format, from PANAROO PANGENOME.

Core genome track
PARSNP_alignment.aln:                   The converted FASTA alignment of all included genomes, from PARSNP.
PARSNP_gingr_archive.ggr:               Gingr-file used for visualizing the alignment, from PARSNP.
PARSNP_results.txt:                     Result file that contains information about the size (bp) and the percent coverage of each genome, from PARSNP.
PARSNP_unaligned.txt:                   Unaligned regions not included in the core genome alignment, from PARSNP.

Mapping track
SNIPPY_alignment.aln:                   Core genome multiple alignment (reconstituted, see Snippy multi).
SNIPPY_results.txt:                     File that contains the result statistics, from SNIPPY.

General output files
SEQKIT_deduplicated_alignment.fasta:    Deduplicated alignment in FASTA format, from SEQKIT.
SEQKIT_deduplicated_sequences.fasta:    The FASTA sequence of each duplicated genome, from SEQKIT.
SEQKIT_duplicated_list.txt:             List of duplicated sequence IDs, from SEQKIT. Each line represents one group of identical samples.
GUBBINS_filtered_alignment.aln:         Alignment with recombinant sites removed in FASTA format, from GUBBINS.
GUBBINS_statistics.txt:                 Text file with the result statistics, from GUBBINS.
MASKRC_masked_alignment.aln:            Alignment with masked recombinant sites in FASTA format, from MASKRC.
MASKRC_recombinant_regions.txt:         Text file with information about the recombinant regions, from MASKRC.
MASKRC_recombinant_plot.svg:            Genomic location of recombination according to the alignment, from MASKRC.
IQTREE_tree.phylo:                      Consensus tree with bootstrap values in NEXUS format, from IQTREE.
IQTREE_ml_tree.phylo:                   Maximum likelihood tree in NEXUS format, from IQTREE.
IQTREE_results.txt:                     Text file with results, from IQTREE.
IQTREE_bootstrap_trees.ufboot:          Bootstrap values with branch lengths, from IQTREE.
IQTREE_alninfo.txt:                     Alignment information, from IQTREE.
IQTREE_splits.nex:                      Support values in percentages for all splits, from IQTREE.
SNPDIST_results.txt:                    The SNP distance matrix calculated from the (recombination-masked) alignment, from SNPDIST.
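
For downstream use, the SNP distance matrix can be loaded into a nested dictionary. The sketch below assumes the file is a square, tab-separated matrix whose first row holds the column sample names and whose first column holds the row sample names, with integer distances. The function name `read_distance_matrix` is hypothetical, and the layout should be checked against the actual file.

```python
import csv
import io


def read_distance_matrix(handle):
    """Read a tab-separated square matrix into {row_id: {col_id: distance}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    columns = header[1:]  # first cell is a corner label, not a sample
    matrix = {}
    for row in reader:
        if not row:
            continue  # tolerate trailing blank lines
        matrix[row[0]] = {col: int(val) for col, val in zip(columns, row[1:])}
    return matrix


# Hypothetical two-sample matrix, not real pipeline output.
example = "snp-dists\tA\tB\nA\t0\t3\nB\t3\t0\n"
matrix = read_distance_matrix(io.StringIO(example))
```

With a real file, the same function can be called as `read_distance_matrix(open("SNPDIST_results.txt"))`, assuming the path points into the results directory.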
