ancient-metagenomic-pipelines.qmd
+10 -17 (10 additions & 17 deletions)
Original file line number
Diff line number
Diff line change
@@ -550,8 +550,8 @@ For every hit that the reads match inside this database, then sees the genome of
550
550
Rather than using the very computationally heavy HOPS pipeline [@Hubler2019-qw], which requires extremely large computational nodes with large RAM (>1 TB) to load MALT databases into memory, aMeta does this via a two-step approach.
551
551
Firstly it uses `KrakenUniq` [a k-mer based and thus memory efficient method, @Breitwieser2018-xg] to do a screening of sequencing reads against a broad generalised microbial database.
552
552
Once all the possible taxa have been detected, aMeta will then make a new database of just the genomes of the taxa that were reported from `KrakenUniq` (i.e.
553
-
a specific database) but using `MALT`.
554
-
`MALT` on thus much reduced database is then used to perform computationally much heavier alignment against the reference genomes and LCA taxonomic reassignment.
553
+
a project-specific database) but using `MALT`.
554
+
Then aMeta will use `MALT` on this much-reduced database to perform the computationally much heavier alignment against the reference genomes and LCA taxonomic reassignment.
555
555
The output from `MALT` is then sent to the `MaltExtract` program of the HOPS pipeline for ancient DNA authentication statistics.
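The database-reduction step described above can be sketched in a few lines of Python. This is only an illustration of the idea (keep the reference genomes of taxa that the screening step reported), not aMeta's actual implementation; the function name, the taxon labels, and the file paths are all hypothetical.

```python
def select_reference_genomes(detected_taxa, genome_index):
    """Keep only the genomes whose taxon was reported by the screening step.

    detected_taxa: set of taxon labels reported by the screening (hypothetical).
    genome_index:  dict mapping taxon label -> genome file path (hypothetical).
    """
    selected = {}
    for taxon in sorted(detected_taxa):
        # Taxa without a reference genome in the index are skipped.
        if taxon in genome_index:
            selected[taxon] = genome_index[taxon]
    return selected


# Hypothetical broad reference collection.
genome_index = {
    "taxon_A": "genomes/taxon_A.fa",
    "taxon_B": "genomes/taxon_B.fa",
    "taxon_C": "genomes/taxon_C.fa",
}

# Hypothetical screening result; taxon_X has no genome in the index.
detected = {"taxon_B", "taxon_A", "taxon_X"}

subset = select_reference_genomes(detected, genome_index)
print(subset)
```

The reduced set contains far fewer genomes than the broad collection, which is what makes the subsequent alignment step affordable in memory.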
556
556
557
557
### Running aMeta
@@ -673,6 +673,10 @@ foo data/foo.fq.gz
673
673
Make sure that when copy-pasting into a text editor, tabs are not replaced with spaces, otherwise the file might not be read!
674
674
:::
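A quick way to catch tabs that were silently replaced with spaces is to parse the samplesheet as tab-separated text and count the columns per line. The sketch below is a hypothetical helper, not part of aMeta; it assumes the two-column layout with a `sample` and `fastq` header used in this chapter.

```python
import csv
import io


def check_samplesheet(text):
    """Return a list of problems found in a two-column, tab-separated samplesheet."""
    problems = []
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    for line_no, row in enumerate(rows, start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != 2:
            problems.append(
                f"line {line_no}: expected 2 tab-separated columns, found {len(row)}"
            )
    # Assumption: the first line is the header 'sample<TAB>fastq'.
    if rows and rows[0] != ["sample", "fastq"]:
        problems.append("line 1: header should be 'sample<TAB>fastq'")
    return problems


good = "sample\tfastq\nfoo\tdata/foo.fq.gz\nbar\tdata/bar.fq.gz\n"
bad = "sample\tfastq\nfoo    data/foo.fq.gz\n"  # tab replaced by spaces

print(check_samplesheet(good))
print(check_samplesheet(bad))
```

An empty list means every non-blank line split into exactly two columns; a line where the tab was replaced by spaces is reported with its line number.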
675
675
676
+
:::{.callout-warning}
677
+
aMeta (v1.0.0) currently supports only single-end or pre-merged data!
678
+
:::
679
+
676
680
Then we need to write a config file.
677
681
This tells aMeta where to find things such as database files and other settings.
678
682
@@ -701,23 +705,11 @@ n_unique_kmers: 1000
701
705
n_tax_reads: 200
702
706
```
703
707
704
-
And make a two column samplesheet file with the following content in a file called `samples.tsv`, also under `configs/`.
705
-
706
-
```tsv
707
-
sample fastq
708
-
foo data/foo.fq.gz
709
-
bar data/bar.fq.gz
710
-
```
711
-
712
-
:::{.callout-warning}
713
-
aMeta (v1.0.0) currently only supports single-end or pre-merged- data only!
714
-
:::
715
-
716
708
Once this config file is generated, we can start the run.
717
709
718
710
719
711
:::{.callout-note}
720
-
As this is only a dummy run (due to the large-ish computational resources required forKrakenUniq), we re-use some of the resource files here.
712
+
As this is only a dummy run (due to the large-ish computational resources required for `MALT`), we re-use some of the resource files here.
721
713
While this will produce nonsense output, it is used here to demonstrate how we would execute the pipeline.
722
714
:::
723
715
@@ -775,7 +767,7 @@ Numbers in the coloured cells also provide the direct score number.](assets/imag
775
767
The heatmap demonstrates microbial species (in rows) authenticated for each sample (in columns) ([@fig-ancientmetagenomicpipelines-ametaoutput]).
776
768
777
769
The colors and the numbers in the heatmap represent authentication scores, i.e.
778
-
numeric quantification of seven quality metrics that provide information about microbial presence and ancient status.
770
+
numeric quantification of eight quality metrics that provide information about microbial presence and ancient status.
779
771
780
772
The authentication scores can vary from 0 to 10; the higher the score, the more likely it is that a microbe is present in a sample and is ancient.
781
773
@@ -798,6 +790,7 @@ To visually examine the seven quality metrics
798
790
- Read length distribution
799
791
- PMD scores distribution
800
792
- Number of assigned reads (depth of coverage)
793
+
- Average nucleotide identity (ANI)
801
794
802
795
Corresponding to the numbers and colors of the heatmap, one can find them in `results/AUTHENTICATION/sampleID/taxID/authentic_<Sample>_<sampleID>.trimmed.rma6_<TaxID>_<taxID>.pdf` for each sample sampleID and each authenticated microbe taxID.
803
796
An example of such quality metrics is shown below in @fig-ancientmetagenomicpipelines-persampleplot.
2. How can the design of pipelines such as nf-core/eager help researchers comply with the FAIR principles for management of scientific data?
1149
1142
3. What metrics do we use to evaluate the success/failure of ancient DNA sequencing experiments? How can these measures be evaluated when using nf-core/eager for data preprocessing and analysis?