Merge pull request #96 from LeandroRitter/main

jfy133 · web-flow · commit 98f2f8079093 · 2024-08-01T15:24:43.000+02:00
added some edits to the aMeta section of the book
diff --git a/ancient-metagenomic-pipelines.qmd b/ancient-metagenomic-pipelines.qmd
@@ -550,8 +550,8 @@ For every hit that the reads match inside this database, then sees the genome of
 Rather than the very computationally heavy HOPS pipeline [@Hubler2019-qw], that requires extremely large computational nodes with large RAM (>1 TB) to load MALT databases into memory, aMeta does this via a two step approach.
 Firstly it uses `KrakenUniq` [a k-mer based and thus memory efficient method, @Breitwieser2018-xg] to do a screening of sequencing reads against a broad generalised microbial database.
 Once all the possible taxa have been detected, aMeta will then make a new database of just the genomes of the taxa that were reported from `KrakenUniq` (i.e.
-a specific database) but using `MALT`.
-`MALT` on thus much reduced database is then used to perform computationally much heavier alignment against the reference genomes and LCA taxonomic reassignment.
+a project-specific database) but using `MALT`.
+aMeta then uses `MALT` on this much-reduced database to perform the computationally much heavier alignment against the reference genomes and LCA taxonomic reassignment.
 The output from `MALT` is then sent to the `MaltExtract` program of the HOPS pipeline for ancient DNA authentication statistics.
 
 ### Running aMeta
@@ -673,6 +673,10 @@ foo	data/foo.fq.gz
 Make sure when copy pasting into our test editor, tabs are not replaced with spaces, otherwise the file might not be read!
 :::
 
+:::{.callout-warning}
+aMeta (v1.0.0) currently supports only single-end or pre-merged data!
+:::
+
 Then we need to write a config file.
 This tells aMeta where to find things such as database files and other settings.
 
@@ -701,23 +705,11 @@ n_unique_kmers: 1000
 n_tax_reads: 200
 ```
 
-And make a two column samplesheet file with the following content in a file called `samples.tsv`, also under `configs/`.
-
-```tsv
-sample	fastq
-foo	data/foo.fq.gz
-bar	data/bar.fq.gz
-```
-
-:::{.callout-warning}
-aMeta (v1.0.0) currently only supports single-end or pre-merged- data only!
-:::
-
 Once this config file is generated, we can start the run.
 
 
 :::{.callout-note}
-As this is only a dummy run (due to the large-ish computational resources required for KrakenUniq), we re-use some of the resource files here.
+As this is only a dummy run (due to the large-ish computational resources required for `MALT`), we re-use some of the resource files here.
 While this will produce nonsense output, it is used here to demonstrate how we would execute the pipeline.
 :::
 
@@ -775,7 +767,7 @@ Numbers in the coloured cells also provide the direct score number.](assets/imag
 The heatmap demonstrates microbial species (in rows) authenticated for each sample (in columns) ([@fig-ancientmetagenomicpipelines-ametaoutput]).
 
 The colors and the numbers in the heatmap represent authentications scores, i.e.
-numeric quantification of seven quality metrics that provide information about microbial presence and ancient status.
+numeric quantification of eight quality metrics that provide information about microbial presence and ancient status.
 
 The authentication scores can vary from 0 to 10, the higher is the score the more likely that a microbe is present in a sample and is ancient.
 
@@ -798,6 +790,7 @@ To visually examine the seven quality metrics
 - Read length distribution
 - PMD scores distribution
 - Number of assigned reads (depth of coverage)
+- Average nucleotide identity (ANI)
 
 Corresponding to the numbers and colors of the heatmap, one can find them in `results/AUTHENTICATION/sampleID/taxID/authentic_<Sample>_<sampleID>.trimmed.rma6_<TaxID>_<taxID>.pdf` for each sample sampleID and each authenticated microbe taxID.
 An example of such quality metrics is shown below in @fig-ancientmetagenomicpipelines-persampleplot.
@@ -1148,4 +1141,4 @@ conda remove --name ancient-metagenomic-data --all -y
 2. How can the design of pipelines such as nf-core/eager pipeline help researchers comply with the FAIR principles for management of scientific data?
 3. What metrics do we use to evaluate the success/failure of ancient DNA sequencing experiments? How can these measures be evaluated when using nf-core/eager for data preprocessing and analysis?
 
-## References
+## References