Further minor formatting

jfy133 · jfy133 · commit 29f050fbcc2e · 2024-08-06T09:53:43.000Z
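
The change below mostly replaces the trailing colon on sentences that introduce a code block with a full stop. A rough way to look for lead-in lines that still end in a colon is sketched here. `find_colon_leadins` is a hypothetical helper, not part of the repository, and it assumes code fences start at the beginning of a line with three backticks and that callout markers start with `:::`.

```bash
# Hypothetical helper (not part of the repository): print FILE:LINE: TEXT for
# every non-blank line that ends in ':' and is the last text before a code fence.
find_colon_leadins() {
  awk '
    /^```/ {
      # An opening fence: report the remembered lead-in, if any.
      if (!in_fence && cand_text != "") print FILENAME ":" cand_line ": " cand_text
      in_fence = !in_fence
      cand_text = ""
      next
    }
    in_fence { next }                      # ignore everything inside a code block
    /^[[:space:]]*$/ { next }              # blank lines between lead-in and fence are allowed
    /^:::/ { cand_text = ""; next }        # callout markers are not lead-in sentences
    {
      # Remember the line only if it ends in a colon (optionally followed by spaces).
      if ($0 ~ /:[[:space:]]*$/) { cand_line = FNR; cand_text = $0 } else { cand_text = "" }
    }
  ' "$@"
}
```

For example, `find_colon_leadins genome-mapping.qmd` would list any remaining candidates for manual review; it does not modify the file.
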
diff --git a/genome-mapping.qmd b/genome-mapping.qmd
@@ -7,14 +7,14 @@ bibliography: assets/references/genome-mapping.bib
 :::{.callout-note collapse="true" title="Self guided: chapter environment setup"}
 For this chapter's exercises, if not already performed, you will need to download the chapter's dataset, decompress the archive, and create and activate the conda environment.
 
-Do this, use `wget` or right click and save to download this Zenodo archive: [10.5281/zenodo.8413204](https://doi.org/10.5281/zenodo.8413204), and unpack
+To do this, use `wget` or right-click and save to download this Zenodo archive: [10.5281/zenodo.8413204](https://doi.org/10.5281/zenodo.8413204), and unpack it.
 
 ```bash
 tar xvf genome-mapping.tar.gz 
 cd genome-mapping/
 ```
 
-You can then create the subsequently activate environment with
+You can then create and subsequently activate the environment with the following.
 
 ```bash
 conda env create -f genome-mapping.yml
@@ -58,13 +58,13 @@ The first index we will generate is for `bwa`.
 bwa index YpestisCO92.fa
 ```
 
-The second index will be used by the genome browser we will apply to our results later on:
+The second index will be used by the genome browser we will apply to our results later on.
 
 ```bash
 samtools faidx YpestisCO92.fa
 ```
 
-We need to build a third index that is necessary for the genotyping step, which comes later after mapping:
+We need to build a third index that is necessary for the genotyping step, which comes later after mapping.
 
 ```bash
 picard CreateSequenceDictionary R=YpestisCO92.fa
@@ -96,7 +96,7 @@ Long seed length → `-l 32`
 We will be working with pre-processed files (`sample1.fastq.gz`, `sample2.fastq.gz`), i.e. any quality filtering and removal of sequencing adapters is already done.
 
 We will map each file once with lenient and once with strict parameters.
-For this, we will make 4 separate directories, to avoid mixing up files:
+For this, we will make 4 separate directories, to avoid mixing up files.
 
 ```bash
 mkdir sample1_lenient sample2_lenient sample1_strict sample2_strict
@@ -106,19 +106,19 @@ mkdir sample1_lenient sample2_lenient sample1_strict sample2_strict
 
 Let’s begin with a lenient mapping of sample1.
 
-Go into the corresponding folder:
+Go into the corresponding folder.
 
 ```bash
 cd sample1_lenient
 ```
 
-Perform the `bwa` alignment, here for sample1, and specify lenient mapping parameters:
+Perform the `bwa` alignment, here for sample1, and specify lenient mapping parameters.
 
 ```bash
 bwa aln -n 0.01 -l 16 ../YpestisCO92.fa ../sample1.fastq.gz > reads_file.sai
 ```
 
-Proceed with writing the mapping in `sam` format [@Li2009, [https://en.wikipedia.org/wiki/SAM\_(file_format)](<https://en.wikipedia.org/wiki/SAM_(file_format)>)]:
+Proceed with writing the mapping in `sam` format [@Li2009, [https://en.wikipedia.org/wiki/SAM\_(file_format)](<https://en.wikipedia.org/wiki/SAM_(file_format)>)].
 
 ```bash
 bwa samse -r '@RG\tID:all\tLB:NA\tPL:illumina\tPU:NA\tSM:NA' ../YpestisCO92.fa reads_file.sai ../sample1.fastq.gz > reads_mapped.sam
@@ -127,7 +127,7 @@ bwa samse -r '@RG\tID:all\tLB:NA\tPL:illumina\tPU:NA\tSM:NA' ../YpestisCO92.fa r
 Note that we have specified the sequencing platform (Illumina) by creating a so-called "Read Group" (`-r`).
 This information is used later during the genotyping step.
 
-Convert SAM file to binary format (BAM file):
+Convert SAM file to binary format (BAM file).
 
 ```bash
 samtools view -b -S reads_mapped.sam > reads_mapped.bam
@@ -136,21 +136,21 @@ samtools view -b -S reads_mapped.sam > reads_mapped.bam
 For processing of `sam` and `bam` files we use `samtools` [@Li2009, [https://github.com/samtools/samtools](https://github.com/samtools/samtools)].
 
 `-b` specifies to output in BAM format.
-(`-S` specifies input is SAM, can be omitted in recent versions.)
+(`-S` specifies input is SAM, can be omitted in recent versions).
 
-Now we sort the `bam` file → Sort alignments by leftmost coordinates:
+Now we sort the `bam` file → Sort alignments by leftmost coordinates.
 
 ```bash
 samtools sort reads_mapped.bam > reads_mapped_sorted.bam
 ```
 
-The sorted bam file needs to be indexed → more efficient for further processing:
+The sorted bam file needs to be indexed → more efficient for further processing.
 
 ```bash
 samtools index reads_mapped_sorted.bam
 ```
 
-Deduplication → Removal of reads from duplicated fragments:
+Deduplication → Removal of reads from duplicated fragments.
 
 ```bash
 samtools rmdup -s reads_mapped_sorted.bam reads_mapped_sorted_dedup.bam
@@ -163,15 +163,15 @@ samtools index reads_mapped_sorted_dedup.bam
 Duplicated reads are usually a consequence of amplification of the DNA fragments in the lab. Therefore, they are not biologically meaningful.
 
 We have now completed the mapping procedure.
-Let's have a look at our mapping results:
+Let's have a look at our mapping results.
 
 ```bash
 samtools view reads_mapped_sorted_dedup.bam | less -S
 ```
 
 (exit by pressing <kbd>q</kbd>)
 
-We can also get a summary about the number of mapped reads. For this we use the `samtools idxstats` command ([http://www.htslib.org/doc/samtools-idxstats.html](http://www.htslib.org/doc/samtools-idxstats.html)):
+We can also get a summary about the number of mapped reads. For this we use the `samtools idxstats` command ([http://www.htslib.org/doc/samtools-idxstats.html](http://www.htslib.org/doc/samtools-idxstats.html)).
 
 ```bash
 samtools idxstats reads_mapped_sorted_dedup.bam
@@ -184,7 +184,7 @@ For this we use the 'Genome Analysis Toolkit' (`gatk`) [@DePristo2011, [http://w
 
 It uses the reference genome and the mapping as input and produces an output in 'Variant Call Format (VCF)' ([https://en.wikipedia.org/wiki/Variant_Call_Format](https://en.wikipedia.org/wiki/Variant_Call_Format)).
 
-Perform genotyping on the mapping file:
+Perform genotyping on the mapping file.
 
 ```bash
 gatk3 -T UnifiedGenotyper -R ../YpestisCO92.fa -I reads_mapped_sorted_dedup.bam --output_mode EMIT_ALL_SITES -o mysnps.vcf
@@ -286,31 +286,31 @@ For this we will use the software
 It produces various output files and summary statistics and can integrate gene annotations for SNP effect analysis as done by the program `SnpEff` [@Cingolani2012, [https://github.com/pcingola/SnpEff](https://github.com/pcingola/SnpEff)].
 
 Run `multivcfanalyzer` on all 4 files at once.
-First `cd` one level up (if you type `ls` you should see your 4 directories, reference, etc.):
+First `cd` one level up (if you type `ls` you should see your 4 directories, reference, etc.).
 
 ```bash
 cd ..
 ```
 
-Then make a new directory…
+Then make a new directory.
 
 ```bash
 mkdir vcf_out
 ```
 
-…and run the programme:
+And run the programme.
 
 ```bash
 multivcfanalyzer NA YpestisCO92.fa NA vcf_out F 30 3 0.9 0.9 NA sample1_lenient/mysnps.vcf sample1_strict/mysnps.vcf sample2_lenient/mysnps.vcf sample2_strict/mysnps.vcf
 ```
 
-Let’s have a look in the `vcf_out/` directory (`cd` into it):
+Let’s have a look in the `vcf_out/` directory (`cd` into it).
 
 ```bash
 cd vcf_out
 ```
 
-Check the parameters we set earlier:
+Check the parameters we set earlier.
 
 ```bash
 less -S info.txt
@@ -326,7 +326,7 @@ less -S snpStatistics.tsv
 
 (exit by pressing <kbd>q</kbd>)
 
-The file content should look like this:
+The file content should look like this.
 
 ```bash
 SNP statistics for 4 samples.
@@ -353,7 +353,7 @@ The first column contains the dataset name and the second column the number of c
 For visual exploration of mapping results so-called "Genome Browsers" are used.
 Here we will use the 'Integrative Genomics Viewer' (`igv`)' ([https://software.broadinstitute.org/software/igv/](https://software.broadinstitute.org/software/igv/)).
 
-To open `igv`, simply type the following command and the app will open:
+To open `igv`, simply type the following command and the app will open.
 
 ```bash
 igv
@@ -463,11 +463,11 @@ conda remove --name genome-mapping --all -y
 
 ## Summary
 
-- Mapping DNA sequencing reads to a reference genome is a complex procedure that requires multiple steps.
-- Mapping results are the basis for genotyping, i.e. the detection of differences to the reference.
-- The genotyping results can be aggregated from multiple samples and comparatively analysed e.g. in the context of phylogenomics.
-- The chosen mapping parameters can have a strong influence on the results of any downstream analysis.
-- This is particularly true when dealing with ancient DNA samples as they tend to contain DNA from multiple organisms.This can lead to mismapped reads and therefore incorrect genotypes, which can further influence downstream analyses.
+- Mapping DNA sequencing reads to a reference genome is a complex procedure that requires multiple steps
+- Mapping results are the basis for genotyping, i.e. the detection of differences to the reference
+- The genotyping results can be aggregated from multiple samples and comparatively analysed e.g. in the context of phylogenomics
+- The chosen mapping parameters can have a strong influence on the results of any downstream analysis
+- This is particularly true when dealing with ancient DNA samples as they tend to contain DNA from multiple organisms. This can lead to mismapped reads and therefore incorrect genotypes, which can further influence downstream analyses
 
 ## References