HowAreYaxisValsCalculated


Li Shen edited this page Mar 30, 2015 · 2 revisions

The Y-axis values are calculated step by step as follows:

1. Each genomic region (e.g. TSS +/- 2 kb) is first extended by the fragment length on both sides, creating a "buffer" in the flanking regions.
2. The short reads that overlap the extended regions are retrieved from the BAM files.
3. Coverage (depth) is calculated at single-base resolution after extending each short read to the fragment length.
4. Each coverage vector is length-normalized to 101 points:
   - If the algorithm is spline: the coverage vector is fitted with a spline, and 101 points are sampled at equal intervals.
   - If the algorithm is bin: the coverage vector is divided into 101 equal-sized bins, and the average value of each bin is calculated.
5. Negative values in the coverage vectors are set to zero (they can arise from the spline fit).
6. Each value is normalized by library size: val = val / libsize * 1e6.
7. If the setting is bam-pair, the procedure above is repeated for the background BAM file, and the final value = log2(foreground value / background value), with a pseudocount used to avoid division by zero.
8. The coverage (or log2 ratio) vectors for each gene list are averaged to produce the average profiles.
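The region-extension, read-retrieval and coverage steps above can be sketched in Python as follows. This is a minimal illustration, not ngs.plot's own code. It assumes the reads have already been fetched from the BAM file (for example with a BAM reader such as pysam) as hypothetical `(start, end, strand)` tuples in 0-based, half-open coordinates. It also assumes that each read is extended toward its 3' end along its strand and that the buffer is trimmed off after counting. The function names are hypothetical.

```python
# Minimal sketch of the region-extension and coverage steps.
# Assumption: reads were already fetched from the BAM file as hypothetical
# (start, end, strand) tuples in 0-based, half-open coordinates.


def extend_region(start, end, frag_len):
    """Pad a region by the fragment length on both sides (the "buffer")."""
    return max(0, start - frag_len), end + frag_len


def overlaps(read, region_start, region_end):
    """Return True if a read overlaps [region_start, region_end)."""
    r_start, r_end, _strand = read
    return r_start < region_end and r_end > region_start


def extend_read(read, frag_len):
    """Assumption: extend a read to frag_len bases toward its 3' end, following its strand."""
    r_start, r_end, strand = read
    if strand == "+":
        return r_start, r_start + frag_len
    return max(0, r_end - frag_len), r_end


def region_coverage(reads, region_start, region_end, frag_len):
    """Per-base depth over [region_start, region_end) from fragment-extended reads."""
    ext_start, ext_end = extend_region(region_start, region_end, frag_len)
    depth = [0] * (ext_end - ext_start)
    for read in reads:
        # Keep only reads that overlap the extended region.
        if not overlaps(read, ext_start, ext_end):
            continue
        f_start, f_end = extend_read(read, frag_len)
        for pos in range(max(f_start, ext_start), min(f_end, ext_end)):
            depth[pos - ext_start] += 1
    # Assumption: the buffer is trimmed so the vector spans only the original region.
    offset = region_start - ext_start
    return depth[offset:offset + (region_end - region_start)]


reads = [(95, 97, "+"), (98, 100, "+"), (108, 110, "-"), (112, 114, "-")]
print(region_coverage(reads, 100, 110, frag_len=5))  # → [1, 1, 1, 0, 0, 1, 1, 1, 1, 2]
```

In the usage line, the reads at 98-100 and 112-114 lie outside the region 100-110. Their extended fragments still reach into it, so they count toward its coverage. This is what the buffer is for.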
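For the bin algorithm, length normalization can be sketched as below. This is an assumption-laden illustration. Bin boundaries are placed with integer division, so bins differ by at most one base when the region length is not a multiple of 101. How ngs.plot itself distributes leftover bases, or handles regions shorter than 101 bases, is not stated here. The `ValueError` is a hypothetical choice.

```python
def bin_average(cov, n_bins=101):
    """Split a coverage vector into n_bins near-equal bins and return each bin's mean."""
    n = len(cov)
    if n < n_bins:
        # Hypothetical choice: refuse regions shorter than the number of bins.
        raise ValueError("need at least as many bases as bins")
    out = []
    for i in range(n_bins):
        # Integer-division boundaries; every bin gets at least one base when n >= n_bins.
        lo = i * n // n_bins
        hi = (i + 1) * n // n_bins
        out.append(sum(cov[lo:hi]) / (hi - lo))
    return out


print(bin_average([1, 2, 3, 4, 5, 6], n_bins=3))  # → [1.5, 3.5, 5.5]
```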
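For the spline algorithm, the sketch below substitutes a natural cubic interpolating spline with a knot at every base, evaluated at 101 equally spaced positions. The spline routine ngs.plot actually uses may differ. The sketch also shows why the negative-value step is needed: a cubic spline through a sharp coverage spike overshoots and dips below zero between knots. Those values are then clipped to zero.

```python
def natural_cubic_spline_resample(y, n_points=101):
    """Fit a natural cubic spline through (i, y[i]) and sample n_points at equal intervals."""
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 points")
    m = n - 2
    # Unit knot spacing: M[i-1] + 4*M[i] + M[i+1] = 6*(y[i+1] - 2*y[i] + y[i-1]),
    # with natural boundary conditions M[0] = M[n-1] = 0.
    d = [6.0 * (y[i + 1] - 2.0 * y[i] + y[i - 1]) for i in range(1, n - 1)]
    # Thomas algorithm (tridiagonal solve): forward sweep.
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = 1.0 / 4.0
    d_prime[0] = d[0] / 4.0
    for i in range(1, m):
        denom = 4.0 - c_prime[i - 1]
        c_prime[i] = 1.0 / denom
        d_prime[i] = (d[i] - d_prime[i - 1]) / denom
    # Back substitution for the interior second derivatives.
    inner = [0.0] * m
    inner[-1] = d_prime[-1]
    for i in range(m - 2, -1, -1):
        inner[i] = d_prime[i] - c_prime[i] * inner[i + 1]
    M = [0.0] + inner + [0.0]
    out = []
    for k in range(n_points):
        x = k * (n - 1) / (n_points - 1)
        i = min(int(x), n - 2)  # segment [i, i+1] containing x
        t = x - i
        s = (M[i] * (1 - t) ** 3 / 6 + M[i + 1] * t ** 3 / 6
             + (y[i] - M[i] / 6) * (1 - t) + (y[i + 1] - M[i + 1] / 6) * t)
        out.append(s)
    return out


def clip_negatives(values):
    """Set negative values (spline overshoot) to zero."""
    return [max(0.0, v) for v in values]


spike = [0, 0, 0, 10, 0, 0, 0]
curve = natural_cubic_spline_resample(spike)
print(min(curve) < 0, min(clip_negatives(curve)))  # → True 0.0
```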
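The remaining steps are simple arithmetic on the 101-point vectors: library-size normalization, the bam-pair log2 ratio, and averaging over a gene list. In this sketch the pseudocount handling is an assumption. It is added to both foreground and background, and its value (`pseudo=1.0`) is hypothetical rather than a documented default. The function names are also hypothetical.

```python
import math


def normalize_by_libsize(values, libsize):
    """val = val / libsize * 1e6 for every position."""
    return [v / libsize * 1e6 for v in values]


def log2_ratio(foreground, background, pseudo=1.0):
    """Per-position log2(foreground / background).

    Assumption: the pseudocount is added to both numerator and denominator,
    and pseudo=1.0 is a hypothetical default.
    """
    return [math.log2((f + pseudo) / (b + pseudo)) for f, b in zip(foreground, background)]


def average_profile(vectors):
    """Average equal-length vectors position by position (one vector per gene)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Usage for one gene list in the bam-pair setting (toy 3-point vectors).
fg = [normalize_by_libsize(v, 2_000_000) for v in ([4.0, 2.0, 0.0], [0.0, 6.0, 2.0])]
bg = [normalize_by_libsize(v, 1_000_000) for v in ([1.0, 0.0, 0.0], [1.0, 3.0, 1.0])]
profile = average_profile([log2_ratio(f, b) for f, b in zip(fg, bg)])
print(profile)
```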