Waltz

A fast, efficient bam pileup engine and application modules built on it, such as coverage metrics, genotyping, and signature finding.

This software was developed at the Innovation Lab, Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center.

Waltz has 2 main modules:

Bam metrics: Generate various useful metrics for a given bam file
Genotyping: Determine the fragment count and allele fraction of given mutations in a given bam file

Java

Java 1.8 or above is required.

Dependencies (bundled with the release jar)

BioinfoUtils
HTSJDK
Google Guava
Apache Commons IO

1. Bam Metrics

Generate bam level metrics

java -server -Xms4g -Xmx4g -cp Waltz.jar org.mskcc.juber.waltz.countreads.CountReads bam-file coverageThreshold canonical-transcripts-bed-file intervals-bed-file

where
coverageThreshold is the average coverage above which a contiguous region should be considered covered (suggested value: 5)
canonical-transcripts-bed-file is the bed file with all exons across the genome (included above)
intervals-bed-file is the bed file of chosen genomic intervals
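To illustrate what coverageThreshold controls, here is a minimal sketch. It assumes per-position depths are already available as an int array and interprets a "covered region" as a maximal run of positions with non-zero depth whose average depth is above the threshold. That interpretation, the class CoveredRegions and its method find are hypothetical and not taken from Waltz's CountReads code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of coverageThreshold; not Waltz's own implementation.
final class CoveredRegions {

    // A covered region [start, end] (0-based, inclusive) and its average depth.
    static final class Region {
        final int start;
        final int end;
        final double averageDepth;

        Region(int start, int end, double averageDepth) {
            this.start = start;
            this.end = end;
            this.averageDepth = averageDepth;
        }

        int length() {
            return end - start + 1;
        }
    }

    // Finds maximal runs of positions with positive depth and keeps those whose
    // average depth is above coverageThreshold (assumed interpretation).
    static List<Region> find(int[] depths, double coverageThreshold) {
        List<Region> regions = new ArrayList<>();
        int i = 0;
        while (i < depths.length) {
            if (depths[i] <= 0) {
                i++;
                continue;
            }
            int start = i;
            long sum = 0;
            while (i < depths.length && depths[i] > 0) {
                sum += depths[i];
                i++;
            }
            double average = (double) sum / (i - start);
            if (average > coverageThreshold) {
                regions.add(new Region(start, i - 1, average));
            }
        }
        return regions;
    }

    public static void main(String[] args) {
        int[] depths = {0, 3, 8, 7, 0, 1, 2, 0};
        // Columns loosely mirror .covered-regions: start, end, length, average coverage.
        for (Region r : find(depths, 5)) {
            System.out.println(r.start + "\t" + r.end + "\t" + r.length() + "\t" + r.averageDepth);
        }
    }
}
```

With the suggested threshold of 5, the run at positions 1 to 3 (average 6.0) is kept and the run at positions 5 to 6 (average 1.5) is dropped.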

This produces 3 files:
.covered-regions: regions of contiguous coverage, annotated with canonical transcripts. Useful for checking what regions are actually covered in the bam file. Columns: chr, start, end, length, average total coverage in the contiguous region.

.read-counts: bam-level stats. Columns: bam file name, total reads, unmapped reads, total mapped reads, unique mapped reads, duplicate fraction, total on-target reads, unique on-target reads, total on-target rate, unique on-target rate

.fragment-sizes: fragment size distribution. Columns: fragment-size, total frequency, unique frequency
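The derived columns in .read-counts (duplicate fraction and the two on-target rates) are ratios of the raw counts. The sketch below shows one plausible set of definitions; the formulas, in particular that duplicate fraction is 1 minus unique mapped over total mapped and that each on-target rate is normalized by the matching mapped count, are assumptions that have not been checked against Waltz's source. The class ReadCountRates is hypothetical.

```java
// Hypothetical ratio definitions for the derived .read-counts columns;
// the exact formulas Waltz uses are assumed, not confirmed.
final class ReadCountRates {

    // Assumed: share of mapped reads that are duplicates.
    static double duplicateFraction(long totalMapped, long uniqueMapped) {
        return totalMapped == 0 ? 0.0 : 1.0 - (double) uniqueMapped / totalMapped;
    }

    // Assumed: share of (total or unique) mapped reads that fall in the intervals.
    static double onTargetRate(long onTargetReads, long mappedReads) {
        return mappedReads == 0 ? 0.0 : (double) onTargetReads / mappedReads;
    }

    public static void main(String[] args) {
        long totalMapped = 1_000_000L;
        long uniqueMapped = 250_000L;
        long totalOnTarget = 800_000L;
        long uniqueOnTarget = 150_000L;
        System.out.printf("duplicate fraction:    %.2f%n", duplicateFraction(totalMapped, uniqueMapped));
        System.out.printf("total on-target rate:  %.2f%n", onTargetRate(totalOnTarget, totalMapped));
        System.out.printf("unique on-target rate: %.2f%n", onTargetRate(uniqueOnTarget, uniqueMapped));
    }
}
```

Guarding against a zero denominator keeps empty or fully unmapped bam files from producing NaN in the output.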

Generate metrics specific to given genomic regions

java -server -Xms4g -Xmx4g -cp Waltz.jar org.mskcc.juber.waltz.Waltz PileupMetrics mappingQualityThreshold bam-file reference-fasta intervals-bed-file

This produces 4 files:

-pileup.txt: per-position fragment count for different alleles. Columns: chr, position, ref, depth (including N's), fragment counts for A, C, G, T, insertions, deletions, soft clip start, soft clip end, hard clip start, hard clip end

-pileup-without-duplicates.txt: similar to above but only unique fragments are counted

-intervals.txt: stats per genomic interval. Columns: chr, start, end, interval name, interval length, peak coverage, average coverage, GC fraction, number of fragments mapped

-intervals-without-duplicates.txt: similar to above but only unique fragments are considered
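As an example of consuming the pileup output, the sketch below parses one -pileup.txt line (presumably -pileup-without-duplicates.txt uses the same columns) and computes the fraction of fragments carrying a given base. It assumes the file is tab-separated, has no header line, and follows the column order listed above; the class PileupLine and its methods are hypothetical. Because depth includes N's, the fraction is taken relative to that depth.

```java
// Hypothetical reader for one -pileup.txt line, assuming tab-separated
// columns in the order listed above and no header line.
final class PileupLine {
    final String chr;
    final int position;
    final char ref;
    final int depth;          // includes N's, per the column description
    final int[] baseCounts;   // fragment counts for A, C, G, T, in that order

    private PileupLine(String chr, int position, char ref, int depth, int[] baseCounts) {
        this.chr = chr;
        this.position = position;
        this.ref = ref;
        this.depth = depth;
        this.baseCounts = baseCounts;
    }

    static PileupLine parse(String line) {
        String[] f = line.split("\t");
        if (f.length < 8) {
            throw new IllegalArgumentException("expected at least 8 columns: " + line);
        }
        int[] counts = new int[4];
        for (int i = 0; i < 4; i++) {
            counts[i] = Integer.parseInt(f[4 + i].trim());
        }
        return new PileupLine(f[0], Integer.parseInt(f[1].trim()), f[2].charAt(0),
                Integer.parseInt(f[3].trim()), counts);
    }

    // Fraction of fragments at this position carrying the given base.
    double alleleFraction(char base) {
        int idx = "ACGT".indexOf(Character.toUpperCase(base));
        if (idx < 0) {
            throw new IllegalArgumentException("not one of A, C, G, T: " + base);
        }
        return depth == 0 ? 0.0 : (double) baseCounts[idx] / depth;
    }

    public static void main(String[] args) {
        PileupLine p = parse("chr1\t100\tA\t50\t40\t0\t10\t0\t0\t0\t0\t0\t0\t0");
        System.out.println(p.chr + ":" + p.position + " G fraction = " + p.alleleFraction('G'));
    }
}
```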

Collect metrics across samples

Run the aggregate-bam-metrics.sh script in the folder where the above output files are located to collect metrics across samples.

This produces 3 main files with self-explanatory headers:

read-counts.txt: collection of metrics from *.read-counts files

waltz-coverage.txt: per sample coverage calculated across chosen genomic intervals

fragment-sizes.txt: fragment size distributions for all samples
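How aggregate-bam-metrics.sh computes per-sample coverage for waltz-coverage.txt is not described here. One plausible summary is a length-weighted mean of the per-interval average coverage from a sample's -intervals.txt file, sketched below. It assumes tab-separated lines with no header, interval length in column 5 and average coverage in column 7 (1-based, matching the column list above); the class SampleCoverage is hypothetical and this is not the script's actual formula.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical per-sample coverage summary; not the logic of aggregate-bam-metrics.sh.
final class SampleCoverage {

    // Length-weighted mean of per-interval average coverage.
    static double lengthWeightedCoverage(List<String> intervalLines) {
        double weightedSum = 0;
        long totalLength = 0;
        for (String line : intervalLines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] f = line.split("\t");
            long length = Long.parseLong(f[4].trim());                 // interval length
            double averageCoverage = Double.parseDouble(f[6].trim());  // average coverage
            weightedSum += averageCoverage * length;
            totalLength += length;
        }
        return totalLength == 0 ? 0.0 : weightedSum / totalLength;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "1\t100\t199\tgeneA_exon1\t100\t900\t500.0\t0.45\t1200",
            "1\t300\t599\tgeneA_exon2\t300\t700\t300.0\t0.52\t2400");
        System.out.println(lengthWeightedCoverage(lines)); // prints 350.0
    }
}
```

Weighting by length keeps short intervals from dominating the summary; a plain mean of the two example intervals would give 400.0 instead of 350.0.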

2. Genotyping

java -server -Xms4g -Xmx4g -cp Waltz.jar org.mskcc.juber.waltz.Waltz Genotyping mappingQualityThreshold bam-file reference-fasta intervals-bed-file mutations-maf-file

where mutations-maf-file is a file in MAF format specifying the mutations to be profiled in the given bam file. Required fields are Chromosome, Start_Position, Variant_Type, Reference_Allele and Tumor_Seq_Allele2.
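Before running Genotyping, it can help to check that the input MAF has the required columns. The sketch below assumes a tab-separated header line (the first line not starting with "#" in a typical MAF). The class MafRequiredColumns and its method missing are hypothetical helpers, not part of Waltz, and the example row uses placeholder values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check for the Genotyping input MAF; not part of Waltz.
final class MafRequiredColumns {
    static final List<String> REQUIRED = Arrays.asList(
        "Chromosome", "Start_Position", "Variant_Type", "Reference_Allele", "Tumor_Seq_Allele2");

    // Returns the required columns absent from a tab-separated header line.
    static List<String> missing(String headerLine) {
        List<String> present = Arrays.asList(headerLine.split("\t"));
        List<String> absent = new ArrayList<>();
        for (String column : REQUIRED) {
            if (!present.contains(column)) {
                absent.add(column);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // A minimal MAF body: header plus one SNP row with placeholder values.
        System.out.println(String.join("\t", REQUIRED));
        System.out.println(String.join("\t", "1", "1000", "SNP", "C", "T"));
        System.out.println("missing: " + missing("Chromosome\tStart_Position"));
    }
}
```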

This will produce a -genotypes.maf file with 4 additional columns at the end: Waltz_total_t_depth, Waltz_total_t_alt_count, Waltz_MD_t_depth and Waltz_MD_t_alt_count. All sample-specific columns will be emptied, while all mutation-specific information will be retained. Tumor_Sample_Barcode will contain the name of the sample being genotyped.
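The allele fraction of each mutation follows from the new columns as alt count divided by depth. The sketch below computes it for both the Waltz_total_* and Waltz_MD_* column pairs. It assumes a tab-separated file whose first non-comment line is the header; the class GenotypeAlleleFractions is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader for -genotypes.maf; not part of Waltz.
final class GenotypeAlleleFractions {
    private static final String[] COLUMNS = {
        "Waltz_total_t_depth", "Waltz_total_t_alt_count",
        "Waltz_MD_t_depth", "Waltz_MD_t_alt_count"};

    // Returns {total alt fraction, MD alt fraction} for each mutation row.
    static List<double[]> fractions(List<String> lines) {
        List<double[]> result = new ArrayList<>();
        int[] idx = null;
        for (String line : lines) {
            if (line.startsWith("#") || line.trim().isEmpty()) {
                continue; // skip MAF comment/version lines and blanks
            }
            String[] f = line.split("\t", -1);
            if (idx == null) {
                // The first non-comment line is treated as the header.
                List<String> header = Arrays.asList(f);
                idx = new int[COLUMNS.length];
                for (int i = 0; i < COLUMNS.length; i++) {
                    idx[i] = header.indexOf(COLUMNS[i]);
                    if (idx[i] < 0) {
                        throw new IllegalArgumentException("missing column " + COLUMNS[i]);
                    }
                }
                continue;
            }
            result.add(new double[] {ratio(f[idx[1]], f[idx[0]]), ratio(f[idx[3]], f[idx[2]])});
        }
        return result;
    }

    // alt / depth, or 0 when the position has no coverage.
    private static double ratio(String altCount, String depth) {
        double d = Double.parseDouble(depth.trim());
        return d == 0 ? 0.0 : Double.parseDouble(altCount.trim()) / d;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (double[] af : fractions(lines)) {
            System.out.printf("%.4f\t%.4f%n", af[0], af[1]);
        }
    }
}
```

Locating columns by header name rather than by position keeps the sketch working regardless of how many standard MAF columns precede the four Waltz columns.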

Collect genotypes across multiple samples

Run the aggregate-genotypes.sh script in the folder where the -genotypes.maf files are located to collect genotyping information across multiple samples. The output is a genotypes.maf file.
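For readers who want to merge genotype files outside the shell script, here is a minimal sketch that concatenates per-sample -genotypes.maf files into one table with a single header line, relying on Tumor_Sample_Barcode to tell samples apart. It assumes every input shares the same header and that lines starting with "#" are comments; the class MergeGenotypeMafs is hypothetical and may not match what aggregate-genotypes.sh actually does.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical merge of per-sample genotype MAFs; not the aggregate-genotypes.sh logic.
final class MergeGenotypeMafs {

    // Keeps the first file's header, checks later headers match, appends all data rows.
    static List<String> merge(List<List<String>> files) {
        List<String> merged = new ArrayList<>();
        String header = null;
        for (List<String> lines : files) {
            boolean seenHeader = false;
            for (String line : lines) {
                if (line.startsWith("#") || line.trim().isEmpty()) {
                    continue; // skip comments and blank lines
                }
                if (!seenHeader) {
                    seenHeader = true;
                    if (header == null) {
                        header = line;
                        merged.add(line);
                    } else if (!header.equals(line)) {
                        throw new IllegalArgumentException("header mismatch between files");
                    }
                    continue;
                }
                merged.add(line);
            }
        }
        return merged;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> paths = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*-genotypes.maf")) {
            for (Path p : stream) {
                paths.add(p);
            }
        }
        paths.sort(null); // natural Path order for reproducible output
        List<List<String>> contents = new ArrayList<>();
        for (Path p : paths) {
            contents.add(Files.readAllLines(p, StandardCharsets.UTF_8));
        }
        Files.write(dir.resolve("genotypes.maf"), merge(contents), StandardCharsets.UTF_8);
    }
}
```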

juberpatel/Waltz

Folders and files

Latest commit

History

Repository files navigation

Waltz

Java

Dependencies (bundled with the release jar)

1. Bam Metrics

Generate bam level metrics

Generate metrics specific to given genomic regions

Collect metrics across samples

2. Genotyping

Collect genotypes across multiple samples

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages