Open
Description
I see that maftools is inaccurately estimating the number of T>C
and C>T
and other statistics in SNV Class plot. I am also attaching the VCF file and MAF file for your reference.
Command
Please post your commands and the output (errors or any unexpected output)
BCFtools output:
bcftools stats temp-snpeff.vcf
[W::bcf_hrec_check] Invalid tag name: "B)QB"
# This file was produced by bcftools stats (1.17+htslib-1.17) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats temp-snpeff.vcf
#
# Definition of sets:
# ID [2]id [3]tab-separated file names
ID 0 temp-snpeff.vcf
# SN, Summary numbers:
# number of records .. number of data rows in the VCF
# number of no-ALTs .. reference-only sites, ALT is either "." or identical to REF
# number of SNPs .. number of rows with a SNP
# number of MNPs .. number of rows with a MNP, such as CC>TT
# number of indels .. number of rows with an indel
# number of others .. number of rows with other type, for example a symbolic allele or
# a complex substitution, such as ACT>TCGA
# number of multiallelic sites .. number of rows with multiple alternate alleles
# number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
#
# Note that rows containing multiple types will be counted multiple times, in each
# counter. For example, a row with a SNP and an indel increments both the SNP and
# the indel counter.
#
# SN [2]id [3]key [4]value
SN 0 number of samples: 1
SN 0 number of records: 736
SN 0 number of no-ALTs: 0
SN 0 number of SNPs: 690
SN 0 number of MNPs: 46
SN 0 number of indels: 0
SN 0 number of others: 0
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
# TSTV, transitions/transversions:
# TSTV [2]id [3]ts [4]tv [5]ts/tv [6]ts (1st ALT) [7]tv (1st ALT) [8]ts/tv (1st ALT)
TSTV 0 610 80 7.62 610 80 7.62
# SiS, Singleton stats:
# SiS [2]id [3]allele count [4]number of SNPs [5]number of transitions [6]number of transversions [7]number of indels [8]repeat-consistent [9]repeat-inconsistent [10]not applicable
SiS 0 1 690 610 80 0 0 0 0
# AF, Stats by non-reference allele frequency:
# AF [2]id [3]allele frequency [4]number of SNPs [5]number of transitions [6]number of transversions [7]number of indels [8]repeat-consistent [9]repeat-inconsistent [10]not applicable
AF 0 0.000000 690 610 80 0 0 0 0
# QUAL, Stats by quality
# QUAL [2]id [3]Quality [4]number of SNPs [5]number of transitions (1st ALT) [6]number of transversions (1st ALT) [7]number of indels
QUAL 0 0.0 690 610 80 0
# IDD, InDel distribution:
# IDD [2]id [3]length (deletions negative) [4]number of sites [5]number of genotypes [6]mean VAF
# ST, Substitution types:
# ST [2]id [3]type [4]count
ST 0 A>C 12
ST 0 A>G 132
ST 0 A>T 14
ST 0 C>A 14
ST 0 C>G 7
ST 0 C>T 170
ST 0 G>A 135
ST 0 G>C 4
ST 0 G>T 8
ST 0 T>A 17
ST 0 T>C 173
ST 0 T>G 4
# DP, Depth distribution
# DP [2]id [3]bin [4]number of genotypes [5]fraction of genotypes (%) [6]number of sites [7]fraction of sites (%)
But the maftools plotmafSummary
function says that T>C
and C>T
mutations are 305 in number but they are not (I also subset MAF file manually and check the number of rows). Now of variants in the samples are also reported less.
library(maftools)
mafs = list.files(path = ".", pattern = "*.maf$", full.names = TRUE)
maf_dat = lapply(mafs, data.table::fread, sep = "\t", skip = "Hugo_Symbol")
names(maf_dat) = unlist(data.table::tstrsplit(basename(path = mafs), spli = "\\.", keep = 1))
maf = data.table::rbindlist(l = maf_dat, use.names = TRUE, fill = TRUE, idcol = "maf_file")
maf = maftools::read.maf(maf, removeDuplicatedVariants = FALSE)
plotmafSummary(maf = maf, rmOutlier = FALSE, addStat = 'median', dashboard = TRUE, titvRaw = FALSE)
Session info
Run sessionInfo()
and post the output below
> sessionInfo()
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Asia/Riyadh
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] maftools_2.24.0 dplyr_1.1.4 readr_2.1.5
loaded via a namespace (and not attached):
[1] tidyr_1.3.1 generics_0.1.3 lattice_0.22-5 stringi_1.8.7 hms_1.1.3 digest_0.6.37
[7] magrittr_2.0.3 grid_4.5.0 RColorBrewer_1.1-3 pkgload_1.4.0 fastmap_1.2.0 Matrix_1.7-3
[13] plyr_1.8.9 cellranger_1.1.0 processx_3.8.6 pkgbuild_1.4.7 sessioninfo_1.2.3 survival_3.8-3
[19] urlchecker_1.0.1 ps_1.9.1 promises_1.3.2 BiocManager_1.30.25 purrr_1.0.4 cli_3.6.4
[25] shiny_1.10.0 rlang_1.1.6 crayon_1.5.3 splines_4.5.0 ellipsis_0.3.2 bit64_4.6.0-1
[31] remotes_2.5.0 withr_3.0.2 cachem_1.1.0 DNAcopy_1.82.0 devtools_2.4.5 tools_4.5.0
[37] parallel_4.5.0 tzdb_0.5.0 memoise_2.0.1 httpuv_1.6.16 BiocGenerics_0.54.0 curl_6.2.2
[43] vctrs_0.6.5 R6_2.6.1 mime_0.13 stats4_4.5.0 lifecycle_1.0.4 stringr_1.5.1
[49] S4Vectors_0.46.0 fs_1.6.6 htmlwidgets_1.6.4 bit_4.6.0 usethis_3.1.0 vroom_1.6.5
[55] miniUI_0.1.2 pkgconfig_2.0.3 desc_1.4.3 callr_3.7.6 pillar_1.10.2 later_1.4.2
[61] data.table_1.17.0 glue_1.8.0 profvis_0.4.0 Rcpp_1.0.14 tibble_3.2.1 tidyselect_1.2.1
[67] rstudioapi_0.17.1 xtable_1.8-4 htmltools_0.5.8.1 mgsub_1.7.3 compiler_4.5.0 readxl_1.4.5
Metadata
Metadata
Assignees
Labels
No labels