We encountered an issue with the PCGR (Personal Cancer Genome Reporter) software when processing hypermutated samples containing more than 500,000 variants. PCGR enforces a hard variant-count threshold, so it cannot handle these large datasets. To work around this, we currently filter the variant set down to fit under the threshold. However, this filtering introduces discrepancies between pipelines and can lead to inconsistent data interpretation.
Proposed Solutions:
Chunked VCF Input: Divide the input VCF file into smaller chunks, each below the 500,000-variant threshold, and process them in parallel through PCGR. Once processed, the individual results can be merged back together into a single comprehensive report (see the sketch after this list).
Parallelization: Running the chunks concurrently keeps large datasets tractable, removes the need for manual filtering, and preserves the full variant set for downstream analysis.
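A minimal sketch of the split-and-dispatch step is below. The chunk size, file paths, and the `pcgr` flags (`--input_vcf`, `--output_dir`) are illustrative assumptions, not taken from this repo; the exact CLI varies between PCGR versions, so check them against your installation before use.

```python
#!/usr/bin/env python3
"""Sketch: split a VCF into sub-threshold chunks and run PCGR on each in parallel."""
import gzip
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

CHUNK_SIZE = 400_000  # keep each chunk safely under PCGR's 500,000-variant cap


def split_vcf(vcf_path: str, out_dir: str, chunk_size: int = CHUNK_SIZE) -> list[Path]:
    """Write chunk VCFs that each repeat the full header and hold <= chunk_size records."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    opener = gzip.open if vcf_path.endswith(".gz") else open
    header, chunks, records = [], [], []
    with opener(vcf_path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                header.append(line)
            else:
                records.append(line)
                if len(records) == chunk_size:
                    chunks.append(_write_chunk(out, len(chunks), header, records))
                    records = []
    if records:
        chunks.append(_write_chunk(out, len(chunks), header, records))
    return chunks


def _write_chunk(out: Path, idx: int, header: list[str], records: list[str]) -> Path:
    path = out / f"chunk_{idx:03d}.vcf"
    with open(path, "w") as fh:
        fh.writelines(header)
        fh.writelines(records)
    return path


def run_pcgr(chunk: Path) -> Path:
    """Invoke PCGR on one chunk; the flags here are placeholders for the real call."""
    out_dir = chunk.with_suffix("")  # one output directory per chunk
    out_dir.mkdir(exist_ok=True)
    subprocess.run(
        ["pcgr", "--input_vcf", str(chunk), "--output_dir", str(out_dir)],
        check=True,
    )
    return out_dir


if __name__ == "__main__":
    chunk_paths = split_vcf("sample.vcf.gz", "chunks")
    with ProcessPoolExecutor(max_workers=4) as pool:
        result_dirs = list(pool.map(run_pcgr, chunk_paths))
```

One caveat worth flagging: per-record annotation is independent across chunks, but sample-level metrics that PCGR derives from the whole VCF (e.g. mutational burden) would need to be recomputed over the merged set rather than taken from any single chunk.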
Expected Outcome:
By removing the filtering step, we can minimize inconsistencies between pipelines and preserve important variants in hypermutated samples. Processing the chunks in parallel should keep overall runtime comparable to, or lower than, the current filtered approach while still handling the full variant set.
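For the merge step, a simple concatenation of the per-chunk variant tables is sketched below. This assumes each chunk's output directory contains a tab-separated variant table; the glob pattern is a placeholder and should be adapted to the output files your PCGR version actually writes.

```python
#!/usr/bin/env python3
"""Sketch: merge per-chunk PCGR variant tables back into one file."""
from pathlib import Path


def merge_tsv(result_dirs: list[Path], pattern: str, merged: Path) -> None:
    """Concatenate chunk TSVs, keeping the header line only once."""
    wrote_header = False
    with open(merged, "w") as out:
        for d in sorted(result_dirs):
            for tsv in sorted(d.glob(pattern)):
                with open(tsv) as fh:
                    header = fh.readline()
                    if not wrote_header:
                        out.write(header)
                        wrote_header = True
                    out.writelines(fh)


if __name__ == "__main__":
    dirs = [d for d in sorted(Path("chunks").glob("chunk_*")) if d.is_dir()]
    # The TSV pattern below is an assumption -- adjust to your PCGR outputs.
    merge_tsv(dirs, "*.tiers.tsv", Path("merged_variants.tsv"))
```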
Branch: hypermutation_handling