PCGR Software Limitation with Hypermutated Samples Exceeding 500,000 Variants #1

Open
qclayssen opened this issue Sep 30, 2024 · 0 comments

qclayssen commented Sep 30, 2024

We encountered an issue with PCGR (Personal Cancer Genome Reporter) when processing hypermutated samples containing more than 500,000 variants. PCGR has a hard threshold on the number of variants, which prevents it from efficiently handling these large variant datasets. To work around this, we are forced to filter the variants down to meet the threshold. However, this filtering introduces discrepancies between different pipelines, which can lead to inconsistent data interpretation.

Proposed Solutions:

  • Chunked VCF Input: Divide the input VCF file into smaller chunks of variants, each below the 500,000 variant threshold, and process them in parallel through PCGR. Once processed, the individual results can be merged back together to create a single comprehensive report.

  • Parallelization: Processing the chunks in parallel will ensure that large datasets are handled more efficiently, reducing the need for manual filtering and keeping the full variant dataset available for downstream analysis.
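
To make the two proposals concrete, here is a minimal Python sketch of the chunking and parallel dispatch. It is an illustration under stated assumptions, not the implementation on the branch: `split_vcf_lines`, `run_chunks`, and the `process_chunk` callable are hypothetical names, the input is assumed to be an uncompressed VCF read line by line, and the threshold is assumed to be exclusive, so each chunk is capped at 499,999 records. How PCGR is invoked on each chunk and how the per-chunk results are merged into one report are left open.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split_vcf_lines(
    lines: Iterable[str],
    max_records: int = 499_999,  # assumption: stay strictly below the 500,000 threshold
) -> Iterator[List[str]]:
    """Yield chunks of VCF lines; each chunk repeats the full header
    and carries at most `max_records` variant records."""
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    header: List[str] = []
    records: List[str] = []
    for line in lines:
        if line.startswith("#"):
            # Meta-information lines (##...) and the #CHROM column line.
            header.append(line)
            continue
        if not line.strip():
            continue  # ignore blank lines
        records.append(line)
        if len(records) == max_records:
            yield header + records
            records = []
    if records:
        yield header + records


def run_chunks(
    chunks: Iterable[List[str]],
    process_chunk: Callable[[List[str]], T],  # hypothetical wrapper around one PCGR run
    workers: int = 4,
) -> List[T]:
    """Apply `process_chunk` to every chunk in parallel.
    Results come back in the same order as the input chunks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_chunk, chunks))
```

Repeating the header in every chunk keeps each piece a valid standalone VCF, and returning results in input order should make a later merge step simpler to reason about.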

Expected Outcome:

By removing the filtering step, we can minimize inconsistencies between pipelines and preserve important variants in hypermutated samples. The use of parallel processing for chunked input will ensure that the overall processing time remains consistent or is even reduced, while still handling the full variant set.

Branch: hypermutation_handling

@qclayssen qclayssen added the enhancement New feature or request label Oct 28, 2024
@scwatts scwatts added this to the 0.3.0 milestone Oct 28, 2024