We encountered an issue with the PCGR (Personal Cancer Genome Reporter) software when processing hypermutated samples containing more than 500,000 variants. PCGR enforces a hard variant-count threshold, so it cannot handle these large datasets. To work around this, we currently filter the variant set down to fit under the threshold. However, this filtering introduces discrepancies between pipelines and can lead to inconsistent data interpretation.
Proposed Solutions:
Chunked VCF Input: Divide the input VCF file into smaller chunks, each below the 500,000-variant threshold, and process them in parallel through PCGR. Once processed, the individual results can be merged back together into a single comprehensive report (see the sketch after this list).
Parallelization: Running the chunks concurrently keeps large datasets tractable, removes the need for manual filtering, and preserves the full variant set for downstream analysis.
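A minimal sketch of the split-and-dispatch step is below. The chunk size, file paths, and the `pcgr` flags (`--input_vcf`, `--output_dir`) are illustrative assumptions, not taken from this repo; the exact CLI varies between PCGR versions, so check them against your installation before use.

```python
#!/usr/bin/env python3
"""Sketch: split a VCF into sub-threshold chunks and run PCGR on each in parallel."""
import gzip
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

CHUNK_SIZE = 400_000  # keep each chunk safely under PCGR's 500,000-variant cap


def split_vcf(vcf_path: str, out_dir: str, chunk_size: int = CHUNK_SIZE) -> list[Path]:
    """Write chunk VCFs that each repeat the full header and hold <= chunk_size records."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    opener = gzip.open if vcf_path.endswith(".gz") else open
    header, chunks, records = [], [], []
    with opener(vcf_path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                header.append(line)
            else:
                records.append(line)
                if len(records) == chunk_size:
                    chunks.append(_write_chunk(out, len(chunks), header, records))
                    records = []
    if records:
        chunks.append(_write_chunk(out, len(chunks), header, records))
    return chunks


def _write_chunk(out: Path, idx: int, header: list[str], records: list[str]) -> Path:
    path = out / f"chunk_{idx:03d}.vcf"
    with open(path, "w") as fh:
        fh.writelines(header)
        fh.writelines(records)
    return path


def run_pcgr(chunk: Path) -> Path:
    """Invoke PCGR on one chunk; the flags here are placeholders for the real call."""
    out_dir = chunk.with_suffix("")  # one output directory per chunk
    out_dir.mkdir(exist_ok=True)
    subprocess.run(
        ["pcgr", "--input_vcf", str(chunk), "--output_dir", str(out_dir)],
        check=True,
    )
    return out_dir


if __name__ == "__main__":
    chunk_paths = split_vcf("sample.vcf.gz", "chunks")
    with ProcessPoolExecutor(max_workers=4) as pool:
        result_dirs = list(pool.map(run_pcgr, chunk_paths))
```

One caveat worth flagging: per-record annotation is independent across chunks, but sample-level metrics that PCGR derives from the whole VCF (e.g. mutational burden) would need to be recomputed over the merged set rather than taken from any single chunk.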
Expected Outcome:
By removing the filtering step, we can minimize inconsistencies between pipelines and preserve important variants in hypermutated samples. Processing the chunks in parallel should keep overall runtime comparable to, or lower than, the current filtered approach while still handling the full variant set.
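For the merge step, a simple concatenation of the per-chunk variant tables is sketched below. This assumes each chunk's output directory contains a tab-separated variant table; the glob pattern is a placeholder and should be adapted to the output files your PCGR version actually writes.

```python
#!/usr/bin/env python3
"""Sketch: merge per-chunk PCGR variant tables back into one file."""
from pathlib import Path


def merge_tsv(result_dirs: list[Path], pattern: str, merged: Path) -> None:
    """Concatenate chunk TSVs, keeping the header line only once."""
    wrote_header = False
    with open(merged, "w") as out:
        for d in sorted(result_dirs):
            for tsv in sorted(d.glob(pattern)):
                with open(tsv) as fh:
                    header = fh.readline()
                    if not wrote_header:
                        out.write(header)
                        wrote_header = True
                    out.writelines(fh)


if __name__ == "__main__":
    dirs = [d for d in sorted(Path("chunks").glob("chunk_*")) if d.is_dir()]
    # The TSV pattern below is an assumption -- adjust to your PCGR outputs.
    merge_tsv(dirs, "*.tiers.tsv", Path("merged_variants.tsv"))
```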
Branch: hypermutation_handling