circ + variants: is parallelization working? #347

@lydiayliu

I'm not sure parallelization is working for circRNAs. I have been running one sample for the better part of today and it hasn't made any visible progress. This is on a single node with a single moPepGen process; I tried both 16 and 32 threads.

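# Derive the GVF file name and the sample ID from one input path
a=/data/Parser/VEP/gencode/gsnp/CPCG0183.gencode.tsv.s.gvf
b=$(basename -- "$a"); echo "${b}"   # CPCG0183.gencode.tsv.s.gvf
c="${b%%.*}"; echo "${c}"            # CPCG0183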
moPepGen callVariant \
    --input-variant /hot/users/yiyangliu/MoPepGen/Parser/CIRCexplorer3/TOPHAT/${c}_IP_quant.txt.1.s.gvf \
        /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/gsnp/${b} \
        /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/gindel/${b} \
        /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/somaticsniper/${b} \
        /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/pindel/${b} \
    --index-dir /hot/users/yiyangliu/MoPepGen/Index/GRCh38-EBI-GENCODE34/ \
    --verbose-level 1 \
    --threads 16 \
    --noncanonical-transcripts \
    --output-fasta /hot/users/yiyangliu/MoPepGen/Variant/CIRCexplorer3/TOPHAT/circ_ssm/${c}.fasta > /hot/users/yiyangliu/MoPepGen/Variant/CIRCexplorer3/TOPHAT/circ_ssm/${c}.log

Killing the process gives the traceback below. CPU usage in the Docker container also just hovers around 100%, which suggests only about one core is busy.

^CTraceback (most recent call last):
  File "/usr/local/bin/moPepGen", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/__main__.py", line 79, in main
    args.func(args)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 280, in call_variant_peptide
    results = process_pool.map(wrapper, dispatches)
  File "/usr/local/lib/python3.8/site-packages/pathos/parallel.py", line 237, in map
    return list(self.imap(f, *args))
  File "/usr/local/lib/python3.8/site-packages/pathos/parallel.py", line 250, in <genexpr>
    return (subproc() for subproc in list(builtins.map(submit, *args)))
  File "/usr/local/lib/python3.8/site-packages/ppft/_pp.py", line 124, in __call__
    self.wait()
  File "/usr/local/lib/python3.8/site-packages/ppft/_pp.py", line 137, in wait
    self.lock.acquire()
KeyboardInterrupt
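
The traceback shows the main process blocked in `process_pool.map(wrapper, dispatches)`, waiting for workers to return. One way to narrow down which task is slow could be to log each task's start and duration. The following is only a minimal sketch: `timed` and `call_one` are hypothetical names for illustration, not moPepGen functions, and whether a decorated function can be shipped to pathos/ppft workers is an assumption that is not verified here.

```python
import functools
import sys
import time


def timed(func):
    """Wrap func so each call reports its argument and wall-clock duration."""
    @functools.wraps(func)
    def inner(task):
        start = time.perf_counter()
        print(f"start {task!r}", file=sys.stderr, flush=True)
        try:
            return func(task)
        finally:
            elapsed = time.perf_counter() - start
            print(f"done  {task!r} in {elapsed:.2f}s", file=sys.stderr, flush=True)
    return inner


def call_one(task):
    # Hypothetical stand-in for the per-task work; not moPepGen code.
    return task * 2


# Serial map for illustration; the wrapped function would be handed to the
# pool's map in place of the original one.
results = list(map(timed(call_one), [1, 2, 3]))
```

A task that prints `start` but never `done` would be the one to look at first.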

After quite a few hours, the entire log still looks like this (this sample used to run in less than 20 minutes prior to GVF indexing and multi-process):

yiyangliu@ip-0A125212:/hot/users/yiyangliu/MoPepGen/Variant/CIRCexplorer3/TOPHAT/circ_ssm$ tail CPCG0183.log
[ 2022-01-16 18:19:52 ] moPepGen callVariant started
[ 2022-01-16 18:21:11 ] Reference indices loaded.

However, I did get CPCG0100 to run through, and it took only somewhat longer than before (about 24 minutes versus about 11 minutes in the old log below), so I'm not sure what is going on.

yiyangliu@ip-0A125212:/hot/users/yiyangliu/MoPepGen/Variant/CIRCexplorer3/TOPHAT/circ_ssm$ tail CPCG0100.log
[ 2022-01-16 17:49:40 ] moPepGen callVariant started
[ 2022-01-16 17:51:00 ] Reference indices loaded.
[ 2022-01-16 18:13:28 ] Variant peptide FASTA file written to disk.

CPCG0100 old log prior to GVF indexing and multi-process:

[ 2021-12-28 18:08:55 ] moPepGen callVariant started
[ 2021-12-28 18:10:27 ] Variant file /data/Parser/CIRCexplorer3/TOPHAT/CPCG0100_IP_quant.txt.1.3ff.gvf loaded.
[ 2021-12-28 18:16:39 ] Variant file /data/Parser/VEP/gencode/gsnp/CPCG0100.gencode.tsv.gvf loaded.
[ 2021-12-28 18:17:36 ] Variant file /data/Parser/VEP/gencode/gindel/CPCG0100.gencode.tsv.gvf loaded.
[ 2021-12-28 18:17:36 ] Variant file /data/Parser/VEP/gencode/somaticsniper/CPCG0100.gencode.tsv.gvf loaded.
[ 2021-12-28 18:17:36 ] Variant file /data/Parser/VEP/gencode/pindel/CPCG0100.gencode.tsv.gvf loaded.
[ 2021-12-28 18:18:00 ] Variant records sorted.
[ 2021-12-28 18:20:16 ] circRNA processed.
[ 2021-12-28 18:20:16 ] Variant peptide FASTA file written to disk.
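
To compare the two CPCG0100 runs, the elapsed time can be read off the log timestamps. A minimal sketch, assuming every log line has the `[ YYYY-MM-DD HH:MM:SS ] message` form shown above; the helper names (`parse_log`, `total_seconds`, `step_seconds`) are made up for illustration, and the old log is abbreviated to its first and last lines.

```python
import re
from datetime import datetime

LOG_LINE = re.compile(r"^\[ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \] (.*)$")


def parse_log(text):
    """Return (timestamp, message) pairs for lines in '[ time ] message' form."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append((stamp, match.group(2)))
    return entries


def total_seconds(entries):
    """Seconds between the first and the last log entry."""
    return int((entries[-1][0] - entries[0][0]).total_seconds())


def step_seconds(entries):
    """Seconds elapsed before each message, measured from the previous entry."""
    return [
        (int((cur[0] - prev[0]).total_seconds()), cur[1])
        for prev, cur in zip(entries, entries[1:])
    ]


new_log = """\
[ 2022-01-16 17:49:40 ] moPepGen callVariant started
[ 2022-01-16 17:51:00 ] Reference indices loaded.
[ 2022-01-16 18:13:28 ] Variant peptide FASTA file written to disk.
"""

old_log = """\
[ 2021-12-28 18:08:55 ] moPepGen callVariant started
[ 2021-12-28 18:20:16 ] Variant peptide FASTA file written to disk.
"""

new_total = total_seconds(parse_log(new_log))
old_total = total_seconds(parse_log(old_log))
print(f"new run: {new_total} s, old run: {old_total} s")
```

For these logs this prints `new run: 1428 s, old run: 681 s`. `step_seconds` gives the same breakdown per log message, which shows which stage takes the time.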

I'm going to try with 12 threads and `--verbose-level 2` now.
