apply_to_db.py #134

Fefocrates · 2025-04-10T22:04:45Z

Not sure if it's an issue or if I'm missing something. I've posted below what I'm encountering.

apply_to_db.py [OPTIONS] -i <input_directory> -fi <fisher_output_dir>

Required arguments:

-i, --input <in_dir> Path to input directory containing parsed tsv labeled {gene_name}_parsed.tsv
-fi, --fisher_dir <fisher_output_dir> Path to initial fisher.py output directory

## New error: PetaCant

apply_to_db.py -i /scratch/arodr412/hokuscyto_phylo/parsing_tst/forest_out_copy_Apr.07.2025 -fi /scratch/arodr412/hokuscyto_phylo/fisher_out_Mar.28.2025

Traceback (most recent call last):
  File "/home/arodr412/mambaforge/envs/fisher/bin/apply_to_db.py", line 379, in <module>
    main()
  File "/home/arodr412/mambaforge/envs/fisher/bin/apply_to_db.py", line 325, in main
    new_database(table)
  File "/home/arodr412/mambaforge/envs/fisher/bin/apply_to_db.py", line 250, in new_database
    orthologs, paralogs = parse_table(table)
  File "/home/arodr412/mambaforge/envs/fisher/bin/apply_to_db.py", line 200, in parse_table
    record = paralogs[name]
KeyError: 'PetaCant..p84149'

The forest_out_copy_Apr.07.2025 directory contains all the parsed.tsv files as well as the non-parsed ones.

The fisher_out_Mar.28.2025 folder is the original fisher.py output folder. I keep getting this issue.

Not sure what's up.

robert-ervin-jones · 2025-04-11T18:16:24Z

In this case, was the sequence in question originally a paralog, and you changed the designation to ortholog?
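
A minimal sketch of one way a KeyError like this can arise, assuming (not confirmed from the apply_to_db.py source) that a header is looked up in a paralog dict after its designation was changed to ortholog. The dict contents and the `lookup_record` helper are hypothetical and only illustrate the failure mode and a more forgiving lookup.

```python
# Hypothetical illustration only; this is not the actual apply_to_db.py logic.
orthologs = {"GeneA..p1": "ortholog record"}
paralogs = {"GeneB..p2": "paralog record"}


def lookup_record(name, orthologs, paralogs):
    """Return (designation, record), checking both dicts instead of assuming one."""
    if name in paralogs:
        return "paralog", paralogs[name]
    if name in orthologs:
        return "ortholog", orthologs[name]
    raise KeyError(f"{name!r} not found as ortholog or paralog; check the header")


# A direct paralogs[name] lookup fails when the header lives in the other dict.
try:
    paralogs["GeneA..p1"]
except KeyError as err:
    print("direct lookup failed:", err)

# Checking both dicts finds the record and reports where it was found.
print(lookup_record("GeneA..p1", orthologs, paralogs))
```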

Fefocrates · 2025-04-11T18:22:56Z

Yes, I believe so. Would switching it back to its original state fix it? I mean, it still made the backup in the database.

Fefocrates · 2025-04-15T16:22:37Z

I kept going with the pipeline, and at the matrix constructor step I didn't get the indices.tsv file, the concatenated matrix, or the stats .tsv files (error file attached below).

matrix_slur_error.txt

robert-ervin-jones · 2025-04-15T16:23:46Z

I'll take a look at it. Sorry I haven't gotten back to you yet about your original error. I've been trying to figure out why this might be happening.

Fefocrates · 2025-04-15T16:33:15Z

No worries, and thanks! Not sure if this is the cause, but in the original error I changed PetaCant from a paralog to an ortholog, and PetaCant isn't part of my input_metadata. That said, the database (metadataset) I'm using was given to me by a collaborator. Not sure if that has anything to do with it, but he did run it with his own data and it seemed to work. I'll keep tinkering on my end as well, and thanks for the help!

robert-ervin-jones · 2025-04-15T16:48:53Z

So for this matrix construction error, it looks like prequal failed. You'll need to check /scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.11.2025/logs/prequal/FAM.log for the exact error.
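
If more than one gene failed, a small script can scan every log in that directory at once. This is only a sketch: the `failed_logs` helper and the marker strings are assumptions, not part of the pipeline, and the markers may need adjusting to whatever prequal actually prints.

```python
from pathlib import Path


def failed_logs(log_dir, markers=("Assertion", "Error", "Traceback")):
    """Map each log file name to the lines that contain a failure marker."""
    failures = {}
    for log in sorted(Path(log_dir).glob("*.log")):
        lines = log.read_text(errors="replace").splitlines()
        flagged = [line for line in lines if any(m in line for m in markers)]
        if flagged:
            failures[log.name] = flagged
    return failures


# Example usage (path taken from the comment above):
# for name, lines in failed_logs(
#     "/scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.11.2025/logs/prequal"
# ).items():
#     print(name, lines[0])
```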

robert-ervin-jones · 2025-04-15T16:50:01Z

For the original error, can you confirm that PetaCant..p84149 is a header for one of the paralog sequences in the database?
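
One way to check this without opening files by hand is to search the FASTA headers directly. The sketch below assumes the database stores sequences as FASTA files with a `.fas` extension somewhere under a single directory; that layout, the `find_header` helper, and the example path are assumptions, so adjust them to the real database structure.

```python
from pathlib import Path


def fasta_headers(path):
    """Yield header lines (without the leading '>') from a FASTA file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                yield line[1:].strip()


def find_header(db_dir, query, pattern="*.fas"):
    """Return (file name, header) pairs whose header contains the query string."""
    hits = []
    for fasta in sorted(Path(db_dir).rglob(pattern)):
        for header in fasta_headers(fasta):
            if query in header:
                hits.append((fasta.name, header))
    return hits


# Example usage (the directory is a placeholder):
# print(find_header("/path/to/database", "PetaCant..p84149"))
```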

Fefocrates · 2025-04-15T17:11:48Z

Yes! I switched it to an ortholog since it still falls somewhat within the expected group (Discobids). The screenshot is the original one, with no changes.

Fefocrates · 2025-04-15T17:17:59Z

This is all I get from the prequal log:

[arodr412@sol-login02:/scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.11.2025/logs/prequal]$ more FAM.log
prequal: ZorroInterface.cpp:135: double** RunHMM(std::vector*, std::string, bool): Assertion `rLsize > 0' failed.

Found only DNA sequences. Doing translations.
There are 1 sequences of max length 0
Prepping pairHMM ... done
Collecting subset of posterior probabilities based on closest 10 sequences determined by Kmers
This may take time for larger data sets:

Creating collection sets of PPs based on Kmer distances
/ 1 / 1

robert-ervin-jones · 2025-04-17T16:17:38Z

Sorry again for just now getting back to you. I need to update my email on here, so that I see responses sooner. I would just remove FAM.fas from the prep_final_dataset_out directory. It looks like all but one sequence is removed after trimming.
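
To spot other genes that might hit the same problem, something like the sketch below could list FASTA files that hold fewer than two non-empty sequences. The helper names, the `.fas` pattern, and the two-sequence threshold are assumptions. It also only catches genes that are already sparse in the input directory; a gene that loses its sequences during trimming would still only show up in the prequal logs.

```python
from pathlib import Path


def sequence_lengths(path):
    """Return a list with the residue count of each sequence in a FASTA file."""
    lengths = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                lengths.append(0)
            elif line and lengths:
                # Gap characters are not counted as residues.
                lengths[-1] += len(line.replace("-", ""))
    return lengths


def sparse_genes(in_dir, min_seqs=2, pattern="*.fas"):
    """List FASTA files with fewer than min_seqs non-empty sequences."""
    flagged = []
    for fasta in sorted(Path(in_dir).glob(pattern)):
        non_empty = [n for n in sequence_lengths(fasta) if n > 0]
        if len(non_empty) < min_seqs:
            flagged.append((fasta.name, len(non_empty)))
    return flagged


# Example usage (directory name taken from the later snakemake command):
# print(sparse_genes("/scratch/arodr412/hokuscyto_phylo/prep_final_dataset_out_Apr.11.2025"))
```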

Fefocrates · 2025-04-18T16:52:28Z

OK, so I removed FAM.fas from the prep_final_dataset_out directory, and now I get the following error (the concatenation wasn't created):

[Thu Apr 17 19:09:13 2025]
Finished job 259.
1432 of 1435 steps (100%) done
Exiting because a job execution failed. Look above for error message
Complete log: /scratch/arodr412/hokuscyto_phylo/.snakemake/log/2025-04-17T130516.471765.snakemake.log
Traceback (most recent call last):
  File "/home/arodr412/mambaforge/envs/fisher/bin/matrix_constructor.py", line 145, in <module>
    run_snakemake()
  File "/home/arodr412/mambaforge/envs/fisher/bin/matrix_constructor.py", line 94, in run_snakemake
    subprocess.run(smk_cmd, shell=True, executable='/bin/bash', check=True)
  File "/home/arodr412/mambaforge/envs/fisher/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'snakemake -s /home/arodr412/mambaforge/envs/fisher/bin/matrix_constructor.smk --config out_dir=/scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.17.2025 in_dir=/scratch/arodr412/hokuscyto_phylo/prep_final_dataset_out_Apr.11.2025 in_format=fasta out_format=fasta concatenation_only=False trimal_gt=0.8 genes=RPS17,RPAC1,EIF4A3,NSF1-K,AGX,IFT88,RPS2,MCM-B,MCM-C,ATSAR2,PSMA-E,RRAGD,AP3M1,PSMA-B,RPL24A,CS,WBSCR22,RPL5,GDI2,EIF3I,IFT46,PURA,SRP54,VATB,ODO2A,DHSB3,AMP2B,SPTLC1,TOPO1,PACE2-A,UBE12,AP3S1,TRS,VATA,ATP6,RPL43,STXBP1,NFS1-MITO,GLCN,NMD3,CORO1C,UBA3,AR21,PPP2R5C,RHEB,IMB1,RICTOR,S15P,TMS,NOP5A,CCDC37,CC1,BAT1,WD,IPO4,RPL4B,RPTOR,CCT-Z,CDK5,SND1,RPF1,CCDC40,FTSJ1,ARP2,SCO1-MITO,DNAL1,CRFG,MAT,MCM-A,RPL21,CCT-G,RPL12,SCSB,GCST,ARP3,CCT-E,NDF1,RPS5,EIF3C,RPL20,RPPO,RPL7A,RPS15,PPX2,PSMA-F,RPL13E,PSMB-L,RPL44,RPL11,XPB,NSF1-L,NDUFV2-MITO,ARPC1,ATG2,PACE2C,GDI,RPS16,GNL2,BTUB,CCT-A,SUCA,NSF1-I,ARPC4,RPL14E,COPG2,SPTC2,DRG2,S15A,IMP4,PSMA-H,GLGB2,ODPB,RPS12,CCDC65,RPS8,RAN,HYOU1,MCM-D,RPL15,ORF2,RPS3,NSF1-H,RPL9,VATC,NSF1-J,AGB1,ATP6V0A1,CPN60,VPS26B,GMPP3,RPL33,WD66,NAA15,PSMA-J,VPS18,YKT6,DIMT1L,RPL32,RPL35,PSMB-K,RPS23,RPL31,H2A,PSMD,RPN1B,HMT1,WRS,RPL17,COP-BETA,PSMA-G,RPS20,PSMB-N,SF3B2,AP4M,DHSA1,MTLPD2,PYGB,ALG11,VAPA,RPS18,RAD51A,ODBA,RPS4,NSF1-G,RPO-B,CTP,OPLAH,COPS6,ODPA2,RPS26,RPL19,GSS,RPL3,APBLC,COPE,PSD11,HSP70MT,GRC5,RPS11,VATE,PGM2,NSA2,IFT57,ADK2,CCDC113,SEC23,AP4S1,CALR,PSMB-M,MTHFR,RPS10,MRA1,RPO-C,IPO5,VBP1,ALIS1,METTL1,CCT-B,DNAI2,PSMA-A,EFG-MITO,RPL30,EFTUD1,MCM-E,CCT-N,TM9SF1,PSD7,C3H4,SYGM1,CAPZ,IF2P,ATP6V0D1,PACE5,L10A,IF2B,GTUB,PPP2R3,CLAT,CCT-T,ODBB,RPO-A,RPL2,SAP40,SRA,PIK3C3,PSMA-C,PSMD6,KDELR2,PSMD12,IF6,NSF1-C,CCT-D,CRNL1,IF2G,RF1,RPS6,RPL13A,NSF1-M,GNB2L --cores 1 --rerun-incomplete --keep-going --use-conda --nolock /scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.17.2025/matrix.fas /scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.17.2025/indices.tsv 
/scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.17.2025/matrix_constructor_stats.tsv /scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.17.2025/occupancy.tsv' returned non-zero exit status 1.
