apply_to_db.py #134

Open
Fefocrates opened this issue Apr 10, 2025 · 11 comments
@Fefocrates

Not sure if it's an issue or if I'm missing something. I've posted below what I'm encountering.

apply_to_db.py [OPTIONS] -i <input_directory> -fi <fisher_output_dir>

Required arguments:

-i, --input <in_dir> Path to input directory containing parsed tsv labeled {gene_name}_parsed.tsv
-fi, --fisher_dir <fisher_output_dir> Path to initial fisher.py output directory

## New error: PetaCant

apply_to_db.py -i /scratch/arodr412/hokuscyto_phylo/parsing_tst/forest_out_copy_Apr.07.2025 -fi /scratch/arodr412/hokuscyto_phylo/fisher_out_Mar.28.2025

Traceback (most recent call last):
  File "/home/arodr412/mambaforge/envs/fisher/bin/apply_to_db.py", line 379, in <module>
    main()
  File "/home/arodr412/mambaforge/envs/fisher/bin/apply_to_db.py", line 325, in main
    new_database(table)
  File "/home/arodr412/mambaforge/envs/fisher/bin/apply_to_db.py", line 250, in new_database
    orthologs, paralogs = parse_table(table)
  File "/home/arodr412/mambaforge/envs/fisher/bin/apply_to_db.py", line 200, in parse_table
    record = paralogs[name]
KeyError: 'PetaCant..p84149'

The forest_out_copy_Apr.07.2025 directory contains all the parsed.tsv files as well as the non-parsed ones.

The fisher_out_Mar.28.2025 folder is the original fisher.py output folder. I keep getting this issue.

Not sure what's up.

@robert-ervin-jones
Member

In this case, was the sequence in question originally a paralog, and you changed the designation to ortholog?

@Fefocrates
Author

Yes, I believe so. Would switching it back to its original state fix it? I mean, it still made the backup in the database.

@Fefocrates
Author

I kept going with the pipeline, and at the matrix constructor step I didn't get the indices.tsv file, the concatenated matrix, or the stats TSV files (error file attached below).

matrix_slur_error.txt

@robert-ervin-jones
Member

I'll take a look at it. Sorry I haven't gotten back to you yet about your original error. I've been trying to figure out why this might be happening.

@Fefocrates
Author

No worries, and thanks! I'm not sure whether this is the cause, but in the original error I changed PetaCant from a paralog to an ortholog, and PetaCant isn't part of my input_metadata. That said, the database (metadataset) I'm using was given to me by a collaborator. I'm not sure whether that has anything to do with it, but he did run it with his own data and it seemed to work. I'll keep tinkering on my end as well, and thanks for the help!

@robert-ervin-jones
Member

So for this matrix construction error, it looks like prequal failed. You'll need to check /scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.11.2025/logs/prequal/FAM.log for the exact error.
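Because the run uses `--keep-going`, more than one gene may have failed in prequal. A minimal sketch for scanning every log in that directory at once, assuming the logs are plain-text `*.log` files and that a failure leaves a line containing the word "Assertion" (both the helper name `scan_logs` and the marker list are illustrative assumptions, not part of the pipeline):

```python
from pathlib import Path


def scan_logs(log_dir, markers=("Assertion",), pattern="*.log"):
    """Map each log file name to the lines that contain any marker string."""
    hits = {}
    for log in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(log, errors="replace") as handle:
            matched = [
                line.rstrip("\n")
                for line in handle
                if any(marker in line for marker in markers)
            ]
        if matched:
            hits[log.name] = matched
    return hits


# Example call (path taken from the comment above):
# for name, lines in scan_logs(
#     "/scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.11.2025/logs/prequal"
# ).items():
#     print(name, lines[0])
```

Any file it reports is a candidate for a closer look; an empty result only means none of the marker strings appeared.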

@robert-ervin-jones
Member

For the original error, can you confirm that PetaCant..p84149 is a header for one of the paralog sequences in the database?
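One way to check this without opening files by hand is to search the FASTA headers directly. The sketch below is only an illustration: the function names are made up here, and it assumes the paralog sequences are stored as FASTA files with a `.fas` extension somewhere under a directory you pass in (the actual database layout may differ).

```python
from pathlib import Path


def fasta_headers(path):
    """Yield each header line (without the leading '>') from a FASTA file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                yield line[1:].strip()


def find_header(fasta_dir, header, pattern="*.fas"):
    """Return the FASTA files under fasta_dir that contain `header`.

    A header matches if its first whitespace-separated token equals `header`,
    so trailing descriptions after the sequence ID are ignored.
    """
    hits = []
    for fasta in sorted(Path(fasta_dir).rglob(pattern)):
        for found in fasta_headers(fasta):
            if (found.split() or [""])[0] == header:
                hits.append(fasta)
                break
    return hits


# Example call (the directory is a placeholder, not a real pipeline path):
# print(find_header("path/to/database/paralogs", "PetaCant..p84149"))
```

An empty list would mean the header is not present in any file that was searched, which would be consistent with the KeyError above.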

@Fefocrates
Author

Yes! I switched it to an ortholog since it still falls somewhat within the expected group (Discobids). The screenshot is the original one, with no changes.

Image

@Fefocrates
Author

This is all I get from the prequal log:

[arodr412@sol-login02:/scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.11.2025/logs/prequal]$ more FAM.log
prequal: ZorroInterface.cpp:135: double** RunHMM(std::vector*, std::string, bool): Assertion `rLsize > 0' failed.

Found only DNA sequences. Doing translations.
There are 1 sequences of max length 0
Prepping pairHMM ... done
Collecting subset of posterior probabilities based on closest 10 sequences determined by Kmers
This may take time for larger data sets:

Creating collection sets of PPs based on Kmer distances
/ 1 / 1

@robert-ervin-jones
Member

Sorry again for just now getting back to you. I need to update my email on here, so that I see responses sooner. I would just remove FAM.fas from the prep_final_dataset_out directory. It looks like all but one sequence is removed after trimming.
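To see whether other genes are in the same state before rerunning, something like the following could list input files that have too few usable sequences. This is a rough sketch under stated assumptions: the helper names are hypothetical, the inputs are assumed to be `.fas` FASTA files sitting directly in the directory, and a sequence counts as usable if it has at least one non-gap character. It only catches files that are already sparse on disk, not ones that become sparse during a later trimming step.

```python
from pathlib import Path


def read_fasta(path):
    """Return a dict mapping header -> sequence for one FASTA file."""
    records, header = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                header = line[1:]
                records[header] = ""
            elif header is not None:
                records[header] += line
    return records


def sparse_alignments(fasta_dir, min_seqs=2, pattern="*.fas"):
    """List (file name, usable sequence count) for files below min_seqs."""
    flagged = []
    for fasta in sorted(Path(fasta_dir).glob(pattern)):
        # Ignore records that are empty or consist only of gap characters.
        usable = [
            seq for seq in read_fasta(fasta).values() if seq.replace("-", "")
        ]
        if len(usable) < min_seqs:
            flagged.append((fasta.name, len(usable)))
    return flagged


# Example call (directory name taken from the thread):
# print(sparse_alignments(
#     "/scratch/arodr412/hokuscyto_phylo/prep_final_dataset_out_Apr.11.2025"
# ))
```

Files it reports are candidates for removal from the input directory, in the same way as FAM.fas.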

@Fefocrates
Author

OK, so I removed FAM.fas from the prep_final_dataset_out directory, and now I received the following error (the concatenation wasn't created):

[Thu Apr 17 19:09:13 2025]
Finished job 259.
1432 of 1435 steps (100%) done
Exiting because a job execution failed. Look above for error message
Complete log: /scratch/arodr412/hokuscyto_phylo/.snakemake/log/2025-04-17T130516.471765.snakemake.log
Traceback (most recent call last):
  File "/home/arodr412/mambaforge/envs/fisher/bin/matrix_constructor.py", line 145, in <module>
    run_snakemake()
  File "/home/arodr412/mambaforge/envs/fisher/bin/matrix_constructor.py", line 94, in run_snakemake
    subprocess.run(smk_cmd, shell=True, executable='/bin/bash', check=True)
  File "/home/arodr412/mambaforge/envs/fisher/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'snakemake -s /home/arodr412/mambaforge/envs/fisher/bin/matrix_constructor.smk --config out_dir=/scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.17.2025 in_dir=/scratch/arodr412/hokuscyto_phylo/prep_final_dataset_out_Apr.11.2025 in_format=fasta out_format=fasta concatenation_only=False trimal_gt=0.8 genes=RPS17,RPAC1,EIF4A3,NSF1-K,AGX,IFT88,RPS2,MCM-B,MCM-C,ATSAR2,PSMA-E,RRAGD,AP3M1,PSMA-B,RPL24A,CS,WBSCR22,RPL5,GDI2,EIF3I,IFT46,PURA,SRP54,VATB,ODO2A,DHSB3,AMP2B,SPTLC1,TOPO1,PACE2-A,UBE12,AP3S1,TRS,VATA,ATP6,RPL43,STXBP1,NFS1-MITO,GLCN,NMD3,CORO1C,UBA3,AR21,PPP2R5C,RHEB,IMB1,RICTOR,S15P,TMS,NOP5A,CCDC37,CC1,BAT1,WD,IPO4,RPL4B,RPTOR,CCT-Z,CDK5,SND1,RPF1,CCDC40,FTSJ1,ARP2,SCO1-MITO,DNAL1,CRFG,MAT,MCM-A,RPL21,CCT-G,RPL12,SCSB,GCST,ARP3,CCT-E,NDF1,RPS5,EIF3C,RPL20,RPPO,RPL7A,RPS15,PPX2,PSMA-F,RPL13E,PSMB-L,RPL44,RPL11,XPB,NSF1-L,NDUFV2-MITO,ARPC1,ATG2,PACE2C,GDI,RPS16,GNL2,BTUB,CCT-A,SUCA,NSF1-I,ARPC4,RPL14E,COPG2,SPTC2,DRG2,S15A,IMP4,PSMA-H,GLGB2,ODPB,RPS12,CCDC65,RPS8,RAN,HYOU1,MCM-D,RPL15,ORF2,RPS3,NSF1-H,RPL9,VATC,NSF1-J,AGB1,ATP6V0A1,CPN60,VPS26B,GMPP3,RPL33,WD66,NAA15,PSMA-J,VPS18,YKT6,DIMT1L,RPL32,RPL35,PSMB-K,RPS23,RPL31,H2A,PSMD,RPN1B,HMT1,WRS,RPL17,COP-BETA,PSMA-G,RPS20,PSMB-N,SF3B2,AP4M,DHSA1,MTLPD2,PYGB,ALG11,VAPA,RPS18,RAD51A,ODBA,RPS4,NSF1-G,RPO-B,CTP,OPLAH,COPS6,ODPA2,RPS26,RPL19,GSS,RPL3,APBLC,COPE,PSD11,HSP70MT,GRC5,RPS11,VATE,PGM2,NSA2,IFT57,ADK2,CCDC113,SEC23,AP4S1,CALR,PSMB-M,MTHFR,RPS10,MRA1,RPO-C,IPO5,VBP1,ALIS1,METTL1,CCT-B,DNAI2,PSMA-A,EFG-MITO,RPL30,EFTUD1,MCM-E,CCT-N,TM9SF1,PSD7,C3H4,SYGM1,CAPZ,IF2P,ATP6V0D1,PACE5,L10A,IF2B,GTUB,PPP2R3,CLAT,CCT-T,ODBB,RPO-A,RPL2,SAP40,SRA,PIK3C3,PSMA-C,PSMD6,KDELR2,PSMD12,IF6,NSF1-C,CCT-D,CRNL1,IF2G,RF1,RPS6,RPL13A,NSF1-M,GNB2L --cores 1 --rerun-incomplete --keep-going --use-conda --nolock /scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.17.2025/matrix.fas /scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.17.2025/indices.tsv 
/scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.17.2025/matrix_constructor_stats.tsv /scratch/arodr412/hokuscyto_phylo/matrix_constructor_out_Apr.17.2025/occupancy.tsv' returned non-zero exit status 1.
