Commit b28e07a

Initial upload of test data
1 parent b602b10 commit b28e07a

52 files changed (+5865, -0 lines changed)

test_data/README.txt

+24
@@ -0,0 +1,24 @@
In each workflow folder you can find the
reduced output files and directory structures
of the specific workflows.

Each folder also contains a text file
showing example command-line calls that
would create the given data.

The output is not entirely complete and
Spice does not have a test mode. If you
want to generate a small library just to
check that everything works, you can try
Tetraodon. The species does not have many
splicing variants annotated and the
genome is generally small in comparison to
other vertebrates on Ensembl.

If you have any questions, you can always
reach me via e-mail:



Cheers,
Christian
@@ -0,0 +1,11 @@
Initialize library:
python spice_library.py -s human -r 94 -o test_data/Spice_Library

Prepare FAS SLURM jobs
(start here if you come from the Spice Novel workflow):
python FASJobAssistant.py -L test_data/spice_lib_test_homo_sapiens_94_1ee/ -m 2 -p node_names_of_computing_cluster -d path_to_binary_directory_of_FAS -o output_directory \
-a node_names_of_computing_cluster_used_for_annotation -A number_of_cores_that_shall_be_used_for_annotation -t path_to_annoTools_directory

After running all FAS jobs, run this:
python parse_domain_out.py -f test_data/Spice_Library/spice_lib_test_homo_sapiens_94_1ee/fas_data/forward.domains -r test_data/Spice_Library/spice_lib_test_homo_sapiens_94_1ee/fas_data/forward.domains \
-m test_data/Spice_Library/spice_lib_test_homo_sapiens_94_1ee/fas_data/architectures -o test_data/Spice_Library/spice_lib_test_homo_sapiens_94_1ee/fas_data/
@@ -0,0 +1,9 @@
#linearized
Pfam
#normal
fLPS
COILS2
SEG
SignalP
TMHMM
#checked
@@ -0,0 +1,4 @@
geneID ncbiID orthoID FAS_F FAS_B
ENSG00000253293|ENSP00000283921|9606 ncbi9606 ENSG00000253293|ENSP00000479619|9606 0.3324 0.6492
ENSG00000253293|ENSP00000283921|9606 ncbi9606 ENSG00000253293|ENSP00000379633|9606 0.371 0.9274
ENSG00000253293|ENSP00000479619|9606 ncbi9606 ENSG00000253293|ENSP00000379633|9606 0.0 0.0
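A minimal sketch of how a table like the one above could be read in Python. It assumes the columns are tab-separated in the real file (the page rendering shows only spaces) and that IDs follow the `gene|protein|taxon` pattern seen here. The function name `read_phyloprofile` is hypothetical and not part of Spice.

```python
import csv
import io

# Rows copied from the table above. Tab separation is an assumption.
SAMPLE = (
    "geneID\tncbiID\torthoID\tFAS_F\tFAS_B\n"
    "ENSG00000253293|ENSP00000283921|9606\tncbi9606\t"
    "ENSG00000253293|ENSP00000479619|9606\t0.3324\t0.6492\n"
    "ENSG00000253293|ENSP00000283921|9606\tncbi9606\t"
    "ENSG00000253293|ENSP00000379633|9606\t0.371\t0.9274\n"
    "ENSG00000253293|ENSP00000479619|9606\tncbi9606\t"
    "ENSG00000253293|ENSP00000379633|9606\t0.0\t0.0\n"
)


def read_phyloprofile(text):
    """Hypothetical helper: one dict per row, IDs split out, scores as floats."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        gene, seed, taxon = rec["geneID"].split("|")
        _, ortho, _ = rec["orthoID"].split("|")
        rows.append({
            "gene": gene,
            "seed": seed,
            "ortho": ortho,
            "taxon": int(taxon),
            "fas_f": float(rec["FAS_F"]),
            "fas_b": float(rec["FAS_B"]),
        })
    return rows


rows = read_phyloprofile(SAMPLE)
for row in rows:
    print(row["seed"], row["ortho"], row["fas_f"], row["fas_b"])
```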
@@ -0,0 +1,3 @@
{"ENSG00000126368": "000000160.json",
"ENSG00000183615": "000000004.json",
"ENSG00000253293": "000000071.json"}
@@ -0,0 +1,17 @@
{"ENSG00000183615": {
    "ENSP00000362684": {
        "ENSP00000362684": 1.0,
        "HOMnoORFGroup": 0.0,
        "HOM4b9cf06933c8e78": 0.9986
    },
    "HOMnoORFGroup": {
        "ENSP00000362684": 0.0,
        "HOMnoORFGroup": 1.0,
        "HOM4b9cf06933c8e78": 0.0
    },
    "HOM4b9cf06933c8e78": {
        "ENSP00000362684": 0.1368,
        "HOMnoORFGroup": 0.0,
        "HOM4b9cf06933c8e78": 1.0
    }
}}
@@ -0,0 +1,37 @@
{"ENSG00000253293": {
    "ENSP00000283921": {
        "ENSP00000283921": 1.0,
        "ENSP00000479619": 0.6492,
        "ENSP00000379633": 0.9274,
        "HOMnoDiamondGroup": 0.0,
        "HOMnoORFGroup": 0.0
    },
    "ENSP00000479619": {
        "ENSP00000283921": 0.3324,
        "ENSP00000479619": 1.0,
        "ENSP00000379633": 0.0,
        "HOMnoDiamondGroup": 0.0,
        "HOMnoORFGroup": 0.0
    },
    "ENSP00000379633": {
        "ENSP00000283921": 0.371,
        "ENSP00000479619": 0.0,
        "ENSP00000379633": 1.0,
        "HOMnoDiamondGroup": 0.0,
        "HOMnoORFGroup": 0.0
    },
    "HOMnoDiamondGroup": {
        "ENSP00000283921": 0.0,
        "ENSP00000479619": 0.0,
        "ENSP00000379633": 0.0,
        "HOMnoDiamondGroup": 1.0,
        "HOMnoORFGroup": 0.0
    },
    "HOMnoORFGroup": {
        "ENSP00000283921": 0.0,
        "ENSP00000479619": 0.0,
        "ENSP00000379633": 0.0,
        "HOMnoDiamondGroup": 0.0,
        "HOMnoORFGroup": 1.0
    }
}}
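Comparing this matrix with the ENSG00000253293 rows of the FAS score table earlier in the commit suggests a mapping: `matrix[seed][ortho]` holds FAS_B and `matrix[ortho][seed]` holds FAS_F, with 1.0 on the diagonal. The sketch below rebuilds the protein part of the matrix under that assumption. The mapping is inferred only from these sample values, not from the Spice source, and `scores_to_matrix` is a hypothetical name. The placeholder groups (`HOMnoDiamondGroup`, `HOMnoORFGroup`) are not handled.

```python
# (seed, ortho, FAS_F, FAS_B) tuples copied from the FAS score table.
ROWS = [
    ("ENSP00000283921", "ENSP00000479619", 0.3324, 0.6492),
    ("ENSP00000283921", "ENSP00000379633", 0.371, 0.9274),
    ("ENSP00000479619", "ENSP00000379633", 0.0, 0.0),
]


def scores_to_matrix(rows):
    """Hypothetical helper: nested dict of pairwise scores.

    Assumes matrix[seed][ortho] = FAS_B and matrix[ortho][seed] = FAS_F,
    which matches the sample data but is not confirmed elsewhere.
    """
    matrix = {}
    for seed, ortho, fas_f, fas_b in rows:
        for protein in (seed, ortho):
            # Every protein scores 1.0 against itself.
            matrix.setdefault(protein, {})[protein] = 1.0
        matrix[seed][ortho] = fas_b
        matrix[ortho][seed] = fas_f
    return matrix


matrix = scores_to_matrix(ROWS)
print(matrix["ENSP00000283921"]["ENSP00000479619"])  # 0.6492
print(matrix["ENSP00000479619"]["ENSP00000283921"])  # 0.3324
```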
@@ -0,0 +1,26 @@
{"ENSG00000126368": {
    "ENSP00000246672": {
        "ENSP00000246672": 1.0,
        "HOMnoORFGroup": 0.0,
        "HOMGRP81c552f7c12b": 0.0,
        "HOMnoDiamondGroup": 0.0
    },
    "HOMnoORFGroup": {
        "ENSP00000246672": 0.0,
        "HOMnoORFGroup": 1.0,
        "HOMGRP81c552f7c12b": 0.0,
        "HOMnoDiamondGroup": 0.0
    },
    "HOMGRP81c552f7c12b": {
        "ENSP00000246672": 0.0,
        "HOMnoORFGroup": 0.0,
        "HOMGRP81c552f7c12b": 1.0,
        "HOMnoDiamondGroup": 0.0
    },
    "HOMnoDiamondGroup": {
        "ENSP00000246672": 0.0,
        "HOMnoORFGroup": 0.0,
        "HOMGRP81c552f7c12b": 0.0,
        "HOMnoDiamondGroup": 1.0
    }
}}
@@ -0,0 +1,32 @@
commandline_args:
  copylib: null
  force: false
  keepgtf: false
  modefas: null
  outdir: test_data/Spice_Library
  release: '94'
  species: human
info:
  collected_sequences_count: 5
  fas_mode: 1ee
  fas_scored_sequences_count: 5
  gene_count: 3
  protein_count: 5
  release: '94'
  species: test_homo_sapiens
  taxon_id: 9606
  transcript_count: 5
init_date: '2023-06-15'
last_edit: '2023-06-15'
spice_version: '0.1'
status:
  01_id_collection: true
  02_sequence_collection: true
  03_small_protein_removing: true
  04_incorrect_entry_removing: true
  05_implicit_fas_scoring: true
  06_fasta_generation: true
  07_pairing_generation: true
  08_id_tsv_generation: true
  09_sequence_annotation: false
  10_fas_scoring: false
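The `status` block reads like a checklist of numbered pipeline steps. As a small sketch, the steps that are still open could be listed as follows. The dictionary is typed in by hand from the file above to keep the example free of dependencies. In practice it might be loaded with a YAML parser such as PyYAML's `yaml.safe_load`, which is an assumption about tooling rather than something this commit shows. `pending_steps` is a hypothetical helper.

```python
# Status flags copied from the info file above.
STATUS = {
    "01_id_collection": True,
    "02_sequence_collection": True,
    "03_small_protein_removing": True,
    "04_incorrect_entry_removing": True,
    "05_implicit_fas_scoring": True,
    "06_fasta_generation": True,
    "07_pairing_generation": True,
    "08_id_tsv_generation": True,
    "09_sequence_annotation": False,
    "10_fas_scoring": False,
}


def pending_steps(status):
    """Hypothetical helper: unfinished steps, ordered by numeric prefix."""
    return [step for step in sorted(status) if not status[step]]


pending = pending_steps(STATUS)
print(pending)  # ['09_sequence_annotation', '10_fas_scoring']
```

For this test library, the two open steps match the "Prepare FAS SLURM jobs" and "After running all FAS jobs" stages of the example command file.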
@@ -0,0 +1,18 @@
{
    "root": "/share/gluster/spice/spice_libraries/spice_lib_homo_sapiens_94_1ee",
    "info": "info.yaml",
    "fas_data": "fas_data",
    "fas_scores": "fas_data/fas_scores",
    "fas_temp": "fas_data/tmp",
    "fas_annotation": "fas_data/annotation",
    "fas_annoTools": "fas_data/annoTools.txt",
    "fas_archtiectures": "fas_data/architectures",
    "transcript_data": "transcript_data",
    "transcript_info": "transcript_data/transcript_info.json",
    "transcript_seq": "transcript_data/sequences.json",
    "transcript_fasta": "transcript_data/transcript_set.fasta",
    "transcript_pairings": "transcript_data/transcript_pairings.json",
    "transcript_ids": "transcript_data/phyloprofile_ids.tsv",
    "self": "paths.json",
    "fas_index": "fas_data/fas_index.json"
}
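Every entry except `root` appears to be a path relative to the library directory. Note that `root` in this test file still points at a cluster location rather than at `test_data`. A minimal sketch of resolving the entries, with an optional override for `root`, could look like this. `resolve_paths` is a hypothetical helper, and only a few keys are copied here for brevity.

```python
import json
from pathlib import PurePosixPath

# A subset of the entries shown above.
PATHS_JSON = """
{
    "root": "/share/gluster/spice/spice_libraries/spice_lib_homo_sapiens_94_1ee",
    "info": "info.yaml",
    "fas_scores": "fas_data/fas_scores",
    "transcript_fasta": "transcript_data/transcript_set.fasta",
    "self": "paths.json"
}
"""


def resolve_paths(paths, root=None):
    """Hypothetical helper: join each relative entry onto the library root.

    Pass `root` to relocate the library, e.g. for a copied test data set.
    """
    base = PurePosixPath(root if root is not None else paths["root"])
    return {key: base / rel for key, rel in paths.items() if key != "root"}


paths = json.loads(PATHS_JSON)
resolved = resolve_paths(paths)
relocated = resolve_paths(
    paths, root="test_data/Spice_Library/spice_lib_test_homo_sapiens_94_1ee"
)
print(resolved["fas_scores"])
print(relocated["transcript_fasta"])
```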
@@ -0,0 +1,3 @@
ENSG00000126368
ENSG00000183615
ENSG00000253293
@@ -0,0 +1,3 @@
ENSG00000253293|ENSP00000283921|9606 ncbi9606
ENSG00000253293|ENSP00000479619|9606 ncbi9606
ENSG00000253293|ENSP00000379633|9606 ncbi9606
@@ -0,0 +1,11 @@
{"ENSG00000126368": {
"ENSP00000246672": "MTTLDSNNNTGGVITYIGSSGSSPSRTSPESLYSDNSNGSFQSLTQGCPTYFPPSPTGSLTQDPARSFGSIPPSLSDDGSPSSSSSSSSSSSSFYNGSPPGSLQVAMEDSSRVSPSKSTSNITKLNGMVLLCKVCGDVASGFHYGVHACEGCKGFFRRSIQQNIQYKRCLKNENCSIVRINRNRCQQCRFKKCLSVGMSRDAVRFGRIPKREKQRMLAEMQSAMNLANNQLSSQCPLETSPTQHPTPGPMGPSPPPAPVPSPLVGFSQFPQQLTPPRSPSPEPTVEDVISQVARAHREIFTYAHDKLGSSPGNFNANHASGSPPATTPHRWENQGCPPAPNDNNTLAAQRHNEALNGLRQAPSSYPPTWPPGPAHHSCHQSNSNGHRLCPTHVYAAPEGKAPANSPRQGNSKNVLLACPMNMYPHGRSGRTVQEIWEDFSMSFTPAVREVVEFAKHIPGFRDLSQHDQVTLLKAGTFEVLMVRFASLFNVKDQTVMFLSRTTYSLQELGAMGMGDLLSAMFDFSEKLNSLALTEEELGLFTAVVLVSADRSGMENSASVEQLQETLLRALRALVLKNRPLETSRFTKLLLKLPDLRTLNNMHSEKLLSFRVDAQ"
},
"ENSG00000183615": {
"ENSP00000362684": "MSLGLLKFQAVGEEDEEDEEGESLDSVKALTAKLQLQTRRPSYLEWTAQVQSQAWRRAQAKPGPGGPGDICGFDSMDSALEWLRRELREMQAQDRQLAGQLLRLRAQLHRLKMDQACHLHQELLDEAELELELEPGAGLALAPLLRHLGLTRMNISARRFTLC"
},
"ENSG00000253293": {
"ENSP00000283921": "MSARKGYLLPSPNYPTTMSCSESPAANSFLVDSLISSGRGEAGGGGGGAGGGGGGGYYAHGGVYLPPAADLPYGLQSCGLFPTLGGKRNEAASPGSGGGGGGLGPGAHGYGPSPIDLWLDAPRSCRMEPPDGPPPPPQQQPPPPPQPPQPAPQATSCSFAQNIKEESSYCLYDSADKCPKVSATAAELAPFPRGPPPDGCALGTSSGVPVPGYFRLSQAYGTAKGYGSGGGGAQQLGAGPFPAQPPGRGFDLPPALASGSADAARKERALDSPPPPTLACGSGGGSQGDEEAHASSSAAEELSPAPSESSKASPEKDSLGNSKGENAANWLTAKSGRKKRCPYTKHQTLELEKEFLFNMYLTRERRLEISRSVHLTDRQVKIWFQNRRMKLKKMNRENRIRELTANFNFS",
"ENSP00000479619": "MSCSESPAANSFLVDSLISSGRGEAGGGGGGAGGGGGGGYYAHGGVYLPPAADLPYGLQSCGLFPTLGGKRNEAASPGSGGGGGGLGPGAHGYGPSPIDLWLDAPRSCRRRAATRGWPVPRAAPGARFRSPARASLRLGRCGPEGASPRFAAAPHAGLRQRRGLAGRRGGARVVLGRGGALPGPFREQQSLAGEGFPGQFQR",
"ENSP00000379633": "MCQGNSKGENAANWLTAKSGRKKRCPYTKHQTLELEKEFLFNMYLTRERRLEISRSVHLTDRQVKIWFQNRRMKLKKMNRENRIRELTANFNFS"
}}
@@ -0,0 +1,106 @@
{"ENSG00000126368": {
    "_id": "ENSG00000126368",
    "name": "NR1D1",
    "feature": "gene",
    "taxon_id": 9606,
    "chromosome": "17",
    "species": "homo_sapiens",
    "biotype": "protein_coding",
    "transcripts": {
        "ENSP00000246672": {
            "_id": "ENSP00000246672",
            "feature": "protein",
            "gene_id": "ENSG00000126368",
            "transcript_name": "NR1D1-201",
            "transcript_id": "ENST00000246672",
            "taxon_id": 9606,
            "biotype": "protein_coding",
            "tags": [
                "CCDS",
                "basic",
                "complete"
            ],
            "tsl": 1
        }
    }
},
"ENSG00000183615": {
    "_id": "ENSG00000183615",
    "name": "FAM167B",
    "feature": "gene",
    "taxon_id": 9606,
    "chromosome": "1",
    "species": "homo_sapiens",
    "biotype": "protein_coding",
    "transcripts": {
        "ENSP00000362684": {
            "_id": "ENSP00000362684",
            "feature": "protein",
            "gene_id": "ENSG00000183615",
            "transcript_name": "FAM167B-201",
            "transcript_id": "ENST00000373582",
            "taxon_id": 9606,
            "biotype": "protein_coding",
            "tags": [
                "CCDS",
                "basic",
                "complete"
            ],
            "tsl": 1
        }
    }
},
"ENSG00000253293": {
    "_id": "ENSG00000253293",
    "name": "HOXA10",
    "feature": "gene",
    "taxon_id": 9606,
    "chromosome": "7",
    "species": "homo_sapiens",
    "biotype": "protein_coding",
    "transcripts": {
        "ENSP00000283921": {
            "_id": "ENSP00000283921",
            "feature": "protein",
            "gene_id": "ENSG00000253293",
            "transcript_name": "HOXA10-201",
            "transcript_id": "ENST00000283921",
            "taxon_id": 9606,
            "biotype": "protein_coding",
            "tags": [
                "CCDS",
                "basic",
                "complete"
            ],
            "tsl": 1
        },
        "ENSP00000479619": {
            "_id": "ENSP00000479619",
            "feature": "protein",
            "gene_id": "ENSG00000253293",
            "transcript_name": "HOXA10-206",
            "transcript_id": "ENST00000613671",
            "taxon_id": 9606,
            "biotype": "protein_coding",
            "tags": [
                "basic",
                "complete"
            ],
            "tsl": 5
        },
        "ENSP00000379633": {
            "_id": "ENSP00000379633",
            "feature": "protein",
            "gene_id": "ENSG00000253293",
            "transcript_name": "HOXA10-202",
            "transcript_id": "ENST00000396344",
            "taxon_id": 9606,
            "biotype": "protein_coding",
            "tags": [
                "basic",
                "complete"
            ],
            "tsl": 1
        }
    }
}}
@@ -0,0 +1 @@
{"ENSG00000253293": "ENSG00000253293|ENSP00000283921|9606\tENSG00000253293|ENSP00000479619|9606\nENSG00000253293|ENSP00000283921|9606\tENSG00000253293|ENSP00000379633|9606\nENSG00000253293|ENSP00000479619|9606\tENSG00000253293|ENSP00000379633|9606"}
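The pairings file stores, per gene, one string in which pairs are separated by newlines and the two members of a pair by a tab. A short sketch of unpacking that string into protein ID pairs follows. `protein_pairs` is a hypothetical helper, and the `gene|protein|taxon` ID layout is assumed from the sample.

```python
# Value copied from the pairings file above. After JSON decoding, the
# \t and \n escapes are real tab and newline characters, as they are here.
PAIRINGS = {
    "ENSG00000253293": (
        "ENSG00000253293|ENSP00000283921|9606\tENSG00000253293|ENSP00000479619|9606\n"
        "ENSG00000253293|ENSP00000283921|9606\tENSG00000253293|ENSP00000379633|9606\n"
        "ENSG00000253293|ENSP00000479619|9606\tENSG00000253293|ENSP00000379633|9606"
    )
}


def protein_pairs(block):
    """Hypothetical helper: list of (protein_a, protein_b) tuples for one gene."""
    pairs = []
    for line in block.split("\n"):
        if not line:
            continue  # tolerate a trailing newline
        left, right = line.split("\t")
        pairs.append((left.split("|")[1], right.split("|")[1]))
    return pairs


pairs = protein_pairs(PAIRINGS["ENSG00000253293"])
for a, b in pairs:
    print(a, b)
```

Three proteins yield three unordered pairs, which matches the three score rows for this gene in the FAS score table.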
@@ -0,0 +1,10 @@
>ENSG00000126368|ENSP00000246672|9606
MTTLDSNNNTGGVITYIGSSGSSPSRTSPESLYSDNSNGSFQSLTQGCPTYFPPSPTGSLTQDPARSFGSIPPSLSDDGSPSSSSSSSSSSSSFYNGSPPGSLQVAMEDSSRVSPSKSTSNITKLNGMVLLCKVCGDVASGFHYGVHACEGCKGFFRRSIQQNIQYKRCLKNENCSIVRINRNRCQQCRFKKCLSVGMSRDAVRFGRIPKREKQRMLAEMQSAMNLANNQLSSQCPLETSPTQHPTPGPMGPSPPPAPVPSPLVGFSQFPQQLTPPRSPSPEPTVEDVISQVARAHREIFTYAHDKLGSSPGNFNANHASGSPPATTPHRWENQGCPPAPNDNNTLAAQRHNEALNGLRQAPSSYPPTWPPGPAHHSCHQSNSNGHRLCPTHVYAAPEGKAPANSPRQGNSKNVLLACPMNMYPHGRSGRTVQEIWEDFSMSFTPAVREVVEFAKHIPGFRDLSQHDQVTLLKAGTFEVLMVRFASLFNVKDQTVMFLSRTTYSLQELGAMGMGDLLSAMFDFSEKLNSLALTEEELGLFTAVVLVSADRSGMENSASVEQLQETLLRALRALVLKNRPLETSRFTKLLLKLPDLRTLNNMHSEKLLSFRVDAQ
>ENSG00000183615|ENSP00000362684|9606
MSLGLLKFQAVGEEDEEDEEGESLDSVKALTAKLQLQTRRPSYLEWTAQVQSQAWRRAQAKPGPGGPGDICGFDSMDSALEWLRRELREMQAQDRQLAGQLLRLRAQLHRLKMDQACHLHQELLDEAELELELEPGAGLALAPLLRHLGLTRMNISARRFTLC
>ENSG00000253293|ENSP00000283921|9606
MSARKGYLLPSPNYPTTMSCSESPAANSFLVDSLISSGRGEAGGGGGGAGGGGGGGYYAHGGVYLPPAADLPYGLQSCGLFPTLGGKRNEAASPGSGGGGGGLGPGAHGYGPSPIDLWLDAPRSCRMEPPDGPPPPPQQQPPPPPQPPQPAPQATSCSFAQNIKEESSYCLYDSADKCPKVSATAAELAPFPRGPPPDGCALGTSSGVPVPGYFRLSQAYGTAKGYGSGGGGAQQLGAGPFPAQPPGRGFDLPPALASGSADAARKERALDSPPPPTLACGSGGGSQGDEEAHASSSAAEELSPAPSESSKASPEKDSLGNSKGENAANWLTAKSGRKKRCPYTKHQTLELEKEFLFNMYLTRERRLEISRSVHLTDRQVKIWFQNRRMKLKKMNRENRIRELTANFNFS
>ENSG00000253293|ENSP00000479619|9606
MSCSESPAANSFLVDSLISSGRGEAGGGGGGAGGGGGGGYYAHGGVYLPPAADLPYGLQSCGLFPTLGGKRNEAASPGSGGGGGGLGPGAHGYGPSPIDLWLDAPRSCRRRAATRGWPVPRAAPGARFRSPARASLRLGRCGPEGASPRFAAAPHAGLRQRRGLAGRRGGARVVLGRGGALPGPFREQQSLAGEGFPGQFQR
>ENSG00000253293|ENSP00000379633|9606
MCQGNSKGENAANWLTAKSGRKKRCPYTKHQTLELEKEFLFNMYLTRERRLEISRSVHLTDRQVKIWFQNRRMKLKKMNRENRIRELTANFNFS
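The FASTA headers use the same `gene|protein|taxon` layout as the other files. Below is a minimal sketch of a reader that keys each sequence by those three fields. `parse_fasta` is a hypothetical helper, and the sequences in the sample are truncated prefixes of the real entries, used only to keep the example short.

```python
# Headers copied from the FASTA file above; sequences truncated for brevity.
SAMPLE = """\
>ENSG00000183615|ENSP00000362684|9606
MSLGLLKFQAVGEEDEEDEEGE
>ENSG00000253293|ENSP00000379633|9606
MCQGNSKGENAANWLTAKSG
"""


def parse_fasta(text):
    """Hypothetical helper: map (gene, protein, taxon) to its sequence."""
    records = {}
    key = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            gene, protein, taxon = line[1:].split("|")
            key = (gene, protein, int(taxon))
            records[key] = ""
        elif key is not None:
            # Sequences may span several lines; concatenate them.
            records[key] += line
    return records


records = parse_fasta(SAMPLE)
for (gene, protein, taxon), seq in records.items():
    print(gene, protein, taxon, len(seq))
```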
@@ -0,0 +1,9 @@
#linearized
Pfam
#normal
fLPS
COILS2
SEG
SignalP
TMHMM
#checked
@@ -0,0 +1,6 @@
geneID ncbiID orthoID FAS_F FAS_B
ENSG00000126368|ENSP00000246672|9606 ncbi9606 ENSG00000126368|HOMGRP81c552f7c12b|9606 0.0 0.0
ENSG00000183615|ENSP00000362684|9606 ncbi9606 ENSG00000183615|HOM4b9cf06933c8e78|9606 0.1368 0.9986
ENSG00000253293|ENSP00000283921|9606 ncbi9606 ENSG00000253293|ENSP00000479619|9606 0.3324 0.6492
ENSG00000253293|ENSP00000283921|9606 ncbi9606 ENSG00000253293|ENSP00000379633|9606 0.371 0.9274
ENSG00000253293|ENSP00000479619|9606 ncbi9606 ENSG00000253293|ENSP00000379633|9606 0.0 0.0
@@ -0,0 +1,3 @@
{"ENSG00000126368": "000000160.json",
"ENSG00000183615": "000000004.json",
"ENSG00000253293": "000000071.json"}
@@ -0,0 +1,17 @@
{"ENSG00000183615": {
    "ENSP00000362684": {
        "ENSP00000362684": 1.0,
        "HOMnoORFGroup": 0.0,
        "HOM4b9cf06933c8e78": 0.9986
    },
    "HOMnoORFGroup": {
        "ENSP00000362684": 0.0,
        "HOMnoORFGroup": 1.0,
        "HOM4b9cf06933c8e78": 0.0
    },
    "HOM4b9cf06933c8e78": {
        "ENSP00000362684": 0.1368,
        "HOMnoORFGroup": 0.0,
        "HOM4b9cf06933c8e78": 1.0
    }
}}