conkit-plot peval fails to match sequences #96

Closed
sadiogo opened this issue Jan 18, 2022 · 50 comments · Fixed by #99
Comments

@sadiogo

sadiogo commented Jan 18, 2022

General Information

  • ConKit version: 0.12.0
  • Python version: 3.6.9
  • Environment (if applicable): WSL

Example

A minimal example to reproduce the error:

../.local/bin/conkit-plot cmap conkit.jones jones conkit.rr casprr
or
../.local/bin/conkit-plot cmap conkit.jones jones conkit.mat ccmpred

The .jones, .rr and .mat files were generated by the conkit-predict script, which worked fine. I used them to rule out my original files as the problem. My original goal was to use a reference structure to determine which contacts were true positives, but I was getting a traceback, so I tried plotting without the reference structure and got the same traceback.

Traceback

The Python traceback

File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/command_line/conkit_plot.py", line 441, in <module>
    main()
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/command_line/conkit_plot.py", line 283, in main
    con.set_sequence_register()
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/core/contactmap.py", line 503, in set_sequence_register
    c.res1 = self.sequence.seq[res1_index - 1]
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/core/contact.py", line 268, in res1
    self._res1 = Contact._set_residue(amino_acid)
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/core/contact.py", line 483, in _set_residue
    raise ValueError("Unknown amino acid: {} (assert all is uppercase!)".format(amino_acid))
ValueError: Unknown amino acid: - (assert all is uppercase!)
@FilomenoSanchez
Member

FilomenoSanchez commented Jan 19, 2022

Hi @sadiogo, I think the problem here might be that you are providing an MSA file (conkit.jones) instead of a FASTA file with the sequence of the protein of interest. Have you tried:

../.local/bin/conkit-plot cmap conkit.fasta fasta conkit.rr casprr

Where conkit.fasta is the FASTA file that you used as input for conkit-predict. Please let me know whether this solves the issue or the error persists.

@sadiogo
Author

sadiogo commented Jan 20, 2022

Hi @FilomenoSanchez, I did try that. I get the following traceback when I use a single FASTA sequence (ungapped):

  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/command_line/conkit_plot.py", line 441, in <module>
    main()
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/command_line/conkit_plot.py", line 283, in main
    con.set_sequence_register()
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/core/contactmap.py", line 503, in set_sequence_register
    c.res1 = self.sequence.seq[res1_index - 1]
IndexError: string index out of range

When I use a single fasta sequence, but gapped, I get the same Unknown amino acid: - error.

By the way, I provided my own alignment when I used conkit-predict to generate the .mat and .rr files. Does this affect anything?

@FilomenoSanchez
Member

The traceback you have shown in your last comment occurs when the residue numbering in the contact map and the sequence do not match, particularly when the sequence in the FASTA file is shorter than it should be. For example, the sequence in the FASTA file has 100 residues but the residue numbers in the contact map go all the way up to 200. I think this might have occurred because you provided a gapped alignment as input for conkit-predict, and the numbering in the resulting map was shifted because of these gaps. This might be something I'll have to fix later, but for the time being I think you should be able to fix this by doing the following:

  1. Take the single sequence FASTA file (but gapped) and replace gap characters "-" with "X", which stands for unknown residue.
  2. Save the file conkit_modified.fasta and repeat the command:
    ../.local/bin/conkit-plot cmap conkit_modified.fasta fasta conkit.rr casprr
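
The two steps above can be scripted. This is a minimal sketch in plain Python that does not use ConKit itself; the function names `mask_gaps` and `covers_contacts` and the example data are only illustrative assumptions. The second helper mirrors the length mismatch behind the IndexError: every residue number in the contact map has to fall within the sequence length.

```python
def mask_gaps(fasta_text, gap="-", unknown="X"):
    """Replace gap characters with 'X' in sequence lines; header lines are kept verbatim."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # do not touch '-' characters inside a header
        else:
            out.append(line.replace(gap, unknown))
    return "\n".join(out) + "\n"


def covers_contacts(sequence, contact_pairs):
    """Return True if every residue number in the pairs lies within 1..len(sequence)."""
    return all(1 <= res <= len(sequence) for pair in contact_pairs for res in pair)


# Hypothetical gapped single-sequence FASTA, used only for illustration
masked = mask_gaps(">sequence\nMA--VFY\n")
print(masked)
print(covers_contacts("MAXXVFY", [(1, 7), (2, 5)]))
```

The masked text would then be saved as conkit_modified.fasta before re-running the conkit-plot command.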

@sadiogo
Author

sadiogo commented Jan 20, 2022

Thanks, that worked!

In retrospect, I had a gapped fasta alignment which I converted to a3m using conkit-convert, and that is what I used as input for conkit-predict. I opened the a3m file now and saw that the first sequence has gaps in it, which is not the way a3m files usually work, right? Perhaps there is a bug when converting fasta alignments to a3m?

@FilomenoSanchez
Member

FilomenoSanchez commented Jan 20, 2022

I am not sure whether the A3M format specification explicitly prohibits gaps in the first sequence of the alignment. In any case, it is not trivial to remove gaps from the template sequence when converting from FASTA to A3M, as this would require recalculating the whole alignment with all the sequences included in the MSA (AFAIK), which is well beyond conkit's original purpose. I think the only realistic fix here would be to make conkit interpret gaps in the FASTA sequence just as if they were unknown residues.

@sadiogo
Author

sadiogo commented Jan 20, 2022

My knowledge of a3m files is entirely based on my experience using the MPI Bioinformatics Toolkit. From what I've seen, the first sequence is always gapless (which is convenient, since it is generally the query sequence). In any case, they have a service there called FormatSeq which can convert fasta alignments to a3m. I'm not sure the source code for that is available in the hhsuite, but you can refer people to the MPI Bioinformatics Toolkit if need be.

@FilomenoSanchez
Member

Yes, indeed it seems that the A3M format includes a gapless query sequence (my knowledge about this format is also limited). The hhsuite provides a script, reformat.pl, capable of dealing with this. It shouldn't be too difficult to port it to Python and add it to conkit; I will open another issue for this. Since your original problem seems to be fixed now, I will close this issue, but feel free to re-open it otherwise.
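
To illustrate the idea (not reformat.pl's actual implementation, and not an existing conkit function), here is a minimal sketch that takes equal-length aligned sequences and uses the first one as the query. It follows the usual A3M convention: columns where the query has a residue are kept as upper case or "-", while in columns where the query has a gap, residues of other sequences become lower-case insertions and gaps are dropped. The name `fasta_alignment_to_a3m` is hypothetical.

```python
def fasta_alignment_to_a3m(aligned):
    """Convert equal-length aligned sequences to A3M-style rows, first sequence as query."""
    query = aligned[0]
    if any(len(seq) != len(query) for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    rows = []
    for seq in aligned:
        row = []
        for q, c in zip(query, seq):
            if q != "-":
                # match column: keep the residue (upper case) or the gap
                row.append(c.upper() if c != "-" else "-")
            elif c != "-":
                # insert column: residue becomes lower case, gaps are omitted
                row.append(c.lower())
        rows.append("".join(row))
    return rows


# Made-up two-sequence alignment with a gap in the query
print(fasta_alignment_to_a3m(["MA-VF", "MKTV-"]))
```

With this convention the query row comes out gapless, which matches what the MPI Bioinformatics Toolkit output looks like according to the comment above.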

@sadiogo
Author

sadiogo commented Jan 20, 2022

One last question, @FilomenoSanchez. When plotting the reference structure to assess false positive contacts, the structure being added has to satisfy one or both of these conditions (?):

  1. The number of residues in the PDB must be equal to the query sequence length?
  2. The PDB file must be a structure model of the query sequence, i.e., they must have identical sequences?

I think these questions also apply to ConPlot? (very nice server, btw)

@FilomenoSanchez
Member

FilomenoSanchez commented Jan 20, 2022

@sadiogo

In the case of ConPlot:

  1. The number of residues does not need to be identical. What is important is that the sequence has at least as many residues as the PDB file.
  2. The residue numbers in the PDB structure have to match the order of residues in the provided sequence. This way we account for gaps in the protein model. For example, for the following FASTA sequence:
 > sequence
MAVFY

You would need a PDB with the following residue numbers:

 1   2   3   4   5
MET ALA VAL PHE TYR

However, a PDB with missing residues 2-3 would still be valid, as long as the residue numbering is consistent:

 1   4   5
MET PHE TYR

Also please note that ConPlot cannot interpret gaps in the FASTA sequence. Conkit is a bit more clever when dealing with gaps in PDB files, but I would generally recommend following the logic explained here when working with either of these two tools.
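
As a rough pre-check of the numbering rule described above, the sketch below lists PDB residue numbers that cannot correspond to any position in the sequence. It is only a necessary condition, not ConPlot's or conkit's actual validation, and the function name `out_of_range_residues` is an assumption made for this example.

```python
def out_of_range_residues(sequence, pdb_residue_numbers):
    """Return sorted residue numbers that fall outside 1..len(sequence)."""
    return sorted(
        number
        for number in set(pdb_residue_numbers)
        if not 1 <= number <= len(sequence)
    )


# The MAVFY example: a model missing residues 2-3 is still consistent
print(out_of_range_residues("MAVFY", [1, 4, 5]))
# A residue numbered 6 has no counterpart in a 5-residue sequence
print(out_of_range_residues("MAVFY", [1, 2, 6]))
```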

@sadiogo
Author

sadiogo commented Jan 20, 2022

Okay, so from what you told me, this should work(?):

>sequence
MAVFYDD

PDB:

 1   4   5 
ILE TYR TRP

Which is equivalent to saying:

SEQ: MAVFYDD
PDB: I--YW--
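
The SEQ/PDB picture above can be generated from the residue numbers alone. This is a small sketch under the assumption that PDB residue numbers are 1-based positions in the query sequence; `pdb_row` is a hypothetical helper, not part of conkit or ConPlot.

```python
def pdb_row(sequence_length, pdb_residues):
    """Render PDB residues against sequence positions, with '-' for unmodelled positions.

    pdb_residues maps a 1-based residue number to a one-letter residue code.
    """
    return "".join(pdb_residues.get(i, "-") for i in range(1, sequence_length + 1))


sequence = "MAVFYDD"
print("SEQ:", sequence)
print("PDB:", pdb_row(len(sequence), {1: "I", 4: "Y", 5: "W"}))
```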

@FilomenoSanchez
Member

FilomenoSanchez commented Jan 20, 2022

Yes, this should work (if you have noticed it doesn't work in ConPlot, please open an issue at rigdenlab/conplot). Is the residue sequence mismatch intentional (residue 1 is MET in the FASTA and ILE in PDB)? The residue names are parsed from the FASTA sequence and are used in the tooltip display that appears when you hover over contacts and the diagonal of the plot (in ConPlot), so that information won't match the PDB (however this input will still produce a contact map).

@sadiogo
Author

sadiogo commented Jan 20, 2022

Yes, it's intentional. I was simulating the situation wherein I have predicted a contact map from an alignment of homologous sequences and want to compare it with the contact map of a distantly homologous structure (e.g., <30% identity). In this case, I just need to trim any insertions that the template structure may have (be they at the N- or C-terminus, or within the structure). The a3m format is especially good for detecting these insertions because they appear as lowercase letters in the alignment between the query sequence and the template structure.
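
A minimal sketch of that trimming step, assuming an a3m-style row in which lowercase letters mark insertions relative to the query; the function names are made up for illustration.

```python
def strip_insertions(a3m_row):
    """Drop lowercase (insertion) characters, keeping match columns and gaps."""
    return "".join(c for c in a3m_row if not c.islower())


def insertion_positions(a3m_row):
    """Return 0-based indices in the row where lowercase insertions occur."""
    return [i for i, c in enumerate(a3m_row) if c.islower()]


# Made-up template row with two inserted residues
row = "MAvfY-D"
print(strip_insertions(row))
print(insertion_positions(row))
```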

@sadiogo
Author

sadiogo commented Jan 20, 2022

It didn't work in either Conkit or ConPlot; I've opened a separate issue in the ConPlot project.
In conkit, when I run:

conkit-plot cmap -p 4e7n_mod.pdb -pf pdb alignment.a3m a3m conkit.mat ccmpred

That tries to simulate the following situation:

>sequence
MAVFYD

PDB:

 1   2   3   4   5   6
ILE SER THR TYR TRP GLU

Which is equivalent to saying:

SEQ: MAVFYD
PDB: ISTYWE

I get the following traceback:

  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/command_line/conkit_plot.py", line 441, in <module>
    main()
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/command_line/conkit_plot.py", line 303, in main
    reference = conkit.io.read(args.reffile, args.refformat)[0]
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/io/__init__.py", line 131, in read
    hierarchy = parser_in.read(f_in, **kwargs)
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/io/pdb.py", line 288, in read
    return self._read(structure, f_id, distance_cutoff, atom_type)
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/io/pdb.py", line 172, in _read
    contact_map.add(contact)
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/core/entity.py", line 218, in add
    raise ValueError("%s defined twice" % str(entity.id))
ValueError: (35, 38) defined twice

@FilomenoSanchez
Member

FilomenoSanchez commented Jan 20, 2022

You would usually get this error when there are multiple residues sharing the same residue number in the PDB. In this case there are several residues with repeated numbers (38, 95, 172, 173, 186 and 221) which are causing the problem. They have been added using insertion codes, but conkit does not support this. What I would suggest here is that you re-assign the residue numbers in the PDB file so that it matches the sequence in the FASTA file and there are no repeated residue numbers.

For the record, input files can be found here: rigdenlab/conplot#139
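
To spot such residues before running conkit, the fixed-width PDB columns can be inspected directly. This is a sketch that relies only on the standard ATOM/HETATM record layout (chain ID in column 22, residue number in columns 23-26, insertion code in column 27). The helper names and the example records are made up for illustration and are not taken from the files in this issue.

```python
def atom_line(serial, name, resname, chain, resseq, icode=" "):
    """Build the leading fixed-width fields of a PDB ATOM record (illustration only)."""
    return "ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}{:1s}".format(
        serial, name, resname, chain, resseq, icode
    )


def residues_with_insertion_codes(pdb_lines):
    """Return unique (chain, residue number, insertion code) tuples, in file order."""
    found = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 27:
            continue
        key = (line[21], int(line[22:26]), line[26])
        if key[2].strip() and key not in found:
            found.append(key)
    return found


# Hypothetical records: residue 38 appears twice, once with insertion code 'A'
lines = [
    atom_line(1, " CA ", "TYR", "A", 38),
    atom_line(2, " CA ", "GLY", "A", 38, "A"),
    atom_line(3, " CA ", "SER", "A", 39),
]
print(residues_with_insertion_codes(lines))
```

Any residue reported here shares its number with another residue and would need renumbering so that the structure matches the FASTA sequence.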

@sadiogo
Author

sadiogo commented Jan 20, 2022

It worked in ConPlot, but in Conkit I still get a traceback:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/command_line/conkit_plot.py", line 441, in <module>
    main()
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/command_line/conkit_plot.py", line 309, in main
    con_matched = con_sliced.match(reference, match_other=True, renumber=True, remove_unmatched=True)
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/core/contactmap.py", line 811, in match
    contact_map1 = ContactMap._renumber(contact_map1, contact_map1_keymap, contact_map2_keymap)
  File "/home/sadiogo/.local/lib/python3.6/site-packages/conkit/core/contactmap.py", line 1080, in _renumber
    raise ValueError("Should never get here")
ValueError: Should never get here

It appears I've gone somewhere forbidden lol

For plot purposes, ConPlot solved my problem. But I'm insisting on using Conkit because I want to run the Precision Evaluation script.

@FilomenoSanchez
Member

FilomenoSanchez commented Jan 21, 2022

This is a bit strange, I wasn't able to reproduce this error. I have attached the data I used (data.zip) and the plot I created when I ran the following command:

python3 -m conkit.command_line.conkit_plot cmap -p 4e7n_renumbered.pdb -pf pdb seq.fasta a3m conkit.mat ccmpred

I am using conkit 0.12.0 on python 3.8. What version of conkit and python are you using? You can check the exact conkit version if you open a python terminal and type the following:

import conkit
conkit.__version__
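
Since the Python version may matter as well, both can be reported in one go. This is a small sketch using only the standard library; `installed_version` is a hypothetical helper that returns None when the package cannot be imported by the interpreter running the script.

```python
import importlib
import sys


def installed_version(package="conkit"):
    """Return the package's __version__ string, 'unknown' if absent, or None if not importable."""
    try:
        module = importlib.import_module(package)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


print("Python", ".".join(str(part) for part in sys.version_info[:3]))
print("conkit", installed_version("conkit"))
```

Running it with each interpreter (for example python3.6 and python3.8) shows which one actually has conkit available.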

@sadiogo
Author

sadiogo commented Jan 21, 2022

I reproduced your call using those files and got the same error.
I'm using Python 3.6.9 and conkit 0.12.0.

@sadiogo
Author

sadiogo commented Jan 21, 2022

I've installed python 3.8 and attempted to install conkit, but I get the following error:

Building wheels for collected packages: conkit
  Building wheel for conkit (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ifik9dmj/conkit_44e39083b73346fcb021bb7f44fbdbeb/setup.py'"'"'; __file__='"'"'/tmp/pip-install-ifik9dmj/conkit_44e39083b73346fcb021bb7f44fbdbeb/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-ykdbqezn
       cwd: /tmp/pip-install-ifik9dmj/conkit_44e39083b73346fcb021bb7f44fbdbeb/
  Complete output (116 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.8
  creating build/lib.linux-x86_64-3.8/conkit
  copying conkit/version.py -> build/lib.linux-x86_64-3.8/conkit
  copying conkit/__init__.py -> build/lib.linux-x86_64-3.8/conkit
  creating build/lib.linux-x86_64-3.8/conkit/applications
  copying conkit/applications/cdhit.py -> build/lib.linux-x86_64-3.8/conkit/applications
  copying conkit/applications/ccmpred.py -> build/lib.linux-x86_64-3.8/conkit/applications
  copying conkit/applications/bbcontacts.py -> build/lib.linux-x86_64-3.8/conkit/applications
  copying conkit/applications/hhblits.py -> build/lib.linux-x86_64-3.8/conkit/applications
  copying conkit/applications/hhfilter.py -> build/lib.linux-x86_64-3.8/conkit/applications
  copying conkit/applications/psicov.py -> build/lib.linux-x86_64-3.8/conkit/applications
  copying conkit/applications/__init__.py -> build/lib.linux-x86_64-3.8/conkit/applications
  copying conkit/applications/jackhmmer.py -> build/lib.linux-x86_64-3.8/conkit/applications
  creating build/lib.linux-x86_64-3.8/conkit/command_line
  copying conkit/command_line/conkit_msatool.py -> build/lib.linux-x86_64-3.8/conkit/command_line
  copying conkit/command_line/conkit_convert.py -> build/lib.linux-x86_64-3.8/conkit/command_line
  copying conkit/command_line/conkit_precision.py -> build/lib.linux-x86_64-3.8/conkit/command_line
  copying conkit/command_line/conkit_predict.py -> build/lib.linux-x86_64-3.8/conkit/command_line
  copying conkit/command_line/conkit_plot.py -> build/lib.linux-x86_64-3.8/conkit/command_line
  copying conkit/command_line/__init__.py -> build/lib.linux-x86_64-3.8/conkit/command_line
  creating build/lib.linux-x86_64-3.8/conkit/core
  copying conkit/core/entity.py -> build/lib.linux-x86_64-3.8/conkit/core
  copying conkit/core/sequence.py -> build/lib.linux-x86_64-3.8/conkit/core
  copying conkit/core/sequencefile.py -> build/lib.linux-x86_64-3.8/conkit/core
  copying conkit/core/contactmap.py -> build/lib.linux-x86_64-3.8/conkit/core
  copying conkit/core/mappings.py -> build/lib.linux-x86_64-3.8/conkit/core
  copying conkit/core/struct.py -> build/lib.linux-x86_64-3.8/conkit/core
  copying conkit/core/__init__.py -> build/lib.linux-x86_64-3.8/conkit/core
  copying conkit/core/contact.py -> build/lib.linux-x86_64-3.8/conkit/core
  copying conkit/core/contactfile.py -> build/lib.linux-x86_64-3.8/conkit/core
  creating build/lib.linux-x86_64-3.8/conkit/core/ext
  copying conkit/core/ext/__init__.py -> build/lib.linux-x86_64-3.8/conkit/core/ext
  creating build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/membrain.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/comsat.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/ncont.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/pcons.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/casp.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/_parser.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/ccmpred.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/jones.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/epcmap.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/stockholm.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/mapalign.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/fasta.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/pdb.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/rosetta.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/clustal.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/freecontact.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/aleigen.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/bbcontacts.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/a2m.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/bclcontact.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/a3m.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/plmdca.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/psicov.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/evfold.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/__init__.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/gremlin.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/_iotools.py -> build/lib.linux-x86_64-3.8/conkit/io
  copying conkit/io/_cache.py -> build/lib.linux-x86_64-3.8/conkit/io
  creating build/lib.linux-x86_64-3.8/conkit/misc
  copying conkit/misc/energyfunction.py -> build/lib.linux-x86_64-3.8/conkit/misc
  copying conkit/misc/distances.py -> build/lib.linux-x86_64-3.8/conkit/misc
  copying conkit/misc/bandwidth.py -> build/lib.linux-x86_64-3.8/conkit/misc
  copying conkit/misc/selector.py -> build/lib.linux-x86_64-3.8/conkit/misc
  copying conkit/misc/__init__.py -> build/lib.linux-x86_64-3.8/conkit/misc
  copying conkit/misc/selectalg.py -> build/lib.linux-x86_64-3.8/conkit/misc
  creating build/lib.linux-x86_64-3.8/conkit/misc/ext
  copying conkit/misc/ext/__init__.py -> build/lib.linux-x86_64-3.8/conkit/misc/ext
  creating build/lib.linux-x86_64-3.8/conkit/plot
  copying conkit/plot/precisionevaluation.py -> build/lib.linux-x86_64-3.8/conkit/plot
  copying conkit/plot/contactmapchord.py -> build/lib.linux-x86_64-3.8/conkit/plot
  copying conkit/plot/contactmapmatrix.py -> build/lib.linux-x86_64-3.8/conkit/plot
  copying conkit/plot/figure.py -> build/lib.linux-x86_64-3.8/conkit/plot
  copying conkit/plot/contactdensity.py -> build/lib.linux-x86_64-3.8/conkit/plot
  copying conkit/plot/tools.py -> build/lib.linux-x86_64-3.8/conkit/plot
  copying conkit/plot/contactmap.py -> build/lib.linux-x86_64-3.8/conkit/plot
  copying conkit/plot/sequencecoverage.py -> build/lib.linux-x86_64-3.8/conkit/plot
  copying conkit/plot/__init__.py -> build/lib.linux-x86_64-3.8/conkit/plot
  running egg_info
  writing conkit.egg-info/PKG-INFO
  writing dependency_links to conkit.egg-info/dependency_links.txt
  writing requirements to conkit.egg-info/requires.txt
  writing top-level names to conkit.egg-info/top_level.txt
  reading manifest file 'conkit.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  writing manifest file 'conkit.egg-info/SOURCES.txt'
  copying conkit/core/ext/c_contactmap.pyx -> build/lib.linux-x86_64-3.8/conkit/core/ext
  copying conkit/core/ext/c_sequencefile.pyx -> build/lib.linux-x86_64-3.8/conkit/core/ext
  copying conkit/misc/ext/c_bandwidth.pyx -> build/lib.linux-x86_64-3.8/conkit/misc/ext
  running build_ext
  cythoning conkit/core/ext/c_contactmap.pyx to conkit/core/ext/c_contactmap.c
  /home/sadiogo/.local/lib/python3.8/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-ifik9dmj/conkit_44e39083b73346fcb021bb7f44fbdbeb/conkit/core/ext/c_contactmap.pyx
    tree = Parsing.p_module(s, pxd, full_module_name)
  cythoning conkit/core/ext/c_sequencefile.pyx to conkit/core/ext/c_sequencefile.c
  /home/sadiogo/.local/lib/python3.8/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-ifik9dmj/conkit_44e39083b73346fcb021bb7f44fbdbeb/conkit/core/ext/c_sequencefile.pyx
    tree = Parsing.p_module(s, pxd, full_module_name)
  cythoning conkit/misc/ext/c_bandwidth.pyx to conkit/misc/ext/c_bandwidth.c
  /home/sadiogo/.local/lib/python3.8/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-ifik9dmj/conkit_44e39083b73346fcb021bb7f44fbdbeb/conkit/misc/ext/c_bandwidth.pyx
    tree = Parsing.p_module(s, pxd, full_module_name)
  building 'conkit.core.ext.c_contactmap' extension
  creating build/temp.linux-x86_64-3.8
  creating build/temp.linux-x86_64-3.8/conkit
  creating build/temp.linux-x86_64-3.8/conkit/core
  creating build/temp.linux-x86_64-3.8/conkit/core/ext
  x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/sadiogo/.local/lib/python3.8/site-packages/numpy/core/include -I/usr/include/python3.8 -c conkit/core/ext/c_contactmap.c -o build/temp.linux-x86_64-3.8/conkit/core/ext/c_contactmap.o -O3 -ffast-math -march=native -pipe -fopenmp
  conkit/core/ext/c_contactmap.c:6:10: fatal error: Python.h: No such file or directory
   #include "Python.h"
            ^~~~~~~~~~
  compilation terminated.
  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for conkit
Running setup.py clean for conkit
Failed to build conkit
Installing collected packages: conkit
    Running setup.py install for conkit ... error
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ifik9dmj/conkit_44e39083b73346fcb021bb7f44fbdbeb/setup.py'"'"'; __file__='"'"'/tmp/pip-install-ifik9dmj/conkit_44e39083b73346fcb021bb7f44fbdbeb/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-y3xtj15b/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/sadiogo/.local/include/python3.8/conkit
         cwd: /tmp/pip-install-ifik9dmj/conkit_44e39083b73346fcb021bb7f44fbdbeb/
    Complete output (110 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.8
    creating build/lib.linux-x86_64-3.8/conkit
    copying conkit/version.py -> build/lib.linux-x86_64-3.8/conkit
    copying conkit/__init__.py -> build/lib.linux-x86_64-3.8/conkit
    creating build/lib.linux-x86_64-3.8/conkit/applications
    copying conkit/applications/cdhit.py -> build/lib.linux-x86_64-3.8/conkit/applications
    copying conkit/applications/ccmpred.py -> build/lib.linux-x86_64-3.8/conkit/applications
    copying conkit/applications/bbcontacts.py -> build/lib.linux-x86_64-3.8/conkit/applications
    copying conkit/applications/hhblits.py -> build/lib.linux-x86_64-3.8/conkit/applications
    copying conkit/applications/hhfilter.py -> build/lib.linux-x86_64-3.8/conkit/applications
    copying conkit/applications/psicov.py -> build/lib.linux-x86_64-3.8/conkit/applications
    copying conkit/applications/__init__.py -> build/lib.linux-x86_64-3.8/conkit/applications
    copying conkit/applications/jackhmmer.py -> build/lib.linux-x86_64-3.8/conkit/applications
    creating build/lib.linux-x86_64-3.8/conkit/command_line
    copying conkit/command_line/conkit_msatool.py -> build/lib.linux-x86_64-3.8/conkit/command_line
    copying conkit/command_line/conkit_convert.py -> build/lib.linux-x86_64-3.8/conkit/command_line
    copying conkit/command_line/conkit_precision.py -> build/lib.linux-x86_64-3.8/conkit/command_line
    copying conkit/command_line/conkit_predict.py -> build/lib.linux-x86_64-3.8/conkit/command_line
    copying conkit/command_line/conkit_plot.py -> build/lib.linux-x86_64-3.8/conkit/command_line
    copying conkit/command_line/__init__.py -> build/lib.linux-x86_64-3.8/conkit/command_line
    creating build/lib.linux-x86_64-3.8/conkit/core
    copying conkit/core/entity.py -> build/lib.linux-x86_64-3.8/conkit/core
    copying conkit/core/sequence.py -> build/lib.linux-x86_64-3.8/conkit/core
    copying conkit/core/sequencefile.py -> build/lib.linux-x86_64-3.8/conkit/core
    copying conkit/core/contactmap.py -> build/lib.linux-x86_64-3.8/conkit/core
    copying conkit/core/mappings.py -> build/lib.linux-x86_64-3.8/conkit/core
    copying conkit/core/struct.py -> build/lib.linux-x86_64-3.8/conkit/core
    copying conkit/core/__init__.py -> build/lib.linux-x86_64-3.8/conkit/core
    copying conkit/core/contact.py -> build/lib.linux-x86_64-3.8/conkit/core
    copying conkit/core/contactfile.py -> build/lib.linux-x86_64-3.8/conkit/core
    creating build/lib.linux-x86_64-3.8/conkit/core/ext
    copying conkit/core/ext/__init__.py -> build/lib.linux-x86_64-3.8/conkit/core/ext
    creating build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/membrain.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/comsat.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/ncont.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/pcons.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/casp.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/_parser.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/ccmpred.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/jones.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/epcmap.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/stockholm.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/mapalign.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/fasta.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/pdb.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/rosetta.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/clustal.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/freecontact.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/aleigen.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/bbcontacts.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/a2m.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/bclcontact.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/a3m.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/plmdca.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/psicov.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/evfold.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/__init__.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/gremlin.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/_iotools.py -> build/lib.linux-x86_64-3.8/conkit/io
    copying conkit/io/_cache.py -> build/lib.linux-x86_64-3.8/conkit/io
    creating build/lib.linux-x86_64-3.8/conkit/misc
    copying conkit/misc/energyfunction.py -> build/lib.linux-x86_64-3.8/conkit/misc
    copying conkit/misc/distances.py -> build/lib.linux-x86_64-3.8/conkit/misc
    copying conkit/misc/bandwidth.py -> build/lib.linux-x86_64-3.8/conkit/misc
    copying conkit/misc/selector.py -> build/lib.linux-x86_64-3.8/conkit/misc
    copying conkit/misc/__init__.py -> build/lib.linux-x86_64-3.8/conkit/misc
    copying conkit/misc/selectalg.py -> build/lib.linux-x86_64-3.8/conkit/misc
    creating build/lib.linux-x86_64-3.8/conkit/misc/ext
    copying conkit/misc/ext/__init__.py -> build/lib.linux-x86_64-3.8/conkit/misc/ext
    creating build/lib.linux-x86_64-3.8/conkit/plot
    copying conkit/plot/precisionevaluation.py -> build/lib.linux-x86_64-3.8/conkit/plot
    copying conkit/plot/contactmapchord.py -> build/lib.linux-x86_64-3.8/conkit/plot
    copying conkit/plot/contactmapmatrix.py -> build/lib.linux-x86_64-3.8/conkit/plot
    copying conkit/plot/figure.py -> build/lib.linux-x86_64-3.8/conkit/plot
    copying conkit/plot/contactdensity.py -> build/lib.linux-x86_64-3.8/conkit/plot
    copying conkit/plot/tools.py -> build/lib.linux-x86_64-3.8/conkit/plot
    copying conkit/plot/contactmap.py -> build/lib.linux-x86_64-3.8/conkit/plot
    copying conkit/plot/sequencecoverage.py -> build/lib.linux-x86_64-3.8/conkit/plot
    copying conkit/plot/__init__.py -> build/lib.linux-x86_64-3.8/conkit/plot
    running egg_info
    writing conkit.egg-info/PKG-INFO
    writing dependency_links to conkit.egg-info/dependency_links.txt
    writing requirements to conkit.egg-info/requires.txt
    writing top-level names to conkit.egg-info/top_level.txt
    reading manifest file 'conkit.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    writing manifest file 'conkit.egg-info/SOURCES.txt'
    copying conkit/core/ext/c_contactmap.pyx -> build/lib.linux-x86_64-3.8/conkit/core/ext
    copying conkit/core/ext/c_sequencefile.pyx -> build/lib.linux-x86_64-3.8/conkit/core/ext
    copying conkit/misc/ext/c_bandwidth.pyx -> build/lib.linux-x86_64-3.8/conkit/misc/ext
    running build_ext
    skipping 'conkit/core/ext/c_contactmap.c' Cython extension (up-to-date)
    skipping 'conkit/core/ext/c_sequencefile.c' Cython extension (up-to-date)
    skipping 'conkit/misc/ext/c_bandwidth.c' Cython extension (up-to-date)
    building 'conkit.core.ext.c_contactmap' extension
    creating build/temp.linux-x86_64-3.8
    creating build/temp.linux-x86_64-3.8/conkit
    creating build/temp.linux-x86_64-3.8/conkit/core
    creating build/temp.linux-x86_64-3.8/conkit/core/ext
    x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/sadiogo/.local/lib/python3.8/site-packages/numpy/core/include -I/usr/include/python3.8 -c conkit/core/ext/c_contactmap.c -o build/temp.linux-x86_64-3.8/conkit/core/ext/c_contactmap.o -O3 -ffast-math -march=native -pipe -fopenmp
    conkit/core/ext/c_contactmap.c:6:10: fatal error: Python.h: No such file or directory
     #include "Python.h"
              ^~~~~~~~~~
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ifik9dmj/conkit_44e39083b73346fcb021bb7f44fbdbeb/setup.py'"'"'; __file__='"'"'/tmp/pip-install-ifik9dmj/conkit_44e39083b73346fcb021bb7f44fbdbeb/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-y3xtj15b/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/sadiogo/.local/include/python3.8/conkit Check the logs for full command output.

@FilomenoSanchez
Member

You are missing the Python development headers (the compiler cannot find Python.h). You can fix that with sudo apt-get install python3-dev (assuming you are running a Debian-based distro).

@sadiogo
Author

sadiogo commented Jan 21, 2022

Worked! But for some reason I had to use sudo apt-get install python3.8-dev instead of just python3-dev.
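A possible explanation, offered as an assumption rather than a diagnosis: the build in the log above ran under Python 3.8 (note the -I/usr/include/python3.8 flag), and the development headers are packaged per interpreter version, so a python3-dev package that tracks a different default python3 would not provide them. A minimal sketch for checking whether a given interpreter has its headers follows; the helper names are hypothetical, and it should be run with the same interpreter that pip uses.

```python
import os
import sysconfig


def python_header_path():
    """Return the path where this interpreter expects Python.h to live."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.join(include_dir, "Python.h")


def has_python_headers():
    """True if the development headers for this interpreter are installed."""
    return os.path.isfile(python_header_path())


if __name__ == "__main__":
    status = "found" if has_python_headers() else "missing"
    print(f"{python_header_path()}: {status}")
```

If the file is reported as missing, C extensions such as conkit/core/ext/c_contactmap.c cannot be compiled for that interpreter.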

I reran

python3 -m conkit.command_line.conkit_plot cmap -p 4e7n_renumbered.pdb -pf pdb seq.fasta a3m conkit.mat ccmpred

and still get the weird traceback:

  File "/usr/lib/python3.8/runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/command_line/conkit_plot.py", line 441, in <module>
    main()
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/command_line/conkit_plot.py", line 309, in main
    con_matched = con_sliced.match(reference, match_other=True, renumber=True, remove_unmatched=True)
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/core/contactmap.py", line 811, in match
    contact_map1 = ContactMap._renumber(contact_map1, contact_map1_keymap, contact_map2_keymap)
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/core/contactmap.py", line 1080, in _renumber
    raise ValueError("Should never get here")
ValueError: Should never get here

@FilomenoSanchez
Member

FilomenoSanchez commented Jan 21, 2022

Could you try using the latest commit instead of version 0.12? You can replace your current version with the latest commit doing the following (you might need to create the directory ~/opt/ or you can also choose whatever directory you want to keep the cloned repo with the source code):

cd ~/opt/
git clone https://github.com/rigdenlab/conkit
cd $(python3.8 -c "import conkit ; import os ; print(os.path.dirname(conkit.__file__))")
mv conkit conkit_v0.12
ln -s ~/opt/conkit/conkit ./

@sadiogo
Author

sadiogo commented Jan 21, 2022

I had to do cd .. before mv conkit conkit_v0.12, but it worked! Finally got the plot generated, in both python3.8 and python3.6.

Thanks @FilomenoSanchez !

@FilomenoSanchez
Member

Excellent! Not sure what caused the error but I suppose it will get fixed with the next version release.

@sadiogo
Author

sadiogo commented Jan 21, 2022

Ah! Now I tried running:

python3 -m conkit.command_line.conkit_plot peval -j 0.1 -min 0 -max 2.0 4e7n_renumbered.pdb pdb seq.fasta a3m conkit.mat ccmpred

and I got the traceback again:

File "/usr/lib/python3.8/runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/command_line/conkit_plot.py", line 549, in <module>
    main()
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/command_line/conkit_plot.py", line 524, in main
    con_matched = con.match(pdb, renumber=True, remove_unmatched=True)
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/core/contactmap.py", line 855, in match
    contact_map1 = ContactMap._renumber(contact_map1, contact_map1_keymap, contact_map2_keymap)
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/core/contactmap.py", line 1162, in _renumber
    raise ValueError("Should never get here")
ValueError: Should never get here

@FilomenoSanchez
Member

This time I was able to reproduce the error, I will need some time to figure out what is going on here... I'll come back to you when I have something.

@FilomenoSanchez FilomenoSanchez changed the title from "conkit-plot.py has a problem dealing with gaps" to "conkit-plot peval fails to match sequences" Jan 21, 2022
@sadiogo
Author

sadiogo commented Jan 21, 2022

I believe the peval tag eventually calls the conkit-precision script, right? So I ran:

python3 -m conkit.command_line.conkit_precision 4e7n_renumbered.pdb pdb seq.fasta a3m conkit.mat ccmpred

and was able to get a precision score at least (although the score seems a bit too 'precise' given the plot):

Min sequence separation for contacting residues: 5
Contact list cutoff factor: 1.000000 * L
/home/sadiogo/.local/lib/python3.8/site-packages/conkit/core/contactmap.py:253: UserWarning: Some contacts between the ContactMaps are unmatched due to non-identical sequences. The precision value might be inaccurate.
  warnings.warn(
Precision score: 0.941441

@FilomenoSanchez
Member

This is a bug, I located the source of the problem and I will be pushing a fix at some point later today. Once I do this you will be able to fix this by pulling the latest commit to the repository you cloned on your machine. If you don't want to wait until I push the fix, you can fix this in your local repository with the following changes:

diff --git a/conkit/command_line/conkit_plot.py b/conkit/command_line/conkit_plot.py
index 40bdc95..14f5f66 100644
--- a/conkit/command_line/conkit_plot.py
+++ b/conkit/command_line/conkit_plot.py
@@ -520,6 +520,8 @@ def main(argv=None):
         else:
             pdb = conkit.io.read(args.pdbfile, "pdb")[0]
 
+        pdb.sequence = seq
+        pdb.set_sequence_register()
         pdb = pdb.as_contactmap()
         con_matched = con.match(pdb, renumber=True, remove_unmatched=True)

@FilomenoSanchez
Member

I suspect the conkit-precision script has a similar bug, it also needs the following changes:

diff --git a/conkit/command_line/conkit_precision.py b/conkit/command_line/conkit_precision.py
index c27b548..fd94ebf 100644
--- a/conkit/command_line/conkit_precision.py
+++ b/conkit/command_line/conkit_precision.py
@@ -79,7 +79,11 @@ def main():
         pdb = conkit.io.read(args.pdbfile, args.pdbformat)[args.pdbchain]
     else:
         pdb = conkit.io.read(args.pdbfile, args.pdbformat)[0]
+
     seq = conkit.io.read(args.seqfile, args.seqformat)[0]
+    pdb.sequence = seq
+    pdb.set_sequence_register()
+    pdb = pdb.as_contactmap()
     con = conkit.io.read(args.confile, args.conformat)[0]
 
     con.sequence = seq

@sadiogo
Author

sadiogo commented Jan 21, 2022

I will try modifying those. But before I do, I've come upon another issue. Instead of plotting the query sequence against a distantly homologous structure, I attempted to plot it against a structure model (thus, same sequence as the query) built using the distant structure as a template. I ran the following:

python3 -m conkit.command_line.conkit_plot cmap -p swiss_hissa_model.pdb -pf pdb Aa_helicase_DCR1.fasta a3m HiSSA-raptor-results.txt casp

and I got the following traceback:

  File "/usr/lib/python3.8/runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/command_line/conkit_plot.py", line 549, in <module>
    main()
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/command_line/conkit_plot.py", line 328, in main
    con = conkit.io.read(args.confile, args.conformat)[0]
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/io/__init__.py", line 144, in read
    hierarchy = parser_in.read(f_in, **kwargs)
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/io/casp.py", line 124, in read
    res1_entry, res2_entry, lb, ub, raw_score = RE_SPLIT.split(line)
ValueError: too many values to unpack (expected 5)

The same thing happens when using peval -j 0.1 -min 0 -max 2.0.

I've attached those files should you want to test it yourself.
hissa-data.zip

@FilomenoSanchez
Member

FilomenoSanchez commented Jan 21, 2022

You are not specifying the correct formats. The contact predictions that you are trying to use are in the CASP MODE 2 format (these are actually inter-residue distance predictions). Also, the sequence is in FASTA format, not A3M. The command should be:

python3 -m conkit.command_line.conkit_plot cmap -p swiss_hissa_model.pdb -pf pdb Aa_helicase_DCR1.fasta fasta HiSSA-raptor-results.txt caspmode2

@sadiogo
Author

sadiogo commented Jan 21, 2022

I had tried using the caspmode2 format before in ConPlot and I received an error message saying:

The file you just attempted to upload does not comply with the file format guidelines. Make sure that you are uploading the correct file and you have selected the correct format. If you are not sure about how the file format looks like, you can read about each format on our help page. If you are sure that the format of your file is correct, please report the bug on the 'Get in touch with us' tab.

But when I upload using the CASPRR-MODE 1 format, it works. Because of this I didn't attempt using caspmode2 in ConKit. Give it a try yourself: try uploading that distance file in ConPlot with caspmode2.

As for the fasta and a3m formats, they are essentially the same if you have just one sequence in the file, so the a3m tag works too. You've used it yourself for a fasta file lol:

This is a bit strange, I wasn't able to reproduce this error. I am sending attached the data (data.zip) I used and the plot I created when I ran the following command:

python3 -m conkit.command_line.conkit_plot cmap -p 4e7n_renumbered.pdb -pf pdb seq.fasta a3m conkit.mat ccmpred

I am using conkit 0.12.0 on python 3.8. What version of conkit and python are you using? You can check the exact conkit version if you open a python terminal and type the following:

import conkit
conkit.__version__

@FilomenoSanchez
Member

FilomenoSanchez commented Jan 21, 2022

Yes, the sequence format does not matter in this case. Regarding your input, the problem is that your predictions contain inter-residue distance bins with probabilities higher than 1. ConPlot has stricter user-input sanity checks than ConKit, which is why the former complains and the latter doesn't. For example, line no. 4 in HiSSA-raptor-results.txt:

106 112 1.828 0.192 1.524 0.111 0.005 0.000 0.000 0.000 0.000 0.000 0.000

Distance bin p2 gets assigned a probability of 1.524, and p0 is 1.828. This doesn't make sense and it contradicts the format specification (probabilities have to be in the range 0-1, and also p1+p2+...+p10=1.0).
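The two rules just mentioned can be checked mechanically. The sketch below is not the check ConPlot runs; it is a minimal illustration that assumes the 13-field layout implied above (i, j, p0, then p1 to p10), and the function name and the 0.01 rounding tolerance are hypothetical choices.

```python
import math


def check_mode2_line(line, tol=0.01):
    """Return a list of problems found in one data line (empty if none)."""
    fields = line.split()
    if len(fields) != 13:
        return [f"expected 13 fields, found {len(fields)}"]
    probs = [float(x) for x in fields[2:]]  # p0, p1, ..., p10
    bins = probs[1:]                        # p1 ... p10
    problems = []
    for k, value in enumerate(probs):
        if not 0.0 <= value <= 1.0:
            problems.append(f"p{k}={value} is outside [0, 1]")
    # tol is an assumed allowance for values rounded to three decimals
    if not math.isclose(sum(bins), 1.0, abs_tol=tol):
        problems.append(f"p1..p10 sum to {sum(bins):.3f}, not 1.0")
    return problems


line4 = "106 112 1.828 0.192 1.524 0.111 0.005 0.000 0.000 0.000 0.000 0.000 0.000"
for problem in check_mode2_line(line4):
    print(problem)
```

For the line quoted above this reports p0 and p2 as out of range and flags the sum of the ten bins.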

@sadiogo
Author

sadiogo commented Jan 21, 2022

That's strange. So the CASPRR-MODE1 in ConPlot accepts that anyway? That file was generated by Raptor Contact Predict, and they specifically say in here:

Contact result file
The contact prediction results in a '.RR' formatted file for each input protein sequence. This file follows the format used in CASP ROLL, click this link for details.

@FilomenoSanchez
Member

FilomenoSanchez commented Jan 21, 2022

Yes ConPlot only has this sanity check when using CASPRR-MODE2 for two reasons:

  • From personal experience some files that claim to be formatted using CASPRR-MODE2 break a bunch of format rules. I still want to be able to plot the contact maps for these files (sometimes), which is why I decided not to run the same sanity check if the format is CASPRR-MODE1 (never thought someone would notice).
  • If you upload a distance file where some bins have been assigned probabilities higher than 1.0, the code breaks when the user tries to plot a distogram (for this step I really need no probability above 1.0). Which is why I check this in ConPlot but not in ConKit.

Regarding your file, it seems that it doesn't follow CASP ROLL as the authors claim:

 The (five column) RR format:

   i  j  d1  d2  p

Which clearly doesn't match this file:

PFRMAT RR
METHOD deep dilated convolutional residual neural networks by Jinbo Xu ([email protected])
MODEL 1
106 112 1.828 0.192 1.524 0.111 0.005 0.000 0.000 0.000 0.000 0.000 0.000
104 137 1.610 0.011 0.805 0.795 0.168 0.042 0.008 0.002 0.001 0.000 0.002
500 506 1.586 0.074 1.002 0.509 0.173 0.053 0.013 0.003 0.001 0.000 0.005
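Counting columns makes the mismatch concrete, and it also reproduces the kind of error seen in the earlier traceback when the file was read as casp. This is a minimal sketch: guess_rr_format is a hypothetical helper, not part of ConKit, and it expects a data line (not the PFRMAT, METHOD or MODEL header lines).

```python
def guess_rr_format(data_line):
    """Guess the format keyword from the number of columns in a data line."""
    n_columns = len(data_line.split())
    if n_columns == 5:
        return "casp"       # i j d1 d2 p
    if n_columns == 13:
        return "caspmode2"  # i j p0 p1 ... p10
    return f"unrecognised ({n_columns} columns)"


line = "106 112 1.828 0.192 1.524 0.111 0.005 0.000 0.000 0.000 0.000 0.000 0.000"
print(guess_rr_format(line))

# Unpacking a 13-column line into five names fails, as in the traceback
try:
    res1, res2, lb, ub, raw_score = line.split()
except ValueError as err:
    message = str(err)
    print(message)
```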

@sadiogo
Author

sadiogo commented Jan 21, 2022

Yeah, makes no sense at all. They must've performed some type of normalization on the probabilities, just can't figure what it was. Or maybe they forgot to apply the Softmax function; by applying it you make p1+p2+...+p10=1.0.

Do you reckon this affects the Precision evaluation? I've changed the files as you showed me. Now when I run:

python3 -m conkit.command_line.conkit_precision 4e7n_renumbered.pdb pdb seq.fasta a3m conkit.mat ccmpred

I get:

Min sequence separation for contacting residues: 5
Contact list cutoff factor: 1.000000 * L
Precision score: 0.079295

which makes a lot more sense than the previous 0.941441.

I've also managed to produce the precision evaluation plots.

@FilomenoSanchez
Member

FilomenoSanchez commented Jan 21, 2022

Probably this file format issue doesn't affect conkit-precision results. The only thing I can think of right now is that the contacts get sorted using p0 and that only the first L x cutoff contacts are selected. As long as it holds true that the higher p0 the more likely two residues are predicted to be in contact (distance < 8Å), then conkit-precision should work as expected, even if the top ranking contacts have a p0 > 1.0.
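The selection described above can be sketched in plain Python. This is not ConKit's implementation: the function name, the (res1, res2, score) tuples, the use of >= for the separation threshold, and the truncation of L x cutoff to an integer are all assumptions made for illustration.

```python
def precision_at_cutoff(predicted, true_contacts, seq_len, factor=1.0, min_sep=5):
    """Fraction of the top (seq_len * factor) predictions that are true contacts.

    predicted: iterable of (res1, res2, score)
    true_contacts: set of (res1, res2) tuples with res1 < res2
    """
    # Drop pairs that are closer in sequence than the minimum separation
    kept = [(min(a, b), max(a, b), s) for a, b, s in predicted if abs(a - b) >= min_sep]
    # Rank by score, highest first, and keep the first L x cutoff pairs
    kept.sort(key=lambda contact: contact[2], reverse=True)
    top = kept[: int(seq_len * factor)]
    if not top:
        return 0.0
    hits = sum((a, b) in true_contacts for a, b, _ in top)
    return hits / len(top)


predicted = [(1, 10, 1.8), (2, 20, 0.9), (3, 30, 0.5), (4, 40, 0.1), (5, 7, 2.0)]
true_contacts = {(1, 10), (3, 30)}
print(precision_at_cutoff(predicted, true_contacts, seq_len=3))
```

Because only the ranking is used, any rescaling of the scores that preserves their order leaves the result unchanged, which is the point being made about p0 values above 1.0.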

@sadiogo
Author

sadiogo commented Jan 21, 2022

Same is true for plotting then, since the L x cutoff also applies there. I could write a script for applying the Softmax function to the 10 bins though, should I ever need the probabilities. The reason I use RaptorX Contact is because it lets me use my own MSAs for predicting contacts (and it also does a better job than CCMpred on shallow MSAs, provided they are curated).

@FilomenoSanchez
Member

If they forgot to apply the softmax function as you suggest, then I think conkit-precision results might be affected. I'm not sure if it is then possible to compare p0 values across different residue pairs, making sorting contacts by p0 meaningless. But this really depends how the scores that come out of their DL algorithm behave (I have no idea, so I'm just speculating). I would suggest applying the softmax just to be safe.
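A minimal sketch of that suggestion, under two loudly stated assumptions: that the ten bin values in the file really are pre-softmax scores (this is speculation in this thread, not something the file confirms), and that p0 is the sum of the first three bins, which holds to within rounding for the three lines quoted earlier. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Map arbitrary real scores to probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def renormalise_bins(data_line):
    """Return (i, j, p0, [p1..p10]) with softmax applied to the ten bins."""
    fields = data_line.split()
    i, j = int(fields[0]), int(fields[1])
    bins = softmax([float(x) for x in fields[3:13]])
    p0 = sum(bins[:3])  # ASSUMPTION: p0 = p1 + p2 + p3
    return i, j, p0, bins


line = "106 112 1.828 0.192 1.524 0.111 0.005 0.000 0.000 0.000 0.000 0.000 0.000"
i, j, p0, bins = renormalise_bins(line)
print(i, j, round(p0, 3), [round(p, 3) for p in bins])
```

After this step every bin lies between 0 and 1 and the ten bins sum to 1, so the values at least satisfy the format rules; whether they are meaningful probabilities depends on the first assumption.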

@FilomenoSanchez
Member

FilomenoSanchez commented Jan 21, 2022

Also, I would suggest trRosetta (https://yanglab.nankai.edu.cn/trRosetta/). It lets you provide an MSA as input, and from my personal experience it gives more accurate results than Raptor-X. I also never had issues with file formatting when using the trRosetta server. Both ConKit and ConPlot are compatible with the output created with trRosetta (the format is rosetta_npz; if you download the tarball when the job is done, there should be a file named seq.npz inside that you can use as input).

@sadiogo
Author

sadiogo commented Jan 21, 2022

Nice one! I'm trying that now. I typically use trRosetta for homology modeling, but since they have options "for not using templates and homologous sequences", I can also use them for the contact maps :)

@sadiogo
Author

sadiogo commented Jan 21, 2022

Some feedback on trRosetta contact maps: the accuracy/precision skyrocketed! I found two setbacks though. First, I can't actually open the file to see what's in it. How do you do it? And second, the file is gigantic (200 MB in my case) and I can't open it in the ConPlot server because it takes too long.

@FilomenoSanchez
Member

FilomenoSanchez commented Jan 23, 2022

Yes, from my personal experience trRosetta produces better results than Raptor-X. If you want to obtain even more accurate results you could give AlphaFold a try, which also produces distance predictions compatible with ConKit and ConPlot. You will need to get it installed and run it on your machine. It is also available in some Google Colab servers, but they don't give back the distance predictions, only the protein model. I'm not sure if it accepts MSAs as input or what homology modelling options it has, but it is worth a try. Regarding your questions:

  • You cannot open the seq.npz file created by trRosetta with a normal text editor because it is binary. The .npz extension stands for a zipped file that contains several numpy.array objects. If you want to look at what's inside the file, you can either load the arrays in python using np.load and check the contents, or convert it into a text file format like casprr-mode-2 by running conkit-convert seq.npz rosettanpz seq.casprr2 caspmode2.

  • Yes, you will struggle to upload such a file to ConPlot; the request will time out before such a large file has time to get uploaded. There are two ways around this. First, which is what I would recommend, install ConPlot on your machine and upload the file to the server running on your localhost. This is much faster as the app reads the data straight from the filesystem, instead of having to wait for a file transfer through the internet. The second option is to convert seq.npz into a lighter file format, for example casprr-mode-2 if you want to preserve distance information. You can do this using conkit-convert as explained before.
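As a quick way to see what a .npz file holds without loading a 200 MB archive into memory, the member names can be listed with the standard library, since a .npz file is a zip archive with one .npy entry per array. This is a minimal sketch; list_npz_arrays is a hypothetical helper and the path seq.npz is only the filename mentioned above.

```python
import zipfile


def list_npz_arrays(path):
    """Return the names of the arrays stored in an .npz archive."""
    with zipfile.ZipFile(path) as archive:
        return [
            name[:-4] if name.endswith(".npy") else name
            for name in archive.namelist()
        ]


if __name__ == "__main__":
    import os

    if os.path.exists("seq.npz"):
        print(list_npz_arrays("seq.npz"))
```

With NumPy installed, np.load("seq.npz").files returns the same names, and indexing the loaded object by one of them returns the corresponding array.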

@sadiogo
Author

sadiogo commented Jan 23, 2022

I use AlphaFold quite a lot by means of the ColabFold notebooks, which allow adjusting many of the input parameters. The output provided there is also much better, and includes a .raw file that appears to be a contact prediction. In any case, the problem is AlphaFold is "too" precise. First, because it adds homologous sequences to the input MSA (in the advanced notebook it is possible to avoid this, but I haven't actually converted the msa.pickle output to fasta to confirm that the sequences are unchanged; I wrote a script for that but can't find it anymore). Second, and most importantly, AlphaFold updates the MSA (and the distogram) during every iteration into the neural network's transformer (the 'Evoformer'). Basically, it optimizes the MSA to produce the best contact prediction possible. I don't want this; I want the contact prediction to be based entirely upon the original shallow MSA, without any modifications to its columns. So I guess trRosetta is better for my current purpose.

As for the npz file, I will try to convert with conkit and see how that goes! Thanks for all the help and suggestions!

@FilomenoSanchez
Member

FilomenoSanchez commented Jan 23, 2022

I just merged the bug fix, you should be able to pull the latest commit and use conkit-plot peval without file modifications in your local repo. The changes are essentially the same to those I described earlier so there's probably no need for you to do this if you want to keep using your current version.

@sadiogo
Author

sadiogo commented Jan 23, 2022

If you give me the exact instructions for merging, I'll do it. That way I'll also fix the other minor commits you made.

@FilomenoSanchez
Member

Go to the directory where you cloned the repo (so for example cd ~/opt/conkit), run git checkout conkit/* followed by git pull origin

@sadiogo
Author

sadiogo commented Jan 27, 2022

Good thing I pulled the commit, cause I bumped into an error that got fixed after that.

I've managed to convert the npz to psicov, caspmode2 and casprr, but I get a traceback when trying to convert into ccmpred:

Traceback (most recent call last):
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/command_line/conkit_convert.py", line 87, in <module>
    main()
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/command_line/conkit_convert.py", line 77, in main
    conkit.io.convert(args.infile, args.informat, args.outfile, args.outformat)
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/io/__init__.py", line 94, in convert
    write(fname_out, format_out, hierarchy)
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/io/__init__.py", line 187, in write
    parser_out.write(f_out, hierarchy, **kwargs)
  File "/home/sadiogo/.local/lib/python3.8/site-packages/conkit/io/ccmpred.py", line 129, in write
    raise TypeError("Python3 requires f_handle to be in 'wb' or 'ab' mode")
TypeError: Python3 requires f_handle to be in 'wb' or 'ab' mode

I am now running ConPlot locally and things are running smoothly. However, when I try setting L/0 to show all contacts, I get a completely black plot (I've attached the image). Is that how it should work?

A minor detail about ConPlot: if I accidentally try to upload a file with the wrong format, I can't upload the correct file after I receive the error message. To be able to upload, I have to change the file format, click upload, close the upload window, change the file format back to the one I wanted and click upload.

A question: how do I cite ConKit? I've only found your paper on ConPlot.

[Attached image: newplot]

@FilomenoSanchez
Member

FilomenoSanchez commented Jan 27, 2022

Hi, the ccmpred error that you found seems to be another bug. I just opened an issue for it. The reason is that this is a very old parser that was written years ago when ConKit was still python2 & python3 compatible (now it's only py3) and we required a few input checks that are no longer needed. It seems this part of the code got forgotten when we were changing things. Just as before, I will be fixing the code later today, but in the meantime you can do the following:

diff --git a/conkit/io/ccmpred.py b/conkit/io/ccmpred.py
index 4291489..d61c44c 100644
--- a/conkit/io/ccmpred.py
+++ b/conkit/io/ccmpred.py
@@ -120,13 +120,7 @@ class CCMpredParser(ContactFileParser):
         ------
         :exc:`RuntimeError`
            More than one contact map in the hierarchy
-        :exc:`TypeError`
-           Python3 requires f_handle to be in `wb` or `ab` mode
-
         """
-        # Python3 support requires bytes mode
-        if sys.version_info.major == 3 and not (f_handle.mode == "wb" or f_handle.mode == "ab"):
-            raise TypeError("Python3 requires f_handle to be in 'wb' or 'ab' mode")
 
         # Double check the type of hierarchy and reconstruct if necessary
         contact_file = self._reconstruct(hierarchy)

Removing those lines should be fine; as I said, the check is no longer needed. Regarding ConPlot, the behaviour that you observe for L/0 is completely normal. When you set the L factor to 0, all the information in the input file gets displayed. In a trRosetta file (and also a ccmpred file) the information for all the possible residue pairs is included, even if p=0 and it's very unlikely that there is a contact. This is why you get a black square (all residue pairs have a contact, some of them with very low probability). Regarding L/1 and L/2 being the same, they might be similar in appearance but definitely they should not be identical (if they are, this is a bug).

@FilomenoSanchez
Member

The citation for conkit is here: https://doi.org/10.1093/bioinformatics/btx148
What you have observed with ConPlot file uploads is normal (unfortunately). It is a limitation of the web framework we use and hence there's nothing we can do to fix it. The problem is that the client (your laptop) only contacts the server after changes occur in the values of certain variables, which then triggers a response from the server. In this particular case, after you click "Upload" the client only sends the information contained in the file if the filename changes with respect to whatever was stored before. What this means is that if you try to upload a file, and then a file with the same name that, for example, you have in a different directory, it looks as if nothing happens (because actually nothing happens, the filename didn't change). If you try to upload a file in the wrong format, you get the error message, change it to the correct format, and try to upload the same file, again nothing happens because the filename variable didn't change. You need to try to upload something with a different name (or leave it blank, which restores the filename variable to an empty string) before you can upload the file again.
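The behaviour described above can be modelled with a toy class. This is purely illustrative and is not ConPlot's code: the class, its methods and the filename pred.txt are hypothetical, and the only rule it encodes is the one stated above, that a request is sent only when the stored filename changes.

```python
class UploadField:
    """Toy model: the server is contacted only when the filename value changes."""

    def __init__(self):
        self.filename = ""
        self.requests = []  # (filename, file_format) pairs sent to the server

    def upload(self, filename, file_format):
        if filename == self.filename:
            return False  # value unchanged, so no request is sent
        self.filename = filename
        self.requests.append((filename, file_format))
        return True

    def clear(self):
        """Leaving the field blank resets the stored filename."""
        self.filename = ""


field = UploadField()
first = field.upload("pred.txt", "caspmode2")  # sent, server rejects the format
second = field.upload("pred.txt", "casp")      # same filename, nothing is sent
field.clear()
third = field.upload("pred.txt", "casp")       # filename changed from "", sent
print(first, second, third)
```

In this model the format is never compared, only the filename, which is why correcting the format alone does not trigger a new upload.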

@sadiogo
Author

sadiogo commented Jan 29, 2022

Regarding L/1 and L/2 being the same, they might be similar in appearance but definitely they should not be identical (if they are this is a bug).

You were right, they were similar but not identical.

What you have observed with Conplot file uploads is normal (unfortunately), it is a limitation of the web frame we use and hence there's nothing we can do to fix it.

I had a problem like this with javascript and ruby on rails once. Took me a whole day to fix it. It appears trivial but is actually a lot of trouble for a minor user experience benefit.
