Commit e29546d

Author: Francois Berenger
new file: INSTALL.txt
1 parent 76936b2 commit e29546d

1 file changed: +88 -0 lines changed

INSTALL.txt

@@ -0,0 +1,88 @@
Molenc install guide
====================

Author: Francois Berenger
Date: 6th July 2022

Example installation instructions on a fresh Debian 11.3 system.
On Ubuntu Linux, installation should be very similar.

On Mac computers, this software has worked in the past, but
installation is a pain; hence we no longer maintain
or recommend this setup.

The Bash shell is assumed for all commands.

Sudo rights are assumed for the user performing the installation.

I) Install system-wide packages
-------------------------------

$ sudo apt install git opam python3-pip python3-numpy
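
Before moving on, it can help to confirm that the commands provided by these packages are visible to the shell. The helper below is only a sketch: the function name missing_cmds is hypothetical, and the command names passed to it (git, opam, pip3, python3) are assumed to be the ones installed by the packages above.

```shell
# Print the name of each given command that is not found on PATH.
# (missing_cmds is a hypothetical helper, not part of molenc.)
missing_cmds() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}
```

Example use; nothing is printed if every command is found:

$ missing_cmds git opam pip3 python3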

II) Configure the OCaml package manager
---------------------------------------

$ opam init -y
$ eval `opam config env` # path setup for ocaml executables
                         # might be needed in your ~/.bashrc

III) Install OCaml packages
---------------------------

$ opam depext -i molenc # this will also install rdkit system-wide

IV) Install user-space packages
-------------------------------

$ pip3 install six # required by chemo-standardizer
$ pip3 install chemo-standardizer # requires system-wide rdkit

V) Tests
--------

Test that the molecular standardiser is correctly installed.
It is used by molenc in case molecules need to be standardized.

$ standardiser -h

If the command is not found, it may be missing from PATH:

$ export PATH=$PATH:~/.local/bin # might be needed in your ~/.bashrc
$ standardiser -h # test again
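
If the export line ends up in ~/.bashrc, PATH can accumulate duplicate entries each time the file is sourced. A minimal sketch of one way to avoid that is shown below; the function name path_append is hypothetical and is not something molenc or chemo-standardizer provides.

```shell
# Append a directory to PATH only if it is not already listed there.
# (path_append is a hypothetical helper for use in ~/.bashrc.)
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present: nothing to do
        *) PATH="$PATH:$1" ;;     # otherwise add it at the end
    esac
}
```

Example use, equivalent in effect to the export line above:

$ path_append "$HOME/.local/bin"
$ export PATH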

VI) Encode some molecules
-------------------------

Get some molecules in the SMILES format:

$ wget https://raw.githubusercontent.com/UnixJunkie/molenc/master/data/chembl_antivirals.smi -O antivirals.smi
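
A quick sanity check on the downloaded file can catch an empty or truncated download before encoding. The sketch below assumes the usual SMILES file layout of one molecule per line; this guide does not state the layout, and the function name count_molecules is hypothetical.

```shell
# Count non-blank lines in a SMILES file.
# Assumption: one molecule per line (not stated in this guide).
count_molecules() {
    grep -c '[^[:space:]]' "$1"
}
```

Example use:

$ count_molecules antivirals.smi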

Encode those molecules using the counted atom pairs fingerprint:

$ molenc.sh --pairs -i antivirals.smi -o antivirals_std.AP

Look at what was obtained:

$ head -1 antivirals_std.AP
CHEMBL807,0.0,[2:6;8:1;15:3;25:12;26:2;70:3;93:3;372:6;393:6;407:1;412:2;453:3;466:2;524:9;917:9;1095:3;1742:1;1776:3;2063:3;2576:4;2646:1;4428:3;5906:2;5916:1;6005:2]
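
Judging from the sample line alone, each output line appears to hold a molecule name, a numeric field, and a bracketed list of feature_index:count pairs separated by semicolons. Under that assumption (the format is not documented in this guide), the sketch below summarises each molecule; the function name ap_summary is hypothetical.

```shell
# Print "name number_of_features sum_of_counts" for each line of a .AP file.
# Assumed layout, inferred from the sample line only:
#   name,value,[index:count;index:count;...]
ap_summary() {
    awk -F',' '{
        feats = $3
        gsub(/\[/, "", feats)            # drop the opening bracket
        gsub(/\]/, "", feats)            # drop the closing bracket
        n = split(feats, pairs, ";")     # one element per index:count pair
        total = 0
        for (i = 1; i <= n; i++) {
            split(pairs[i], kv, ":")
            total += kv[2]               # kv[1] is the index, kv[2] the count
        }
        print $1, n, total
    }' "$@"
}
```

Example use; for the sample line above this prints "CHEMBL807 25 91":

$ head -1 antivirals_std.AP | ap_summary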

VII) Encode more molecules with an existing encoding dictionary
---------------------------------------------------------------

Let's say we want to encode some new molecules using an existing encoding dictionary
(a dictionary was created in the previous step for antivirals.smi).
In the real world, you might want the encoding dictionary to cover the whole ChEMBL database
(or your company's whole compound collection), so that the dictionary is exhaustive enough.

In the following, you need to replace MY_MOLECULES.smi with the SMILES file of your choice.

$ molenc.sh --pairs -d antivirals.smi.dix -i MY_MOLECULES.smi -o MY_MOLECULES_std.AP
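
When several SMILES files must be encoded with the same dictionary, the command above can be wrapped in a loop. This is a sketch only: the function name encode_with_dict and the MOLENC variable are hypothetical, and it uses just the molenc.sh options already shown in this guide (--pairs, -d, -i, -o). Output names follow the FILE_std.AP pattern used above.

```shell
# Encode several SMILES files with one existing dictionary.
# Usage: encode_with_dict DICTIONARY FILE.smi [FILE.smi ...]
# MOLENC is a hypothetical override so the command can be swapped
# (e.g. MOLENC=echo for a dry run); it defaults to molenc.sh.
encode_with_dict() {
    local dict=$1 smi
    shift
    for smi in "$@"; do
        # a.smi -> a_std.AP, same naming as in the examples above
        "${MOLENC:-molenc.sh}" --pairs -d "$dict" -i "$smi" -o "${smi%.smi}_std.AP"
    done
}
```

Example use (the input file names are placeholders):

$ encode_with_dict antivirals.smi.dix SET_A.smi SET_B.smi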

Concluding remarks
------------------

Molenc is a research software prototype.
As such, it might be a little difficult to install and may be under-documented.
Such is the fate of research by-products.
Don't hesitate to contact the author in case you cannot install the software,
find a bug, or encounter problems while using it.
