Molenc install guide
====================

Author: Francois Berenger
Date: 6th July 2022

Example installation instructions on a fresh Debian 11.3 system.
On Ubuntu Linux, installation should be very similar.

On Mac computers, this software has worked in the past, but
installation is a pain; hence we no longer maintain
nor recommend this setup.

The Bash shell is assumed for all commands.

Sudo rights are assumed for the user performing the installation.

I) Install system-wide packages
-------------------------------

$ sudo apt install git opam python3-pip python3-numpy

II) Configure the OCaml package manager
---------------------------------------

$ opam init -y
$ eval `opam config env` # path setup for OCaml executables;
                         # might be needed in your ~/.bashrc

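If you put the opam setup line into ~/.bashrc, appending it blindly every time you follow this guide leaves duplicate lines. Below is a minimal Bash sketch that appends a line only when it is not already there. The function name append_once is hypothetical and is not part of opam or molenc.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of opam or molenc): append LINE to FILE
# only if FILE does not already contain exactly that line.
append_once() {
  local file="$1" line="$2"
  touch "$file"  # create FILE if it does not exist yet
  # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
  if grep -qxF -- "$line" "$file"; then
    return 0
  fi
  # If the last line lacks its trailing newline, add one first so the
  # new line is not glued onto it ($(...) strips a final newline).
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}
```

Usage: `append_once ~/.bashrc 'eval `opam config env`'`. The single quotes keep the backquotes unexpanded, so the opam command runs at every shell start-up rather than once, right now.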
III) Install OCaml packages
---------------------------

$ opam depext -i molenc # this will also install rdkit system-wide

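The commands further down call molenc.sh directly. After this step it should therefore be resolvable by your shell, provided the opam environment from step II is loaded in the current shell. Below is a small Bash sketch to check that. require_cmds is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: complain about every listed command that the
# shell cannot resolve (via command -v); return non-zero if any is missing.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "command not found: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_cmds molenc.sh` prints nothing and returns 0 when molenc.sh is found.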
IV) Install user-space packages
-------------------------------

$ pip3 install six # required by chemo-standardizer
$ pip3 install chemo-standardizer # requires system-wide rdkit

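Per the comments above, chemo-standardizer depends on six and on the system-wide rdkit. A quick way to confirm that the Python 3 interpreter can see them is to try the imports. Below is a minimal Bash sketch. check_py_module is a hypothetical helper, and six and rdkit are the usual import names of those packages.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether python3 can import module $1.
check_py_module() {
  local mod="$1"
  if python3 -c "import ${mod}" >/dev/null 2>&1; then
    echo "ok: ${mod}"
  else
    echo "missing: ${mod}" >&2
    return 1
  fi
}
```

Usage: `check_py_module six && check_py_module rdkit`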
V) Tests
--------

Test that the molecular standardiser is correctly installed.
Molenc uses it when molecules need to be standardized.

$ standardiser -h

If this command is not found, ~/.local/bin may be missing from your PATH:

$ export PATH=$PATH:~/.local/bin # might be needed in your ~/.bashrc
$ standardiser -h # test again

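If the export line goes into ~/.bashrc, every re-sourcing of that file appends ~/.local/bin to PATH again. Below is a minimal Bash sketch that extends PATH only when the directory is absent. path_contains is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if directory $1 is exactly one of the
# colon-separated entries of $PATH (not merely a prefix of one).
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Extend PATH only when ~/.local/bin is not already in it.
if ! path_contains "$HOME/.local/bin"; then
  export PATH="$PATH:$HOME/.local/bin"
fi
```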
VI) Encode some molecules
-------------------------

Get some molecules in the SMILES format:

$ wget https://raw.githubusercontent.com/UnixJunkie/molenc/master/data/chembl_antivirals.smi -O antivirals.smi

Encode those molecules using the counted atom pairs fingerprint:

$ molenc.sh --pairs -i antivirals.smi -o antivirals_std.AP

Look at what was obtained:

$ head -1 antivirals_std.AP
CHEMBL807,0.0,[2:6;8:1;15:3;25:12;26:2;70:3;93:3;372:6;393:6;407:1;412:2;453:3;466:2;524:9;917:9;1095:3;1742:1;1776:3;2063:3;2576:4;2646:1;4428:3;5906:2;5916:1;6005:2]

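Judging from the line above, each record seems to hold three things: a molecule name, a numeric field (0.0 here), and a bracketed list of feature-index:count pairs. This guide does not document what the numeric field stands for, so treat that as an assumption. Below is a minimal Bash sketch that splits one such line, assuming exactly this layout. summarize_ap_line is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<name> <value> <n_features> <sum_of_counts>"
# for one line laid out as NAME,VALUE,[idx:count;idx:count;...]
summarize_ap_line() {
  local line="$1"
  local name="${line%%,*}"    # field before the first comma
  local rest="${line#*,}"     # everything after the first comma
  local value="${rest%%,*}"   # second field, e.g. 0.0
  local feats="${rest#*,}"    # "[idx:count;...]"
  feats="${feats#\[}"         # drop the opening bracket
  feats="${feats%\]}"         # drop the closing bracket
  local n=0 total=0 pair
  local IFS=';'               # split the pairs on ';' (this function only)
  for pair in $feats; do
    n=$((n + 1))
    total=$((total + ${pair#*:}))  # keep the count after ':'
  done
  printf '%s %s %d %d\n' "$name" "$value" "$n" "$total"
}
```

Usage: `summarize_ap_line "$(head -1 antivirals_std.AP)"`. On the CHEMBL807 line shown above it prints `CHEMBL807 0.0 25 91`, meaning 25 distinct features whose counts sum to 91.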
VII) Encode more molecules with an existing encoding dictionary
---------------------------------------------------------------

Let's say we want to encode some new molecules using an existing encoding dictionary
(the previous step created one for antivirals.smi).
In the real world, you might want the encoding dictionary to cover the whole ChEMBL database
(or your company's whole compound collection), so that it is exhaustive enough.

In the following, replace MY_MOLECULES.smi with the SMILES file of your choice.

$ molenc.sh --pairs -d antivirals.smi.dix -i MY_MOLECULES.smi -o MY_MOLECULES_std.AP

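To encode several SMILES files against the same dictionary, the command above can be wrapped in a loop. Below is a minimal Bash sketch that uses only the molenc.sh options shown in this guide (--pairs, -d, -i, -o). encode_with_dict is a hypothetical helper, and it reuses this guide's foo.smi to foo_std.AP output naming.

```bash
#!/usr/bin/env bash
# Hypothetical helper: encode every SMILES file given after the
# dictionary, reusing that one existing dictionary for all of them.
# Stops at the first molenc.sh failure.
encode_with_dict() {
  local dict="$1"
  shift
  if [ ! -f "$dict" ]; then
    echo "missing dictionary: $dict" >&2
    return 1
  fi
  local smi
  for smi in "$@"; do
    # foo.smi -> foo_std.AP, as in the commands of this guide
    molenc.sh --pairs -d "$dict" -i "$smi" -o "${smi%.smi}_std.AP" || return 1
  done
}
```

Usage: `encode_with_dict antivirals.smi.dix batch1.smi batch2.smi`. Here batch1.smi and batch2.smi are placeholder file names.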
Concluding remarks
------------------

Molenc is a research software prototype.
As such, it might be a little difficult to install and under-documented.
Such is the fate of research by-products.
Don't hesitate to contact the author if you cannot install the software,
find a bug, or encounter problems while using it.