@@ -5,25 +5,24 @@ MolEnc: a molecular encoder using rdkit and OCaml.
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3546675.svg)](https://doi.org/10.5281/zenodo.3546675)

The implemented fingerprint is J-L Faulon's "Signature Molecular Descriptor"
- (SMD[1]).
+ (SMD [1]).
This is an unfolded-counted chemical fingerprint.
Such fingerprints are less lossy than famous chemical fingerprints like ECFP4.
SMD encoding doesn't introduce feature collisions upon encoding.
Also, a feature dictionary is created at encoding time.
This dictionary can be used later on to map a given feature index to an
atom environment.
+ Molenc also implements unfolded-counted atom pairs [2].

- We recommend using a radius of zero to one (molenc.sh -r 0:1 ...) or
+ For SMD, we recommend using a radius of zero to one (molenc.sh -r 0:1 ...) or
zero to two.

- Currently, the fingerprint can be run using atom types
+ Currently, the atom typing scheme being used is:
(#pi-electrons, element symbol, #HA neighbors, formal charge).

In the future, we might add pharmacophore feature points[3]
(Donor, Acceptor, PosIonizable, NegIonizable, Aromatic, Hydrophobe),
to allow a fuzzier description of molecules.
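
The idea of an unfolded-counted fingerprint backed by a feature dictionary, as described above, can be illustrated with a minimal Python sketch. This is not MolEnc's implementation: the function names `encode` and `decode` and the feature strings are hypothetical, chosen only to show how assigning each new feature the next free index avoids collisions and keeps indexes mappable back to features.

```python
from collections import Counter


def encode(features, dictionary):
    """Map feature strings to a sparse {index: count} fingerprint.

    An unseen feature receives the next free index, so two distinct
    features never share an index (no hashing, no folding).
    """
    fingerprint = Counter()
    for feat in features:
        index = dictionary.setdefault(feat, len(dictionary))
        fingerprint[index] += 1
    return dict(fingerprint)


def decode(index, dictionary):
    """Return the feature string stored under a given index."""
    for feat, i in dictionary.items():
        if i == index:
            return feat
    raise KeyError(index)


# Hypothetical atom-environment strings, for illustration only.
dictionary = {}
fp1 = encode(["C-sp3", "C-sp3", "O-H"], dictionary)
fp2 = encode(["O-H", "N-H2"], dictionary)  # reuses the same dictionary
```

Passing the same `dictionary` to both calls mirrors the role of a reused dictionary: features already seen keep their index, and only new features extend it.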
- It is also planned to support atom pairs[2] in addition
- to or in combination with SMD.

# How to install the software
@@ -51,10 +50,16 @@ or using the software or if you have any question.
```
molenc.sh -i input.smi -o output.txt
- [-d encoding.dix]; reuse existing dictionary
- [-r i:j]; fingerprint radius (default=0:1)
- [--seq]; sequential mode (disable parallelization)
- [--no-std]; don't standardize input file molecules
+ [-d encoding.dix]: reuse existing feature dictionary
+ [-r i:j]: fingerprint radius (default=0:1)
+ [--pairs]: use atom pairs instead of Faulon's FP
+ [-m <int>]: maximum allowed atom-pair distance
+ (default: no limit)
+ [--seq]: sequential mode (disable parallelization)
+ [-v]: debug mode; keep temp files
+ [-n <int>]: max jobs in parallel
+ [-c <int>]: chunk size
+ [--no-std]: don't standardize input file molecules
ONLY USE IF THEY HAVE ALREADY BEEN STANDARDIZED
```
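
To make the effect of a maximum atom-pair distance concrete, here is a small Python sketch of counted atom pairs on a toy molecular graph. It is an illustration under stated assumptions, not MolEnc's code: atoms are typed by element symbol only (not the typing scheme listed above), distances are topological (bond counts found by breadth-first search), and the cutoff is assumed to be inclusive. The names `topological_distances` and `count_atom_pairs` are hypothetical.

```python
from collections import Counter, deque


def topological_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to each atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def count_atom_pairs(atom_types, adjacency, max_dist=None):
    """Count (type_a, distance, type_b) features over all atom pairs.

    Pairs farther apart than `max_dist` are skipped (assumed inclusive
    cutoff); `max_dist=None` means no limit.
    """
    pairs = Counter()
    for i in range(len(atom_types)):
        dist = topological_distances(adjacency, i)
        for j, d in dist.items():
            if j <= i:
                continue  # visit each unordered pair once
            if max_dist is not None and d > max_dist:
                continue
            a, b = sorted((atom_types[i], atom_types[j]))
            pairs[(a, d, b)] += 1
    return pairs


# Toy heavy-atom graph for ethanol: C0-C1-O2
atom_types = ["C", "C", "O"]
adjacency = {0: [1], 1: [0, 2], 2: [1]}
all_pairs = count_atom_pairs(atom_types, adjacency)
near_pairs = count_atom_pairs(atom_types, adjacency, max_dist=1)
```

With no limit, the carbon-oxygen pair two bonds apart is counted; with `max_dist=1` it is dropped, which shrinks the feature set for large molecules.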