Commit bba761b

updated readme
1 parent 9ee2c41 commit bba761b

1 file changed, +14 -9 lines changed

README.md

@@ -5,25 +5,24 @@ MolEnc: a molecular encoder using rdkit and OCaml.
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3546675.svg)](https://doi.org/10.5281/zenodo.3546675)

 The implemented fingerprint is J-L Faulon's "Signature Molecular Descriptor"
-(SMD[1]).
+(SMD [1]).
 This is an unfolded-counted chemical fingerprint.
 Such fingerprints are less lossy than famous chemical fingerprints like ECFP4.
 SMD encoding doesn't introduce feature collisions upon encoding.
 Also, a feature dictionary is created at encoding time.
 This dictionary can be used later on to map a given feature index to an
 atom environment.
+Molenc also implements unfolded-counted atom pairs [2].

-We recommend using a radius of zero to one (molenc.sh -r 0:1 ...) or
+For SMD, we recommend using a radius of zero to one (molenc.sh -r 0:1 ...) or
 zero to two.

-Currently, the fingerprint can be run using atom types
+Currently, the atom typing scheme being used is:
 (#pi-electrons, element symbol, #HA neighbors, formal charge).

 In the future, we might add pharmacophore feature points[3]
 (Donor, Acceptor, PosIonizable, NegIonizable, Aromatic, Hydrophobe),
 to allow a fuzzier description of molecules.
-It is also planned to support atom pairs[2] in addition
-to or in combination with SMD.

 # How to install the software
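
The README text in this hunk describes an unfolded-counted fingerprint backed by a feature dictionary that can later map a feature index back to an atom environment. A minimal Python sketch of that idea follows. It is not MolEnc's implementation: the function names `encode` and `invert`, the `frozen` flag, and the feature strings are all hypothetical, and ignoring unknown features when a dictionary is reused is an assumption of this sketch only.

```python
from collections import Counter


def encode(features, dictionary, frozen=False):
    """Return {feature index: count} for one molecule.

    `dictionary` maps feature strings to integer indices. It is updated
    in place unless `frozen` is True (mimicking reuse of an existing
    dictionary).
    """
    counts = Counter()
    for feature in features:
        if feature not in dictionary:
            if frozen:
                # Assumption of this sketch: features missing from a
                # reused dictionary are simply ignored.
                continue
            # Each distinct feature gets its own new index, so two
            # different features never share an index (no collisions).
            dictionary[feature] = len(dictionary)
        counts[dictionary[feature]] += 1
    return dict(sorted(counts.items()))


def invert(dictionary):
    """Map each feature index back to its feature string."""
    return {index: feature for feature, index in dictionary.items()}


# Hypothetical atom-environment strings; not MolEnc's real feature format.
dictionary = {}
fingerprint = encode(["env_A", "env_B", "env_A"], dictionary)
print(fingerprint)            # {0: 2, 1: 1}
print(invert(dictionary)[1])  # env_B
```

Because indices are assigned one per distinct feature instead of being hashed into a fixed-length vector, the encoding stays collision-free, at the cost of needing the dictionary to interpret a fingerprint later.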

@@ -51,10 +50,16 @@ or using the software or if you have any question.

 ```
 molenc.sh -i input.smi -o output.txt
-[-d encoding.dix]; reuse existing dictionary
-[-r i:j]; fingerprint radius (default=0:1)
-[--seq]; sequential mode (disable parallelization)
-[--no-std]; don't standardize input file molecules
+[-d encoding.dix]: reuse existing feature dictionary
+[-r i:j]: fingerprint radius (default=0:1)
+[--pairs]: use atom pairs instead of Faulon's FP
+[-m <int>]: maximum allowed atom-pair distance
+(default: no limit)
+[--seq]: sequential mode (disable parallelization)
+[-v]: debug mode; keep temp files
+[-n <int>]: max jobs in parallel
+[-c <int>]: chunk size
+[--no-std]: don't standardize input file molecules
 ONLY USE IF THEY HAVE ALREADY BEEN STANDARDIZED
 ```
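
The new `--pairs` and `-m <int>` options refer to counted atom pairs with an optional maximum pair distance. As a rough illustration, the sketch below counts (atom type, topological distance, atom type) triples on a toy molecular graph. It assumes the usual definition of an atom pair, with distance measured in bonds along the shortest path, and uses plain element symbols as atom types; MolEnc's own typing scheme and output format are not reproduced here, and `atom_pairs`, `bfs_distances`, and `max_dist` are hypothetical names.

```python
from collections import Counter, deque


def bfs_distances(adjacency, start):
    """Shortest-path distance (in bonds) from `start` to reachable atoms."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in adjacency[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist


def atom_pairs(atom_types, bonds, max_dist=None):
    """Count (type, distance, type) triples over all pairs of atoms.

    `max_dist=None` means no distance limit; otherwise pairs further
    apart than `max_dist` bonds are dropped.
    """
    adjacency = {i: [] for i in range(len(atom_types))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    counts = Counter()
    for i in range(len(atom_types)):
        for j, d in bfs_distances(adjacency, i).items():
            if j <= i:
                continue  # visit each unordered pair once
            if max_dist is not None and d > max_dist:
                continue
            # Sort the two types so (C, O) and (O, C) are the same feature.
            t1, t2 = sorted((atom_types[i], atom_types[j]))
            counts[(t1, d, t2)] += 1
    return counts


# Heavy atoms of ethanol, C-C-O, as a toy example.
types = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
unlimited = atom_pairs(types, bonds)
limited = atom_pairs(types, bonds, max_dist=1)
print(sorted(unlimited.items()))
print(sorted(limited.items()))
```

With no limit, the C...O pair two bonds apart is counted; with `max_dist=1` only directly bonded pairs remain.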
