Commit 8c11a56

Incorporate @ceriottm final feedback
1 parent 75e1e30 commit 8c11a56

1 file changed: +24 -23 lines changed
paper/paper.md

+24-23
@@ -25,25 +25,26 @@ bibliography: paper.bib
 
 # Summary
 
-The number of materials or molecules that can be created by combining
-different chemical elements in various proportions and spatial arrangements is
-enormous. Computational chemistry can be used to generate databases containing
-billions of potential structures [@Ruddigkeit2012], and predict some of the
-associated properties [@Montavon2013; @Ramakrishnan2014]. Unfortunately, the
-very large number of structures makes exploring such database — to understand
+The number of materials or molecules that can be created by combining different
+chemical elements in various proportions and spatial arrangements is enormous.
+Computational chemistry can be used to generate databases containing billions of
+potential structures [@Ruddigkeit2012], and predict some of the associated
+properties [@Montavon2013; @Ramakrishnan2014]. Unfortunately, the very large
+number of structures makes exploring such database — to understand
 structure-property relations or find the *best* structure for a given
-application — a daunting task. In the recent years, multiple molecular
-*descriptors* [@Behler2007; @Bartok2013; @Willatt2019] have been developed to
-compute structural similarities between materials or molecules, incorporating
-physically-relevant information and symmetries. These descriptors can be used
-for unsupervised machine learning applications, such as clustering or
-classification of the different structures, and high-throughput screening of
-database for specific properties [@Maier2007; @De2017; @Hautier2019].
-Unfortunately, the dimensionality of most descriptors is very high, which makes
-the resulting classifications, clustering or mapping very hard to visualize.
-Additional dimensionality reduction algorithm [@Schlkopf1998; @Ceriotti2011;
-@McInnes2018] can reduce the number of relevant dimensions to a handful,
-creating 2D or 3D maps of the full database.
+application — a daunting task. In recent years, multiple molecular
+*representations* [@Behler2007; @Bartok2013; @Willatt2019] have been developed
+to compute structural similarities between materials or molecules, incorporating
+physically-relevant information and symmetries. The features associated with
+these representations can be used for unsupervised machine learning
+applications, such as clustering or classification of the different structures,
+and high-throughput screening of database for specific properties [@Maier2007;
+@De2017; @Hautier2019]. Unfortunately, The dimensionality of these features (as
+well as most of other descriptors used in chemical and materials informatics) is
+very high, which makes the resulting classifications, clustering or mapping very
+hard to visualize. Additional dimensionality reduction algorithm
+[@Schlkopf1998; @Ceriotti2011; @McInnes2018] can reduce the number of relevant
+dimensions to a handful, creating 2D or 3D maps of the full database.
 
 ![The Qm7b database [@Montavon2013] visualized with chemiscope](screenshot.png)
 
@@ -55,11 +56,11 @@ point corresponds to a chemical entity. The axes, color, and style of each point
 can be set to represent a property or a structural descriptor to visualize
 structure-property relations directly. Structural descriptors are not computed
 directly by chemiscope, but must be obtained from one of the many codes
-implementing such descriptors [@librascal; @QUIP]. Since the most common
+implementing general-purpose atomic representation [@librascal; @QUIP] or more specialized descriptors. Since the most common
 descriptors can be very high dimensional, it can be convenient to apply a
 dimensionality reduction algorithm that maps them to a lower-dimensional space
 for easier visualization. For example the sketch-map algorithm [@Ceriotti2011]
-was used with the Smooth Overlap of Atomic Positions descriptor [@Bartok2013] to
+was used with the Smooth Overlap of Atomic Positions representation [@Bartok2013] to
 generate the visualization in Figure 1. The right panel displays the
 three-dimensional structure of the chemical entities, possibly including
 periodic repetition for crystals. Visualizing the chemical structure can help to
@@ -99,14 +100,14 @@ slower, while still handling 100k points easily.
 The use of web technologies makes chemiscope usable from different operating
 systems without the need to develop, maintain and package the code for each
 operating system. It also means that we can provide an online service at
-http://chemiscope.org allowing users to visualize their own dataset without
-any local installation. Chemiscope is implemented as a library of re-usable
+http://chemiscope.org allowing users to visualize their own dataset without any
+local installation. Chemiscope is implemented as a library of re-usable
 components linked together via callbacks. This makes it easy to modify the
 default interface to generate more elaborate visualizations, for example,
 displaying multiple maps generated with different parameters of a dimensionality
 reduction algorithm. Chemiscope can also be distributed in a standalone mode,
 where the code and a predefined dataset are merged together as a single HTML
-file. This standalone mode is useful for archival purposes, for example, as
+file. This standalone mode is useful for archival purposes, for example as
 supplementary information for a published article and for use in corporate
 environments with sensitive datasets.
 