@@ -25,25 +25,26 @@ bibliography: paper.bib

# Summary

- The number of materials or molecules that can be created by combining
- different chemical elements in various proportions and spatial arrangements is
- enormous. Computational chemistry can be used to generate databases containing
- billions of potential structures [@Ruddigkeit2012], and predict some of the
- associated properties [@Montavon2013; @Ramakrishnan2014]. Unfortunately, the
- very large number of structures makes exploring such database — to understand
+ The number of materials or molecules that can be created by combining different
+ chemical elements in various proportions and spatial arrangements is enormous.
+ Computational chemistry can be used to generate databases containing billions of
+ potential structures [@Ruddigkeit2012], and predict some of the associated
+ properties [@Montavon2013; @Ramakrishnan2014]. Unfortunately, the very large
+ number of structures makes exploring such databases — to understand
structure-property relations or find the *best* structure for a given
- application — a daunting task. In the recent years, multiple molecular
- *descriptors* [@Behler2007; @Bartok2013; @Willatt2019] have been developed to
- compute structural similarities between materials or molecules, incorporating
- physically-relevant information and symmetries. These descriptors can be used
- for unsupervised machine learning applications, such as clustering or
- classification of the different structures, and high-throughput screening of
- database for specific properties [@Maier2007; @De2017; @Hautier2019].
- Unfortunately, the dimensionality of most descriptors is very high, which makes
- the resulting classifications, clustering or mapping very hard to visualize.
- Additional dimensionality reduction algorithm [@Schlkopf1998; @Ceriotti2011;
- @McInnes2018] can reduce the number of relevant dimensions to a handful,
- creating 2D or 3D maps of the full database.
+ application — a daunting task. In recent years, multiple molecular
+ *representations* [@Behler2007; @Bartok2013; @Willatt2019] have been developed
+ to compute structural similarities between materials or molecules, incorporating
+ physically relevant information and symmetries. The features associated with
+ these representations can be used for unsupervised machine learning
+ applications, such as clustering or classification of the different structures,
+ and high-throughput screening of databases for specific properties [@Maier2007;
+ @De2017; @Hautier2019]. Unfortunately, the dimensionality of these features (as
+ well as of most other descriptors used in chemical and materials informatics) is
+ very high, which makes the resulting classifications, clustering or mapping very
+ hard to visualize. Additional dimensionality reduction algorithms
+ [@Schlkopf1998; @Ceriotti2011; @McInnes2018] can reduce the number of relevant
+ dimensions to a handful, creating 2D or 3D maps of the full database.

![The Qm7b database [@Montavon2013] visualized with chemiscope](screenshot.png)

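To make the dimensionality-reduction step above concrete, here is a minimal NumPy sketch of kernel PCA, the approach of [@Schlkopf1998], turning a high-dimensional feature matrix into 2D map coordinates. It is an illustration only and not part of chemiscope. The Gaussian kernel, the `gamma` value and the random feature matrix are assumptions that stand in for real per-structure representations such as SOAP.

```python
import numpy as np


def rbf_kernel(features: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian (RBF) kernel between all pairs of rows of `features`."""
    sq_norms = np.sum(features**2, axis=1)
    # Squared Euclidean distances; clip small negative values from round-off.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-gamma * sq_dists)


def kernel_pca(features: np.ndarray, n_components: int = 2,
               gamma: float = 1.0) -> np.ndarray:
    """Project every structure onto the leading kernel principal components."""
    n = features.shape[0]
    kernel = rbf_kernel(features, gamma)
    # Center the kernel in feature space: K_c = H K H with H = I - 1/n.
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    centered = centering @ kernel @ centering
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Training-point projections: K_c v_k / sqrt(lambda_k) = sqrt(lambda_k) v_k.
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for features: 200 structures, 500 dimensions each.
    features = rng.normal(size=(200, 500))
    coords = kernel_pca(features, n_components=2, gamma=1.0 / features.shape[1])
    print(coords.shape)  # (200, 2)
```

The two output columns could then be used as the x and y axes of the map; a non-linear method such as sketch-map [@Ceriotti2011] or UMAP [@McInnes2018] would be swapped in at the same point of the workflow.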
@@ -55,11 +56,11 @@ point corresponds to a chemical entity. The axes, color, and style of each point
can be set to represent a property or a structural descriptor to visualize
structure-property relations directly. Structural descriptors are not computed
directly by chemiscope, but must be obtained from one of the many codes
- implementing such descriptors [@librascal; @QUIP]. Since the most common
+ implementing general-purpose atomic representations [@librascal; @QUIP] or more specialized descriptors. Since the most common
descriptors can be very high-dimensional, it can be convenient to apply a
dimensionality reduction algorithm that maps them to a lower-dimensional space
for easier visualization. For example, the sketch-map algorithm [@Ceriotti2011]
- was used with the Smooth Overlap of Atomic Positions descriptor [@Bartok2013] to
+ was used with the Smooth Overlap of Atomic Positions representation [@Bartok2013] to
generate the visualization in Figure 1. The right panel displays the
three-dimensional structure of the chemical entities, possibly including
periodic repetition for crystals. Visualizing the chemical structure can help to
@@ -99,14 +100,14 @@ slower, while still handling 100k points easily.
The use of web technologies makes chemiscope usable from different operating
systems without the need to develop, maintain and package the code for each
operating system. It also means that we can provide an online service at
- http://chemiscope.org allowing users to visualize their own dataset without
- any local installation. Chemiscope is implemented as a library of re-usable
+ http://chemiscope.org allowing users to visualize their own dataset without any
+ local installation. Chemiscope is implemented as a library of re-usable
components linked together via callbacks. This makes it easy to modify the
default interface to generate more elaborate visualizations, for example,
displaying multiple maps generated with different parameters of a dimensionality
reduction algorithm. Chemiscope can also be distributed in a standalone mode,
where the code and a predefined dataset are merged together as a single HTML
- file. This standalone mode is useful for archival purposes, for example, as
+ file. This standalone mode is useful for archival purposes, for example as
supplementary information for a published article, and for use in corporate
environments with sensitive datasets.