Commit 12e95cc

Turn the persistence into multiple pages and...
...add some clarifications and cross references. ...make `immer-persist` be `immer::persist`.
1 parent 73be573 commit 12e95cc

File tree

7 files changed

+350
-289
lines changed

doc/index.rst

Lines changed: 9 additions & 1 deletion
@@ -17,13 +17,21 @@ Contents
1717
utilities
1818
memory
1919

20+
.. toctree::
21+
:caption: Persistence
22+
:maxdepth: 3
23+
24+
persist/introduction
25+
persist/serialization
26+
persist/transformation
27+
persist/reference
28+
2029
.. toctree::
2130
:caption: Experimental
2231
:maxdepth: 3
2332

2433
python
2534
guile
26-
persist
2735

2836
----
2937

doc/persist/introduction.rst

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
1+
Introduction
2+
===============
3+
4+
The ``immer::persist`` library persists persistent data structures,
5+
allowing the preservation of structural sharing of ``immer`` containers
6+
when serializing, deserializing or transforming the data.
7+
8+
9+
.. warning:: This library is still experimental and its API may
10+
change in the future. The headers can be found in
11+
``immer/extra/persist/...`` and the ``extra`` subpath will be
12+
removed once its interface stabilises.
13+
14+
Dependencies
15+
------------
16+
17+
In addition to the `dependencies <introduction.html#dependencies>`_ of
18+
``immer``, this library makes use of **C++20**, `Boost.Hana
19+
<https://boostorg.github.io/hana/>`_, `fmt <https://fmt.dev/>`_ and
20+
`cereal <https://uscilab.github.io/cereal/>`_.
21+
22+
Why?
23+
---------
24+
25+
Preserving structural sharing on disk
26+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
27+
28+
*Structural sharing* allows ``immer`` containers to be efficient. At
29+
runtime, two distinct containers can be operated on independently but
30+
internally they share nodes and use memory efficiently in that
31+
way.
32+
33+
However, when such containers are serialized in a trivial form, for
34+
example, as JSON lists, this sharing is lost: they become truly
35+
independent---the same data is stored multiple times on disk, and
36+
later, when read back into memory, the program has lost the structural
37+
sharing.
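
The loss of sharing can be illustrated with a toy model. The sketch below does not use ``immer``; it assumes a hypothetical singly linked ``node`` type held by ``std::shared_ptr`` as a stand-in for the real internal trees. Two lists share three nodes in memory, yet a naive per-container output writes those elements twice.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Toy persistent list node (hypothetical; NOT immer's internal representation).
struct node {
    int value;
    std::shared_ptr<const node> next;
};
using list = std::shared_ptr<const node>;

// Prepend without copying: the new list shares every node of `tail`.
inline list cons(int value, list tail) {
    return std::make_shared<node>(node{value, std::move(tail)});
}

// Flatten one list into a JSON-like array, as a naive serializer would.
inline std::string to_json_array(const list& l) {
    std::string out = "[";
    for (const node* n = l.get(); n != nullptr; n = n->next.get()) {
        if (n != l.get()) out += ", ";
        out += std::to_string(n->value);
    }
    return out + "]";
}

// Count how many nodes two lists physically share in memory.
inline std::size_t shared_nodes(const list& a, const list& b) {
    std::size_t count = 0;
    for (const node* x = a.get(); x != nullptr; x = x->next.get())
        for (const node* y = b.get(); y != nullptr; y = y->next.get())
            if (x == y) ++count;
    return count;
}

// Usage: v2 is derived from v1, so the flat output repeats v1's elements.
inline void to_json_example() {
    list v1 = cons(1, cons(2, cons(3, nullptr)));
    list v2 = cons(0, v1);
    assert(shared_nodes(v1, v2) == 3);            // three nodes shared in memory
    assert(to_json_array(v1) == "[1, 2, 3]");
    assert(to_json_array(v2) == "[0, 1, 2, 3]");  // 1, 2, 3 written a second time
}
```

Reading the two arrays back would produce two unrelated lists, because nothing in the flat output records that the nodes were shared.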
38+
39+
This library operates on the internal structure of ``immer``
40+
containers: allowing it to be serialized, deserialized and
41+
transformed. This enables more efficient storage, particularly when
42+
many nodes are reused, and, even more importantly, preserving
43+
structural sharing after deserializing the containers.
44+
45+
46+
Transforming data with structural sharing
47+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
48+
49+
Consider a scenario where you have multiple
50+
``immer::vector<std::string>``, where the various instances are
51+
derived from one another. Some of these vectors would be completely
52+
identical, while others would have just a few elements different. This
53+
scenario is not uncommon, for example, when `implementing the undo
54+
history of an application by preserving the previous
55+
states <https://sinusoid.es/lager/modularity.html#genericity>`_.
56+
57+
The goal is to apply a transformation function to these vectors with
58+
something like ``std::transform``.
59+
60+
A direct approach would be to take each vector and create a new vector
61+
by applying the transformation function for each element. However,
62+
after this process, all the structural sharing of the original
63+
containers would be lost---the result would be multiple independent
64+
vectors without any structural sharing, and the transformation may
65+
have been applied unnecessarily multiple times to identical elements
66+
that were previously shared.
67+
68+
This library enables the application of the transformation function
69+
directly on the nodes, preserving structural sharing. Additionally,
70+
regardless of how many times a node is reused, the transformation
71+
needs to be performed only once.
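
As a rough illustration of the idea (not the library's transformation API), the sketch below converts a toy linked list of ``int`` into a list of ``std::string`` node by node. The types ``int_node``, ``str_node`` and ``node_transformer`` are hypothetical names invented for this example. A map from already visited nodes to their results ensures that a shared node is converted once and that the outputs share nodes just like the inputs did.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Toy persistent list nodes (hypothetical; NOT immer's internal trees).
struct int_node { int value; std::shared_ptr<const int_node> next; };
struct str_node { std::string value; std::shared_ptr<const str_node> next; };
using int_list = std::shared_ptr<const int_node>;
using str_list = std::shared_ptr<const str_node>;

inline int_list cons(int value, int_list tail) {
    return std::make_shared<int_node>(int_node{value, std::move(tail)});
}

// Transforms lists node by node, remembering every node already converted.
// The input lists must outlive the transformer, since nodes are keyed by address.
class node_transformer {
public:
    explicit node_transformer(std::function<std::string(int)> fn)
        : fn_(std::move(fn)) {}

    str_list operator()(const int_list& l) {
        if (!l) return nullptr;
        auto it = done_.find(l.get());
        if (it != done_.end()) return it->second;  // shared node: reuse the result
        str_list tail = (*this)(l->next);          // convert the tail first
        ++calls_;                                  // one call of fn_ per distinct node
        str_list out =
            std::make_shared<str_node>(str_node{fn_(l->value), std::move(tail)});
        done_.emplace(l.get(), out);
        return out;
    }

    int calls() const { return calls_; }

private:
    std::function<std::string(int)> fn_;
    std::unordered_map<const int_node*, str_list> done_;
    int calls_ = 0;
};

// Usage: two lists with three shared nodes need four conversions, not seven.
inline void transform_example() {
    int_list v1 = cons(1, cons(2, cons(3, nullptr)));
    int_list v2 = cons(0, v1);
    node_transformer stringify([](int x) { return std::to_string(x); });
    str_list s1 = stringify(v1);
    str_list s2 = stringify(v2);
    assert(stringify.calls() == 4);  // 3 nodes of v1 + 1 extra node of v2
    assert(s2->next == s1);          // sharing survives the transformation
    assert(s2->value == "0" && s1->value == "1");
}
```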
72+
73+
.. _pools:
74+
How?
75+
------
76+
77+
To solve this problem, this library introduces the notion of a *pool*.
78+
79+
A **pool** represents a *set* of ``immer`` containers of a specific
80+
type. For example, we may have a pool that contains all
81+
``immer::vector<int>`` of our document. You can think of it as a small
82+
database of ``immer`` containers. When serializing the pool, the
83+
internal structure of all those ``immer`` containers is written as
84+
a whole, preserving the structural sharing between those containers.
85+
86+
Note that for the most part, the user of the library is not concerned
87+
with pools, as they are generated automatically from your
88+
data-structures. However, you may become aware of them in the JSON
89+
output or when transforming recursive data structures.
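
To make the notion more concrete, here is a minimal sketch of a pool for a toy linked list. It is an illustration under stated assumptions, not how ``immer::persist`` is implemented: ``toy_pool`` and ``entry`` are hypothetical names, and real ``immer`` containers are trees rather than lists. Every distinct node receives one identifier and is stored once, and each container is represented by the identifier of its root node.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy persistent list node (hypothetical; NOT immer's internal representation).
struct node {
    int value;
    std::shared_ptr<const node> next;
};
using list = std::shared_ptr<const node>;

inline list cons(int value, list tail) {
    return std::make_shared<node>(node{value, std::move(tail)});
}

// One stored node: its value and the id of the next node (-1 marks the end).
struct entry {
    int value;
    int next_id;
};

// A toy "pool": a small database in which every distinct node appears once.
class toy_pool {
public:
    // Registers all nodes of `l` and returns the id that stands in for it.
    int add(const list& l) {
        if (!l) return -1;
        auto it = ids_.find(l.get());
        if (it != ids_.end()) return it->second;  // node already in the pool
        int next_id = add(l->next);
        int id = static_cast<int>(entries_.size());
        entries_.push_back(entry{l->value, next_id});
        ids_.emplace(l.get(), id);
        keep_alive_.push_back(l);  // keep the node alive so its address stays unique
        return id;
    }

    const std::vector<entry>& entries() const { return entries_; }

private:
    std::unordered_map<const node*, int> ids_;
    std::vector<entry> entries_;
    std::vector<list> keep_alive_;
};

// Usage: seven elements across two lists, but only four nodes in the pool.
inline void pool_example() {
    list v1 = cons(1, cons(2, cons(3, nullptr)));
    list v2 = cons(0, v1);
    toy_pool pool;
    int id1 = pool.add(v1);
    int id2 = pool.add(v2);
    assert(id1 == 2 && id2 == 3);
    assert(pool.entries().size() == 4);
    assert(pool.entries()[3].next_id == id1);  // v2 points into v1's nodes
}
```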

doc/persist/reference.rst

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
1+
Reference
2+
=========
3+
4+
These are the interfaces of the various parts of the
5+
``immer::persist`` library.
6+
7+
Policy
8+
------
9+
10+
.. doxygengroup:: Persist-policy
11+
:project: immer
12+
:content-only:
13+
14+
15+
API Overview
16+
------------
17+
18+
.. doxygengroup:: persist-api
19+
:project: immer
20+
:content-only:
21+
22+
23+
Transform API
24+
---------------
25+
26+
.. doxygengroup:: Persist-transform
27+
:project: immer
28+
:content-only:
29+
30+
31+
Exceptions
32+
----------
33+
34+
.. doxygengroup:: Persist-exceptions
35+
:project: immer
36+
:content-only:

doc/persist/serialization.rst

Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
1+
Serialization
2+
=============
3+
4+
Serializing your data structures using ``immer::persist`` allows you
5+
to preserve the *structural sharing* across sessions of your application.
6+
7+
This has multiple practical use cases, like storing the undo history
8+
or the clipboard of a complex application, or applying advanced
9+
logging techniques.
10+
11+
The library serializes multiple containers together via the notion of
12+
a :ref:`pool<pools>`. These pools are produced automatically and
13+
represent in the JSON the internal structure (trees) that implement
14+
the Immer containers.
15+
16+
Example
17+
-------
18+
.. _first-example:
19+
20+
For this example, we'll use a ``document`` type that contains two
21+
``immer`` vectors.
22+
23+
.. literalinclude:: ../../test/extra/persist/test_for_docs.cpp
24+
:language: c++
25+
:start-after: intro/start-types
26+
:end-before: intro/end-types
27+
28+
Let's say we have two vectors ``v1`` and ``v2``, where ``v2`` is
29+
derived from ``v1`` so that it shares data with it:
30+
31+
.. literalinclude:: ../../test/extra/persist/test_for_docs.cpp
32+
:language: c++
33+
:start-after: intro/start-prepare-value
34+
:end-before: intro/end-prepare-value
35+
36+
We can serialize the document using ``cereal`` with this:
37+
38+
.. literalinclude:: ../../test/extra/persist/test_for_docs.cpp
39+
:language: c++
40+
:start-after: intro/start-serialize-with-cereal
41+
:end-before: intro/end-serialize-with-cereal
42+
43+
Generating a JSON like this one:
44+
45+
.. code-block:: json
46+
47+
{"value0": {"ints": [1, 2, 3], "ints2": [1, 2, 3, 4, 5, 6]}}
48+
49+
As you can see, ``ints`` and ``ints2`` contain the full linearization
50+
of each vector. The structural sharing between these two data
51+
structures is not represented in its serialized form.
52+
53+
Using pools
54+
-----------
55+
56+
First, let's make the ``document`` struct compatible with
57+
``boost::hana``. This way, the ``persist`` library can automatically
58+
determine which :ref:`pool<pools>` types are needed and name the
59+
pools.
60+
61+
.. literalinclude:: ../../test/extra/persist/test_for_docs.cpp
62+
:language: c++
63+
:start-after: intro/start-adapt-document-for-hana
64+
:end-before: intro/end-adapt-document-for-hana
65+
66+
Then using ``immer::persist`` we can serialize it with:
67+
68+
.. literalinclude:: ../../test/extra/persist/test_for_docs.cpp
69+
:language: c++
70+
:start-after: intro/start-serialize-with-persist
71+
:end-before: intro/end-serialize-with-persist
72+
73+
Which generates some JSON like this:
74+
75+
.. literalinclude:: ../../test/extra/persist/test_for_docs.cpp
76+
:language: c++
77+
:start-after: include:intro/start-persist-json
78+
:end-before: include:intro/end-persist-json
79+
80+
As you can see, the value is serialized with every ``immer`` container
81+
replaced by an identifier. This identifier is a key into a
82+
:ref:`pool<pools>`, which is serialized just after.
83+
84+
.. note::
85+
Currently, ``immer::persist`` makes a distinction between
86+
pools used for saving containers (*output* pools) and for loading
87+
containers (*input* pools), similar to ``cereal`` with its
88+
``InputArchive`` and ``OutputArchive`` distinction.
89+
90+
Currently, ``immer::persist`` focuses on JSON as the serialization
91+
format and uses the ``cereal`` library internally. In principle, other
92+
formats and serialization libraries could be supported in the future.
93+
94+
95+
You can see in the output that the nodes of the trees that make up the
96+
``immer`` containers are directly represented in the JSON and, because
97+
we are representing all the containers as a whole, those nodes that
98+
are referenced in multiple trees can be stored only once. That same
99+
structure is preserved when reading the pool back from disk and
100+
reconstructing the vectors (and other containers) from it, thus
101+
allowing us to preserve the structural sharing across sessions.
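
The loading direction can be sketched in the same spirit. The code below is a toy model, not the library's loader: it assumes a hypothetical flat table of ``entry`` records (value plus the id of the next node, with ``-1`` marking the end, and no cycles) and rebuilds each node at most once, so containers that referenced the same ids share memory again after loading.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy persistent list node (hypothetical; NOT immer's internal representation).
struct node {
    int value;
    std::shared_ptr<const node> next;
};
using list = std::shared_ptr<const node>;

// One stored node: its value and the id of the next node (-1 marks the end).
struct entry {
    int value;
    int next_id;
};

// Rebuilds lists from a pool-like table, creating each node at most once.
// Assumes the table is acyclic, as a table produced from real lists would be.
class toy_loader {
public:
    explicit toy_loader(std::vector<entry> entries)
        : entries_(std::move(entries)), nodes_(entries_.size()) {}

    list get(int id) {
        if (id < 0) return nullptr;
        auto idx = static_cast<std::size_t>(id);
        assert(idx < entries_.size());  // ids must refer to stored entries
        if (!nodes_[idx]) {
            const entry& e = entries_[idx];
            nodes_[idx] = std::make_shared<node>(node{e.value, get(e.next_id)});
        }
        return nodes_[idx];  // the same id always yields the same node
    }

private:
    std::vector<entry> entries_;
    std::vector<list> nodes_;  // cache of nodes already rebuilt, indexed by id
};

// Usage: ids 2 and 3 stand for two lists where the second extends the first.
inline void loader_example() {
    toy_loader loader({{3, -1}, {2, 0}, {1, 1}, {0, 2}});
    list v1 = loader.get(2);
    list v2 = loader.get(3);
    assert(v1->value == 1 && v1->next->next->value == 3);
    assert(v2->value == 0);
    assert(v2->next == v1);  // structural sharing is restored
}
```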
102+
103+
Custom policies
104+
---------------
105+
106+
We can use a policy to control the names of the pools for each container.
107+
108+
For this example, let's define a new document type ``doc_2``. It will
109+
also contain another type ``extra_data`` with a ``vector`` of
110+
``strings`` in it. To demonstrate the responsibilities of the policy,
111+
the ``doc_2`` type will not be a ``boost::hana::Struct`` and will not
112+
allow for compile-time reflection.
113+
114+
.. literalinclude:: ../../test/extra/persist/test_for_docs.cpp
115+
:language: c++
116+
:start-after: include:start-doc_2-type
117+
:end-before: include:end-doc_2-type
118+
119+
We define the ``doc_2_policy`` as follows:
120+
121+
.. literalinclude:: ../../test/extra/persist/test_for_docs.cpp
122+
:language: c++
123+
:start-after: include:start-doc_2_policy
124+
:end-before: include:end-doc_2_policy
125+
126+
The ``get_pool_types`` function returns the types of containers that
127+
should be serialized with pools; in this case, both the ``vector`` of
128+
``ints`` and ``strings``. The ``save`` and ``load`` functions control
129+
the name of the document node, in this case it is ``doc2_value``. And
130+
the ``get_pool_name`` overloaded functions supply the name of the pool
131+
for each corresponding ``immer`` container. To create and serialize a
132+
value of ``doc_2``, you can use the following approach:
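
The way overloads select a pool name per container type can be shown in isolation. The following is a simplified stand-in, not the real policy interface (see the reference for that): ``toy_policy`` and ``pool_name_for`` are hypothetical, ``std::vector`` replaces the ``immer`` containers, and the returned names are made up for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a policy: one overload per container type.
// std::vector is used here only so that the sketch has no dependencies.
struct toy_policy {
    std::string get_pool_name(const std::vector<int>&) const {
        return "vector_of_ints";
    }
    std::string get_pool_name(const std::vector<std::string>&) const {
        return "vector_of_strings";
    }
};

// Generic code asks the policy for a name; overload resolution picks the
// function matching the container's static type.
template <typename Policy, typename Container>
std::string pool_name_for(const Policy& policy, const Container& container) {
    return policy.get_pool_name(container);
}

// Usage: each container type maps to its own pool name.
inline void policy_example() {
    toy_policy policy;
    std::vector<int> ints{1, 2, 3};
    std::vector<std::string> comments{"one", "two"};
    assert(pool_name_for(policy, ints) == "vector_of_ints");
    assert(pool_name_for(policy, comments) == "vector_of_strings");
}
```

A container type without a matching overload fails to compile in this sketch, which is one way a policy can make missing pool names visible early.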
133+
134+
.. literalinclude:: ../../test/extra/persist/test_for_docs.cpp
135+
:language: c++
136+
:start-after: include:start-doc_2-cereal_save_with_pools
137+
:end-before: include:end-doc_2-cereal_save_with_pools
138+
139+
The serialized JSON looks like this:
140+
141+
.. literalinclude:: ../../test/extra/persist/test_for_docs.cpp
142+
:language: c++
143+
:start-after: include:start-doc_2-json
144+
:end-before: include:end-doc_2-json
145+
146+
And it can also be loaded from JSON like this:
147+
148+
.. literalinclude:: ../../test/extra/persist/test_for_docs.cpp
149+
:language: c++
150+
:start-after: include:start-doc_2-load
151+
:end-before: include:end-doc_2-load
152+
153+
This example also demonstrates a scenario in which the main document
154+
type ``doc_2`` contains another type ``extra_data`` with a
155+
``vector``. As you can see in the resulting JSON, nested types are
156+
also serialized with pools: ``"extra": {"comments": 1}``. Only the ID
157+
of the ``comments`` ``vector`` is serialized instead of its content.
