Commit d3f2287 (parent 3863356)

docs: add doc describing manual memory management

1 file changed: docs/memory.md (+84, -0)
# Memory Management

## Background

Pebble has two significant sources of memory usage: MemTables and the
Block Cache. MemTables buffer data that has been written to the WAL
but not yet flushed to an SSTable. The Block Cache provides a cache of
uncompressed SSTable data blocks.

Originally, Pebble used regular Go memory allocation for the memory
backing both MemTables and the Block Cache. This was problematic as it
put significant pressure on the Go GC. The higher the bandwidth of
memory allocations, the more work the GC has to do to reclaim the
memory. In order to lessen the pressure on the Go GC, an "allocation
cache" was introduced to the Block Cache which allowed reusing the
memory backing cached blocks in most circumstances. This produced a
dramatic reduction in GC pressure and a measurable performance
improvement in CockroachDB workloads.
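
The following is a small sketch of that idea, not the actual Block Cache
code (the names and the fixed free-list size are illustrative): recently
released block buffers are kept on a free list and handed back out for
subsequent allocations, which reduces the allocation bandwidth the GC has
to keep up with.

```go
package cache

// allocCache is an illustrative allocation cache: it retains recently
// released block buffers and reuses them for later allocations instead of
// allocating fresh memory each time.
type allocCache struct {
	free [][]byte // recently released buffers
}

// maxCached bounds how many buffers are retained for reuse.
const maxCached = 16

// alloc returns a buffer of length n, reusing a cached buffer when one is
// large enough.
func (c *allocCache) alloc(n int) []byte {
	for i, b := range c.free {
		if cap(b) >= n {
			// Remove the buffer from the free list and hand it back out.
			c.free = append(c.free[:i], c.free[i+1:]...)
			return b[:n]
		}
	}
	return make([]byte, n)
}

// release makes a no-longer-needed buffer available for reuse.
func (c *allocCache) release(b []byte) {
	if len(c.free) < maxCached {
		c.free = append(c.free, b)
	}
}
```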

Unfortunately, the use of Go allocated memory still caused a
problem. CockroachDB running on top of Pebble often resulted in an RSS
(resident set size) 2x what it was when using RocksDB. This effect is
due to the Go runtime's heuristic for triggering GC:

> A collection is triggered when the ratio of freshly allocated data
> to live data remaining after the previous collection reaches this
> percentage.

This percentage can be configured by the `GOGC` environment variable
or by calling `debug.SetGCPercent`. The default value is `100`, which
means that GC is triggered when the freshly allocated data is equal to
the amount of live data at the end of the last collection period. This
generally works well in practice, but the Pebble Block Cache is often
configured to be 10s of gigabytes in size. Waiting for 10s of
gigabytes of data to be allocated before triggering a GC results in
very large Go heap sizes.
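
To make the effect concrete, here is a small sketch (not part of Pebble;
the 20 GiB figure is hypothetical) of the arithmetic behind the trigger,
and of lowering the ratio via `debug.SetGCPercent`:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// With the default GOGC=100, the next collection is triggered once the
	// freshly allocated data equals the live heap left over from the
	// previous collection. If a Block Cache pins ~20 GiB of live data on
	// the Go heap, the heap grows to roughly 40 GiB before a GC starts.
	const liveHeapGiB = 20.0
	const gogc = 100.0
	fmt.Printf("GC triggers near %.0f GiB\n", liveHeapGiB*(1+gogc/100))

	// The ratio can be lowered at runtime; this is equivalent to setting
	// the GOGC environment variable. SetGCPercent returns the previous
	// value.
	prev := debug.SetGCPercent(20)
	fmt.Println("previous GOGC:", prev)
}
```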

## Manual Memory Management

Attempting to adjust `GOGC` to account for the significant amount of
memory used by the Block Cache is fraught. What value should be used?
`10%`? `20%`? Should the setting be tuned dynamically? Rather than
introducing a heuristic which may have cascading effects on the
application using Pebble, we decided to move the Block Cache and
MemTable memory out of the Go heap. This is done by using the C memory
allocator, though it could also be done by providing a simple memory
allocator in Go which uses `mmap` to allocate memory.
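
A sketch of the C allocator approach is shown below; it is illustrative
only (the `manualalloc` package name and the `New`/`Free` signatures are
assumptions, not Pebble's actual allocator): buffers are obtained from the
C heap via cgo, so they never appear on the Go heap, and they must be
explicitly freed.

```go
// Package manualalloc is an illustrative sketch of allocating buffers with
// the C memory allocator so that they live outside the Go heap.
package manualalloc

// #include <stdlib.h>
import "C"

import "unsafe"

// New returns an n-byte, zeroed buffer backed by C-allocated memory. The Go
// GC never sees this memory, so the caller must call Free exactly once.
func New(n int) []byte {
	if n == 0 {
		return nil
	}
	ptr := C.calloc(C.size_t(n), C.size_t(1))
	if ptr == nil {
		panic("manualalloc: out of memory")
	}
	// Wrap the C pointer in a Go slice header without copying.
	return unsafe.Slice((*byte)(ptr), n)
}

// Free returns a buffer obtained from New to the C allocator.
func Free(b []byte) {
	if len(b) == 0 {
		return
	}
	C.free(unsafe.Pointer(&b[0]))
}
```

Because the Go GC never sees this memory, a buffer whose last reference is
dropped without an explicit free is leaked permanently, which is what makes
the precise lifetime tracking described below necessary.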

In order to support manual memory management for the Block Cache and
MemTables, Pebble needs to precisely track their lifetime. This was
already being done for the MemTable in order to account for its memory
usage in metrics. It was mostly being done for the Block Cache: values
stored in the Block Cache are reference counted and are returned to
the "alloc cache" when their reference count falls to 0. Unfortunately,
this tracking wasn't precise and there were numerous cases where the
cache values were being leaked. This was acceptable in a world where
the Go GC would clean up after us. It is unacceptable if the leak
becomes permanent.
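
The following sketch illustrates the reference counting scheme; the field
and method names are illustrative rather than Pebble's exact API:

```go
package cache

import "sync/atomic"

// Value is an illustrative reference-counted cache value backed by manually
// managed memory.
type Value struct {
	buf  []byte // manually allocated memory holding the block
	refs atomic.Int32
}

func newValue(buf []byte) *Value {
	v := &Value{buf: buf}
	v.refs.Store(1) // the creator holds the initial reference
	return v
}

// acquire adds a reference, e.g. when a reader obtains the value from the
// cache.
func (v *Value) acquire() {
	v.refs.Add(1)
}

// release drops a reference. When the count reaches zero, the backing
// buffer is handed back to the allocator (or to the alloc cache for reuse).
// Losing a *Value without calling release leaks the buffer; freeing it
// while references remain would be a use-after-free.
func (v *Value) release() {
	if v.refs.Add(-1) == 0 {
		freeBuffer(v.buf)
		v.buf = nil
	}
}

// freeBuffer stands in for returning memory to the manual allocator.
func freeBuffer(b []byte) {}
```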

## Leak Detection

In order to find all of the cache value leaks, Pebble has a leak
detection facility built on top of
[`runtime.SetFinalizer`](https://golang.org/pkg/runtime/#SetFinalizer). A
finalizer is a function associated with an object which is run when
the object is no longer reachable. On the surface, this sounds perfect
as a facility for performing all memory reclamation. Unfortunately,
finalizers are generally frowned upon by the Go implementors, and come
with very loose guarantees:

> The finalizer is scheduled to run at some arbitrary time after the
> program can no longer reach the object to which obj points. There is
> no guarantee that finalizers will run before a program exits, so
> typically they are useful only for releasing non-memory resources
> associated with an object during a long-running program.

This language is somewhat frightening, but in practice finalizers are
run at the end of every GC period. Pebble does not use finalizers for
correctness, but instead uses them for its leak detection facility. In
the block cache, a finalizer is associated with the Go allocated
`cache.Value` object. When the finalizer is run, it checks that the
buffer backing the `cache.Value` has been freed. This leak detection
facility is enabled by the `"invariants"` build tag, which is enabled
by the Pebble unit tests.
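
A sketch of the idea, building on the illustrative `Value` type above
(again, an assumption about the shape of the code rather than Pebble's
exact implementation):

```go
//go:build invariants

package cache

import (
	"fmt"
	"os"
	"runtime"
)

// checkLeaks attaches a finalizer to a Go-allocated Value. The finalizer
// runs some time after the Value becomes unreachable, typically at the end
// of a GC cycle, and only reports the bug; freeing the buffer remains the
// job of release.
func checkLeaks(v *Value) {
	runtime.SetFinalizer(v, func(v *Value) {
		if v.buf != nil {
			fmt.Fprintln(os.Stderr, "leaked cache value: buffer was never freed")
			os.Exit(1)
		}
	})
}
```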
