# Memory Management

## Background

Pebble has two significant sources of memory usage: MemTables and the
Block Cache. MemTables buffer data that has been written to the WAL
but not yet flushed to an SSTable. The Block Cache provides a cache of
uncompressed SSTable data blocks.

Originally, Pebble used regular Go memory allocation for the memory
backing both MemTables and the Block Cache. This was problematic as it
put significant pressure on the Go GC: the higher the bandwidth of
memory allocations, the more work the GC has to do to reclaim the
memory. In order to lessen the pressure on the Go GC, an "allocation
cache" was introduced to the Block Cache which allowed reusing the
memory backing cached blocks in most circumstances. This produced a
dramatic reduction in GC pressure and a measurable performance
improvement in CockroachDB workloads.
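
The idea is sketched below. This is an illustrative free list, not
Pebble's actual allocation cache (the real one is more careful about
sharding and bounding the memory it retains), but it shows why reuse
helps: buffers are recycled instead of being handed back to the Go
allocator, so the GC sees far fewer large allocations.

```go
package cache

// allocCache is an illustrative free list of block buffers.
type allocCache struct {
	bufs [][]byte
}

// alloc returns a cached buffer with sufficient capacity if one
// exists, falling back to a fresh allocation.
func (c *allocCache) alloc(n int) []byte {
	for i, b := range c.bufs {
		if cap(b) >= n {
			last := len(c.bufs) - 1
			c.bufs[i] = c.bufs[last]
			c.bufs = c.bufs[:last]
			return b[:n]
		}
	}
	return make([]byte, n)
}

// free makes a buffer available for reuse. The bound keeps this
// sketch from retaining memory without limit.
func (c *allocCache) free(b []byte) {
	if len(c.bufs) < 16 {
		c.bufs = append(c.bufs, b)
	}
}
```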

Unfortunately, the use of Go-allocated memory still caused a
problem. CockroachDB running on top of Pebble often resulted in an RSS
(resident set size) 2x what it was when using RocksDB. This effect is
due to the Go runtime's heuristic for triggering GC:

> A collection is triggered when the ratio of freshly allocated data
> to live data remaining after the previous collection reaches this
> percentage.

This percentage can be configured by the `GOGC` environment variable
or by calling `debug.SetGCPercent`. The default value is `100`, which
means that GC is triggered when the freshly allocated data is equal to
the amount of live data at the end of the last collection period. This
generally works well in practice, but the Pebble Block Cache is often
configured to be 10s of gigabytes in size. Waiting for 10s of
gigabytes of data to be allocated before triggering a GC results in
very large Go heap sizes.
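
To make the heuristic concrete, here is a minimal (non-Pebble) example
of adjusting it programmatically; `debug.SetGCPercent(20)` has the same
effect as running the process with `GOGC=20`, and the value 20 is
arbitrary rather than a recommendation:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Trigger a collection once freshly allocated data reaches 20% of
	// the live heap remaining after the previous collection, instead
	// of the default 100%.
	old := debug.SetGCPercent(20)
	fmt.Printf("previous GOGC setting: %d\n", old)
}
```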

## Manual Memory Management

Attempting to adjust `GOGC` to account for the significant amount of
memory used by the Block Cache is fraught. What value should be used?
`10%`? `20%`? Should the setting be tuned dynamically? Rather than
introducing a heuristic which may have cascading effects on the
application using Pebble, we decided to move the Block Cache and
MemTable memory out of the Go heap. This is done by using the C memory
allocator, though it could also be done by providing a simple memory
allocator in Go which uses `mmap` to allocate memory.
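
A minimal sketch of the cgo approach is shown below. The package and
function names (`manual.New`, `manual.Free`) are illustrative rather
than Pebble's exact API; an `mmap`-based allocator could expose the
same interface.

```go
// Package manual is an illustrative allocator for memory that lives
// outside the Go heap and is therefore invisible to the Go GC.
package manual

// #include <stdlib.h>
import "C"
import "unsafe"

// New returns an n-byte buffer allocated with the C allocator. The Go
// GC will never reclaim it; the caller must eventually call Free.
func New(n int) []byte {
	if n == 0 {
		return nil
	}
	ptr := C.calloc(C.size_t(n), 1)
	if ptr == nil {
		panic("out of memory")
	}
	return unsafe.Slice((*byte)(ptr), n)
}

// Free returns a buffer previously obtained from New to the C
// allocator.
func Free(b []byte) {
	if cap(b) == 0 {
		return
	}
	C.free(unsafe.Pointer(&b[:cap(b)][0]))
}
```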

In order to support manual memory management for the Block Cache and
MemTables, Pebble needs to precisely track their lifetime. This was
already being done for the MemTable in order to account for its memory
usage in metrics. It was mostly, but not completely, being done for
the Block Cache: values stored in the Block Cache are reference
counted and are returned to the "alloc cache" when their reference
count falls to 0. Unfortunately, this tracking wasn't precise and
there were numerous cases where cache values were being leaked. This
was acceptable in a world where the Go GC would clean up after us. It
is unacceptable if the leak becomes permanent.
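
The shape of that reference counting is roughly the following. This is
a sketch rather than Pebble's actual `cache.Value` definition, and it
leans on the illustrative `manual.New`/`manual.Free` functions from
the previous sketch:

```go
package cache

import (
	"sync/atomic"

	// Hypothetical import path for the illustrative allocator sketched
	// in the previous section.
	"example.com/pebble-sketch/manual"
)

// Value is a reference-counted cache value whose buffer lives outside
// the Go heap.
type Value struct {
	buf  []byte
	refs atomic.Int32
}

// newValue allocates a value with a reference count of 1, owned by the
// caller.
func newValue(n int) *Value {
	v := &Value{buf: manual.New(n)}
	v.refs.Store(1)
	return v
}

// acquire adds a reference; every acquire must be paired with a
// release.
func (v *Value) acquire() {
	v.refs.Add(1)
}

// release drops a reference. When the count reaches zero the buffer is
// handed back to the allocator (or pushed onto an alloc cache for
// reuse). A missing release now leaks the buffer permanently, since no
// GC will ever reclaim it.
func (v *Value) release() {
	if v.refs.Add(-1) == 0 {
		manual.Free(v.buf)
		v.buf = nil
	}
}
```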

## Leak Detection

In order to find all of the cache value leaks, Pebble has a leak
detection facility built on top of
[`runtime.SetFinalizer`](https://golang.org/pkg/runtime/#SetFinalizer). A
finalizer is a function associated with an object which is run when
the object is no longer reachable. On the surface, this sounds perfect
as a facility for performing all memory reclamation. Unfortunately,
finalizers are generally frowned upon by the Go implementors, and come
with very loose guarantees:

> The finalizer is scheduled to run at some arbitrary time after the
> program can no longer reach the object to which obj points. There is
> no guarantee that finalizers will run before a program exits, so
> typically they are useful only for releasing non-memory resources
> associated with an object during a long-running program.

This language is somewhat frightening, but in practice finalizers are
run at the end of every GC cycle. Pebble does not use finalizers for
correctness, but instead uses them for its leak detection facility. In
the block cache, a finalizer is associated with the Go-allocated
`cache.Value` object. When the finalizer is run, it checks that the
buffer backing the `cache.Value` has been freed. This leak detection
facility is enabled by the `"invariants"` build tag, which is enabled
by the Pebble unit tests.
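
A sketch of how such a check can be wired up is below. The function
name and error handling are illustrative (Pebble's real code differs
in detail), and the `Value` type and `newValue` constructor are the
ones from the earlier sketch, but the mechanism is the same: under the
`invariants` build tag, register a finalizer that verifies the buffer
was freed before the value became unreachable.

```go
//go:build invariants

package cache

import (
	"fmt"
	"os"
	"runtime"
)

// newValueWithLeakCheck wraps value construction with a finalizer that
// runs after the *Value becomes unreachable. If the out-of-heap buffer
// has not been freed by then, no release call can ever happen, so the
// memory has leaked permanently.
func newValueWithLeakCheck(n int) *Value {
	v := newValue(n)
	// The finalizer takes its own *Value parameter rather than closing
	// over v; capturing v would keep it reachable and the finalizer
	// would never run.
	runtime.SetFinalizer(v, func(v *Value) {
		if v.buf != nil {
			fmt.Fprintf(os.Stderr, "leaked cache value: %d bytes were never freed\n", cap(v.buf))
			os.Exit(1)
		}
	})
	return v
}
```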