Closed
Description
This is an issue to track the new layered store implementation.
The current branch: https://github.com/tomjridge/irmin/tree/2022-04-22_layers_rebased_on_3.2.0
Older branches:
- https://github.com/mirage/irmin/tree/2022_layered
- https://github.com/tomjridge/irmin/tree/2022-03-15_layers_rebased
A recent tezos branch, with additional code to trigger gc every so often, is here: https://github.com/tomjridge/tezos/tree/2022-03-14_layers
Victor's branch, to integrate layers into Tezos properly, is here: https://gitlab.com/nomadic-labs/tezos/-/tree/vicall@tomjridge@layered_store
Todo (additional entries to be added when discovered):
- Add clear documentation for the IO.Unix interface used by pack_store.ml, so it is possible to work out what the semantics are
  - see "Document IO_intf.ml" (#1758) and index's "Add docs for src/unix/raw.mli" (index#380)
- Implement external sorting and other external routines via mmaps
  - sorting
  - extent calculation
- Port/rework prototype code from https://github.com/tomjridge/sparse-file/tree/master/src into a subdirectory under irmin-pack
- Change the store pack file to use a control+objstore+suffix ("layers") rather than a plain file
  - X Identify the exact interface used by the pack_store
  - X Determine how to implement this interface on top of the layers
  - Implement a replacement IO, suitable for layers
- Implement the missing part of the worker: the calculation of reachable objects from a commit
- Implement a simple mechanism to trigger GC from a given commit
- Proper integration with irmin APIs
  - X how to trigger GC
  - how to properly compute reachability from a commit (still needs looking at; want to avoid use of create_reach.exe)
- Test, for example, by replaying some existing trace and periodically performing GC on a recent commit
  - X Get trace replay with GC every n commits working; this is working
  - X Get tezos node bootstrapping with GC working
  - X Get baking node working, with RO irmin instances
  - X Test restart behaviour when killing a process in the middle of bootstrapping (for instance); TJR: I tested this quite a bit and things seemed OK; there are likely still errors if we kill a process at an inopportune time; could do with more testing
- Bug fixing (at 2022-04-21)
  - X RO implementation needs finishing
  - Unbounded memory usage when using layers, compared to main; TJR: after finishing RO impl, cannot reproduce this error
  - After stopping a node, restart attempts to read from the gap; this is likely caused by some startup behaviour of a tezos-node, e.g. it attempts to access an "old" commit or the parent of the current GC commit; TJR: after finishing RO impl, cannot reproduce this error
- Benchmarking; perhaps refinement of the code (e.g. calculation of reachable objects)
- Proper testing and performance measurement for the Tezos use case: they want to GC every cycle but keep only the last 6 cycles. How does this affect timings for Repo.iter? What is the impact on IO? What is the space overhead? (Presumably we need an extra 6 cycles' worth of storage if we are GC'ing from 6 cycles ago, since this will be copied to the next suffix file; on top of this we have the sparse file overhead for live objects from the commit, currently 3GB.)
- "Hardening" pass, where all the FIXMEs are addressed, corner cases fixed, etc.
- Merging into main irmin repo
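
To make the "reachable objects from a commit" and "extent calculation" items above more concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not the irmin-pack or sparse-file implementation: the `Store` table, `reachable_offsets` and `extents` are hypothetical names, the object graph is a toy, and a real implementation would work on the pack file (possibly via mmap and external sorting) rather than on Python dictionaries.

```python
from typing import Dict, List, Tuple

# Hypothetical object table: offset -> (length, offsets of referenced objects).
# In a real pack file these would be decoded from the on-disk entries.
Store = Dict[int, Tuple[int, List[int]]]


def reachable_offsets(store: Store, commit_off: int) -> List[int]:
    """Return the sorted offsets of every object reachable from commit_off."""
    seen = set()
    stack = [commit_off]
    while stack:
        off = stack.pop()
        if off in seen:
            continue  # shared subtrees are visited only once
        seen.add(off)
        _length, children = store[off]
        stack.extend(children)
    return sorted(seen)


def extents(store: Store, offsets: List[int]) -> List[Tuple[int, int]]:
    """Merge sorted live objects into maximal contiguous (offset, length) runs."""
    out: List[Tuple[int, int]] = []
    for off in offsets:
        length = store[off][0]
        if out and out[-1][0] + out[-1][1] == off:
            # This object starts exactly where the previous run ends: extend it.
            start, run = out[-1]
            out[-1] = (start, run + length)
        else:
            out.append((off, length))
    return out


# Toy store (assumed layout): the object at offset 10 is not referenced
# by the commit at offset 45, so it falls into a gap.
store: Store = {
    0: (10, []),        # blob
    10: (5, []),        # blob, unreachable from the commit
    15: (20, [0]),      # node referencing the blob at 0
    35: (10, []),       # blob
    45: (8, [15, 35]),  # commit referencing a node and a blob
}

live = reachable_offsets(store, 45)
live_extents = extents(store, live)
print(live)          # [0, 15, 35, 45]
print(live_extents)  # [(0, 10), (15, 38)]
```

In this toy model, the extents are the byte ranges that would be copied into the sparse object store, and everything outside them (here, bytes 10 to 14) is the gap that must not be read after GC.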