Tracking: layered store 2022 Q1

This is an issue to track the new layered store implementation.

The current branch: https://github.com/tomjridge/irmin/tree/2022-04-22_layers_rebased_on_3.2.0 

Older branches:
* https://github.com/mirage/irmin/tree/2022_layered
* https://github.com/tomjridge/irmin/tree/2022-03-15_layers_rebased 

A recent Tezos branch, with additional code to trigger GC every so often, is here: https://github.com/tomjridge/tezos/tree/2022-03-14_layers

Victor's branch, to integrate layers into Tezos properly, is here: https://gitlab.com/nomadic-labs/tezos/-/tree/vicall@tomjridge@layered_store

Todo (additional entries to be added when discovered):
- [x] Add clear documentation for the IO.Unix interface used by pack_store.ml, so that its semantics can be worked out
  - see #1758 and index's https://github.com/mirage/index/pull/380
- [x] Implement external sorting and other external routines via mmaps
  - [x] sorting
  - [x] extent calculation
- [x] Port/rework prototype code from https://github.com/tomjridge/sparse-file/tree/master/src into a subdirectory under irmin-pack
- [x] Change the store pack file to use a control+objstore+suffix ("layers") rather than a plain file
  - [x] Identify the exact interface used by the pack_store
  - [x] Determine how to implement this interface on top of the layers
  - Implement a replacement IO, suitable for layers
- [x] Implement the missing part of the worker: the calculation of reachable objects from a commit
- [x] Implement a simple mechanism to trigger GC from a given commit
- [x] Proper integration with irmin APIs
  - [x] How to trigger GC
  - How to properly compute reachability from a commit (still needs looking at; we want to avoid the use of create_reach.exe)
- [x] Test, for example, by replaying some existing trace and periodically performing GC on a recent commit
  - [x] Get trace replay with GC every n commits working; this is now working
  - [x] Get Tezos node bootstrapping with GC working
  - [x] Get baking node working, with RO irmin instances
  - [x] Test restart behaviour when a process is killed (for instance, in the middle of bootstrapping). TJR: I tested this quite a bit and things seemed OK; there are still likely to be errors if a process is killed at an inopportune time, so this could do with more testing
- [x] Bug fixing (at 2022-04-21)
  - [x] RO implementation needs finishing
  - Unbounded memory usage when using layers, compared to main; TJR: after finishing RO impl, cannot reproduce this error
  - After stopping a node, a restart attempts to read from the gap; this is likely caused by some startup behaviour of a tezos-node, e.g. it attempts to access an "old" commit, or the parent of the current GC commit; TJR: after finishing RO impl, cannot reproduce this error
- [ ] Benchmarking; perhaps refinement of the code (e.g. calculation of reachable objects)
- [ ] Proper testing and performance measurement for the Tezos use case: they want to GC every cycle, but keep only the last 6 cycles. How does this affect timings for Repo.iter? What is the impact on IO? What is the space overhead? (Presumably we need an extra 6 cycles' worth of storage if we are GC'ing from 6 cycles ago, since this will be copied to the next suffix file; on top of this we have the sparse file overhead for live objects from the commit, currently 3GB.)
- [ ] "Hardening" pass, where all the FIXMEs are addressed, corner cases fixed, etc.
- [ ] Merging into main irmin repo
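
For readers new to the design, here is a minimal toy sketch of how a read might be resolved once the pack file is split into control + objstore + suffix. It is not the code on the branches above: the `extent` and `location` types, the `coalesce` and `locate` functions, and the `gc_off` parameter are all hypothetical names chosen for illustration. The assumption is that reachability from the GC commit yields a set of live extents below some offset `gc_off`, that these are packed back to back into the sparse object store, and that everything at or above `gc_off` lives in the suffix. Any offset below `gc_off` that is not in a live extent falls in the "gap".

```ocaml
(* Toy model only: names and types are hypothetical, not irmin-pack's API. *)

(* A live region of the original pack file: start offset and length. *)
type extent = { off : int; len : int }

(* Sort extents by offset and merge any that overlap or touch, giving a
   sorted list of disjoint extents. *)
let coalesce (es : extent list) : extent list =
  let sorted = List.sort (fun a b -> compare a.off b.off) es in
  let rec go acc = function
    | [] -> List.rev acc
    | e :: rest -> (
        match acc with
        | prev :: acc' when e.off <= prev.off + prev.len ->
            let hi = max (prev.off + prev.len) (e.off + e.len) in
            go ({ off = prev.off; len = hi - prev.off } :: acc') rest
        | _ -> go (e :: acc) rest)
  in
  go [] sorted

(* Where a virtual (original pack file) offset ends up after GC. *)
type location =
  | In_sparse of int (* offset within the packed sparse data *)
  | In_suffix of int (* offset within the suffix file *)
  | In_gap (* not live at the GC commit, so no longer stored *)

(* Resolve [off]. [gc_off] is the virtual offset at which the suffix starts;
   [live] must already be coalesced. Live extents are assumed to be stored
   contiguously, in order, in the sparse data. *)
let locate ~gc_off ~live off =
  if off >= gc_off then In_suffix (off - gc_off)
  else
    let rec find base = function
      | [] -> In_gap
      | e :: rest ->
          if off < e.off then In_gap
          else if off < e.off + e.len then In_sparse (base + (off - e.off))
          else find (base + e.len) rest
    in
    find 0 live
```

In this model, the "restart attempts to read from gap" bug noted above corresponds to `locate` returning `In_gap`, for instance when a node asks for the parent of the GC commit. A real implementation would look up extents with a binary search or an mmap'ed index rather than a linear scan.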

Tracking: layered store 2022 Q1 #1753
