Skip to content

Optimize reading disjoint levels #26

Closed
@marvin-j97

Description

@marvin-j97

Reading disjoint levels needs to be smarter...

  • the check if a level is disjoint needs to be very fast, it should probably be cached inside each Level (Level.is_disjunct: bool), and only recalculated after level changes to avoid unnecessary work
  • benchmarks can already be found, see tree_get_pairs
  • need unit tests for first/last kv for different states of the tree, and for functions like Level::is_disjunct
  • check for disjoint levels and use MultiReader for tree iter/range/prefix ops
  • Tree::iter, ::range, ::prefix tests over disjoint levels

Reading disjoint level current behaviour

Uses MergeIterator
image
First blocks of all segments are loaded, then compared to get minimum key.
image
After that, blocks are cached inside the iterator, so only once the first block is consumed, the next block is pulled.

Reading disjoint level optimized version

Uses MultiReader
image
Once block is consumed, pull next block.
image
Once segment is consumed, start loading block of next segment.
image

For a full range scans, the there is no change in performance, as the blocks would need to be loaded anyway. But for operations like:

  • Tree first/last kv
  • is_empty
  • ranges or prefixes that cover many segments, but are not fully read (i.e. .prefix("a").take(10))

the performance impact can be noticeable, as the number of I/O ops is O(n), n = number of segments, and especially Levelled compaction has typically more segments than Tiered (but disjoint). With the optimization, the number of I/O ops is O(f), f = number of non-empty levels, so typically O(7) ~ O(1)

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions