
experiment: Make Tsavorite allocators unmanaged #1112

Status: Draft — wants to merge 22 commits into main

Conversation

@PaulusParssinen (Contributor) commented Mar 19, 2025

WIP. Testing whether we can stop using the POH (Pinned Object Heap) for the various Tsavorite buffers, making the aligned allocations faster and simpler by using the new NativeMemory.Aligned* APIs (i.e. delegating to the same allocation logic and APIs as the original FASTER C++ implementation).
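For context, a minimal sketch of the allocation pattern the NativeMemory.Aligned* APIs provide, written here in C++ in the spirit of the original FASTER implementation. The helper names (`alloc_aligned_page`, `free_aligned_page`) are illustrative, not from either codebase:

```cpp
#include <cstdlib>
#include <cstdint>
#include <cstddef>

// Illustrative helper: return a buffer whose address is a multiple of
// `alignment` (e.g. the device sector size), analogous to
// NativeMemory.AlignedAlloc on the .NET side.
void* alloc_aligned_page(std::size_t size, std::size_t alignment) {
    // std::aligned_alloc requires size to be a multiple of alignment,
    // so round the request up first.
    std::size_t rounded = (size + alignment - 1) & ~(alignment - 1);
    return std::aligned_alloc(alignment, rounded);
}

void free_aligned_page(void* p) {
    std::free(p);  // analogous to NativeMemory.AlignedFree
}
```

Unlike a pinned managed array, the returned memory is neither GC-tracked nor zero-initialized, which is relevant to the notes below.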

Some notes on the progress so far:

  • Figure out why the GenericAllocator logic started reading garbage for non-blittable keys and values.
    • Was it relying on the buffer pool's over-allocation during alignment? It seems so at the moment.
    • I forgot the original pool allocations were zero-initialized arrays; that makes sense, and accounting for it fixed the corruption of the first read record. The tail record still reads uninitialized memory if we don't over-allocate! (512 vs. 528 bytes or something, IIRC.)
    • Maybe something is off in the log alignment arithmetic. Need to figure out why we read outside the rounded sector size we request from the pool, relying on the over-allocation to zero out the last record info used in the generic allocator.
      • Judging from the AVs, there seems to be more logic built on this assumption.
    • Making the allocators' disposal free their owned buffers started breaking tests that reused a session from an already-disposed log (3cc1c0a), or where the log was disposed before the device while the device's IO threads might still be operating on it (3ef93aa).
  • The properties ValidOffset, RequiredBytes, and AvailableBytes seem to be record-allocation concepts and probably don't belong in the aligned memory class. Consider pulling them out while we're here.
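The zero-init point above is the key behavioral difference: managed (and pooled) arrays come back zeroed, while raw native allocations do not. If record-scanning logic treats a zeroed trailing record info as "end of valid data", switching allocators exposes whatever garbage happens to be there. A hedged illustration in C++ terms (helper names are made up for this sketch):

```cpp
#include <cstdlib>
#include <cstddef>

// Mimics the old behavior: GC arrays arrive zero-initialized,
// so trailing bytes past the last record read as zero.
unsigned char* alloc_page_like_managed(std::size_t size) {
    return static_cast<unsigned char*>(std::calloc(size, 1));  // zeroed
}

// Mimics the new behavior: malloc/AlignedAlloc leave the contents
// indeterminate. Callers that rely on trailing zeros (e.g. a tail-record
// check) must now clear the slack explicitly.
unsigned char* alloc_page_native(std::size_t size) {
    return static_cast<unsigned char*>(std::malloc(size));  // uninitialized
}
```

Under this reading, the fix is either to zero the sector-rounded slack after allocation, or to stop the scan from depending on bytes beyond the requested length.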

* Using NativeMemory alignment APIs
* Make BlittableFrame NativeMemory based
Also remove one unnecessary fixed statement usage for copying memory
…cessing invalid queued buffers while log itself is being disposed
* We could return empty sentinel instance?
* Also do over-sized alloc for BlittableFrames
@PaulusParssinen (Contributor, Author) commented Mar 20, 2025

Trying to get the tests to pass so I can properly diagnose which parts of the allocator's record (de)serialization are reaching beyond the allocated bounds the code explicitly requests. The code currently appears to rely on the pool allocating much more memory behind the scenes during alignment than its "contract" implies (i.e. that the length of the memory returned from the pool is the smallest segment that fulfills the request).
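The "contract" in question is plain sector rounding: the caller asks for a sector-rounded length (e.g. 512 bytes), but alignment padding means the pool's returned segment is larger (the 512 vs. 528 from the notes above), and some code ends up touching that slack. The rounding arithmetic, with an illustrative name:

```cpp
#include <cstddef>

// Round `requested` up to the next multiple of `sector_size`.
// `sector_size` must be a power of two for the bit trick to be valid.
std::size_t round_to_sector(std::size_t requested, std::size_t sector_size) {
    return (requested + sector_size - 1) & ~(sector_size - 1);
}
```

Any read or write past `round_to_sector(requested, sector)` is only "safe" today because the pool happens to hand back a bigger, zeroed segment; with exact-sized native allocations it becomes an out-of-bounds access.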

The changes to how the allocator handles the memory buffers (which were GC-tracked arrays before this PR) have exposed that the allocator's pages were never explicitly unrooted by the usual disposal procedures. With unmanaged buffers we must of course free them explicitly, and this seems to have upset the asynchronous/multithreaded log tests in a couple of cases that I could test and diagnose properly.

The issue is that the Azure device tests now seem to be failing due to this change, i.e. they were relying on those buffers still being available after disposal of the log. I currently have no way to set up Azure Blob Storage for testing (I'm broke).

I'm a little out of my depth in certain parts of the storage logic, so I thought I'd ask first: is there an existing mechanism in the storage persistence layer to either make the tests wait for all log device IO to finish before starting the clean-up logic (the in-memory device had blocking logic in its disposal; do we want to build an IAsyncDisposable one?), or existing logic/an approach for telling the device "abort everything; the buffers you (the device) have queued from the allocator may be/are bad"? This is what I think is currently going wrong in the Azure device tests. cc @badrishc or anyone
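One possible shape for the "wait for all log device IO before cleanup" mechanism asked about above, sketched as a counter plus condition variable. The class and method names are hypothetical, not from Tsavorite:

```cpp
#include <mutex>
#include <condition_variable>

// Hypothetical drain gate: IO issuers increment, completion callbacks
// decrement, and disposal blocks until everything in flight has landed.
class IoDrainGate {
    std::mutex m_;
    std::condition_variable cv_;
    int in_flight_ = 0;
public:
    void io_issued() {
        std::lock_guard<std::mutex> g(m_);
        ++in_flight_;
    }
    void io_completed() {
        std::lock_guard<std::mutex> g(m_);
        if (--in_flight_ == 0) cv_.notify_all();
    }
    int pending() {
        std::lock_guard<std::mutex> g(m_);
        return in_flight_;
    }
    // Called from disposal: block until no IO is outstanding;
    // only then is it safe to free the native pages.
    void wait_for_drain() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return in_flight_ == 0; });
    }
};
```

An IAsyncDisposable variant would replace the blocking wait with a task completed by the last `io_completed`; the counting discipline is the same either way.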

* Avoid temporary buffer and copying, operate on the buffer we get from pool directly.
* Pool documentation adjustments
@badrishc (Collaborator)

We may (I don't recall 100% at the moment) simply be letting the GC handle the cleanup of pages, because tracking and ensuring the actual completion of in-flight IO was overhead, and this was thought to be unnecessary since it was a test-only problem. So we just let the GC handle it. With explicit native memory, things get more complicated: we will need to drain the IO. There is a CompletePending API for the KV, but it is possible other parts of the code do not have similar mechanisms yet. TsavoriteLog uses a _disposed variable, but we need to make sure to wait out races where the callback thread might have just gone past the _disposed check.
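The race described here (a callback thread passes the _disposed check just as disposal proceeds to free the buffers) is a classic check-then-act hazard. One common fix is pairing the flag with an active-callback count that the disposer drains; a sketch with hypothetical names, not the TsavoriteLog implementation:

```cpp
#include <atomic>

// Hypothetical guard: callbacks enter/exit around any buffer access;
// disposal flips the flag and then waits out in-flight callbacks,
// closing the window where a callback has passed the disposed check
// but not yet finished touching memory.
class DisposeGuard {
    std::atomic<bool> disposed_{false};
    std::atomic<int> active_{0};
public:
    // Callback entry: returns false if disposal has already begun.
    bool try_enter() {
        active_.fetch_add(1, std::memory_order_acquire);
        if (disposed_.load(std::memory_order_acquire)) {
            active_.fetch_sub(1, std::memory_order_release);
            return false;
        }
        return true;
    }
    void exit() {
        active_.fetch_sub(1, std::memory_order_release);
    }
    // Disposal: forbid new entries, then wait for in-flight callbacks
    // to exit before freeing anything.
    void dispose_and_drain() {
        disposed_.store(true, std::memory_order_release);
        while (active_.load(std::memory_order_acquire) != 0) {
            // spin or yield until the last callback exits
        }
    }
};
```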

cc @TedHartMS

@badrishc (Collaborator) commented Mar 25, 2025

the code currently is seemingly relying on the pool to allocate much more behind during alignment than what it expects per the "contract"

@PaulusParssinen - can you show an example where this is occurring?
