Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bucket-based API for HashTable #613

Open
orlp opened this issue Mar 22, 2025 · 4 comments
Open

Bucket-based API for HashTable #613

orlp opened this issue Mar 22, 2025 · 4 comments

Comments

@orlp
Copy link

orlp commented Mar 22, 2025

I want to propose the following API for HashTable. On OccupiedEntry I would like to add

/// Return in which bucket this entry is located.
///
/// Note: this index is not stable if the hash table is rehashed.
pub fn get_bucket(&self) -> usize;

On HashTable itself I would like to add the following four functions:

/// Returns the bucket index an entry in the hash table is located at, or `None` otherwise.
///
/// Note: this index is not stable if the hash table is rehashed.
pub fn find_bucket(&self, hash: u64, eq: impl FnMut(&T) -> bool) -> Option<usize>;

/// Return a reference to the value located at `bucket_idx`, or `None` otherwise.
pub fn at_bucket(&self, bucket_idx: usize) -> Option<&T>;

/// Return an `OccupiedEntry` for the entry located at `bucket_idx`, or `None` otherwise.
pub fn at_bucket_mut(&mut self, bucket_idx: usize) -> Option<OccupiedEntry<T>>;

/// Returns the total number of buckets in this `HashTable`.
pub fn num_buckets(&self) -> usize;

There are multiple reasons why I'd like this.

  1. It allows incredibly flexible iteration over the hash table. I am aware that this iteration would be suboptimal compared to a SIMD-based iteration which checks the tags of 16 elements in one go, but the flexibility it provides is well worth it. For example: it makes parallel iteration over the hash table trivial, in whatever permutation you'd like.

  2. It allows intrusive data structures to interweave with the hash table, so long as you are careful to avoid rehashing. An example I ran into recently was the writing of a fixed-capacity LRU cache. When the LRU cache wants to evict the least recently used key it has to do a lookup for that key. It could instead simply store the bucket index and use that to directly remove the key, as the hash table never rehashes in this example.

  3. It allows additional algorithms to run safely and soundly (e.g. no pointers into the hash table which requires unsafe code and which may or may not get invalidated with the latest flavor of stacked/tree borrows) while referring into the hash table. For example: sorting the keys of a hash table while not cloning the keys or taking them out of the table.

@cuviper
Copy link
Member

cuviper commented Mar 22, 2025

I think this would help indexmap's OccupiedEntry as well, since a bucket index doesn't hold a lifetime. I could store a simpler (&mut IndexMapCore, usize) instead of needing separate references for my other fields. It would also remove the need for the awkward hash_table::AbsentEntry, because I could just map find_entry to an index (i.e. find_bucket) to avoid the NLL limitations.

@moulins
Copy link
Contributor

moulins commented Mar 24, 2025

  1. It allows intrusive data structures to interweave with the hash table, so long as you are careful to avoid rehashing. An example I ran into recently was the writing of a fixed-capacity LRU cache. When the LRU cache wants to evict the least recently used key it has to do a lookup for that key. It could instead simply store the bucket index and use that to directly remove the key, as the hash table never rehashes in this example.

This assumes that deleting an element doesn't displace other elements, which is currently true but not technically guaranteed, as far as I can tell?

@orlp
Copy link
Author

orlp commented Mar 24, 2025

This assumes that deleting an element doesn't displace other elements, which is currently true but not technically guaranteed, as far as I can tell?

I think that's correct.

@cuviper
Copy link
Member

cuviper commented Mar 24, 2025

In theory, insertions could displace existing items as well.

The standard library definitely shouldn't make any guarantees about this, but would it be so bad if hashbrown did?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants