Bucket-based API for HashTable #613

orlp · 2025-03-22T13:39:37Z

I want to propose the following API for HashTable. On OccupiedEntry I would like to add

/// Return in which bucket this entry is located.
///
/// Note: this index is not stable if the hash table is rehashed.
pub fn get_bucket(&self) -> usize;

On HashTable itself I would like to add the following four functions:

/// Returns the bucket index an entry in the hash table is located at, or `None` otherwise.
///
/// Note: this index is not stable if the hash table is rehashed.
pub fn find_bucket(&self, hash: u64, eq: impl FnMut(&T) -> bool) -> Option<usize>;

/// Return a reference to the value located at `bucket_idx`, or `None` otherwise.
pub fn at_bucket(&self, bucket_idx: usize) -> Option<&T>;

/// Return an `OccupiedEntry` for the entry located at `bucket_idx`, or `None` otherwise.
pub fn at_bucket_mut(&mut self, bucket_idx: usize) -> Option<OccupiedEntry<T>>;

/// Returns the total number of buckets in this `HashTable`.
pub fn num_buckets(&self) -> usize;

There are multiple reasons why I'd like this.

It allows incredibly flexible iteration over the hash table. I am aware that this iteration would be suboptimal compared to a SIMD-based iteration which checks the tags of 16 elements in one go, but the flexibility it provides is well worth it. For example: it makes parallel iteration over the hash table trivial, in whatever permutation you'd like.
It allows intrusive data structures to interweave with the hash table, so long as you are careful to avoid rehashing. An example I ran into recently was the writing of a fixed-capacity LRU cache. When the LRU cache wants to evict the least recently used key it has to do a lookup for that key. It could instead simply store the bucket index and use that to directly remove the key, as the hash table never rehashes in this example.
It allows additional algorithms to run safely and soundly (e.g. no pointers into the hash table which requires unsafe code and which may or may not get invalidated with the latest flavor of stacked/tree borrows) while referring into the hash table. For example: sorting the keys of a hash table while not cloning the keys or taking them out of the table.

The text was updated successfully, but these errors were encountered:

cuviper · 2025-03-22T15:09:40Z

I think this would help indexmap's OccupiedEntry as well, since a bucket index doesn't hold a lifetime. I could store a simpler (&mut IndexMapCore, usize) instead of needing separate references for my other fields. It would also remove the need for the awkward hash_table::AbsentEntry, because I could just map find_entry to an index (i.e. find_bucket) to avoid the NLL limitations.

moulins · 2025-03-24T13:44:28Z

It allows intrusive data structures to interweave with the hash table, so long as you are careful to avoid rehashing. An example I ran into recently was the writing of a fixed-capacity LRU cache. When the LRU cache wants to evict the least recently used key it has to do a lookup for that key. It could instead simply store the bucket index and use that to directly remove the key, as the hash table never rehashes in this example.

This assumes that deleting an element doesn't displace other elements, which is currently true but not technically guaranteed, as far as I can tell?

orlp · 2025-03-24T14:40:50Z

This assumes that deleting an element doesn't displace other elements, which is currently true but not technically guaranteed, as far as I can tell?

I think that's correct.

cuviper · 2025-03-24T17:21:05Z

In theory, insertions could displace existing items as well.

The standard library definitely shouldn't make any guarantees about this, but would it be so bad if hashbrown did?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bucket-based API for HashTable #613

Bucket-based API for HashTable #613

orlp commented Mar 22, 2025 •

edited

Loading

cuviper commented Mar 22, 2025

moulins commented Mar 24, 2025

orlp commented Mar 24, 2025

cuviper commented Mar 24, 2025

Bucket-based API for HashTable #613

Bucket-based API for HashTable #613

Comments

orlp commented Mar 22, 2025 • edited Loading

cuviper commented Mar 22, 2025

moulins commented Mar 24, 2025

orlp commented Mar 24, 2025

cuviper commented Mar 24, 2025

orlp commented Mar 22, 2025 •

edited

Loading