Speed up checking of iterator compatibility #2077

Merged: 5 commits into main from topic/bbannier/issue-1663, Jun 6, 2025

@bbannier bbannier commented Jun 3, 2025

We were previously using a control block which held a weak_ptr to the
protected data. This was pretty inefficient for a number of reasons:

  • access to the controlled data always required a weak_ptr::lock which
    created a temporary shared_ptr copy and immediately destroyed it after
    access
  • to check whether the control block was expired we used lock instead
    of expired which introduced the same overhead
  • to check compatibility of iterators we compared shared_ptrs to the
    control data which again required full locks instead of using
    owner_before
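
The three points above can be illustrated with a minimal sketch. `WeakIter` and the helper names are hypothetical stand-ins invented for this example; they are not the HILTI runtime's types. The sketch only contrasts the standard-library calls named in the list.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for an iterator holding a weak reference to its
// container's data; not the actual runtime type.
struct WeakIter {
    std::weak_ptr<std::vector<int>> data;
};

// Costly liveness check: lock() creates a temporary shared_ptr (atomic
// refcount increment and decrement) only to test it for null.
inline bool isAliveViaLock(const WeakIter& it) { return it.data.lock() != nullptr; }

// Cheaper liveness check: expired() only inspects the use count.
inline bool isAliveViaExpired(const WeakIter& it) { return ! it.data.expired(); }

// Costly compatibility check: two full locks just to compare pointers.
inline bool sameContainerViaLock(const WeakIter& a, const WeakIter& b) {
    return a.data.lock() == b.data.lock();
}

// Cheaper compatibility check: two weak_ptrs share ownership exactly when
// neither orders before the other under owner_before.
inline bool sameContainerViaOwner(const WeakIter& a, const WeakIter& b) {
    return ! a.data.owner_before(b.data) && ! b.data.owner_before(a.data);
}
```

One behavioral difference worth noting: once the containers are gone, the lock-based comparison treats any two expired iterators as equal (both lock to null), whereas the `owner_before` comparison still distinguishes iterators that came from different containers.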

This patch introduces a new control block data structure and uses it
across all classes which previously held ad hoc implementations
(Bytes, Map, Set, Vector). The main improvement is that we now
track liveness separately from the data and use a cheaper
implementation for control block equality checks. With the new
implementation I see throughput improvements across the board for
anything that needs to iterate: bytes iteration is up to 30x
faster in trivial setups, and the code from the original issue is now 10x
faster.
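
As a rough illustration of the idea, here is a minimal sketch of a control block that tracks liveness separately from the data. Everything in it (`Control`, `Ref`, `Token`, `IntVector`) is an assumption made up for this example and does not reflect the actual code in the patch: liveness is tracked through a small shared token, while the data is reached through a plain pointer.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sketch; names and layout are assumptions, not the patch's code.
template<typename T>
class Control {
    struct Token {}; // carries no data, only exists to be owned

public:
    // Non-owning reference as an iterator might hold it.
    class Ref {
    public:
        Ref() = default;
        Ref(std::weak_ptr<Token> alive, T* data) : _alive(std::move(alive)), _data(data) {}

        // Liveness check without creating a temporary shared_ptr.
        bool isValid() const { return ! _alive.expired(); }

        // Direct access to the data; null once the owner is gone.
        T* get() const { return isValid() ? _data : nullptr; }

        // Two references are compatible if they share the same token.
        friend bool operator==(const Ref& a, const Ref& b) {
            return ! a._alive.owner_before(b._alive) && ! b._alive.owner_before(a._alive);
        }

    private:
        std::weak_ptr<Token> _alive;
        T* _data = nullptr;
    };

    explicit Control(T* data) : _data(data) {}
    Control(const Control&) = delete;            // the token must stay unique
    Control& operator=(const Control&) = delete; // to the owning container

    Ref ref() const { return Ref(_token, _data); }

private:
    std::shared_ptr<Token> _token = std::make_shared<Token>();
    T* _data;
};

// Hypothetical container owning both its data and the control block.
struct IntVector {
    std::vector<int> values;
    Control<std::vector<int>> control{&values};
};
```

When an `IntVector` is destroyed, its token is released and every outstanding `Ref` reports `isValid() == false`, so the raw pointer is never dereferenced after the container's lifetime in this sketch.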

Closes #1663.

@bbannier bbannier self-assigned this Jun 3, 2025
@bbannier bbannier linked an issue Jun 3, 2025 that may be closed by this pull request
@bbannier bbannier force-pushed the topic/bbannier/issue-1663 branch from db63e48 to 9040552 Compare June 4, 2025 07:18
@bbannier bbannier changed the title from "Speed up most container iterators" to "Speed up checking of iterator compatibility" Jun 4, 2025
@bbannier bbannier force-pushed the topic/bbannier/issue-1663 branch 2 times, most recently from cce85e4 to b19ce61 Compare June 4, 2025 08:28
@bbannier bbannier force-pushed the topic/bbannier/issue-1663 branch from b19ce61 to 59c31dd Compare June 4, 2025 09:19
codspeed-hq bot commented Jun 4, 2025

CodSpeed Performance Report

Merging #2077 will not alter performance

Comparing topic/bbannier/issue-1663 (1aa11f1) with main (1c965df)

Summary

✅ 90 untouched benchmarks
🆕 16 new benchmarks

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| 🆕 iterate_bytes[len:1000000] | N/A | 10.5 ms | N/A |
| 🆕 iterate_bytes[len:10000] | N/A | 105.6 µs | N/A |
| 🆕 iterate_bytes[len:100] | N/A | 2.1 µs | N/A |
| 🆕 iterate_bytes[len:1] | N/A | 1 µs | N/A |
| 🆕 iterate_map[len:1000000] | N/A | 41.1 ms | N/A |
| 🆕 iterate_map[len:10000] | N/A | 412.7 µs | N/A |
| 🆕 iterate_map[len:100] | N/A | 5.5 µs | N/A |
| 🆕 iterate_map[len:1] | N/A | 1.2 µs | N/A |
| 🆕 iterate_set[len:1000000] | N/A | 33.8 ms | N/A |
| 🆕 iterate_set[len:10000] | N/A | 339.1 µs | N/A |
| 🆕 iterate_set[len:100] | N/A | 4.3 µs | N/A |
| 🆕 iterate_set[len:1] | N/A | 590 ns | N/A |
| 🆕 iterate_vector[len:1000000] | N/A | 10 ms | N/A |
| 🆕 iterate_vector[len:10000] | N/A | 101 µs | N/A |
| 🆕 iterate_vector[len:100] | N/A | 2 µs | N/A |
| 🆕 iterate_vector[len:1] | N/A | 1.1 µs | N/A |

@bbannier bbannier force-pushed the topic/bbannier/issue-1663 branch 2 times, most recently from 573f603 to f251c36 Compare June 4, 2025 09:53
@bbannier bbannier marked this pull request as ready for review June 4, 2025 10:32
@bbannier bbannier requested a review from rsmmr June 4, 2025 12:17
@rsmmr rsmmr left a comment

I'm a bit torn here. It's great to see the speed-up of course, and factoring out the control block logic makes a lot of sense too. However, there's a drawback of not relying on lock() anymore to get access to the data: the code is no longer thread-safe.

That's one of those things that we currently don't need: the way we use Spicy doesn't require thread-safety, and we have some other places that aren't thread-safe either. But iterators are a pretty fundamental piece of the runtime, and if we ever wanted to become thread-safe, we'd have to revert some of this (and it would probably take a while to even just notice the issue).

Let me ask: the PR does multiple optimizations. Do you know how much performance we'd lose if we switched back to access to the controlled data through shared_ptr+lock, but left the other improvements (expired/owner_before) in?

@bbannier bbannier force-pushed the topic/bbannier/issue-1663 branch from cd764f1 to 7adcd13 Compare June 5, 2025 12:27
bbannier commented Jun 5, 2025

> I'm a bit torn here. It's great to see the speed-up of course, and factoring out the control block logic makes a lot of sense too. However, there's a drawback of not relying on lock() anymore to get access to the data: the code is no longer thread-safe.
>
> [..]
>
> Do you know how much performance we'd lose if we switched back to access to the controlled data through shared_ptr+lock, but left the other improvements (expired/owner_before) in?

At least for the iterate_bytes benchmark, going through the shared_ptr costs about 2x compared to accessing the data directly, but the more significant improvement comes from the owner_before-style check for compatibility.

That said, I believe thread-safety is a red herring, since even before this change access to the data behind the iterator was not thread-safe (e.g., vector data could be mutated concurrently while we accessed it through a still valid iterator). The only thread-safety feature of shared_ptr is that its refcount is atomic; for a thread-safe runtime library we would need to rework much of how we store and access data. If user code was using iterators to access const containers concurrently before, this should still work now.
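
To make the point about `lock()` concrete, here is a small single-threaded sketch (the function name is made up for this example). It shows that holding a locked `shared_ptr` only keeps the vector object alive; it does not stop another owner from mutating the vector and moving its element storage. In a multi-threaded setting the same mutation would additionally be a data race.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical helper: returns true if the vector's element storage moved
// while a locked shared_ptr to the vector was being held.
inline bool storageMovedDespiteLock(const std::shared_ptr<std::vector<int>>& owner) {
    std::weak_ptr<std::vector<int>> weak = owner;

    auto locked = weak.lock(); // pins the lifetime of the vector object only
    if ( ! locked )
        return false;

    // Record the address as an integer so we never touch a stale pointer.
    auto before = reinterpret_cast<std::uintptr_t>(locked->data());

    // Mutation through the other owner; exceeding capacity forces reallocation.
    owner->resize(owner->capacity() + 1);

    auto after = reinterpret_cast<std::uintptr_t>(locked->data());
    return before != after;
}
```

Any element pointer or index-free iterator obtained before the `resize` would now dangle, even though `locked` is still a valid, non-null `shared_ptr`.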

I'd still opt for accessing the data via the controlled raw ptr since 2x is a significant performance impact for loop-heavy code. With the new data structure we will have a single place to work on should we ever try to make the runtime library threadsafe.

rsmmr commented Jun 5, 2025

> That said, I believe thread-safety is a red herring

Ok, I agree. Actually the "thread-safety" I was referring to isn't full safety, but the case where separate threads can work on independent data without causing trouble. And thinking about that again now, I believe that covers the case here as well. I had some case in mind earlier but that was indeed a red herring.

@bbannier bbannier force-pushed the topic/bbannier/issue-1663 branch from 7adcd13 to a9ca939 Compare June 5, 2025 13:14
@bbannier bbannier force-pushed the topic/bbannier/issue-1663 branch 2 times, most recently from 993ca78 to 523624a Compare June 6, 2025 09:41
@bbannier bbannier requested a review from rsmmr June 6, 2025 11:04
rsmmr previously approved these changes Jun 6, 2025
bbannier added 5 commits June 6, 2025 14:09
This patch repurposes the existing `hilti-rt-fiber-benchmark` to be a
more general benchmark suite of HILTI runtime behaviors.

This diagnostic returns false positives in code we do not own and in
ways which are hard to work around with pragmas, see e.g.,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111273.
@bbannier bbannier force-pushed the topic/bbannier/issue-1663 branch from 523624a to 1aa11f1 Compare June 6, 2025 12:17
@bbannier bbannier merged commit b78fe98 into main Jun 6, 2025
10 of 20 checks passed
@bbannier bbannier deleted the topic/bbannier/issue-1663 branch June 6, 2025 12:17
Successfully merging this pull request may close these issues.

bytes for loop performance
3 participants