Speed up checking of iterator compatibility #2077

bbannier · 2025-06-03T17:01:38Z

We were previously using a control block which held a weak_ptr to the
protected data. This was pretty inefficient for a number of reasons:

- access to the controlled data always required a `weak_ptr::lock` which
  created a temporary `shared_ptr` copy and immediately destroyed it after
  access
- to check whether the control block was expired we used `lock` instead
  of `expired` which introduced the same overhead
- to check compatibility of iterators we compared `shared_ptr`s to the
  control data which again required full locks instead of using
  `owner_before`
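
The three points above correspond to three `std::weak_ptr` operations. The following is a minimal sketch using only standard-library calls; the helper names (`is_alive_via_lock`, `same_owner`, and so on) are illustrative assumptions and are not taken from the HILTI runtime.

```cpp
#include <memory>

// Liveness check the expensive way: lock() materializes a temporary
// shared_ptr, which increments and then decrements the atomic refcount.
template<typename T>
bool is_alive_via_lock(const std::weak_ptr<T>& w) {
    return w.lock() != nullptr;
}

// Liveness check without creating a shared_ptr.
template<typename T>
bool is_alive_via_expired(const std::weak_ptr<T>& w) {
    return ! w.expired();
}

// Compatibility check the expensive way: lock both sides, compare pointers.
template<typename T>
bool same_target_via_lock(const std::weak_ptr<T>& a, const std::weak_ptr<T>& b) {
    return a.lock() == b.lock();
}

// Compatibility check via owner-based ordering: two weak_ptrs share
// ownership exactly when neither orders before the other.
template<typename T>
bool same_owner(const std::weak_ptr<T>& a, const std::weak_ptr<T>& b) {
    return ! a.owner_before(b) && ! b.owner_before(a);
}
```

Note that the two compatibility checks differ once the owner is gone: `same_target_via_lock` treats any two expired pointers as equal (both lock to null), whereas `same_owner` still distinguishes them by control block.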

This patch introduces a new control block data structure and uses it
across all classes which previously held ad hoc implementations
(`Bytes`, `Map`, `Set`, `Vector`). The main improvement is that we now
track liveness separately from the data, and use a better
implementation for control block equality checks. With the new
implementation I see throughput improvements across the board for
anything needing to iterate, e.g., I see bytes iteration being up to 30x
faster in trivial setups; the code from the original issue is now 10x
faster.
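
To illustrate the idea of separating liveness tracking from data access, here is a minimal sketch. The names `Control` and `Ref` are hypothetical and the code is not the implementation from this patch; it only assumes that the container owns a token, and that iterators hold a weak reference to that token plus a raw pointer to the data.

```cpp
#include <memory>
#include <stdexcept>

template<typename T>
class Ref;

// Owned by the container. The shared_ptr serves only as a liveness token;
// the data itself is never accessed through it.
template<typename T>
class Control {
public:
    explicit Control(T* data) : _token(std::make_shared<T*>(data)) {}

    // Copying would share the token between containers, so this sketch
    // forbids it; a real container would create a fresh token on copy.
    Control(const Control&) = delete;
    Control& operator=(const Control&) = delete;

    Ref<T> ref() const { return Ref<T>(_token); }

private:
    std::shared_ptr<T*> _token;
};

// Held by iterators: a weak reference for liveness, a raw pointer for data.
template<typename T>
class Ref {
public:
    Ref() = default;
    explicit Ref(const std::shared_ptr<T*>& token) : _alive(token), _data(*token) {}

    // No temporary shared_ptr is created here.
    bool expired() const { return _alive.expired(); }

    // Access goes through the raw pointer after a liveness check.
    T& get() const {
        if ( expired() )
            throw std::runtime_error("underlying object has expired");
        return *_data;
    }

    // Two references are compatible if they share the same token.
    friend bool operator==(const Ref& a, const Ref& b) {
        return ! a._alive.owner_before(b._alive) && ! b._alive.owner_before(a._alive);
    }
    friend bool operator!=(const Ref& a, const Ref& b) { return ! (a == b); }

private:
    std::weak_ptr<T*> _alive;
    T* _data = nullptr;
};
```

In this sketch the raw pointer must stay valid for as long as the token is alive, so a container that moves its storage would need to hand out a new token.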

Closes #1663.

codspeed-hq · 2025-06-04T09:29:34Z

CodSpeed Performance Report

Merging #2077 will not alter performance

Comparing topic/bbannier/issue-1663 (1aa11f1) with main (1c965df)

Summary

✅ 90 untouched benchmarks
🆕 16 new benchmarks

Benchmarks breakdown

| | Benchmark | `BASE` | `HEAD` | Change |
|---|---|---|---|---|
| 🆕 | `iterate_bytes[len:1000000]` | N/A | 10.5 ms | N/A |
| 🆕 | `iterate_bytes[len:10000]` | N/A | 105.6 µs | N/A |
| 🆕 | `iterate_bytes[len:100]` | N/A | 2.1 µs | N/A |
| 🆕 | `iterate_bytes[len:1]` | N/A | 1 µs | N/A |
| 🆕 | `iterate_map[len:1000000]` | N/A | 41.1 ms | N/A |
| 🆕 | `iterate_map[len:10000]` | N/A | 412.7 µs | N/A |
| 🆕 | `iterate_map[len:100]` | N/A | 5.5 µs | N/A |
| 🆕 | `iterate_map[len:1]` | N/A | 1.2 µs | N/A |
| 🆕 | `iterate_set[len:1000000]` | N/A | 33.8 ms | N/A |
| 🆕 | `iterate_set[len:10000]` | N/A | 339.1 µs | N/A |
| 🆕 | `iterate_set[len:100]` | N/A | 4.3 µs | N/A |
| 🆕 | `iterate_set[len:1]` | N/A | 590 ns | N/A |
| 🆕 | `iterate_vector[len:1000000]` | N/A | 10 ms | N/A |
| 🆕 | `iterate_vector[len:10000]` | N/A | 101 µs | N/A |
| 🆕 | `iterate_vector[len:100]` | N/A | 2 µs | N/A |
| 🆕 | `iterate_vector[len:1]` | N/A | 1.1 µs | N/A |

hilti/runtime/include/util.h

rsmmr

I'm a bit torn here. It's great to see the speed-up of course, and factoring out the control block logic makes a lot of sense too. However, there's a drawback of not relying on lock() anymore to get access to the data: the code is no longer thread-safe.

That's one of these things that we currently don't need: the way we use Spicy doesn't require thread-safety, and we have some other places that aren't thread-safe either. But iterators are a pretty fundamental piece to the runtime, and if we ever wanted to become thread-safe, we'd have to revert some of this (and it would probably take a while to even just notice the issue).

Let me ask: the PR does multiple optimizations. Do you know how much performance we'd lose if we switched back to access to the controlled data through shared_ptr+lock, but left the other improvements (expired/owner_before) in?

bbannier · 2025-06-05T12:27:03Z

> I'm a bit torn here. It's great to see the speed-up of course, and factoring out the control block logic makes a lot of sense too. However, there's a drawback of not relying on lock() anymore to get access to the data: the code is no longer thread-safe.
>
> [..]
>
> Do you know how much performance we'd lose if we switched back to access to the controlled data through shared_ptr+lock, but left the other improvements (expired/owner_before) in?

At least for the iterate_bytes benchmark, going through the shared_ptr costs about 2x, but the more significant improvement is in the owner_before-style check for compatibility.

That said, I believe thread-safety is a red herring, since even before this change access to the data behind the iterator was not thread-safe (e.g., vector data could be mutated concurrently while we accessed it through a still valid iterator). The only thread-safety feature of shared_ptr is that its refcount is atomic, and for a thread-safe runtime library we would need a lot of reworking in how we store and access data. If user code was using iterators to access const containers concurrently before, this should still work now.

I'd still opt for accessing the data via the controlled raw ptr since 2x is a significant performance impact for loop-heavy code. With the new data structure we will have a single place to work on should we ever try to make the runtime library threadsafe.
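
One rough way to compare the two access paths discussed here is sketched below. The function names and the timing helper are hypothetical and this is not the benchmark added in this PR; it only contrasts a `lock()` on every element with an `expired()` check followed by a raw pointer read. Actual ratios depend on compiler, optimization level, and hardware.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

using Data = std::vector<std::uint8_t>;

// Sum the first n elements, re-locking the weak_ptr on every access.
inline std::uint64_t sum_via_lock(const std::weak_ptr<Data>& w, std::size_t n) {
    std::uint64_t sum = 0;
    for ( std::size_t i = 0; i < n; ++i )
        if ( auto p = w.lock() )
            sum += (*p)[i];
    return sum;
}

// Sum the first n elements, checking liveness with expired() and reading
// through a raw pointer that the caller guarantees matches the weak_ptr.
inline std::uint64_t sum_via_raw(const std::weak_ptr<Data>& w, const Data* raw, std::size_t n) {
    std::uint64_t sum = 0;
    for ( std::size_t i = 0; i < n; ++i )
        if ( ! w.expired() )
            sum += (*raw)[i];
    return sum;
}

// Wall-clock time of a single call to f, in milliseconds.
template<typename F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `time_ms` around each function for the same input gives a first impression; a benchmarking framework with repeated runs would be needed for reliable numbers.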

rsmmr · 2025-06-05T12:55:38Z

> That said, I believe thread-safety is a red herring

Ok, I agree. Actually the "thread-safety" I was referring to isn't full safety, but the case where separate threads can work on independent data without causing trouble. And thinking about that again now, I believe that covers the case here as well. I had some case in mind earlier but that was indeed a red herring.

bbannier self-assigned this Jun 3, 2025

bbannier linked an issue Jun 3, 2025 that may be closed by this pull request

bytes for loop performance #1663

Closed

bbannier force-pushed the topic/bbannier/issue-1663 branch from db63e48 to 9040552 Compare June 4, 2025 07:18

bbannier changed the title ~~Speed up most container iterators~~ Speed up checking of iterator compatibility Jun 4, 2025

bbannier force-pushed the topic/bbannier/issue-1663 branch 2 times, most recently from cce85e4 to b19ce61 Compare June 4, 2025 08:28

bbannier mentioned this pull request Jun 4, 2025

Add codspeed integration #2076

Merged

bbannier force-pushed the topic/bbannier/issue-1663 branch from b19ce61 to 59c31dd Compare June 4, 2025 09:19

bbannier force-pushed the topic/bbannier/issue-1663 branch 2 times, most recently from 573f603 to f251c36 Compare June 4, 2025 09:53

bbannier commented Jun 4, 2025

hilti/runtime/include/util.h (outdated, resolved)

bbannier marked this pull request as ready for review June 4, 2025 10:32

bbannier requested a review from rsmmr June 4, 2025 12:17

rsmmr reviewed Jun 5, 2025

bbannier force-pushed the topic/bbannier/issue-1663 branch from cd764f1 to 7adcd13 Compare June 5, 2025 12:27

bbannier force-pushed the topic/bbannier/issue-1663 branch from 7adcd13 to a9ca939 Compare June 5, 2025 13:14

timwoj reviewed Jun 5, 2025

CMakeLists.txt (outdated, resolved)

rsmmr reviewed Jun 6, 2025

hilti/runtime/include/util.h (outdated, resolved)

bbannier force-pushed the topic/bbannier/issue-1663 branch 2 times, most recently from 993ca78 to 523624a Compare June 6, 2025 09:41

bbannier requested a review from rsmmr June 6, 2025 11:04

rsmmr previously approved these changes Jun 6, 2025

bbannier added 5 commits June 6, 2025 14:09

Add benchmark for iteration of rt containers.

6643b8e

This patch repurposes the existing `hilti-rt-fiber-benchmark` to be a more general benchmark suite of HILTI runtime behaviors.

Make -Warray-bounds not fatal with gcc-13.0.

8b31259

This diagnostic returns false positives in code we do not own and in ways which are hard to work around with pragmas, see e.g., https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111273.

Bump main CI image to ubuntu-24.04.

a7cbce0

Fix readability-qualified-auto lint.

1aa11f1

bbannier dismissed rsmmr’s stale review via 1aa11f1 June 6, 2025 12:17

bbannier force-pushed the topic/bbannier/issue-1663 branch from 523624a to 1aa11f1 Compare June 6, 2025 12:17

bbannier merged commit b78fe98 into main Jun 6, 2025
10 of 20 checks passed

bbannier deleted the topic/bbannier/issue-1663 branch June 6, 2025 12:17

bbannier mentioned this pull request Jun 6, 2025

Make -Wstringop-overflow not fatal with gcc. #2084

Merged
