
Parallelize Soroban #3


Draft: wants to merge 6 commits into base parallel_txset_nomination

Conversation


@sisuresh sisuresh commented Feb 3, 2025

Description

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

@dmkozh dmkozh (Owner) left a comment


I haven't looked much into the transaction/operation changes yet, but I've already got some comments re ledger manager/application logic.

Hash subSeed = sorobanBasePrngSeed;
// If tx can use the seed, we need to compute a sub-seed for it.
if (tx->isSoroban())
#ifdef ENABLE_NEXT_PROTOCOL_VERSION_UNSAFE_FOR_PRODUCTION
Owner

I think we don't need any ifdef guards for this change:

  • I don't think it interacts with any XDR changes at all
  • It's also highly unlikely to get into an intermediate release, so there is no need to do that even if we wanted to be extra cautious
  • The compile-time checks are actually arguably less safe as they defer the discovery of potential breakages in the current protocol

Author

Good point. Removed.

@@ -55,4 +55,6 @@ bool bigDivideUnsigned(uint64_t& result, uint64_t A, uint64_t B, uint64_t C,

// This only implements ROUND_DOWN
uint64_t bigSquareRoot(uint64_t a, uint64_t b);

template <typename T> T add_sat(T a, T b);
Owner

nit: inconsistent naming style (addSaturating or saturatingAdd)?

Author

Renamed to saturatingAdd, but after the TTL logic change it's actually no longer used, so I might just remove it.
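For reference, a minimal self-contained sketch of what such a saturating-add helper could look like (an illustration only, not the stellar-core implementation):

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Clamp the sum at the maximum representable value instead of wrapping.
// Restricted to unsigned types to keep the overflow check simple.
template <typename T>
T
saturatingAdd(T a, T b)
{
    static_assert(std::is_unsigned<T>::value,
                  "this sketch only covers unsigned types");
    T const max = std::numeric_limits<T>::max();
    return a > max - b ? max : static_cast<T>(a + b);
}
```

The `a > max - b` comparison detects overflow without ever computing an out-of-range intermediate value.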


auto const& thread = stage.at(i);

roTtlDeltas.emplace_back(std::async(
Owner

The default policy for std::async is std::launch::async | std::launch::deferred. Since you're calling get on the futures one-by-one, it seems to me that you'll end up running the tasks sequentially (per the deferred spec, the work only runs when you first try to get the future's value). We should probably specify the policy as just std::launch::async, but I might be misunderstanding the spec as well.

Author

Good catch. I verified that execution was interleaved, but it sounds like the behavior here is implementation-specific. Anyway, I removed the std::async usage in favor of std::thread.
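The deferred-launch pitfall dmkozh describes can be reproduced in isolation. A hedged sketch (illustrative names, not stellar-core code) of the explicit policy fix:

```cpp
#include <future>
#include <vector>

// Spawn one task per item with an explicit std::launch::async policy.
// Under the default policy (std::launch::async | std::launch::deferred)
// an implementation may defer every task until get() is called, which
// serializes the work when the futures are consumed one-by-one.
int
sumOfSquares(int n)
{
    std::vector<std::future<int>> futures;
    for (int i = 0; i < n; ++i)
    {
        futures.emplace_back(
            std::async(std::launch::async, [i] { return i * i; }));
    }
    int total = 0;
    for (auto& f : futures)
    {
        total += f.get(); // blocks until the corresponding task finishes
    }
    return total;
}
```

With the explicit policy, all tasks are guaranteed to run on their own threads regardless of when `get()` is called.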

auto currLiveUntil =
ltxe.current().data.ttl().liveUntilLedgerSeq;

releaseAssertOrThrow(
Owner

This assertion is not correct, is it? One can delete an entry in tx 1 and then re-create it with a smaller TTL in tx 2. We only enforce non-decreasing TTL per transaction, not across transactions. It's probably worth adding a respective test as well.
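To make the failure mode concrete, here is a toy model (hypothetical types, not stellar-core code) of the cross-transaction check the assertion effectively performs, and the legal sequence it wrongly rejects:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// One TTL write per transaction: nullopt models deleting the entry,
// a value models (re-)creating it or bumping liveUntilLedgerSeq.
using TtlWrite = std::optional<uint32_t>;

// A naive "TTL never decreases across transactions" check.
bool
ttlNeverDecreases(std::vector<TtlWrite> const& writes)
{
    uint32_t prev = 0;
    for (auto const& w : writes)
    {
        if (w)
        {
            if (*w < prev)
            {
                return false; // the questionable assertion fires here
            }
            prev = *w;
        }
        // A deletion should reset the baseline, which this check misses.
    }
    return true;
}
```

Delete at tx 2, then re-create with a smaller TTL at tx 3: the check fails even though the sequence is perfectly legal.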

Author

Good catch. Fixed and added a test.

}
}

for (auto const& kvp : cumulativeRoTtlDeltas)
Owner

One thing I hadn't considered with the new TTL algorithm is tx meta. We might need to rethink how we emit TTL meta now. In the worst case we'd need to switch back from deltas to the max value, but maybe we can come up with something better...

Author

Updated the TTL reconciliation logic.


if (opType != RESTORE_FOOTPRINT && lk.type() == TTL &&
it->second.mLedgerEntry && updatedLe &&
isReadOnlyTTLIt != isReadOnlyTTLMap.end() &&
Owner

The entry can't be missing from the map, can it? Maybe an assertion would be more appropriate to ensure we didn't mess up anywhere.

Author

Added an assert. I still need to go over all of the asserts to make sure they're safe and the ones that throw aren't caught when they shouldn't be.


auto isReadOnlyTTLIt = isReadOnlyTTLMap.find(lk);

if (opType != RESTORE_FOOTPRINT && lk.type() == TTL &&
Owner

I don't think it's possible for restore op to modify TTL of RO entries, so not sure if the first condition is necessary.

Author

Yeah, this is an outdated change that's no longer relevant. Removed.

break;
}

auto ltxe = ltx.loadWithoutRecord(lk);
Owner

Shouldn't we also populate the ledger load metrics here? (Currently these are part of the soroban ops implementation.)
Also, I see the code for these metrics remains in the operations, but if the limit is exceeded, we will never reach the operation code at all (so at least the error-handling code in the operations seems redundant).

Author

Done

switch (opType)
{
case INVOKE_HOST_FUNCTION:
sorobanOpResult.tr().invokeHostFunctionResult().code(
Owner

We also need to populate the respective events here, don't we?

Author

Updated

LedgerManagerImpl::collectEntries(AbstractLedgerTxn& ltx, Thread const& txs)
{
ThreadEntryMap entryMap;
auto getEntries = [&](TransactionFrameBasePtr tx,
Owner

I've left some comments regarding this implementation below, but I think this whole method might benefit from factoring out the operation-specific logic into the operations, i.e. add a function like 'preloadEntries' or something like that, and add the respective metrics/error handling there. If there is too much duplication, maybe consider factoring out the common logic into a separate function as well.

Author

Refactored a bit

auto it = entryMap.find(lk);
releaseAssertOrThrow(it != entryMap.end());

auto opType =
Author

Unused

@dmkozh dmkozh force-pushed the parallel_txset_nomination branch 2 times, most recently from ba6137b to 8afa754 Compare February 20, 2025 23:26

@SirTyson SirTyson left a comment


Wrt the state archival bits, I think we can remove meta generation and most archival meta info from the ltx. This is the end state though. If we need to keep some legacy ltx stuff around during development, that's fine, but I'd like to move away from generating restore meta via the ltx regardless, since this probably simplifies auto restore changes coming up.

// right before we call into the invariants.
}

if (op->getOperation().body.type() == RESTORE_FOOTPRINT)


I think it might be easier just to build both the restore and op meta above and consolidate the logic. The only reason we had the post process function before was due to the limitations of ltx. Now that we have to build all meta ourselves anyway without the ltx, this post processing step is unnecessary. I think this may help simplify auto restore too.

{
if (key.type() != TTL)
{
ltxInner.addRestoredFromLiveBucketListKey(key);


We should be able to get rid of addRestoredFromLiveBucketListKey and LedgerTxn::mRestoredKeys.liveBucketList entirely. The only reason we need to store liveBucketList restoration events in the ltx is for the meta post processing step. However, if we do that without the ltx, this is no longer necessary.

We still need to keep track of hotArchiveBucketList restores at the ltx level though. Maybe we can rename this to something like "removeKeyFromHotArchive." If we no longer generate restore meta via ltx, we just need to know what keys to delete from the HotArchive on ledger close.

@sisuresh sisuresh force-pushed the par-squash branch 2 times, most recently from d73d552 to c59330c Compare February 26, 2025 23:30
auto callback = [&metrics, &sorobanData, &resources, &res,
&config](LedgerKey const& lk, uint32_t entrySize) -> bool {
metrics.mLedgerReadByte += entrySize;
if (resources.readBytes < metrics.mLedgerReadByte)
Author

This flow needs to be updated for extend and restore. We shouldn't load entries in certain cases (like if the entry has been archived here).
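The metering pattern in the snippet above can be sketched standalone (ReadMetrics and meterEntryRead are hypothetical names for illustration, not stellar-core's actual types):

```cpp
#include <cstdint>

// Tracks cumulative bytes read while loading footprint entries.
struct ReadMetrics
{
    uint32_t mLedgerReadByte = 0;
};

// Charge entrySize against the budget; returns false once the cumulative
// bytes read exceed readBytesLimit, so the caller can stop loading and
// fail the transaction with a resource error instead.
bool
meterEntryRead(ReadMetrics& metrics, uint32_t readBytesLimit,
               uint32_t entrySize)
{
    metrics.mLedgerReadByte += entrySize;
    return metrics.mLedgerReadByte <= readBytesLimit;
}
```

Note that the metric is still incremented on the failing read, mirroring the snippet above where mLedgerReadByte is bumped before the limit comparison.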

@dmkozh dmkozh force-pushed the parallel_txset_nomination branch 2 times, most recently from a893b04 to b38da27 Compare April 23, 2025 18:58
This is a relatively simple algorithm, but it should be serviceable for traffic with a relatively low number of transitive IO conflicts.
@dmkozh dmkozh force-pushed the parallel_txset_nomination branch 4 times, most recently from 008b38c to e98138f Compare April 25, 2025 19:13
3 participants