
Parallelize Soroban #3


Draft: wants to merge 6 commits into base parallel_txset_nomination

Conversation


@sisuresh sisuresh commented Feb 3, 2025

Description

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

@dmkozh dmkozh (Owner) left a comment


I haven't looked much into the transaction/operation changes yet, but I've already got some comments re ledger manager/application logic.

Hash subSeed = sorobanBasePrngSeed;
// If tx can use the seed, we need to compute a sub-seed for it.
if (tx->isSoroban())
#ifdef ENABLE_NEXT_PROTOCOL_VERSION_UNSAFE_FOR_PRODUCTION
Owner

I think we don't need any ifdef guards for this change:

  • I don't think it interacts with any XDR changes at all
  • It's also highly unlikely to get into an intermediate release, so there is no need to do that even if we wanted to be extra cautious
  • The compile-time checks are actually arguably less safe as they defer the discovery of potential breakages in the current protocol

Author

Good point. Removed.

@@ -55,4 +55,6 @@ bool bigDivideUnsigned(uint64_t& result, uint64_t A, uint64_t B, uint64_t C,

// This only implements ROUND_DOWN
uint64_t bigSquareRoot(uint64_t a, uint64_t b);

template <typename T> T add_sat(T a, T b);
Owner

nit: inconsistent naming style (addSaturating or saturatingAdd)?

Author

Renamed to saturatingAdd, but after the TTL logic change it's actually no longer used, so I might just remove it.
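For reference, a minimal self-contained sketch of what such a saturating-add helper could look like (an illustration only, not the stellar-core implementation):

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Clamp the sum at the maximum representable value instead of wrapping.
// Restricted to unsigned types to keep the overflow check simple.
template <typename T>
T
saturatingAdd(T a, T b)
{
    static_assert(std::is_unsigned<T>::value,
                  "this sketch only covers unsigned types");
    T const max = std::numeric_limits<T>::max();
    return a > max - b ? max : static_cast<T>(a + b);
}
```

The `a > max - b` comparison detects overflow without ever computing an out-of-range intermediate value.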


auto const& thread = stage.at(i);

roTtlDeltas.emplace_back(std::async(
Owner

The default policy for std::async is std::launch::async | std::launch::deferred. Since you're calling get on the futures one-by-one, it seems to me that you'll end up running the tasks sequentially (per the deferred spec, the work only runs when you first try to get the future's value). We should probably specify the policy as just std::launch::async, but I might be misunderstanding the spec as well.

Author

Good catch. I verified that execution was interleaved, but it sounds like the behavior here is implementation-specific. Anyway, I removed the std::async usage in favor of std::thread.
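The deferred-launch pitfall dmkozh describes can be reproduced in isolation. A hedged sketch (illustrative names, not stellar-core code) of the explicit policy fix:

```cpp
#include <future>
#include <vector>

// Spawn one task per item with an explicit std::launch::async policy.
// Under the default policy (std::launch::async | std::launch::deferred)
// an implementation may defer every task until get() is called, which
// serializes the work when the futures are consumed one-by-one.
int
sumOfSquares(int n)
{
    std::vector<std::future<int>> futures;
    for (int i = 0; i < n; ++i)
    {
        futures.emplace_back(
            std::async(std::launch::async, [i] { return i * i; }));
    }
    int total = 0;
    for (auto& f : futures)
    {
        total += f.get(); // blocks until the corresponding task finishes
    }
    return total;
}
```

With the explicit policy, all tasks are guaranteed to run on their own threads regardless of when `get()` is called.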

auto currLiveUntil =
ltxe.current().data.ttl().liveUntilLedgerSeq;

releaseAssertOrThrow(
Owner

This assertion is not correct, is it? One can delete an entry in tx 1 and then re-create it with a smaller TTL in tx 2. We only enforce non-decreasing TTL per transaction, not across transactions. It's probably worth adding a respective test as well.
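To make the failure mode concrete, here is a toy model (hypothetical types, not stellar-core code) of the cross-transaction check the assertion effectively performs, and the legal sequence it wrongly rejects:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// One TTL write per transaction: nullopt models deleting the entry,
// a value models (re-)creating it or bumping liveUntilLedgerSeq.
using TtlWrite = std::optional<uint32_t>;

// A naive "TTL never decreases across transactions" check.
bool
ttlNeverDecreases(std::vector<TtlWrite> const& writes)
{
    uint32_t prev = 0;
    for (auto const& w : writes)
    {
        if (w)
        {
            if (*w < prev)
            {
                return false; // the questionable assertion fires here
            }
            prev = *w;
        }
        // A deletion should reset the baseline, which this check misses.
    }
    return true;
}
```

Delete at tx 2, then re-create with a smaller TTL at tx 3: the check fails even though the sequence is perfectly legal.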

Author

Good catch. Fixed and added a test.

}
}

for (auto const& kvp : cumulativeRoTtlDeltas)
Owner

One thing I hadn't considered with the new TTL algorithm is tx meta. We might need to rethink how we emit TTL meta now. In the worst case we'd need to switch back from deltas to the max value, but maybe we can come up with something better...

Author

Updated the TTL reconciliation logic.


if (opType != RESTORE_FOOTPRINT && lk.type() == TTL &&
it->second.mLedgerEntry && updatedLe &&
isReadOnlyTTLIt != isReadOnlyTTLMap.end() &&
Owner

The entry can't be missing from the map, can it? Maybe an assertion would be more appropriate to ensure we didn't mess up anywhere.

Author

Added an assert. I still need to go over all of the asserts to make sure they're safe and the ones that throw aren't caught when they shouldn't be.


auto isReadOnlyTTLIt = isReadOnlyTTLMap.find(lk);

if (opType != RESTORE_FOOTPRINT && lk.type() == TTL &&
Owner

I don't think it's possible for restore op to modify TTL of RO entries, so not sure if the first condition is necessary.

Author

Yeah, this is an outdated change that's no longer relevant. Removed.

break;
}

auto ltxe = ltx.loadWithoutRecord(lk);
Owner

Shouldn't we also populate the ledger load metrics here? (Currently these are part of the soroban ops implementation.)
Also, I see the code for these metrics remains in the operations, but if the limit is exceeded, we will never reach the operation code at all (so at least the error-handling code in the operations seems redundant).

Author

Done

switch (opType)
{
case INVOKE_HOST_FUNCTION:
sorobanOpResult.tr().invokeHostFunctionResult().code(
Owner

We also need to populate the respective events here, don't we?

Author

Updated

LedgerManagerImpl::collectEntries(AbstractLedgerTxn& ltx, Thread const& txs)
{
ThreadEntryMap entryMap;
auto getEntries = [&](TransactionFrameBasePtr tx,
Owner

I've left some comments regarding this implementation below, but I think this whole method might benefit from factoring out the operation-specific logic into the operations, i.e. add a function like 'preloadEntries' or something like that, and add the respective metrics/error handling there. If there is too much duplication, maybe consider factoring out the common logic into a separate function as well.

Author

Refactored a bit

auto it = entryMap.find(lk);
releaseAssertOrThrow(it != entryMap.end());

auto opType =
Author

Unused

@dmkozh dmkozh force-pushed the parallel_txset_nomination branch 2 times, most recently from ba6137b to 8afa754 Compare February 20, 2025 23:26

@SirTyson SirTyson left a comment


Wrt the state archival bits, I think we can remove meta generation and most archival meta info from the ltx. This is the end state though. If we need to keep some legacy ltx stuff around during development, that's fine, but I'd like to move away from generating restore meta via the ltx regardless, since this probably simplifies auto restore changes coming up.

// right before we call into the invariants.
}

if (op->getOperation().body.type() == RESTORE_FOOTPRINT)


I think it might be easier just to build both the restore and op meta above and consolidate the logic. The only reason we had the post process function before was due to the limitations of ltx. Now that we have to build all meta ourselves anyway without the ltx, this post processing step is unnecessary. I think this may help simplify auto restore too.

{
if (key.type() != TTL)
{
ltxInner.addRestoredFromLiveBucketListKey(key);


We should be able to get rid of addRestoredFromLiveBucketListKey and LedgerTxn::mRestoredKeys.liveBucketList entirely. The only reason we need to store liveBucketList restoration events in the ltx is for the meta post processing step. However, if we do that without the ltx, this is no longer necessary.

We still need to keep track of hotArchiveBucketList restores at the ltx level though. Maybe we can rename this to something like "removeKeyFromHotArchive." If we no longer generate restore meta via ltx, we just need to know what keys to delete from the HotArchive on ledger close.

@sisuresh sisuresh force-pushed the par-squash branch 2 times, most recently from d73d552 to c59330c Compare February 26, 2025 23:30
auto callback = [&metrics, &sorobanData, &resources, &res,
&config](LedgerKey const& lk, uint32_t entrySize) -> bool {
metrics.mLedgerReadByte += entrySize;
if (resources.readBytes < metrics.mLedgerReadByte)
Author

This flow needs to be updated for extend and restore. We shouldn't load entries in certain cases (like if the entry has been archived here).
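The metering pattern in the snippet above can be sketched standalone (ReadMetrics and meterEntryRead are hypothetical names for illustration, not stellar-core's actual types):

```cpp
#include <cstdint>

// Tracks cumulative bytes read while loading footprint entries.
struct ReadMetrics
{
    uint32_t mLedgerReadByte = 0;
};

// Charge entrySize against the budget; returns false once the cumulative
// bytes read exceed readBytesLimit, so the caller can stop loading and
// fail the transaction with a resource error instead.
bool
meterEntryRead(ReadMetrics& metrics, uint32_t readBytesLimit,
               uint32_t entrySize)
{
    metrics.mLedgerReadByte += entrySize;
    return metrics.mLedgerReadByte <= readBytesLimit;
}
```

Note that the metric is still incremented on the failing read, mirroring the snippet above where mLedgerReadByte is bumped before the limit comparison.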

@dmkozh dmkozh force-pushed the parallel_txset_nomination branch 2 times, most recently from a893b04 to b38da27 Compare April 23, 2025 18:58
This is a relatively simple algorithm, but it should be serviceable for traffic with a relatively low number of transitive IO conflicts.
@dmkozh dmkozh force-pushed the parallel_txset_nomination branch 4 times, most recently from 008b38c to e98138f Compare April 25, 2025 19:13
3 participants