Add option to validate transactions in the background #4617

bboston7 · 2025-01-09T19:27:33Z

This change adds a new option EXPERIMENTAL_BACKGROUND_TX_VALIDATION that causes stellar-core to validate any transactions received over-the-wire in the background. It also wraps all ledger state needed to validate transactions into a new ExtendedLedgerSnapshot object.

src/herder/HerderImpl.cpp

src/transactions/TransactionFrameBase.h

src/herder/TransactionQueue.cpp

src/herder/TransactionQueue.h

src/main/AppConnector.cpp

src/herder/TransactionQueue.h

src/herder/TransactionQueue.cpp

marta-lokhova · 2025-01-10T02:01:05Z

thanks for putting this together! I'm excited about moving more of transaction processing to the background. It also seems like making tx queue thread-safe is not as dangerous/scary as we thought, which is great. My two main suggestions are to consolidate all read-only ledger state into a single object to avoid footguns, and re-work recvTransaction flow a little bit to remove main thread completely (this way we'll have a synchronous flow independent of main thread, from when the message is popped off the wire all the way to tx queue insertion)

Closes stellar#4316 and stellar#4607. This change adds a new option `EXPERIMENTAL_BACKGROUND_TX_VALIDATION` that causes stellar-core to validate any transactions received over-the-wire in the background. It also wraps all ledger state needed to validate transactions into a new `ExtendedLedgerSnapshot` object.

bboston7 · 2025-04-08T18:27:20Z

I reworked this PR to do only validation in the background, as that's what our perf testing showed had the largest impact. It's now ready for review.

marta-lokhova · 2025-04-08T22:31:33Z

src/main/Config.h

    bool BACKGROUND_OVERLAY_PROCESSING;

    // Enable parallel block application (experimental)
    bool EXPERIMENTAL_PARALLEL_LEDGER_APPLY;

+    // Enable parallel transaction validation (experimental)
+    bool BACKGROUND_TX_VALIDATION;


Nit: for consistency with the rest of the flags in the codebase, should this be named EXPERIMENTAL_BACKGROUND_TX_VALIDATION?

marta-lokhova · 2025-04-08T22:48:46Z

src/overlay/Peer.h

@@ -503,12 +503,12 @@ class CapacityTrackedMessage : private NonMovableOrCopyable
 {
    std::weak_ptr<Peer> const mWeakPeer;
    StellarMessage const mMsg;
-    std::optional<Hash> mMaybeHash;
+    std::shared_ptr<Hash const> mMaybeHash;


Curious, why this change?

This helps to avoid copying Hash objects between various threads. This way we can just copy the pointer instead.

marta-lokhova · 2025-04-08T23:43:18Z

src/main/ApplicationImpl.cpp

+    if (mConfig.BACKGROUND_TX_VALIDATION)
+    {
+        // Keep priority unchanged as transaction validation is time-sensitive
+        mTxValidationThread =


Is it a bit too early to decide if we need a separate thread for this? I think our overlay thread is currently far from being fully utilized, so we could just use it? (and add a new thread if needed in the future)

marta-lokhova · 2025-04-08T23:46:03Z

src/overlay/OverlayManagerImpl.cpp

+                                 .header.scpValue.closeTime;
+            auto offset =
+                getUpperBoundCloseTimeOffset(mApp.getAppConnector(), closeTime);
+            mApp.postOnTxValidationThread(


It seems that we're still bouncing between threads to process a single transaction? Every new transaction is received on the overlay thread. Then, it's scheduled to be processed on the main thread (but not much happens, so it sort of just dead weight for the scheduler?). Then, checkValid is performed on a special tx validation thread, and the result is posted back to main. I think this means that this flow will be as fast as the main thread (scheduler being the bottleneck). I think adding scheduler to the flow might prevent us from observing performance benefits, similar to what we discussed in this comment.

I think we could do validation in Peer::recvAuthenticatedMessage which happens in the background already. We can also probably just use the overlay thread itself for validation.

marta-lokhova · 2025-04-09T00:07:02Z

src/ledger/LedgerStateSnapshot.h

+
+// A subclass of LedgerSnapshot that contains additional snapshots necessary for
+// transaction validation.
+class ExtendedLedgerSnapshot : public LedgerSnapshot


Is there a reason to keep both LedgerSnapshot and ExtendedLedgerSnapshot? If it's just scope, it's fine to clean this up in the follow-up, but wanted to confirm.

marta-lokhova · 2025-04-09T00:25:09Z

src/ledger/LedgerStateSnapshot.h

+
+  private:
+    std::shared_ptr<const Config> const mConfig;
+    std::optional<const SorobanNetworkConfig> const mSorobanNetworkConfig;


I feel a bit uneasy SorobanNetworkConfig lives here: LedgerSnapshot was meant to be simply a ledger state wrapper, that supports both LedgerTxn and BucketListSnapshot underneath. I was hoping to not cache anything at this layer to avoid ledger state inconsistencies and other footguns. The only exception is mLedgerHeader, which has to be modified for validation purposes. SorobanNetworkConfig really belongs to immutable bucketlist snapshots, since the struct is just a different representation of what lives in the bucketlist already. Could we move it there?

marta-lokhova · 2025-04-09T00:26:30Z

src/overlay/OverlayManagerImpl.h

+    // Ledger snapshot used for background transaction validation. Must be
+    // updated (replaced) after each new ledger via the `updateSnapshots`
+    // function.
+    std::unique_ptr<ExtendedLedgerSnapshot const> mLedgerSnapshot;


It's safer to create a new LedgerSnapshot every time you need to use it to avoid caching inconsistencies.

marta-lokhova · 2025-04-09T00:27:48Z

src/overlay/OverlayManagerImpl.h

@@ -164,6 +165,8 @@ class OverlayManagerImpl : public OverlayManager
    void start() override;
    void shutdown() override;

+    void updateSnapshots() override;


I don't think this is needed: the underlying bucket snapshot in LedgerSnapshot is managed by LedgerManager already, so if you create a new LedgerSnapshot every time instead of caching it, you'll automatically get the most recent version of the ledger. No need for snapshot mutex either, since bucketlist are already refreshed in a thread-safe manner.

marta-lokhova · 2025-04-09T00:45:12Z

src/herder/TransactionQueue.cpp

-        txResult =
-            tx->checkValid(mApp.getAppConnector(), ls, 0, 0,
-                           getUpperBoundCloseTimeOffset(mApp, closeTime));
+        txResult = tx->checkValid(


Are there visible performance benefits when doing checkValid vs just calling verifySig on all the signers to populate the existing cache? Reason I'm asking is because skipping checkValid here means that we might be potentially adding invalid transactions to the queue (if ledger state changed while the transaction was being processed). This can be handled, but I wanted to make sure we'd actually get the benefits compared to a much simpler verifySig option. Note that in case of verifySig, we could simply call it in Peer::recvAuthenticatedMessage similar how it's done for SCP messages, and then still do checkValid on main thread. Because verifySig uses the global cache, we would just get a cache hit and avoid the extra cost of signature verification.

From tracy testing, signature verification makes up 2/3s to 3/4s of checkValid. So it's the largest component, but it's also not everything. I'd expect that if we were to scope this down to just signature verifcation, we'd get results that are slightly worse than 2/3s of 3/4s of what we're seeing here, just due to potential lock contention within the cache and the small possibility for bad luck with eviction (it's a random eviction cache, right?).

As far as invalid transactions in the queue go, I was under the impression that the validity of transactions in the queue was already "best effort", especially with Soroban where we can't validate much anyway? That's not to say we should make the problem "worse" without a good reason, but surely we already handle invalid transactions in the queue, right?

marta-lokhova · 2025-04-09T00:54:19Z

src/ledger/LedgerStateSnapshot.cpp

+                                               bool forApply)
+    : LedgerSnapshot(ltx)
+    , mConfig(app.getConfigPtr())
+    , mSorobanNetworkConfig(forApply


I think something here isn't right: ExtendedLedgerSnapshot is supposed to be "the canonical representation of ledger state". Other subsystems are meant to use the snapshot and query it for data, such as ledger entries, soroban config, ledger header etc. But here we still have a dependency on LM to maintain correctness of the config. In #4607 I was hoping to remove cached mSorobanNetworkConfig completely from LedgerManager, and have it rely on the snapshot instead. The nice thing about having all of the state inside the snapshot is that it's a lot less error-prone; the snapshot will always contain correct read-only state. If some parts of the state are managed by other classes like LedgerManager, that creates a risk of divergence if not updated correctly.

bboston7 requested a review from marta-lokhova January 9, 2025 19:27

marta-lokhova requested changes Jan 10, 2025

View reviewed changes

marta-lokhova reviewed Jan 10, 2025

View reviewed changes

src/herder/TransactionQueue.cpp Outdated Show resolved Hide resolved

bboston7 force-pushed the bg-tx-queue branch from 1697978 to 6410694 Compare January 16, 2025 00:57

SirTyson mentioned this pull request Feb 11, 2025

CAP-0065 - Reusable module cache #4621

Merged

5 tasks

bboston7 force-pushed the bg-tx-queue branch from 7416db3 to 5b2dc94 Compare March 12, 2025 22:51

bboston7 force-pushed the bg-tx-queue branch from d3e9ccc to e579033 Compare March 25, 2025 18:28

bboston7 force-pushed the bg-tx-queue branch from 9267fde to 177e8d5 Compare April 8, 2025 18:08

bboston7 changed the base branch from master to release/v22.3.0 April 8, 2025 18:09

bboston7 changed the title ~~WIP: Background tryAdd functionality in TransactionQueue~~ Add option to validate transactions in the background Apr 8, 2025

bboston7 marked this pull request as ready for review April 8, 2025 18:10

marta-lokhova requested changes Apr 9, 2025

View reviewed changes

Add option to validate transactions in the background #4617

Are you sure you want to change the base?

Add option to validate transactions in the background #4617

Uh oh!

Conversation

bboston7 commented Jan 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

marta-lokhova commented Jan 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bboston7 commented Apr 8, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

bboston7 commented Jan 9, 2025 •

edited

Loading

marta-lokhova commented Jan 10, 2025 •

edited

Loading