SIMD-0186: Transaction Data Size Specification

2501babe · 2501babe · commit a364ce762ca5 · 2024-10-20T21:57:38.000-07:00
diff --git a/proposals/0186-transaction-data-size-specification.md b/proposals/0186-transaction-data-size-specification.md
@@ -0,0 +1,121 @@
+---
+simd: '0186'
+title: Transaction Data Size Specification
+authors:
+  - Hanako Mumei
+category: Standard
+type: Core
+status: Review
+created: 2024-10-20
+feature: (fill in with feature tracking issues once accepted)
+---
+
+## Summary
+
+Before a transaction can be executed, every account it may read from or write to
+must be loaded, including any programs it may call. The amount of data a
+transaction is allowed to load is capped, and if the transaction exceeds that
+limit, loading is aborted. This functionality is already implemented in the
+Agave validator. The purpose of this SIMD is to explicitly define how
+transaction data size is calculated.
+
+## Motivation
+
+Transaction data size accounting is currently unspecified, and the
+implementation-defined algorithm used in the Agave client exhibits some
+surprising behaviors:
+
+* BPF loaders required by programs invoked from top-level instructions are
+counted against transaction data size. BPF loaders required by programs invoked
+via CPI are not. If a required BPF loader is also invoked or included in the
+accounts list, it is counted twice.
+* The size of a program owned by the upgradeable BPF loader (henceforth
+LoaderV3) may or may not include the size of its programdata, depending on how
+the program is used in the transaction. Programdata is also counted on its own
+if the programdata account is itself included in the transaction. This means
+programdata may be counted zero, one, or two times.
+
+All validator clients must arrive at precisely the same transaction data size
+for every transaction, because a difference of a single byte can determine
+whether a transaction executes or fails. Also, we want the calculated transaction data
+size to correspond closely to the actual amount of data the transaction
+requests.
+
+Therefore, this SIMD seeks to specify an algorithm that is straightforward to
+implement in a client-agnostic way, while also accurately accounting for the
+total data required by the transaction.
+
+## New Terminology
+
+N/A
+
+## Detailed Design
+
+The proposed algorithm is as follows:
+
+1. Every account explicitly included in the transaction's account list is
+counted once and only once.
+2. The size of a program account owned by LoaderV3 also includes the size of its
+programdata account.
+3. Apart from rule 2, no accounts are implicitly added to the total data size.
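+
+The three rules above can be sketched as follows. This is a minimal,
+non-authoritative Python sketch rather than any client's implementation: the
+`Account` record, the `LOADER_V3` and `"System"` placeholder owner strings, and
+the `loaded_data_size` function are hypothetical names, and the sketch assumes
+every referenced account, including each programdata account, is present in
+the `accounts` map.
+
+```python
+from dataclasses import dataclass
+from typing import Dict, List, Optional
+
+# Hypothetical placeholder for the upgradeable BPF loader's program ID.
+LOADER_V3 = "LoaderV3"
+
+
+@dataclass
+class Account:
+    owner: str
+    data_len: int
+    # Set only on LoaderV3 program accounts: address of the programdata
+    # account that holds the program's executable data.
+    programdata_address: Optional[str] = None
+
+
+def loaded_data_size(account_keys: List[str], accounts: Dict[str, Account]) -> int:
+    total = 0
+    # Rule 1: each explicitly listed account is counted exactly once.
+    for key in dict.fromkeys(account_keys):
+        account = accounts[key]
+        total += account.data_len
+        # Rule 2: a LoaderV3 program also includes its programdata size.
+        if account.owner == LOADER_V3 and account.programdata_address is not None:
+            total += accounts[account.programdata_address].data_len
+    # Rule 3: nothing else (such as the owning loader) is added implicitly.
+    return total
+
+
+# Hypothetical sizes: a 36-byte program account and its 10,045-byte programdata.
+accounts = {
+    "payer": Account(owner="System", data_len=0),
+    "prog": Account(owner=LOADER_V3, data_len=36, programdata_address="prog_data"),
+    "prog_data": Account(owner=LOADER_V3, data_len=10_045),
+}
+print(loaded_data_size(["payer", "prog"], accounts))  # 10081
+print(loaded_data_size(["payer", "prog", "prog"], accounts))  # 10081 (counted once)
+```
+
+Note that the sketch never adds the LoaderV3 loader account itself, even though
+it owns `prog`: under rule 3, only explicitly listed accounts and the
+programdata of listed LoaderV3 programs contribute to the total.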
+
+Transactions may include a
+`ComputeBudgetInstruction::SetLoadedAccountsDataSizeLimit` instruction to define
+a data size limit for the transaction. Otherwise, the default limit is 64MiB
+(`64 * 1024 * 1024` bytes). In the future, this default may be changed by
+amending this SIMD.
+
+If a transaction exceeds its data size limit, account loading is aborted and the
+transaction fails. Fees are charged for such failed transactions once the
+`enable_transaction_loading_failure_fees` feature is enabled.
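+
+The limit selection and abort behavior described above can be sketched as
+follows. This is a minimal Python sketch, not any client's implementation:
+`requested` stands in for the value of a `SetLoadedAccountsDataSizeLimit`
+instruction if the transaction has one, `DataSizeLimitExceeded`,
+`effective_limit`, and `load_with_limit` are hypothetical names, and any
+clamping or validation a client applies to the requested value is deliberately
+not modeled.
+
+```python
+from typing import Iterable, Optional
+
+# Default limit stated in this SIMD: 64 MiB.
+DEFAULT_LIMIT = 64 * 1024 * 1024
+
+
+class DataSizeLimitExceeded(Exception):
+    """Hypothetical error: loading aborted because the limit was exceeded."""
+
+
+def effective_limit(requested: Optional[int]) -> int:
+    # `requested` models the SetLoadedAccountsDataSizeLimit value, or None if
+    # the transaction has no such instruction. Client-side clamping or
+    # validation of the requested value is not modeled.
+    return DEFAULT_LIMIT if requested is None else requested
+
+
+def load_with_limit(sizes: Iterable[int], requested: Optional[int] = None) -> int:
+    limit = effective_limit(requested)
+    total = 0
+    for size in sizes:
+        total += size
+        # Abort as soon as the running total exceeds the limit; the
+        # transaction then fails instead of executing.
+        if total > limit:
+            raise DataSizeLimitExceeded(f"{total} bytes exceeds limit of {limit} bytes")
+    return total
+
+
+print(load_with_limit([400, 600], requested=1000))  # 1000: exactly at the limit
+try:
+    load_with_limit([400, 601], requested=1000)
+except DataSizeLimitExceeded as err:
+    print(err)  # 1001 bytes exceeds limit of 1000 bytes
+```
+
+The sketch aborts on the first account that pushes the running total over the
+limit instead of summing every account first. Because the total only grows,
+both approaches fail exactly the same set of transactions.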
+
+Read-only and writable accounts are treated the same. In the future, when direct
+mapping is enabled, this SIMD may be amended to count them differently.
+
+As a consequence of rules 1 and 2, programdata is counted twice if a transaction
+includes both programdata and the program account itself in the accounts list.
+This is partly for ease of implementation: we always want to count
+programdata when the program is included, and there is no reason for any
+transaction to include both accounts except during initial deployment.
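+
+As a worked example, let $s_P$ be the data length of a LoaderV3 program account
+and $s_D$ the data length of its programdata account (symbols introduced here
+for illustration only). Rules 1 and 2 give the following contributions to
+transaction data size, depending on which of the two accounts the transaction
+lists:
+
+```math
+\begin{aligned}
+\text{program only:}     &\quad s_P + s_D \\
+\text{programdata only:} &\quad s_D \\
+\text{both:}             &\quad (s_P + s_D) + s_D = s_P + 2\,s_D
+\end{aligned}
+```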
+
+There is no special handling for programs owned by the native loader or the
+non-upgradeable BPF loaders.
+
+Account size for programs owned by LoaderV4 is left undefined. This SIMD should
+be amended before LoaderV4 is enabled.
+
+## Alternatives Considered
+
+* Transaction data size accounting is already enabled, so the null option is to
+enshrine the current Agave behavior in the protocol. This is undesirable because
+the current behavior is highly idiosyncratic, and LoaderV3 program sizes are
+routinely undercounted.
+* Builtin programs are backed by accounts that only contain the program name as
+a string, typically making them 15-40 bytes. We could make them free when they
+are not instruction accounts, since they are part of the validator. However,
+this adds complexity for no real benefit.
+* We include LoaderV3 programdata size in program size because almost all
+transactions will use the program account (which forces a load of programdata)
+rather than using programdata directly. To be truly consistent, we might want to
+count LoaderV1 and LoaderV2 programs twice if they are instruction accounts,
+since the data does have to be loaded twice. However, this adds complexity for
+what may be an Agave-specific implementation detail, and these programs are
+rarely used.
+
+## Impact
+
+The primary impact is that this SIMD makes correctly implementing transaction
+data size accounting much easier for other validator clients.
+
+It increases the calculated data size of transactions that include LoaderV3
+program accounts for CPI, but given the generous 64MiB limit, it is unlikely
+that any existing users will be affected.
+
+## Security Considerations
+
+Security impact is minimal because this SIMD merely simplifies an existing
+feature.
+
+This SIMD requires a feature gate.
+
+## Backwards Compatibility
+
+Transactions that call LoaderV3 programs via CPI and are extremely close to the
+64MiB limit may now exceed it.