---
simd: '0186'
title: Loaded Transaction Data Size Specification
authors:
  - Hanako Mumei
category: Standard
type: Core
status: Review
created: 2024-10-20
feature: (fill in with feature tracking issues once accepted)
---

## Summary

Before a transaction can be executed, every account it may read from or write to
must be loaded, including any programs it may call. The amount of data a
transaction is allowed to load is capped, and if it exceeds that limit, loading
is aborted. This functionality is already implemented in the validator. The
purpose of this SIMD is to explicitly define how loaded transaction data size is
calculated.

## Motivation

Transaction data size accounting is currently unspecified, and the
implementation-defined algorithm used in the Agave client exhibits some
surprising behaviors:

* BPF loaders required by instructions' program IDs are counted against
transaction data size. BPF loaders required by CPI programs are not. If a
required BPF loader is also included in the accounts list, it is counted twice.
* The size of a program owned by LoaderV3 may or may not include the size of its
programdata, depending on how the program account is used in the transaction.
Programdata is also counted on its own if it is included in the transaction
accounts list. This means programdata may be counted zero, one, or two times per
transaction.
* Due to implementation quirks, loader-owned accounts that do not contain valid
programs for execution may or may not be counted against the transaction data
size total, depending on how they are used in the transaction. This includes,
but is not limited to, LoaderV3 buffer accounts and accounts that fail ELF
validation.
* Accounts can be included in the transaction accounts list without being an
instruction account, the fee-payer, or a program ID. These accounts are
currently loaded and counted against transaction data size, although the
transaction can never use them for any purpose.

All validator clients must arrive at precisely the same transaction data size
for every transaction, because a difference of one byte can determine whether a
transaction is executed or fails, and thus affects consensus. We also want the
calculated transaction data size to closely reflect the actual amount of data
the transaction requests.

Therefore, this SIMD specifies an algorithm that is straightforward to implement
in a client-agnostic way while accurately accounting for all account data
required by the transaction.

## New Terminology

No new terms are introduced by this SIMD; however, we define the following for
clarity:

* Instruction account: an account passed to an instruction in its accounts
array, which allows the program to view the actual bytes contained in the
account. CPI can only happen through programs provided as instruction accounts.
* Transaction accounts list: all accounts for the transaction, which includes
instruction accounts, the fee-payer, program IDs, and any extra accounts added
to the list but not used for any purpose.
* LoaderV3 program account: an account owned by
`BPFLoaderUpgradeab1e11111111111111111111111` whose account data begins with the
four bytes `02 00 00 00`, followed by a pubkey that points to the program's
programdata account.

For the purposes of this SIMD, we make no assumptions about the contents of the
programdata account.
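
The LoaderV3 program account layout defined above can be checked with a short Rust sketch. It is illustrative only and makes several assumptions:

* `Pubkey` is modeled as a raw `[u8; 32]`.
* `Account` and `programdata_address` are hypothetical names, not any client's API.
* The loader ID is passed in as a parameter rather than decoded from its base58 form.
* Data too short to hold the 4-byte tag plus a 32-byte pubkey is treated as not being a program account.

```rust
/// Hypothetical stand-in for a 32-byte account address.
type Pubkey = [u8; 32];

/// Minimal, hypothetical account model: only the owner and raw data matter here.
struct Account {
    owner: Pubkey,
    data: Vec<u8>,
}

/// Leading four bytes of a LoaderV3 program account, as described above.
const PROGRAM_TAG: [u8; 4] = [2, 0, 0, 0];

/// Returns the programdata address if `account` is a LoaderV3 program account,
/// i.e. it is owned by `loader_v3_id` and its data starts with `PROGRAM_TAG`
/// followed by a 32-byte pubkey. Returns `None` otherwise.
fn programdata_address(account: &Account, loader_v3_id: &Pubkey) -> Option<Pubkey> {
    if &account.owner != loader_v3_id {
        return None;
    }
    // `get` yields `None` for data that is too short, which `?` propagates.
    if account.data.get(0..4)? != &PROGRAM_TAG[..] {
        return None;
    }
    // Bytes 4..36 hold the programdata pubkey.
    let key: Pubkey = account.data.get(4..36)?.try_into().ok()?;
    Some(key)
}
```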

## Detailed Design

The proposed algorithm is as follows:

1. Given a transaction, take the unique set of account keys which are used as:

   * An instruction account.
   * A program ID for an instruction.
   * The fee-payer.

2. Each account's size is determined solely by the byte length of its data prior
to transaction execution.
3. For any `LoaderV3` program account, add the size of the programdata account
it references, if it exists.
4. The loaded transaction data size is the sum of these sizes.
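
A minimal Rust sketch of steps 1 through 4 follows. It rests on two assumptions:

* The transaction model is simplified. `Transaction`, `Instruction`, `Account`, and `loaded_data_size` are hypothetical names, not any client's data structures.
* A key with no existing account contributes zero bytes.

Entries in the transaction accounts list that are not used as described in step 1 are never consulted, so they are not modeled.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a 32-byte account address.
type Pubkey = [u8; 32];

/// Hypothetical account model: owner plus raw data as of before execution.
struct Account {
    owner: Pubkey,
    data: Vec<u8>,
}

/// Hypothetical instruction model: a program ID and its instruction accounts.
struct Instruction {
    program_id: Pubkey,
    accounts: Vec<Pubkey>,
}

/// Hypothetical transaction model. Extra entries in the transaction accounts
/// list are omitted because the algorithm never counts them.
struct Transaction {
    fee_payer: Pubkey,
    instructions: Vec<Instruction>,
}

/// Returns the referenced programdata address for a LoaderV3 program account.
fn programdata_address(account: &Account, loader_v3_id: &Pubkey) -> Option<Pubkey> {
    if &account.owner != loader_v3_id || account.data.get(0..4)? != &[2u8, 0, 0, 0][..] {
        return None;
    }
    account.data.get(4..36)?.try_into().ok()
}

/// Computes loaded transaction data size following steps 1 through 4.
fn loaded_data_size(
    tx: &Transaction,
    accounts: &HashMap<Pubkey, Account>,
    loader_v3_id: &Pubkey,
) -> u64 {
    // Step 1: unique keys used as the fee-payer, a program ID, or an instruction account.
    let mut used: HashSet<Pubkey> = HashSet::new();
    used.insert(tx.fee_payer);
    for ix in &tx.instructions {
        used.insert(ix.program_id);
        used.extend(ix.accounts.iter().copied());
    }

    let mut total: u64 = 0;
    for key in &used {
        // Assumption: a key with no existing account contributes zero bytes.
        let Some(account) = accounts.get(key) else {
            continue;
        };
        // Step 2: data length before execution, regardless of owner or writability.
        total += account.data.len() as u64;
        // Step 3: LoaderV3 program accounts also add their programdata size, if it exists.
        if let Some(pd_key) = programdata_address(account, loader_v3_id) {
            if let Some(programdata) = accounts.get(&pd_key) {
                total += programdata.data.len() as u64;
            }
        }
    }
    // Step 4: the sum is the loaded transaction data size.
    total
}
```

Because the used keys are collected into a set, an account referenced by several instructions is counted once. A loader that owns a program is counted only if the transaction itself uses the loader's key.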

Transactions may include a
`ComputeBudgetInstruction::SetLoadedAccountsDataSizeLimit` instruction to define
a data size limit for the transaction. Otherwise, the default limit is 64MiB
(`64 * 1024 * 1024` bytes).

If a transaction exceeds its data size limit, the transaction fails. Once
`enable_transaction_loading_failure_fees` is enabled, fees will be charged for
such transactions.
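
The limit check can be sketched the same way. The assumptions are:

* The optional `requested` value, modeled as a `u32`, stands in for the limit carried by a `SetLoadedAccountsDataSizeLimit` instruction, if the transaction has one.
* Parsing that instruction and any validation of the requested value are outside this sketch.
* `LoadError` and `check_loaded_data_size` are hypothetical names.
* "Exceeds" is read as strictly greater than, so a size equal to the limit is allowed.

```rust
/// Default limit when no `SetLoadedAccountsDataSizeLimit` instruction is present: 64MiB.
const DEFAULT_LOADED_ACCOUNTS_DATA_SIZE_LIMIT: u32 = 64 * 1024 * 1024;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum LoadError {
    ExceedsDataSizeLimit { size: u64, limit: u32 },
}

/// `requested` is the limit from a `SetLoadedAccountsDataSizeLimit` instruction, if any.
fn check_loaded_data_size(size: u64, requested: Option<u32>) -> Result<(), LoadError> {
    let limit = requested.unwrap_or(DEFAULT_LOADED_ACCOUNTS_DATA_SIZE_LIMIT);
    if size > u64::from(limit) {
        // The transaction fails; whether fees are charged is governed
        // separately by `enable_transaction_loading_failure_fees`.
        Err(LoadError::ExceedsDataSizeLimit { size, limit })
    } else {
        Ok(())
    }
}
```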

Adding required loaders to transaction data size is abolished. Loaders are
treated the same as any other account: counted if used in a manner described in
step 1, and not counted otherwise.

No account that falls outside the three categories listed in step 1 is counted
against transaction data size. Validator clients are free to decline to load
such accounts.
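
Because only the keys used in step 1 are counted, a client can derive the set of accounts it must load directly from the fee-payer and the instructions. The Rust sketch below filters the transaction accounts list down to those keys. A loader, or any other extra key that appears in the list without being used as described in step 1, is dropped. `Transaction`, `Instruction`, and `keys_to_load` are hypothetical names chosen for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a 32-byte account address.
type Pubkey = [u8; 32];

/// Hypothetical instruction model.
struct Instruction {
    program_id: Pubkey,
    accounts: Vec<Pubkey>,
}

/// Hypothetical transaction model that keeps the full transaction accounts list.
struct Transaction {
    fee_payer: Pubkey,
    account_keys: Vec<Pubkey>,
    instructions: Vec<Instruction>,
}

/// Returns, in accounts-list order, the keys that must be loaded and counted.
/// Keys outside the three categories of step 1, including loaders that are
/// merely present in the list, are skipped.
fn keys_to_load(tx: &Transaction) -> Vec<Pubkey> {
    let mut used: HashSet<Pubkey> = HashSet::new();
    used.insert(tx.fee_payer);
    for ix in &tx.instructions {
        used.insert(ix.program_id);
        used.extend(ix.accounts.iter().copied());
    }
    tx.account_keys
        .iter()
        .copied()
        .filter(|key| used.contains(key))
        .collect()
}
```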

Read-only and writable accounts are treated the same. In the future, when direct
mapping is enabled, this SIMD may be amended to count them differently.

As a consequence of steps 1 and 3, for LoaderV3 programs, programdata is counted
twice if a transaction uses both the program account and its programdata account
in any of the ways listed in step 1. This is done partly for simplicity, and
partly to account for the cost of maintaining the compiled program in addition
to the actual bytes of the programdata account.
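
As a worked example, assume a LoaderV3 program account that holds exactly the 4-byte tag and the 32-byte programdata pubkey (36 bytes), and a programdata account of $P$ bytes. Suppose a transaction uses the program account (for example, as a program ID) and also passes the programdata account as an instruction account. Steps 1 through 4 then give the following contribution $S$ from these two accounts:

```math
S = \underbrace{36 + P}_{\text{program account (steps 2 and 3)}} + \underbrace{P}_{\text{programdata account (step 2)}} = 36 + 2P
```

If the programdata account is not used directly, the contribution is $36 + P$.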

We include programdata size in the account size of LoaderV3 programs because
using the program account in a transaction forces an unconditional load of
programdata to compile the program for execution. We always count it, even when
the program account is used only as an instruction account, because the program
must then be available for CPI.

There is no special handling for any account owned by the native loader,
LoaderV1, or LoaderV2.

Account size for programs owned by LoaderV4 is left undefined. This SIMD should
be amended to define the required semantics before LoaderV4 is enabled on any
network.

## Alternatives Considered

* Transaction data size accounting is already enabled, so the null option is to
enshrine the current Agave behavior in the protocol. This is undesirable because
the current behavior is highly idiosyncratic, and LoaderV3 program sizes are
routinely undercounted.
* Builtin programs are backed by accounts that contain only the program name as
a string, typically making them 15-40 bytes. We could impose a larger fixed cost
for these. However, they must be made available for all programs anyway, and
most of them are likely to be ported to BPF eventually, so this would add
complexity for no real benefit.
* Several slightly different algorithms were considered for handling LoaderV3
programs in particular, for instance counting only programs that are valid for
execution in the current slot. However, this would implicitly couple transaction
data size with the results of ELF validation, which is highly undesirable.
* We considered loading and counting sizes for accounts in the transaction
accounts list that are not used for any purpose. This is the current behavior,
but there is no reason to load such accounts at all.

## Impact

The primary impact of this SIMD is that it makes correctly implementing
transaction data size accounting much easier for other validator clients.

It makes the calculated size of transactions that include program accounts for
CPI somewhat larger, but given the generous 64MiB limit, it is unlikely that any
existing users will be affected. Based on an investigation of a 30-day window,
transactions with a loaded data size larger than 30MiB are virtually never seen.

## Security Considerations

Security impact is minimal because this SIMD merely simplifies an existing
feature. Care must be taken to implement the rules exactly.

This SIMD requires a feature gate.

## Backwards Compatibility

Transactions whose loaded data size is currently close to the 64MiB limit and
that call LoaderV3 programs via CPI may now exceed the limit and fail.