| 1 | +--- |
| 2 | +simd: '0321' |
| 3 | +title: VM Register 2 Instruction Data Pointer |
| 4 | +authors: |
| 5 | + - Joe Caulfield (Anza) |
| 6 | +category: Standard |
| 7 | +type: Core |
| 8 | +status: Review |
| 9 | +created: 2025-07-11 |
| 10 | +feature: (fill in with feature key and github tracking issues once accepted) |
| 11 | +--- |
| 12 | + |
| 13 | +## Summary |
| 14 | + |
| 15 | +Provide a pointer to instruction data in VM register 2 (`r2`) at program |
| 16 | +entrypoint, enabling direct access to instruction data without parsing the |
| 17 | +serialized input region. |
| 18 | + |
| 19 | +## Motivation |
| 20 | + |
| 21 | +Currently, sBPF programs must parse the entire serialized input region to |
| 22 | +locate instruction data. The serialization layout places accounts before |
| 23 | +instruction data, requiring programs to iterate through all accounts before |
| 24 | +reaching the instruction data section. This is inefficient for programs that |
| 25 | +primarily or exclusively need to access instruction data. |
| 26 | + |
| 27 | +By providing a direct pointer to instruction data in `r2`, programs can |
| 28 | +immediately access this data without any parsing overhead, resulting in |
| 29 | +improved performance and reduced compute unit consumption. |
| 30 | + |
| 31 | +## New Terminology |
| 32 | + |
| 33 | +* **Instruction data pointer**: An 8-byte pointer stored in VM register 2 that |
| 34 | + points directly to the start of the instruction data section in the input |
| 35 | + region. |
| 36 | + |
| 37 | +## Detailed Design |
| 38 | + |
| 39 | +When the feature is activated, the VM shall set register 2 (`r2`) to contain a |
| 40 | +pointer to the beginning of the instruction data section within the input |
| 41 | +region. The instruction data format remains unchanged: |
| 42 | + |
| 43 | +``` |
| 44 | +[8 bytes: data length (little-endian)][N bytes: instruction data] |
| 45 | +``` |
| 46 | + |
| 47 | +This pointer in `r2` is made available to all programs, under all loaders, |
| 48 | +regardless of whether or not the value is read. Prior to this feature, `r2` |
| 49 | +contains uninitialized data at program entrypoint. This change assumes no |
| 50 | +existing programs depend on the garbage value in `r2`. |
| 51 | + |
| 52 | +**Register Assignment:** |
| 53 | + |
| 54 | +* `r1`: Input region pointer (existing behavior) |
| 55 | +* `r2`: Pointer to instruction data section (new) |
| 56 | + |
| 57 | +**Pointer Details:** |
| 58 | + |
| 59 | +* The pointer in `r2` points to the first byte of the actual instruction data, |
| 60 | + NOT the length field. |
| 61 | +* The pointer value in `r2` is stored as a native 64-bit pointer (8 bytes) in |
| 62 | + little-endian format (x86_64). |
| 63 | +* When there is no instruction data (length = 0), `r2` still points to where |
| 64 | + the instruction data would be, immediately after the 8-byte length field. |
| 65 | +* The pointer must always point to valid memory within the input region bounds. |
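
The pointer details above can be illustrated with a minimal host-side sketch. It assumes the section layout quoted earlier (`[8 bytes: data length (little-endian)][N bytes: instruction data]`) and that the value in `r2` is handed to the program as a raw pointer. The names `LEN_PREFIX`, `mock_section`, and `instruction_data` are hypothetical and are not part of any SDK or of this proposal.

```rust
/// Size of the little-endian length prefix that precedes instruction data.
pub const LEN_PREFIX: usize = 8;

/// Builds a host-side stand-in for the instruction data section:
/// `[8 bytes: data length (little-endian)][N bytes: instruction data]`.
/// Hypothetical helper for illustration only; on-chain, the runtime writes
/// this section while serializing the input region.
pub fn mock_section(data: &[u8]) -> Vec<u8> {
    let mut section = (data.len() as u64).to_le_bytes().to_vec();
    section.extend_from_slice(data);
    section
}

/// Reconstructs the instruction data slice from the value placed in `r2`.
///
/// # Safety
/// `r2` must point to the first instruction data byte of a section laid out
/// as above, and that section must stay alive and unmodified for `'a`.
pub unsafe fn instruction_data<'a>(r2: *const u8) -> &'a [u8] {
    // The length field sits immediately before the data, at `r2 - 8`.
    let mut len_bytes = [0u8; LEN_PREFIX];
    // SAFETY: the caller guarantees the 8 bytes before `r2` are readable.
    unsafe {
        core::ptr::copy_nonoverlapping(r2.sub(LEN_PREFIX), len_bytes.as_mut_ptr(), LEN_PREFIX);
    }
    let len = u64::from_le_bytes(len_bytes) as usize;
    // SAFETY: the caller guarantees `len` bytes starting at `r2` are readable.
    // A length of zero yields an empty slice, matching the "no instruction
    // data" case in which `r2` still points just past the length field.
    unsafe { core::slice::from_raw_parts(r2, len) }
}
```

On the host, a stand-in for `r2` can be obtained by offsetting the start of a `mock_section` buffer by `LEN_PREFIX` bytes. In a real program the pointer would come from the entrypoint register instead.
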
| 66 | + |
| 67 | +## Alternatives Considered |
| 68 | + |
| 69 | +1. **Provide a pointer to instruction data length**: Store a pointer to the |
| 70 | + instruction data length field in `r2`. However, a direct pointer to the start |
| 71 | + of instruction data is more ergonomic, and the length remains readable at `r2 - 8`. |
| 72 | + |
| 73 | +2. **Provide optional entrypoint parameter**: Allow programs to opt-in via a |
| 74 | + different entrypoint signature. The current approach is simpler as it avoids |
| 75 | + supporting multiple entrypoint signatures and makes the pointer universally |
| 76 | + available. This relies on the assumption that no programs depend on the |
| 77 | + garbage value previously in `r2`. |
| 78 | + |
| 79 | +3. **Modify serialization layout**: The serialization layout will eventually be |
| 80 | + overhauled with ABI v2, a comprehensive upgrade that could resolve this issue |
| 81 | + among many others. Given the significant scope of ABI v2 and potential for |
| 82 | + delays, this targeted optimization provides immediate value. |
| 83 | + |
| 84 | +## Impact |
| 85 | + |
| 86 | +On-chain programs are positively impacted by this change. The new `r2` pointer |
| 87 | +lets programs read instruction data efficiently, further customize their |
| 88 | +control flow, and maximize compute unit efficiency. |
| 89 | +However, any programs that currently depend on the uninitialized/garbage value |
| 90 | +in `r2` at entrypoint will break when this feature is activated. |
| 91 | + |
| 92 | +Validators are almost completely unaffected as the instruction data pointer is |
| 93 | +already available during serialization, and setting a register is a negligible |
| 94 | +CPU operation. |
| 95 | + |
| 96 | +Core contributors must implement this feature, which should be minimally |
| 97 | +invasive, though the exact effort depends on the VM implementation. |
| 98 | + |
| 99 | +## Security Considerations |
| 100 | + |
| 101 | +Programs should read and validate the instruction data length (stored at `r2 - 8`) |
| 102 | +before accessing data via the `r2` pointer. Failing to check the length could |
| 103 | +result in reading unintended memory contents or out-of-bounds access attempts. |
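
As a rough sketch of the length check recommended above, the helper below refuses to read a field unless the length stored at `r2 - 8` covers it. The function name `read_u64_checked` and the idea of reading a little-endian `u64` field at a byte offset are assumptions chosen for illustration. The proposal does not prescribe any particular instruction data encoding.

```rust
/// Reads a little-endian `u64` located `offset` bytes into the instruction
/// data, returning `None` when the declared length does not cover the field.
/// Hypothetical helper; the field encoding is an illustrative assumption.
///
/// # Safety
/// `r2` must be the instruction data pointer: the 8 bytes before it hold the
/// little-endian data length, and that many bytes after it are readable.
pub unsafe fn read_u64_checked(r2: *const u8, offset: usize) -> Option<u64> {
    // Load the declared instruction data length from `r2 - 8`.
    let mut len_bytes = [0u8; 8];
    // SAFETY: the caller guarantees the length prefix is readable.
    unsafe { core::ptr::copy_nonoverlapping(r2.sub(8), len_bytes.as_mut_ptr(), 8) };
    let len = u64::from_le_bytes(len_bytes) as usize;

    // Reject the read if `offset + 8` overflows or runs past the data.
    let end = offset.checked_add(8)?;
    if end > len {
        return None;
    }

    // Only now touch memory behind `r2`.
    let mut field = [0u8; 8];
    // SAFETY: `offset + 8 <= len`, and the caller guarantees `len` bytes.
    unsafe { core::ptr::copy_nonoverlapping(r2.add(offset), field.as_mut_ptr(), 8) };
    Some(u64::from_le_bytes(field))
}
```

Returning `Option` keeps the failure path explicit: a program can map `None` to its own invalid-instruction error instead of reading whatever bytes happen to follow the instruction data in the input region.
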
| 104 | + |
| 105 | +Additionally, programs that currently rely on `r2` containing uninitialized or |
| 106 | +garbage data at entrypoint will experience breaking changes when this feature |
| 107 | +is activated. |
| 108 | + |
| 109 | +## Backwards Compatibility |
| 110 | + |
| 111 | +This feature is backwards compatible only for programs that do not currently |
| 112 | +read from `r2` at program entrypoint. |
| 113 | + |
| 114 | +This feature is NOT backwards compatible for any programs that depend on the |
| 115 | +uninitialized/garbage data previously in `r2`. |