Commit a6a212a

vm register 2 instruction data pointer
1 parent 0a210df commit a6a212a

1 file changed: +115 -0 lines changed

@@ -0,0 +1,115 @@
1+
---
2+
simd: '0321'
3+
title: VM Register 2 Instruction Data Pointer
4+
authors:
5+
- Joe Caulfield (Anza)
6+
category: Standard
7+
type: Core
8+
status: Review
9+
created: 2025-07-11
10+
feature: (fill in with feature key and github tracking issues once accepted)
11+
---
12+
13+
## Summary
14+
15+
Provide a pointer to instruction data in VM register 2 (`r2`) at program
16+
entrypoint, enabling direct access to instruction data without parsing the
17+
serialized input region.
18+
19+
## Motivation
20+
21+
Currently, sBPF programs must parse the entire serialized input region to
22+
locate instruction data. The serialization layout places accounts before
23+
instruction data, requiring programs to iterate through all accounts before
24+
reaching the instruction data section. This is inefficient for programs that
25+
primarily or exclusively need to access instruction data.
26+
27+
By providing a direct pointer to instruction data in `r2`, programs can
28+
immediately access this data without any parsing overhead, resulting in
29+
improved performance and reduced compute unit consumption.
30+
31+
## New Terminology
32+
33+
* **Instruction data pointer**: An 8-byte pointer stored in VM register 2 that
34+
points directly to the start of the instruction data section in the input
35+
region.
36+
37+
## Detailed Design
38+
39+
When the feature is activated, the VM shall set register 2 (`r2`) to contain a
40+
pointer to the beginning of the instruction data section within the input
41+
region. The instruction data format remains unchanged:
42+
43+
```
44+
[8 bytes: data length (little-endian)][N bytes: instruction data]
45+
```
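
The format above can be illustrated with a minimal Rust sketch. The function name `encode_instruction_data_section` and the constant `DATA_OFFSET` are hypothetical and exist only for this example. The sketch builds the section as a standalone byte vector, not the runtime's actual serializer.

```rust
/// Hypothetical helper: encode instruction data as an 8-byte little-endian
/// length followed by the raw bytes, matching the format shown above.
fn encode_instruction_data_section(data: &[u8]) -> Vec<u8> {
    let mut section = Vec::with_capacity(8 + data.len());
    // Length field: the number of data bytes as a little-endian u64.
    section.extend_from_slice(&(data.len() as u64).to_le_bytes());
    // Payload: the instruction data itself.
    section.extend_from_slice(data);
    section
}

/// Offset of the first data byte relative to the start of the section.
/// Under this proposal, `r2` would refer to this position, not to offset 0.
const DATA_OFFSET: usize = 8;
```

For three bytes of instruction data the section is 11 bytes long, and the data begins at `DATA_OFFSET`. For empty instruction data the section consists of the 8-byte zero length only.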
46+
47+
This pointer in `r2` is made available to all programs, under all loaders,
48+
regardless of whether or not the value is read. Prior to this feature, `r2`
49+
contains uninitialized data at program entrypoint. This change assumes no
50+
existing programs depend on the garbage value in `r2`.
51+
52+
**Register Assignment:**
53+
54+
* `r1`: Input region pointer (existing behavior)
55+
* `r2`: Pointer to instruction data section (new)
56+
57+
**Pointer Details:**
58+
59+
* The pointer in `r2` points to the first byte of the actual instruction data,
60+
NOT the length field.
61+
* The pointer value in `r2` is stored as a native 64-bit pointer (8 bytes) in
62+
little-endian format (x86_64).
63+
* When there is no instruction data (length = 0), `r2` still points to where
64+
the instruction data would be, immediately after the 8-byte length field.
65+
* The pointer must always point to valid memory within the input region bounds.
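
The pointer details above can be modeled off-chain with a toy input region. This is a sketch under stated assumptions: `InputRegion`, `build_region`, and `instruction_data` are hypothetical names, `r2` is represented as an offset into a byte buffer instead of a VM address, and `prefix` and `suffix` are opaque stand-ins for whatever the real layout places before and after the instruction data.

```rust
/// Toy model of the input region (hypothetical, for illustration only).
/// `r2_offset` plays the role of `r2`, expressed as an offset from the
/// start of the region instead of a VM address.
struct InputRegion {
    bytes: Vec<u8>,
    r2_offset: usize,
}

/// Build a toy region: `prefix` stands in for the bytes serialized before
/// the instruction data, `suffix` for anything serialized after it.
fn build_region(prefix: &[u8], data: &[u8], suffix: &[u8]) -> InputRegion {
    let mut bytes = prefix.to_vec();
    bytes.extend_from_slice(&(data.len() as u64).to_le_bytes());
    // `r2` refers to the first byte after the 8-byte length field,
    // even when `data` is empty.
    let r2_offset = bytes.len();
    bytes.extend_from_slice(data);
    bytes.extend_from_slice(suffix);
    InputRegion { bytes, r2_offset }
}

/// Read the length stored at `r2 - 8`, then return the data starting at
/// `r2`. Returns `None` if either read would fall outside the region.
fn instruction_data(region: &InputRegion) -> Option<&[u8]> {
    let len_start = region.r2_offset.checked_sub(8)?;
    let len_bytes = region.bytes.get(len_start..region.r2_offset)?;
    let mut buf = [0u8; 8];
    buf.copy_from_slice(len_bytes);
    let len = u64::from_le_bytes(buf) as usize;
    let end = region.r2_offset.checked_add(len)?;
    region.bytes.get(region.r2_offset..end)
}
```

With a 16-byte prefix, `r2_offset` is 24 in this model: 16 bytes of prefix plus the 8-byte length field. With empty instruction data the offset is unchanged and the returned slice has length zero.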
66+
67+
## Alternatives Considered
68+
69+
1. **Provide a pointer to instruction data length**: Store a pointer to the
70+
instruction data length field in `r2`. However, providing a direct pointer to
71+
the start of instruction data is more ergonomic.
72+
73+
2. **Provide optional entrypoint parameter**: Allow programs to opt in via a
74+
different entrypoint signature. The current approach is simpler as it avoids
75+
supporting multiple entrypoint signatures and makes the pointer universally
76+
available. This relies on the assumption that no programs depend on the
77+
garbage value previously in `r2`.
78+
79+
3. **Modify serialization layout**: The serialization layout will eventually be
80+
overhauled with ABI v2, a comprehensive upgrade that could resolve this issue
81+
among many others. Given the significant scope of ABI v2 and potential for
82+
delays, this targeted optimization provides immediate value.
83+
84+
## Impact
85+
86+
On-chain programs are positively impacted by this change. The new `r2` pointer
87+
gives programs the ability to efficiently read instruction data, further
88+
customize their control flow, and maximize compute unit efficiency.
89+
However, any programs that currently depend on the uninitialized/garbage value
90+
in `r2` at entrypoint will break when this feature is activated.
91+
92+
Validators are almost completely unaffected as the instruction data pointer is
93+
already available during serialization, and setting a register is a negligible
94+
CPU operation.
95+
96+
Core contributors must implement this feature. The change should be minimally
97+
invasive, though the effort depends on the VM implementation.
98+
99+
## Security Considerations
100+
101+
Programs should read and validate the instruction data length (stored at `r2 - 8`)
102+
before accessing data via the `r2` pointer. Failing to check the length could
103+
result in reading unintended memory contents or out-of-bounds access attempts.
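
The length check described above can be sketched in Rust as follows. Everything here is an assumption made for illustration: the `Command` and `ParseError` types, the tag values, and the function `parse_command` are hypothetical and not part of this proposal. `declared_len` stands for the value read from `r2 - 8`, and `memory_at_r2` stands for the bytes readable from `r2` onward, which may extend past the instruction data.

```rust
#[derive(Debug, PartialEq)]
enum Command {
    Ping,
    Transfer { amount: u64 },
}

#[derive(Debug, PartialEq)]
enum ParseError {
    LengthOutOfBounds,
    Empty,
    TooShort { expected: usize, actual: usize },
    UnknownTag(u8),
}

/// Parse a hypothetical command, never reading past `declared_len` even if
/// more memory is readable after the instruction data.
fn parse_command(declared_len: u64, memory_at_r2: &[u8]) -> Result<Command, ParseError> {
    // Bound all later reads by the declared length.
    let data = memory_at_r2
        .get(..declared_len as usize)
        .ok_or(ParseError::LengthOutOfBounds)?;
    // The first byte is a hypothetical command tag.
    let (&tag, rest) = data.split_first().ok_or(ParseError::Empty)?;
    match tag {
        0 => Ok(Command::Ping),
        1 => {
            // A transfer needs an 8-byte little-endian amount after the tag.
            if rest.len() < 8 {
                return Err(ParseError::TooShort { expected: 9, actual: data.len() });
            }
            let mut buf = [0u8; 8];
            buf.copy_from_slice(&rest[..8]);
            Ok(Command::Transfer { amount: u64::from_le_bytes(buf) })
        }
        other => Err(ParseError::UnknownTag(other)),
    }
}
```

In this sketch a declared length of 5 for a transfer is rejected with `TooShort` even though the bytes after the declared data are readable, which is the failure mode the length check is meant to prevent.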
104+
105+
Additionally, programs that currently rely on `r2` containing uninitialized or
106+
garbage data at entrypoint will experience breaking changes when this feature
107+
is activated.
108+
109+
## Backwards Compatibility
110+
111+
This feature is backwards compatible only for programs that do not currently
112+
read from `r2` at program entrypoint.
113+
114+
This feature is NOT backwards compatible for any programs that depend on the
115+
uninitialized/garbage data previously in `r2`.
