SIMD-0166: SBPF Dynamic stack frames #166
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,166 @@ | ||
--- | ||
simd: '0166' | ||
title: SBPF Dynamic stack frames | ||
authors: | ||
- Alexander Meißner | ||
- Alessandro Decina | ||
- Lucas Steuernagel | ||
category: Standard | ||
type: Core | ||
status: Review | ||
created: 2024-08-19T00:00:00.000Z | ||
feature: null | ||
supersedes: null | ||
superseded-by: null | ||
extends: null | ||
--- | ||
|
||
## Summary | ||
|
||
The SVM currently allocates a fixed amount of stack space to each function | ||
frame. We propose allowing programs to dynamically manage their stack space | ||
through the introduction of an explicit stack pointer. | ||
|
||
## Motivation | ||
|
||
The SVM allocates a fixed amount of memory to hold a program’s stack. Within | ||
the stack region, the virtual machine reserves 4096 bytes of stack space for | ||
each function frame. This is simultaneously limiting for functions that | ||
require more space, and wasteful for functions that require less space. | ||
|
||
For well optimized programs that don’t allocate large amounts of stack, the | ||
virtual machine currently still reserves 4096 bytes of stack for each function | ||
call, leading to suboptimal memory usage, which may cause unnecessary page | ||
faults. | ||
|
||
On the other hand, some programs are known to create large function frames - | ||
this seems common with programs that serialize a lot of data - and they have | ||
to jump through hoops to avoid overflowing the stack. The virtual machine | ||
detects when a stack overflow occurs, and it does so by implementing a stack | ||
frame gaps system whereby it inserts a virtual sentinel frame following a | ||
valid function frame. If the sentinel frame is accessed, the executing program | ||
is aborted. This system is fragile and is incompatible with direct mapping - a | ||
feature we expect to enable soon. | ||
|
||
The changes proposed in this document would allow us to optimize stack memory | ||
usage and remove the fragile stack frame gaps system. Note that we do not | ||
propose to remove the existing maximum stack space limit: stack space stays | ||
unchanged, what changes is how it is partitioned internally. | ||
|
||
## Alternatives Considered | ||
|
||
To cope with the SBF limitation of 4096 bytes for the frame size, we could | |
have raised that limit. Even though this would solve the original | |
problem, it would supply functions with an unnecessary amount of memory. In | |
addition, such a solution would increase pressure on the total memory | ||
available for the call stack. Either we would need to increase the total | ||
allocation for the virtual machine or decrease the maximum call depth. | ||
|
||
## New Terminology | ||
|
||
None. | ||
|
||
## Detailed Design | ||
|
||
Bringing dynamic stack frames to the Solana Bytecode Format and its | ||
corresponding virtual machine entails changes in several aspects of the | ||
execution environment. | ||
|
||
### Changes in the execution environment | ||
|
||
We will repurpose the existing R10 register from a frame pointer to a stack | ||
pointer. In other words, it must stop representing the highest address | ||
accessible in a frame, and must now point to the lowest address in a frame. | ||
|
||
Such a change entails a change in the direction of stack growth. Presently, we | ||
stack frames on top of each other, but the memory usage within them grows | ||
downward. In the new frame setting, both the placement of new frames and the | ||
memory usage inside frames must be downward. | ||
|
||
Functions in SBF must alter the stack pointer using the `add64 reg, imm` | ||
(opcode `0x07`) instruction only, allowing them to request any desirable | ||
amount of stack space, provided that it meets the required alignment (refer to | ||
the following section). | ||
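As a rough illustration of the downward-growing layout described above, the sketch below models R10 as a plain integer and `add64 r10, imm` as a wrapping addition of a sign-extended immediate. The constants `STACK_START` and `STACK_SIZE` and the helper `add64_r10` are assumptions made for this example and are not defined by this proposal.

```rust
// Hypothetical stack region, chosen only for illustration.
const STACK_START: u64 = 0x2_0000_0000;
const STACK_SIZE: u64 = 256 * 1024;

/// Models `add64 r10, imm`: the 32-bit immediate is sign-extended
/// to 64 bits and added to the register with wrapping arithmetic.
fn add64_r10(r10: u64, imm: i32) -> u64 {
    r10.wrapping_add(imm as i64 as u64)
}

fn main() {
    // Assumption: the stack pointer starts at the top of the region.
    let entry = STACK_START + STACK_SIZE;

    // Outer function reserves 128 bytes by moving R10 downward.
    let outer = add64_r10(entry, -128);
    // A nested call reserves 64 more bytes below the outer frame.
    let inner = add64_r10(outer, -64);
    assert!(inner < outer && outer < entry);

    // Each function releases its frame by adding the same amount back.
    let after_inner = add64_r10(inner, 64);
    let after_outer = add64_r10(after_inner, 128);
    assert_eq!(after_outer, entry);
    println!("frames released, R10 = {:#x}", after_outer);
}
```

Both the placement of the nested frame and the addresses used inside each frame move toward lower addresses, which is the property the text asks for.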
|
||
The stack frame gaps feature, which creates a memory layout where frames are | |
interleaved with equally sized gaps, is not compatible with dynamic stack | |
frames and must be deactivated. | |
|
||
### Stack alignment | ||
|
||
We want to enforce that the stack pointer remains aligned, therefore R10 must | |
only be incremented or decremented by a multiple of 64. Large alignments might | |
seem wasteful, but enforcing a sufficiently large alignment leaves room for | |
optimizations in interpreters and JITs, which could lead to better | |
performance and thus lower costs. | |
|
||
Based on the current AVX-512 instructions available on Intel and AMD | ||
processors, the stack alignment must be 64 bytes. Even if current interpreters | ||
do not take advantage of these vectorized instructions, we believe that future | ||
generation interpreters might be able to vectorize SBF programs to speed up | ||
common operations, such as copying or comparing public keys and signatures. | ||
An unaligned stack prohibits such innovations. | ||
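One consequence of the 64-byte rule is that a code generator has to round each frame size up before emitting the R10 adjustment. A minimal sketch of that rounding follows; the helper name `align_frame_size` is hypothetical and overflow for sizes near `u64::MAX` is not handled.

```rust
/// Alignment required by this proposal for every change to R10.
const STACK_ALIGN: u64 = 64;

/// Rounds a requested frame size up to the next multiple of 64.
/// Hypothetical helper; a size of 0 stays 0 (no R10 adjustment needed).
fn align_frame_size(bytes: u64) -> u64 {
    (bytes + STACK_ALIGN - 1) / STACK_ALIGN * STACK_ALIGN
}

fn main() {
    for requested in [0u64, 1, 64, 65, 200] {
        println!("{} -> {}", requested, align_frame_size(requested));
    }
}
```

A function that needs 65 bytes of locals would therefore reserve 128, which is the waste the section above accepts in exchange for a predictable alignment.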
|
||
### Changes in the verifier | ||
I would simplify this section. ATM it seems to describe the whole process of validating registers, not just the changes due to this SIMD. The verifier should include a new rule to accept R11 as a destination register under the following conditions:
As a comment to my comment :) Based on the current code, I'd simply add an extra "and" to the "if" condition that returns Ok() for R11. Therefore I don't think we need an error for alignment. Either R11 is allowed (if all 3 conditions pass), or it's not.
I re-wrote the section, based on the new design we devised with only R10. |
||
|
||
The verifier must now allow R10 to be the destination register of the | ||
`add64 reg, imm` (opcode `0x07`) instruction. | ||
|
||
The verifier must throw `VerifierError::UnalignedImmediate` when the immediate | ||
value of `add64 reg, imm` (opcode `0x07`) is not a multiple of 64 and the | ||
destination register is R10. The error must only be raised when both | ||
conditions happen simultaneously. | ||
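The two verifier rules above could be sketched roughly as follows. This is not the actual verifier code: the `Insn` struct, the `check_r10_write` function and the `CannotWriteR10` variant are placeholders for this example, and only `UnalignedImmediate` and opcode `0x07` come from the text.

```rust
#[derive(Debug, PartialEq)]
enum VerifierError {
    /// Named in the proposal: immediate is not a multiple of 64.
    UnalignedImmediate,
    /// Placeholder for "R10 written by any other instruction".
    CannotWriteR10,
}

const ADD64_IMM: u8 = 0x07;
const STACK_POINTER: u8 = 10;
const STACK_ALIGN: i64 = 64;

/// Simplified instruction view; field names are assumptions.
struct Insn {
    opc: u8,
    dst: u8,
    imm: i64,
}

/// Checks only the rules that concern writes to R10.
fn check_r10_write(insn: &Insn) -> Result<(), VerifierError> {
    if insn.dst != STACK_POINTER {
        return Ok(()); // alignment is never checked for other registers
    }
    if insn.opc != ADD64_IMM {
        return Err(VerifierError::CannotWriteR10);
    }
    if insn.imm % STACK_ALIGN != 0 {
        return Err(VerifierError::UnalignedImmediate);
    }
    Ok(())
}

fn main() {
    let ok = Insn { opc: ADD64_IMM, dst: 10, imm: -128 };
    let bad = Insn { opc: ADD64_IMM, dst: 10, imm: 100 };
    assert_eq!(check_r10_write(&ok), Ok(()));
    assert_eq!(check_r10_write(&bad), Err(VerifierError::UnalignedImmediate));
}
```

Note that an unaligned immediate on any other destination register passes this check, matching the requirement that the error is raised only when both conditions hold.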
|
||
### Changes in code generation | ||
|
||
On the compiler side, dynamic stack frames allow for some optimizations. | |
First, when a function does not need any stack-allocated variable, code | |
generation must not create any instruction to modify R10. In addition, we can | |
stop using R5 as a stack spill register when a function call receives more | |
than five arguments. With dynamic stack frames, the compiler must use | |
registers R1 to R5 for the first five arguments and place the remaining arguments | |
in the caller frame, easily retrieving them in the callee as an offset from | ||
the stack pointer. This new call convention obviates the need to use R5 for | ||
retrieving the caller’s frame pointer address to access those parameters. | ||
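The argument placement described above can be sketched as a simple mapping from argument position to location. The `ArgLocation` type and `place_args` function are hypothetical, and the sketch deliberately records only the position of each spilled argument in the caller frame, since the exact byte offsets are not specified here.

```rust
#[derive(Debug, PartialEq)]
enum ArgLocation {
    /// Passed in register R1..=R5.
    Register(u8),
    /// Spilled to the caller frame; the value is the index among
    /// spilled arguments, not a byte offset (offsets are unspecified).
    CallerFrame(usize),
}

/// Assigns a location to each of `count` arguments.
fn place_args(count: usize) -> Vec<ArgLocation> {
    (0..count)
        .map(|i| {
            if i < 5 {
                ArgLocation::Register(i as u8 + 1)
            } else {
                ArgLocation::CallerFrame(i - 5)
            }
        })
        .collect()
}

fn main() {
    for (i, loc) in place_args(7).iter().enumerate() {
        println!("arg {} -> {:?}", i, loc);
    }
}
```

With seven arguments, R5 carries the fifth argument itself rather than a pointer to a spill area, which is where the saved load and store come from.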
|
||
### Identification of programs | ||
|
||
As per the description in SIMD-0161, programs compiled with dynamic stack | ||
frames must contain the `0x02` flag on their ELF header `e_flags` field. | ||
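A loader could detect such programs by inspecting `e_flags`. The sketch below reads the field at byte offset 48 of a little-endian ELF64 header, which is where the standard ELF64 layout places it. Treating `0x02` as a bit to test (rather than a value to compare for equality) is an assumption of this example, as are the function names; SIMD-0161 is the authority on the exact semantics.

```rust
/// Offset of `e_flags` in an ELF64 file header.
const E_FLAGS_OFFSET: usize = 48;
/// Flag value named in this proposal.
const DYNAMIC_STACK_FLAG: u32 = 0x02;

/// Reads `e_flags` from a little-endian ELF64 header, if long enough.
fn read_e_flags(header: &[u8]) -> Option<u32> {
    let bytes = header.get(E_FLAGS_OFFSET..E_FLAGS_OFFSET + 4)?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(bytes);
    Some(u32::from_le_bytes(raw))
}

/// Assumption: the flag is tested as a bit mask.
fn uses_dynamic_stack_frames(header: &[u8]) -> bool {
    match read_e_flags(header) {
        Some(flags) => flags & DYNAMIC_STACK_FLAG != 0,
        None => false,
    }
}

fn main() {
    // A zeroed 64-byte buffer stands in for a real ELF header.
    let mut header = [0u8; 64];
    assert!(!uses_dynamic_stack_frames(&header));
    header[E_FLAGS_OFFSET] = 0x02;
    assert!(uses_dynamic_stack_frames(&header));
}
```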
|
||
## Impact | ||
|
||
We foresee a positive impact in smart contract development. Developers won’t | ||
need to worry about exceeding the maximum frame space allowed for a function | ||
and won’t face any case of stack access violation if their code follows | ||
conventional Rust safety rules. Likewise, when we update the Rust version of | ||
our platform tools, developers will not have the burden of modifying their | ||
contract just because the newer version is using more stack than the previous | ||
one, often reaching the 4096 bytes limit. Refer to issues | ||
[#1186](https://github.com/anza-xyz/agave/issues/1186) and | ||
[#1158](https://github.com/anza-xyz/agave/issues/1158). | ||
|
||
We also expect some improvements in program execution. For functions with no | ||
stack usage, we will not emit the additional instruction that modifies R10. | ||
Furthermore, for function calls that handle more than five arguments, there | ||
will be one less store and one less load operation due to the new call convention. | ||
|
||
## Security Considerations | ||
|
||
Stack gaps will be disabled for dynamic stack frames to work. Stack gaps could | |
detect invalid accesses between two function frames, if the accessed address | |
would fall between them. With dynamic stack frames, all stack accesses will be | |
valid, provided that their address is within the allowed range. We already | |
allow functions to read and modify the memory inside the frame of other | ||
functions, so removing the stack gaps should not bring any security | ||
implications. | ||
|
||
Although one can change R10 to almost any value that fits in a 64-bit integer | ||
with `add64 reg, imm`, every memory access is verified, so there is no risk of | ||
invalid accesses from a corrupt register. | ||
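To make the argument concrete, here is a minimal sketch of a bounds check applied to every stack access, independent of how R10 was computed. The region constants and the `stack_access_ok` helper are assumptions for this example, and the sketch models only the stack region, not the other memory regions of the virtual machine.

```rust
// Hypothetical stack region, chosen only for illustration.
const STACK_START: u64 = 0x2_0000_0000;
const STACK_SIZE: u64 = 256 * 1024;

/// Returns true if `[addr, addr + len)` lies entirely inside the stack region.
fn stack_access_ok(addr: u64, len: u64) -> bool {
    match addr.checked_add(len) {
        Some(end) => addr >= STACK_START && end <= STACK_START + STACK_SIZE,
        None => false, // address arithmetic overflowed
    }
}

fn main() {
    let top = STACK_START + STACK_SIZE;

    // A well-behaved frame: R10 moved 64 bytes below the top.
    let r10 = top.wrapping_add(-64i64 as u64);
    assert!(stack_access_ok(r10, 8));

    // A "corrupt" R10: a huge aligned negative immediate was added.
    let corrupt = r10.wrapping_add(i32::MIN as i64 as u64);
    assert!(!stack_access_ok(corrupt, 8));
}
```

The check depends only on the final address, so an arbitrary R10 value cannot reach memory outside the permitted range.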
|
||
## Drawbacks | ||
|
||
Programs will consume negligibly more compute units, as most functions will | ||
include two extra instructions: one to increment the stack pointer and another | ||
one to decrement it. |