-
Notifications
You must be signed in to change notification settings - Fork 179
SIMD-0315: Loader-v3 to loader-v4 migration #315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,143 @@ | ||
--- | ||
simd: '0315' | ||
title: Loader-v3 to loader-v4 migration | ||
authors: | ||
- Alexander Meißner | ||
category: Standard | ||
type: Core | ||
status: Review | ||
created: 2024-08-15 | ||
feature: TBD | ||
extends: SIMD-0167 | ||
--- | ||
|
||
## Summary | ||
|
||
Migration of loader-v3 programs to loader-v4. | ||
|
||
## Motivation | ||
|
||
In order to remove the issues of loader-v3 (mentioned in SIMD-0167) from | ||
validator implementations, all remaining loader-v3 must be migrated to | ||
loader-v4. | ||
|
||
## Alternatives Considered | ||
|
||
### Loader-v1 programs | ||
|
||
Programs of loader-v1 have ABI v0, which is incompatible with ABI v1 used | ||
by programs of loader-v2, v3 and v4. Thus there is no way to migrate these. | ||
|
||
### Loader-v2 programs | ||
|
||
The two loader-v3 accounts per program in their sum are always larger than | ||
the one resulting loader-v4 account. Thus there is no need for additional | ||
funding. This would not be the case when migrating from loader-v2. Meaning | ||
that expanding this SIMD to cover it as well would require a funding source. | ||
|
||
### Global Migration: Coordinated in valiator or out of validator | ||
|
||
The global migration could be implemented in the validator, however: | ||
|
||
- If the global migration mechanism is inside the validator, the risk of it | ||
being detrimental to block production outweights any possible benefits. | ||
- It would have to be coordinated across all validator implementations, | ||
tested, fuzzed, etc. simply a whole lot more work for something which is only | ||
used once. | ||
- It being triggered manually per program or once (via a feature gate) for all | ||
programs changes nothing about it being controlled by a single key. | ||
- The only difference is in having more fine granular control over the | ||
timing in when a specific programs migration is triggered. | ||
- Doing it outside of the validator allows for the process to be aborted or | ||
patched quickly in case things start going sideways. | ||
|
||
## New Terminology | ||
|
||
None. | ||
|
||
## Detailed Design | ||
|
||
The feature gate must enable the new loader-v3 instruction. | ||
|
||
### Loader-v3 Instruction: Migrate | ||
|
||
- Instruction accounts: | ||
- `[writable]` The program data account. | ||
- `[writable]` The program account. | ||
- `[signer]` The migration authority. | ||
- Instruction data: | ||
- Enum variant `8u32` | ||
- Behavior: | ||
- Check that there are at least three instruction accounts, | ||
otherwise throw `NotEnoughAccountKeys` | ||
- Check that the program data account is writable, | ||
otherwise throw `InvalidArgument` | ||
- Check that the last modified slot (stored in the program data accounts | ||
header) is less than the current slot if the program data has the state | ||
`ProgramData`, | ||
otherwise throw `InvalidArgument` | ||
Comment on lines
+75
to
+78
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe verify the header is |
||
- Check that the provided authority is either: | ||
- the migration authority | ||
(pubkey is `3Scf35jMNk2xXBD6areNjgMtXgp5ZspDhms8vdcbzC42`) | ||
Comment on lines
+80
to
+81
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think it's a good idea for anyone to have the unilateral authority to migrate programs even if it's a relatively harmless operation. I prefer the migration to be done by the runtime, maybe automatically at an epoch boundary. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Explained the reasons against doing so in the alternatives section. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for writing that up. Is there a reason we don't just allow the migration instruction to be run permissionlessly by anyone? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could be done too fast (too many at once). I would rather have them be done one at a time and slowly work though a fixed list of loader-v3 programs. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Heh, we could deploy an on-chain program to manage rate-limiting on this. So, it would be effectively permissionless, but the program could track how many have been submitted per-slot and deny past a certain limit. I have to agree I'm also apprehensive about a keypair that can single-handedly change even a frozen program's state, although I recognize the similarity to a feature keypair. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You mean giving the migration authority to an on-chain program? A program only runs if somebody sends an transaction. It is still the same controlling of the timing, with extra steps.
Again, the migration authority has almost no say in anything, but the timing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Can you explain what too fast means? Why is that an issue, unless the CU cost is incorrect? If txs to migrate programs are winning bids to get executed in a block, then that's what the block should be filled with, right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, as long as the migration is triggered from the outside via TXs it should not be a problem. But, if it is done runtime internally then the CU budget of the block would be off. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, so why not just make the instruction permissionless to upgrade? Then you can also do what @buffalojoec said, and allow upgrade of v2 programs to v4. As long as their willing to pay the rent-difference for size. Downside would be that permissionless upgrade -> user could censor any program for one slot by calling the migration. But I think from dev perspective it's not really that different from US censoring them for one slot? Overall it seems better than having a new authority key, at least to me. Please let me know if there's something I'm missing. |
||
- or the upgrade authority stored in the program data account | ||
- or the program signer if the program is finalized, closed or | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm in favor of migrating closed programs to allow resuscitation but I think migrating finalized programs should be done as part of the mass migration because they are more sensitive because they are presumed to be in use, never unavailable (delayed visibility), and there is no guarantee that the program keypair is even stored or secured. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think the trigger matters for the delayed visibility, does it? Seems like no matter who calls the migration, there will be a delayed visibility slot.
This is a really salient point. We should see if there's a way to avoid halting execution on finalized programs. |
||
uninitialized | ||
Lichtso marked this conversation as resolved.
Show resolved
Hide resolved
|
||
otherwise throw `IncorrectAuthority` | ||
- Check that the provided authority is a signer, | ||
otherwise throw `MissingRequiredSignature` | ||
- Check that the program account is writable, | ||
otherwise throw `InvalidArgument` | ||
- Check that the program account is owned by loader-v3, | ||
otherwise throw `IncorrectProgramId` | ||
- If the program account has the state `Program` and | ||
the referenced program data account is owned by loader-v3: | ||
- Set the length of the program account to 0 | ||
- Transfer all funds from the program data account to the program account | ||
- Assign ownership of the program account to loader-v4 | ||
- CPI loader-v4 `SetProgramLength` the program account to the program data | ||
account size minus the loader-v3 header size (45 bytes) and use the | ||
provided authority. | ||
- CPI loader-v4 `Copy` the program data account into the program account | ||
- CPI loader-v4 `Deploy` the program account | ||
- If the program data account was finalized (upgrade authority is `None`): | ||
- CPI loader-v4 `Finalize` | ||
- otherwise, if the program data account was not finalized and the | ||
migration authority (as opposed to the upgrade authority) was provided: | ||
- CPI loader-v4 `TransferAuthority` to the upgrade authority | ||
- Set the length of the program data account to 0 (removing the header too) | ||
- otherwise, if the program account is empty, has the state `Buffer` or has | ||
the state `Program` but the program data account is not owned by loader-v3: | ||
- Set the length of the program account to 0 | ||
- Set the `is_executable` flag of the program account to `false` | ||
Comment on lines
+108
to
+111
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we just error if someone presents a v3 buffer account here? It can still be used to deploy a v4 program, since |
||
|
||
## Impact | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks like the migrate instruction was not added to the list of bpf loader instructions that can be invoked via cpi. So a multisig owned program would need to first transfer the authority to a keypair based signer to do the migration, right? |
||
|
||
This changes enables the migration of programs from loader-v3 to loader-v4 | ||
without changing their program address via a new loader-v3 instruction. This | ||
will count as a redeployment and thus render the program unavailable for the | ||
rest of the slot (delay visibility). | ||
Comment on lines
+117
to
+118
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it possible to not have the migration incur the delay visibility? When finalized programs are migrated it would be best to not make them temporarily unavailable even for one slot. And similarly, depending on how non-upgrade-authority migrations are performed (eg migration authority or some other approach), it would be best to not make them unavailable as it could fail transactions later in the slot. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Delay visibility is necessary because the index of the global program cache assumes that no two versions of a program coexist within the same slot. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not very familiar with the inner workings of the program cache but could we make it work? Programs are executed exactly the same way (excluding validator perf differences) no matter if they are loader-v3 or loader-v4 owned right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No the issue is literally a collision in the map There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I would even extend this idea to non-finalized programs as well. If the ELF didn't change, there's no reason the program should have to pay with a slot. @Lichtso since the ELF isn't changing, and we are guaranteeing that in the loader v3 instruction, could we add a codepath to the program cache that handles in-place mutation of a loaded entry? It sounds sketchy typing it out, but it might be worth exploring. |
||
|
||
Once new programs can not be deployed on loader-v3 anymore, the list of all | ||
loader-v3 programs becomes fixed and can be extracted from a snapshot. Using | ||
the added loader-v3 migration instruction and the global migration authority, | ||
the core protocol developers will then migrate all loader-v3 programs to | ||
loader-v4 programs, which once completed: | ||
Comment on lines
+120
to
+124
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would prefer if the mass migration was defined and activated in a separate SIMD + feature gate but could be convinced otherwise.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This SIMD was already split off from the loader-v4 one and is not even that long anymore. I think the global migration very much belongs in here as well as the program local one. Both use the same logic / mechanism, only the trigger is different. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Have we ever done anything like this before? What kind of coordination goes into a global migration like the one proposed? Just curious if we have any precedents. |
||
|
||
- removes the need to copy ELFs during program loading to align them. | ||
- allows transaction account loading to be simplified, because every program | ||
would load exactly one account, no need to load the proxy account to get to | ||
the actual program data (which is not listed in the transaction accounts). | ||
- allows the removal of the write lock demotion exception if loader-v3 is | ||
present in a transaction. | ||
- allows dApp devs to resuscitate closed loader-v3 programs if they still | ||
control the program authority. This allows redeployment at the same address | ||
or completely closing the program account in order to retrieve the locked | ||
funds. | ||
|
||
## Security Considerations | ||
|
||
None. | ||
|
||
## Backwards Compatibility | ||
|
||
None. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we couldn't just make the instruction support v2 -> v4 migrations, even if we don't plan to crank those ourselves?