Skip to content

SIMD-0315: Loader-v3 to loader-v4 migration #315

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
143 changes: 143 additions & 0 deletions proposals/0315-loader-v3-v4-migration.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,143 @@
---
simd: '0315'
title: Loader-v3 to loader-v4 migration
authors:
- Alexander Meißner
category: Standard
type: Core
status: Review
created: 2024-08-15
feature: TBD
extends: SIMD-0167
---

## Summary

Migration of loader-v3 programs to loader-v4.

## Motivation

In order to remove the issues of loader-v3 (mentioned in SIMD-0167) from
validator implementations, all remaining loader-v3 must be migrated to
loader-v4.

## Alternatives Considered

### Loader-v1 programs

Programs of loader-v1 have ABI v0, which is incompatible with ABI v1 used
by programs of loader-v2, v3 and v4. Thus there is no way to migrate these.

### Loader-v2 programs

The two loader-v3 accounts per program in their sum are always larger than
the one resulting loader-v4 account. Thus there is no need for additional
funding. This would not be the case when migrating from loader-v2. Meaning
that expanding this SIMD to cover it as well would require a funding source.
Comment on lines +31 to +36
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we couldn't just make the instruction support v2 -> v4 migrations, even if we don't plan to crank those ourselves?


### Global Migration: Coordinated in valiator or out of validator

The global migration could be implemented in the validator, however:

- If the global migration mechanism is inside the validator, the risk of it
being detrimental to block production outweights any possible benefits.
- It would have to be coordinated across all validator implementations,
tested, fuzzed, etc. simply a whole lot more work for something which is only
used once.
- It being triggered manually per program or once (via a feature gate) for all
programs changes nothing about it being controlled by a single key.
- The only difference is in having more fine granular control over the
timing in when a specific programs migration is triggered.
- Doing it outside of the validator allows for the process to be aborted or
patched quickly in case things start going sideways.

## New Terminology

None.

## Detailed Design

The feature gate must enable the new loader-v3 instruction.

### Loader-v3 Instruction: Migrate

- Instruction accounts:
- `[writable]` The program data account.
- `[writable]` The program account.
- `[signer]` The migration authority.
- Instruction data:
- Enum variant `8u32`
- Behavior:
- Check that there are at least three instruction accounts,
otherwise throw `NotEnoughAccountKeys`
- Check that the program data account is writable,
otherwise throw `InvalidArgument`
- Check that the last modified slot (stored in the program data accounts
header) is less than the current slot if the program data has the state
`ProgramData`,
otherwise throw `InvalidArgument`
Comment on lines +75 to +78
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe verify the header is ProgramData first?

- Check that the provided authority is either:
- the migration authority
(pubkey is `3Scf35jMNk2xXBD6areNjgMtXgp5ZspDhms8vdcbzC42`)
Comment on lines +80 to +81
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good idea for anyone to have the unilateral authority to migrate programs even if it's a relatively harmless operation. I prefer the migration to be done by the runtime, maybe automatically at an epoch boundary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explained the reasons against doing so in the alternatives section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing that up. Is there a reason we don't just allow the migration instruction to be run permissionlessly by anyone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be done too fast (too many at once). I would rather have them be done one at a time and slowly work though a fixed list of loader-v3 programs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, we could deploy an on-chain program to manage rate-limiting on this. So, it would be effectively permissionless, but the program could track how many have been submitted per-slot and deny past a certain limit.

I have to agree I'm also apprehensive about a keypair that can single-handedly change even a frozen program's state, although I recognize the similarity to a feature keypair.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean giving the migration authority to an on-chain program? A program only runs if somebody sends an transaction. It is still the same controlling of the timing, with extra steps.

god mode key

Again, the migration authority has almost no say in anything, but the timing.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lichtso I think you should at least include what @jstarry said under Alternatives Considered

Thanks for writing that up. Is there a reason we don't just allow the migration instruction to be run permissionlessly by anyone?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be done too fast (too many at once). I would rather have them be done one at a time and slowly work though a fixed list of loader-v3 programs.

Can you explain what too fast means? Why is that an issue, unless the CU cost is incorrect? If txs to migrate programs are winning bids to get executed in a block, then that's what the block should be filled with, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as long as the migration is triggered from the outside via TXs it should not be a problem. But, if it is done runtime internally then the CU budget of the block would be off.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so why not just make the instruction permissionless to upgrade? Then you can also do what @buffalojoec said, and allow upgrade of v2 programs to v4. As long as their willing to pay the rent-difference for size.

Downside would be that permissionless upgrade -> user could censor any program for one slot by calling the migration. But I think from dev perspective it's not really that different from US censoring them for one slot?

Overall it seems better than having a new authority key, at least to me. Please let me know if there's something I'm missing.

- or the upgrade authority stored in the program data account
- or the program signer if the program is finalized, closed or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of migrating closed programs to allow resuscitation but I think migrating finalized programs should be done as part of the mass migration because they are more sensitive because they are presumed to be in use, never unavailable (delayed visibility), and there is no guarantee that the program keypair is even stored or secured.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the trigger matters for the delayed visibility, does it? Seems like no matter who calls the migration, there will be a delayed visibility slot.

they are more sensitive because they are presumed to be in use, never unavailable (delayed visibility)

This is a really salient point. We should see if there's a way to avoid halting execution on finalized programs.

uninitialized
otherwise throw `IncorrectAuthority`
- Check that the provided authority is a signer,
otherwise throw `MissingRequiredSignature`
- Check that the program account is writable,
otherwise throw `InvalidArgument`
- Check that the program account is owned by loader-v3,
otherwise throw `IncorrectProgramId`
- If the program account has the state `Program` and
the referenced program data account is owned by loader-v3:
- Set the length of the program account to 0
- Transfer all funds from the program data account to the program account
- Assign ownership of the program account to loader-v4
- CPI loader-v4 `SetProgramLength` the program account to the program data
account size minus the loader-v3 header size (45 bytes) and use the
provided authority.
- CPI loader-v4 `Copy` the program data account into the program account
- CPI loader-v4 `Deploy` the program account
- If the program data account was finalized (upgrade authority is `None`):
- CPI loader-v4 `Finalize`
- otherwise, if the program data account was not finalized and the
migration authority (as opposed to the upgrade authority) was provided:
- CPI loader-v4 `TransferAuthority` to the upgrade authority
- Set the length of the program data account to 0 (removing the header too)
- otherwise, if the program account is empty, has the state `Buffer` or has
the state `Program` but the program data account is not owned by loader-v3:
- Set the length of the program account to 0
- Set the `is_executable` flag of the program account to `false`
Comment on lines +108 to +111
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just error if someone presents a v3 buffer account here? It can still be used to deploy a v4 program, since Copy allows it, so there's no need to support it here. If we just need a way to close them, maybe we keep the Close instruction active on loader v3?


## Impact
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the migrate instruction was not added to the list of bpf loader instructions that can be invoked via cpi. So a multisig owned program would need to first transfer the authority to a keypair based signer to do the migration, right?


This changes enables the migration of programs from loader-v3 to loader-v4
without changing their program address via a new loader-v3 instruction. This
will count as a redeployment and thus render the program unavailable for the
rest of the slot (delay visibility).
Comment on lines +117 to +118
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to not have the migration incur the delay visibility? When finalized programs are migrated it would be best to not make them temporarily unavailable even for one slot. And similarly, depending on how non-upgrade-authority migrations are performed (eg migration authority or some other approach), it would be best to not make them unavailable as it could fail transactions later in the slot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delay visibility is necessary because the index of the global program cache assumes that no two versions of a program coexist within the same slot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very familiar with the inner workings of the program cache but could we make it work? Programs are executed exactly the same way (excluding validator perf differences) no matter if they are loader-v3 or loader-v4 owned right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No the issue is literally a collision in the map (Pubkey, Slot) => LoadedProgram. You can't have two LoadedProgram entries for the same (Pubkey, Slot) key.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When finalized programs are migrated it would be best to not make them temporarily unavailable even for one slot.

I would even extend this idea to non-finalized programs as well. If the ELF didn't change, there's no reason the program should have to pay with a slot.

@Lichtso since the ELF isn't changing, and we are guaranteeing that in the loader v3 instruction, could we add a codepath to the program cache that handles in-place mutation of a loaded entry? It sounds sketchy typing it out, but it might be worth exploring.


Once new programs can not be deployed on loader-v3 anymore, the list of all
loader-v3 programs becomes fixed and can be extracted from a snapshot. Using
the added loader-v3 migration instruction and the global migration authority,
the core protocol developers will then migrate all loader-v3 programs to
loader-v4 programs, which once completed:
Comment on lines +120 to +124
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if the mass migration was defined and activated in a separate SIMD + feature gate but could be convinced otherwise.

  1. This SIMD is useful already without global migration
  2. This SIMD is already big and complex enough without global migration
  3. Global migration could use some more discussion about alternative approaches to the global migration authority

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This SIMD was already split off from the loader-v4 one and is not even that long anymore. I think the global migration very much belongs in here as well as the program local one. Both use the same logic / mechanism, only the trigger is different.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we ever done anything like this before? What kind of coordination goes into a global migration like the one proposed? Just curious if we have any precedents.


- removes the need to copy ELFs during program loading to align them.
- allows transaction account loading to be simplified, because every program
would load exactly one account, no need to load the proxy account to get to
the actual program data (which is not listed in the transaction accounts).
- allows the removal of the write lock demotion exception if loader-v3 is
present in a transaction.
- allows dApp devs to resuscitate closed loader-v3 programs if they still
control the program authority. This allows redeployment at the same address
or completely closing the program account in order to retrieve the locked
funds.

## Security Considerations

None.

## Backwards Compatibility

None.
Loading