Skip to content

ForwardSkipped and refactorings for retry #11307

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 23, 2025
Merged

ForwardSkipped and refactorings for retry #11307

merged 5 commits into from
Apr 23, 2025

Conversation

turadg
Copy link
Member

@turadg turadg commented Apr 21, 2025

closes: https://github.com/Agoric/agoric-private/issues/309

Description

This adds a ForwardSkipped state per https://github.com/Agoric/agoric-private/issues/309

It also has a series of refactorings to prepare for https://github.com/Agoric/agoric-private/issues/311

Reviewers,

  • review by commit is suggested
  • if you prefer to punt any refactorings to the next feature PR I can oblige

Security Considerations

none

Scaling Considerations

none

Documentation Considerations

none

Testing Considerations

CI

Upgrade Considerations

This will be part of the CCTP beta and must successfully upgrade in Mainnet. The RunUtils test does upgrade and already caught an intermediate bug (omitting the obsolete handler facets) which I've documented.

@turadg turadg requested review from mhofman and dckc April 21, 2025 23:23
@turadg turadg requested a review from a team as a code owner April 21, 2025 23:23
Copy link

cloudflare-workers-and-pages bot commented Apr 21, 2025

Deploying agoric-sdk with  Cloudflare Pages  Cloudflare Pages

Latest commit: bcecfdb
Status: ✅  Deploy successful!
Preview URL: https://d0d6af8c.agoric-sdk.pages.dev
Branch Preview URL: https://309-forwardskipped.agoric-sdk.pages.dev

View logs

@turadg turadg added the force:integration Force integration tests to run on PR label Apr 21, 2025
Copy link
Member

@dckc dckc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty good, but I'd like to understand some parts better before approving

and I question one red alert.

Comment on lines 74 to 77
if (reference === currentChainReference) {
await settlement.send(destination, amount);
statusManager.forwarded(txHash);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we want an early return in this if?

the agoric chain is also in the cosmos namespace, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I'll try to figure out why no tests failed. Maybe they did both send() and transfer() and ended up with the same final state.

Comment on lines +25 to +26
getNobleICA: () => OrchestrationAccount<{ chainId: 'noble-1' }>;
settlementAccount: Promise<OrchestrationAccount<{ chainId: 'agoric-any' }>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking out loud... we pass a callback to get the noble ICA, but a promise to get the settlementAccount. seems a little inconsistent. I suppose there are constraints that motivate the different approaches.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I share the concern. What I could do is refactor it also to a promise but I'm not sure it's better,

   const nobleIcaPK = makePromiseKit();
   if (baggage.has(NOBLE_ICA_BAGGAGE_KEY))
       nobleIcaPK.resolve(baggage.get(NOBLE_ICA_BAGGAGE_KEY));

And then whenever it is set, also resolve it.

Maybe there's some "durably stored promise" kit abstraction we could make, but still I don't think it's worth it.

What I will do is at a comment justifying the disharmony.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"durably stored promise" kit abstraction

That would be a VowKit, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant a regular promise that gets durably stored, like is happening here. "durably stored promise result kit" may have been clearer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

native promises cannot be durably stored. The design for durable promises puts them more in line with vows than promises.

I think I still don't understand what "durably stored promise result kit" means.

Comment on lines -1625 to +1644
asyncFuncEagerWakers: [],
asyncFuncEagerWakers: [
Object @Alleged: asyncFlow flow {},
],
asyncFuncFailures: {},
flowForOutcomeVow: {},
flowForOutcomeVow: {
'Alleged: VowInternalsKit vowV0': 'Alleged: asyncFlow flow',
},
Copy link
Member

@dckc dckc Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I don't exactly understand what asyncFuncEagerWakers and flowForOutcomeVow are, but I suppose these are compatible changes.

and we do have a relevant upgrade test: deploy HEAD; upgrade to support EVM destinations

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are internal state for async flow.

@turadg
Copy link
Member Author

turadg commented Apr 22, 2025

This failure is one I haven't seen before:

2025-04-22T02:30:32.858Z SwingSet: vat: v44: chainHub: registering chains [ 'agoric', 'ethereum', 'noble', 'osmosis', 'unreachableChain' ]
2025-04-22T02:30:32.864Z SwingSet: ls: v44: Logging sent error stack (Error#1)
2025-04-22T02:30:32.864Z SwingSet: ls: v44: Error#1: key agoric already registered in collection chainInfos

This doesn't have any changes I can see causing that so I'll run again and see if it's a rare flake.

@turadg turadg force-pushed the 309-ForwardSkipped branch from 6c028e5 to 5e4ad34 Compare April 22, 2025 13:13
@0xpatrickdev
Copy link
Contributor

This failure is one I haven't seen before:

2025-04-22T02:30:32.858Z SwingSet: vat: v44: chainHub: registering chains [ 'agoric', 'ethereum', 'noble', 'osmosis', 'unreachableChain' ]
2025-04-22T02:30:32.864Z SwingSet: ls: v44: Logging sent error stack (Error#1)
2025-04-22T02:30:32.864Z SwingSet: ls: v44: Error#1: key agoric already registered in collection chainInfos

I think we need to lift the call to registerChainsAndAssets(...). chainHub.getChainInfo('agoric') gets called during makeLocalAccount() and will populate the "agoric" entry using vstorage if nothing is present.

Current HEAD:

const firstIncarnationKey = 'firstIncarnationKey';
if (!baggage.has(firstIncarnationKey)) {
baggage.init(firstIncarnationKey, true);
registerChainsAndAssets(
chainHub,
terms.brands,
privateArgs.chainInfo,
privateArgs.assetInfo,
);
}
const feedKit = zone.makeOnce('Feed Kit', () => makeFeedKit());
const poolAccountV = zone.makeOnce('PoolAccount', () => makeLocalAccount());
const settleAccountV = zone.makeOnce('SettleAccount', () =>
makeLocalAccount(),
);

Comment on lines +28 to +33
const poolAccountSendPK = makePromiseKit<void>();
const poolAccountTransferPK = makePromiseKit<void>();
const settleAccountTransferPK = makePromiseKit<void>();
const settleAccountSendPK = makePromiseKit<void>();
const intermediateAccountTransferPK = makePromiseKit<void>();
const intermediateAccountDepositForBurnPK = makePromiseKit<void>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand this change more. I think these should still be VowKit's?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because the need to provide the promises that the async-flow expects. Previously it was a vow handler in Settler, which didn't go through a membrane. Now it's an async flow which when integrated goes through a membrane but in this unit test does not.

I tried to document this in the comment above. Edit suggestions welcomed.

Copy link
Member Author

@turadg turadg Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, the comment is below,

    // These each have VResolver for "vow" resolver. The mocks actually
    // deal in promises but the flow that awaits them expects that it's actually
    // awaiting a vow (made by the membrane to look like a promise).

@turadg
Copy link
Member Author

turadg commented Apr 22, 2025

will populate the "agoric" entry using vstorage if nothing is present

Thanks for the diagnosis. I'll try your work-around.

Should we schedule this?

@turadg turadg force-pushed the 309-ForwardSkipped branch from 5e4ad34 to 28553b6 Compare April 22, 2025 15:41
@0xpatrickdev
Copy link
Contributor

will populate the "agoric" entry using vstorage if nothing is present

Thanks for the diagnosis. I'll try your work-around.

Should we schedule this?

No problem but I wouldn't call it a workaround - we must register info in ChainHub before we try to use it.

And yes we might consider prioritizing #10602. I also just created #11310 which would help surface this issue. Instead of seeing Error: key agoric already registered in collection chainInfos, we'd see Error: chain not found: agoric

@turadg turadg force-pushed the 309-ForwardSkipped branch from 086a7ea to c76fecc Compare April 22, 2025 18:22
@turadg turadg added the automerge:rebase Automatically rebase updates, then merge label Apr 22, 2025
@mergify mergify bot force-pushed the 309-ForwardSkipped branch from c76fecc to 4824f98 Compare April 22, 2025 18:48
Copy link
Contributor

mergify bot commented Apr 22, 2025

This pull request has been removed from the queue for the following reason: checks failed.

The merge conditions cannot be satisfied due to failing checks:

You may have to fix your CI before adding the pull request to the queue again.
If you update this pull request, to fix the CI, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

mergify bot added a commit that referenced this pull request Apr 22, 2025
_incidental_

## Description
I burned time debugging a failure in #11307 because the output lost the error message and the stack trace. It's particularly costly in multichain-testing because of Starship setup time, how slow the tests are to rerun.

The problem was that `t.notThrowsAsync` called from outside the test function loses the message and stack trace. I've demonstrated this with a `devex.test.ts`. 

I think we're better off eschewing it since awaiting a rejected promise will fail in exactly the same cases.

### * Considerations
no other
@turadg turadg removed the automerge:rebase Automatically rebase updates, then merge label Apr 22, 2025
Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides some general comments about the refactoring (to help me get situated), I am very skeptical that foregoing asyncFlow in the test is the right approach. I don't see a reason we couldn't use it.

);
statusManager.forwarded(txHash, false);
}
void forwardFunds({ txHash, amount: fullValue, destination: dest });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment here that forwardFunds does error handling so a void here is safe?

Copy link
Member Author

@turadg turadg Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. That's generally what void implies but it's worth being explicit, especially since it returns a vow and not a promise

Comment on lines 457 to 460
// funds remain in `settlementAccount` and must be recovered via a
// contract upgrade
log('🚨 forward transfer rejected!', reason, txHash);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this part will be updated in the follow-up ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the PR that implements retry

{
currentChainReference,
supportsCctp,
log = makeTracer('FlowForwardFunds'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI a new tracer not going through the membrane will result in flow replays to produce log output again. Is that intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helpful catch, thanks! Perhaps we should avoid default args in flow params?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default args made from other args would be fine. I think the tell-tale sign here is that makeTracer is in the surrounding scope and is not a pure function. Logging in general can be considered exempt from the closed over rules, but in this case I don't think it should.

timeout: FORWARD_TIMEOUT.p,
},
});
log('forward retry successful for', txHash);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a first time forward attempt too, no ? Actually right now without retry it can only be a first time. Why "retry" ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From when the scope of the change was the retry feature instead of just a refactor. I'll revise

Comment on lines +119 to +122
// update status manager, flagging a terminal state that needs manual
// intervention or a code update to remediate
statusManager.forwardFailed(txHash);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is just duplication of the code in depositForBurnHandler.onRejected, but this seems like a different failed state that the others where the funds are no longer in the local chain. I assume this will require a different retry mechanism?

Copy link
Member Author

@turadg turadg Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. retry is out of scope for this PR

}
} else {
log(
'⚠️ forward not attempted',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this just a warning, this feels more fatal than the failed forwards (which will soon be retried)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the siren is for stuff that should never happen and someone sending a bad EUD is not worth waking anyone

Comment on lines +107 to +122
const forwardFunds = tx =>
flows.forwardFunds(
undefined as any,
{
log, // some tests check the log calls
currentChainReference: 'agoric-3',
supportsCctp: makeSupportsCctp(chainHub),
statusManager,
getNobleICA: () => mockAccounts.intermediate.account as any,
settlementAccount: Promise.resolve(
mockAccounts.settlement.account as any,
),
},
tx,
);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why orchestrate as created by makeOrchestrationFacade wouldn't work and be better fidelity.

There doesn't seem to be any mock endowments incompatible with async-flow, and it would avoid impedance mismatch between real objects (e.g. statusManager), and the flow's logic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think anyone is claiming it wouldn't work or be higher fidelity. Those aren't the only two criteria.

I'm trying to ship features to users. This is a test. It's testing what I need it to.

Comment on lines -1625 to +1644
asyncFuncEagerWakers: [],
asyncFuncEagerWakers: [
Object @Alleged: asyncFlow flow {},
],
asyncFuncFailures: {},
flowForOutcomeVow: {},
flowForOutcomeVow: {
'Alleged: VowInternalsKit vowV0': 'Alleged: asyncFlow flow',
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are internal state for async flow.

@turadg turadg added the automerge:rebase Automatically rebase updates, then merge label Apr 22, 2025
@mergify mergify bot force-pushed the 309-ForwardSkipped branch from 4e311a6 to 2ce7b18 Compare April 22, 2025 23:29
Copy link
Contributor

mergify bot commented Apr 22, 2025

This pull request has been removed from the queue for the following reason: checks failed.

The merge conditions cannot be satisfied due to failing checks:

You may have to fix your CI before adding the pull request to the queue again.
If you update this pull request, to fix the CI, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

@turadg turadg force-pushed the 309-ForwardSkipped branch from 2ce7b18 to 3ba9563 Compare April 22, 2025 23:30
mergify bot pushed a commit that referenced this pull request Apr 22, 2025
closes: #11309
refs: #11307

## Description

This attenuator implementation is just a simplification of the revocable kit implementation in prepare-revocable.js. The revocable kit provides both attenuation and revocation, since we can have both for the price of just one level of indirection. But if you don't need revocation, the revocable kit has more API complexity than you need.

Because the attenuator just uses an exo class rather than a class kit, it cannot support the privilege separation of a distinct revoker facet. But it can still support `selfRevoke` as a separate `extraMethod`, as shown in a testcase at `prepare-attenuator.test.js`.

- [x] Understand the `attenuate` in `@agoric/internal`, and, if possible, reconcile them. Or deprecate one in favor of the other.
  - That `attenuate` is quite different. It is more of a deep pick appropriate for own data properties, or very specifically for own non-this-sensitive methods. The attenuator it creates necessarily has own properties for these deeply picked properties. Whereas, this PR creates an exo attenuator with the exposed properties on its own exo class. It is built assuming that the underlying is an exo, but it would probably work for a far object as well. The key is it makes no assumption about where in the underlying's inheritance chain the method is found. The attenuator in this PR has no support for data properties.
- [x] Similarly, understand the `prepareAttenuator` in `@agoric/internal`.
  - That `prepareAttenuator` is similar in many ways to the attenuator in this PR. However, it depends on a `Callback` concept that IMO makes it significantly harder to understand. What extra expressiveness to we get for that extra complexity? Do we use any of that extra expressiveness? I don't know. To be investigated after this PR. Attn @michaelfig 

### Security Considerations

Attenuation helps POLA, which helps security.

### Scaling Considerations

As with revocation, the cost of attenuation is one more level of indirection. So, for most purposes, none.

In particular, a concern was raised about attenuators having per-instance methods as opposed to per-class methods. As with revocation, this implementation uses only per-class methods.

### Documentation Considerations

Both revocation and attenuation need to be documented.

### Testing Considerations

Attenuation testing included, based on the existing revocation tests.

### Upgrade Considerations

Done as prepare patterns, so should be independently upgradable. Including change to what methods are included and what `extraMethods` are added.

Note that when `@agoric/base-zone` moves to endo, this attenuator should move with it. The meaning is clearly as useful for non-agoric endo users. Attn @kriskowal
@mergify mergify bot force-pushed the 309-ForwardSkipped branch from 3ba9563 to bcecfdb Compare April 23, 2025 00:00
@mergify mergify bot merged commit 6751fc3 into master Apr 23, 2025
90 checks passed
@mergify mergify bot deleted the 309-ForwardSkipped branch April 23, 2025 00:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:rebase Automatically rebase updates, then merge force:integration Force integration tests to run on PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants