CPS-???? | Node Behavior during Low Participation #982

Draft
wants to merge 4 commits into
base: master

@nfrisby nfrisby commented Feb 4, 2025

A CPS to elicit proposals from the community for how the node should behave when the best chain it has access to is growing much more slowly than Cardano's Praos security argument anticipates under the assumption of full participation.

Because this behavior is unspecified, it can obstruct node design or be overlooked entirely, so that nodes and community tooling could trigger a failure cascade if the network ever suffers low participation for an extended period (e.g., due to a bug or a solar flare).

Rendered: https://github.com/nfrisby/CIPs/blob/nfrisby/low-participation-CPS/CPS-XXXX/README.md

@nfrisby nfrisby force-pushed the nfrisby/low-participation-CPS branch from 96c2a7f to f04475d on February 4, 2025 16:09
Author

nfrisby commented Feb 4, 2025

I'd appreciate your review @nc6, since we've discussed this in the past.

Contributor

@nc6 nc6 left a comment

I think fundamentally this looks good, but I would try to make it more accessible by spelling out the scenarios in more detail and possibly re-ordering some of the sections.

This limit is a disjunction of Common Prefix and Chain Growth.

- *EnforceChainGrowth* (EnforceCG).
In addition to AssumeIA, the node rejects any chain that has fewer than 2160 blocks per 36 hr (even if selecting it wouldn't require rolling back any blocks).
Contributor

It's not directly clear to the reader (or, this reader at least!) how these immediately relate to the question above, since they don't say "what does the node do if it has a sparse chain" but rather constrain some other operations. I would suggest moving the TwoThirdsInaccessible block up above the description of the options, in order to outline the things that the node could do, and then from there lead into the possibilities and how they relate to the various security arguments.

Author

I added an intro and then added some connections between these options and the intro. Please let me know if that suffices. So far, I'm hesitant to introduce the scenarios first, but maybe that'd be better.

Since the nodes and/or other Cardano community tooling might have been designed around the guarantee of at most 36 hrs for 2160 blocks, they could fail outright or become more vulnerable to possible DoS attacks during the outage.

- If the whole network implemented EnforceIA, then the nodes would not face increased risks.
The 2160 disjunct in the EnforceIA definition is an optimization for when the chain is growing faster than 2160 blocks per 36 hr.
Contributor

I find the way that this is written to be slightly confusing. AIUI, the behaviour here would be identical to AssumeIA, since there's no rollback involved. So we'd still have the same concerns as under AssumeIA, right?

Author

@nfrisby nfrisby Feb 11, 2025

The difference in my mind is that EnforceIA can make blocks/slots immutable as time passes, whereas AssumeIA cannot. In other words, the number of volatile slots is explicitly bounded at 36 hr in EnforceIA (i.e., 3*2160*20 slots), whereas it's only bounded by assumption in AssumeIA.

I didn't call it out explicitly in the text, but I only claimed that the node's risk would be more obviously bounded this way; I don't know about tooling.
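
The 36 hr figure can be checked with a short calculation. The sketch below is not node code: it assumes a 1-second slot and reads the factor 20 above as the reciprocal of an active slot coefficient of 1/20. The helper names `volatile_window_slots` and `slots_to_hours` are hypothetical.

```python
# Illustrative arithmetic only; the parameter values are assumptions
# taken from the "3*2160*20 slots" expression in the comment above.
K = 2160                # maximum rollback depth, in blocks
SLOTS_PER_BLOCK = 20    # assumed: 1/f with active slot coefficient f = 1/20
SECONDS_PER_SLOT = 1    # assumed slot length


def volatile_window_slots(k: int = K, slots_per_block: int = SLOTS_PER_BLOCK) -> int:
    """Number of slots in the 3 * k * (1/f) window."""
    return 3 * k * slots_per_block


def slots_to_hours(slots: int, seconds_per_slot: int = SECONDS_PER_SLOT) -> float:
    """Convert a slot count to wall-clock hours."""
    return slots * seconds_per_slot / 3600


window = volatile_window_slots()
print(window)                  # 129600
print(slots_to_hours(window))  # 36.0
```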

The 2160 disjunct in the EnforceIA definition is an optimization for when the chain is growing faster than 2160 blocks per 36 hr.

- If the whole network implemented EnforceCG, then the Recovery Plan would be unavoidable (once the affected stake was back online).
The silver lining is that all nodes would be able to switch to the repaired chain automatically, without needing manual intervention.
Contributor

Likewise, I would elaborate more on this case: the nodes would grow the chain up until the 36 hour point, at which time all nodes would stop growing the chain and we would need to effect disaster recovery. But since we hadn't grown more than 2160 blocks, each node would be happy to roll back and select the denser chain...

Author

I added a bit here. And a TODO... I haven't yet done the math to bound how many blocks might slowly arise before the rule actually kicked in.

---
CPS: XXXX
Title: Node Behavior during Low Participation
Status: Draft
Collaborator

@rphair rphair Feb 5, 2025

Suggested change (original line, then proposed replacement):
Status: Draft
Status: Open

@nfrisby please note marking Draft review status on GitHub is enough to indicate this is a draft; while the CPS itself will be Open until considered solved. There is no Draft status for either CIPs or CPSs.

Likewise the CIP header of #974 should be changed to Proposed since that is most likely the way it would be reviewed (once emerging from Draft review status) and merged.

Author

@nfrisby nfrisby Feb 11, 2025

I hear you. I try to be as explicit as possible, so I like that the rendered document says Draft for now.

I will certainly update these fields before clicking "Ready for review". Is that plan OK with you?

Collaborator

Sure, although other editors and reviewers are likely to call attention to the same item (though hopefully they will see your stipulation), since it would be a big headache if the document were inadvertently merged this way (i.e. web parsers like cips.cardano.org not working right and needing a further PR to clean it up). As far as I'm concerned it's in your hands. 🙏

@rphair rphair changed the title from "CPS-XXXX | Node Behavior during Low Participation, first draft" to "CPS-???? | Node Behavior during Low Participation" on Feb 5, 2025
Contributor

@nc6 nc6 left a comment

I think this is much more comprehensible now. I've made a couple of suggestions, and we should get a review from somebody who hasn't been involved in the discussion here.

- If the whole network implemented EnforceIA, then the nodes would not face increased risks.
The 2160 disjunct in the EnforceIA definition is an optimization for when the chain is growing faster than 2160 blocks per 36 hr.

- If the whole network implemented EnforceCG, then the Recovery Plan would be unavoidable (once the affected stake was back online), since the honest nodes would eventually refuse to the newly minted blocks, since they are more than 36 hr younger than their 2161st predecessor.
Contributor

Suggested change (original line, then proposed replacement):
- If the whole network implemented EnforceCG, then the Recovery Plan would be unavoidable (once the affected stake was back online), since the honest nodes would eventually refuse to the newly minted blocks, since they are more than 36 hr younger than their 2161st predecessor.
- If the whole network implemented EnforceCG, then the Recovery Plan would be unavoidable (once the affected stake was back online), since the honest nodes would eventually refuse to select the newly minted blocks, since they are more than 36 hr younger than their 2161st predecessor.
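
One way to read the rule in this bullet is as a predicate over the slot numbers of a chain's blocks. The following is a minimal sketch, assuming 1-second slots so that 36 hr equals 3*2160*20 slots. The function name and the choice to accept chains shorter than 2161 blocks are illustrative assumptions, not part of the CPS or of any node implementation.

```python
from typing import Sequence

K = 2160                     # rollback limit in blocks
WINDOW_SLOTS = 3 * K * 20    # 36 hr, assuming 1-second slots


def violates_chain_growth(block_slots: Sequence[int],
                          k: int = K,
                          window_slots: int = WINDOW_SLOTS) -> bool:
    """Return True if some block is more than `window_slots` younger
    than its (k+1)-th predecessor.

    `block_slots` holds the slot number of each block, oldest first.
    Chains with fewer than k+2 blocks have no such pair and are
    accepted here (an assumption made for this sketch).
    """
    for i in range(k + 1, len(block_slots)):
        if block_slots[i] - block_slots[i - (k + 1)] > window_slots:
            return True
    return False


dense = range(0, 20 * 3000, 20)    # one block every 20 slots
sparse = range(0, 60 * 3000, 60)   # one block every 60 slots
print(violates_chain_growth(dense), violates_chain_growth(sparse))
```

Under these assumptions the evenly spaced chain with one block every 20 slots passes, while the one with one block every 60 slots fails, because 2161 blocks then span 129,660 slots, just over the 129,600-slot window.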

- If the whole network implemented AssumeIA, then the 2161st youngest block would be at least three times older than usual (36+ hr instead of 12 hr).
Since the nodes and/or other Cardano community tooling might have been designed around the guarantee of at most 36 hrs for 2160 blocks, they could fail outright or become more vulnerable to possible DoS attacks during the outage.
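
The 12 hr and 36 hr figures follow from a back-of-the-envelope estimate. The sketch below assumes that blocks arrive at a rate proportional to the participating stake (a linear approximation of leader election), 20 slots per block at full participation, and 1-second slots. The function and parameter names are hypothetical.

```python
def expected_hours_for_blocks(blocks: int = 2160,
                              slots_per_block: int = 20,
                              participation: float = 1.0,
                              seconds_per_slot: float = 1.0) -> float:
    """Rough expected wall-clock time to mint `blocks` blocks.

    `participation` is the fraction of stake that is online, in (0, 1].
    Assumes the block rate scales linearly with participation.
    """
    if not 0.0 < participation <= 1.0:
        raise ValueError("participation must be in (0, 1]")
    expected_slots = blocks * slots_per_block / participation
    return expected_slots * seconds_per_slot / 3600.0


print(expected_hours_for_blocks())                     # full participation
print(expected_hours_for_blocks(participation=1 / 3))  # one third of stake online
```

With full participation this gives about 12 hr for 2160 blocks. With only one third of the stake online it gives about 36 hr, which is the threshold beyond which tooling built around the 36 hr guarantee could be affected.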

- If the whole network implemented EnforceIA, then the nodes would not face increased risks.
Contributor

Suggested change (original line, then proposed replacement):
- If the whole network implemented EnforceIA, then the nodes would not face increased risks.
- If the whole network implemented EnforceIA, then the nodes would not face increased risks due to large numbers of slots remaining volatile.


3 participants