CPS-???? | Node Behavior during Low Participation #982
96c2a7f to f04475d
I'd appreciate your review @nc6, since we've discussed this in the past.
I think this fundamentally looks good, but I would try to make it more accessible by spelling out the scenarios a bit more and possibly re-ordering some of the sections.
CPS-XXXX/README.md
This limit is a disjunction of Common Prefix and Chain Growth.

- *EnforceChainGrowth* (EnforceCG).
In addition to AssumeIA, the node rejects any chain that has less than 2160 blocks per 36 hr (even if selecting it wouldn't require rolling back any blocks).
It's not directly clear to the reader (or, this reader at least!) how these immediately relate to the question above, since they don't say "what does the node do if it has a sparse chain" but rather constrain some other operations. I would suggest moving the TwoThirdsInaccessible block up above the description of the options, in order to outline the things that the node could do, and then from there lead into the possibilities and how they relate to various security arguments
I added an intro and then added some connections between these options and the intro. Please let me know if that suffices. So far, I'm hesitant to introduce the scenarios first, but maybe that'd be better.
Since the nodes and/or other Cardano community tooling might have been designed around the guarantee of at most 36 hrs for 2160 blocks, they could fail outright or become more vulnerable to possible DoS attacks during the outage.

- If the whole network implemented EnforceIA, then the nodes would not face increased risks.
The 2160 disjunct in the EnforceIA definition is an optimization for when the chain is growing faster than 2160 blocks per 36 hr.
I find the way that this is written to be slightly confusing. AIUI, the behaviour here would be identical to AssumeIA, since there's no rollback involved. So we'd still have the same concerns as under AssumeIA, right?
The difference in my mind is that EnforceIA can make blocks/slots immutable as time passes, whereas AssumeIA cannot. In other words, the number of volatile slots is explicitly bounded at 36 hr in EnforceIA (ie `3*2160*20` slots) whereas it's only bounded by assumption in AssumeIA.
I didn't call it out explicitly in the text, but I only claimed the node's risk would be more obviously bounded this way --- I don't know about tooling.
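To make the arithmetic in this exchange concrete, here is a minimal Python sketch. It assumes 1-second slots and reads the `20` in `3*2160*20` as the average number of slots per block at full participation. Neither assumption is stated in this thread. The two predicates are one hypothetical reading of how AssumeIA and EnforceIA decide that a block has become immutable. All names are illustrative and are not taken from the node.

```python
K = 2160            # blocks; the "2160" quoted throughout the thread
SLOTS_PER_BLOCK = 20  # assumption: average slots per block at full participation
SLOT_SECONDS = 1      # assumption: one slot lasts one second

EXPECTED_SLOTS_FOR_K = K * SLOTS_PER_BLOCK    # slots to mint K blocks when healthy
VOLATILE_BOUND = 3 * K * SLOTS_PER_BLOCK      # the "3*2160*20 slots" from the comment


def hours(slots: int) -> float:
    """Convert a slot count to hours under the assumed slot length."""
    return slots * SLOT_SECONDS / 3600


def is_immutable_assume_ia(successors: int) -> bool:
    """AssumeIA reading: a block is immutable only once K blocks sit on top of it."""
    return successors >= K


def is_immutable_enforce_ia(successors: int, age_slots: int) -> bool:
    """EnforceIA reading: immutable once K blocks sit on top of it OR it is older
    than the 36 hr bound. The strict '>' comparison is an assumption."""
    return successors >= K or age_slots > VOLATILE_BOUND


if __name__ == "__main__":
    print(VOLATILE_BOUND, hours(VOLATILE_BOUND))              # 129600 36.0
    print(EXPECTED_SLOTS_FOR_K, hours(EXPECTED_SLOTS_FOR_K))  # 43200 12.0
```

Under this reading, a block with only 100 successors that is 40 hr old stays volatile under AssumeIA but is immutable under EnforceIA, which is the sense in which the volatile range is bounded explicitly rather than by assumption.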
CPS-XXXX/README.md
The 2160 disjunct in the EnforceIA definition is an optimization for when the chain is growing faster than 2160 blocks per 36 hr.

- If the whole network implemented EnforceCG, then the Recovery Plan would be unavoidable (once the affected stake was back online).
The silver lining is that all nodes would be able to switch to the repaired chain automatically, without needing manual intervention.
Likewise, I would elaborate more on this case: the nodes would grow the chain up until the 36 hour point, at which point all nodes would stop growing the chain and we would need to effect disaster recovery. But since we hadn't grown more than 2160 blocks, each node would be happy to roll back and select the denser chain...
I added a bit here. And a TODO... I haven't yet done the math to bound how many blocks might slowly arise before the rule actually kicked in.
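The EnforceCG rule discussed in this thread can be sketched as a check over the slot numbers of a candidate chain. This is a minimal illustration, not the node's implementation. It assumes 1-second slots, so that 36 hr is `3*2160*20` slots. It compares each block with its 2160th predecessor. The exact lookback (2160 versus 2161) is an assumption. The function names are hypothetical.

```python
from typing import Optional, Sequence

K = 2160                   # blocks required per window, as quoted in the thread
WINDOW_SLOTS = 3 * K * 20  # 36 hr, assuming 1-second slots


def first_chain_growth_violation(
    block_slots: Sequence[int],
    lookback: int = K,
    window: int = WINDOW_SLOTS,
) -> Optional[int]:
    """Return the index of the first block that is more than `window` slots
    younger than its `lookback`-th predecessor, or None if there is none.

    `block_slots` lists the slot of each block, oldest first, strictly increasing.
    Chains shorter than `lookback` + 1 blocks pass trivially in this sketch.
    """
    for i in range(lookback, len(block_slots)):
        if block_slots[i] - block_slots[i - lookback] > window:
            return i
    return None


def satisfies_chain_growth(block_slots: Sequence[int]) -> bool:
    """True if no block in the chain violates the sketched EnforceCG rule."""
    return first_chain_growth_violation(block_slots) is None


if __name__ == "__main__":
    healthy = [20 * i for i in range(5000)]  # one block every 20 slots
    sparse = [100 * i for i in range(3000)]  # one block every 100 slots
    print(satisfies_chain_growth(healthy))       # True
    print(first_chain_growth_violation(sparse))  # 2160
```

With evenly spaced blocks, the sparse example is accepted until the first block that has a 2160th predecessor, which matches the description above of nodes extending the chain for a while and then refusing new blocks. Real block arrival is random, so this example does not answer the open TODO about how many blocks can arise before the rule takes effect.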
---
CPS: XXXX
Title: Node Behavior during Low Participation
Status: Draft
Suggested change, from:
Status: Draft
to:
Status: Open
@nfrisby please note marking `Draft` review status on GitHub is enough to indicate this is a draft; while the CPS itself will be `Open` until considered solved. There is no `Draft` status for either CIPs or CPSs.
Likewise the CIP header of #974 should be changed to `Proposed` since that is most likely the way it would be reviewed (once emerging from `Draft` review status) and merged.
I hear you. I try to be as explicit as possible, so I like that the rendered document says Draft for now.
I will certainly update these fields before clicking "Ready for review". Is that plan OK with you?
sure, although other editors & reviewers are likely to call attention to the same item (though hopefully they will see your stipulation) since it would be a big headache if the document were inadvertently merged this way (i.e. web parsers like cips.cardano.org not working right & needing a further PR to clean it up). As far as I'm concerned it's in your hands. 🙏
I think this is much more comprehensible now. I've made a couple of suggestions, and we should get a review from somebody who hasn't been involved in the discussion here.
- If the whole network implemented EnforceIA, then the nodes would not face increased risks.
The 2160 disjunct in the EnforceIA definition is an optimization for when the chain is growing faster than 2160 blocks per 36 hr.

- If the whole network implemented EnforceCG, then the Recovery Plan would be unavoidable (once the affected stake was back online), since the honest nodes would eventually refuse to the newly minted blocks, since they are more than 36 hr younger than their 2161st predecessor.
Suggested change, from:
- If the whole network implemented EnforceCG, then the Recovery Plan would be unavoidable (once the affected stake was back online), since the honest nodes would eventually refuse to the newly minted blocks, since they are more than 36 hr younger than their 2161st predecessor.
to:
- If the whole network implemented EnforceCG, then the Recovery Plan would be unavoidable (once the affected stake was back online), since the honest nodes would eventually refuse to select the newly minted blocks, since they are more than 36 hr younger than their 2161st predecessor.
- If the whole network implemented AssumeIA, then the 2161th youngest block would be at least three times older than usual (36+ hr instead of 12 hr).
Since the nodes and/or other Cardano community tooling might have been designed around the guarantee of at most 36 hrs for 2160 blocks, they could fail outright or become more vulnerable to possible DoS attacks during the outage.

- If the whole network implemented EnforceIA, then the nodes would not face increased risks.
Suggested change, from:
- If the whole network implemented EnforceIA, then the nodes would not face increased risks.
to:
- If the whole network implemented EnforceIA, then the nodes would not face increased risks due to large numbers of slots remaining volatile.
A CPS for the community to elicit proposals for how the node should behave when the best chain it has access to is growing much slower than Cardano's Praos security argument anticipates when full participation is assumed.
The fact that this behavior is unspecified can obstruct node design and/or may be overlooked, such that nodes and community tooling may cause a failure cascade if the network ever does suffer low participation for some extended period (eg bug, solar flare, etc).
Rendered: https://github.com/nfrisby/CIPs/blob/nfrisby/low-participation-CPS/CPS-XXXX/README.md