INDY-1199: Implement inconsistent 3PC state detection #811
Conversation
Signed-off-by: Sergey Khoroshavin <[email protected]>
plenum/config.py
Outdated
@@ -310,3 +310,5 @@
 # 1 for recorder
 # 2 during replay
 STACK_COMPANION = 0
+
+ENABLE_NETWORK_I3PC_WATCHER = False
What is I3PC?
Inconsistent 3PC state. Should I add comments or change the name?
ENABLE_INCONSISTENCY_WATCHER_NETWORK?
plenum/server/i3pc_watchers.py
Outdated
    def set_nodes(self, nodes: Iterable[str]):
        self.nodes = set(nodes)

    def _has_consensus(self):
Please use the Quorums class (add a new field there if needed).
Thanks, will use it
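The suggestion above is to replace the hand-rolled majority check with a quorum-based one. A minimal sketch of the idea, with the BFT weak quorum (f + 1 connected nodes, f = (n - 1) // 3) computed inline — the class and method names here follow the review thread, not the merged implementation, and the real code would delegate to plenum's Quorums class:

```python
from typing import Callable, Iterable


class NetworkInconsistencyWatcher:
    """Hypothetical sketch of a quorum-aware connection watcher."""

    def __init__(self, cb: Callable):
        self.callback = cb
        self._nodes = set()
        self._connected = set()

    def set_nodes(self, nodes: Iterable[str]):
        self._nodes = set(nodes)

    @property
    def _quorum(self) -> int:
        # Weak quorum: f + 1 nodes, where f = (n - 1) // 3 is the
        # maximum number of faulty nodes BFT consensus tolerates
        n = len(self._nodes)
        f = (n - 1) // 3
        return f + 1

    def _has_consensus(self) -> bool:
        return len(self._connected) >= self._quorum
```

For a 4-node pool this gives f = 1, so consensus requires at least 2 connected nodes.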
def _add_node(watcher: NetworkI3PCWatcher, node: str):
    nodes = watcher.nodes
    nodes.add(node)
    watcher.set_nodes(nodes)
Why do we call set_nodes again here? Why not watcher.nodes.add(node)?
At first I was thinking that the watchdog could potentially react to a change in the number of nodes; that's why I'm using the setter (which is also called from production code) instead of directly manipulating the field. Also, I'm a little afraid of adding _add_node/_remove_node to the Watcher class, because someone might "optimize" them to change nodes directly instead of going through set_nodes, which would render the tests useless. I think renaming nodes to _nodes and adding a read-only property getter would make the intentions clearer. What do you think?
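The renaming proposed here can be sketched as follows — a hypothetical Watcher class that only illustrates the read-only property idea, not the actual plenum code:

```python
from typing import Iterable


class Watcher:
    """Sketch: private node set exposed through a read-only property."""

    def __init__(self):
        self._nodes = set()

    @property
    def nodes(self) -> frozenset:
        # Return an immutable view so callers cannot mutate state
        # directly; all changes must go through set_nodes()
        return frozenset(self._nodes)

    def set_nodes(self, nodes: Iterable[str]):
        self._nodes = set(nodes)
```

With this shape, `watcher.nodes.add(...)` fails (frozenset has no `add`), which is exactly the "optimization" the tests want to forbid.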
# Restart majority group
tm = tconf.ToleratePrimaryDisconnection + waits.expectedPoolElectionTimeout(len(txnPoolNodeSet))
restart_nodes(looper, txnPoolNodeSet, majority, tconf, tdir, allPluginsPath,
Why don't we wait for election here?
Because they don't finish (or at least ensure_elections_done is not satisfied with the result), and the test just fails before it has a chance to check that the inconsistent 3PC state was indeed detected.
Probably ensureElectionsDone looking only at a majority of nodes would still pass, although I'm not sure whether we should call it here manually, or improve restart_nodes or even ensureElectionsDone itself.
Done
              after_restart_timeout=tm, start_one_by_one=False)

# Check that all nodes are still functional
sdk_ensure_pool_functional(looper, txnPoolNodeSet, sdk_wallet_client, sdk_pool_handle)
What if election is still in progress, and we are trying to send requests?
Requests will probably still propagate and will be ordered once elections are done. I can add an explicit call to ensureElectionsDone before this if necessary.
On second look, elections are surely done here: ensureElectionsDone is called from the second restart_nodes.
plenum/server/i3pc_watchers.py
Outdated
        self.connected.add(name)

    def disconnect(self, name: str):
        had_consensus = self._has_consensus()
What if we face a disconnection during the initial establishment of connections?
- consider a pool of 4 nodes
- a node connected to Node1
- a node connected to Node2 (so f+1 now)
- a node disconnected from Node2 (f now) => restart???
Maybe we should do a restart only if we drop to <= f connections, AND we were connected to at least n-f nodes before?
Yes, I missed that case, going to fix soon
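A minimal sketch of the fix discussed above, with hypothetical names: the restart callback fires only when the number of connections drops to f or below after the pool has previously reached at least n - f connections, so drops during initial connection establishment are ignored:

```python
class NetworkInconsistencyWatcher:
    """Sketch: only react to a quorum loss after a stable connection."""

    def __init__(self, on_inconsistency):
        self._on_inconsistency = on_inconsistency
        self._nodes = set()
        self._connected = set()
        self._reached_stable = False  # ever connected to >= n - f nodes?

    def set_nodes(self, nodes):
        self._nodes = set(nodes)

    @property
    def _f(self):
        # Maximum number of faulty nodes tolerated by BFT consensus
        return (len(self._nodes) - 1) // 3

    def connect(self, name):
        self._connected.add(name)
        if len(self._connected) >= len(self._nodes) - self._f:
            self._reached_stable = True

    def disconnect(self, name):
        self._connected.discard(name)
        # During initial establishment (never reached n - f connections)
        # dropping below f + 1 is normal and must not trigger a restart
        if self._reached_stable and len(self._connected) <= self._f:
            self._reached_stable = False
            self._on_inconsistency()
```

In the 4-node scenario from the comment (f = 1, n - f = 3), connecting to two nodes and then losing one no longer triggers the callback, while losing quorum after three connections does.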
plenum/server/i3pc_watchers.py
Outdated
from typing import Callable, Iterable


class NetworkI3PCWatcher:
NetworkInconsistencyWatcher?