Skip to content

INDY-1199: Implement inconsistent 3PC state detection #811

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jul 18, 2018

Conversation

skhoroshavin
Copy link
Contributor

Signed-off-by: Sergey Khoroshavin [email protected]

@skhoroshavin skhoroshavin changed the title [WIP] INDY-1199: Implement inconsistent 3PC state detection INDY-1199: Implement inconsistent 3PC state detection Jul 16, 2018
Signed-off-by: Sergey Khoroshavin <[email protected]>
plenum/config.py Outdated
@@ -310,3 +310,5 @@
# 1 for recorder
# 2 during replay
STACK_COMPANION = 0

ENABLE_NETWORK_I3PC_WATCHER = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is I3PC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent 3PC state, should I add comments or change name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ENABLE_INCONSISTENCY_WATCHER_NETWORK?

def set_nodes(self, nodes: Iterable[str]):
self.nodes = set(nodes)

def _has_consensus(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Quorums class (add a new field there is needed)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will use it

def _add_node(watcher: NetworkI3PCWatcher, node: str):
nodes = watcher.nodes
nodes.add(node)
watcher.set_nodes(nodes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we call set_nodes again here? Why not watcher.nodes.add(node)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I was thinking that watchdog could potentially react to change in number of nodes, that's why I'm using setter (which is also called from production code) instead of directly manipulating field. Also I'm a little bit afraid of adding _add_node/_remove_node to Watcher class because someone might "optimize" them to change nodes directly instead of going through set_nodes, which will render tests useless.

I think renaming nodes to _nodes and adding read-only property getter can make intentions more clear, what do you think?


# Restart majority group
tm = tconf.ToleratePrimaryDisconnection + waits.expectedPoolElectionTimeout(len(txnPoolNodeSet))
restart_nodes(looper, txnPoolNodeSet, majority, tconf, tdir, allPluginsPath,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we wait for election here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because they don't finish (or at least ensure_elections_done is not satisfied with result) and test just fails before it has chance to check that inconsistent 3pc state was indeed detected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably ensureElectionsDone looking only at majority of nodes will still pass, although I'm not sure whether we should call it here manually or improve restart_nodes or even ensureElectionsDone itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

after_restart_timeout=tm, start_one_by_one=False)

# Check that all nodes are still functional
sdk_ensure_pool_functional(looper, txnPoolNodeSet, sdk_wallet_client, sdk_pool_handle)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if election is still in progress, and we are trying to send requests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requests will probably still propagate and when elections are done will be ordered. I can add explicit call to ensureElectionsDone before this if this is neccessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the second look - elections are surely done here, ensureElectionsDone is called from second restart_nodes

self.connected.add(name)

def disconnect(self, name: str):
had_consensus = self._has_consensus()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we face disconnection during initial establishing of connections:

  • consider a pool of 4 nodes
  • a node connected to Node1
  • a node connected to Node2 (so f+1 now)
  • a node disconnected from Node 2 (f now) => restart???

Maybe we should do restart only if we go to <=f connection, AND we were connected to at least n-f nodes before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I missed that case, going to fix soon

from typing import Callable, Iterable


class NetworkI3PCWatcher:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NetworkInconcistencyWatcher?

Sergey Khoroshavin added 6 commits July 17, 2018 19:13
@ashcherbakov ashcherbakov merged commit 83937fe into hyperledger:master Jul 18, 2018
@skhoroshavin skhoroshavin deleted the indy-1199 branch July 18, 2018 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants