Commit 31b1a9c

Merge pull request #85 from hyperledger/master
Master
2 parents 9d6de2c + db87901 commit 31b1a9c

12 files changed: +448 -17 lines

README.md

Lines changed: 22 additions & 3 deletions
```diff
@@ -1,15 +1,34 @@
-# Plenum Byzantine Fault Tolerant Protocol
+![logo](indy-logo.png)
+
+* [Plenum Byzantine Fault Tolerant Protocol](#plenum-byzantine-fault-tolerant-protocol)
+* [Technical Overview of Indy](#technical-overview-of-indy)
+* [Other Documentation](#other-documentation)
+* [Indy Plenum Repository Structure](#indy-plenum-repository-structure)
+* [Dependencies](#dependencies)
+* [Contact Us](#contact-us)
+* [How to Contribute](#how-to-contribute)
+* [How to Start Working with the Code](#how-to-start-working-with-the-code)
+* [Try Plenum Locally](#try-plenum-locally)
+
+## Plenum Byzantine Fault Tolerant Protocol
 
 Plenum is the heart of the distributed ledger technology inside Hyperledger
 Indy. As such, it provides features somewhat similar in scope to those
 found in Fabric. However, it is special-purposed for use in an identity
 system, whereas Fabric is general purpose.
 
+## Technical Overview of Indy
+
+Please find the general overview of the system in [Overview of the system](docs/main.md).
+
+More documentation can be found in [docs](docs).
+
 ## Other Documentation
 
 - Details about the protocol, including a great tutorial, can be found on the [wiki](https://github.com/hyperledger/indy-plenum/wiki).
 - Please have a look at aggregated documentation at [indy-node-documentation](https://github.com/hyperledger/indy-node/blob/master/README.md) which describes workflows and setup scripts common for both projects.
 
+
 ## Indy Plenum Repository Structure
 
 - plenum:
@@ -49,7 +68,7 @@ separately.
 - In particular, it contains BLS multi-signature crypto needed for state proofs support in Indy.
 
 
-## Contact us
+## Contact Us
 
 - Bugs, stories, and backlog for this codebase are managed in [Hyperledger's Jira](https://jira.hyperledger.org).
 Use project name `INDY`.
@@ -67,7 +86,7 @@ Please have a look at [Dev Setup](https://github.com/hyperledger/indy-node/blob/
 It contains common setup for both indy-plenum and indy-node.
 
 
-## Installing Plenum
+## Try Plenum Locally
 
 #### Install from pypi
 
```
common/serializers/msgpack_serializer.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -4,7 +4,7 @@
 
 import msgpack
 from common.serializers.mapping_serializer import MappingSerializer
-from storage.stream_serializer import StreamSerializer
+from common.serializers.stream_serializer import StreamSerializer
 
 
 def decode_to_sorted(obj):
```
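For context, a minimal, hedged sketch of what a msgpack-based mapping serializer typically does (deterministic key order usually matters when the serialized bytes are later hashed). This is not the project's `MsgpackSerializer`; only the corrected import path above comes from the commit:

```python
import msgpack


def serialize(data: dict) -> bytes:
    # sort keys so the byte output is deterministic for equal mappings
    return msgpack.packb({k: data[k] for k in sorted(data)}, use_bin_type=True)


def deserialize(raw: bytes) -> dict:
    return msgpack.unpackb(raw, raw=False)


assert deserialize(serialize({'b': 2, 'a': 1})) == {'a': 1, 'b': 2}
```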

design/Observer Architecture.jpg

75.7 KB

design/observers.md

Lines changed: 228 additions & 0 deletions
@@ -0,0 +1,228 @@

As of now, only Validator Nodes are responsible for write and read requests, as well as for catching up other Nodes and clients.
With state proof support, we can create additional rings of Observer Nodes that process read requests and reduce the load on Validator Nodes.

## Goals (in priority order)

- Goal 1: Reduce Read Load
  - Reduce read load on Validators

- Goal 2: More Nodes
  - Have a large number of Nodes available for read requests (there is an inherent limit on the number of Validator Nodes because of how chatty RBFT is)

- Goal 3: Reduce Catch-up Load
  - Reduce catch-up load on Validators

- Goal 4: Reduce Write Load
  - Be able to send write requests to one Node only (either Validator or Observer)

- Goal 5: Promotion
  - Be able to replace suspicious Validators with Gatekeepers (promote Gatekeepers)

- Goal 6: Untrusted Observers
  - Be able to have an unlimited number of Nodes with a copy of the ledger

- Goal 7: Fast Catch-up of Clients
  - Be able to bootstrap catch-up for clients using one Node (either Validator or Observer)

- Goal 8: Fast Catch-up of Nodes
  - Be able to bootstrap catch-up for Nodes using one Node (either Validator or Observer)

- Goal 9: Close the Gates
  - Forbid all external client communication for Validators (allow only restricted communication with Gatekeepers and other Validators), so that:
    - the Gatekeepers ring serves as one of the stoppers for DoS/spam attacks;
    - it helps to avoid attacks on the Primary;
    - it helps to reduce the number of connections for Validators;
    - it helps to improve the availability of Validators.

- Goal 10: Withstand DoS from Clients
  - The core ring(s) need to withstand DoS from clients. Some Observers may fail, but the system in general should keep working, especially the Validator ring.

- Goal 11: Withstand DoS from Nodes
  - The Validator ring needs to withstand DoS from other Validator Nodes and from the Gatekeeper ring.


## Architecture

![Observer Architecture](Observer Architecture.jpg)

### Assumptions

- We call trusted Observers "Gatekeepers" because they will eventually become gatekeepers once we "close the gate" to Validators.
  If we never want that to happen, we should probably find a better name than Gatekeepers to avoid confusion (just call them Observers?).

- Gatekeepers are added in a permissioned fashion (there is a Steward for each Gatekeeper).

- Gatekeepers are in sync with Validators.
  - This means that clients (indy-sdk) need neither a separate pool ledger for each Gatekeeper nor a Pool State trie (Patricia trie).

- Observers may not be fully in sync with Validators and Gatekeepers at a given moment of time (depending on the synchronization policy), although they should be in sync eventually.
  There will probably be a gap between the latest Validator state and the latest Observer state.
  - This means that clients (indy-sdk) need either a separate pool ledger for each Observer or a Pool State trie (Patricia trie).

- There are enough Gatekeeper Nodes to create a separate ring.

### Iterations

It's better to implement the long-term solution in multiple iterations (taking into account the priority of the Goals above):
1. Support Gatekeepers only (that is, trusted Observers which are always in sync).
1. Support Observers (untrusted Observers which may not be 100% in sync).
1. "Close the gate" to Validators, so that Validators are connected to Gatekeepers only.

Note: We may change the order of iterations (and hence the roadmap of tasks in the last section) depending on the requirements and the priority of the Goals.

### Data to keep in sync on Gatekeepers and Observers

- All ledgers
- All states
- Attributes store
- BLS store (in fact, the last entry only)
- seqNo store

### Participants

##### Catch-up

| | What ledger | Min number of nodes to finish | Node type to finish | Bootstrapping |
| --- | --- | --- | --- | --- |
| Client | POOL | F_G + 1 (or even 1 if state proofs and timestamp are used) | Gatekeepers | Yes (from Observers and Gatekeepers) |
| Observer | All | F_G + 1 (or even 1 if state proofs and timestamp are used) | Gatekeepers | Yes (from Observers and Gatekeepers) |
| Gatekeeper | All | F_V + 1 (or even 1 if state proofs and timestamp are used) | Validators | Yes (from Gatekeepers and Validators) |
| Validator | All | F_V + 1 (or even 1 if state proofs and timestamp are used) | Validators | Yes (from Gatekeepers and Validators) |
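The F_G + 1 and F_V + 1 quorums in the table follow the standard BFT bound N >= 3F + 1. A small illustrative sketch of that arithmetic (the ring sizes are invented for the example, not recommendations):

```python
# Illustration only: the f = (n - 1) // 3 bound behind the F_V + 1 / F_G + 1 quorums above.
def max_faulty(n: int) -> int:
    """Largest number of Byzantine nodes an n-node BFT ring can tolerate."""
    return (n - 1) // 3


N_V = 25    # Validators: kept small because RBFT is chatty
N_G = 100   # Gatekeepers: can be a much larger ring

f_V = max_faulty(N_V)   # 8  -> catch-up against Validators needs f_V + 1 = 9 matching answers
f_G = max_faulty(N_G)   # 33 -> catch-up against Gatekeepers needs f_G + 1 = 34 matching answers

print(f_V + 1, f_G + 1)  # 9 34
```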
##### Read and Write

| | Read node type | Read min number of nodes | Write node type | Write min number of nodes |
| --- | --- | --- | --- | --- |
| Client | Observer OR Gatekeeper | 1 | Observer OR Gatekeeper | 1 (eventually propagated to f_V+1 Validators by Observers and Gatekeepers) |

##### Keep in sync

| | How | Node type to connect to for sync |
| --- | --- | --- |
| Client | No (does catch-up for syncing); TBD in future | |
| Observer | Catch-up + any custom policy | Other Observers and Gatekeepers |
| Gatekeeper | Catch-up + receives all write Replies | Validators |
| Validator | Catch-up | Validators |
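A minimal sketch of the "custom policy" idea in the table above: a Gatekeeper is effectively an Observer whose policy is "push me every write Reply", while an untrusted Observer might only poll periodically. The classes are illustrative and assume nothing about the actual indy-plenum API:

```python
from abc import ABC, abstractmethod
from typing import List


class SyncPolicy(ABC):
    """Decides how an observer is kept in sync; illustrative, not indy-plenum code."""

    @abstractmethod
    def should_push(self, reply) -> bool:
        """Return True if a freshly ordered write Reply should be forwarded to this observer."""


class EveryReplyPolicy(SyncPolicy):
    # the default Gatekeeper behaviour: stay in sync on every write Reply
    def should_push(self, reply) -> bool:
        return True


class PeriodicCatchupPolicy(SyncPolicy):
    # an untrusted Observer may ignore pushes and rely on periodic catch-up instead
    def should_push(self, reply) -> bool:
        return False


def observers_to_notify(reply, policies: List[SyncPolicy]) -> List[SyncPolicy]:
    # a Validator or Gatekeeper would run something like this after ordering a write
    return [p for p in policies if p.should_push(reply)]
```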
#### Description

- Validators Ring:
  - Each Validator has its own Steward and is added to the Pool by a NODE txn with the VALIDATOR service.
  - Participates in consensus (write request ordering).
  - Does not process any Client or Observer requests.
  - Supports restricted connections to other Validators and Gatekeepers only (not to Observers).
  - Provides catch-up for Gatekeepers and other Validators only (neither for Clients nor for Observers).
  - Connected to all Gatekeepers (the same way as to Validators, but just doesn't include them in 3PC).
  - Propagates all write Replies to all Gatekeepers to keep them in sync.
  - Can bootstrap catch-up using other Gatekeepers or Validators and validate catch-up results using state proofs and the timestamp of the latest state.
  - Has the BFT f parameter f_V.
  - The number of Validators cannot be too big because of the RBFT chattiness problem.
- Gatekeepers Ring:
  - Each Gatekeeper has its own Steward and is added to the Pool by a NODE txn with the GATEKEEPER service.
  - Handles read requests on its own using State Proofs.
  - Can handle write requests by propagating them to Validators and using the Write State Proof.
  - Registers with Validators to stay in sync with them on every Reply.
  - Can become a Validator.
  - Can register Observers to keep them in sync.
  - Can bootstrap catch-up using other Gatekeepers or Validators and validate catch-up results using state proofs and the timestamp of the latest state.
  - Has the BFT f parameter f_G.
  - The number of Gatekeepers can be bigger than the number of Validators since they don't have the RBFT chattiness problem.
- Observers Ring:
  - May be added with a NODE txn with the OBSERVER service, or not added to the pool ledger at all (some internal sync-up point).
  - Handles read requests on its own using State Proofs.
  - Can handle write requests by propagating them to Gatekeepers and using the Write State Proof.
  - Registers with Gatekeepers with some synchronization policy to stay in sync.
  - Cannot become a Validator.
  - Can register Observers to keep them in sync => a net of Observers.
  - Can bootstrap catch-up using other Observers' or Gatekeepers' states and validate catch-up results using state proofs and the timestamp of the latest state.
  - No restrictions on the number of Observers.
- Clients:
  - Can connect to Observers or Gatekeepers only.
  - It's sufficient to connect to one Observer or Gatekeeper only to send read and write requests because of State Proofs (see the read sketch after this list).
  - Can bootstrap catch-up using Gatekeeper or Observer states.
  - Need to finish catch-up (check the merkle tree root) using the Gatekeepers ring.
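A hedged sketch of the client read flow implied by the Clients description above: ask a single Observer or Gatekeeper, verify the State Proof, and move on to the next node (and eventually the Validators) on failure. `send` and `verify_state_proof` are placeholders, not indy-sdk functions:

```python
from typing import Callable, Iterable


def read(request: dict,
         nodes: Iterable,                   # Observers/Gatekeepers first, Validators as the last resort
         send: Callable,                    # send(request, node) -> reply; may raise TimeoutError
         verify_state_proof: Callable) -> object:
    """Return the first reply whose state proof verifies; illustrative only."""
    for node in nodes:
        try:
            reply = send(request, node)
        except TimeoutError:
            continue                        # timeout -> try the next node
        if verify_state_proof(reply):
            return reply                    # one verifiable reply is enough
    raise RuntimeError('no node returned a verifiable reply; fall back to an f+1 quorum read')
```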
## Roadmap

Note: the roadmap follows the priority of the Goals above. If the priority needs to change (for example, if we would like to start with untrusted/unsynced Observers), the set of tasks stays the same; only the order changes.

### Indy-Node

- Goals 1-3: Reduce Read Load, More Nodes, Reduce Catch-up Load
  1. Support the GATEKEEPER service in the NODE txn; define an auth rule on who can create Gatekeeper Nodes. No special behaviour for Gatekeepers yet.
  1. Separate the pool into Validators and Gatekeepers with N_V, f_V and N_G, f_G, and make them connect to each other.
  1. Support adding abstract Observers to Nodes and the possibility of a custom policy for syncing (no real policies yet).
  1. Register all Gatekeeper Nodes as Observers with a default policy of sending each write Reply to all Gatekeepers.
  1. Restrict Gatekeepers to processing read requests only and exclude them from consensus. Gatekeepers reject write requests at this point.
  1. Support processing of observed data by Gatekeepers (or Observers in general).
  1. Catch up the BLS store on all Nodes (in fact, only the latest BLS store entry is needed).
  1. Catch up the seqNo store on all Nodes.
- Goal 4: Reduce Write Load
  1. Make sure that a valid Reply with an Audit Proof is sent for an already processed reqId.
  1. Support receiving of write requests by Gatekeepers, propagate them to Validators, and process the Replies back.
- Goal 5: Promotion
  1. Define auth rules on who can change a Node's service (promotion to Validator).
  1. Re-calculate the N_V and f_V parameters when a Gatekeeper Node is promoted to Validator.
- Goal 6: Untrusted Observers
  1. Support the OBSERVER service in the NODE txn; define an auth rule on who can create Observer Nodes. Make Observers and Gatekeepers equal for now.
  1. Be able to register any Observers with a custom policy.
     - Define some policies for observing and implement syncing the state of Observers (so the only difference between an Observer and a Gatekeeper here is that Observers can use any custom policy).
  1. Support periodic updates of BLS sigs.
- Goal 7: Fast Catch-up of Clients
  1. Support BLS multi-signature for the Pool and Config ledgers (that is, for all ledgers).
  1. Create a new genesis txn file where each genesis pool txn has a BLS key.
  1. Keep the history of BLS multi-sigs for the Pool ledger.
  1. Return the BLS multi-sig during catch-up of each txn in the Pool ledger, so that it's possible to catch up from one Node only and verify the result using previous pool states.
- Goal 8: Fast Catch-up of Nodes
  1. Support verification of the Pool ledger based on the BLS multi-sig history for each txn.
  1. Support bootstrapping of catch-up of Validators and Gatekeepers from other Validators and Gatekeepers.
     - Take the ledger from one Node and use the BLS multi-sig for validation.
     - Catch-up needs to be finished by checking that the state received from one Node matches the state on at least f_V+1 other Nodes (check merkle tree roots).
  1. Support bootstrapping of catch-up of Observers from other Observers and Gatekeepers.
  1. [Optionally] Support BitTorrent-like catch-up bootstrapping.
  1. [Optionally] We may need to support RocksDB for better propagation of parts of the ledger and for recovery.
- Goal 9: Close the Gates
  1. Do not allow any Observer or client connections to Validators.
  1. Connect Observers to Gatekeepers only.
  1. Propagate write requests from Observers to Gatekeepers.

- Goal 10: Withstand DoS from Clients
  1. Load balancing for Observer nodes (Nginx?).
  1. Change client-to-node communication from ZMQ to something else (HTTP + authcrypt?).

- Goal 11: Withstand DoS from Nodes
  1. Blacklist malicious nodes.
  1. Separate the communication and consensus levels in nodes.

### Indy-sdk

- Goals 1, 2, 5: Reduce Read Load, More Nodes, Promotion
  1. Support the GATEKEEPER service in the NODE txn on the client side.
  1. Separate the pool into Validators and Gatekeepers with N_V, f_V and N_G, f_G.
  1. Send read requests to Gatekeeper Nodes.
     - Connect to one Gatekeeper Node only (explicitly or randomly) and work with this Node for read requests.
     - If a read request to a Gatekeeper Node fails for some reason (timeout, invalid state proof, etc.), resend the request to another Gatekeeper Node.
     - If there are no more Gatekeeper Nodes to send the read request to, send it to Validators.
     - Support fallback to f+1.
     - Note: neither a separate pool ledger for each Gatekeeper nor a pool state trie is needed on the client here, because we assume that Gatekeepers are always in sync with Validators.
- Goal 3: Reduce Catch-up Load
  1. Support catch-up based on Gatekeepers (with a fallback to catch-up on Validators).
  1. [Optional] Validate catch-up correctness by sending LEDGER_STATUS to Validators.

- Goal 4: Reduce Write Load
  1. Verify the audit proof on write requests.
  1. Send write requests to Gatekeeper Nodes.
     - Can send both reads and writes to the same Gatekeeper Node with the same fallback rules.
     - It's essential that re-sending is done with the same reqId (see the write sketch after this roadmap).

- Goal 6: Untrusted Observers
  1. Support specifying any Observer Node for read and write (not necessarily a Gatekeeper). Fall back to Gatekeepers, and only then to Validators.
  1. Support a gap in an Observer's state.
     - Option 1: have a separate pool ledger for each Observer and do a catch-up for each Observer.
     - Option 2: a Patricia trie for the Pool state on clients.

- Goals 7, 8: Fast Catch-up of Clients and Nodes
  1. Support bootstrapping of the pool ledger from one Node only (either Observer, Gatekeeper, or Validator).
  1. Validate the bootstrapped ledger by the provided BLS sigs.
  1. Finish catch-up by validating the correctness of the ledger from f_G + 1 (or f_V + 1) nodes.

- Goal 9: "Close the Gates"
  1. Forbid any connections to Validators from the sdk.
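A small sketch of the re-sending rule under Goal 4 above: the client fixes the reqId once, so a Node that already ordered the request can answer with the existing Reply (and its Audit Proof) instead of ordering it twice. The helper and the reqId scheme are hypothetical, not indy-sdk code:

```python
import random
from typing import Callable, Iterable


def write_with_retry(payload: dict, nodes: Iterable, send: Callable) -> dict:
    """Resend a write to the next node on timeout, always with the same reqId."""
    req = {'reqId': random.randint(1, 2**32), **payload}   # reqId chosen once, reused on every attempt
    last_error = None
    for node in nodes:                                     # Observers/Gatekeepers first, Validators last
        try:
            return send(req, node)                         # duplicate reqIds can be answered from the ledger
        except TimeoutError as err:
            last_error = err
    raise last_error or RuntimeError('no nodes available for the write request')
```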

indy-logo.png

24.6 KB

plenum/server/node.py

Lines changed: 21 additions & 8 deletions
```diff
@@ -1088,6 +1088,7 @@ def onConnsChanged(self, joined: Set[str], left: Set[str]):
         - Check protocol instances. See `checkInstances()`
 
         """
+        _prev_status = self.status
         if self.isGoing():
             if self.connectedNodeCount == self.totalNodes:
                 self.status = Status.started
@@ -1103,6 +1104,15 @@ def onConnsChanged(self, joined: Set[str], left: Set[str]):
             logger.info(
                 '{} lost connection to primary of master'.format(self))
             self.lost_master_primary()
+        elif _prev_status == Status.starting and self.status == Status.started_hungry \
+                and self.lost_primary_at is not None \
+                and self.master_primary_name is not None:
+            """
+            Such situation may occur if the pool has come back to reachable consensus but
+            primary is still disconnected, so view change proposal makes sense now.
+            """
+            self._schedule_view_change()
+
         if self.isReady():
             self.checkInstances()
         for node in joined:
@@ -1312,8 +1322,10 @@ def service_replicas_outbox(self, limit: int=None) -> int:
                     *reqKey,
                     self.reasonForClientFromException(
                         message.reason))
-                self.transmitToClient(reject, self.requestSender[reqKey])
-                self.doneProcessingReq(*reqKey)
+                # TODO: What the case when reqKey will be not in requestSender dict
+                if reqKey in self.requestSender:
+                    self.transmitToClient(reject, self.requestSender[reqKey])
+                    self.doneProcessingReq(*reqKey)
             elif isinstance(message, Exception):
                 self.processEscalatedException(message)
             else:
@@ -2326,19 +2338,20 @@ def propose_view_change(self):
                         "".format(self))
             self.view_changer.on_primary_loss()
 
+    def _schedule_view_change(self):
+        logger.debug('{} scheduling a view change in {} sec'.
+                     format(self, self.config.ToleratePrimaryDisconnection))
+        self._schedule(self.propose_view_change,
+                       self.config.ToleratePrimaryDisconnection)
+
     # TODO: consider moving this to pool manager
     def lost_master_primary(self):
        """
        Schedule an primary connection check which in turn can send a view
        change message
-        :return: whether view change started
        """
        self.lost_primary_at = time.perf_counter()
-
-        logger.debug('{} scheduling a view change in {} sec'.
-                     format(self, self.config.ToleratePrimaryDisconnection))
-        self._schedule(self.propose_view_change,
-                       self.config.ToleratePrimaryDisconnection)
+        self._schedule_view_change()
 
     def select_primaries(self, nodeReg: Dict[str, HA]=None):
         for instance_id, replica in enumerate(self.replicas):
```
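To make the intent of these hunks easier to follow, here is a self-contained toy (not plenum's actual Node class) showing the pattern the commit introduces: remember the previous status, and if the pool has regained reachable consensus while the primary is still missing, schedule a delayed view-change proposal. The class, the enum, and the two-second tolerance are invented for the sketch:

```python
import sched
import time
from enum import Enum


class Status(Enum):
    starting = 1
    started_hungry = 2
    started = 3


class ToyNode:
    """Illustrative stand-in for plenum's Node; not the real implementation."""

    TOLERATE_PRIMARY_DISCONNECTION = 2  # seconds, plays the role of config.ToleratePrimaryDisconnection

    def __init__(self):
        self.status = Status.starting
        self.lost_primary_at = None
        self.master_primary_name = 'Alpha'
        self.scheduler = sched.scheduler(time.time, time.sleep)

    def propose_view_change(self):
        print('proposing a view change (primary still disconnected)')

    def _schedule_view_change(self):
        # delay the proposal so a briefly flapping primary does not trigger it immediately
        self.scheduler.enter(self.TOLERATE_PRIMARY_DISCONNECTION, 1,
                             self.propose_view_change)

    def on_conns_changed(self, connected: int, total: int):
        prev_status = self.status
        self.status = Status.started if connected == total else Status.started_hungry
        if (prev_status == Status.starting
                and self.status == Status.started_hungry
                and self.lost_primary_at is not None
                and self.master_primary_name is not None):
            # pool is back to reachable consensus but the primary is still gone
            self._schedule_view_change()


node = ToyNode()
node.lost_primary_at = time.perf_counter()
node.on_conns_changed(connected=3, total=4)
node.scheduler.run()  # fires propose_view_change after the tolerance delay
```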
