As of now, Validator Nodes alone are responsible for serving write and read requests, as well as for catching up other Nodes and clients.
With state proof support, we can create additional rings of Observer Nodes that process read requests and reduce the load on Validator Nodes.

## Goals (in priority order)

- Goal1: Reduce Read Load
  - Reduce read load on Validators

- Goal2: More Nodes
  - Have a large number of Nodes available for read requests (there is an inherent limit on the number of Validator Nodes because of how chatty RBFT is)

- Goal3: Reduce Catch-up Load
  - Reduce catch-up load on Validators

- Goal4: Reduce Write Load
  - Be able to send write requests to one Node only (either a Validator or an Observer)

- Goal5: Promotion
  - Be able to replace suspicious Validators with Gatekeepers (promote Gatekeepers to Validators)

- Goal6: Untrusted Observers
  - Be able to have an unlimited number of Nodes with a copy of the ledger

- Goal7: Fast Catch-up of Clients
  - Be able to bootstrap catch-up for clients using one Node (either a Validator or an Observer)

- Goal8: Fast Catch-up of Nodes
  - Be able to bootstrap catch-up for Nodes using one Node (either a Validator or an Observer)

- Goal9: Close the Gate
  - Forbid all external client communication for Validators (allow only restricted communication with Gatekeepers and other Validators), so that:
    - the Gatekeeper ring serves as one of the stoppers for DoS/spam attacks;
    - attacks on the Primary become harder;
    - Validators need fewer connections;
    - Validator availability improves.

- Goal10: Withstand DoS from Clients
  - The core ring(s) need to be able to withstand DoS from clients.
    Some Observers may fail, but the system as a whole should keep working, especially the Validator ring.

- Goal11: Withstand DoS from Nodes
  - The Validator ring needs to be able to withstand DoS from other Validator Nodes and from the Gatekeeper ring.


## Architecture

### Assumptions

- We call Trusted Observers "Gatekeepers" because they will eventually become gatekeepers once we "close the gate" to Validators.
  If we never want that to happen, then we should probably find a better name to avoid confusion (just call them Observers?).

- Gatekeepers are added in a permissioned fashion (there is a Steward for each Gatekeeper).

- Gatekeepers are in sync with Validators.
  - This means that clients (indy-sdk) need neither a separate pool ledger for each Gatekeeper nor a Pool State trie (Patricia trie).
- Observers may not be fully in sync with Validators and Gatekeepers at any given moment (depending on the synchronization policy), although they should be in sync eventually.
  There will probably be a gap between the latest Validator state and the latest Observer state.
  - This means that clients (indy-sdk) need either a separate pool ledger for each Observer or a Pool State trie (Patricia trie).
- There are enough Gatekeeper Nodes to form a separate Ring.

### Iterations

It's better to implement the long-term solution in multiple iterations (taking into account the priority of Goals above):
1. Support Gatekeepers only (that is, trusted Observers which are always in sync)
1. Support Observers (untrusted Observers which may not be 100% in sync)
1. "Close the gate" to Validators, allowing Validators to connect to Gatekeepers only

Note: We may change the order of iterations (and hence the roadmap of tasks in the last section) depending on the requirements and the priority of Goals.

### Data to keep in sync on Gatekeepers and Observers
- All ledgers
- All states
- Attributes store
- BLS store (in fact, only the last entry)
- seqNo store
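The list above can be read as a sync manifest. A minimal sketch of how it might be expressed, assuming illustrative store names (the actual indy-node storage layout may differ):

```python
# Sketch of the data an Observer/Gatekeeper replicates. All names are
# illustrative assumptions, not the real indy-node storage layout.
SYNC_MANIFEST = {
    "ledgers": ["POOL", "DOMAIN", "CONFIG"],   # all ledgers
    "states":  ["pool", "domain", "config"],   # all Patricia-trie states
    "attribute_store": "full",                 # attributes store
    "bls_store": "latest_entry_only",          # only the last multi-sig entry
    "seq_no_store": "full",                    # needed to answer already-processed requests
}
```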

### Participants

##### Catch-up

| | What ledger | Min number of nodes to finish | Node type to finish | Bootstrapping |
| --- | --- | --- | --- | --- |
| Client | POOL | f_G + 1 (or even 1 if state proofs and timestamps are used) | Gatekeepers | Yes (from Observers and Gatekeepers) |
| Observer | All | f_G + 1 (or even 1 if state proofs and timestamps are used) | Gatekeepers | Yes (from Observers and Gatekeepers) |
| Gatekeeper | All | f_V + 1 (or even 1 if state proofs and timestamps are used) | Validators | Yes (from Gatekeepers and Validators) |
| Validator | All | f_V + 1 (or even 1 if state proofs and timestamps are used) | Validators | Yes (from Gatekeepers and Validators) |

##### Read and Write

| | Read Node type | Read min number of nodes | Write Node type | Write min number of nodes |
| --- | --- | --- | --- | --- |
| Client | Observer OR Gatekeeper | 1 | Observer OR Gatekeeper | 1 (eventually propagated to f_V+1 Validators by Observers and Gatekeepers) |

##### Keep in sync

| | How | Node type to connect to for sync |
| --- | --- | --- |
| Client | No (uses catch-up for syncing); TBD in the future | |
| Observer | Catch-up + any custom policy | Other Observers and Gatekeepers |
| Gatekeeper | Catch-up + receiving all write Replies | Validators |
| Validator | Catch-up | Validators |
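The quorum columns in the tables above follow the standard BFT bound. A minimal sketch of the arithmetic, with illustrative function names:

```python
def max_faulty(n: int) -> int:
    """BFT tolerance: a ring of n nodes tolerates f = (n - 1) // 3 faults."""
    return (n - 1) // 3

def catchup_quorum(n: int, has_state_proof: bool) -> int:
    """Nodes needed to finish catch-up: f + 1 matching answers,
    or a single answer if it carries a verifiable state proof."""
    return 1 if has_state_proof else max_faulty(n) + 1

# Example: a 10-node Gatekeeper ring (f_G = 3) needs 4 matching answers,
# but only 1 if the answer is backed by a BLS state proof and timestamp.
assert catchup_quorum(10, has_state_proof=False) == 4
assert catchup_quorum(10, has_state_proof=True) == 1
```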

#### Description
- Validators Ring:
  - Has its own Steward and is added to the Pool by a NODE txn with the VALIDATOR service
  - Participates in consensus (write request ordering)
  - Does not process any Client or Observer requests
  - Supports restricted connections: to other Validators and Gatekeepers only (not to Observers)
  - Provides catch-up for Gatekeepers and other Validators only (neither for Clients nor for Observers)
  - Is connected to all Gatekeepers (the same way as to Validators, but does not include them in 3PC)
  - Propagates all write Replies to all Gatekeepers to keep them in sync
  - Can bootstrap catch-up using other Gatekeepers or Validators and validate the catch-up results using state proofs and the timestamp of the latest state
  - Has the BFT tolerance parameter f_V
  - The number of Validators cannot be too big because of the RBFT chattiness problem
- Gatekeepers Ring:
  - Has its own Steward and is added to the Pool by a NODE txn with the GATEKEEPER service
  - Handles read requests on its own using State Proofs
  - Can handle write requests by propagating them to Validators and using the Write State Proof
  - Registers with Validators to stay in sync with them on every Reply
  - Can become a Validator
  - Can register Observers to keep them in sync
  - Can bootstrap catch-up using other Gatekeepers or Validators and validate the catch-up results using state proofs and the timestamp of the latest state
  - Has the BFT tolerance parameter f_G
  - The number of Gatekeepers can be bigger than the number of Validators, since they do not suffer from the RBFT chattiness problem
- Observers Ring:
  - May be added with a NODE txn with the OBSERVER service, or not added to the pool ledger at all (some internal sync-up point)
  - Handles read requests on its own using State Proofs
  - Can handle write requests by propagating them to Gatekeepers and using the Write State Proof
  - Registers with Gatekeepers, with some synchronization policy, to stay in sync
  - Cannot become a Validator
  - Can register other Observers to keep them in sync => a network of Observers
  - Can bootstrap catch-up using other Observers' or Gatekeepers' states and validate the catch-up results using state proofs and the timestamp of the latest state
  - There are no restrictions on the number of Observers
- Clients:
  - Can connect to Observers or Gatekeepers only
  - It's sufficient to connect to only one Observer or Gatekeeper to send read and write requests, because of State Proofs
  - Can bootstrap catch-up using Gatekeeper or Observer states
  - Need to finish catch-up (check the merkle tree root) using the Gatekeeper Ring
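The connection restrictions above can be summarized as a role matrix. A minimal sketch, assuming the fully "closed gate" end state (the `Role` enum and `accepts` helper are ours, not indy-node code):

```python
from enum import Enum

class Role(Enum):
    VALIDATOR = "VALIDATOR"
    GATEKEEPER = "GATEKEEPER"
    OBSERVER = "OBSERVER"
    CLIENT = "CLIENT"

# Which peer roles each role communicates with, per the description above.
ALLOWED_PEERS = {
    Role.VALIDATOR:  {Role.VALIDATOR, Role.GATEKEEPER},
    Role.GATEKEEPER: {Role.VALIDATOR, Role.GATEKEEPER, Role.OBSERVER, Role.CLIENT},
    Role.OBSERVER:   {Role.GATEKEEPER, Role.OBSERVER, Role.CLIENT},
    Role.CLIENT:     {Role.GATEKEEPER, Role.OBSERVER},
}

def accepts(own_role: Role, peer_role: Role) -> bool:
    """True if a node with own_role should talk to a peer with peer_role."""
    return peer_role in ALLOWED_PEERS[own_role]

assert accepts(Role.VALIDATOR, Role.GATEKEEPER)
assert not accepts(Role.VALIDATOR, Role.CLIENT)   # the "closed gate"
```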

## Roadmap

Note: the roadmap is provided according to the priority of Goals above. If the priority needs to change (for example, if we would like to start with untrusted/unsynced Observers), the set of tasks stays the same; only the order changes.

### Indy-Node
- Goals 1-3: \
  Reduce Read Load, \
  More Nodes, \
  Reduce Catch-up Load
  1. Support the GATEKEEPER service in the NODE txn; define an auth rule for who can create Gatekeeper Nodes. No special behaviour for Gatekeepers yet
  1. Separate the pool into Validators and Gatekeepers with N_V, f_V and N_G, f_G, and make them connect to each other
  1. Support adding abstract Observers to Nodes and the possibility of a custom policy for syncing (no real policies yet)
  1. Register all Gatekeeper Nodes as Observers with a default Policy that sends each write Reply to all Gatekeepers (see the sketch below)
  1. Restrict Gatekeepers to processing read requests only and exclude them from consensus. Gatekeepers reject write requests at this point
  1. Support processing of observed data by Gatekeepers (and Observers in general)
  1. Catch up the BLS store on all Nodes (in fact, only the latest entry is needed)
  1. Catch up the seqNo store on all Nodes
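A minimal sketch of the abstract Observer registry with the default sync policy from tasks 3-4 above; `send_fn` stands in for the real network layer, and all names are illustrative:

```python
from typing import Callable, Dict

class ObserverRegistry:
    """Sketch of the pluggable Observer sync mechanism on a Validator."""

    def __init__(self) -> None:
        self._observers: Dict[str, Callable[[dict], None]] = {}

    def register(self, observer_id: str, send_fn: Callable[[dict], None]) -> None:
        self._observers[observer_id] = send_fn

    def propagate_reply(self, reply: dict) -> None:
        """Default policy: push every ordered write Reply to every
        registered observer (i.e. all Gatekeepers), keeping them in sync."""
        for send in self._observers.values():
            send(reply)
```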

- Goal4: Reduce Write Load
  1. Make sure that a valid Reply with an Audit Proof is sent for an already processed reqId
  1. Support receiving write Requests on Gatekeepers, propagating them to Validators, and processing the Replies back (see the sketch below)
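A minimal sketch of a Gatekeeper forwarding a client write to the Validator ring; `forward_to_validators`, `wait_for_reply`, and `verify_audit_proof` are placeholders for the real transport and proof logic, not existing indy-node APIs:

```python
def handle_client_write(request: dict,
                        forward_to_validators,
                        wait_for_reply,
                        verify_audit_proof) -> dict:
    forward_to_validators(request)              # Validators order the txn
    reply = wait_for_reply(request["reqId"])    # match the Reply by reqId
    if not verify_audit_proof(reply):           # the Gatekeeper checks the proof itself
        raise ValueError("invalid audit proof in Reply")
    return reply                                # returned to the client as-is
```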

- Goal5: Promotion
  1. Define auth rules for who can change a Node's service (promotion to Validator)
  1. Re-calculate the N_V and f_V parameters when a Gatekeeper Node is promoted to Validator (see the example below)
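For illustration, the recalculation is just the standard BFT bound; note that promoting a single Gatekeeper does not always raise f_V:

```python
def bft_f(n_v: int) -> int:
    """Faults tolerated by a ring of n_v Validators."""
    return (n_v - 1) // 3

# Promoting one Gatekeeper: N_V 6 -> 7 raises f_V from 1 to 2,
# while N_V 4 -> 5 leaves f_V at 1.
assert bft_f(6) == 1 and bft_f(7) == 2
assert bft_f(4) == 1 and bft_f(5) == 1
```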

- Goal6: Untrusted Observers
  1. Support the OBSERVER service in the NODE txn; define an auth rule for who can create Observer Nodes. Make Observers and Gatekeepers equal for now
  1. Be able to register any Observer with a custom policy
     - Define some policies for observing and implement syncing the state of Observers (so the only difference between an Observer and a Gatekeeper here is that Observers can use any custom policy)
  1. Support periodic updates of BLS sigs

- Goal7: Fast Catch-up of Clients
  1. Support BLS multi-signatures for the Pool and Config ledgers (that is, for all ledgers)
  1. Create a new genesis txn file where each genesis pool txn has a BLS key
  1. Keep the history of BLS multi-sigs for the Pool Ledger
  1. Return the BLS multi-sig during catch-up of each txn in the Pool Ledger, so that it's possible to catch up from one Node only and verify the result using previous pool states (see the sketch below)
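A minimal sketch of that verification: each txn's multi-sig is checked against the pool's BLS keys as they were at that point in history, with the key set evolving as NODE txns are replayed. `verify_multi_sig` and the txn layout are illustrative assumptions, not the actual indy-node wire format:

```python
def verify_pool_catchup(txns, genesis_bls_keys: dict, verify_multi_sig) -> bool:
    bls_keys = dict(genesis_bls_keys)       # keys from the new genesis txn file
    for txn in txns:
        if not verify_multi_sig(txn["multiSig"], bls_keys):
            return False                    # not signed by a quorum of known keys
        data = txn.get("data", {})
        if txn.get("type") == "NODE" and "blsKey" in data:
            bls_keys[data["alias"]] = data["blsKey"]   # key set evolves with txns
    return True
```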

- Goal8: Fast Catch-up of Nodes
  1. Support verification of the Pool ledger based on the BLS multi-sig history for each txn
  1. Support bootstrapping of catch-up of Validators and Gatekeepers from other Validators and Gatekeepers
     - Take the ledger from one Node and use the BLS multi-sig for validation
     - Catch-up needs to be finished by checking that the state received from one node matches the state on at least f_V+1 other Nodes (check merkle tree roots); see the sketch below
  1. Support bootstrapping of catch-up of Observers from other Observers and Gatekeepers
  1. [Optionally] Support BitTorrent-like catch-up bootstrapping
  1. [Optionally] We may need to support RocksDB for better propagation of parts of the ledger and for recovery
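A minimal sketch of this two-phase catch-up: bulk-download the txns from a single node, then confirm the resulting merkle root with f+1 other nodes. `fetch_txns`, `compute_merkle_root`, and `fetch_root` are placeholders for the real catch-up transport:

```python
def bootstrap_catchup(source, others, f: int,
                      fetch_txns, compute_merkle_root, fetch_root):
    txns = fetch_txns(source)               # cheap: one node serves the bulk data
    root = compute_merkle_root(txns)
    matches = sum(1 for node in others if fetch_root(node) == root)
    if matches < f + 1:                     # need f+1 matching roots to trust it
        raise RuntimeError("catch-up not confirmed; retry with another source")
    return txns
```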

- Goal9: Close the Gate
  1. Do not allow any Observer or client connections to Validators
  1. Connect Observers to Gatekeepers only
  1. Propagate write requests from Observers to Gatekeepers

- Goal10: Withstand DoS from Clients
  1. Load balancing for Observer nodes (Nginx?)
  1. Change client-to-node communication from ZMQ to something else (HTTP + authcrypt?)

- Goal11: Withstand DoS from Nodes
  1. Blacklist malicious nodes
  1. Separate the communication and consensus layers in nodes


### Indy-sdk
- Goals 1, 2, 5: \
  Reduce Read Load, \
  More Nodes, \
  Promotion
  1. Support the GATEKEEPER service in the NODE txn on the client side
  1. Separate the pool into Validators and Gatekeepers with N_V, f_V and N_G, f_G
  1. Send read requests to Gatekeeper Nodes (see the sketch below)
     - Connect to one Gatekeeper Node only (explicitly or randomly) and work with this Node for read requests
     - If a read request to a Gatekeeper Node fails for some reason (timeout, invalid state proof, etc.), resend the request to another Gatekeeper Node
     - If there are no more Gatekeeper Nodes to send the read request to, send it to Validators
     - Support fallback to f+1 (query f+1 nodes when state proofs cannot be used)
     - Note: neither a separate pool ledger for each Gatekeeper nor a pool state trie is needed on the client here, because we assume that Gatekeepers are always in sync with Validators
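A minimal sketch of that fallback chain on the client side; `send_read` and `verify_state_proof` are illustrative placeholders, not indy-sdk APIs:

```python
def read_with_fallback(request: dict, gatekeepers: list, validators: list,
                       send_read, verify_state_proof) -> dict:
    for node in gatekeepers + validators:       # Gatekeepers first, Validators last
        try:
            reply = send_read(node, request)    # may raise TimeoutError
        except TimeoutError:
            continue                            # node unavailable: try the next one
        if verify_state_proof(reply):
            return reply                        # one provable reply is enough
        # invalid proof: fall through to the next node as well
    raise RuntimeError("no node returned a reply with a valid state proof")
```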

- Goal3: Reduce Catch-up Load
  1. Support catch-up based on Gatekeepers (with a fallback to catch-up on Validators)
  1. [Optional] Validate catch-up correctness by sending LEDGER_STATUS to Validators

- Goal4: Reduce Write Load
  1. Verify the audit proof on write requests
  1. Send write requests to Gatekeeper Nodes
     - Both reads and writes can be sent to the same Gatekeeper Node, with the same fallback rules
     - It's essential that re-sending is done with the same reqId (see the sketch below)
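A minimal sketch of the reqId rule: build the request once and resend the same payload on fallback, so the pool can deduplicate by reqId and return the already-ordered Reply (with its audit proof) instead of ordering the txn twice. The reqId format and field names here are illustrative, not the exact sdk wire format:

```python
import json
import time

def build_write_request(operation: dict, did: str) -> dict:
    return {
        "identifier": did,
        "reqId": time.time_ns(),   # fixed once; reused verbatim on every resend
        "operation": operation,
    }

request = build_write_request({"type": "1", "dest": "some-target-did"}, "client-did")
payload = json.dumps(request)      # resend this exact payload on fallback
```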

- Goal6: Untrusted Observers
  1. Support specifying any Observer Node for reads and writes (not necessarily a Gatekeeper). Fall back to Gatekeepers first, and only then to Validators
  1. Support a gap in the Observer's state
     - Option 1: have a separate pool ledger for each Observer and do a catch-up for each Observer
     - Option 2: a Patricia trie for the Pool state on clients

- Goals 7, 8: Fast Catch-up of Clients and Nodes
  1. Support bootstrapping of the pool ledger from one Node only (either an Observer, a Gatekeeper, or a Validator)
  1. Validate the bootstrapped ledger using the provided BLS sigs
  1. Finish catch-up by validating the correctness of the ledger against f_G + 1 (or f_V + 1) nodes

- Goal9: Close the Gate
  1. Forbid any connections to Validators from the sdk