Testing Playbook
================
Enter the commands as shown.
Let CN be Client N.
Let S be the server.
Let the following be a command on CN:
CN>[DD]: $command
Let the following be output to check for:
CHECK: $output
--------------
Failure Cases:
--------------
1. PN fails while it is idle: it has not sent out any prepare request or accepted any other PN's prepare request.
Solution: The other PNs remain in the “waiting” state. No other PN is affected by the failure of a PN in this state, and the other PNs may continue to send out proposals.
Start S,C1,C2,C3
C1>[DD]: write hello
C2>[DD]: write world
CHECK: Read all is the same
C1>[DD]: exit
CHECK: Read all is the same
2. PN fails after it has sent out a prepare request
Solution: PNs that sent back a promise for this prepare request will time out waiting for the accept request. Eventually, these PNs will promise another prepare request with a higher proposal number.
Start S,C1,C2,C3
C1>[DD]: write hello
C2>[DD]: write world
CHECK: Read all is the same
C1>[DD]: kill propose
C1>[DD]: write bye
CHECK: C2 and C3 do not see 'bye' in their logs. --debug should show the
receipt of the prepare message
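The promise rule behind case 2 can be sketched as follows. This is a minimal illustration under assumptions, not the project's code: the class name Acceptor and the methods on_prepare and on_accept are hypothetical, and the timeout itself is not modelled. It only shows why a promise to a dead proposer does not block a later prepare request with a higher proposal number.

```python
class Acceptor:
    """Hypothetical acceptor; names are illustrative, not taken from the project."""

    def __init__(self):
        self.promised = -1    # highest proposal number promised so far
        self.accepted = None  # (proposal number, value) last accepted, if any

    def on_prepare(self, n):
        # Promise only proposals numbered higher than any earlier promise.
        if n > self.promised:
            self.promised = n
            return ("promise", n, self.accepted)
        return ("reject", self.promised, None)

    def on_accept(self, n, value):
        # Accept unless a higher-numbered prepare has been promised since.
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False
```

In the scenario above, C2 and C3 promise C1's prepare, never receive the accept request, and later promise a prepare with a higher number from a surviving PN.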
3. PN fails after it has sent out a prepare request, received a majority of positive responses back, and sent out an accept request.
Solution: Our system can still learn the consensus value because each PN’s acceptor sends the accepted value to every PN’s learner. Each learner counts the acceptances it has received and writes the value to the log once that count reaches a majority.
Start S,C1,C2,C3
C1>[DD]: write hello
C2>[DD]: write world
CHECK: Read all is the same
C1>[DD]: kill learn
C1>[DD]: write bye
CHECK: C2 and C3 do see 'bye' in their logs. --debug should show the
receipt of the prepare message, and the proposal.
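The learner behaviour described in case 3 can be sketched as below. This is an illustration under assumptions rather than the project's implementation: the Learner class, its on_accepted method, and the use of floor(n/2) + 1 as the majority are all hypothetical choices.

```python
from collections import defaultdict


class Learner:
    """Hypothetical learner that counts acceptances per (proposal, value)."""

    def __init__(self, cluster_size):
        # Assumption: a majority is more than half of the known PNs.
        self.majority = cluster_size // 2 + 1
        self.votes = defaultdict(set)  # (n, value) -> ids of acceptors seen
        self.decided = set()           # keys already written to the log
        self.log = []

    def on_accepted(self, acceptor_id, n, value):
        """Record one acceptance; return True when the value is first learned."""
        key = (n, value)
        self.votes[key].add(acceptor_id)  # a set ignores duplicate messages
        if len(self.votes[key]) >= self.majority and key not in self.decided:
            self.decided.add(key)
            self.log.append(value)
            return True
        return False
```

Because every learner counts independently, C2 and C3 can each write 'bye' to their logs even though the proposing PN has failed.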
4. The last PN leaves the network
Solution: The distributed log in our system is maintained only while there is at least one PN in the network. When the last PN leaves the network, the state of the log is lost, and the log restarts empty when a new PN connects.
Start S,C1,C2,C3
C1>[DD]: write hello
C2>[DD]: write world
CHECK: Read all is the same
C1>[DD]: exit
C2>[DD]: exit
C3>[DD]: exit
Restart C1
C1>[DD]: read
CHECK: read is empty
5. A majority of PNs fail during a Paxos round
Solution: Each live PN updates its neighbour list via the failed-neighbours protocol. The current round ends and the next one begins. The majority threshold is recomputed from the number of live PNs.
Start S,C1,C2,C3,C4
C1>[DD]: write hello
C2>[DD]: write world
CHECK: Read all is the same
C1>[DD]: break propose
C1>[DD]: write bye
CHECK: Read on all does not include 'bye'
C2>[DD]: exit
C3>[DD]: exit
C1>[DD]: continue
CHECK: C1 and C4 do see 'bye' in their logs. --debug should show the original
round failure with four nodes, and then the restart of the round with two
nodes, which reaches a majority.
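The threshold arithmetic in case 5 can be sketched as below, assuming a majority means strictly more than half of the live PNs. The function names majority and round_succeeds are hypothetical and do not come from the project.

```python
def majority(live_nodes):
    """Smallest number of PNs that is strictly more than half of live_nodes."""
    return live_nodes // 2 + 1


def round_succeeds(live_nodes, acks):
    """True if the positive responses received reach the current majority."""
    return acks >= majority(live_nodes)
```

Under this assumption the threshold is 3 with four PNs, so two responses are not enough. After C2 and C3 exit, the threshold drops to 2, and C1 and C4 alone can complete the restarted round.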
6. A node fails and reconnects while 3 or more peers are present
Solution: If the node reconnects with the same port number, it restores its last proposed and last accepted values from its backup file and pulls a fresh copy of the log from the Paxos network.
Start S,C1,C2,C3
C1>[DD]: write hello
C2>[DD]: write world
CHECK: Read all is the same
C1>[DD]: exit
CHECK: Read all is the same
Restart C1
CHECK: Read all is the same
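The backup-and-restore step in case 6 might look like the sketch below. Everything specific here is an assumption made for illustration: the JSON format, the file name derived from the port number, and the function names save_state and load_state are hypothetical, since the playbook does not describe the backup file's layout.

```python
import json
import os


def backup_path(directory, port):
    # Assumption: one backup file per port, so a node that reconnects
    # on the same port finds its own state.
    return os.path.join(directory, f"paxos_backup_{port}.json")


def save_state(directory, port, last_proposed, last_accepted):
    """Write the state to a temporary file, then swap it into place."""
    path = backup_path(directory, port)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"last_proposed": last_proposed,
                   "last_accepted": last_accepted}, f)
    os.replace(tmp, path)  # avoids leaving a half-written backup behind


def load_state(directory, port):
    """Return the saved state, or empty state if no backup exists."""
    try:
        with open(backup_path(directory, port)) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"last_proposed": None, "last_accepted": None}
```

A node restarted on a different port finds no backup under this scheme and starts with empty state, which is consistent with the "same port number" condition above.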