Error: Proposal failed to bind to state #2247

**Describe the bug**
In a 3-node CCF network, submitting a proposal returns the following error:
```
{'error': {'code': 'InternalError', 'message': 'Proposal failed to bind to state.'}}
```

To debug, we sent proposals to each CCF node individually instead of going through the load balancer. All of the secondary nodes returned the error above. The primary, meanwhile, returned a response like the following:
```
{'proposal_id': '56c1acb4247462a9201b254bdcd84958accf1bb22ac942e26d8a545fa9dffa20', 'proposer_id': 0, 'state': 'OPEN'}
```
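The per-node check above can be sketched in Python as follows. This is only an illustration under stated assumptions: it presumes the response bodies have already been collected and parsed into dicts shaped like the ones shown. The helper `classify_responses` and the node labels `node-0`..`node-2` are hypothetical, and no CCF client API is involved.

```python
from typing import Dict, List, Tuple

# Error message exactly as returned by the secondaries in this report.
BIND_ERROR = "Proposal failed to bind to state."


def classify_responses(
    responses: Dict[str, dict],
) -> Tuple[List[str], List[str], List[str]]:
    """Hypothetical helper: split nodes by how they answered a proposal.

    `responses` maps a node label to the parsed body that node returned.
    Returns (accepted, bind_failed, other), each sorted by node label.
    """
    accepted, bind_failed, other = [], [], []
    for node, body in sorted(responses.items()):
        error = body.get("error")
        if error and error.get("message") == BIND_ERROR:
            bind_failed.append(node)
        elif body.get("state") == "OPEN" and "proposal_id" in body:
            accepted.append(node)
        else:
            # Anything unexpected is kept, not silently dropped.
            other.append(node)
    return accepted, bind_failed, other
```

With the responses from this report, the only accepted node would be the primary, and both secondaries would land in `bind_failed`.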

We observed the commit advance from 9.22 to 9.24 on all nodes, so replication from the primary appears to be working.
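The commit values above are `view.seqno` pairs. A minimal Python sketch of the "did every node's commit advance?" check follows. It assumes the commits have been sampled twice per node as strings such as `"9.22"`. The helpers `parse_tx_id` and `all_advanced` and the node labels are hypothetical.

```python
from typing import Dict, Tuple


def parse_tx_id(tx_id: str) -> Tuple[int, int]:
    """Hypothetical helper: parse a 'view.seqno' string, e.g. '9.22' -> (9, 22)."""
    view, seqno = tx_id.split(".")
    return int(view), int(seqno)


def all_advanced(before: Dict[str, str], after: Dict[str, str]) -> bool:
    """True if every node's commit moved forward between two samples.

    Tuples compare view first, then seqno, so a later view always counts
    as further ahead regardless of seqno.
    """
    return all(parse_tx_id(after[node]) > parse_tx_id(before[node]) for node in before)
```

Parsing into integer pairs matters here. Reading `"9.3"` and `"9.22"` as floats would rank 9.3 above 9.22, even though seqno 3 comes before seqno 22.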

Some time later, the primary changed. Sending the proposal to each node individually again showed the same issue: only the (new) primary could "bind to state".

**To Reproduce**
Unfortunately, this appears to be a transient error and we cannot reproduce it consistently. It happens occasionally after we start up the network. We have not yet identified a likely cause.

In this particular case, we had been able to submit proposals earlier, since `/node/network` shows the service status as `OPEN`. However, we may simply have been lucky with the load balancer and hit the primary.

We are submitting the proposals using signature authentication, with the signing occurring in Azure Key Vault.

**Expected behavior**
Proposals can be submitted to the secondary nodes as well as to the primary.

**Environment information**
CCF version: 0.18.2
Start node config:
```
consensus = cft
enclave-file = 
enclave-type = release
ledger-chunk-bytes = 104857600
ledger-dir = 
log-format-json = true
node-address = 10.240.0.112:16384
public-rpc-address = 10.240.0.112:16385
read-only-ledger-dir = 
rpc-address = 10.240.0.112:16385
snapshot-dir = 
snapshot-tx-interval = 10000

[start]
gov-script = 
network-cert-file =
member-info = 
```

Joining node config:
```
consensus = cft
enclave-file =
enclave-type = release
ledger-chunk-bytes = 104857600
ledger-dir = 
log-format-json = true
node-address = 10.240.0.9:16384
public-rpc-address = 10.240.0.9:16385
read-only-ledger-dir = 
rpc-address = 10.240.0.9:16385
snapshot-dir =
snapshot-tx-interval = 10000

[join]
network-cert-file = 
target-rpc-address = 10.240.0.112:16385
```

oe_sign.conf:
```
# Enclave settings:
Debug=0
NumHeapPages=70000
NumStackPages=1024
NumTCS=8
ProductID=1
SecurityVersion=1
```

**Additional context**
We are running the network in Kubernetes. We have previously seen opaque errors whose root cause was running out of IP addresses. That does not appear to be the cause here: all of the CCF Pods have assigned IP addresses, and we can communicate with each one individually.

The logs are not helpful. Each node's log ends with something like the following, which appears to have been emitted shortly after startup rather than while we were debugging:
```
{"h_ts":"2021-02-26T14:48:05.969839Z","thread_id":"100","level":"fail","file":"../src/host/main.cpp","number":"765","msg":"No snapshot found: Node will request all historical transactions\n"}Azure Quote Provider: libdcap_quoteprov.so [ERROR]: Could not retreive environment variable for 'AZDCAP_DEBUG_LOG_LEVEL'
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Error: Proposal failed to bind to state #2247

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Error: Proposal failed to bind to state #2247

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions