Zeebe-Broker fails and dies before Hazelcast is ready #103

danshapir · 2021-04-29T14:11:21Z

Hey,

When using the zeebe-full-helm chart with ZeeQS and Hazelcast (internal, part of the chart, not remote), the broker starts so quickly that it reaches its final attempt at connecting to Hazelcast before the cluster is ready. Because of that, you need to kill the broker, and when it restarts it works great. I searched for a way to change the timeout in the exporter, but no such option is available :(

Any ideas?

saig0 · 2021-05-07T12:15:19Z

If I understand the helm chart correctly then it deploys a Hazelcast instance (locally) and the exporter tries to connect to it.
Is this right?

In this case, the exporter should have an option to configure the timeout when connecting to a remote Hazelcast cluster (instead of creating an in-memory Hazelcast instance).

danshapir · 2021-05-29T22:53:18Z

@saig0

The issue is with the broker. The broker is up before Hazelcast is, so this happens:

2021-05-29 22:50:03.706 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 2 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:04.707 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 3 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:05.707 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 4 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:06.708 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 5 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:07.709 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 6 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:08.710 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 7 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:09.711 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 8 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:10.712 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 9 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:11.713 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 10 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:12.716 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 11 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:13.717 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 12 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:14.718 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 13 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:15.719 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 14 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:16.720 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 15 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:17.721 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 16 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:18.722 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 17 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:19.723 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 18 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:20.724 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 1000 ms, attempt: 19 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:21.725 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, retry in 975 ms, attempt: 20 , cluster connect timeout: 20000 seconds , max backoff millis: 30000
2021-05-29 22:50:22.701 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] WARN com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to get live cluster connection, cluster connect timeout (20000 millis) is reached. Attempt 21.
2021-05-29 22:50:22.701 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] INFO com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [dev] [4.0.3] Unable to connect to any address from the cluster with name: dev. The following addresses were tried: []
2021-05-29 22:50:22.702 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] INFO com.hazelcast.core.LifecycleService - hz.client_1 [dev] [4.0.3] HazelcastClient 4.0.3 (20200921 - 59ae831) is SHUTTING_DOWN
2021-05-29 22:50:22.707 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] INFO com.hazelcast.core.LifecycleService - hz.client_1 [dev] [4.0.3] HazelcastClient 4.0.3 (20200921 - 59ae831) is SHUTDOWN
2021-05-29 22:50:22.709 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] ERROR io.zeebe.util.actor - Uncaught exception in 'Broker-0-Exporter-1' in phase 'STARTED'. Continuing with next job.
java.lang.IllegalStateException: Unable to connect to any cluster.
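
To make the timing in the log above easier to follow, here is a small simulation of a fixed-delay retry loop that is cut off by an overall timeout. It is only a simplified model, not Hazelcast's actual code: the class name `RetrySchedule` and the method `attemptTimes` are made up for illustration, and every attempt is assumed to fail instantly.

```java
import java.util.ArrayList;
import java.util.List;

public class RetrySchedule {

    /**
     * Returns the simulated times (ms since start) at which connection
     * attempts happen. Every attempt is assumed to fail immediately, and the
     * wait before the next attempt is capped by the time left until the timeout.
     */
    static List<Long> attemptTimes(long backoffMillis, long timeoutMillis) {
        List<Long> times = new ArrayList<>();
        long now = 0;
        while (true) {
            times.add(now);                       // one failed attempt
            long remaining = timeoutMillis - now;
            if (remaining <= 0) {
                break;                            // timeout reached, give up
            }
            now += Math.min(backoffMillis, remaining);
        }
        return times;
    }

    public static void main(String[] args) {
        // Values taken from the log: 1000 ms between retries, 20000 ms timeout.
        List<Long> times = attemptTimes(1000, 20000);
        System.out.println("attempts: " + times.size()); // prints "attempts: 21"
    }
}
```

With the values from the log, the model gives up on attempt 21, which lines up with the "Attempt 21" message. If Hazelcast needs longer than about 20 seconds to come up, the client has already shut down by then.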

saig0 · 2021-05-31T03:49:09Z

Yes. I guess that an additional option in the exporter to configure the timeout should solve the issue. Correct?

danshapir · 2021-05-31T04:52:37Z

Yes, it totally would!
I tried finding a way to change it through an env var without changes to the exporter, but to no avail.

saig0 · 2021-09-21T11:36:47Z

Okay. To sum it up, we want to extend the exporter with a new configuration option connectionTimeout. The exporter uses this timeout when connecting to a remote Hazelcast instance.

It can be implemented similar to here.
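
As a rough illustration of the idea (not the exporter's actual implementation), a configurable timeout could bound a connect-and-retry loop like this. The `ExporterConfig` class, the ISO-8601 duration format for `connectionTimeout`, and the `connect` helper are all assumptions made for this sketch.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public class ConnectWithTimeout {

    /** Hypothetical settings holder; not the exporter's real config class. */
    static final class ExporterConfig {
        // Assumed format: ISO-8601 duration, e.g. "PT30S" for 30 seconds.
        String connectionTimeout = "PT30S";
    }

    /**
     * Calls tryConnect until it succeeds or the timeout elapses.
     * Returns true on success, false if the timeout is reached
     * or the thread is interrupted while waiting.
     */
    static boolean connect(BooleanSupplier tryConnect, Duration timeout, Duration retryDelay) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (tryConnect.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Never sleep past the deadline.
            long sleepNanos = Math.min(retryDelay.toNanos(), remainingNanos);
            try {
                Thread.sleep(sleepNanos / 1_000_000, (int) (sleepNanos % 1_000_000));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag set
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ExporterConfig config = new ExporterConfig();
        config.connectionTimeout = "PT2S";
        Duration timeout = Duration.parse(config.connectionTimeout);

        // Stand-in for a real connection attempt: succeeds on the third call.
        int[] calls = {0};
        boolean connected = connect(() -> ++calls[0] >= 3, timeout, Duration.ofMillis(100));
        System.out.println("connected=" + connected + " after " + calls[0] + " attempts");
    }
}
```

In a real exporter the value would more likely be passed on to the Hazelcast client than used in a hand-written loop. As far as I know, the 4.x client exposes this as the cluster connect timeout on its connection retry config, but check the client documentation for the exact setter.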

saig0 added the enhancement New feature or request label May 7, 2021

saig0 mentioned this issue May 7, 2021

Simple-Monitor gives up before Hazelcast is ready camunda-community-hub/zeebe-simple-monitor#239

Closed

saig0 added hacktoberfest good first issue Good for newcomers labels Sep 21, 2021

saig0 mentioned this issue Jan 4, 2022

feat(exporter): define remote connection timeout #158

Merged

saig0 closed this as completed in #158 Jan 4, 2022
