Globalnet gateway monitor exits if controller startup fails after leadership acquired

In a production environment with several gateway nodes, each globalnet pod was observed to have restarted numerous times over several weeks. Each time, the pod had just acquired leadership and was attempting to start the controllers when a transient API server error occurred, e.g.:

`Error starting the controllers : error="error creating the ClusterGlobalEgressIP controller: error retrieving ClusterGlobalEgressIP resource \"cluster-egress.submariner.io\": the server was unable to return a response in the time allotted, but may still be processing the request"`

This results in the process exiting with a fatal error. When leadership is lost we keep the process running, so we should do the same if an error occurs while starting the controllers. Most of the time leader election had also timed out and relinquished leadership, so it's not necessary to exit. If leadership is still held, we should either explicitly relinquish it or try to start the controllers again.

Globalnet gateway monitor exits if controller startup fails after leadership acquired #3337