Description
In a production environment with several gateway nodes, we observed that every globalnet pod had restarted numerous times over several weeks. In each case, the pod had just acquired leadership and was starting the controllers when a transient API server error occurred, e.g.:
Error starting the controllers : error="error creating the ClusterGlobalEgressIP controller: error retrieving ClusterGlobalEgressIP resource \"cluster-egress.submariner.io\": the server was unable to return a response in the time allotted, but may still be processing the request"
This causes the process to exit with a fatal error. When leadership is lost, we keep the process running, so we should do the same when an error occurs while starting the controllers. In most of these cases, leader election had also timed out and leadership had already been relinquished, so exiting is unnecessary. If leadership is still held, we should either explicitly relinquish it or retry starting the controllers.