Skip to content

Load Balance HealthCheck Fail when Making Changes to HA Proxy Setting #9582

Open
@btzq

Description

@btzq
ISSUE TYPE
  • Bug Report
COMPONENT NAME
Virtual Router
CLOUDSTACK VERSION
4.19.1.1
CONFIGURATION
OS / ENVIRONMENT
SUMMARY

We have 50 Autoscale Groups within the same Virtual Router.

The Default Setting for VR Load Balancer is where:

  • Timeout Client = 50 Seconds
  • Timeout Server = 50 Seconds

We have a use case where lots of our customers have very long lived TCP Connections. So we went into the VR, and made the following changes:

  • Timeout Client = 50 Seconds -> 30 Minutes
  • Timeout Server = 50 Seconds -> 30 Minutes
  • Added TCP Keep Alive (Option tcpka)

WhatsApp Image 2024-08-23 at 7 16 17 PM

After that, we tried accessing services using Autoscale Group Load Balancer. It working for 3-5 Minutes and then we were suddenly we got cut off. We couldnt PING or Telnet the VMs under Autoscale anymore.

We checked and saw VR started throwing HealthCheck issues where: 'Missing Load Balancer For XXXXXX'

To resolve the issue temporarily, we added a new LB Rule and removed it, to force HA Proxy to get the new info.

WhatsApp Image 2024-08-23 at 8 07 30 PM

As a result, the HA Proxy Config was reloaded and the settings all got overrided.

But strangely enough, once we repeat the steps above, the load balance issue no longer there. It seems to be intermittent.

STEPS TO REPRODUCE
Refer Above
EXPECTED RESULTS
To be able to change the TCP Timeout Setting without any issue. 
ACTUAL RESULTS
Get Healthcheck fails. And suddenly services didnt work. 

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    Status

    No status

    Status

    No status

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions