-
Notifications
You must be signed in to change notification settings - Fork 954
Introduce XdsClusterManager
to cache Cluster
s
#6261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the xDS resource management by introducing an XdsClusterManager to cache and reuse Cluster resources, while also updating the subscription context and resource node implementations to align with upstream behavior.
- Introduces a new SubscriptionContext and DefaultSubscriptionContext replacing direct usage of XdsBootstrapImpl.
- Refactors resource node classes and associated watchers for Route, Listener, Cluster, and Endpoint resources.
- Updates static resources and bootstrap cluster initialization to support centralized cluster caching.
Reviewed Changes
Copilot reviewed 20 out of 27 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
xds/StaticSubscriptionContext.java | Introduces a static subscription context with unsupported operations throwing exceptions. |
xds/StaticResourceUtils.java | Replaces the older XdsBootstrapImpl parameter with SubscriptionContext in utility methods for static resource creation. |
xds/RouteXdsResource.java | Removes the primer field and updates equality/hashCode to reflect changes. |
xds/RouteResourceNode.java | Refactors resource node watcher creation and closure to update virtual host snapshots. |
xds/ListenerRoot.java & ListenerResourceNode.java | Adjusts listener initialization and subscription to use SubscriptionContext and executes updates on the event loop. |
xds/EndpointXdsResource.java & EndpointResourceNode.java | Updates endpoint resource implementations to remove primer support and use SubscriptionContext. |
xds/DefaultSubscriptionContext.java, ControlPlaneClientManager.java | Introduces default subscription context and client management for control plane interactions. |
xds/ClusterXdsResource.java, ClusterSnapshot.java, ClusterRoot.java, ClusterResourceNode.java | Refactors cluster-related resource nodes, caching upstream TLS context and updating snapshot handling. |
xds/BootstrapClusters.java | Updates bootstrap cluster initialization to register static clusters via the new cluster manager. |
xds/AbstractRoot.java, AbstractResourceNode.java | Adjusts abstract resource node implementations to support SubscriptionContext and watcher management. |
Comments suppressed due to low confidence (2)
xds/src/main/java/com/linecorp/armeria/xds/ListenerRoot.java:42
- Consider whether scheduling the close operation asynchronously on the event loop meets the desired resource release guarantees. If immediate cleanup is required, evaluate synchronizing the close logic or awaiting the execution completion.
eventLoop().execute(() -> node.onChanged(listenerXdsResource));
xds/src/main/java/com/linecorp/armeria/xds/AbstractResourceNode.java:49
- [nitpick] The checkArgument conditions for dynamic and static nodes improve safety, but consider adding comments or more descriptive error messages to document the invariants expected for configSource values in each node type.
if (resourceNodeType == ResourceNodeType.DYNAMIC) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks much simpler now. 👍 👍 👍
xds/src/main/java/com/linecorp/armeria/xds/BootstrapClusters.java
Outdated
Show resolved
Hide resolved
|
||
void secondaryInitialize(SubscriptionContext context) { | ||
for (Cluster cluster: bootstrap.getStaticResources().getClustersList()) { | ||
if (!cluster.hasEdsClusterConfig()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initializeEdsClusters
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to keep the naming secondary as it is a widely used term upstream. (ref)
Let me know if you feel strongly about this though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't know about it. Let's use the term secondary, then.
final XdsType type = node.type(); | ||
final String resourceName = node.name(); | ||
final ConfigSourceClient client = clientMap.get(node.configSource()); | ||
if (client != null && client.removeSubscriber(type, resourceName, node)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to use assert client != null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since updates are event-loop based, I imagined that a call tounsubscribe
can be queued after ControlPlaneClientManager#close
is called.
As there is no way to immediately know the state of the manager, I preferred to add a conditional statement instead of an assertion.
Let me know if you still prefer that an assertion is done though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't think of the case. Let's keep it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice. Left only minor comments.
notifyOnError(type, error); | ||
} | ||
|
||
void notifyOnError(XdsType type, Status error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question) Should we make this method private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method is intended to be called from subclasses. Let me know if I misunderstood your intention
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me go ahead and merge this PR. If you have additional comments, let me follow up in the next iteration.
Motivation:
This changeset can be thought of as a refactoring for the next PR #6096.
Functionally, there should be no changes.
Cluster caching
Currently, a ClusterManager is created every time a new
XdsEndpointGroup
is created.While this was easy to implement at the time, this poses a problem where each
XdsEndpointGroup
creates a backingEndpointGroup
such asHealthCheckedEndpointGroup
orDnsAddressEndpointGroup
. Moreover, even having multiple clusters contained in different routes would also consume resources. (e.g.route1
withcluster1
androute2
withcluster1
. Althoughcluster1
is the same, they are contained in a differentRouteSnapshot
and thus handled twice)A cluster is expected to be uniquely identified by its name as specified in the documentation, and this is reflected in the upstream implementation as well. ref
I propose that for each
XdsBootstrap
, clusters and the resources attached to them are cached by name. The centralized location for managing these clusters is namedXdsClusterManager
after ClusterManager.In the future,
XdsClusterManager
will take over the responsibility of the currentClusterManager
and also manage resources (health checking, dns, etc..) associated with clusters. While caching resources isn't done in this PR,the following POC PR can be checked for reference on how this will be done.
XdsClusterManager
initializationThe changes in this PR also try to align closer to the upstream implementation.
There is an inherent dependency between adding clusters and creating a bootstrap. Bootstrap clusters should be created on initialization; yet bootstrap clusters may require connection to a control plane (which may require other bootstraps). In order to resolve this, a couple of refactoring efforts have taken place.
The initialization process of
XdsClusterManager
has been modified to closely follows that of upstream. This guarantees that bootstrapped clusters will initialize correctly irregardless of the declaration order and matches upstream behavior. (ref). Essentially, start up is done in two phases 1) non-eds cluster initialization 2) eds cluster initialization. This is done since it is possible that eds clusters require a bootstrap to connect to a control plane (relying on non-eds clusters).Another implication of the change above is that a bootstrap is not created when bootstrap clusters are initialized. In order to resolve this circular dependency, the logic where remote subscription (to the control plane) takes place has been abstracted to a
ControlPlaneClientManager
. Now that subscription doesn't require a bootstrap and requires aControlPlaneClientManager
, a newSubscriptionContext
has been introduced to allow for easier subscription. Since non-eds bootstrap clusters don't require connection to a control plane, a separateStaticSubscriptionContext
is used.Modifications:
Cluster
caching related changesCluster
s are now cached.ResourceNode
s now support multiple snapshot watchers. This change is made so thatXdsClusterManager
can determine if a cluster no longer has to be cached depending on the watchers.XdsClusterManager
initialization related changesControlPlaneClientManager
is introduced which managesConfigSourceClient
sSubscriptionContext
is introduced to remove the circular dependency betweenResourceNode
s and bootstrap.BootstrapClusters
is now initialized in two phasesClusterRoot
orListenerRoot
can cause logic in theClusterManager
to be executed from a non-eventloop. Modified the logic so that user threads cannot modifyClusterManager
or the internal bootstrap state.ClusterXdsResource#upstreamTlsContext
is now created once and cachedResult:
XdsPreprocessor
#6096