-
Notifications
You must be signed in to change notification settings - Fork 438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Allow more permissive extensibility for securityRules and additional CP LoadBalancers #5525
base: main
Are you sure you want to change the base?
✨ Allow more permissive extensibility for securityRules and additional CP LoadBalancers #5525
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5525 +/- ##
==========================================
+ Coverage 52.61% 52.91% +0.29%
==========================================
Files 272 272
Lines 29485 29515 +30
==========================================
+ Hits 15513 15617 +104
+ Misses 13165 13086 -79
- Partials 807 812 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
bec7326
to
fbe11d7
Compare
api/v1beta1/types.go
Outdated
@@ -893,6 +893,20 @@ func (s SubnetSpec) IsIPv6Enabled() bool { | |||
return false | |||
} | |||
|
|||
// GetSecurityRuleByDestination returns security group rule, which matches provided destination ports. | |||
func (s SubnetSpec) GetSecurityRuleByDestination(ports string) *SecurityRule { | |||
if s.SecurityGroup.SecurityRules == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrapping the below for loop in if s.SecurityGroup.SecurityRules != nil {
is equivalent and allows us to remove the add'l return nil
response (we don't need a specific nil
return for this condition because the default return before exiting the function is nil
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fixed, thanks.
@@ -1020,9 +1020,18 @@ func (s *ClusterScope) SetControlPlaneSecurityRules() { | |||
if !s.ControlPlaneEnabled() { | |||
return | |||
} | |||
if s.ControlPlaneSubnet().SecurityGroup.SecurityRules == nil { | |||
|
|||
subnet := s.ControlPlaneSubnet() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this flow where are we adding any user-provided security rules? (for example if a user specifies TCP 9345
)
or is that elsewhere and the purpose of this change is to filter out 22
and apiserver port if it's not included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flow performs only defaulting on top of the user-provided set of rules, which may not be empty. This way having a field populated does not always need to specify all allowed ports, only additional ones or overrides.
I’m currently validating how this works in tandem with ClusterClass definitions we have. The desired goal is to permit specifying registrationMethod: control-plane-endpoint
and allow RKE2 to register on the allowed port. But ideally this should follow internal name resolution, so that the traffic wouldn’t go through external load balancer.
If it doesn’t work, maybe it would still require additional LB rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems reasonable to have the additional ports of the APIServer LB opened-up/have an opinion about.
Could you please elaborate on why the security rules should not be updated too ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, maybe I’m not fully understanding. I will try to give my thoughts on this:
I was attempting to describe the flow of how the rules are processed. We only apply defaults on top of what the user provides, so the user can still specify ports like TCP 9345 explicitly in security rules. We’re not overriding or stripping anything out, just adding fallback values when needed. LB rules were needed as well, since CP endpoint is external from the cluster perspective.
Current set of changes is what was required to make cluster healthy, and added e2e test displays that. My previous comment was a set of assumptions, or “paths to explore later” if it doesn’t work.
Signed-off-by: Danil-Grigorev <[email protected]>
452799e
to
df2c355
Compare
Signed-off-by: Danil-Grigorev <[email protected]>
df2c355
to
cf4c84b
Compare
/test ? |
@Danil-Grigorev: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
d744062
to
0d5e30f
Compare
9e8e8e1
to
3592388
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
14e9903
to
85603df
Compare
/test pull-cluster-api-provider-azure-e2e-optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Danil-Grigorev 👋🏼 ! Thank you for creating this PR!
Below is an initial round of review. Will iterate again.
api/v1beta1/types.go
Outdated
@@ -893,6 +906,17 @@ func (s SubnetSpec) IsIPv6Enabled() bool { | |||
return false | |||
} | |||
|
|||
// GetSecurityRuleByDestination returns security group rule, which matches provided destination ports. | |||
func (s SubnetSpec) GetSecurityRuleByDestination(ports string) *SecurityRule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
port
seems to be holding a single port here.
func (s SubnetSpec) GetSecurityRuleByDestination(ports string) *SecurityRule { | |
func (s SubnetSpec) GetSecurityRuleByDestination(port string) *SecurityRule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was originally like so, but users can supply a list o ports with this, like 22,443
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the current implementation does not seem to be iterating over a comma-separated list of ports ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doublechecked it, maybe my initial assumption was wrong. It can be a number or an *
there, so changed it to port as suggested.
@@ -893,6 +906,17 @@ func (s SubnetSpec) IsIPv6Enabled() bool { | |||
return false | |||
} | |||
|
|||
// GetSecurityRuleByDestination returns security group rule, which matches provided destination ports. | |||
func (s SubnetSpec) GetSecurityRuleByDestination(ports string) *SecurityRule { | |||
for _, rule := range s.SecurityGroup.SecurityRules { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably for the long term, we should change s.SecurityGroup.SecurityRules
to a dictionary type for faster processing.
@@ -141,22 +146,22 @@ spec: | |||
name: my-subnet-cp-nsg | |||
securityRules: | |||
- name: "allow_ssh" | |||
description: "allow SSH" | |||
description: "Deny SSH" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we updating this example to a Deny
SSH ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to showcase that even though port 22 is defaulted, the rule for this can be overridden to deny it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would benefit if we show an overriding scenario the example you added that adds 9345
to the APIServer LB.
Updating the rule to Deny could throw off users in the first glance.
computeGallery: | ||
gallery: replace-me | ||
name: replace-me | ||
version: replace-me |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this left intentionally with replace-me
value ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was replaced by CC patch, but it turns out that having this field set by default causes errors on double template apply, which might have affected e2e execution. Replaced with full template patch now.
templates/flavors/clusterclass-rke2/azure-machine-template-controlplane.yaml
Outdated
Show resolved
Hide resolved
07e0e7b
to
f5702e8
Compare
The tests are still failing due to this error:
even though the template in artifacts has the value and CRD changes are a part of this PR. Is there something missing, please advise? (Edit: my mistake - wrong variable name) |
6fdfb5f
to
b66e8fe
Compare
/test pull-cluster-api-provider-azure-e2e-optional |
/retest |
1 similar comment
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great PR, a lot of work has been put in here. Thank you @Danil-Grigorev !
I am glad that the optional RKE2 test is working!
If I could request, could you please add unit test cases to cluster_test.go
and loadbalancers/spec_test.go
to support your changes.
I also added some comments below, and would love to hear your thoughts!
api/v1beta1/types.go
Outdated
@@ -111,9 +111,22 @@ type NetworkSpec struct { | |||
// +optional | |||
ControlPlaneOutboundLB *LoadBalancerSpec `json:"controlPlaneOutboundLB,omitempty"` | |||
|
|||
// AdditionalControlPlaneLBPorts is the configuration for the additional inbound control-plane load balancer ports | |||
// +optional | |||
AdditionalControlPlaneLBPorts []LoadBalancerPort `json:"additionalControlPlaneLBPorts,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of renaming AdditionalControlPlaneLBPorts
to AdditionalAPIServerLBPorts
?
CAPZ has APIServerLB
, NodeOutboundLB
and ControlplaneOutboundLB
. Since the additional ports will be used in APIServerLB
, it is better, I believe, to have the additional ports renamed to AdditionalAPIServerLBPorts
api/v1beta1/types.go
Outdated
@@ -893,6 +906,17 @@ func (s SubnetSpec) IsIPv6Enabled() bool { | |||
return false | |||
} | |||
|
|||
// GetSecurityRuleByDestination returns security group rule, which matches provided destination ports. | |||
func (s SubnetSpec) GetSecurityRuleByDestination(ports string) *SecurityRule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the current implementation does not seem to be iterating over a comma-separated list of ports ?
api/v1beta1/types_template.go
Outdated
|
||
// AdditionalControlPlaneLBPorts is the configuration for the additional inbound control-plane load balancer ports | ||
// +optional | ||
AdditionalControlPlaneLBPorts []LoadBalancerPort `json:"additionalControlPlaneLBPorts,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
@@ -1020,9 +1020,18 @@ func (s *ClusterScope) SetControlPlaneSecurityRules() { | |||
if !s.ControlPlaneEnabled() { | |||
return | |||
} | |||
if s.ControlPlaneSubnet().SecurityGroup.SecurityRules == nil { | |||
|
|||
subnet := s.ControlPlaneSubnet() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems reasonable to have the additional ports of the APIServer LB opened-up/have an opinion about.
Could you please elaborate on why the security rules should not be updated too ?
azure/scope/cluster.go
Outdated
if subnet.SecurityGroup.SecurityRules == nil { | ||
s.AzureCluster.Spec.NetworkSpec.UpdateControlPlaneSubnet(subnet) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add more context on setting s.AzureCluster.Spec.NetworkSpec.UpdateControlPlaneSubnet(subnet)
three times in this function ?
Could we not update it once in the end of the function ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can, but by equivalent aggregation of ifs evaluating to true. Since the method is only setting struct default values, I assumed the cost is negligible, for the improved readability and confidence of data being persisted.
test/e2e/common.go
Outdated
} | ||
count := 0 | ||
for _, machine := range machineList.Items { | ||
if machine.Status.NodeRef != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have a more granular check here instead of checking for not nil ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated it to Ready
condition, which:
is true if the Machine's deletionTimestamp is not set, Machine's BootstrapConfigReady, InfrastructureReady,
// NodeHealthy and HealthCheckSucceeded
But the main goal of the check is to verify that CP nodes joined cluster in the expected quantity. Previous nil check fulfilled that too…
@@ -141,22 +146,22 @@ spec: | |||
name: my-subnet-cp-nsg | |||
securityRules: | |||
- name: "allow_ssh" | |||
description: "allow SSH" | |||
description: "Deny SSH" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would benefit if we show an overriding scenario the example you added that adds 9345
to the APIServer LB.
Updating the rule to Deny could throw off users in the first glance.
3c7b16f
to
0091001
Compare
Signed-off-by: Danil-Grigorev <[email protected]>
0091001
to
34ffcbb
Compare
Signed-off-by: Danil-Grigorev <[email protected]>
34ffcbb
to
b408363
Compare
/retest |
/test pull-cluster-api-provider-azure-e2e |
Signed-off-by: Danil-Grigorev <[email protected]>
What type of PR is this?
/kind feature
What this PR does / why we need it:
Implement
AdditionalControlPlaneLBPorts
field for the additional inbound control-plane load balancer port configuration, allowing external access for user specified ports on the CP machine.This change also improves UX while using
CAPZ
withRKE2
provider, by allowing to specify only additional security rules on theAzureCluster
resource, and making existing security rule defaults forAPIServer
andSSH
port more permissive and based on destination port selection.Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #5511
Special notes for your reviewer:
TODOs:
Release note: