-
Notifications
You must be signed in to change notification settings - Fork 391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ICMP Rule with NodeLatencyMonitor #7011
Conversation
d690aed
to
52b466b
Compare
This PR provides the solution to an edge case with NodeLatencyMonitor, where the feature does not work if ICMP queries are blocked by default. To fix this, an iptable rule will be added if NodeLatencyMonitor is enabled, such that ICMP requests via the Antrea gateway will be allowed. Fixes issue antrea-io#6952 Signed-off-by: Dhruv Jain <[email protected]>
still updating pipelines to pass, but will be ready for review right after |
…r, fixed route_windows.go Signed-off-by: Dhruv Jain <[email protected]>
pkg/agent/route/route_linux.go
Outdated
@@ -1245,6 +1258,38 @@ func (c *Client) initWireguard() { | |||
} | |||
} | |||
|
|||
func (c *Client) initNodeLatency() { | |||
gateway := "antrea-gw0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is configurable in theory so you should not hardcode it
the gateway name should be available in the nodeConfig
field, please take a look
pkg/agent/route/route_linux.go
Outdated
@@ -1245,6 +1258,38 @@ func (c *Client) initWireguard() { | |||
} | |||
} | |||
|
|||
func (c *Client) initNodeLatency() { | |||
gateway := "antrea-gw0" | |||
if c.networkConfig.TrafficEncapMode.String() == "networkPolicyOnly" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the IsNetworkPolicyOnly()
method on c.networkConfig.TrafficEncapMode
I think there are examples in this file
pkg/agent/route/route_linux.go
Outdated
func (c *Client) initNodeLatency() { | ||
gateway := "antrea-gw0" | ||
if c.networkConfig.TrafficEncapMode.String() == "networkPolicyOnly" { | ||
gateway = "transport" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not correct, there is no interface names transport
. But there is a TransportIface
field in the networkConfig
.
pkg/agent/route/route_linux.go
Outdated
antreaOutputChainRules := []string{ | ||
iptables.NewRuleBuilder(antreaOutputChain). | ||
MatchOutputInterface(gateway). | ||
SetComment("Antrea: allow egress packets from NodeLatencyMonitor"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inconsistent with the input rule comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adjusting both comments to specify ICMP and which type (ingress or egress)
pkg/agent/route/route_linux.go
Outdated
antreaInputChainRules := []string{ | ||
iptables.NewRuleBuilder(antreaInputChain). | ||
MatchInputInterface(gateway). | ||
SetComment("Antrea: allow ICMP packets from NodeLatencyMonitor"). | ||
SetTarget(iptables.AcceptTarget). | ||
Done(). | ||
GetRule(), | ||
} | ||
antreaOutputChainRules := []string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no match on "icmp" packets, this rule is way too generic as it allows all traffic
pkg/agent/route/route_linux.go
Outdated
@@ -737,10 +748,12 @@ func (c *Client) syncIPTables() error { | |||
// for performance reasons. | |||
addFilterRulesToChain(iptablesFilterRulesByChainV4, &c.wireguardIPTablesIPv4) | |||
addFilterRulesToChain(iptablesFilterRulesByChainV4, &c.nodeNetworkPolicyIPTablesIPv4) | |||
addFilterRulesToChain(iptablesFilterRulesByChainV4, &c.nodeLatencyMonitorIPTablesIPv4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check the comment above:
Install the static rules (WireGuard for now) before the dynamic rules (e.g., NodeNetworkPolicy) for performance reasons
this applies to the NLM rules as well, which are also static
Signed-off-by: Dhruv Jain <[email protected]>
pkg/agent/route/route_linux.go
Outdated
if c.networkConfig.TrafficEncapMode.IsNetworkPolicyOnly() { | ||
gateway = c.networkConfig.TransportIface | ||
} | ||
icmpZero := int32(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use https://pkg.go.dev/golang.org/x/net/ipv4#ICMPType since I notice that we already import / use this package in pkg/agent/monitortool
. Same for IPV6.
pkg/agent/route/route_linux.go
Outdated
antreaInputChainRulesIPv6 := []string{ | ||
iptables.NewRuleBuilder(antreaInputChain). | ||
MatchInputInterface(gateway). | ||
MatchICMP(&icmpZero, &icmpZero, iptables.ProtocolIPv6). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not the same values for IPv6 and IPv4. See https://pkg.go.dev/golang.org/x/net/ipv6
pkg/agent/route/route_linux.go
Outdated
antreaInputChainRulesIPv6 := []string{ | ||
iptables.NewRuleBuilder(antreaInputChain). | ||
MatchInputInterface(gateway). | ||
MatchICMP(&icmpZero, &icmpZero, iptables.ProtocolIPv6). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use nil
for the code
pkg/agent/route/route_linux_test.go
Outdated
}, | ||
}, | ||
{ | ||
name: "encap,egress=true,multicastEnabled=true,proxyAll=true,nodeNetworkPolicy=false,nodeLatencyMonitor=true,nodeSNATRandomFully=false", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's not add this extra test IMO, the test update above is enough
test/e2e/nodelatencymonitor_test.go
Outdated
@@ -28,6 +28,7 @@ import ( | |||
"antrea.io/antrea/pkg/features" | |||
) | |||
|
|||
// go test -timeout=10m -v antrea.io/antrea/test/e2e -run TestNodeLatencyMonitor -provider=kind -deploy-antrea=false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't add this generally, so I'd suggest removing it
Signed-off-by: Dhruv Jain <[email protected]>
Signed-off-by: Dhruv Jain <[email protected]>
pkg/agent/route/route_linux.go
Outdated
if icmpType == 128 { | ||
comment = "Antrea: allow ICMP echo request egress packets from NodeLatencyMonitor" | ||
} else if icmpType == 8 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to distinguish between the 2, they are both for ICMP echo requests (one for IPv4 and one for IPv6)
personally i think we can use the same comment everywhere, for both request and reply, and avoid the complexity (e.g. Antrea: allow ICMP probes for NodeLatencyMonitor
).
pkg/agent/route/route_linux.go
Outdated
inputRequestRule := buildInputRule(iptables.ProtocolIPv6, int32(ipv6.ICMPTypeEchoRequest)) | ||
outputRequestRule := buildOutputRule(iptables.ProtocolIPv6, int32(ipv6.ICMPTypeEchoRequest)) | ||
inputReplyRule := buildInputRule(iptables.ProtocolIPv6, int32(ipv6.ICMPTypeEchoReply)) | ||
outputReplyRule := buildOutputRule(iptables.ProtocolIPv6, int32(ipv6.ICMPTypeEchoReply)) | ||
c.nodeLatencyMonitorIPTablesIPv6.Store(antreaInputChain, []string{inputRequestRule, inputReplyRule}) | ||
c.nodeLatencyMonitorIPTablesIPv6.Store(antreaOutputChain, []string{outputRequestRule, outputReplyRule}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would avoid fining these extra variables, for readability, I think they are not necessary:
c.nodeLatencyMonitorIPTablesIPv6.Store(antreaInputChain, []string{
buildInputRule(iptables.ProtocolIPv6, int32(ipv6.ICMPTypeEchoRequest)),
buildInputRule(iptables.ProtocolIPv6, int32(ipv6.ICMPTypeEchoReply)),
})
Signed-off-by: Dhruv Jain <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments, otherwise lgtm
cmd/antrea-agent/agent.go
Outdated
@@ -154,6 +154,7 @@ func run(o *Options) error { | |||
enableBridgingMode := enableAntreaIPAM && o.config.EnableBridgingMode | |||
l7NetworkPolicyEnabled := features.DefaultFeatureGate.Enabled(features.L7NetworkPolicy) | |||
nodeNetworkPolicyEnabled := features.DefaultFeatureGate.Enabled(features.NodeNetworkPolicy) | |||
nodeLatencyMonitorEnabled := o.config.FeatureGates[string(features.NodeLatencyMonitor)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this different from the other ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using the default featuregate value, I wanted to use the actual value of the config. I noticed that even if I set NodeLatencyMonitor to true in the deployment, it wouldn't set the boolean correctly. Is this expected behavior until I deploy a NodeLatencyMonitor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using the default featuregate value, I wanted to use the actual value of the config
This is what all the surrounding lines are doing, otherwise the code would be badly broken.
I set NodeLatencyMonitor to true in the deployment, it wouldn't set the boolean correctly. Is this expected behavior until I deploy a NodeLatencyMonitor?
No, that's not the expected behavior but I would be surprised if this was the actual behavior as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will update, it seemed to just use a constant's value, so I chose to use the config value isntead.
pkg/agent/route/route_linux.go
Outdated
@@ -735,10 +748,12 @@ func (c *Client) syncIPTables() error { | |||
iptablesFilterRulesByChainV4 := make(map[string][]string) | |||
// Install the static rules (WireGuard for now) before the dynamic rules (e.g., NodeNetworkPolicy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update the comment:
s/WireGuard for now/WireGuard + NodeLatencyMonitor
pkg/agent/route/route_linux.go
Outdated
@@ -1245,6 +1260,53 @@ func (c *Client) initWireguard() { | |||
} | |||
} | |||
|
|||
func (c *Client) initNodeLatency() { | |||
gateway := c.nodeConfig.GatewayConfig.Name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename the gateway
variable to iface
as the "gateway" name is not appropriate for policyOnly mode
maybe also add a comment for future readability:
the interface on which ICMP probes are sent / received is the Antrea gateway interface, except in networkPolicyOnly mode, for which it is the Node's transport interface.
Signed-off-by: Dhruv Jain <[email protected]>
cmd/antrea-agent/agent.go
Outdated
@@ -154,6 +154,7 @@ func run(o *Options) error { | |||
enableBridgingMode := enableAntreaIPAM && o.config.EnableBridgingMode | |||
l7NetworkPolicyEnabled := features.DefaultFeatureGate.Enabled(features.L7NetworkPolicy) | |||
nodeNetworkPolicyEnabled := features.DefaultFeatureGate.Enabled(features.NodeNetworkPolicy) | |||
nodeLatencyMonitorEnabled := features.DefaultFeatureGate.Enabled(features.NodeLatencyMonitor) //o.config.FeatureGates[string(features.NodeLatencyMonitor)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the comment needed?
pkg/agent/route/route_linux.go
Outdated
@@ -278,6 +287,10 @@ func (c *Client) Initialize(nodeConfig *config.NodeConfig, done func()) error { | |||
if c.networkConfig.TrafficEncryptionMode == config.TrafficEncryptionModeWireGuard { | |||
c.initWireguard() | |||
} | |||
if c.nodeLatencyMonitorEnabled { | |||
c.initNodeLatency() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does initNodeLatencyMonitor
sound more clear what it's for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured since it wasn't initializing the monitor itself, and just the rules, could initNodeLatencyRules()
be a better descriptor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initNodeLatencyRules
sounds good to me
@@ -278,6 +287,10 @@ func (c *Client) Initialize(nodeConfig *config.NodeConfig, done func()) error { | |||
if c.networkConfig.TrafficEncryptionMode == config.TrafficEncryptionModeWireGuard { | |||
c.initWireguard() | |||
} | |||
if c.nodeLatencyMonitorEnabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it only add the rule when latencyConfig.Enable
in pkg/agent/monitortool/monitor.go is true
?
If we only check the feature gate, the 4 rules would be installed unconditionally on all clusters even the API is not used, which might add unnecessary overhead to Node-to-Node traffic. This might already be the case of nodeNetworkPolicyEnabled, but with more feature-specific rule added, I'm a bit concerned about the total overhead.
Or we could defer this and add one item to these features' beta graduation criteria: the feature-specific rules should only be installed when the feature is in-use.
what do you think? @antoninbas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with taking care of this as a follow-up item, and prior to graduating the feature. Maybe @Dhruv-J could open a separate issue to keep track of this.
Please change "Fixes issue #.." to "Fixes #..", otherwise merging the PR wouldn't close the issue. |
Signed-off-by: Dhruv Jain <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/test-all |
1 similar comment
/test-all |
This PR provides the solution to an edge case with NodeLatencyMonitor, where
the feature does not work if ICMP queries are blocked by default. To fix this,
an iptable rule will be added if NodeLatencyMonitor is enabled, such that ICMP
requests via the Antrea gateway will be allowed.
Fixes #6952
Signed-off-by: Dhruv-J [email protected]