Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICMP Rule with NodeLatencyMonitor #7011

Merged
merged 8 commits into from
Apr 8, 2025
Merged

Conversation

Dhruv-J
Copy link
Contributor

@Dhruv-J Dhruv-J commented Feb 20, 2025

This PR provides the solution to an edge case with NodeLatencyMonitor, where
the feature does not work if ICMP queries are blocked by default. To fix this,
an iptable rule will be added if NodeLatencyMonitor is enabled, such that ICMP
requests via the Antrea gateway will be allowed.

Fixes #6952

Signed-off-by: Dhruv-J [email protected]

@antoninbas
Copy link
Contributor

@Dhruv-J you can rebase your branch and follow the same pattern as in #7030 (which has been merged)

@Dhruv-J Dhruv-J force-pushed the nlm-icmp branch 4 times, most recently from d690aed to 52b466b Compare March 13, 2025 22:31
This PR provides the solution to an edge case with NodeLatencyMonitor, where
the feature does not work if ICMP queries are blocked by default. To fix this,
an iptable rule will be added if NodeLatencyMonitor is enabled, such that ICMP
requests via the Antrea gateway will be allowed.

Fixes issue antrea-io#6952

Signed-off-by: Dhruv Jain <[email protected]>
@Dhruv-J Dhruv-J marked this pull request as ready for review March 14, 2025 07:41
@Dhruv-J Dhruv-J requested review from wenyingd and antoninbas March 14, 2025 07:43
@Dhruv-J
Copy link
Contributor Author

Dhruv-J commented Mar 14, 2025

still updating pipelines to pass, but will be ready for review right after

…r, fixed route_windows.go

Signed-off-by: Dhruv Jain <[email protected]>
@@ -1245,6 +1258,38 @@ func (c *Client) initWireguard() {
}
}

func (c *Client) initNodeLatency() {
gateway := "antrea-gw0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is configurable in theory so you should not hardcode it

the gateway name should be available in the nodeConfig field, please take a look

@@ -1245,6 +1258,38 @@ func (c *Client) initWireguard() {
}
}

func (c *Client) initNodeLatency() {
gateway := "antrea-gw0"
if c.networkConfig.TrafficEncapMode.String() == "networkPolicyOnly" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the IsNetworkPolicyOnly() method on c.networkConfig.TrafficEncapMode

I think there are examples in this file

func (c *Client) initNodeLatency() {
gateway := "antrea-gw0"
if c.networkConfig.TrafficEncapMode.String() == "networkPolicyOnly" {
gateway = "transport"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not correct, there is no interface names transport. But there is a TransportIface field in the networkConfig.

antreaOutputChainRules := []string{
iptables.NewRuleBuilder(antreaOutputChain).
MatchOutputInterface(gateway).
SetComment("Antrea: allow egress packets from NodeLatencyMonitor").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inconsistent with the input rule comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjusting both comments to specify ICMP and which type (ingress or egress)

Comment on lines 1266 to 1274
antreaInputChainRules := []string{
iptables.NewRuleBuilder(antreaInputChain).
MatchInputInterface(gateway).
SetComment("Antrea: allow ICMP packets from NodeLatencyMonitor").
SetTarget(iptables.AcceptTarget).
Done().
GetRule(),
}
antreaOutputChainRules := []string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no match on "icmp" packets, this rule is way too generic as it allows all traffic

@@ -737,10 +748,12 @@ func (c *Client) syncIPTables() error {
// for performance reasons.
addFilterRulesToChain(iptablesFilterRulesByChainV4, &c.wireguardIPTablesIPv4)
addFilterRulesToChain(iptablesFilterRulesByChainV4, &c.nodeNetworkPolicyIPTablesIPv4)
addFilterRulesToChain(iptablesFilterRulesByChainV4, &c.nodeLatencyMonitorIPTablesIPv4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check the comment above:

Install the static rules (WireGuard for now) before the dynamic rules (e.g., NodeNetworkPolicy) for performance reasons

this applies to the NLM rules as well, which are also static

Signed-off-by: Dhruv Jain <[email protected]>
if c.networkConfig.TrafficEncapMode.IsNetworkPolicyOnly() {
gateway = c.networkConfig.TransportIface
}
icmpZero := int32(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use https://pkg.go.dev/golang.org/x/net/ipv4#ICMPType since I notice that we already import / use this package in pkg/agent/monitortool. Same for IPV6.

antreaInputChainRulesIPv6 := []string{
iptables.NewRuleBuilder(antreaInputChain).
MatchInputInterface(gateway).
MatchICMP(&icmpZero, &icmpZero, iptables.ProtocolIPv6).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not the same values for IPv6 and IPv4. See https://pkg.go.dev/golang.org/x/net/ipv6

antreaInputChainRulesIPv6 := []string{
iptables.NewRuleBuilder(antreaInputChain).
MatchInputInterface(gateway).
MatchICMP(&icmpZero, &icmpZero, iptables.ProtocolIPv6).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use nil for the code

},
},
{
name: "encap,egress=true,multicastEnabled=true,proxyAll=true,nodeNetworkPolicy=false,nodeLatencyMonitor=true,nodeSNATRandomFully=false",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not add this extra test IMO, the test update above is enough

@@ -28,6 +28,7 @@ import (
"antrea.io/antrea/pkg/features"
)

// go test -timeout=10m -v antrea.io/antrea/test/e2e -run TestNodeLatencyMonitor -provider=kind -deploy-antrea=false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't add this generally, so I'd suggest removing it

Comment on lines 1288 to 1290
if icmpType == 128 {
comment = "Antrea: allow ICMP echo request egress packets from NodeLatencyMonitor"
} else if icmpType == 8 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to distinguish between the 2, they are both for ICMP echo requests (one for IPv4 and one for IPv6)

personally i think we can use the same comment everywhere, for both request and reply, and avoid the complexity (e.g. Antrea: allow ICMP probes for NodeLatencyMonitor).

Comment on lines 1305 to 1310
inputRequestRule := buildInputRule(iptables.ProtocolIPv6, int32(ipv6.ICMPTypeEchoRequest))
outputRequestRule := buildOutputRule(iptables.ProtocolIPv6, int32(ipv6.ICMPTypeEchoRequest))
inputReplyRule := buildInputRule(iptables.ProtocolIPv6, int32(ipv6.ICMPTypeEchoReply))
outputReplyRule := buildOutputRule(iptables.ProtocolIPv6, int32(ipv6.ICMPTypeEchoReply))
c.nodeLatencyMonitorIPTablesIPv6.Store(antreaInputChain, []string{inputRequestRule, inputReplyRule})
c.nodeLatencyMonitorIPTablesIPv6.Store(antreaOutputChain, []string{outputRequestRule, outputReplyRule})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid fining these extra variables, for readability, I think they are not necessary:

c.nodeLatencyMonitorIPTablesIPv6.Store(antreaInputChain, []string{
        buildInputRule(iptables.ProtocolIPv6, int32(ipv6.ICMPTypeEchoRequest)),
        buildInputRule(iptables.ProtocolIPv6, int32(ipv6.ICMPTypeEchoReply)),
})

Signed-off-by: Dhruv Jain <[email protected]>
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments, otherwise lgtm

@@ -154,6 +154,7 @@ func run(o *Options) error {
enableBridgingMode := enableAntreaIPAM && o.config.EnableBridgingMode
l7NetworkPolicyEnabled := features.DefaultFeatureGate.Enabled(features.L7NetworkPolicy)
nodeNetworkPolicyEnabled := features.DefaultFeatureGate.Enabled(features.NodeNetworkPolicy)
nodeLatencyMonitorEnabled := o.config.FeatureGates[string(features.NodeLatencyMonitor)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this different from the other ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using the default featuregate value, I wanted to use the actual value of the config. I noticed that even if I set NodeLatencyMonitor to true in the deployment, it wouldn't set the boolean correctly. Is this expected behavior until I deploy a NodeLatencyMonitor?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using the default featuregate value, I wanted to use the actual value of the config

This is what all the surrounding lines are doing, otherwise the code would be badly broken.

I set NodeLatencyMonitor to true in the deployment, it wouldn't set the boolean correctly. Is this expected behavior until I deploy a NodeLatencyMonitor?

No, that's not the expected behavior but I would be surprised if this was the actual behavior as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will update, it seemed to just use a constant's value, so I chose to use the config value isntead.

@@ -735,10 +748,12 @@ func (c *Client) syncIPTables() error {
iptablesFilterRulesByChainV4 := make(map[string][]string)
// Install the static rules (WireGuard for now) before the dynamic rules (e.g., NodeNetworkPolicy)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update the comment:
s/WireGuard for now/WireGuard + NodeLatencyMonitor

@@ -1245,6 +1260,53 @@ func (c *Client) initWireguard() {
}
}

func (c *Client) initNodeLatency() {
gateway := c.nodeConfig.GatewayConfig.Name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename the gateway variable to iface as the "gateway" name is not appropriate for policyOnly mode

maybe also add a comment for future readability:

the interface on which ICMP probes are sent / received is the Antrea gateway interface, except in networkPolicyOnly mode, for which it is the Node's transport interface.

@antoninbas antoninbas requested a review from tnqn March 21, 2025 20:09
Signed-off-by: Dhruv Jain <[email protected]>
@@ -154,6 +154,7 @@ func run(o *Options) error {
enableBridgingMode := enableAntreaIPAM && o.config.EnableBridgingMode
l7NetworkPolicyEnabled := features.DefaultFeatureGate.Enabled(features.L7NetworkPolicy)
nodeNetworkPolicyEnabled := features.DefaultFeatureGate.Enabled(features.NodeNetworkPolicy)
nodeLatencyMonitorEnabled := features.DefaultFeatureGate.Enabled(features.NodeLatencyMonitor) //o.config.FeatureGates[string(features.NodeLatencyMonitor)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the comment needed?

@@ -278,6 +287,10 @@ func (c *Client) Initialize(nodeConfig *config.NodeConfig, done func()) error {
if c.networkConfig.TrafficEncryptionMode == config.TrafficEncryptionModeWireGuard {
c.initWireguard()
}
if c.nodeLatencyMonitorEnabled {
c.initNodeLatency()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does initNodeLatencyMonitor sound more clear what it's for?

Copy link
Contributor Author

@Dhruv-J Dhruv-J Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured since it wasn't initializing the monitor itself, and just the rules, could initNodeLatencyRules() be a better descriptor?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initNodeLatencyRules sounds good to me

@@ -278,6 +287,10 @@ func (c *Client) Initialize(nodeConfig *config.NodeConfig, done func()) error {
if c.networkConfig.TrafficEncryptionMode == config.TrafficEncryptionModeWireGuard {
c.initWireguard()
}
if c.nodeLatencyMonitorEnabled {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it only add the rule when latencyConfig.Enable in pkg/agent/monitortool/monitor.go is true?
If we only check the feature gate, the 4 rules would be installed unconditionally on all clusters even the API is not used, which might add unnecessary overhead to Node-to-Node traffic. This might already be the case of nodeNetworkPolicyEnabled, but with more feature-specific rule added, I'm a bit concerned about the total overhead.

Or we could defer this and add one item to these features' beta graduation criteria: the feature-specific rules should only be installed when the feature is in-use.

what do you think? @antoninbas

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with taking care of this as a follow-up item, and prior to graduating the feature. Maybe @Dhruv-J could open a separate issue to keep track of this.

@tnqn
Copy link
Member

tnqn commented Mar 31, 2025

Please change "Fixes issue #.." to "Fixes #..", otherwise merging the PR wouldn't close the issue.

Signed-off-by: Dhruv Jain <[email protected]>
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn tnqn added the action/release-note Indicates a PR that should be included in release notes. label Apr 7, 2025
@tnqn
Copy link
Member

tnqn commented Apr 7, 2025

/test-all

1 similar comment
@Dhruv-J
Copy link
Contributor Author

Dhruv-J commented Apr 8, 2025

/test-all

@antoninbas antoninbas merged commit 05d0cf3 into antrea-io:main Apr 8, 2025
58 of 63 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an iptable rule to allow icmp for antrea-gw0 for NodeLatencyMonitor feature
3 participants