Skip to content

[receiver/kubeletstats] Collect network metrics from all interfaces #34287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

ChrsMark
Copy link
Member

@ChrsMark ChrsMark commented Jul 29, 2024

Description:

Revives #30626 by picking its changes (I have included the original author in the commit as suggested at #30626 (comment)).

Link to tracking Issue: Fixes #30196 #33993

Testing: Updated

Documentation: ~

Signed-off-by: ChrsMark [email protected]
Co-authored-by: jinja2 [email protected]

@povilasv
Copy link
Contributor

povilasv commented Aug 5, 2024

Question if this shouldn't be marked as breaking change?

If you relied previously on this metric in some dashboard / query then I think it will break, since it will now return time series per interface?

@ChrsMark
Copy link
Member Author

ChrsMark commented Aug 5, 2024

Question if this shouldn't be marked as breaking change?

If you relied previously on this metric in some dashboard / query then I think it will break, since it will now return time series per interface?

Hmm 🤔 , I think it shouldn't be an issue since the interface attribute is not sth new. The receiver was just populating metrics for only one interface. The schema is the same. So that shouldn't be a breaking change since the dimensions are the same, or do I miss anything else here?

Comment on lines 20 to 50
for i := range s.Interfaces {
recordNetworkDataPoint(mb, networkMetrics.IO, &s.Interfaces[i], getNetworkIO, currentTime)
recordNetworkDataPoint(mb, networkMetrics.Errors, &s.Interfaces[i], getNetworkErrors, currentTime)
}
Copy link
Member

@TylerHelmuth TylerHelmuth Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If s.Interfaces contains more than just eth0, then this will cause the receiver to produce more datapoints. While the interface attribute was always present, it has historically only ever had one value. Now it will have more, so the cardinality of the metric effectively increases as does the amount of datapoints it produces. This increase in data may be unexpected for new users.

I believe we need to put this behind a feature gate or a config option so that users are not surprised.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thank's for the context. A feature gate sounds good to me.

Copy link
Member Author

@ChrsMark ChrsMark Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking of it again, I realize that maybe this change cannot be a drop-in replacement.
Some users might already be happy with the default network metrics, so adding metrics for all the interfaces by-default might be problematic.
I think making this functionality configurable makes more sense (compared to going with a feature flag).

So instead of the feature flag I think introducing 2 new settings would provide better flexibility here:

add_node_interfaces_metrics: true|false
add_pod_interface_metrics: true|false

or

add_interfaces_metrics:
  node: true|false
  pod: true|false

The default behavior will continue to be as is today, collecting only metrics for the default network interface. By enabling the "interface_metrics" settings users will be explicitly making the receiver to include the additional metrics for all the interfaces.

Also providing the node/pod level granularity will be useful since users might want to avoid collecting all Pod's interfaces metrics when Pods are in hostNetwork: true mode (as mentioned at #30196 (comment)).

@TylerHelmuth @povilasv let me know what you think here.

Copy link
Contributor

@povilasv povilasv Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just add feature flag, if people don't want additional interfaces they can write some OTTL to drop unneeded ones / keep only default one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok! My only concern is about the hostNetwork case but eventually if a Pod is plugged to the host network it makes sense to observe "its" traffic anyways.
I have put the change behind a feature flag. Let me know you think folks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more in favor of the config option, although I am curious when the actual behavior would be in a standard cluster. Will most have other interfaces to collector metrics from?

@dmitryax @jinja2 what are your thoughts?

@ChrsMark ChrsMark force-pushed the fix_k8s_network_metrics branch from 957a0b3 to 6faabb4 Compare August 7, 2024 08:14
@ChrsMark ChrsMark force-pushed the fix_k8s_network_metrics branch from 5486580 to 17f7b65 Compare August 8, 2024 08:28
@ChrsMark ChrsMark force-pushed the fix_k8s_network_metrics branch from 17f7b65 to fea37cb Compare August 8, 2024 08:52
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 23, 2024
@rogercoll
Copy link
Contributor

/unstale feature gate added in fea37cb

@github-actions github-actions bot removed the Stale label Aug 24, 2024
Copy link
Contributor

github-actions bot commented Sep 8, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Sep 8, 2024
@github-actions github-actions bot removed the Stale label Sep 19, 2024
Copy link
Contributor

github-actions bot commented Oct 3, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Oct 3, 2024
@atoulme
Copy link
Contributor

atoulme commented Oct 8, 2024

Please resolve conflicts?

@github-actions github-actions bot removed the Stale label Oct 9, 2024
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Oct 24, 2024
Copy link
Contributor

github-actions bot commented Nov 8, 2024

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Nov 8, 2024
dmitryax pushed a commit that referenced this pull request Apr 24, 2025
…38737)

#### Description
Revives #34287 by picking it's changes.
Also changed changed approach from feature gate to configuration
parameters (disabled by default).
In context of network metrics - default behavior of component hasn't
changed, so I consider it as a non-breaking change.

#### Link to tracking issue
Fixes #30196 

#### Testing
Unit test were updated to cover new functionality

#### Documentation
Documentation updated to cover new opt-in functionality

Original authors are included in the commit:
Co-authored-by: ChrsMark <[email protected]>
Co-authored-by: jinja2 <[email protected]>

---------

Co-authored-by: Christos Markou <[email protected]>
Co-authored-by: Jina Jain <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

k8s.pod.network.io gives data only from eth0
6 participants