update branch ENI operation metrics & dev guide #465

Merged
sushrk merged 1 commit into aws:master from pod-latency on Sep 12, 2024

Conversation

sushrk
Contributor

@sushrk sushrk commented Sep 11, 2024

Description of changes:
Updated the metrics to measure branch ENI operations: create branch ENI, annotate pod, and initialize trunk. The summary metric reports the 0.5, 0.9, and 0.99 quantiles (50th, 90th, and 99th percentiles), plus quantile 0 (minimum value) and quantile 1 (maximum value).
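
As a rough illustration of what those quantile labels mean, here is a minimal Go sketch that computes exact nearest-rank quantiles over a slice. It is not the controller's implementation: Prometheus client libraries estimate summary quantiles over a stream, whereas this sorts all samples. The `quantile` helper and the latency values are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile returns the nearest-rank q-quantile of samples for 0 <= q <= 1.
// q = 0 yields the minimum and q = 1 yields the maximum.
// It returns NaN when samples is empty.
// NOTE: hypothetical helper for illustration, not part of the controller.
func quantile(samples []float64, q float64) float64 {
	if len(samples) == 0 {
		return math.NaN()
	}
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	// Nearest-rank: rank = ceil(q * n), clamped to [1, n].
	rank := int(math.Ceil(q * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// Made-up latencies in ms; these are not values from the scale test.
	latencies := []float64{12, 9, 15, 110, 35, 20, 14, 84, 11, 16}
	for _, q := range []float64{0, 0.5, 0.9, 0.99, 1} {
		fmt.Printf("quantile=%v -> %v ms\n", q, quantile(latencies, q))
	}
}
```

With this definition, quantile 0 is always the smallest observed sample and quantile 1 the largest, which is why the two extra objectives act as min and max.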

Test
Running scale tests to deploy 1K pods repeatedly:

# HELP branch_provider_operation_latency Branch Provider operations latency in ms
# TYPE branch_provider_operation_latency summary
branch_provider_operation_latency{branch_provider_operation="annotate_branch_eni",resource_count="1",quantile="0"} 9
branch_provider_operation_latency{branch_provider_operation="annotate_branch_eni",resource_count="1",quantile="0.5"} 15
branch_provider_operation_latency{branch_provider_operation="annotate_branch_eni",resource_count="1",quantile="0.9"} 35
branch_provider_operation_latency{branch_provider_operation="annotate_branch_eni",resource_count="1",quantile="0.99"} 84
branch_provider_operation_latency{branch_provider_operation="annotate_branch_eni",resource_count="1",quantile="1"} 110
branch_provider_operation_latency_sum{branch_provider_operation="annotate_branch_eni",resource_count="1"} 32414
branch_provider_operation_latency_count{branch_provider_operation="annotate_branch_eni",resource_count="1"} 1612
branch_provider_operation_latency{branch_provider_operation="create_branch_eni",resource_count="1",quantile="0"} 817
branch_provider_operation_latency{branch_provider_operation="create_branch_eni",resource_count="1",quantile="0.5"} 4928
branch_provider_operation_latency{branch_provider_operation="create_branch_eni",resource_count="1",quantile="0.9"} 5453
branch_provider_operation_latency{branch_provider_operation="create_branch_eni",resource_count="1",quantile="0.99"} 5854
branch_provider_operation_latency{branch_provider_operation="create_branch_eni",resource_count="1",quantile="1"} 15032
branch_provider_operation_latency_sum{branch_provider_operation="create_branch_eni",resource_count="1"} 7.565694e+06
branch_provider_operation_latency_count{branch_provider_operation="create_branch_eni",resource_count="1"} 1613
branch_provider_operation_latency{branch_provider_operation="init_trunk",resource_count="1",quantile="0"} 3112
branch_provider_operation_latency{branch_provider_operation="init_trunk",resource_count="1",quantile="0.5"} 3186
branch_provider_operation_latency{branch_provider_operation="init_trunk",resource_count="1",quantile="0.9"} 15665
branch_provider_operation_latency{branch_provider_operation="init_trunk",resource_count="1",quantile="0.99"} 15869
branch_provider_operation_latency{branch_provider_operation="init_trunk",resource_count="1",quantile="1"} 15869
branch_provider_operation_latency_sum{branch_provider_operation="init_trunk",resource_count="1"} 276597
branch_provider_operation_latency_count{branch_provider_operation="init_trunk",resource_count="1"} 30
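
The `_sum` and `_count` series above also give the mean latency per operation (sum divided by count). The following Go sketch, using only the standard library, parses those lines from the output above and prints the means. The helpers `meanLatencies` and `labelValue` are hypothetical names for this example, and the parser assumes label values contain no spaces or escaped quotes, which holds for this output but not for the text format in general.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

const metricName = "branch_provider_operation_latency"

// sample holds the _sum and _count values of one summary series.
type sample struct{ sum, count float64 }

// labelValue extracts the value of label name from a series such as
// metric{a="x",b="y"}. It returns "" if the label is absent.
func labelValue(series, name string) string {
	key := name + `="`
	i := strings.Index(series, key)
	if i < 0 {
		return ""
	}
	rest := series[i+len(key):]
	j := strings.Index(rest, `"`)
	if j < 0 {
		return ""
	}
	return rest[:j]
}

// meanLatencies returns sum/count per branch_provider_operation label value.
// Comment lines and quantile series are ignored.
func meanLatencies(text string) map[string]float64 {
	series := map[string]*sample{}
	for _, line := range strings.Split(text, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 || strings.HasPrefix(fields[0], "#") {
			continue
		}
		value, err := strconv.ParseFloat(fields[1], 64)
		op := labelValue(fields[0], "branch_provider_operation")
		if err != nil || op == "" {
			continue
		}
		if series[op] == nil {
			series[op] = &sample{}
		}
		switch {
		case strings.HasPrefix(fields[0], metricName+"_sum{"):
			series[op].sum = value
		case strings.HasPrefix(fields[0], metricName+"_count{"):
			series[op].count = value
		}
	}
	means := map[string]float64{}
	for op, s := range series {
		if s.count > 0 {
			means[op] = s.sum / s.count
		}
	}
	return means
}

func main() {
	// _sum and _count lines copied from the test output above.
	text := `
branch_provider_operation_latency_sum{branch_provider_operation="annotate_branch_eni",resource_count="1"} 32414
branch_provider_operation_latency_count{branch_provider_operation="annotate_branch_eni",resource_count="1"} 1612
branch_provider_operation_latency_sum{branch_provider_operation="create_branch_eni",resource_count="1"} 7.565694e+06
branch_provider_operation_latency_count{branch_provider_operation="create_branch_eni",resource_count="1"} 1613
branch_provider_operation_latency_sum{branch_provider_operation="init_trunk",resource_count="1"} 276597
branch_provider_operation_latency_count{branch_provider_operation="init_trunk",resource_count="1"} 30
`
	means := meanLatencies(text)
	ops := make([]string, 0, len(means))
	for op := range means {
		ops = append(ops, op)
	}
	sort.Strings(ops) // map iteration order is random, so sort for stable output
	for _, op := range ops {
		fmt.Printf("%s: mean %.1f ms\n", op, means[op])
	}
}
```

For this run the means work out to roughly 20 ms for `annotate_branch_eni`, 4.7 s for `create_branch_eni`, and 9.2 s for `init_trunk`.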

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@sushrk sushrk requested a review from a team as a code owner September 11, 2024 19:29
Contributor

@haouc haouc left a comment

lgtm

@barryrobison

Just curious - will these metrics be exposed somewhere for the customer?

Contributor

@yash97 yash97 left a comment

lgtm 🚀

@sushrk
Contributor Author

sushrk commented Sep 12, 2024

> Just curious - will these metrics be exposed somewhere for the customer?

@barryrobison yes, control plane metrics can be viewed with this command:

kubectl get --raw /metrics

EDIT: Sorry, only API server and etcd metrics are exposed at the moment. The controller metrics are not exposed.

More information on monitoring these metrics can be found at https://aws.github.io/aws-eks-best-practices/reliability/docs/controlplane/#monitor-control-plane-metrics

@sushrk sushrk merged commit 4ea11cb into aws:master Sep 12, 2024
4 checks passed
@sushrk sushrk deleted the pod-latency branch September 12, 2024 07:08
sushrk added a commit to sushrk/amazon-vpc-resource-controller-k8s that referenced this pull request Oct 25, 2024
sushrk added a commit that referenced this pull request Oct 25, 2024
* update branch ENI operation metrics & dev guide (#465)

* measure branch ENI operation latency in seconds (#469)