Added changes to support alternative recommender #4131
Conversation
@kgolab for visibility
vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go
vertical-pod-autoscaler/examples/hamster-default-recommender.yaml
vertical-pod-autoscaler/pkg/recommender/input/cluster_feeder.go
@@ -38,6 +38,8 @@ const (
 // events (container OOMs).
 // All input to the VPA Recommender algorithm lives in this structure.
 type ClusterState struct {
+	// Denote the recommender name
+	RecommenderName string
As commented above I don't yet understand why we need to store this dynamically in the default recommender.
@kgolab, as I mentioned, I hope it makes the code easier to reuse in other alternative recommenders.
Please split this change (adding the `RecommenderName` field) into a separate PR. It will make this PR easier to review, and it will also make it easier to see how we'd use the field in alternative recommenders.
Please move this change (adding the `RecommenderName` field) to a separate PR. It will make this PR easier to review, and it will also make it easier to see how we'd use the `RecommenderName` field.
Thank you for the PR. I'd like to understand whether we really need to propagate recommenderName throughout the code of the default VPA.
Looking at the KEP, I think we're missing an update in `validateVPA`. Or have you already decided to support multiple recommenders (the changes in cluster_feeder.go might suggest so)?
vertical-pod-autoscaler/examples/hamster-default-recommender.yaml
vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go
vertical-pod-autoscaler/pkg/recommender/input/cluster_feeder.go
vertical-pod-autoscaler/pkg/recommender/checkpoint/checkpoint_writer_test.go
vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go
vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go
@kgolab Thanks so much for your review.
Yes, sorry for missing the validateVPA update; I added it back. And yes, we want to support multiple VPA recommenders in the future. However, we may need a new KEP on how to cache the RecommendedPodResources for different recommenders in the VPA status, and how to select one recommendation among multiple recommenders when evicting and admitting pods. We could initiate those discussions once this first-step PR is merged.
Some minor comments on my side. @kgolab, waiting for your input on the current approach as explained by Chen.
@@ -32,6 +32,9 @@ import (
 	"k8s.io/klog"
 )
 
+// RecommenderName denotes the current recommender name as the "default" one.
+const RecommenderName = "default"
How about `DefaultRecommenderName`?
I think this went back to `RecommenderName` in the meantime; please change it back to `DefaultRecommenderName`.
@wangchen615 It would be great to squash your commits. This repo doesn't squash them automatically.
@mwielgus, @schylek, @kgolab, @jbartosik, @krzysied, this PR has had no new comments for almost 2 months and all comments have been resolved. Could you help approve this?
Sorry for the long delay. I'm looking at this right now.
I'm trying to understand this and there is one thing I'm not sure about.
The idea I got during SIG was that the intent of this change is to allow A/B testing of recommenders. So I expected to be able to:
- Set two (or more) recommenders that will expose their recommendations (so I can compare those recommendations)
- Choose which recommendation to apply (when I see which one looks better)
Instead this looks to me like I can choose one recommender to provide a recommendation (that I will see and which will be applied).
I think I'm confused about something here.
I'll try to figure this out tomorrow. But if you can see what I'm getting wrong, I'd appreciate the help.
Hi @jbartosik, there are two proposals that eventually allow A/B testing, in two steps: …
Once this PR is merged, we will move forward to add the second KEP and create a new PR accordingly.
@wangchen615 Thank you for the explanation. I'm not sure this is the best way to get an API that allows A/B testing of recommenders. If I understand the proposal in the presentation correctly, we'd make a breaking API change (remove …). I think it would be better to: …
I'll think about whether 1. is feasible and get back to you later this week. Also, I want to make sure we don't make extending the API too hard for future contributors. For example, in the future we might want to allow users to use the CPU recommendation from one recommender but the memory recommendation from another; I'd like to make that possible without a breaking API change.
@jbartosik Changing the API from the string of … In the next proposal, where we support multiple recommenders, we do not need to change the API further, only to support more than one recommender per VPA object. Supporting CPU and memory recommendations from different recommenders is a very good new use case to add to the multiple-recommender support proposal; I can try to add some slides for that as well. However, I believe that if a recommender only updates CPU or memory, the list of recommenders …
This field will not be introduced until we merge the next KEP, so it is really the next stage of work.
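To make the API under discussion concrete, here is a sketch of a VPA object that opts out of the default recommender. This is not taken from the PR itself: the `spec.recommenders` list and its `name` key follow the shape proposed in KEP-3919, and `my-recommender` is a hypothetical recommender name.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: hamster-vpa
spec:
  # Omitting this list leaves the object to the default recommender;
  # listing a name hands the object to that recommender instead.
  recommenders:
    - name: my-recommender   # hypothetical alternative recommender
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hamster
```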
@jbartosik, I updated the multiple-recommender support slides here: https://docs.google.com/presentation/d/1TtFNsCcEnbKki9xe31E8beNMq0fjm2Ek_-jXiGWst-8/edit?usp=sharing
@joelsmith, could you also help review?
Just double-checking that we're not mixing up two things: IIRC we asked for introducing RecommenderNames (string[]) instead of RecommenderName (string) to support multiple recommenders in the future. What @jbartosik asked is that we should align this change with what happens to the Recommendations field.
When discussing the KEP (#3914) we excluded the discussion about multiple recommenders / statuses for the time being, but with the new proposal you showed (https://docs.google.com/presentation/d/1TtFNsCcEnbKki9xe31E8beNMq0fjm2Ek_-jXiGWst-8/edit?usp=sharing), it's becoming clearer which directions are possible. It would be great if this PR were aligned with the direction chosen for multiple recommenders; that's why we're trying to address the future proposal (slides) at the same time as the PR, to reduce uncertainty and make sure we don't end up with a half-way API (so either move all the changes to v2 or make them all in v1, compatible with the current API).
Finally: I'm sorry for the overly long delay before my comments. It's been a mix of very busy times for me, but this should be over now. Thank you for your patience.
@kgolab, yes, that's correct.
Yes, so in the new proposal, we introduce the …
@jbartosik, I made the changes according to the suggestions in PR #4329.
Please add a test that checks that:
- default recommender acts when no recommender is specified
- default recommender acts when it's explicitly configured
- default recommender doesn't act when a different recommender is configured
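The three requested cases boil down to one selection rule. The sketch below is a hypothetical helper, not the actual VPA code: `shouldActOn` and `defaultRecommenderName` are names invented here to illustrate the behavior the tests would check (act when the VPA object names this recommender, or names no recommender at all and this one is the default).

```go
package main

import "fmt"

// defaultRecommenderName mirrors the "default" constant discussed above.
const defaultRecommenderName = "default"

// shouldActOn reports whether the recommender named myName should handle a
// VPA object whose spec lists the given recommender names (hypothetical
// helper capturing the selection rule the requested tests exercise).
func shouldActOn(myName string, vpaRecommenders []string) bool {
	// An empty list means the object implicitly belongs to the default.
	if len(vpaRecommenders) == 0 {
		return myName == defaultRecommenderName
	}
	// Otherwise the object is handled only by a recommender it names.
	for _, name := range vpaRecommenders {
		if name == myName {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldActOn("default", nil))                     // no recommender specified
	fmt.Println(shouldActOn("default", []string{"default"}))     // explicitly configured
	fmt.Println(shouldActOn("default", []string{"alternative"})) // different recommender
}
```

Running this prints `true`, `true`, `false`, matching the three bullet points above.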
Also, our test dashboards are currently broken, so please run the e2e tests (the `recommender` and `full` suites should be enough). Please use K8s older than 1.23 (the dashboards are red because of problems in 1.23; only the `full` and `recommender` suites are broken, so there is no need to run other test suites).
EDIT: link to the dashboard
Some more comments
vertical-pod-autoscaler/pkg/admission-controller/resource/vpa/handler.go
vertical-pod-autoscaler/pkg/recommender/input/cluster_feeder.go
vertical-pod-autoscaler/pkg/recommender/input/cluster_feeder.go
@jbartosik Just a clarification question about your comment "Please move this change (adding RecommenderName field) to a separate PR." Do you mean that for this PR, we do not pass the …? And in a new PR, we add …?
I think in this PR we should only have changes that add support for alternative recommenders to the default recommender, and in the follow-up PR, changes that allow reusing code in alternative recommenders. So yes to both questions :)
Please see ff85923 for all e2e tests and unit tests added.
Thank you. I think I'm missing something important about the problem with the e2e tests. I tried running the e2e tests locally: …
And this worked for me. All tests ran as expected. Some failed [1], but the failures are things like pods not updating fast enough [2]. This doesn't sound like the problem you described (I think this is me using too small a cluster). I'd like to help. Please share: …
[1]
[2]
To speed things up, there is one guess I have. If you're trying to add … If this is the problem, then: …
After I check …
If this is not the problem you're facing, then please give me the error messages you see and how you get them, and I'll do my best to help. Thank you.
@jbartosik, thanks so much for your advice. I did not add any tests to v1beta2; that's why I was confused. Your comment on the cluster size helped a lot: it seems the cluster I was using had nodes too small to test VPA, and all tests eventually failed. Please see the full log here. Following your comment, I created another GKE cluster (1.20.10-gke.301) with one large node (e2-standard-16), and then all tests passed.
More details of the full log can also be seen here. I think we are ready to merge. Please take a final look and see if there are any last changes we want to make. Thanks.
Looks good to me. Please squash the changes.
@jbartosik, thanks for the review. Yes, I did the rebase and squash.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jbartosik, wangchen615

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
This update is to support customized recommenders in VPA.
The KEP is here: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/3919-customized-recommender-vpa
resolve #3913
close #3913