
Added changes to support alternative recommender #4131

Merged (1 commit merged on Nov 16, 2021)

Conversation

wangchen615 (Contributor)

This update adds support for a customized recommender in VPA.
The KEP is here: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/3919-customized-recommender-vpa

resolve #3913
close #3913
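For orientation, a minimal sketch (illustrative Go, not necessarily the exact merged API) of the kind of per-VPA recommender selection the KEP describes: a VPA object names the recommender that should handle it, and the default recommender is used when nothing is specified.

    // Sketch only: illustrative types for per-VPA recommender selection as
    // described in the KEP; names are assumptions, not the merged API.
    package v1

    // VerticalPodAutoscalerRecommenderSelector points to a recommender by name.
    type VerticalPodAutoscalerRecommenderSelector struct {
        // Name of the recommender responsible for generating recommendations
        // for this VPA object.
        Name string `json:"name"`
    }

    // Excerpt of the VPA spec: when Recommenders is empty, the default
    // recommender handles the object; otherwise the named recommender does.
    type VerticalPodAutoscalerSpec struct {
        // ... existing fields (TargetRef, UpdatePolicy, ResourcePolicy) ...

        Recommenders []*VerticalPodAutoscalerRecommenderSelector `json:"recommenders,omitempty"`
    }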

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 10, 2021
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 10, 2021
@bskiba (Member)

bskiba commented Jun 10, 2021

@kgolab for visibility

@@ -38,6 +38,8 @@ const (
// events (container OOMs).
// All input to the VPA Recommender algorithm lives in this structure.
type ClusterState struct {
// Denote the recommender name
RecommenderName string
Collaborator:

As commented above I don't yet understand why we need to store this dynamically in the default recommender.

Contributor Author:

@kgolab, as I mentioned, I hope it makes the code easier to reuse in other alternative recommenders.

Collaborator:

Please split this change (adding the RecommenderName field) into a separate PR.

It will make this PR easier to review. It will also make it easier to see how we'd use the field in alternative recommenders.

Collaborator:

Please move this change (adding the RecommenderName field) to a separate PR.

It will make this PR easier to review. It will also make it easier to see how we'd use the RecommenderName field.

@kgolab (Collaborator)

kgolab commented Jun 18, 2021

Thank you for the PR.

I'd like to understand if we really need to propagate recommenderName throughout the code of the default VPA:

  • if we don't, the change could be simplified to the YAML files, types.go and (parts of) cluster_feeder.go,
  • if we do, could you please explain why (and, likely, how you plan to use the code)?

Looking at the KEP I think we're missing an update in validateVPA. Or have you already decided to support multiple recommenders (changes in cluster_feeder.go might suggest so)?

@wangchen615 (Contributor Author)

@kgolab Thanks so much for your review.

  • As I mentioned above, keeping recommenderName will make the default recommender code easier to reuse when others develop their own alternative recommenders.
    • Some functions, like loadVPAs in the reconcilers, are definitely reusable.
    • However, other recommenders may rely on completely different metric collectors, such as Prometheus, and run different logic to come up with the target, lower bound and upper bound for requests.
    • For example, I am writing a customized recommender that will reuse the ClusterStateFeederFactory but rewrite the metric client (see the sketch below this comment).
    • At the same time, we provide support for the controller-level alternative recommender, so developers also have the freedom to develop their own recommender from scratch.

Thank you for the PR.

I'd like to understand if we really need to propagate recommenderName throughout the code of the default VPA:

  • if we don't, the change could be simplified to the YAML files, types.go and (parts of) cluster_feeder.go,
  • if we do, could you please explain why (and, likely, how you plan to use the code)?

Looking at the KEP I think we're missing an update in validateVPA. Or have you already decided to support multiple recommenders (changes in cluster_feeder.go might suggest so)?

Yes, sorry for missing the validateVPA. I added it back.

And yes, we want to support multiple VPA recommenders in the future. However, we may need a new KEP on how to cache the RecommendedPodResources for different recommenders in VPAStatus and how to select one recommendation among multiple recommenders when evicting and admitting the pod. We could initiate discussions once this first step PR is merged.
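To make the reuse argument above concrete, here is a self-contained sketch of the pattern: a feeder factory shared between recommenders, with an alternative recommender plugging in its own (for example Prometheus-backed) metrics client. All names here are hypothetical stand-ins, not the actual vertical-pod-autoscaler packages.

    // Sketch with hypothetical names; the real VPA code is structured
    // differently in detail.
    package main

    import "fmt"

    // MetricsClient abstracts where container usage samples come from.
    type MetricsClient interface {
        FetchUsage(namespace, pod string) (cpuMilli, memBytes int64, err error)
    }

    // ClusterState is the shared in-memory model a recommender works on.
    type ClusterState struct {
        RecommenderName string // which recommender this state belongs to
    }

    // FeederFactory wires a metrics client to the cluster state; the default
    // recommender and alternative recommenders could both build feeders this way.
    type FeederFactory struct {
        Metrics      MetricsClient
        ClusterState *ClusterState
    }

    // Make builds a feeder from the factory's parts.
    func (f FeederFactory) Make() *Feeder {
        return &Feeder{metrics: f.Metrics, state: f.ClusterState}
    }

    // Feeder pulls metrics and folds them into the cluster state.
    type Feeder struct {
        metrics MetricsClient
        state   *ClusterState
    }

    // prometheusClient is an example alternative metrics source.
    type prometheusClient struct{ endpoint string }

    func (p prometheusClient) FetchUsage(namespace, pod string) (int64, int64, error) {
        // A real implementation would query Prometheus here.
        return 250, 128 << 20, nil
    }

    func main() {
        state := &ClusterState{RecommenderName: "my-prometheus-recommender"}
        feeder := FeederFactory{
            Metrics:      prometheusClient{endpoint: "http://prometheus:9090"},
            ClusterState: state,
        }.Make()
        cpu, mem, _ := feeder.metrics.FetchUsage("default", "hamster")
        fmt.Printf("recommender=%s cpu=%dm mem=%dB\n", feeder.state.RecommenderName, cpu, mem)
    }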

@bskiba (Member) left a comment

Some minor comments on my side. @kgolab waiting for your input on the current approach as explained by Chen

@@ -32,6 +32,9 @@ import (
"k8s.io/klog"
)

// RecommenderName denotes the current recommender name as the "default" one.
const RecommenderName = "default"
Member:

How about DefaultRecommenderName

Collaborator:

I think this went back to "RecommenderName" in the meantime, please change it back to DefaultRecommenderName
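For reference, the rename being asked for would look roughly like this (the "default" value is taken from the diff above; the comment wording is illustrative):

    // DefaultRecommenderName is the name of the recommender shipped with VPA.
    // VPA objects that do not explicitly select a recommender are handled by it.
    const DefaultRecommenderName = "default"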

@Jeffwan (Member)

Jeffwan commented Jul 1, 2021

@wangchen615 It would be great to squash your commits. This repo doesn't squash them automatically.

@wangchen615 (Contributor Author)

wangchen615 commented Aug 30, 2021

@mwielgus, @schylek, @kgolab, @jbartosik, @krzysied, this PR had no new comments for almost 2 months and all comments have been resolved. Could you help approve this?

@jbartosik (Collaborator)

@mwielgus, @schylek, @kgolab, @jbartosik, @krzysied, this PR had no new comments for almost 2 months and all comments have been resolved. Could you help approve this?

Sorry for the long delay. I'm looking at this right now

@jbartosik (Collaborator) left a comment

I'm trying to understand this and there is one thing I'm not sure about.

The idea I got during SIG was that the intent of this change is to allow A/B testing of recommenders. So I expected to be able to:

  • Set two (or more) recommenders that will expose their recommendations (so I can compare those recommendations)
  • Choose which recommendation to apply (when I see which one looks better)

Instead this looks to me like I can choose one recommender to provide a recommendation (that I will see and which will be applied).

I think I'm confused about something here.

I'll try to figure this out tomorrow. But if you can see what I'm getting wrong, I'd appreciate help.

@wangchen615 (Contributor Author)

I'm trying to understand this and there is one thing I'm not sure about.

The idea I got during SIG was that the intent of this change is to allow A/B testing of recommenders. So I expected to be able to:

  • Set two (or more) recommenders that will expose their recommendations (so I can compare those recommendations)
  • Choose which recommendation to apply (when I see which one looks better)

Instead this looks to me like I can choose one recommender to provide a recommendation (that I will see and which will be applied).

I think I'm confused about something here.

I'll try to figure this out tomorrow. But if you can see what I'm getting wrong, I'd appreciate help.

Hi @jbartosik, there are two proposals that will eventually allow A/B testing, in two steps:

  1. Support using one recommender for one VPA object, but that recommender can be a customized one. This is the initial proposal, Support Customized Recommenders for Vertical Pod Autoscalers.
  2. Support multiple recommenders for a single VPA object. The proposal slides are here.

Once this PR is merged, we will move forward to add the second KEP and create a new PR accordingly.

@jbartosik (Collaborator)

@wangchen615 Thank you for the explanation.

I'm not sure if this is the best way to get an API that allows A/B testing of recommenders.

If I understand the proposal in the presentation correctly we'd make a breaking API change (remove VerticalPodAutoscalerStatus.Recommendation field and replace it with VerticalPodAutoscalerStatus.Recommendations). This means we'll need to introduce a new API version. If we do that then v1 API will be a bit weird. It will have a repeated VerticalPodAutoscalerSpec.RecommenderNames field but only one value will matter.

I think it would be better to:

  1. Allow A/B testing without breaking API changes, or
  2. If that's not feasible then in the v1 API support only one recommender in a clearer way. Add a string field VerticalPodAutoscalerSpec.RecommenderName (instead of RecommenderNames []string)

I'll think if 1. is feasible and get back to you later this week.

Also, I want to make sure we don't make extending the API too hard for future contributors. For example, in the future we might want to allow users to take the CPU recommendation from one recommender but the memory recommendation from another. I'd like to make that possible without a breaking API change.

@wangchen615 (Contributor Author)

wangchen615 commented Sep 1, 2021

@jbartosik Changing the API from the string of VerticalPodAutoscalerStatus.Recommendation to the list of VerticalPodAutoscalerStatus.Recommendations was exactly what was requested by @bskiba and @kgolab. I believe the code now is supporting a list of VerticalPodAutoscalerStatus.Recommendations as the API but only one recommender for this PR.

In the next proposal, where we support multiple recommenders, we will not need to change the API further, only to support more than one recommender per VPA object.

Supporting CPU and memory recommendations from different recommenders is a very good new use case to add to the multiple-recommender support proposal. I can try to add some slides for that as well. However, I believe that if a recommender only updates CPU or memory, the list of recommendations (VerticalPodAutoscalerStatus.Recommendations) in the API will still work, but we would need to extend the following field to a map for the multiple-recommender case.

  // Denote the name of recommender to be in use.
  SelectedRecommender string `json:"selectedRecommender,omitempty" protobuf:"bytes,5,opt,name=selectedRecommender"`

This field will not be introduced until we merge the next KEP, so it is really the next stage of work.

@wangchen615 (Contributor Author)

wangchen615 commented Sep 6, 2021

@jbartosik , I updated the multiple recommender support slides here: https://docs.google.com/presentation/d/1TtFNsCcEnbKki9xe31E8beNMq0fjm2Ek_-jXiGWst-8/edit?usp=sharing

The AllRecommendations field is introduced in VerticalPodAutoscalerStatus so we do not need to change the existing Recommendation field but can track multiple recommenders' recommendations.

The SelectedRecommender field is introduced in VerticalPodAutoscalerSpec so users can denote which recommender's recommendation is selected. If it is not specified, the default recommender's recommendation is used or, if no default exists, the first recommender specified in the list of RecommenderNames.
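A rough sketch, in Go, of the fields described above. The names AllRecommendations and SelectedRecommender come from the proposal slides; the types, the map representation and everything else are assumptions for illustration, not merged API.

    // Sketch of the proposed follow-up API; only the field names come from the
    // proposal slides, the rest is illustrative.
    package v1

    // RecommendedPodResources stands in for the existing recommendation type.
    type RecommendedPodResources struct{}

    // Proposed spec addition: the user picks which recommendation gets applied.
    type VerticalPodAutoscalerSpecProposal struct {
        // SelectedRecommender denotes which recommender's recommendation is applied.
        // If empty, the default recommender is used or, if no default exists,
        // the first recommender listed for this VPA object.
        SelectedRecommender string `json:"selectedRecommender,omitempty"`
    }

    // Proposed status addition: keep Recommendation as-is, add per-recommender output.
    type VerticalPodAutoscalerStatusProposal struct {
        // Recommendation keeps holding the selected recommender's output (unchanged).
        Recommendation *RecommendedPodResources `json:"recommendation,omitempty"`
        // AllRecommendations tracks the latest output of every configured
        // recommender (shown as a map keyed by recommender name; an assumption).
        AllRecommendations map[string]*RecommendedPodResources `json:"allRecommendations,omitempty"`
    }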

@joelsmith , could you also help review?

@kgolab (Collaborator)

kgolab commented Sep 7, 2021

@jbartosik Changing the API from the string of VerticalPodAutoscalerStatus.Recommendation to the list of VerticalPodAutoscalerStatus.Recommendations was exactly what was requested by @bskiba and @kgolab. I believe the code now is supporting a list of VerticalPodAutoscalerStatus.Recommendations as the API but only one recommender for this PR.

Just double-checking if we're not mixing up two things - IIRC we asked for introducing RecommenderNames (string[]) instead of RecommenderName (string) to support multiple recommenders in the future.

What @jbartosik asked is that we should align this change with what happens to the Recommendations field.

When discussing the KEP (#3914) we excluded the discussion about multiple recommenders / statuses for the time being, but with the new proposal you showed (https://docs.google.com/presentation/d/1TtFNsCcEnbKki9xe31E8beNMq0fjm2Ek_-jXiGWst-8/edit?usp=sharing), it's becoming clearer which directions are possible.

It would be great if this PR was aligned with the direction chosen for multiple recommenders; that's why we're trying to address the future proposal (slides) at the same time as the PR, to reduce uncertainty and make sure we don't end up with a half-way API (so either move all the changes to v2 or make them all in v1, compatible with the current API).

Finally: I'm sorry for the too-long delay before my comments - it's been a mix of very busy times for me, but this should be over now. Thank you for your patience.

@wangchen615 (Contributor Author)

@kgolab , yes, that's correct.

Just double-checking if we're not mixing up two things - IIRC we asked for introducing RecommenderNames (string[]) instead of RecommenderName (string) to support multiple recommenders in the future.

Yes, so in the new proposal we introduce the VerticalPodAutoscalerStatus.AllRecommendations field and the VerticalPodAutoscalerSpec.SelectedRecommender field. VerticalPodAutoscalerStatus.Recommendation will stay the same and will only store the recommendation from the VerticalPodAutoscalerSpec.SelectedRecommender. Please refer to the slides. In this way, we can keep this PR aligned with the future changes.

It would be great if this PR was aligned with the direction chosen for multiple recommenders; that's why we're trying to address the future proposal (slides) at the same time as the PR, to reduce uncertainty and make sure we don't end up with a half-way API (so either move all the changes to v2 or make them all in v1, compatible with the current API).

@wangchen615 (Contributor Author)

@jbartosik, I made the changes according to the suggestions in PR #4329.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 20, 2021
@jbartosik (Collaborator) left a comment

Please add a test that checks that:

  1. default recommender acts when no recommender is specified
  2. default recommender acts when it's explicitly configured
  3. default recommender doesn't act when a different recommender is configured

Also, our test dashboards are currently broken, so please run the e2e tests (the recommender and full suites should be enough). Please use a K8s version older than 1.23 (the dashboards are red because of problems in 1.23; only the full and recommender tests are broken, so there is no need to run other test suites).

EDIT: link to the dashboard
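Purely as a sketch of the three cases listed above: a table-driven unit test against a hypothetical vpaHandledBy helper (not the actual recommender or e2e test code).

    // Sketch only: hypothetical helper plus a table test for the three cases.
    package recommender

    import "testing"

    const defaultRecommenderName = "default"

    // vpaHandledBy reports whether the recommender named self should act on a
    // VPA object that lists the given recommender names (empty means "default").
    func vpaHandledBy(vpaRecommenders []string, self string) bool {
        if len(vpaRecommenders) == 0 {
            return self == defaultRecommenderName
        }
        for _, name := range vpaRecommenders {
            if name == self {
                return true
            }
        }
        return false
    }

    func TestDefaultRecommenderSelection(t *testing.T) {
        cases := []struct {
            name         string
            recommenders []string
            want         bool
        }{
            {"acts when no recommender is specified", nil, true},
            {"acts when explicitly configured", []string{"default"}, true},
            {"does not act when a different recommender is configured", []string{"custom"}, false},
        }
        for _, tc := range cases {
            if got := vpaHandledBy(tc.recommenders, defaultRecommenderName); got != tc.want {
                t.Errorf("%s: got %v, want %v", tc.name, got, tc.want)
            }
        }
    }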

@jbartosik (Collaborator) left a comment

Some more comments


@wangchen615 (Contributor Author)

@jbartosik Just a clarification question about your comment "Please move this change (adding RecommenderName field) to a separate PR. It will make this PR easier to review. It will also make it easier to see how we'd use the RecommenderName field."

Do you mean that for this PR we do not pass the RecommenderName through the ClusterState struct, and only check whether DefaultRecommenderName exists or whether VPA objects choose other recommender names?

And in a new PR, we add RecommenderName to the ClusterState struct so the cluster.go code in the default recommender can be reused by others who develop their own recommenders?

@jbartosik (Collaborator)

@jbartosik Just a clarification question about your comment "Please move this change (adding RecommenderName field) to a separate PR. It will make this PR easier to review. It will also make it easier to see how we'd use the RecommenderName field."

Do you mean that for this PR we do not pass the RecommenderName through the ClusterState struct, and only check whether DefaultRecommenderName exists or whether VPA objects choose other recommender names?

And in a new PR, we add RecommenderName to the ClusterState struct so the cluster.go code in the default recommender can be reused by others who develop their own recommenders?

I think in this PR we should only have the changes that add support for alternative recommenders to the default recommender, and in the follow-up PR the changes that allow reusing the code in alternative recommenders.

So yes to both questions :)

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 24, 2021
@wangchen615 (Contributor Author)

Please add a test that checks that:

  1. default recommender acts when no recommender is specified
  2. default recommender acts when it's explicitly configured
  3. default recommender doesn't act when a different recommender is configured

Also, our test dashboards are currently broken, so please run the e2e tests (the recommender and full suites should be enough). Please use a K8s version older than 1.23 (the dashboards are red because of problems in 1.23; only the full and recommender tests are broken, so there is no need to run other test suites).

EDIT: link to the dashboard

Please see ff85923 for all e2e tests and unit tests added.

@jbartosik (Collaborator)

@wangchen615

Thank you.

I think I'm missing something important about the problem with e2e tests.

I tried running e2e tests locally:

  • I checked out 1c7fa88 locally,
  • I created a cluster (on GKE, K8s 1.20.10-gke.301, no VPA from GKE so I could set up mine)
  • I deployed VPA in the cluster ./vertical-pod-autoscaler/hack/deploy-for-e2e.sh full-vpa
  • I ran e2e tests (in k8s.io/autoscaler) ./vertical-pod-autoscaler/hack/run-e2e.sh full-vpa

And this worked for me. All tests ran as expected. Some failed [1], but the failures are things like pods not updating fast enough [2]. This doesn't sound like the problem you described (I think this is me using too small a cluster).

I'd like to help. Please share:

  • Errors you see
  • How you run the test.

[1]

$ grep \"msg\" 2021-10-25.alt-recommender-e2e.out 
{"msg":"Test Suite starting","total":3,"completed":0,"skipped":0,"failed":0}
{"msg":"PASSED [sig-autoscaling] [VPA] [full-vpa] [v1beta2] OOMing pods under VPA have memory requests growing with OOMs","total":3,"completed":1,"skipped":5,"failed":0}
{"msg":"PASSED [sig-autoscaling] [VPA] [full-vpa] [v1beta2] Pods under VPA have memory requests growing with usage","total":3,"completed":2,"skipped":8,"failed":0}
{"msg":"FAILED [sig-autoscaling] [VPA] [full-vpa] [v1beta2] Pods under VPA have cpu requests growing with usage","total":3,"completed":2,"skipped":22,"failed":1,"failures":["[sig-autoscaling] [VPA] [full-vpa] [v1beta2] Pods under VPA have cpu requests growing with usage"]}
{"msg":"Test Suite completed","total":3,"completed":2,"skipped":38,"failed":1,"failures":["[sig-autoscaling] [VPA] [full-vpa] [v1beta2] Pods under VPA have cpu requests growing with usage"]}
{"msg":"Test Suite starting","total":5,"completed":0,"skipped":0,"failed":0}
{"msg":"FAILED [sig-autoscaling] [VPA] [full-vpa] [v1] Pods under VPA with default recommender explicitly configured have cpu requests growing with usage","total":5,"completed":0,"skipped":6,"failed":1,"failures":["[sig-autoscaling] [VPA] [full-vpa] [v1] Pods under VPA with default recommender explicitly configured have cpu requests growing with usage"]}
{"msg":"FAILED [sig-autoscaling] [VPA] [full-vpa] [v1] OOMing pods under VPA have memory requests growing with OOMs","total":5,"completed":0,"skipped":13,"failed":2,"failures":["[sig-autoscaling] [VPA] [full-vpa] [v1] Pods under VPA with default recommender explicitly configured have cpu requests growing with usage","[sig-autoscaling] [VPA] [full-vpa] [v1] OOMing pods under VPA have memory requests growing with OOMs"]}
{"msg":"PASSED [sig-autoscaling] [VPA] [full-vpa] [v1] Pods under VPA have memory requests growing with usage","total":5,"completed":1,"skipped":18,"failed":2,"failures":["[sig-autoscaling] [VPA] [full-vpa] [v1] Pods under VPA with default recommender explicitly configured have cpu requests growing with usage","[sig-autoscaling] [VPA] [full-vpa] [v1] OOMing pods under VPA have memory requests growing with OOMs"]}
{"msg":"PASSED [sig-autoscaling] [VPA] [full-vpa] [v1] Pods under VPA with non-recognized recommender explicitly configured deployment not updated by non-recognized recommender","total":5,"completed":2,"skipped":19,"failed":2,"failures":["[sig-autoscaling] [VPA] [full-vpa] [v1] Pods under VPA with default recommender explicitly configured have cpu requests growing with usage","[sig-autoscaling] [VPA] [full-vpa] [v1] OOMing pods under VPA have memory requests growing with OOMs"]}

[2]

• Failure [959.135 seconds]
[sig-autoscaling] [VPA] [full-vpa] [v1beta2] Pods under VPA
/usr/local/google/home/jbartosik/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/e2e/v1beta2/common.go:72
  have cpu requests growing with usage [It]
  /usr/local/google/home/jbartosik/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/e2e/v1beta2/full_vpa.go:94

  Oct 25 22:39:54.558: Unexpected error:
      <*errors.errorString | 0xc0005476a0>: {
          s: "error waiting for cpu request in range of ({{500 -3} {<nil>} 500m DecimalSI},{{1000 -3} {<nil>}  DecimalSI}) for pods: {TypeMeta:{Kind: APIVersion:} LabelSelector:name=hamster FieldSelector: Watch:false AllowWatchBookmarks:false ResourceVersion: ResourceVersionMatch: TimeoutSeconds:<nil> Limit:0 Continue:}",
      }
      error waiting for cpu request in range of ({{500 -3} {<nil>} 500m DecimalSI},{{1000 -3} {<nil>}  DecimalSI}) for pods: {TypeMeta:{Kind: APIVersion:} LabelSelector:name=hamster FieldSelector: Watch:false AllowWatchBookmarks:false ResourceVersion: ResourceVersionMatch: TimeoutSeconds:<nil> Limit:0 Continue:}
  occurred
[sig-autoscaling] [VPA] [full-vpa] [v1] Pods under VPA with default recommender explicitly configured
/usr/local/google/home/jbartosik/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/e2e/v1/common.go:72
  have cpu requests growing with usage [It]
  /usr/local/google/home/jbartosik/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/e2e/v1/full_vpa.go:172

  Oct 25 22:55:58.782: Unexpected error:
      <*errors.errorString | 0xc0007799a0>: {
          s: "error waiting for cpu request in range of ({{500 -3} {<nil>} 500m DecimalSI},{{1000 -3} {<nil>}  DecimalSI}) for pods: {TypeMeta:{Kind: APIVersion:} LabelSelector:name=hamster FieldSelector: Watch:false AllowWatchBookmarks:false ResourceVersion: ResourceVersionMatch: TimeoutSeconds:<nil> Limit:0 Continue:}",
      }
      error waiting for cpu request in range of ({{500 -3} {<nil>} 500m DecimalSI},{{1000 -3} {<nil>}  DecimalSI}) for pods: {TypeMeta:{Kind: APIVersion:} LabelSelector:name=hamster FieldSelector: Watch:false AllowWatchBookmarks:false ResourceVersion: ResourceVersionMatch: TimeoutSeconds:<nil> Limit:0 Continue:}
  occurred
[sig-autoscaling] [VPA] [full-vpa] [v1] OOMing pods under VPA
/usr/local/google/home/jbartosik/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/e2e/v1/common.go:72
  have memory requests growing with OOMs [It]
  /usr/local/google/home/jbartosik/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/e2e/v1/full_vpa.go:283

  Oct 25 23:04:47.713: Unexpected error:
      <*errors.errorString | 0xc000cb8480>: {
          s: "error waiting for memory request in range of ({{1468006400 0} {<nil>}  BinarySI},{{10485760000 0} {<nil>}  BinarySI}) for pods: {TypeMeta:{Kind: APIVersion:} LabelSelector:name=hamster FieldSelector:status.phase!=Failed,status.phase!=Succeeded Watch:false AllowWatchBookmarks:false ResourceVersion: ResourceVersionMatch: TimeoutSeconds:<nil> Limit:0 Continue:}",
      }
      error waiting for memory request in range of ({{1468006400 0} {<nil>}  BinarySI},{{10485760000 0} {<nil>}  BinarySI}) for pods: {TypeMeta:{Kind: APIVersion:} LabelSelector:name=hamster FieldSelector:status.phase!=Failed,status.phase!=Succeeded Watch:false AllowWatchBookmarks:false ResourceVersion: ResourceVersionMatch: TimeoutSeconds:<nil> Limit:0 Continue:}
  occurred

  /usr/local/google/home/jbartosik/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/e2e/v1/full_vpa.go:291
------------------------------
{"msg":"FAILED [sig-autoscaling] [VPA] [full-vpa] [v1] OOMing pods under VPA have memory requests growing with OOMs","total":5,"completed":0,"skipped":13,"failed":2,"failures":["[sig-autoscaling] [VPA] [full-vpa] [v1] Pods under VPA with default recommender explicitly configured have cpu requests growing with usage","[sig-autoscaling] [VPA] [full-vpa] [v1] OOMing pods under VPA have memory requests growing with OOMs"]}

@jbartosik (Collaborator)

@wangchen615

To speed things up, there is one guess I have. If you're trying to add the "Pods under VPA with default recommender explicitly configured have cpu requests growing with usage" and "Pods under VPA with non-recognized recommender explicitly configured deployment not updated by non-recognized recommender" tests for the v1beta2 API, then I don't think they will work. This PR only extends the v1 API.

If this is the problem then:

  • I think we shouldn't change v1beta2 API but I'll check
  • So please let me know so I know that I should check,
  • If I'm correct about this then we can submit the PR with tests only for the v1 API
    • So for now please don't add those tests for v1beta2 API.

After I check

  • If I'm correct about us not needing to change v1beta2 API then I'll review new changes and this PR should be ready to submit soon.
  • If I'm wrong then we'll need to change v1beta2 API too.

If this is not the problem you're facing then please give me error messages you see and how you get them and I'll do my best to help.

Thank you.

@wangchen615 (Contributor Author)

wangchen615 commented Nov 1, 2021

@jbartosik, thanks so much for your advice. I did not add any tests to v1beta2. That's why I was confused.

Your comment on the cluster size helped a lot; it seems the cluster I was using had nodes that were too small to test VPA, and all tests eventually failed. Please see the full log here.

According to your comment, I created another GKE cluster (1.20.10-gke.301) with one large node (e2-standard-16), and then all tests passed.

grep \"msg\" alt_recommender_e2e.log
{"msg":"Test Suite starting","total":3,"completed":0,"skipped":0,"failed":0}
{"msg":"PASSED [sig-autoscaling] [VPA] [full-vpa] [v1beta2] Pods under VPA have memory requests growing with usage","total":3,"completed":1,"skipped":4,"failed":0}
{"msg":"PASSED [sig-autoscaling] [VPA] [full-vpa] [v1beta2] Pods under VPA have cpu requests growing with usage","total":3,"completed":2,"skipped":23,"failed":0}
{"msg":"PASSED [sig-autoscaling] [VPA] [full-vpa] [v1beta2] OOMing pods under VPA have memory requests growing with OOMs","total":3,"completed":3,"skipped":35,"failed":0}
{"msg":"Test Suite completed","total":3,"completed":3,"skipped":38,"failed":0}
{"msg":"Test Suite starting","total":5,"completed":0,"skipped":0,"failed":0}
{"msg":"PASSED [sig-autoscaling] [VPA] [full-vpa] [v1] Pods under VPA have cpu requests growing with usage","total":5,"completed":1,"skipped":13,"failed":0}
{"msg":"PASSED [sig-autoscaling] [VPA] [full-vpa] [v1] Pods under VPA with default recommender explicitly configured have cpu requests growing with usage","total":5,"completed":2,"skipped":26,"failed":0}
{"msg":"PASSED [sig-autoscaling] [VPA] [full-vpa] [v1] Pods under VPA with non-recognized recommender explicitly configured deployment not updated by non-recognized recommender","total":5,"completed":3,"skipped":27,"failed":0}
{"msg":"PASSED [sig-autoscaling] [VPA] [full-vpa] [v1] OOMing pods under VPA have memory requests growing with OOMs","total":5,"completed":4,"skipped":28,"failed":0}
{"msg":"PASSED [sig-autoscaling] [VPA] [full-vpa] [v1] Pods under VPA have memory requests growing with usage","total":5,"completed":5,"skipped":36,"failed":0}
{"msg":"Test Suite completed","total":5,"completed":5,"skipped":39,"failed":0}

More details of full log message can also be seen here.

I think we are ready to merge. Please take a final look and see if there are some last changes we want to make. Thanks.

@wangchen615

To speed things up, there is one guess I have. If you're trying to add the "Pods under VPA with default recommender explicitly configured have cpu requests growing with usage" and "Pods under VPA with non-recognized recommender explicitly configured deployment not updated by non-recognized recommender" tests for the v1beta2 API, then I don't think they will work. This PR only extends the v1 API.

If this is the problem then:

  • I think we shouldn't change v1beta2 API but I'll check

  • So please let me know so I know that I should check,

  • If I'm correct about this then we can submit the PR with tests only for the v1 API

    • So for now please don't add those tests for v1beta2 API.

After I check

  • If I'm correct about us not needing to change v1beta2 API then I'll review new changes and this PR should be ready to submit soon.
  • If I'm wrong then we'll need to change v1beta2 API too.

If this is not the problem you're facing then please give me error messages you see and how you get them and I'll do my best to help.

Thank you.

@jbartosik jbartosik mentioned this pull request Nov 10, 2021
@jbartosik (Collaborator) left a comment

Looks good to me. Please squash changes

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 15, 2021
@linux-foundation-easycla

linux-foundation-easycla bot commented Nov 15, 2021

CLA Signed

The committers are authorized under a signed CLA.

@wangchen615 (Contributor Author)

Looks good to me. Please squash changes

@jbartosik, thanks for the review. Yes, I did the rebase and squash.
The EasyCLA issue was due to this issue: kubernetes/org#3068
It should be merged automatically once you /lgtm and /approve.

@jbartosik (Collaborator) left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2021
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jbartosik, wangchen615

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Need support for customized recommender in Vertical Pod Autoscaler (VPA)
6 participants