This repository was archived by the owner on Apr 17, 2025. It is now read-only.

Don't throw away errors from ShouldPropagate #82

Closed
adrianludwin opened this issue Sep 21, 2021 · 9 comments
Assignees
Labels
good first issue: Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@adrianludwin
Contributor

adrianludwin commented Sep 21, 2021

See comment. We're discarding errors but we shouldn't be; they should be propagated upwards like any other error, or at least logged as an error. (Update Sep 25: we should log them; see the discussion below.)

/cc @joe2far

@adrianludwin
Contributor Author

/good-first-issue

@k8s-ci-robot
Contributor

@adrianludwin:
This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.

In response to this:

/good-first-issue

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the "good first issue" and "help wanted" labels on Sep 21, 2021
@yashvardhan-kukreja

/assign

@yashvardhan-kukreja

Hi @adrianludwin 👋🏼
When an error occurs, should I just log it and let the loop keep iterating, or should I break out of the loop right away by returning the error, so that it propagates up the call stack?

I prefer the latter, but as I'm new I might be missing something, so I'd love to hear your suggestion :)

@adrianludwin
Contributor Author

Ok I had a look and I can see why we're not propagating these up. The scenario where you could get an error is:

  • Someone puts a bad selector on an object. This should never happen, but let's say there was a bug and one got in.
    • This selector is basically now ignored
  • Someone tries to validate another change that might affect this object (or one of its propagated copies). We call ShouldPropagate to see if there's a conflict, but instead we get the error due to the actions of the earlier editor.

In this case, we should not propagate the error back upwards, as it would effectively block everyone else without giving them any way to resolve the issue. This seems suboptimal.

Instead, I'd vote for logging the error, e.g. something like this:

log.Error(err, "Invalid selector")

That way, if the admin is monitoring errors, they'll get a flood of them whenever we might be making the wrong choice, which is exactly what we want. The log object isn't currently passed down to the places that call ShouldPropagate, but this should be easy to solve.
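
To make that concrete, here's a rough sketch of what a call site could look like once a logger is passed down. The package name, wrapper name and the ShouldPropagate signature shown here are assumptions for illustration, not the actual HNC code:

package objects // hypothetical package name

import (
	"github.com/go-logr/logr"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/labels"
)

// shouldPropagateOrLog is a hypothetical wrapper around ShouldPropagate: it
// logs the error instead of returning it and falls back to "don't propagate",
// so one object's bad selector doesn't block everyone else's changes.
func shouldPropagateOrLog(log logr.Logger, inst *unstructured.Unstructured, nsLabels labels.Set) bool {
	// Assumed signature for ShouldPropagate: (bool, error).
	ok, err := ShouldPropagate(inst, nsLabels)
	if err != nil {
		log.Error(err, "Invalid selector; treating object as not propagated")
		return false
	}
	return ok
}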

The one downside here is that it looks like the name of the object that has the bad selector isn't going to be identified. We could add it individually to each of the places where we log the error, but IMO it's better to add this information to the error itself when it's first generated, i.e. in GetSelector, GetTreeSelector and GetNoneSelector. Each of these functions (which are the only ones that can cause an error in ShouldPropagate) knows the object that's causing the error, so IMO that's the right place to report it.

To take the simplest example of GetNoneSelector, I'd change

		return false,
			fmt.Errorf("invalid %s value: %w",
				api.AnnotationNoneSelector,
				err,
			)

to something like this:

		return false, fmt.Errorf("invalid %s value on %s/%s (%s): %w",
			api.AnnotationNoneSelector, inst.GetNamespace(), inst.GetName(), inst.GroupVersionKind(), err)

This will give a string like invalid propagate.hnc.x-k8s.io/none value on foo-ns/bar (Secret): <lower level error>. In GetTreeSelector the error can be returned from many different places, so you'll probably want to precompute the "%s/%s (%s)" identifier string once and reuse it at each return, but that's pretty easy to do; see the sketch below.
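
For GetTreeSelector, that would be roughly the following; the annotation constant, the return value and the surrounding code are illustrative assumptions rather than the exact HNC source:

// Near the top of GetTreeSelector: build the "namespace/name (GVK)" fragment
// once so that every error return can reuse it.
objID := fmt.Sprintf("%s/%s (%s)", inst.GetNamespace(), inst.GetName(), inst.GroupVersionKind())

// ...then at each existing error return, something like:
return nil, fmt.Errorf("invalid %s value on %s: %w", api.AnnotationTreeSelector, objID, err)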

I wouldn't worry about testing this too much, since we're just talking about improving an error message and logging errors that were previously thrown away. If you like, you could create a bad selector on an object by disabling all the webhooks, making a bad change, re-enabling the webhooks, and then checking the logs, but it's not strictly necessary.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Dec 25, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jan 24, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
