Add support for a custom service account for hub templates #236
Conversation
@@ -98,6 +102,9 @@ type PolicySpec struct {
 	// PolicyDependencies is a list of dependency objects detailed with extra considerations for
 	// compliance that should be fulfilled before applying the policies to the managed clusters.
 	Dependencies []PolicyDependency `json:"dependencies,omitempty"`
+
+	// HubTemplateOptions changes the default behavior of hub templates.
+	HubTemplateOptions *HubTemplateOptions `json:"hubTemplateOptions,omitempty"`
This is a pointer so that the omitempty has effect rather than returning an empty struct when hubTemplateOptions is unset.
Do we need to create HubTemplateOptions.ServiceAccountName, given that it is only one property?
We don't need it but I chose it in case we make other things customizable and I thought a hubTemplateServiceAccountName field was too long lol.
Grouping is prettier, and it may be good for the future.
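
For reference, here is a minimal sketch of the API shape being discussed; the field and type names come from the diff and the PR description, and the doc comments are paraphrased:

type PolicySpec struct {
	// ... existing fields such as Dependencies ...

	// HubTemplateOptions changes the default behavior of hub templates.
	// A pointer is used so that omitempty takes effect when the field is unset.
	HubTemplateOptions *HubTemplateOptions `json:"hubTemplateOptions,omitempty"`
}

// HubTemplateOptions groups the hub template customizations so that more
// options can be added later without new top-level fields.
type HubTemplateOptions struct {
	// ServiceAccountName is the name of a service account in the root policy
	// namespace to use when resolving hub templates.
	ServiceAccountName string `json:"serviceAccountName,omitempty"`
}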
if errors.Is(tmplErr, ErrRetryable) {
	// Return the error if it's retryable, which will utilize controller-runtime's exponential backoff.
	return reconcile.Result{}, tmplErr
} else if errors.Is(tmplErr, ErrSAMissing) {
else is not needed?
It's not, because template errors that are not retryable must continue so that the replicated policy has the hub template error annotation set.
I'm not sure what's going on with the test. It's failing every time in CI but working every time locally...
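
Returning to the retryable-error handling above, here is a minimal sketch of the control flow; the sentinel error definitions and the annotation helper are assumptions for illustration:

// Hypothetical sentinel errors matching the names used in the diff.
var (
	ErrRetryable = errors.New("the template error is retryable")
	ErrSAMissing = errors.New("the service account is missing")
)

if errors.Is(tmplErr, ErrRetryable) {
	// Returning the error uses controller-runtime's exponential backoff.
	return reconcile.Result{}, tmplErr
} else if errors.Is(tmplErr, ErrSAMissing) {
	// No early return: non-retryable errors must continue so that the
	// replicated policy gets the hub template error annotation set.
	setHubTemplateErrorAnnotation(replicatedPolicy, tmplErr) // hypothetical helper
}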
test/e2e/case9_templates_test.go (outdated)
g.Expect(err).ToNot(HaveOccurred())
g.Expect(refreshedToken).ToNot(BeNil())
g.Expect(bytes.Equal(token, refreshedToken)).To(BeFalse(), "Expected the token to have been refreshed")
}, 70, 3).Should(Succeed())
Why 70 not 60?
Just because it could be exactly 60 seconds before the token refreshes, so I gave it a bit of a buffer. I could have made it 65 seconds though.
I don't think it's a big deal as long as the token changes. I don't think anyone will complain that it takes over 1 minute.
}

// RemovePolicy will clean up watches on the current template resolver (service account used in the last call to
// GetResolver) and if this was the last policy using this template resolver, the template resolver will be cleaned up.
Root policy or replicated policy? It is confusing. Based on the referenceCount description, it is a replicated policy, right?
Yup, it's a replicated policy. I can rename the method or clarify the description. Any preference?
RemoveRePolicy? I don't know, Dale is good at this lol
globalLock sync.RWMutex
// The default template resolver when a replicated policy does not specify a service account.
defaultTemplateResolver *templates.TemplateResolver
// A map of service accounts to template resolvers. Access to this must leverage globalLock.
Suggested change:
- // A map of service accounts to template resolvers. Access to this must leverage globalLock.
+ // A map of service accounts to templateResolverWithCancel. Access to this must leverage globalLock.

??
Or just "template resolver", singular.
This is amazing! You are the best Go concurrency expert.
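
As a side note on the locking, here is a hypothetical accessor showing how globalLock might guard the shared map; the map field name and the method itself are assumptions, not code from this PR:

// getResolverForSA looks up the template resolver for a service account.
// Reads of the shared map take the read lock; code that adds or removes
// entries would take the write lock instead.
func (r *ReplicatedPolicyReconciler) getResolverForSA(saName string) (*templateResolverWithCancel, bool) {
	r.globalLock.RLock()
	defer r.globalLock.RUnlock()

	resolver, found := r.templateResolvers[saName]

	return resolver, found
}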
/hold for reviews
@@ -408,7 +432,5 @@ func (r *ReplicatedPolicyReconciler) processTemplates(
 		}
 	}
-
-	log.V(1).Info("Successfully processed templates")
This log message is removed because processTemplates is now always called, and when no templates are present, it does nothing. This is to clean up watches if hub templates used to be present but are now removed.
Seems about right 😅
var returnErr error

// Retry template errors due to permission issues. This isn't ideal, but there's no good event driven way to
// be notified when the permissions are given to the service account.
if k8serrors.IsForbidden(tmplErr) {
	returnErr = tmplErr
}

if watcherErr != nil {
	log.Info("Requeueing for the dynamic watcher error")

	returnErr = watcherErr
}

-return reconcile.Result{}, watcherErr
+return reconcile.Result{}, returnErr
I wonder if customizing the retry interval for Forbidden errors might be nice. So maybe something like this:
if watcherErr != nil {
	log.Info("Requeueing for the dynamic watcher error")

	return reconcile.Result{}, watcherErr
}

// Retry template errors due to permission issues. This isn't ideal, but there's no good event driven way to
// be notified when the permissions are given to the service account.
if k8serrors.IsForbidden(tmplErr) {
	log.Info("Requeueing in 2 minutes for the permission issue")

	// returning the error would override the RequeueAfter setting
	return reconcile.Result{RequeueAfter: 2 * time.Minute}, nil
}

return reconcile.Result{}, nil
// the template resolver will be cleaned up. A pointer is used since the reference count can be copied to a new
// templateResolverWithCancel instance if the TemplateResolver instance was unexpectedly canceled and was
// restarted.
referenceCount *atomic.Uint32
I can't shake the feeling that doing our own reference counting means we're missing some other way to do this.
The only other way I could think of that would work, other than periodically checking whether any root policy still references this service account, was to have a map of service accounts to replicated policies. I thought the reference count was better to avoid locking around a shared map. It also saves a bit of memory.
Another way that came to mind was using a read-write lock for each template resolver: all replicated policies would hold a read lock, and a separate goroutine would wait for the write lock, which would indicate there are no more readers left and the template resolver can be cleaned up. The problem is that the read-write lock in the Go standard library doesn't work like that, since it prioritizes granting the write lock when it can, which makes sense.
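
For illustration, here is a minimal sketch of the reference-counting pattern being discussed; the referenceCount field matches the diff, while the other fields and the acquire/release helpers are assumptions showing the increment and decrement flow:

type templateResolverWithCancel struct {
	resolver *templates.TemplateResolver
	cancel   context.CancelFunc
	// A pointer so the count can be copied to a new instance if the
	// TemplateResolver was unexpectedly canceled and restarted.
	referenceCount *atomic.Uint32
}

// acquire is called for each replicated policy that starts using this resolver.
func (t *templateResolverWithCancel) acquire() {
	t.referenceCount.Add(1)
}

// release is called when a replicated policy stops using this resolver. The
// last release cancels the resolver's context so its watches are cleaned up.
func (t *templateResolverWithCancel) release() {
	if t.referenceCount.Add(^uint32(0)) == 0 { // atomic decrement by one
		t.cancel()
	}
}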
A new field of spec.hubTemplateOptions.serviceAccountName can be set to leverage a service account in the root policy namespace to resolve hub templates instead of the restricted hub templating of referencing only objects in the root policy namespace. This leverages the TokenRequest API to get short-lived tokens of the service accounts and periodically refreshes them before expiration.

Relates: https://issues.redhat.com/browse/ACM-12790

Signed-off-by: mprahl <[email protected]>
This was covering up an error during test development.

Signed-off-by: mprahl <[email protected]>
Seems good to me.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JustinKuli, mprahl

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/unhold
Merged commit fe30aeb into open-cluster-management-io:main
A new field of spec.hubTemplateOptions.serviceAccountName can be set to leverage a service account in the root policy namespace to resolve hub templates instead of the restricted hub templating of referencing only objects in the root policy namespace.
This leverages the TokenRequest API to get short-lived tokens of the service accounts and periodically refreshes them before expiration.
Relates: https://issues.redhat.com/browse/ACM-12790
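
To make the token flow concrete, here is a minimal sketch of requesting a short-lived service account token through the TokenRequest API with client-go; the expiration value and the function shape are illustrative assumptions rather than this PR's exact implementation:

import (
	"context"
	"time"

	authenticationv1 "k8s.io/api/authentication/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// requestToken asks the API server for a short-lived token for the given
// service account. The caller should refresh before the returned expiration.
func requestToken(
	ctx context.Context, client kubernetes.Interface, namespace, saName string,
) (string, time.Time, error) {
	expiration := int64(3600) // illustrative lifetime; the real value is an assumption

	tokenRequest, err := client.CoreV1().ServiceAccounts(namespace).CreateToken(
		ctx,
		saName,
		&authenticationv1.TokenRequest{
			Spec: authenticationv1.TokenRequestSpec{ExpirationSeconds: &expiration},
		},
		metav1.CreateOptions{},
	)
	if err != nil {
		return "", time.Time{}, err
	}

	return tokenRequest.Status.Token, tokenRequest.Status.ExpirationTimestamp.Time, nil
}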