-
Notifications
You must be signed in to change notification settings - Fork 276
feat(cert): use intent to set issuers during root cert rotation #5201
Conversation
I was able to rotate a new certificate in and remove the old one 🚀 |
one note:
maybe we can make that
|
41a84d7
to
482534d
Compare
After playing with it we landed on the following pattern:
|
name: State | ||
- description: Intent for the MeshRootCertificate | ||
jsonPath: .spec.intent | ||
name: Intent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I have been playing around with the MRC for the e2e tests and been thinking this field name might be better as Role
. for example:
"Moving MRC to Passive intent in cluster\n" compared to "Moving MRC [%s] -> Passive role cluster"
"The MRC is in the Active role in the cluster" compared to "The MRC is in the Active intent in the cluster"
Thoughts? @jaellio @keithmattix @steeling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with that; this is the time for naming changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya that sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay! I will make the update in a follow up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Role makes sense since we're ultimately representing what the certificate can and cannot do.
d845887
to
81d872d
Compare
The cert rotation tests are passing:
|
@@ -490,7 +450,7 @@ func buildMeshRootCertificate(presetMeshRootCertificateConfigMap *corev1.ConfigM | |||
APIVersion: "config.openservicemesh.io/configv1alpha2", | |||
}, | |||
ObjectMeta: metav1.ObjectMeta{ | |||
Name: meshRootCertificateName, | |||
Name: constants.DefaultMeshRootCertificateName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ensures that the default mesh root certificate always exists. but which essentially prevents users from deleting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a change in behavior for this PR, this was a local constant (see line 52 above) and I moved it to the constant so it would be in one place to be used across tests as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can add a TODO at least?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steeling Could you explain more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
// PassiveIntent means the settings and certificate provider in this MRC are used for validating | ||
// certificates. | ||
PassiveIntent MeshRootCertificateIntent = "passive" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"passive means its ued for validating", maybe it should be called validateOnly?
Non-blocking comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally like passive but if there is strong agreement here, we should switch active
to something like signing
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ValidateOnly and Signing make understanding the code a bit more straightforward for newcomers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was initially leaning this way as well, but in that case does signing mean "signingAndValidating" since its doing both? The ambiguity and verbosity felt to much. For this reason, after sitting on it for awhile and trying out several rotations I began to like passive
and active
@jaellio you presented this at the servicemeshcon did you hear any feedback in either direction?
|
||
var ( | ||
validity = time.Hour | ||
osmConfigClient "github.com/openservicemesh/osm/pkg/gen/client/config/clientset/versioned" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imported/aliased package names should not contain uppercases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Is this in the uber style guide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find a linter that enforced this but I found this which has a little dialog https://stackoverflow.com/a/20864492/697126
pkg/certificate/fake_manager.go
Outdated
) | ||
|
||
// fakeMRCClient implements the MRCClient interface | ||
type fakeMRCClient struct{} | ||
type fakeMRCClient struct { | ||
configClient osmConfigClient.Interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're adding this here, we may as well remove this altogether and use the MRCComposer
, which has an interface for the compute client. We can leverage a fake or mock compute client for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fakeMRCClient in this pkg exists for use in tests to avoid a circular dependency with pkg/certificate/providers. Since the MRCComposer is also defined in pkg/certificate/providers this would result in a circular dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's likely indicative of a separate issue then. maybe we should bring MRCComposer into the certificate package, or into it's own subpackage so it can be used in both packages independently
type UseCase string | ||
func (m *Manager) handleMRCEvent(event MRCEvent) error { | ||
log.Debug().Msgf("handling MRC event for MRC %s", event.MRCName) | ||
// TODO(#5226): optimize event handling to reduce cost of listing all MRCs for each event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we could simply keep the MRC on the issuer struct, and then we have the 2 mrc's we care about. This seems like a simple enough change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I won't block, but I'll point it that it looks like we could do a drop in replacement into this code:
...
desiredSigning, desiredValidating := m.getSigningAndValidating(event.MRC)
// note: no need to filter out inactive anymore, or call ListMeshRootCertificates
...
func getSigningAndValidating(newMRC MRC) {
mrcs := make(map[string]MRC)
mrcs[m.signingIssuer.ID] = m.signingIssuer.mrc
mrcs[m.validatingIssuer.ID] = m.validatingIssuer.mrc
mrcs[newMRC.Name] = newMRC
if len(mrcs) > 2 {.../*err*/ }
... // continue as normal
}
I think using a map like this would also clean up some of the logic in that method. But again, non-blocking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example of a case where this gets tricky is when a controller restarts. In that case the existing manager won't have any sign/validating issuers set. By approaching it this way we are handling the events in a trigger level fashion.
This means that on the first event that comes through the issuer/validator would be the same but this might not be the case, there could be more than one Active/Passive. In this case there are some conditions where the controller could start to re-issue certificates since the ShouldRotate
function would see this as a change. Once more events came through it would be normalize but this could be a spot where we have strange behavior intermittently.
Using a custom indexer (#5226) that only contains active MRCs we could be more level triggered and solve this problem of having to filter and list all the MRCS.
Maybe we could combine the two options as well, which would allow us to exit earlier in the case things are set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds simpler then to call list exactly one time at startup, then use the events for the remainder of the lifetime of the manager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking at this point
pkg/certificate/mrc_reconciler.go
Outdated
transitionAfter := mrc.Status.TransitionAfter | ||
if transitionAfter == nil { | ||
return false | ||
if mrc1 == nil || mrc2 == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: i don't think this possible, but defense in depth is not necessarily a bad thing.
I'd consider a validateMRC function, which has this check, and others like ValidateIntent. This seems like it would be useful in general, and would allow greater re-use in ie: a validating webhook
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this logic could be combined in a validateMRC method. This would help reduce duplicate code between the validating webhook for the mrc and the mrc reconciler. I think it makes sense to update this in #5220 when it more clear what the commonalities are for mrc validation in the webhook and the reconciler.
type UseCase string | ||
func (m *Manager) handleMRCEvent(event MRCEvent) error { | ||
log.Debug().Msgf("handling MRC event for MRC %s", event.MRCName) | ||
// TODO(#5226): optimize event handling to reduce cost of listing all MRCs for each event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I won't block, but I'll point it that it looks like we could do a drop in replacement into this code:
...
desiredSigning, desiredValidating := m.getSigningAndValidating(event.MRC)
// note: no need to filter out inactive anymore, or call ListMeshRootCertificates
...
func getSigningAndValidating(newMRC MRC) {
mrcs := make(map[string]MRC)
mrcs[m.signingIssuer.ID] = m.signingIssuer.mrc
mrcs[m.validatingIssuer.ID] = m.validatingIssuer.mrc
mrcs[newMRC.Name] = newMRC
if len(mrcs) > 2 {.../*err*/ }
... // continue as normal
}
I think using a map like this would also clean up some of the logic in that method. But again, non-blocking
|
||
// if the issuer has already been created for the specified MRC, | ||
// return the existing issuer | ||
if signingIssuer != nil && mrc.Name == signingIssuer.ID { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we have this check here, then we can be less defensive about the shouldUpdateIssuer, and remove the checks on whether the desired is already set or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think shouldUpdateIssuer
should keep the check if the issuers are already set to their desired values. If this condition is true, the issuers should not be updated and therefore shouldUpdateIssuer
should return false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this handles the scenario desiredSigningMRC != desiredValidatingMRC and both MRCs have active intents, their state is non-deterministic.
?
Even if it does, I kind of like being explicit about though and exiting the reconciliation early. It isn't immediately obvious that there are no changes otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave a quick thought on this extra defense in-depth:
You can find a lot of literature around unit testing methods in go, which typically mention there's no need to test unexported funcs. Expanding on this, we can think of unexported funcs as internals that are subject to change, and wholly within the control of the developers, while exported methods should be tested, because they define the API of our package, and can be used throughout the codebase.
Indeed, if a method becomes used frequently enough it can become difficult to change its behavior, and with many developers working on a project there tends to be specialization among packages, so good API's become more prudent as a dev workign on package A may not be familiar with the internals of package B. This is less true for unexported methods which are contained within the package.
Similarly, an exported method should be more defensive about parameter sanitization, while an internal method less so.
So without the need to add extra defense in-depth, we're left with the question of should we? There's 2 considerations here: readability, and defense from ourselves.
On one hand, the extra defense in-depth may hurt readability. It tells our future readers that a value is possible, and we should protect from it, even when it is currently not possible (again we don't have these guarantees at the API level). It creates an extra case in our state machine that we have to hold in our mental model.
On the other hand, we protect from our future selves by limiting the chance of a future caller not sanitizing inputs correctly (however if we are sanitizing at the API level, then any internal calls should be safe)
Personally, I err on the side that we read code much more than we write it, so we should aim for readability, but I'll defer to others on this.
Sorry for the long text on a mostly benign topic :)
} | ||
if validatingIssuer != nil && mrc.Name == validatingIssuer.ID { | ||
return validatingIssuer, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you may also want to check if we are promoting/demoting an MRC, to avoid an extra call to GetCertIssuerForMRC
, if we're swapping the issuers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this case is already handled by the existing conditionals. For a given MRC we check if the name matches the ID of either issuer and return the issuer on a match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah gotcha, sounds like a good thing for a test case :)
dc33a90
to
1f029bf
Compare
Signed-off-by: jaellio <[email protected]> refactored mrc reconciler methods Signed-off-by: jaellio <[email protected]> tests Signed-off-by: Kalya Subramanian <[email protected]> Begin work on e2es update logging Signed-off-by: jaellio <[email protected]> Response to intial PR feedback - removed deacative as an intent Signed-off-by: jaellio <[email protected]> Finish tests to validate basic rotation Signed-off-by: James Sturtevant <[email protected]> Use active/active pattern Signed-off-by: James Sturtevant <[email protected]> - handle single MRC case in each method rather than seperately - shouldSetIssuers check - non-deterministic active, active intent check to avoid repeated updates to issuers Signed-off-by: jaellio <[email protected]> - test for filtering inactive MRCs Signed-off-by: jaellio <[email protected]> - simplified logic for handling 1 MRC in control plane - added more unit tests - moved HandleMRCEvent unit test to mrc_reconciler_test.go Signed-off-by: jaellio <[email protected]> - clean up lint errors and commented out code Signed-off-by: jaellio <[email protected]> - remove duplicate switch case in shouldSetIssuer and setIssuer Signed-off-by: jaellio <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: jaellio <[email protected]>
Signed-off-by: jaellio <[email protected]>
have active intents, and the issuers are not set to the desired values, but match either MRC, then the issuers do not need to be updated Signed-off-by: jaellio <[email protected]>
- simplified validaion of MRC intent and their combinations - simplified switch state to two conditional checks to determine validating and signing MRCs - removed unused MRC constants from constants.go - removed noop status update in osm bootstrap Signed-off-by: jaellio <[email protected]>
- remove validateMRCIntent Signed-off-by: jaellio <[email protected]>
unexpected calls Signed-off-by: jaellio <[email protected]>
7f5e44e
to
1fae622
Compare
Signed-off-by: jaellio <[email protected]>
dd57e57
to
bf5f16d
Compare
@keithmattix a few changes have been made since your last look, would you take another look? |
Thanks for the ping! Will take a look! |
} | ||
return nil | ||
return mrcList[:n] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this protecting against potentially 3 active MRC's? That seems like a critical issue and we should return an error?
func isTerminal(mrc *v1alpha2.MeshRootCertificate) bool { | ||
// TODO(5046): check if the mrc is in a terminal state. | ||
return true | ||
func (m *Manager) updateIssuers(signingIssuer, validatingIssuer *issuer) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function doesn't need to return an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit, but nothing worth blocking this PR on. I'd prefer to bias towards action and get this merged and iterate via separate PRs. Nice work!
@@ -136,6 +107,7 @@ var testPresetMeshRootCertificate = &corev1.ConfigMap{ | |||
}, | |||
Data: map[string]string{ | |||
presetMeshRootCertificateJSONKey: `{ | |||
"intent": "Active", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the casing probably matters here; should be active
instead of Active
Description:
The PR includes the changes required to support an MVP of manual root certificate rotation. With these changes, a user will be able to manually create and update the
intent
of the MRCs and rotate the root certificate without downtime. OSM utilizes a level-triggered approach to bring the state of the mesh to match the MRC intent.There are 3 possible values for intent. Each describes the desired role of the provider and certificate settings specified in the MRC:
To rotate the root certificate or update the certificate provider, the user must perform the following actions:
There is an existing MRC, MRC1 and a new MRC will be created, MRC2
inactive
intent -> MRC1.intent = active, MRC2.intent = inactivepassive
. Wait to allow the certificates to be rotated -> MRC1.intent = active, MRC2.intent = passiveactive
-> MRC1.intent = active, MRC2.intent = activepassive
. Wait to allow the certificates to be rotated -> MRC1.intent = passive, MRC2.intent = activeinactive
-> MRC1.intent = inactive, MRC2.intent = activeNote: All combinations of intent are deterministic except for
active
andactive
. The order in which the MRCs are listed determines which will be thesigningIssuer
or thevalidatingIssuer
.Initial proposal for root certificate rotation simplification: https://docs.google.com/document/d/1bh3Yfn5DXWjmzdKa4XHlk-9KepUNb_EgDDrzd1G_deY/edit?usp=sharing
Resolves #5198
Part of #4998
Testing done:
Affected area:
Please answer the following questions with yes/no.
Does this change contain code from or inspired by another project? No
Is this a breaking change? No
Has documentation corresponding to this change been updated in the osm-docs repo (if applicable)? No