
Set the database ID annotations on replicated policies #165



Merged
merged 2 commits into from
Feb 5, 2024



mprahl
Member

@mprahl mprahl commented Jan 25, 2024

Note that the second commit enables the compliance API to be exposed from the KinD cluster.

When the compliance events API is enabled, the replicated policies will
contain the annotation
policy.open-cluster-management.io/parent-policy-compliance-db-id and
each of its policy-templates entries will have the
policy.open-cluster-management.io/policy-compliance-db-id annotation.
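A minimal sketch of where the two annotations end up. The annotation keys are the ones named above; `replicatedPolicy`, `policyTemplate`, and `setDBIDAnnotations` are hypothetical, simplified stand-ins for the real Policy API types and controller code, and integer IDs are an assumption.

```go
package main

import "strconv"

// Annotation keys as named in the PR description.
const (
	parentPolicyDBIDAnnotation = "policy.open-cluster-management.io/parent-policy-compliance-db-id"
	policyDBIDAnnotation       = "policy.open-cluster-management.io/policy-compliance-db-id"
)

// policyTemplate and replicatedPolicy are hypothetical, simplified stand-ins
// for the real API types; only the annotations matter for this sketch.
type policyTemplate struct {
	Annotations map[string]string
}

type replicatedPolicy struct {
	Annotations     map[string]string
	PolicyTemplates []policyTemplate
}

// setDBIDAnnotations (hypothetical) stores the parent policy's database ID on
// the replicated policy and each template's database ID on its entry.
func setDBIDAnnotations(p *replicatedPolicy, parentID int, templateIDs []int) {
	if p.Annotations == nil {
		p.Annotations = map[string]string{}
	}
	p.Annotations[parentPolicyDBIDAnnotation] = strconv.Itoa(parentID)

	for i := range p.PolicyTemplates {
		if i >= len(templateIDs) {
			break // no ID known for the remaining templates
		}
		if p.PolicyTemplates[i].Annotations == nil {
			p.PolicyTemplates[i].Annotations = map[string]string{}
		}
		p.PolicyTemplates[i].Annotations[policyDBIDAnnotation] = strconv.Itoa(templateIDs[i])
	}
}
```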

As part of this, resilience to database connection losses and changes
was added. This is done by monitoring the database connection. If the
database connection is down, the replicated policy controller will queue
up reconcile requests to add the database-specific annotations if the
IDs aren't already cached. Once the database connection is restored,
the queued reconcile requests will be triggered.
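The queue-then-retrigger behavior could look roughly like the following sketch. `dbMonitor`, `reconcileRequest`, and the `trigger` callback are hypothetical names; a real controller would more likely re-enqueue through controller-runtime, whose workqueue also deduplicates requests for the same object.

```go
package main

import "sync"

// reconcileRequest is a hypothetical stand-in for a controller-runtime reconcile.Request.
type reconcileRequest struct {
	Namespace string
	Name      string
}

// dbMonitor (hypothetical) tracks whether the database connection is up and
// holds reconcile requests that could not complete while it was down.
type dbMonitor struct {
	mu        sync.Mutex
	connected bool
	queued    []reconcileRequest
	trigger   func(reconcileRequest) // re-enqueues a request with the controller
}

// Connected reports whether the database connection is currently up.
func (m *dbMonitor) Connected() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.connected
}

// Defer records a request to retry once the connection comes back.
func (m *dbMonitor) Defer(req reconcileRequest) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.queued = append(m.queued, req)
}

// SetConnected updates the connection state. On a down-to-up transition it
// drains the queue and triggers every deferred reconcile.
func (m *dbMonitor) SetConnected(up bool) {
	m.mu.Lock()
	wasUp := m.connected
	m.connected = up
	var pending []reconcileRequest
	if up && !wasUp {
		pending = m.queued
		m.queued = nil
	}
	m.mu.Unlock()

	// Trigger outside the lock so the callback can call back into the monitor.
	for _, req := range pending {
		m.trigger(req)
	}
}
```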

Note that a database migration is now run every time the database
connection changes. The migration is idempotent; it is there to catch
the case where the database server has been swapped out or restored
from an older backup that may not have the latest database schema.
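Why an idempotent migration is safe to run on every connection change can be shown with an in-memory sketch of versioned migrations. `fakeDB`, `migrate`, and the table names are hypothetical and do not reflect the project's actual schema or migration tooling.

```go
package main

// migration is one hypothetical schema change identified by a version number.
type migration struct {
	version int
	apply   func(schema map[string]bool)
}

// fakeDB is an in-memory stand-in for a database: its tables plus the
// highest migration version recorded as applied.
type fakeDB struct {
	version int
	schema  map[string]bool
}

// Hypothetical table names, for illustration only.
var migrations = []migration{
	{1, func(s map[string]bool) { s["table_a"] = true }},
	{2, func(s map[string]bool) { s["table_b"] = true }},
	{3, func(s map[string]bool) { s["table_c"] = true }},
}

// migrate applies only the migrations newer than the recorded version and
// returns how many ran. On an up-to-date database it is a no-op; on a server
// restored from an older backup it brings the schema forward again.
func migrate(db *fakeDB) int {
	if db.schema == nil {
		db.schema = map[string]bool{}
	}
	applied := 0
	for _, m := range migrations {
		if m.version <= db.version {
			continue
		}
		m.apply(db.schema)
		db.version = m.version
		applied++
	}
	return applied
}
```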

Relates:
https://issues.redhat.com/browse/ACM-6889

@mprahl mprahl marked this pull request as draft January 25, 2024 16:25
@mprahl mprahl marked this pull request as ready for review January 25, 2024 20:05
@openshift-ci openshift-ci bot requested a review from dhaiducek January 25, 2024 20:05
@mprahl mprahl changed the title WIP: Set the database ID annotations on replicated policies Set the database ID annotations on replicated policies Jan 25, 2024
@mprahl
Member Author

mprahl commented Jan 25, 2024

This is now ready for review.

@@ -280,6 +284,7 @@ e2e-stop-instrumented:

.PHONY: e2e-test-coverage
e2e-test-coverage: E2E_TEST_ARGS = --json-report=report_e2e.json --output-dir=.
e2e-test-coverage: E2E_TEST_CODE_ARGS = --compliance-api-port=8385
Contributor


What is the difference between 8385 and 8384?

Member Author


If you have the KinD cluster running, it already exposes port 8384, so that port can't be reused when running e2e-test-coverage, since the controller is run outside of the cluster.
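To illustrate the conflict, a small sketch showing that a second listener cannot bind a port that is already held, which is what would happen if the out-of-cluster controller tried to reuse the port exposed by KinD. The helper `listenOn` is hypothetical; `net.Listen` is the standard library call.

```go
package main

import (
	"fmt"
	"net"
)

// listenOn (hypothetical) binds a TCP listener on addr, for example
// "127.0.0.1:8384". It fails if another process, such as a port exposed
// from a KinD cluster, already holds that address.
func listenOn(addr string) (net.Listener, error) {
	l, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, fmt.Errorf("compliance API port unavailable: %w", err)
	}
	return l, nil
}
```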

@@ -35,6 +37,35 @@ var (
ErrInvalidLabelValue = errors.New("unexpected format of label value")
)

type GuttedObject struct {
Member Author


FYI: I had to move GuttedObject to prevent an import loop.

Member

@JustinKuli JustinKuli left a comment


Does anything reset the cached IDs when the database connection changes? It seems like a migration might lead to those IDs being changed, but maybe I missed something.

Comment on lines 363 to 368
log.Error(
err,
"Failed to get the database ID of the parent policy",
"namespace", replicatedPolicy.Namespace,
"name", replicatedPolicy.Name,
)
Member


I was trying to work out what the rest of the function does after this, to see whether it could just queue up the request and return early, which might be cleaner and faster.

It seems like it would skip trying to connect to the database again and just use whatever it already has in its cache. But when the cache is missing an entry, it deletes that annotation from the policy. So if the cache is empty because the controller just restarted and (for some reason) it can't connect to the database yet, it seems like it removes all of those annotations. Is that right, and is it intentional?

Member Author


I was originally thinking you can't guarantee accuracy if the database is down and there is no cache entry, but you're right: we can verify that the replicated policy didn't change, based on the database unique constraints, and reuse the ID from the replicated policy.

I just added this.

@mprahl
Member Author

mprahl commented Jan 26, 2024

@JustinKuli good catch! I'm now clearing the cache after a database migration.
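A minimal sketch of an ID cache that is cleared whenever the connection changes and the migration runs. `idCache` and `onConnectionChange` are hypothetical names; clearing even when the migration fails is a design choice of this sketch, on the reasoning that the connection changed either way, so previously cached IDs may no longer match the database.

```go
package main

import "sync"

// idCache (hypothetical) is a concurrency-safe map from a policy key to its database ID.
type idCache struct {
	mu  sync.RWMutex
	ids map[string]int
}

func newIDCache() *idCache { return &idCache{ids: map[string]int{}} }

func (c *idCache) Get(key string) (int, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	id, ok := c.ids[key]
	return id, ok
}

func (c *idCache) Set(key string, id int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.ids[key] = id
}

// Clear drops every entry so later lookups go back to the database.
func (c *idCache) Clear() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.ids = map[string]int{}
}

// onConnectionChange runs the idempotent migration, then clears the cache so
// IDs from a swapped or restored database are never reused.
func onConnectionChange(c *idCache, migrate func() error) error {
	err := migrate()
	c.Clear()
	return err
}
```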


Signed-off-by: mprahl <[email protected]>
@@ -332,8 +335,77 @@ func (p *ParentPolicy) GetOrCreate(ctx context.Context, db *sql.DB) error {
return getOrCreate(ctx, db, p)
}

func (p ParentPolicy) key() string {
return fmt.Sprintf("%s;%v;%v;%v", p.Name, p.Categories, p.Controls, p.Standards)
func (p ParentPolicy) Key() string {
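For reference, a self-contained version of the key format from this diff. The struct is reduced to the four fields the format string uses, and the `[]string` field types are an assumption.

```go
package main

import "fmt"

// ParentPolicy is reduced to the fields used by Key(); the slice types are assumed.
type ParentPolicy struct {
	Name       string
	Categories []string
	Controls   []string
	Standards  []string
}

// Key uses the same format string as the diff: the name, categories,
// controls, and standards joined with semicolons.
func (p ParentPolicy) Key() string {
	return fmt.Sprintf("%s;%v;%v;%v", p.Name, p.Categories, p.Controls, p.Standards)
}
```

For example, a policy named `policy-a` with category `CM`, control `CM-2`, and standard `NIST SP 800-53` produces the key `policy-a;[CM];[CM-2];[NIST SP 800-53]`.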
Contributor

@yiraeChristineKim yiraeChristineKim Jan 29, 2024


Is this used as the parent policy's ID in the database? I can't find where this key is inserted into the database as an ID. If not, could we add resourceVersion and use this as the ID? ParentPolicy doesn't need to save its status in the database, right?


openshift-ci bot commented Feb 5, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mprahl, yiraeChristineKim


Needs approval from an approver in each of these files:
  • OWNERS [mprahl,yiraeChristineKim]

