Skip to content

feat: serve recording rules from config in tenant-settings #4299

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 16, 2025

Conversation

alsoba13
Copy link
Contributor

@alsoba13 alsoba13 commented Jul 14, 2025

In this PR we provide tenant-settings the ability to load recording rules from config (apart from existing storage rules). So now it needs to handle rules from storage and rules from config. Note following properties:

  • For consistency, loaded rules from config will come with a deterministic ID.
  • Listing rules now get storage and config rules and merge them (no deduplication).
  • Upsert now will also check if you are trying to update a provisioned rule, and fail in that case.
  • Delete will never find a provisioned rule as it only looks for them in storage.

I've detected some improvement points to do in a following PR (or maybe this one):

  • Right now, delete won't return Not found if requested rule Id is not present, it returns internal error instead.
  • Tests: we lack a lot of coverage here, we only unit test some private functions, but not the core functionality of the module.

}
}

type RecordingRules struct {
cfg Config
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused

return connect.NewResponse(&settingsv1.GetRecordingRuleResponse{Rule: r}), nil
}
}
return nil, connect.NewError(connect.CodeNotFound, fmt.Errorf("no rule with id='%s' found", req.Msg.Id))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case was considered connect.CodeInternal error (in s.Get). Rewrote this to be a 404 in case there's no storage/config rule hitting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah 404 feels correct

@alsoba13 alsoba13 marked this pull request as ready for review July 14, 2025 15:16
@alsoba13 alsoba13 requested review from simonswine, marcsanmi, aleks-p and a team as code owners July 14, 2025 15:16
@@ -86,6 +86,9 @@ message RecordingRule {
// This allows recording rules to focus on specific functions and calculate their "total"
// resource usage.
optional StacktraceFilter stacktrace_filter = 8;

// Provisioned rules are added by config and can't be Upsert or Deleted
bool provisioned = 9;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should help UI to disable delete icon AND/OR to mark it as provisioned

Copy link
Contributor

@simonswine simonswine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this.

The main thing I like you to change before merging is what store takes priority.

We currently have it inconsistent and IMHO the overrides (or rules from config), should always win. As this allows us most operation flexibility.

Comment on lines 98 to 104
res := &settingsv1.ListRecordingRulesResponse{
Rules: make([]*settingsv1.RecordingRule, 0, len(rules.Rules)),
Rules: make([]*settingsv1.RecordingRule, 0, len(rulesFromStore.Rules)+len(rulesFromOverrides)),
}
for _, rule := range rules.Rules {
for _, rule := range rulesFromStore.Rules {
res.Rules = append(res.Rules, convertRuleToAPI(rule))
}
res.Rules = append(res.Rules, rulesFromOverrides...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check if there is a conflict between a rule from the API and a rule from overrides.

I think rules from overrides should always "win". This would allow us to override a misbehaving rule fairly quickly without having to fire requests/go to the bucket.

So basically

  • You have an API rule `the-killer-rule'
  • At a later stage we add an overriding rule in the overrides called the-killer-rule, do nothing.

Now it behaves there would only be a provisioned the-killer-rule, so the rule in the API store cannot be list/read/deleted/updated until we remove the override.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay as a recap I'll build a proper override system as follows:

  • Rules coming from config may have an ID, this may be the same a one generated through API (UI).
  • For consistency, rules coming from config without ID will be filled with a deterministic autogenerated ID.
  • When calling Get, we will return overridden rules first, then API rules.
  • When calling List, we will return rules with unique IDs, so we filter out any API rule that was overridden (this is important as this is the call the compaction-worker does).
  • When calling Delete, we will only check for API rules (ignore overrides and straightly go to delete the OG id).
  • When calling Upsert, we can do a similar logic as delete: only update API rules (ignoring overrides), and insert rules that doesn't repeat an ID in overrides.

The operational idea behind is:

  • Given a broken rule, we can create an override. Then, we can fix it through bucket or Delete/Upsert operation.
  • We'd lose the capability of call Delete/Upsert easily from the UI, as provisioned rules won't offer the delete 🗑️ icon. But we can still make a call by other means to fix/delete the rule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this all sounds good to me. Thanks for writing this summary 👍

}
return connect.NewResponse(res), nil

// look for provisioned rules:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to do this first and only return the one from the API store, when we do have no override.

return connect.NewResponse(&settingsv1.GetRecordingRuleResponse{Rule: r}), nil
}
}
return nil, connect.NewError(connect.CodeNotFound, fmt.Errorf("no rule with id='%s' found", req.Msg.Id))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah 404 feels correct

@@ -128,7 +155,7 @@ func (r *RecordingRules) DeleteRecordingRule(ctx context.Context, req *connect.R
return nil, connect.NewError(connect.CodeInvalidArgument, fmt.Errorf("invalid request: %v", err))
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy the check from the Upsert method into here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted to leave upsert and delete as is. This way if a rule is overridden by config, we can still update/delete the original, so these two operations just ignore overrides and go directly to store

@alsoba13 alsoba13 merged commit 506840f into main Jul 16, 2025
24 checks passed
@alsoba13 alsoba13 deleted the alsoba13/tenant-settings-rules-from-overrides branch July 16, 2025 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants