feat(v2): recording rules of function names #4232

alsoba13 · 2025-06-06T13:29:42Z

This PR approaches recording rules of functions.

There's one part I'd like to change (marked as TODO), but as it mostly involves code style, and is not compromising performance yet, I prefer to introduce this PR with a clearer presented core logic.

You may want to read this before jumping to reviewing the code, as this may help:

Observer state

Data is organized and accessed sequentially this way:

Tenant	Dataset	Series	Rows/profiles
tenant1	service_name1	fingerprint1	row1
			row2
			row3
		fingerprint2	row4
	service_name2	fingerprint3	row5
		fingerprint4	row6
		fingerprint4	row7
tenant2 ...	service_name3 ...	fingerprint5 ...	row8 ...

Observer only see single rows (last column), and need to deduce if a context switch has occurred.
There are 3 types of context/states/scopes:

Tenant scope:
- A tenant switch will cause a flush of the recorded metrics, and fetch and init new recording rules for the new tenant.
Dataset scope:
- Symbols information is scoped to every dataset. During the compaction process, symbols within a dataset are gathered from different blocks and rewritten to a new block. Observe needs to detect dataset changes so symbolical information may be reset.
- Tenant's rules are filtered and narrowed to those that target the specific dataset.
- Dataset state holds lookup tables, that relate symbolical information to rules targeting those symbols.
Series scope:
- Every batch of rows with the same fingerprint may match some rules. On a series switch, every recording rule is evaluated to know which ones are relevant to the series.

Observing symbols

We will try to avoid processing symbols that don't matter to any rule. With that purpose, row is Observed before symbols. Observe will compute the state of the series so we know if the series matches some rule with a FunctionName.

Observer will completely ignore symbols when the current Series state has no FunctionName rules. In a later series, if there's one rule that matches and has a FunctionName, symbols will be observed and pointers symbol-to-rule will be computed. As we only want to process symbols once for every new symbol rewritten, we will compute lookup tables of all rules that contain FunctionNames, not only the ones matching the current Series. The reason for this is that in a later series, a new rule with FunctionName may match and maybe all the symbols were already observed, and hence missed.

As a summary, requirements/properties we find:

We want to Observe row before observing symbols.
Symbols are only observed if we need to.
The observed row has old symbols, so we need to observe again to get the rewritten ones. For this reason, I included a Flush function, that re-observes the symbol. I'd like to get rid of this, or in case we keep this 2-step observer, rename it (maybe we can have a Observe(row, state) function instead)

pkg/experiment/metrics/observer.go

alsoba13 · 2025-06-06T14:02:47Z

pkg/experiment/block/compaction.go

+	// TODO: Find an alternative to this
+	Flush(ProfileEntry)


Happy to find a way to avoid this. We want to observe row first, then symbols, then observe row again when symbols are rewritten. I tried some strategies but failed to make this interface simpler.

I think the most confusing I find the naming here:

Flush, is used already in the exporter and also there is flush().

Observe is not really observing yet until ObserveSymbols or Flush.

Maybe naming it Observe -> Prepare and Flush -> ObserveValues.

In terms of interface, I am not too sure if I would prefer a callback here, but it could be a possibility. It feels weird though with ObserveSymbols:

if m.observer != nil { observeValues := m.observer.Prepare(r) defer observeValues() }

I like the idea of deferred observation. Something like:

if observe := evaluate(entry); observe != nil { defer observe() }

I'm curious how it performs on widely spread functions, e.g., Go runtime.mallocgc. What I had in mind is that we could resolve symbols once per aggregate (we'd need to aggregate all sample values first) or at dataset boundaries. That would also allow for slightly clearer contracts. I'm not proposing we implement this unless we encounter performance issues – just something that might be helpful in the future.

simonswine

Thank you for working on this @alsoba13.

You have done a good job making your implementation easy to follow and it is well structured. I also think you taken a lot of optimisations already into account, so I think we should merge this at this stage and get some real world usage to inform us what to do next.

I think before you merge, please make sure you cover the new code paths with some tests.

simonswine · 2025-06-11T08:42:12Z

api/settings/v1/recording_rules.proto

+
+  // StacktraceFilterFunctionNameRegExp functionNameRegExp = 2;


Would remove those comments from here, to avoid confusing users.

simonswine · 2025-06-11T08:42:16Z

api/settings/v1/recording_rules.proto

+
+  // SELF = 1;


Would remove those comments from here, to avoid confusing users.

simonswine · 2025-06-11T09:09:36Z

pkg/experiment/block/compaction.go

+	// TODO: Find an alternative to this
+	Flush(ProfileEntry)


I think the most confusing I find the naming here:

Flush, is used already in the exporter and also there is flush().

Observe is not really observing yet until ObserveSymbols or Flush.

Maybe naming it Observe -> Prepare and Flush -> ObserveValues.

In terms of interface, I am not too sure if I would prefer a callback here, but it could be a possibility. It feels weird though with ObserveSymbols:

if m.observer != nil { observeValues := m.observer.Prepare(r) defer observeValues() }

pkg/experiment/metrics/observer.go

simonswine · 2025-06-11T09:17:45Z

pkg/phlare/phlare.go

@@ -593,6 +593,7 @@ func (f *Phlare) Run() error {
 					"service_git_ref":    serviceGitRef(),
 					"service_repository": "https://github.com/grafana/pyroscope",
 				},
+				//TenantID: "pyroscope-tenant",


Clean me up

simonswine · 2025-06-11T09:18:27Z

pkg/phlaredb/symdb/rewriter.go

 }

-func NewRewriter(w *SymDB, r SymbolsReader) *Rewriter {
+type SymbolsObserver interface {
+	// ObserveSymbols is called once new symbols has been rewritten. This method must not modify the symbols.


Worth noting that Prepare() should be called before

kolesnikovae

LGTM

Thanks Alberto for adding this! It looks great. I think we can iterate on the API and performance improvements in the future

alsoba13 · 2025-06-20T11:01:11Z

Tests to cover observeSymbols will be added in a following PR

Since #4232 we added supported for it.

alsoba13 requested a review from a team as a code owner June 6, 2025 13:29