-
Notifications
You must be signed in to change notification settings - Fork 583
Add RW2 support #11100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RW2 support #11100
Conversation
Add the RW2 proto definition to mimirpb. The implementation will follow but it needs to patch the generated files, so we commit the proto now to have actually readable diff in the PR that has the functionality. See #10423 and especially #11100 diff. Signed-off-by: György Krajcsovits <[email protected]>
Add the RW2 proto definition to mimirpb. The implementation will follow but it needs to patch the generated files, so we commit the proto now to have actually readable diff in the PR that has the functionality. See #10423 and especially #11100 diff. Signed-off-by: György Krajcsovits <[email protected]>
Add the RW2 proto definition to mimirpb. The implementation will follow but it needs to patch the generated files, so we commit the proto now to have actually readable diff in the PR that has the functionality. See #10423 and especially #11100 diff. Signed-off-by: György Krajcsovits <[email protected]>
Add the RW2 proto definition to mimirpb. The implementation will follow but it needs to patch the generated files, so we commit the proto now to have actually readable diff in the PR that has the functionality. See #10423 and especially #11100 diff. Signed-off-by: György Krajcsovits <[email protected]>
Add the RW2 proto definition to mimirpb. The implementation will follow but it needs to patch the generated files, so we commit the proto now to have actually readable diff in the PR that has the functionality. See #10423 and especially #11100 diff. Signed-off-by: György Krajcsovits <[email protected]>
fc8df4d
to
ee6f843
Compare
* feat(rw2): prepare for RW2 support Add the RW2 proto definition to mimirpb. The implementation will follow but it needs to patch the generated files, so we commit the proto now to have actually readable diff in the PR that has the functionality. See #10423 and especially #11100 diff. * test(AddTypeToTree): allow testing for protobuf type and field number Signed-off-by: György Krajcsovits <[email protected]> Co-authored-by: Copilot <[email protected]>
2268095
to
f697a73
Compare
Signed-off-by: György Krajcsovits <[email protected]>
b8f2eb1
to
d0df4d9
Compare
Implement unmarshal of Promtheus remote write request 1.0 data from Remote Write 2.0 input by converting the model. The conversion is lossy as RW1 does not have metadata per series, only per metric family. Signed-off-by: György Krajcsovits <[email protected]> Signed-off-by: Arve Knudsen <[email protected]>
d0df4d9
to
5523a3f
Compare
Duplicate all existing testcases with RW2. Add metadata check. Signed-off-by: György Krajcsovits <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 18 out of 18 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (2)
pkg/util/test/rw2.go:124
- The GetSymbols method builds the symbols slice by iterating over a map, which does not guarantee deterministic ordering. Consider iterating sequentially from 0 to the highest index to ensure a consistent order of symbols for downstream consumers.
func (symbols *SymbolTableBuilder) GetSymbols() []string {
pkg/mimirpb/compat_rw2.go:50
- [nitpick] After releasing pages to the pool, it may be beneficial to reset the overall pages slice (e.g. setting ps.pages to nil) to avoid retaining memory if the rw2PagedSymbols instance is reused.
page = page[:0]
|
// - is symbols is not nil, we unmarshal from Remote Write 2.0 format. | ||
func (p *PreallocTimeseries) Unmarshal(dAtA []byte, symbols *rw2PagedSymbols, metadata map[string]*MetricMetadata) error { | ||
if TimeseriesUnmarshalCachingEnabled && symbols == nil { | ||
// TODO(krajorama): check if it makes sense for RW2 as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aknuds1 wdyt?
Moving the new PR Signed-off-by: György Krajcsovits <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took an initial review pass.
pkg/distributor/push.go
Outdated
addWriteResponseStats(w, rs) | ||
} else { | ||
// This should not happen, but if it does, we should not panic. | ||
addWriteResponseStats(w, &promRemote.WriteResponseStats{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't addWriteResponseStats
only relevant for RW v2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, you're right, I've ended up doing it for RW1 as well, let me see how I could avoid that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, only do stats for RW2 and also add check to the integration test
@krajorama Could you compare your new benchmark results with those from main (i.e. |
Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Main RW1 -> PR RW2
Main RW1 -> PR RW1
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, I left some nits, please check them.
My main concern isn't regarding this specific implementation but RW2 itself: it seems that we rely on the fact that the symbol table would be sent before the timeseries, but that doesn't seem to be something enforced by protobuf spec, is it?
When a message is serialized, there is no guaranteed order for how its known or unknown fields will be written. Serialization order is an implementation detail, and the details of any particular implementation may change in the future. Therefore, protocol buffer parsers must be able to parse fields in any order.
Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
I've been investigating how the new code interacts with GRPC buffer release/reuse: While the push is being processed, we don't let go of the request, so GRPC won't do anything with it I assume. After that we call mimirpb.ReuseSlice(req.Timeseries) which will in its inside end up clearing the reference to the yolostrings in label names/values for series and exemplars. However, for the metadata unit and help string, there's no such cleanup, because RW1 simply copied those. The problem is that when I process the symbols, I have no idea if the symbol will be used for metadata or for labels, so I just yolostring all. Basically this means I need to make a copy somewhere. |
On the other hand we only reuse the |
Signed-off-by: György Krajcsovits <[email protected]>
👋🏽 Disclaimer: I'm not allowed to read Mimir specific code (AGPLv3 license blocks me, sorry), but @krajorama reached me for questions around protobuf and PRWv2 which are super interesting, I might have some pointers. First of all I love the unmarshalling technique you do here. I would propose we do this in some shared Apache2 shared space, so everyone can maintain and use it e.g. Prometheus or custom Go proto generator we discuss a lot. I am not sure about the order of fields in the byte stream. Knowing protobuf project, they might be reserving the right to change it if needed, so it might non-deterministic. To be on safer side, my suggestion is do what we did in Prometheus for the protobuf scrape parsing. We wrote decoder that streams and allow parsing to a concrete type without intermediate structure. As I understand from the discussion of this PR, you do similar but even more in-place (directly on unmarshal). This allows removing the need to keep some network bytes for a bit longer, but might be prone to the field order. Maybe we can prove you can rely on field order. We could check with Protobuf experts. Again, whatever you do, this decoder could live on Prometheus or we could play with custom generator for streaming decoding. Thanks for innovating here 💪🏽 |
Just noticed you do this on grpc level, so maybe full decoder is not trivial? Would love to learn more on constraints here. |
These examples are very much remote write 1 and scraped protobuf related as far as I can tell, so they don't suffer from the potential problem of symbols coming later than label/unit/help references. But I'll mention the potential for problems in the changelog under limitations. I think we can add a slow path that would detect whether symbols was read or not and somehow defer the resolution. |
Signed-off-by: György Krajcsovits <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
pkg/util/test/unindent.go:41
- [nitpick] Consider enhancing this error message by including the problematic line content to aid debugging of unexpected indentation issues.
t.Fatal("non empty line is shorter than the indentation")
pkg/mimirpb/split_test.go:169
- [nitpick] Review the RW2 field names in the expected list; consider aligning the naming conventions (e.g. using consistent capitalization for "SymbolsRW2" and "rw2symbols") to avoid confusion.
assert.ElementsMatch(t, []string{ "Timeseries", "Source", "Metadata", "SymbolsRW2", "TimeseriesRW2", "SkipLabelValidation", "SkipLabelCountValidation", "skipUnmarshalingExemplars", "unmarshalFromRW2", "rw2symbols", }, fieldNames)
// if p.skipUnmarshalingExemplars is false. | ||
func (p *PreallocTimeseries) Unmarshal(dAtA []byte) error { | ||
if TimeseriesUnmarshalCachingEnabled { | ||
// Copied from the protobuf generated code, change are that: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct the comment to read "// Copied from the protobuf generated code, changes are as follows:" to improve clarity.
// Copied from the protobuf generated code, change are that: | |
// Copied from the protobuf generated code, changes are as follows: |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but see comments (in particular the error string, which I wonder about).
Signed-off-by: György Krajcsovits <[email protected]>
What this PR does
Optimized version of #10432 .
Instead of unmarshal into RW2 then convert to RW1, unmarshal directly into RW1.
Which issue(s) this PR fixes or relates to
Related to #9072
Leftovers for next PRs
-timeseries-unmarshal-caching-optimization-enabled
is possible for RW2Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.