
[pdata] Find replacement for now deprecated gogo/protobuf #7095


Open · yurishkuro opened this issue Feb 2, 2023 · 20 comments
Labels: area:pdata · dependencies · priority:p1

Comments

@yurishkuro
Member

I don't know the full background of using gogo/protobuf, but I suspect it was similar to Jaeger's use of gogoproto for serialization efficiency (in particular, the ability to embed structs by value, as in []KeyValue instead of []*KeyValue, which significantly reduces memory allocations).
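For illustration, a minimal sketch (with simplified, hypothetical stand-in types, not the actual OTLP/pdata structs) of the allocation difference that gogoproto's embed-by-value generation makes:

package example

// Simplified, hypothetical stand-ins for OTLP types, only to illustrate
// the allocation difference; the real gogo-generated structs differ.
type KeyValue struct {
	Key   string
	Value string
}

// With gogoproto's nullable=false, repeated messages are generated as
// value slices: one backing array, no per-element heap allocation.
type AttrsByValue struct {
	Attributes []KeyValue
}

// The standard generator emits pointer slices: every element is a separate
// heap allocation that the GC has to track.
type AttrsByPointer struct {
	Attributes []*KeyValue
}

func buildByValue(n int) AttrsByValue {
	a := AttrsByValue{Attributes: make([]KeyValue, 0, n)} // one allocation
	for i := 0; i < n; i++ {
		a.Attributes = append(a.Attributes, KeyValue{Key: "k", Value: "v"})
	}
	return a
}

func buildByPointer(n int) AttrsByPointer {
	a := AttrsByPointer{Attributes: make([]*KeyValue, 0, n)} // one allocation...
	for i := 0; i < n; i++ {
		// ...plus n more, one per element.
		a.Attributes = append(a.Attributes, &KeyValue{Key: "k", Value: "v"})
	}
	return a
}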

Well, the bad news, which was a long time coming, is that GoGo has been officially deprecated since Oct '22. Continuing to depend on it going forward presents security and compatibility risks.

This ticket is to kick off a discussion for a viable path forward, since I don't have a solution in mind right now.

Some past related tickets:

@tigrannajaryan
Member

tigrannajaryan commented Feb 2, 2023

Yes, the Collector uses Gogoproto instead of Google's implementation for performance reasons.

Here are a few replacement options that I can think of:

  • Move to the Google Protobuf library. This will most likely result in worse performance for the Collector (about 1.3x worse in our past benchmarks, IIRC). The newer version of Google Protobuf (v2?) is even slower than the older version.
  • OTel takes responsibility for the entire Gogoproto project. This is a big undertaking; I don't think we want to sign up for that.
  • Fork Gogoproto and maintain an OTel fork just for ourselves. It is still a pretty large codebase to maintain, and my cursory assessment was that it is not a codebase I would want to take responsibility for maintaining myself.
  • Create our own Protobuf library. This may sound crazy but may be tractable if we limit it to only what OTLP needs. I managed to write an experimental compiler/generator in just a few days that works on OTLP Logs and may be faster than Gogoproto (and with optional-fields support, BTW). Obviously a production-quality library requires a much bigger investment, and I definitely don't want to be the sole author and maintainer of such a library.

Unfortunately I do not see great options here; all have some downsides. We will need to decide how much effort we are willing to invest in this.

There is probably also the option of finding some other alternative Protobuf library, but I have not explored this.

yurishkuro transferred this issue from open-telemetry/opentelemetry-proto on Feb 2, 2023
@yurishkuro
Member Author

As far as replacements go, there are other projects that seem to provide capabilities similar to what Gogo had:

At a minimum, they seem to be maintained at the moment, and hopefully are narrower in scope than gogo.

@jpkrohling
Member

What is Jaeger using nowadays? Any experience you can share with us?

@yurishkuro
Member Author

Jaeger hasn't moved away from gogo/protobuf.

@codeboten
Contributor

I've not tried csproto, but there were a few investigations I did in the past, including trying to use vtproto with the standard Google protobuf library here: #4658

It's probably worth another go now that it's been over a year.
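For reference, a rough sketch of how the vtprotobuf route is typically wired up, assuming messages generated with the protoc-gen-go-vtproto plugin alongside protoc-gen-go (the helper below is illustrative, not code from that PR):

package example

import (
	"google.golang.org/protobuf/proto"
)

// vtMarshaler matches the fast-path methods that protoc-gen-go-vtproto
// generates next to the regular protoc-gen-go code.
type vtMarshaler interface {
	proto.Message
	MarshalVT() ([]byte, error)
}

// marshal uses the vtprotobuf fast path when the message was generated with
// the vtproto plugin, and falls back to the reflection-based marshaler.
func marshal(msg proto.Message) ([]byte, error) {
	if m, ok := msg.(vtMarshaler); ok {
		return m.MarshalVT()
	}
	return proto.Marshal(msg)
}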

@yurishkuro
Member Author

@codeboten I saw that PR, but can you share some findings / conclusions? It was closed without any explanation, so it's not clear whether you ran into a dead end somewhere.

@codeboten
Contributor

codeboten commented Feb 7, 2023

The scope of the work at the time was to add support in the Collector for optional fields, which was achieved through a workaround (#4258 (comment)). That was the reason the PR was closed.

From what I recall, the performance numbers were not encouraging at first. Additionally, I ran into issues with the lack of support for oneof fields in pools.

mx-psi added the dependencies, priority:p1, and area:pdata labels on Apr 13, 2023
mx-psi added a commit that referenced this issue Dec 11, 2023
**Description:** Reorganize `VERSIONING.md`, specify compatibility
guarantees that have already been discussed and answer questions on
#8002.

Three new guarantees in the document had already been discussed
elsewhere:

- String representation:
#7625 (comment)
- Go version compatibility: Mentioned [on
README.md](https://github.com/open-telemetry/opentelemetry-collector?tab=readme-ov-file#compatibility)
- Dependency updates: Discussed in private, most recently in relation to
whether #7095 should block `pdata`'s 1.0 (consensus seems to be that it
should not).

Regarding the questions mentioned on #8002:

> Does adding new validation rules mean a breaking change?

Generally, yes except when the configuration was already invalid (i.e.
the Collector did not work properly with it).

> Should we offer a "NewDefault...." for each config and say that the
behavior of the config is stable only if initialized with NewDefault?

I did not add anything for this one. I think we should discuss it, but:
- I don't think there is a sensible output for `NewDefault...` for all
configurations (e.g. for `confignet.TCPAddr`, what would the default
be?)
- It seems to me that we should handle configuration validity through
`Validate`, and not through `NewDefault...`. `NewDefault` may help but
ultimately we need to verify through `Validate`.

> Does adding new fields mean a breaking change? Let's assume someone "squashes" the message into another message; then adding a field with the same mapstructure value will be a breaking change. What do we do with that? What rules do we follow?

This was already in this document, where we had decided that adding new
fields was okay. I think it would be too stringent to bar us from adding
new fields.

**Link to tracking Issue:** Fixes #8002

---------

Co-authored-by: Yang Song <[email protected]>
@yurishkuro
Member Author

yurishkuro commented Aug 29, 2024

It may be time to revisit this issue. I was dealing with the gRPC upgrade to v1.66.0 in Jaeger, where gRPC introduced CodecV2 and memory buffers that broke our integration with gogoproto. I was able to fix it by forcing the new buffers back to []byte, which is probably going to degrade the efficiency of marshaling, thus partially negating the benefits of gogoproto. I also noticed a new dependency on https://github.com/planetscale/vtprotobuf coming from grpc via envoyproxy/go-control-plane, which led me to envoyproxy/go-control-plane#824 showing large perf improvements from switching to vtprotobuf (which it describes as "gogo protobuf, but built on top of protobuf v2, instead of replacing it").
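For reference, a rough sketch of the kind of bridging adapter described above, assuming grpc-go v1.66+'s encoding.CodecV2 and mem packages and gogoproto's generated Marshal/Unmarshal methods (the details of the actual Jaeger fix may differ):

package example

import (
	"fmt"

	"google.golang.org/grpc/encoding"
	"google.golang.org/grpc/mem"
)

// gogoMessage matches the Marshal/Unmarshal methods emitted by gogoproto's
// marshaler/unmarshaler plugins.
type gogoMessage interface {
	Marshal() ([]byte, error)
	Unmarshal([]byte) error
}

// gogoCodecV2 bridges gogoproto-generated messages to gRPC's CodecV2 by
// flattening the new mem.BufferSlice back into a contiguous []byte. This
// keeps things working but gives up the zero-copy benefits of BufferSlice.
type gogoCodecV2 struct{}

func (gogoCodecV2) Marshal(v any) (mem.BufferSlice, error) {
	msg, ok := v.(gogoMessage)
	if !ok {
		return nil, fmt.Errorf("unsupported message type %T", v)
	}
	b, err := msg.Marshal()
	if err != nil {
		return nil, err
	}
	return mem.BufferSlice{mem.SliceBuffer(b)}, nil
}

func (gogoCodecV2) Unmarshal(data mem.BufferSlice, v any) error {
	msg, ok := v.(gogoMessage)
	if !ok {
		return fmt.Errorf("unsupported message type %T", v)
	}
	// Materialize copies the buffer chain into one []byte, which is the
	// "force the new buffers back to []byte" step mentioned above.
	return msg.Unmarshal(data.Materialize())
}

func (gogoCodecV2) Name() string { return "proto" }

func init() { encoding.RegisterCodecV2(gogoCodecV2{}) }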

@yurishkuro
Member Author

https://go.dev/blog/protobuf-opaque looks very promising as a path towards removing gogo/protobuf

@tigrannajaryan
Member

https://go.dev/blog/protobuf-opaque looks very promising as a path towards removing gogo/protobuf

Yes, let's see if performance is on par.

Also, yay for lazy decoding.

@els0r

els0r commented Jan 16, 2025

We are very interested in this initiative, as we've observed the following in our OTel deployment:

  • CPU is the defining cost driver in the OTel deployment (~15 TB / day shipped)
  • Profiles show that the hot path / allocation-heavy path is jsonpb.go from the gogo protobuf library

Would you be interested in a contribution from us to switch to google.golang.org/protobuf?

@mx-psi
Member

mx-psi commented Jan 16, 2025

@els0r Thanks for offering to contribute this!

I would definitely be interested in seeing how the performance compares and making the switch if we can do so :) Based on prior experience, this will take time and some back and forth, but the blog post shared by Yuri and your comments make it feel like this is a very promising option.

If it is possible to start small in some way (e.g. the simplest/smallest PoC you can think of), I think the first step should be to do that and bring it up at one of our Collector SIG virtual meetings, to get some early feedback.

@sfc-gh-bdrutu

@els0r

Profiles show that the hot path / allocation-heavy path is jsonpb.go from the gogo proto library

Why do you use JSON encoding?

@els0r

els0r commented Jan 17, 2025

Thanks for the quick responses. We'll explore the option in a fork to get a sense of the complexity / the back and forth.

We use JSON encoding when writing telemetry (mainly logs) onto a Kafka queue for external parties. The OTLP -> JSON conversion step is where we currently see the most CPU time spent.

@denysvitali

After a bit of experimentation (see https://github.com/denysvitali/opentelemetry-collector/tree/feature/use-protobuf-opaque-api), I was able to partially migrate to protobuf-go to test the performance improvement it would bring.

Before going into the details, here are some results:

Results

Note

I bet the results would be even better once golang/protobuf#1673 is implemented. The current implementation is still a bit inefficient, but already shows some nice improvements.

Upstream implementation (6757cc7)

Only addition: json_test.go (and the fix logRecord.SetSpanID([8]byte{0, 1, 2, 3, 4, 5, 6, 7}))

goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/collector/pdata/internal/json
cpu: Apple M3 Pro
BenchmarkEncoding-12    	     133	   8909393 ns/op	 5652242 B/op	  126244 allocs/op
BenchmarkEncoding-12    	     132	   9045715 ns/op	 5652374 B/op	  126244 allocs/op
BenchmarkEncoding-12    	     130	   8997149 ns/op	 5652260 B/op	  126243 allocs/op
BenchmarkEncoding-12    	     129	   9038421 ns/op	 5652305 B/op	  126244 allocs/op
BenchmarkEncoding-12    	     129	   8960452 ns/op	 5652328 B/op	  126244 allocs/op
BenchmarkEncoding-12    	     133	   8933958 ns/op	 5652268 B/op	  126244 allocs/op
BenchmarkEncoding-12    	     132	   9031978 ns/op	 5652378 B/op	  126245 allocs/op
BenchmarkEncoding-12    	     132	   8986970 ns/op	 5652293 B/op	  126243 allocs/op
BenchmarkEncoding-12    	     133	   8983367 ns/op	 5652272 B/op	  126243 allocs/op
BenchmarkEncoding-12    	     132	   8985762 ns/op	 5652313 B/op	  126244 allocs/op
PASS
ok  	go.opentelemetry.io/collector/pdata/internal/json	21.541s

Fork (bba99f7)

Note

This implementation doesn't generate the exact same JSON - but for the relevant parts (log lines) it does.
This is because the full refactoring was taking too long due to the generated code - I'll explain in detail below why this is the case.

goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/collector/pdata/internal/json
cpu: Apple M3 Pro
BenchmarkEncoding-12    	    1198	   1014819 ns/op	 1342051 B/op	   17065 allocs/op
BenchmarkEncoding-12    	    1183	   1016699 ns/op	 1342074 B/op	   17065 allocs/op
BenchmarkEncoding-12    	    1130	   1006196 ns/op	 1342030 B/op	   17065 allocs/op
BenchmarkEncoding-12    	    1204	   1015811 ns/op	 1342045 B/op	   17065 allocs/op
BenchmarkEncoding-12    	    1172	   1013799 ns/op	 1342047 B/op	   17065 allocs/op
BenchmarkEncoding-12    	    1197	   1029485 ns/op	 1342032 B/op	   17065 allocs/op
BenchmarkEncoding-12    	    1202	   1005060 ns/op	 1342055 B/op	   17065 allocs/op
BenchmarkEncoding-12    	    1231	    999719 ns/op	 1342044 B/op	   17065 allocs/op
BenchmarkEncoding-12    	    1195	    989382 ns/op	 1342058 B/op	   17065 allocs/op
BenchmarkEncoding-12    	    1206	    995486 ns/op	 1342062 B/op	   17065 allocs/op
PASS
ok  	go.opentelemetry.io/collector/pdata/internal/json	13.318s

Comparison

goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/collector/pdata/internal/json
cpu: Apple M3 Pro
            │ /tmp/old.txt │            /tmp/new.txt             │
            │    sec/op    │   sec/op     vs base                │
Encoding-12    8.986m ± 1%   1.010m ± 1%  -88.76% (p=0.000 n=10)

            │ /tmp/old.txt │             /tmp/new.txt             │
            │     B/op     │     B/op      vs base                │
Encoding-12   5.390Mi ± 0%   1.280Mi ± 0%  -76.26% (p=0.000 n=10)

            │ /tmp/old.txt │            /tmp/new.txt             │
            │  allocs/op   │  allocs/op   vs base                │
Encoding-12   126.24k ± 0%   17.07k ± 0%  -86.48% (p=0.000 n=10)

This shows that the new implementation reduces CPU time by about 89% and memory usage by about 76%.
The number of allocations per operation also decreases by a good 86%.

Implementation

My implementation is far from complete, and probably very buggy. My goal was to understand how much of a performance improvement this change would bring - so I focused entirely on that and didn't care too much about external users of the pdata module or the correctness outside of the main benchmark part (log lines).

Adding protojson to the encoding part

Initially I thought it was enough to just call protojson in the json.go file to get better performance:

https://github.com/denysvitali/opentelemetry-collector/blob/bba99f7239fed1fb9379487383efb928b36507ab/pdata/internal/json/json.go#L20

Unfortunately I was very wrong: since the GoGo-generated protobuf files aren't compatible with google.golang.org/protobuf (duh!), we can't use them like this.
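For context, a minimal sketch of that kind of call, assuming messages regenerated with protoc-gen-go (illustrative only; as noted above it cannot work against the GoGo-generated types, and protojson's defaults do not match every OTLP/JSON convention):

package example

import (
	"google.golang.org/protobuf/encoding/protojson"
	"google.golang.org/protobuf/proto"
)

// marshalJSON renders a protoc-gen-go-generated message with protojson.
// Note: protojson's defaults (lowerCamelCase names, base64-encoded bytes) do
// not match every OTLP/JSON convention (e.g. hex-encoded trace/span IDs), so
// a drop-in replacement still needs extra handling for those fields.
func marshalJSON(msg proto.Message) ([]byte, error) {
	opts := protojson.MarshalOptions{
		// Keep unset fields out of the output.
		EmitUnpopulated: false,
	}
	return opts.Marshal(msg)
}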

Recompiling the protos with protoc-gen-go + API_OPAQUE

For this reason, I've recompiled the protos using protoc-gen-go and added the gRPC definitions via protoc-gen-go-grpc.
In this case, I had to create a Dockerfile since the upstream build-tools don't support this (yet).
Additionally, I had to use nix-unstable because the latest version of protoc-gen-go on Nix doesn't yet support the Opaque API (v1.36+ is needed).

I then built the Docker image with docker build -f Dockerfile.build -t opentelemetry-collector/build .
I then modified the way we compile the protos to use protoc-gen-go and protoc-gen-go-grpc and added the Opaque API switch: https://github.com/denysvitali/opentelemetry-collector/blob/bba99f7239fed1fb9379487383efb928b36507ab/Makefile#L235

Finally, I removed most of the original sed hacks (that part is horrible, sorry, but thankfully with the new protobuf library they would not be needed) and replaced the fixed-size SpanIDs and TraceIDs with slices, so that they match the proto type.

"Fixing" the code generation

In the Opaque API, most fields are no longer accessed directly; they should be read and written through the generated Get{{ .fieldName }} and Set{{ .fieldName }} accessors. On top of that, the way proto structs are populated changed, so the _builder struct should be used in those cases. For oneofs, the case matching changed slightly too.
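A minimal sketch of what that change looks like under the Opaque API, using a hypothetical generated package logspb with illustrative field names (not the actual regenerated OTLP code):

package example

import (
	"google.golang.org/protobuf/proto"

	// Hypothetical import path for a LogRecord message regenerated with
	// protoc-gen-go and the Opaque API enabled.
	logspb "example.com/gen/otlp/logs/v1"
)

func buildRecord() *logspb.LogRecord {
	// Fields are no longer exported struct members; they are populated
	// either through setters...
	lr := &logspb.LogRecord{}
	lr.SetSeverityText("ERROR")
	lr.SetTimeUnixNano(1700000000000000000)

	// ...or through the generated _builder struct, which takes pointers
	// for presence-tracked scalar fields.
	lr2 := logspb.LogRecord_builder{
		SeverityText: proto.String("ERROR"),
	}.Build()
	_ = lr2

	// Reads go through getters, and oneof handling switches on a
	// Which<OneofName>() method instead of a type switch on wrapper structs.
	switch lr.WhichValue() {
	case logspb.LogRecord_StringValue_case:
		_ = lr.GetStringValue()
	default:
	}
	return lr
}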

Of course pdata contains a lot of generated code, and this code needs to be fixed in order to test the encoding performance. In my branch, I've tried to do that without breaking the pdata types (too much).

After spending a lot of time refactoring, I managed to get the code to compile and fixed enough of the generated code to be able to test the JSON encoding performance.

My implementation is mostly patched together just for the sake of getting the encoder benchmark to work and giving us an idea of how big an improvement it would be.

Next Steps

I intend to work on golang/protobuf#1673 if it's trivial and no one picks it up before me, so that we can hopefully see a further increase in performance and decrease in memory consumption.

Conclusion

I will not open a PR to switch away from gogo/protobuf, as I don't think I have enough knowledge of pdata's code generation to properly fix the module - but I hope my little experiment shows that this is feasible, albeit complex on the code-generation side.

You're free to reuse my feature branch - but I would not recommend reusing my code-generation changes there, as they're pretty messy.

@mx-psi
Member

mx-psi commented Jan 30, 2025

@denysvitali Thanks for sharing the results! They are definitely promising for OTLP/HTTP JSON.

I expect most of our users use OTLP/gRPC, which uses the protobuf binary encoding, so it's very important for us to benchmark that as well, to ensure there are no regressions in that case.

@denysvitali

Yes, that would also be interesting. I can try to benchmark that too if I find the time in the coming days.

@dmathieu
Member

Can you run those two benchmarks with benchstat for an easier comparison?

@bwplotka

👋 Just some datapoints from the Prometheus ecosystem (SDKs, Prometheus, Thanos, Cortex, etc.):

We still use gogo/protobuf in general and we are (very) slowly progressing on the migration. I tried the Opaque API a bit and it indeed feels similar to @tigrannajaryan's https://github.com/splunk/exp-lazyproto.

However, I'm not sure the Opaque API plus lazy decoding addresses all the challenges we have with data-intensive proto use. The non-nullability of fields (e.g. []Elem instead of []*Elem) is still beneficial in practice (fewer allocs, less GC, less need for pooling, trivial to benchmark), and lazy decoding makes already-too-big structs even bigger (when was the last time our code needed information about unknown fields?).

I wonder if there's an opportunity to jointly maintain some simpler, efficient Go generator for both OTel and Prometheus for the simpler use cases, e.g.:

  • empty value == not specified, i.e. non-nullability
  • no unknown fields (or making this optional in the generator)
  • streaming (an alternative or different form of lazy decoding)
  • ways to extend generation to decode/encode directly to native types.

The last two features were extremely beneficial for Prometheus's tailored scraping use case.
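For illustration, a rough sketch of that streaming, decode-to-native-types style using google.golang.org/protobuf/encoding/protowire, on a hypothetical message with field 1 = name (string) and field 2 = value (double):

package example

import (
	"fmt"
	"math"

	"google.golang.org/protobuf/encoding/protowire"
)

// walkSample scans the wire format directly and hands native values to a
// callback, without materializing any generated message structs (and without
// retaining unknown fields). Field numbers and types are hypothetical.
func walkSample(b []byte, emit func(name string, value float64)) error {
	var name string
	var value float64
	for len(b) > 0 {
		num, typ, n := protowire.ConsumeTag(b)
		if n < 0 {
			return protowire.ParseError(n)
		}
		b = b[n:]
		switch {
		case num == 1 && typ == protowire.BytesType: // name
			v, n := protowire.ConsumeBytes(b)
			if n < 0 {
				return protowire.ParseError(n)
			}
			name, b = string(v), b[n:]
		case num == 2 && typ == protowire.Fixed64Type: // value
			v, n := protowire.ConsumeFixed64(b)
			if n < 0 {
				return protowire.ParseError(n)
			}
			value, b = math.Float64frombits(v), b[n:]
		default: // skip anything else instead of storing it
			n := protowire.ConsumeFieldValue(num, typ, b)
			if n < 0 {
				return protowire.ParseError(n)
			}
			b = b[n:]
		}
	}
	if name == "" {
		return fmt.Errorf("missing name field")
	}
	emit(name, value)
	return nil
}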

Some recent discussion: https://groups.google.com/g/prometheus-developers/c/uFWRyqZaQis/m/IvnEjVkHAQAJ

cc @bboreham

@tigrannajaryan
Member

I wonder if there's an opportunity to jointly maintain some simpler, efficient Go generator for both OTel and Prometheus for the simpler use cases

@bwplotka I offered to work on this a while ago (when working on lazy-proto) but there were no takers. Unfortunately I no longer have time to work on this at the moment, but as a data point: after doing lazy-proto, writing your own Protobuf generator didn't feel like an insurmountable amount of work.

When working on lazy-proto I also thought about doing a streaming Protobuf deserializer, and eventually decided to spend my time on a better streaming format instead (STEF handily beats OTLP Protobuf on both the size and speed dimensions; when it matures, I would love to discuss it with Prometheus folks).
