Description
Use case(s) - what problem will this feature solve?
On their way to the wire, messages are encoded through user-provided encoding.Codec
modules. These modules implement the following interface:
// Codec defines the interface gRPC uses to encode and decode messages. Note
// that implementations of this interface must be thread safe; a Codec's
// methods can be called from concurrent goroutines.
type Codec interface {
	// Marshal returns the wire format of v.
	Marshal(v any) ([]byte, error)
	[...]
}
The implementation of this interface is responsible for allocating the resulting byte slice. Once gRPC has finished writing the message to the underlying transport, it drops its reference to the slice, leaving the garbage collector to reclaim it. This means memory used for message encoding is never reused directly; it must go through a GC cycle before it can be allocated again.
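To make this allocation pattern concrete, here is a minimal sketch. naiveMarshal is a hypothetical stand-in for a Codec.Marshal implementation (not real gRPC code); testing.AllocsPerRun shows that every call produces a fresh heap allocation that the GC must later reclaim:

```go
package main

import (
	"fmt"
	"testing"
)

// naiveMarshal mimics a typical Codec.Marshal: it allocates a fresh
// byte slice for every message and hands ownership to the caller.
func naiveMarshal(msg []byte) []byte {
	out := make([]byte, len(msg))
	copy(out, msg)
	return out
}

func main() {
	msg := make([]byte, 64*1024) // a 64 KiB message
	allocs := testing.AllocsPerRun(100, func() {
		_ = naiveMarshal(msg)
	})
	// Expect 1: one fresh buffer per marshal, none of them reused.
	fmt.Printf("allocations per Marshal: %.0f\n", allocs)
}
```

At network-saturating message rates, this one-buffer-per-message pattern is exactly what makes allocation volume track throughput.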
Proposed Solution
We propose adding an additional codec interface that complements the existing one:
// BufferedCodec is an optional interface Codec may implement.
// It signals the ability of the codec to use pre-existing memory
// when writing the wire format of messages.
//
// # Experimental
//
// Notice: This API is EXPERIMENTAL and may be changed or removed in a
// later release.
type BufferedCodec interface {
	// MarshalWithBuffer returns the wire format of v.
	//
	// Implementations may use a buffer from the provided buffer
	// pool when marshalling. Doing so enables memory reuse.
	MarshalWithBuffer(v any, pool SharedBufferPool) ([]byte, error)
}
This API takes the existing type grpc.SharedBufferPool, which provides an allocator-like API. Codecs that can marshal into pre-existing buffers, such as protobuf, may implement it.
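A sketch of how a codec might implement the proposed interface follows. The SharedBufferPool interface below mirrors the signature of the existing grpc.SharedBufferPool; syncBufferPool and rawCodec are hypothetical names invented for this example, and the codec marshals plain []byte messages rather than protobuf to keep the sketch self-contained:

```go
package main

import (
	"fmt"
	"sync"
)

// SharedBufferPool mirrors the existing grpc.SharedBufferPool interface.
type SharedBufferPool interface {
	Get(length int) []byte
	Put(buf *[]byte)
}

// syncBufferPool is a minimal pool backed by sync.Pool (illustrative only).
type syncBufferPool struct{ p sync.Pool }

func newSyncBufferPool() *syncBufferPool {
	return &syncBufferPool{p: sync.Pool{New: func() any { return new([]byte) }}}
}

func (s *syncBufferPool) Get(length int) []byte {
	buf := s.p.Get().(*[]byte)
	if cap(*buf) < length {
		*buf = make([]byte, length)
	}
	return (*buf)[:length]
}

func (s *syncBufferPool) Put(buf *[]byte) { s.p.Put(buf) }

// rawCodec is a hypothetical codec for []byte messages. Its
// MarshalWithBuffer sizes a pooled buffer to the message and copies
// into it instead of allocating a fresh slice on every call.
type rawCodec struct{}

func (rawCodec) MarshalWithBuffer(v any, pool SharedBufferPool) ([]byte, error) {
	msg, ok := v.([]byte)
	if !ok {
		return nil, fmt.Errorf("rawCodec: unsupported type %T", v)
	}
	buf := pool.Get(len(msg)) // the size upper bound is exact here
	copy(buf, msg)
	return buf, nil
}

func main() {
	pool := newSyncBufferPool()
	out, err := rawCodec{}.MarshalWithBuffer([]byte("hello"), pool)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	pool.Put(&out) // gRPC would do this once the message is on the wire
}
```

A real protobuf implementation would use proto.Size as the upper bound and MarshalAppend-style marshalling into the pooled buffer, but the ownership flow is the same.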
This interface should be optional because codecs need to estimate an upper bound for the size of the marshaled message to allocate a sufficiently large buffer. However, this estimate need not be accurate, as Go will reallocate buffers transparently if needed, and the GC will safely collect the original slice.
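The "estimate need not be accurate" point can be seen directly from Go's append semantics. In this sketch, a (hypothetical) pooled buffer turns out too small for the message; append transparently moves the data to a larger allocation and the undersized slice is left for the GC:

```go
package main

import "fmt"

func main() {
	// Suppose the codec's size estimate was too small: it obtained a
	// 4-byte buffer, but the message needs 10 bytes.
	buf := make([]byte, 0, 4) // stand-in for a pooled buffer
	msg := []byte("0123456789")

	// append reallocates transparently when capacity is exceeded; the
	// original 4-byte backing array is safely collected by the GC.
	buf = append(buf, msg...)

	fmt.Println(len(buf), cap(buf) >= 10) // the grown buffer holds the message
}
```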
This new interface allows the allocation side to reuse buffers, but gRPC needs to return them to be recycled. We propose that gRPC return these encoded message buffers to the buffer pool once messages have been fully written to the transport and are no longer read from.
This implies adding new APIs for clients and servers, such as:
// ClientEncoderBufferPool is a CallOption to provide a SharedBufferPool
// used for the purpose of encoding messages. Buffers from this pool are
// used when encoding messages and returned once they have been transmitted
// over the network to be reused.
//
// Note that a compatible encoding.Codec is needed for buffer reuse. See
// encoding.BufferedCodec for additional details. If a non-compatible codec
// is used, buffer reuse will not apply.
//
// # Experimental
//
// Notice: This API is EXPERIMENTAL and may be changed or removed in a
// later release.
func ClientEncoderBufferPool(bufferPool SharedBufferPool) CallOption {
	return EncoderBufferPoolCallOption{BufferPool: bufferPool}
}
and
// ServerEncoderBufferPool is a ServerOption to provide a SharedBufferPool
// used for the purpose of encoding messages. Buffers from this pool are
// used when encoding messages and returned once they have been transmitted
// over the network to be reused.
//
// Note that a compatible encoding.Codec is needed for buffer reuse. See
// encoding.BufferedCodec for additional details. If a non-compatible codec
// is used, buffer reuse will not apply.
//
// # Experimental
//
// Notice: This API is EXPERIMENTAL and may be changed or removed in a
// later release.
func ServerEncoderBufferPool(bufferPool SharedBufferPool) ServerOption {
	return newFuncServerOption(func(o *serverOptions) {
		o.encoderBufferPool = bufferPool
	})
}
Giving users the ability to provide their own buffer pool implementation is advantageous for those dealing with specific messages of known sizes, for which a tailored pool can be optimal. It also builds on prior art (see grpc.WithRecvBufferPool, for example).
Alternatives Considered
Changes to the codec could be omitted entirely: gRPC could simply return buffers to a user-provided pool and let users coordinate with their codec to pull from it. However, this would make it difficult for the default proto codec to support this feature.
Alternatively, this change could be made without any external API changes by internally creating and managing the buffer pool. This would prevent third-party codecs from benefitting from the improvement, however.
Additional Context
We are currently running a service that uses gRPC streaming to stream (potentially) large files, in chunks, back to gRPC clients over the network. We measured that the Go allocation volume per second is roughly equal to the network throughput of the host. This creates GC cycles that introduce latency spikes and prevent us from predictably saturating the network at a reasonable CPU cost. After investigation, we have isolated the source of most of these allocations to protobuf slice creation during message serialization.
This service demonstrated the benefit of this approach with a very significant reduction in allocation volume (>90% fewer bytes allocated per second and a 7x reduction in allocations) and much lower CPU usage (>30% less). We offer an implementation of this proposal for context and as a conversation starter: #6613
Note that there is prior work on this issue in #2817 and #2816, but those issues did not attract much discussion.
Thank you!