Opportunities for performance improvements in Kafka receiver #39813

Open
@dentonk

Component(s)

receiver/kafka

Describe the issue you're reporting

While benchmarking the kafka receiver, my findings have led me to believe there may be opportunities to improve consumption throughput in a very meaningful way. Across various tests, the throughput never seemed as high as I would expect. To put this to the test, I used Redpanda Connect (formerly Benthos) as a point of comparison, since it is also written in Go, also uses the sarama library, and is also a pipeline solution where you can configure batching, etc. In my tests, Benthos consumed messages at a much higher rate than the kafka receiver when running roughly equivalent configurations on the exact same host against the same Kafka cluster/topic/partition(s) and data stream.

In both cases the pipeline is as simple as possible: Kafka (log text decoding) -> Batch -> Drop/Nop.
I have tested with both a locally deployed Kafka cluster as well as Google Managed Kafka.
I have tested with both a single partition as well as multiple partitions.
I have tweaked and tuned the various fetch parameters to ensure both are performing similar requests.
I have tested with both already stored messages as well as actively streaming messages.

In all of these tests, the kafka receiver consumed messages at a lower rate than Benthos.

After reviewing the code and speaking with folks closer to the Benthos project, one possible area of focus for improvement is how messages are being propagated via channels. A comment made during this research: channels with select statements in a pipeline can start to become a bottleneck at throughput as low as ~50k/s. Benthos, instead, sends a batch of messages as a single channel push. I'm not enough of a Go engineer to look at the current receiver and confirm whether this is indeed the bottleneck we are running into, but it certainly seems like a place to start.
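To make the two approaches concrete, here is a minimal sketch of per-message versus batched channel hand-off. It is not taken from the kafka receiver or from Benthos; the `message` type, the function names, the channel buffer size, and the batch size are all hypothetical, and the timings it prints will vary by machine, so treat it as a way to reason about the pattern rather than as a benchmark of either project.

```go
package main

import (
	"fmt"
	"time"
)

// message is a hypothetical stand-in for a consumed Kafka record.
type message struct {
	value []byte
}

// sendPerMessage hands every message to the consumer with its own channel
// send, and the consumer receives inside a select, as described above.
// It returns the number of messages the consumer saw.
func sendPerMessage(msgs []message) int {
	ch := make(chan message, 256)
	done := make(chan struct{}) // never closed here; present only to force a select
	result := make(chan int)
	go func() {
		n := 0
		for {
			select {
			case _, ok := <-ch:
				if !ok {
					result <- n
					return
				}
				n++
			case <-done:
				result <- n
				return
			}
		}
	}()
	for _, m := range msgs {
		ch <- m
	}
	close(ch)
	return <-result
}

// sendBatched hands messages to the consumer as slices, so one channel
// send carries up to batchSize messages.
func sendBatched(msgs []message, batchSize int) int {
	if batchSize < 1 {
		batchSize = 1 // guard against a non-advancing loop
	}
	ch := make(chan []message, 256)
	done := make(chan struct{})
	result := make(chan int)
	go func() {
		n := 0
		for {
			select {
			case batch, ok := <-ch:
				if !ok {
					result <- n
					return
				}
				n += len(batch)
			case <-done:
				result <- n
				return
			}
		}
	}()
	for start := 0; start < len(msgs); start += batchSize {
		end := start + batchSize
		if end > len(msgs) {
			end = len(msgs)
		}
		ch <- msgs[start:end]
	}
	close(ch)
	return <-result
}

func main() {
	msgs := make([]message, 1_000_000)

	t0 := time.Now()
	n1 := sendPerMessage(msgs)
	d1 := time.Since(t0)

	t0 = time.Now()
	n2 := sendBatched(msgs, 500) // 500 is an arbitrary illustrative batch size
	d2 := time.Since(t0)

	fmt.Printf("per-message: %d messages in %v\n", n1, d1)
	fmt.Printf("batched:     %d messages in %v\n", n2, d2)
}
```

Both functions deliver the same messages; the only difference is how many channel operations are needed to do it. If the receiver's hot path resembles the first function, comparing it against something like the second under a profiler might help confirm or rule out this theory.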

I'm happy to share more specifics from the testing if that would be helpful, including actual numbers and client configurations.

Also, I have been given permission to offer support from the Bindplane engineering team. We'd just like to get some thoughts and guidance from folks more familiar with the existing code.
