
fix(decoder): use configurable limit for max number of records in a record batch #3120


Merged: 3 commits into IBM:main on Mar 11, 2025

Conversation

@rmb938 (Contributor) commented Mar 5, 2025

Replace the arbitrary `tmp > 2*math.MaxUint16` safety-blanket check with `tmp > MaxResponseSize`, since we already have that constant (which a user can change via `sarama.MaxResponseSize = ...`) and it is what we use for `MaxBytes` in a `FetchRequest`. See the sketch after this description.

Also added a test for large record counts.

Fixes: #3119
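
The change is confined to the decoder's array-length guard. Below is a minimal sketch of the resulting check, assuming the general shape of `getArrayLength` in sarama's `real_decoder.go` (the surrounding method body is an assumption; only the swapped limit is the point):

```go
// Sketch of the decoder's array-length guard after this PR, inside
// package sarama (imports: encoding/binary). The method shape is assumed
// from real_decoder.go; details may differ from the merged code.
func (rd *realDecoder) getArrayLength() (int, error) {
	if rd.remaining() < 4 {
		rd.off = len(rd.raw)
		return -1, ErrInsufficientData
	}
	tmp := int(int32(binary.BigEndian.Uint32(rd.raw[rd.off:])))
	rd.off += 4
	if tmp > rd.remaining() {
		// An array can't claim more elements than there are bytes left.
		rd.off = len(rd.raw)
		return -1, ErrInsufficientData
	} else if tmp > int(MaxResponseSize) { // was: tmp > 2*math.MaxUint16
		return -1, errInvalidArrayLength
	}
	return tmp, nil
}
```

Because `MaxResponseSize` is a package-level variable, users who genuinely need a higher ceiling can raise it, e.g. `sarama.MaxResponseSize = 200 * 1024 * 1024`, the same knob that already bounds fetch responses.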

Created `getArrayLengthNoLimit`, as record batches can contain more
than `2*math.MaxUint16` records. The packet decoder will still make sure
that the array length isn't greater than the number of bytes remaining
to be decoded in the packet.

Also added a test for large record counts.

Fixes: IBM#3119
Signed-off-by: Ryan Belgrave <[email protected]>
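
Per the commit message above, the first draft added a separate helper that dropped the element cap entirely and relied only on the bytes-remaining invariant. A rough sketch of that (since superseded) helper, under the same assumptions about decoder internals as the sketch above:

```go
// Hypothetical sketch of the first-draft helper, later dropped in review
// in favour of reusing getArrayLength with MaxResponseSize as the cap.
func (rd *realDecoder) getArrayLengthNoLimit() (int, error) {
	if rd.remaining() < 4 {
		rd.off = len(rd.raw)
		return -1, ErrInsufficientData
	}
	tmp := int(int32(binary.BigEndian.Uint32(rd.raw[rd.off:])))
	rd.off += 4
	// Still bounded: the array length can't exceed the number of bytes
	// remaining to be decoded in the packet.
	if tmp > rd.remaining() {
		rd.off = len(rd.raw)
		return -1, ErrInsufficientData
	}
	return tmp, nil
}
```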
@puellanivis (Contributor) left a comment:

👍 from me

@@ -254,3 +256,50 @@ func TestRecordBatchDecoding(t *testing.T) {
}
}
}

func TestRecordBatchLargeNumRecords(t *testing.T) {
	numOfRecords := 10 + (2 * math.MaxUint16)
Contributor commented:

Not sure how useful making this parameterizable is, when the CRC will change if the value changes, but then I like that it’s apparent and inarguable what length we’re encoding.

@rmb938 (Author) replied:

Yea I only set it as a var to make it obvious what it is. I kept forgetting what I had it set to during testing, so it's probably useful just to have it easily readable like you said haha.
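
For context, here is a rough reconstruction of what the new test exercises. The review thread above suggests the merged test pins a fixed encoding whose CRC depends on `numOfRecords`; this sketch round-trips instead, and the `encode`/`decode` helpers are assumptions about sarama's internal test plumbing, not the literal merged test:

```go
// Assumed reconstruction of the added test: build a batch with more
// records than the old 2*math.MaxUint16 cap and check it survives a
// round trip through the encoder and decoder.
func TestRecordBatchLargeNumRecords(t *testing.T) {
	numOfRecords := 10 + (2 * math.MaxUint16) // comfortably over the old cap

	records := make([]*Record, 0, numOfRecords)
	for i := 0; i < numOfRecords; i++ {
		records = append(records, &Record{Value: []byte("foo")})
	}
	batch := &RecordBatch{Version: 2, Records: records}

	encoded, err := encode(batch, nil) // assumed internal helper
	if err != nil {
		t.Fatal(err)
	}

	decoded := &RecordBatch{}
	if err := decode(encoded, decoded, nil); err != nil { // assumed internal helper
		t.Fatal(err)
	}
	if got := len(decoded.Records); got != numOfRecords {
		t.Errorf("wrong number of records: want %d, got %d", numOfRecords, got)
	}
}
```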

@dnwe (Collaborator) commented Mar 10, 2025

@rmb938 commented over on the issue here, but I think we probably want to just keep the existing `getArrayLength` func rather than adding a new one, and just change the magic number to use the existing constant we have for FetchBytes perhaps?

@rmb938 (Author) commented Mar 10, 2025

@dnwe Yep makes sense to me, done.

Signed-off-by: Ryan Belgrave <[email protected]>
@dnwe changed the title from "don't limit the number of records in a record batch when decoding" to "fix(decoder): use configurable limit for max number of records in a record batch" on Mar 11, 2025
@dnwe added the fix label on Mar 11, 2025
@dnwe (Collaborator) left a review:

Thanks LGTM!

@dnwe dnwe merged commit 0fb6d9a into IBM:main Mar 11, 2025
16 checks passed
@nkawli commented Apr 29, 2025

Hit the same issue in my setup. When will we get a new release with this fix?

Uduchi2nd pushed a commit to mixpanel/sarama that referenced this pull request May 29, 2025
…ecord batch (IBM#3120)

Replace the `tmp > 2*math.MaxUint16` arbitrary number safety blanket with `tmp > MaxResponseSize` as we already have that constant (that a user can change via `sarama.MaxResponseSize = ...`) which is what we use for `MaxBytes` in a `FetchRequest`.

Also added a test for large record counts.

Fixes: IBM#3119
Linked issue: Issue hitting max records in a batch (#3119)
5 participants