-
Notifications
You must be signed in to change notification settings - Fork 641
[v24.3.x] storage: work around segment index overflows #26022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v24.3.x] storage: work around segment index overflows #26022
Conversation
Once values begin coming in whose deltas are beyond uint32_t::max, the index is basically unusable (queries have somewhat undefined behavior). This updates the index to skip adding values if the offset delta would overflow. Prior to this change, one included test would fail with: ``` src/v/storage/tests/log_segment_reader_test.cc(376): error: in "test_ghost_read_with_index_overflow": check first.last_offset() == model::offset{100} has failed [2147483746 != 100] src/v/storage/tests/log_segment_reader_test.cc(377): error: in "test_ghost_read_with_index_overflow": check first.header().type == model::record_batch_type::raft_data has failed [batch_type::ghost_batch != batch_type::raft_data] ``` This test failing means that a recovery from such an overflowed topic could end up missing some batches. Another included test would fail, this one relying on behavior addressed by redpanda-data#25974 to cause trouble: ``` src/v/storage/tests/log_segment_reader_test.cc(382): error: in "test_read_with_index_overflow_base": check first.base_offset() == model::offset{uint32_max} has failed [8589934591 != 4294967295] src/v/storage/tests/log_segment_reader_test.cc(383): error: in "test_read_with_index_overflow_base": check first.last_offset() == model::offset{uint32_max} has failed [8589934591 != 4294967295] ``` This test failing means that a compaction could attempt to read a whole segment but only get a part of a segment, and we could end up removing batches incorrectly. Just an interesting tidbit, these tests pass if using the batch cache. This is relevant as we have been chasing issues with the __consumer_offsets topic, which does not use the batch cache. (cherry picked from commit 0ed8d64)
Indexes that have overflowed should not be used, because seeking within is somewhat undefined when considering the potential of overflowed uint32_t offset delta values. This commit tweaks find_nearest() to return the first indexed entry if this is detected. This allows Redpanda to safely proceed even when there are segment indexes that have overflowed from previous versions of Redpanda. Without this change, the included test would fail with: ``` src/v/storage/tests/index_state_test.cc(360): Entering test case "index_overflow" src/v/storage/tests/index_state_test.cc(377): error: in "index_overflow": check res->offset == model::offset{0} has failed [9 != 0] src/v/storage/tests/index_state_test.cc(378): error: in "index_overflow": check res->filepos == 1 has failed [4 != 1] src/v/storage/tests/index_state_test.cc(360): Leaving test case "index_overflow"; testing time: 77us ``` (cherry picked from commit fc1e8bf)
🎉 Snyk checks have passed. No issues have been found so far.✅ security/snyk check is complete. No issues have been found. (View Details) ✅ license/snyk check is complete. No issues have been found. (View Details) |
🎉 Snyk checks have passed. No issues have been found so far.✅ security/snyk check is complete. No issues have been found. (View Details) ✅ license/snyk check is complete. No issues have been found. (View Details) |
🎉 Snyk checks have passed. No issues have been found so far.✅ security/snyk check is complete. No issues have been found. (View Details) ✅ license/snyk check is complete. No issues have been found. (View Details) |
Retry command for Build#65428please wait until all jobs are finished before running the slash command
|
CI test resultstest results on build#65428
|
Backport of PR #25979