storage: fix index state truncate overflow #26051

Merged 1 commit on May 19, 2025
21 changes: 15 additions & 6 deletions src/v/storage/index_state.cc
@@ -759,12 +759,21 @@ bool index_state::truncate(
if (new_max_offset < base_offset) {
return needs_persistence;
}
const uint32_t i = new_max_offset() - base_offset();
auto res = index.offset_lower_bound(i);
size_t remove_back_elems = index.size() - res.value_or(index.size());
if (remove_back_elems > 0) {
needs_persistence = true;
pop_back(remove_back_elems);
static constexpr int64_t u32_max = std::numeric_limits<uint32_t>::max();
int64_t delta = new_max_offset() - base_offset();
// NOTE: we expect that deltas above u32_max would have not been
// added to the index, unless by an older buggy version of Redpanda.
// With this in mind, either there are no entries above u32_max, or
// offset_lower_bound() isn't going to work correctly anyway, so we just
// skip removal and rely on queries to detect the overflow.
Member:

> so we just skip removal and rely on queries to detect the overflow.

I'm not clear on what this means or how it works. My confusion is something like: if the truncation request is legitimate but we ignore it because of the overflow, how do we know that a later query would not be affected by the dropped truncation request? Presumably we rely on suffix truncation for Raft correctness?

andrwng (Contributor Author), May 7, 2025:

The idea is that if entries aren't truncated here, it's because we are attempting to truncate to something beyond the uint32 delta limit. If that's the case:

Does that answer your question? Or are you getting at that there's still an edge case where a later query returns the wrong thing?

Member:

I think it looks OK. I'm less concerned about my question after reading it again today.

if (delta <= u32_max) {
auto i = static_cast<uint32_t>(delta);
auto res = index.offset_lower_bound(i);
size_t remove_back_elems = index.size() - res.value_or(index.size());
if (remove_back_elems > 0) {
needs_persistence = true;
pop_back(remove_back_elems);
}
}
if (new_max_offset < max_offset) {
needs_persistence = true;
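To make the effect of the new guard concrete, here is a minimal standalone sketch of the two behaviors. It is not Redpanda code: `rel_offsets`, `removed_unguarded`, and `removed_guarded` are hypothetical names, and `std::lower_bound` over a sorted vector is assumed to approximate what `index.offset_lower_bound()` does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for the index's sorted relative-offset column.
using rel_offsets = std::vector<uint32_t>;

constexpr int64_t u32_max = std::numeric_limits<uint32_t>::max();

// Pre-fix shape: narrow the delta to uint32_t, then look it up.
// A delta above u32_max wraps modulo 2^32 and lands on an unrelated entry.
inline std::size_t removed_unguarded(const rel_offsets& idx, int64_t delta) {
    assert(delta >= 0); // the caller has already rejected offsets below base
    const auto i = static_cast<uint32_t>(delta);
    const auto it = std::lower_bound(idx.begin(), idx.end(), i);
    // Number of trailing entries that would be popped.
    return static_cast<std::size_t>(idx.end() - it);
}

// Post-fix shape: skip removal entirely when the delta does not fit.
inline std::size_t removed_guarded(const rel_offsets& idx, int64_t delta) {
    if (delta > u32_max) {
        return 0;
    }
    return removed_unguarded(idx, delta);
}
```

With `idx = {0, 100, 200, 300}`, a delta of `u32_max + 1` narrows to 0, so the unguarded version would pop all four entries while the guarded version pops none. For an in-range delta such as 150, both versions agree and pop the last two entries.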
33 changes: 33 additions & 0 deletions src/v/storage/tests/index_state_test.cc
@@ -377,3 +377,36 @@ BOOST_AUTO_TEST_CASE(index_overflow) {
BOOST_CHECK_EQUAL(res->offset, model::offset{0});
BOOST_CHECK_EQUAL(res->filepos, 1);
}

BOOST_AUTO_TEST_CASE(index_overflow_truncate) {
storage::index_state state;

// Previous versions of Redpanda can have an index with offsets spanning
// greater than uint32 offset space. In these cases, the index is not
// reliable and truncation shouldn't attempt to truncate entries based on
// an overflown lookup. But, queries should fallback to returning the
// beginning of the index.
const model::timestamp dummy_ts;
const storage::offset_delta_time should_offset{false};
storage::offset_time_index time_idx{dummy_ts, should_offset};
constexpr long uint32_max = std::numeric_limits<uint32_t>::max();
state.add_entry(0, time_idx, 1);
state.add_entry(100, time_idx, 2);
state.add_entry(static_cast<uint32_t>(uint32_max + 1), time_idx, 3);
state.add_entry(static_cast<uint32_t>(uint32_max + 10), time_idx, 4);
state.max_offset = model::offset{uint32_max + 10};
BOOST_CHECK_EQUAL(4, state.size());

// The truncation shouldn't remove index entries, given the overflow.
auto needs_flush = state.truncate(
model::offset(uint32_max + 1), model::timestamp::now());
BOOST_CHECK(needs_flush);
BOOST_CHECK_EQUAL(4, state.size());
BOOST_CHECK_EQUAL(state.max_offset, model::offset{uint32_max + 1});

// Queries for the offset should start from the beginning of the segment.
auto res = state.find_nearest(model::offset(uint32_max + 1));
BOOST_REQUIRE(res.has_value());
BOOST_CHECK_EQUAL(res->offset, model::offset{0});
BOOST_CHECK_EQUAL(res->filepos, 1);
}
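The test relies on `static_cast<uint32_t>` wrapping the two large offsets. As a rough illustration of what ends up stored, the sketch below reproduces only the narrowing step. `narrow`, `stored_rel_offsets`, and `stored_sorted` are hypothetical helpers, and it assumes the first argument of `add_entry` is the 32-bit relative offset.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>

constexpr int64_t u32_max = std::numeric_limits<uint32_t>::max();

// Unsigned narrowing is well defined in C++: the value is reduced modulo 2^32.
inline uint32_t narrow(int64_t v) {
    assert(v >= 0);
    return static_cast<uint32_t>(v);
}

// The four relative offsets the test passes to add_entry, after narrowing.
inline std::array<uint32_t, 4> stored_rel_offsets() {
    return {narrow(0), narrow(100), narrow(u32_max + 1), narrow(u32_max + 10)};
}

// True only if the stored sequence is still non-decreasing.
inline bool stored_sorted() {
    const auto s = stored_rel_offsets();
    return std::is_sorted(s.begin(), s.end());
}
```

Under these assumptions the stored sequence is 0, 100, 0, 9, which is no longer sorted. That is consistent with the test's expectation that truncation leaves all four entries in place and that `find_nearest` falls back to the first entry.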