Skip to content

Refactor Shred::Payload to use Bytes #7049

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Aug 12, 2025

Conversation

cpubot
Copy link

@cpubot cpubot commented Jul 18, 2025

Problem

shred::Payload is currently set up as the sum:

pub enum Payload {
    Shared(Arc<Vec<u8>>),
    Unique(Vec<u8>),
}

As we migrate more components of the codebase with similar semantics to Bytes, shred::Payload is a great candidate.

There are multiple places in the codebase that would benefit from being able to cheaply clone shred::Payloads or already expect Bytes where Payloads are being manually converted (or both) (just a couple examples):

// When `shred::Payload` is refactored to use `Bytes`, this can be adjusted to simply pass the payload
// `Bytes` directly to the XDP thread(s).

agave/ledger/src/shred.rs

Lines 429 to 436 in c87ab7c

pub fn to_packet(&self) -> BytesPacket {
let buffer: &[u8] = match self.payload() {
Payload::Shared(bytes) => bytes.as_ref(),
Payload::Unique(bytes) => bytes.as_ref(),
};
let buffer = Bytes::copy_from_slice(buffer);
BytesPacket::new(buffer, Meta::default())
}

let shred = Bytes::from(shred::Payload::unwrap_or_clone(shred.clone()));

let shred = Bytes::from(shred::Payload::unwrap_or_clone(shred));

Many of these paths deal with shreds generated upstream that are always of the Unique variant, meaning they're always cloned (heap alloc + copy) unless some clever workaround is employed. There is little to no benefit to preserving a Unique variant -- being able to rely on cheap clones of Payloads anywhere in the codebase trumps any overhead associated with Arc or Bytes in the small subset of cases where shared references aren't needed (unclear to me that those actually exist).

Summary of Changes

This updates shred::Payload to simply be a wrapper around Bytes with some convenience methods that allow it to be dropped into the existing codebase with very minimal code changes.

Notably, there are code paths that need mutable access to the shred's payload, as they "incrementally" build up the payload in various steps. Given this, and Bytes being immutable, a little creativity was required to get this to work as a "drop in" replacement. All the heavy lifting here is done by a helper, PayloadMutGuard, which can effectively simulate a mutable Bytes API without relying on any enums or separate payload "builder" steps. It's well documented in the code, but, in short, it temporarily moves the payload's Bytes instance into a BytesMut instance, which receives all the writes. Upon drop, that BytesMut instance is frozen and set as the payload's Bytes reference. Because payload mutation only happens during this incremental shred construction phase, all Bytes instances should be unique, meaning PayloadMutGuard usage (and its Bytes -> BytesMut conversion) shouldn't entail any copying or allocation.

@codecov-commenter
Copy link

codecov-commenter commented Jul 19, 2025

Codecov Report

❌ Patch coverage is 84.94624% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.3%. Comparing base (d25ae71) to head (e45b7c2).

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #7049    +/-   ##
========================================
  Coverage    83.3%    83.3%            
========================================
  Files         798      798            
  Lines      363061   363101    +40     
========================================
+ Hits       302444   302568   +124     
+ Misses      60617    60533    -84     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@cpubot cpubot requested review from alessandrod and vadorovsky July 19, 2025 21:52
///
/// If the payload is unique (single reference), this will not perform any copying. Otherwise it
/// will. As such, take care to avoid performing this conversion if the payload is not unique.
pub fn get_mut<I>(&mut self, index: I) -> Option<PayloadMutGuard<'_, I>>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it wise to have as_mut and get_mut methods that might secretly Clone under the hood? get_mut usually implies a fast-ish getter method rather than something that might clone under the hood. I'd rather work with try_get_mut that would error when another reference exists, preventing accidental creation of copies. And if someone is really happy with a copy made, they can explicitly call .clone() followed up by try_get_mut().unwrap(). What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I certainly understand where you're coming from. The reason I didn't go with the try_get_mut is because it would introduce a Result type in all places of the codebase that mutate shred payloads. And the practical reality is that payloads should be unique while we're building them (i.e., in all places that actually perform mutation), and even if we have an edge case where it isn't unique (I do not know if such a case exists), it's probably not correct to error.

I'd rather work with try_get_mut that would error when another reference exists

I can understand how this might be true in some scenarios, but I do not think this is one. Using Result would make these parts of the codebase syntactically obtuse. If a part of the codebase must mutate a shred payload, erroring when a copy would be introduced is probably not semantically correct -- the Result would almost certainly need to be explicitly handled, and as such the copy done explicitly.

let payload = match shred.payload.try_get_mut() {
    Ok(p) => p,
    Err(_) => shred.payload.copy_into_mut()
};

Introducing an API like this only makes building shred payloads more cumbersome by introducing semantics that we practically wont be dealing with.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I see your point. Either way making as_mut do allocations sounds like a massive antipattern. Can we name it otherwise so it is explicitly separated from https://doc.rust-lang.org/std/convert/trait.AsMut.html ? Like, get_mut_or_clone or whatever to imply that this can get expensive?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the practical reality is that payloads should be unique while we're
building them

Can we assert then? I agree that Result is likely overkill, but doing implicit
copies accidentally would also be bad.

It's frustrating because the code that builds shreds is so involved, it should
be refactored to just build into BytesMut. That doesn't seem like a change for
this PR tho, nor necessarily something that we should be doing in the perf team. @alexpyattaev this sounds more like something you should have in the net
team backlog.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we can live with this, I'll be hiding the whole shred module behind agave-unstable-api anyway, so we can clean it up in due course. But some kind of guardrails (debug_assert!) would be great to have!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the practical reality is that payloads should be unique while we're
building them

Can we assert then? I agree that Result is likely overkill, but doing implicit copies accidentally would also be bad.

Sure, sounds reasonable to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up on this.

Adding an assert!(self.bytes.is_unique()) appears to work with fairly minimal changes to actual blockstore / shred code.

These appear to be the only locations that actually require unique copies for shred::recover.

return Some(shred.as_ref().clone());

return Some(shred.as_ref().clone());

That being said, there are multiple tests in the codebase that rely on implicit copy-on-write semantics, and as such, will in fact panic when assert! is used in *_mut methods. This requires adding additional methods to Payload specifically to allow genuine copying or bypassing the asserts. For example: https://github.com/cpubot/agave/compare/shred-payload-to-bytes...cpubot:agave:shred-payload-bytes-mut-assert?expand=1. This feels rather unfortunate given we know there are only a few isolated cases in which shred payloads actually need to be mutated.

It's also worth noting that the copy-on-write-if-not-unique semantics implied by Bytes -> BytesMut are already present in the master code via Arc::make_mut, so we are not really introducing new behavior here.

Self::Shared(bytes) => Arc::make_mut(bytes),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, let us take the small win for now. If #7413 lands before 3.0 we will be able to make any changes in shred module without care for API stability.

@cpubot cpubot force-pushed the shred-payload-to-bytes branch from b8df403 to 3fd4c31 Compare August 6, 2025 21:16
Copy link

@alessandrod alessandrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. The only question is whether we can make the guard assert that the shred is unique.

///
/// If the payload is unique (single reference), this will not perform any copying. Otherwise it
/// will. As such, take care to avoid performing this conversion if the payload is not unique.
pub fn get_mut<I>(&mut self, index: I) -> Option<PayloadMutGuard<'_, I>>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the practical reality is that payloads should be unique while we're
building them

Can we assert then? I agree that Result is likely overkill, but doing implicit
copies accidentally would also be bad.

It's frustrating because the code that builds shreds is so involved, it should
be refactored to just build into BytesMut. That doesn't seem like a change for
this PR tho, nor necessarily something that we should be doing in the perf team. @alexpyattaev this sounds more like something you should have in the net
team backlog.

@alexpyattaev
Copy link

@cpubot I'm happy to approve this once conflicts are resolved and Alessandro's nits are addressed.

@cpubot cpubot requested a review from a team as a code owner August 8, 2025 20:08
Copy link

@alexpyattaev alexpyattaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we'll be doing some cleanup on this.

@alexpyattaev alexpyattaev merged commit 407c9f1 into anza-xyz:master Aug 12, 2025
51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants