-
Notifications
You must be signed in to change notification settings - Fork 660
Refactor Shred::Payload to use Bytes #7049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #7049 +/- ##
========================================
Coverage 83.3% 83.3%
========================================
Files 798 798
Lines 363061 363101 +40
========================================
+ Hits 302444 302568 +124
+ Misses 60617 60533 -84 🚀 New features to boost your workflow:
|
/// | ||
/// If the payload is unique (single reference), this will not perform any copying. Otherwise it | ||
/// will. As such, take care to avoid performing this conversion if the payload is not unique. | ||
pub fn get_mut<I>(&mut self, index: I) -> Option<PayloadMutGuard<'_, I>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it wise to have as_mut
and get_mut
methods that might secretly Clone under the hood? get_mut
usually implies a fast-ish getter method rather than something that might clone under the hood. I'd rather work with try_get_mut
that would error when another reference exists, preventing accidental creation of copies. And if someone is really happy with a copy made, they can explicitly call .clone()
followed up by try_get_mut().unwrap()
. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I certainly understand where you're coming from. The reason I didn't go with the try_get_mut
is because it would introduce a Result
type in all places of the codebase that mutate shred payloads. And the practical reality is that payloads should be unique while we're building them (i.e., in all places that actually perform mutation), and even if we have an edge case where it isn't unique (I do not know if such a case exists), it's probably not correct to error.
I'd rather work with try_get_mut that would error when another reference exists
I can understand how this might be true in some scenarios, but I do not think this is one. Using Result
would make these parts of the codebase syntactically obtuse. If a part of the codebase must mutate a shred payload, erroring when a copy would be introduced is probably not semantically correct -- the Result
would almost certainly need to be explicitly handled, and as such the copy done explicitly.
let payload = match shred.payload.try_get_mut() {
Ok(p) => p,
Err(_) => shred.payload.copy_into_mut()
};
Introducing an API like this only makes building shred payloads more cumbersome by introducing semantics that we practically wont be dealing with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I see your point. Either way making as_mut
do allocations sounds like a massive antipattern. Can we name it otherwise so it is explicitly separated from https://doc.rust-lang.org/std/convert/trait.AsMut.html ? Like, get_mut_or_clone
or whatever to imply that this can get expensive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the practical reality is that payloads should be unique while we're
building them
Can we assert then? I agree that Result is likely overkill, but doing implicit
copies accidentally would also be bad.
It's frustrating because the code that builds shreds is so involved, it should
be refactored to just build into BytesMut. That doesn't seem like a change for
this PR tho, nor necessarily something that we should be doing in the perf team. @alexpyattaev this sounds more like something you should have in the net
team backlog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we can live with this, I'll be hiding the whole shred module behind agave-unstable-api anyway, so we can clean it up in due course. But some kind of guardrails (debug_assert!) would be great to have!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the practical reality is that payloads should be unique while we're
building themCan we assert then? I agree that Result is likely overkill, but doing implicit copies accidentally would also be bad.
Sure, sounds reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up on this.
Adding an assert!(self.bytes.is_unique())
appears to work with fairly minimal changes to actual blockstore / shred code.
These appear to be the only locations that actually require unique copies for shred::recover
.
agave/ledger/src/blockstore.rs
Line 785 in 04a8483
return Some(shred.as_ref().clone()); |
agave/ledger/src/blockstore.rs
Line 814 in 04a8483
return Some(shred.as_ref().clone()); |
That being said, there are multiple tests in the codebase that rely on implicit copy-on-write semantics, and as such, will in fact panic when assert!
is used in *_mut
methods. This requires adding additional methods to Payload
specifically to allow genuine copying or bypassing the asserts. For example: https://github.com/cpubot/agave/compare/shred-payload-to-bytes...cpubot:agave:shred-payload-bytes-mut-assert?expand=1. This feels rather unfortunate given we know there are only a few isolated cases in which shred payloads actually need to be mutated.
It's also worth noting that the copy-on-write-if-not-unique semantics implied by Bytes -> BytesMut
are already present in the master
code via Arc::make_mut
, so we are not really introducing new behavior here.
agave/ledger/src/shred/payload.rs
Line 24 in 04a8483
Self::Shared(bytes) => Arc::make_mut(bytes), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, let us take the small win for now. If #7413 lands before 3.0 we will be able to make any changes in shred module without care for API stability.
b8df403
to
3fd4c31
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. The only question is whether we can make the guard assert that the shred is unique.
/// | ||
/// If the payload is unique (single reference), this will not perform any copying. Otherwise it | ||
/// will. As such, take care to avoid performing this conversion if the payload is not unique. | ||
pub fn get_mut<I>(&mut self, index: I) -> Option<PayloadMutGuard<'_, I>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the practical reality is that payloads should be unique while we're
building them
Can we assert then? I agree that Result is likely overkill, but doing implicit
copies accidentally would also be bad.
It's frustrating because the code that builds shreds is so involved, it should
be refactored to just build into BytesMut. That doesn't seem like a change for
this PR tho, nor necessarily something that we should be doing in the perf team. @alexpyattaev this sounds more like something you should have in the net
team backlog.
@cpubot I'm happy to approve this once conflicts are resolved and Alessandro's nits are addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we'll be doing some cleanup on this.
Problem
shred::Payload
is currently set up as the sum:As we migrate more components of the codebase with similar semantics to
Bytes
,shred::Payload
is a great candidate.There are multiple places in the codebase that would benefit from being able to cheaply clone
shred::Payload
s or already expectBytes
wherePayload
s are being manually converted (or both) (just a couple examples):agave/turbine/src/broadcast_stage.rs
Lines 484 to 485 in c87ab7c
agave/ledger/src/shred.rs
Lines 429 to 436 in c87ab7c
agave/turbine/src/broadcast_stage.rs
Line 557 in c87ab7c
agave/turbine/src/retransmit_stage.rs
Line 423 in c87ab7c
Many of these paths deal with shreds generated upstream that are always of the
Unique
variant, meaning they're always cloned (heap alloc + copy) unless some clever workaround is employed. There is little to no benefit to preserving aUnique
variant -- being able to rely on cheap clones ofPayload
s anywhere in the codebase trumps any overhead associated withArc
orBytes
in the small subset of cases where shared references aren't needed (unclear to me that those actually exist).Summary of Changes
This updates
shred::Payload
to simply be a wrapper aroundBytes
with some convenience methods that allow it to be dropped into the existing codebase with very minimal code changes.Notably, there are code paths that need mutable access to the shred's payload, as they "incrementally" build up the payload in various steps. Given this, and
Bytes
being immutable, a little creativity was required to get this to work as a "drop in" replacement. All the heavy lifting here is done by a helper,PayloadMutGuard
, which can effectively simulate a mutableBytes
API without relying on any enums or separate payload "builder" steps. It's well documented in the code, but, in short, it temporarily moves the payload'sBytes
instance into aBytesMut
instance, which receives all the writes. Upon drop, thatBytesMut
instance is frozen and set as the payload'sBytes
reference. Because payload mutation only happens during this incremental shred construction phase, allBytes
instances should be unique, meaningPayloadMutGuard
usage (and itsBytes
->BytesMut
conversion) shouldn't entail any copying or allocation.