Skip to content

Do something useful with SUBSCRIBE_DONE #609

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Dec 11, 2024
Merged

Do something useful with SUBSCRIBE_DONE #609

merged 11 commits into from
Dec 11, 2024

Conversation

martinduke
Copy link
Contributor

@martinduke martinduke commented Nov 7, 2024

Fixes #424
Fixes #456
Fixes #465

Comes up with some rules on when to send SUBSCRIBE_DONE, and what to do with it.

Deletes the ContentExists and Final Group/Object fields, because there is nothing the receiver does with it.

@martinduke
Copy link
Contributor Author

Note: Another design would not use a timer at the receiver. Instead, it would require that all reset streams use RESET_STREAM_AT with at least the track alias delivered reliably, and the SUBSCRIBE_DONE would indicate how many streams were opened over the life of the subscription.

Copy link
Collaborator

@afrind afrind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Individual Review:

I can live with using a timer, but I prefer putting an opened stream count in SUBSCRIBE_DONE and requiring any reset streams to use RELIABLE_RESET_STREAM with an offset >= the stream header.

destroy subscription state. The QUIC layer might still be resolving the
closing of the stream on the wire.

A receiver that receives SUBSCRIBE_DONE SHOULD set a timer of at least 2 seconds
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be an exception if the SUBSCRIBE_DONE has code UNSUBSCRIBED (eg: the subscriber already expressed disinterest)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the text says, the receiver MAY delete early if the application has lost interest in the data.

I don't think we need special language for that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 seconds seems arbitrary. Can we use 3 * PTO or leave it open to implementations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is totally arbitrary. The MoQ application is unlikely to have an API for the PTO, but there's no number I'd be unhappy with.

Copy link
Collaborator

@ianswett ianswett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, but I'm curious why you removed the Final Group and Final Object?

delivery timeout.

A sender MUST NOT send SUBSCRIBE_DONE until it has closed all streams it will
ever open, and has no pending datagrams, for a subscription. After sending
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ever open, and has no pending datagrams, for a subscription. After sending
ever open, and has no unsent datagrams, for a subscription. After sending

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC Datagrams can't be pending at the application level. If QUIC can't take them, they just drop on the floor?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quicr relay queues datagram packets at the applications layer and only passing them to QUIC layer when there is space in the CWIN for them. This allows the relay to control the times outs and priorities for object sent with datagram. I think that will deliver a much better user experience than something that just drops them on the floor (thought both implementations are valid spec compliant AFAIC )

destroy subscription state. The QUIC layer might still be resolving the
closing of the stream on the wire.

A receiver that receives SUBSCRIBE_DONE SHOULD set a timer of at least 2 seconds
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 seconds seems arbitrary. Can we use 3 * PTO or leave it open to implementations?

Copy link
Collaborator

@ianswett ianswett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I forgot I would like to change the 2 second constant before merging this.

@martinduke
Copy link
Contributor Author

I thought of a third way this could work:

If we got WebTransport APIs to expose the QUIC Stream ID of a stream, we could include the largest stream ID opened on behalf of a subscription. If all unidirectional sender-opened streams up to that ID have been observed, it's safe to destroy the state.

and whether it was an error.
A publisher sends a `SUBSCRIBE_DONE` message to indicate it will not open any
additional streams or send any more datagrams for a subscription. When all
streams for the subscription are fully closed, each endpoint can destroy its
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by the phrase "fully closed" , do you mean stream fin or does it cover stream reset cases too ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It covers Reset -- I'm not sure how it could work just on FIN

A subscriber that receives SUBSCRIBE_DONE SHOULD set a timer of at least 2 seconds
in case some datagrams or unopened streams are still inbound due to
prioritization or packet loss. Once the timer has expired, the receiver destroys
subscription state once all open streams for the subscription have closed. A
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Once the timer has expired, the receiver can cleanup its subscription state"

I don't think we need to have "once al open streams for the subscriptions have closed".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a stream were still open, it would be smart to continue, no?

Copy link
Collaborator

@fluffy fluffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this would work.

A publisher sends a `SUBSCRIBE_DONE` message to indicate it is done publishing
Objects for that subscription. The Status Code indicates why the subscription ended,
and whether it was an error.
A publisher sends a `SUBSCRIBE_DONE` message to indicate it will not open any
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strongly prefer the "will not send any more objects" version of this.

Objects for that subscription. The Status Code indicates why the subscription ended,
and whether it was an error.
A publisher sends a `SUBSCRIBE_DONE` message to indicate it will not open any
additional streams or send any more datagrams for a subscription. When all
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So lets say I have a track with a single group and single sub group and a million objects in the group. (this might be a common patter for deliver software updates or AI models updates. This new text say the DONE would be sent as soon as the subscribe was received. This seems very wrong and will cause weird race conditions when the DONE was sent after the publisher opened the last stream, but the subscriber got the DONE before they new stream had been opened.

Copy link
Contributor Author

@martinduke martinduke Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not what it says at all. A few paragraphs down, it says:

"A sender MUST NOT send SUBSCRIBE_DONE until it has closed all streams it will
ever open, and has no pending datagrams, for a subscription."

Race conditions are inevitable, even with that rule, because we can't guarantee the order in which QUIC will deliver things. That's why there has to be either a timer or a careful accounting of what streams have been opened.

Closing streams gives the receiver all the information it needs, except if there are more streams pending, hence the SUBSCRIBE_DONE. Delaying sending it till the sender has closed the streams just reduces the race condition. It doesn't eliminate it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So

"A sender MUST NOT send SUBSCRIBE_DONE until it has closed all streams it will
ever open, and has no pending datagrams, for a subscription."

and

"A publisher sends a SUBSCRIBE_DONE message to indicate it will not open any
additional streams or send any more datagrams for a subscription."

Seem to contradict each other about when DONE is sent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is confusing and I'll fix it.

What I'm getting at is this: we don't need a FETCH_DONE because it's all on one stream, so the FIN/RESET is adequate to communicate the end of the FETCH. If the subscriber somehow knew for sure how many streams would open for a SUBSCRIBE, then stream FIN/RESETs would be adequate to know when to destroy subscription state.

SUBSCRIBE_DONE is only needed because it doesn't know for sure; the sender has to tell it that it's done opening streams, so once the current crop of streams is closed it's safe to tear down.

Unfortunately, it is quite likely that SUBSCRIBE_DONE outraces the last stream openings. So either there has to be a timer, or there has to be some indication of how many streams there were (which would levy new requirements on QUIC/WebTransport)

SUBSCRIBE_DONE, the sender can immediately destroy subscription state, although
stream state may persist until delivery completes.

A subscriber that receives SUBSCRIBE_DONE SHOULD set a timer of at least 2 seconds
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 seconds sounds insanely fragile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you like to suggest a different timeout, or one of the other proposals discussed in this thread? I'm open to any of them but was reluctant to impose more requirements on the QUIC/WebTransport stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The effect of the timer expiring early is that some objects may be dropped on the floor instead of delivered to the application.

If the receiver is interested in objects beyond that, it should set a longer timer.

@martinduke
Copy link
Contributor Author

I went with the timer design because I thought it would be easy to write. But it turns out it's confusing to people, so I'm starting to think that we either require RESET_STREAM_AT or expose QUIC stream IDs to the application, and get rid of the timer.

@ianswett
Copy link
Collaborator

I'm fine with this design, minus explicitly specifying a timeout.

@mengelbart
Copy link
Contributor

I think RESET_STREAM_AT might work for QUIC, but if I understand WebTransport correctly, it doesn't expose it to the application and only uses it for the WebTransport session IDs. Referencing the highest stream ID sounds cleaner to me than setting an arbitrary timeout, but if that is hard for some QUIC APIs, I'm fine with a timeout.

Another option I could think of is to send SUBSCRIBE_DONE on a separate last stream which must not be reset. But that would mean we no longer send all control messages on the control stream...

@mengelbart
Copy link
Contributor

oh and the separate stream also requires exposing the stream ID...

@martinduke
Copy link
Contributor Author

Another option I could think of is to send SUBSCRIBE_DONE on a separate last stream which must not be reset. But that would mean we no longer send all control messages on the control stream...

Even if that stream has a very low priority, I don't think we can guarantee that it arrives after any object streams that are sent around the same time.

@martinduke
Copy link
Contributor Author

I think RESET_STREAM_AT might work for QUIC, but if I understand WebTransport correctly, it doesn't expose it to the application and only uses it for the WebTransport session IDs.

It's not clear to me if WebTransport is specifying the APIs, but to me this is less of an IETF question and more a "what are WebTransport implementers willing to do" question.

@martinduke
Copy link
Contributor Author

@vasilvv points out that the Webtrans API has a signal when the stream FIN has been ACKed. This makes everything way easier.

@afrind
Copy link
Collaborator

afrind commented Nov 12, 2024

Individual Comment:

I don't like using FIN-ACK:

  1. It moves the waiting period from the subscriber to the publisher. The high-scale/long wait cases are likely when relays publish fan-out to client subscribers, and the relay is the one with reason to free up resources faster. The publisher otherwise has no reason to keep this state around but the subscriber might.
  2. Receiving a FIN-ACK does not mean the MoQ layer has received the stream, only the QUIC stack, which could be in the peer's kernel (or someday, NIC). The SUBSCRIBE_DONE could still be racing delivery of the QUIC data and result in missed objects. HTTP/3 explicitly chose not to use QUIC signals in QPACK for these reasons.
  3. The subscriber doesn't have to wait around after SUBSCRIBE_DONE if it is uninterested in remaining data, but by withholding the signal, it doesn't get a choice.

I still prefer using RESET_STREAM_AT -- the downside from my perspective is pushing WebTransport to include this surface, which I think they should anyway, and perhaps that raw QUIC implementations MUST negotiate this extension. Timers are inelegant and a waste when everything was already received.

Copy link
Collaborator

@ianswett ianswett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I assume this works with existing WebTransport APIs?

@martinduke
Copy link
Contributor Author

LGTM. I assume this works with existing WebTransport APIs?

It's in both the spec and our implementation.

Copy link
Contributor

@TimEvens TimEvens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments... but I'm unclear on how SUBSCRIBE_DONE is to be used with Object Status type 0x4 which indicates end of track. If the publisher sends end of track object status, isn't that the same as SUBSCRIBE_DONE?

@@ -1358,8 +1358,7 @@ See {{priorities}}.
A subscriber issues a `UNSUBSCRIBE` message to a publisher indicating it is no
longer interested in receiving media for the specified track and Objects
should stop being sent as soon as possible. The publisher sends a
SUBSCRIBE_DONE to acknowledge the unsubscribe was successful and indicate
the final Object.
SUBSCRIBE_DONE to acknowledge the unsubscribe was successful.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems weird to me that it's named SUBSCRIBE_DONE for what is defined as UNSUBSCRIBE_ACK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just an UNSUBSCRIBE_ACK. It's also used when the SUBSCRIBE reaches its end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't that redundant with object status 0x4/end of track?

Copy link
Contributor

@TimEvens TimEvens Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I don't believe it adds much value to ack unsubscribe. It should be assumed that it was successful as it was transmitted over a reliable control stream, which had to be processed by the remote otherwise the control stream would be invalidated. If the remote doesn't do anything with the unsubscribe (e.g., noop the message) and continues to publish data, the relay should then respond with error. I don't recommend using protocol violation in this case as that would close a connection causing a potential large impact that spans more than one track. This is a track error and should, IMHO, only affect that track. With QUIC, either side can reset a stream regardless of being bidir or unidir, right? I would suggest that we need a track level error message that is followed by a RESET of the incoming stream. RESET of stream could indicate error code as well.

Out of scope for this PR but keep in mind as it relates, what happens when an app sends end of track or end of group object status and continues to send data that would violate that status???

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not redundant to end of track, as other objects could be delivered later and out of order, and sometimes the subscribe ends before the track does


1. All DATAGRAMs sent for the subscription have reached their expiration time.

2. All data streams have either been reset, or the stream API has provided a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is stream API clarified? Suggest rewording as that makes an assumption of an API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you reword it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the context here is the QUIC transport stack/API, correct? If so, I would recommend "QUIC transport" instead of "stream API"

notification that the FIN has been ACKed.

3. Any data streams closed with RESET_STREAM_AT have received notification from
the stream API that the reliable size has been acknowledged.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same regarding using "stream API". It suggests an implementation that this protocol doesn't appear to define.

@martinduke
Copy link
Contributor Author

Minor comments... but I'm unclear on how SUBSCRIBE_DONE is to be used with Object Status type 0x4 which indicates end of track. If the publisher sends end of track object status, isn't that the same as SUBSCRIBE_DONE?

No. If the EndOfTrack object is delivered, there might still be other streams out there that are open and delivering earlier objects, or even streams that haven't been opened yet.

@TimEvens
Copy link
Contributor

TimEvens commented Nov 14, 2024

Minor comments... but I'm unclear on how SUBSCRIBE_DONE is to be used with Object Status type 0x4 which indicates end of track. If the publisher sends end of track object status, isn't that the same as SUBSCRIBE_DONE?

No. If the EndOfTrack object is delivered, there might still be other streams out there that are open and delivering earlier objects, or even streams that haven't been opened yet.

@martinduke , that needs to be clarified as that's unclear as described. If I understand your statement above, would it be correct to rephrase it as:

"End of track" (EoT) indicates that this group, subgroup, and object id indicates the last in the track. The track alias cannot be reused after this status. Any group, subgroup, object id that is greater than what was indicated at the time of EoT are invalid and would result in a track error. Relays that receive EoT should mark the track as done, but linger it for some period of time to allow for missing/gap data that is inflight and as mentioned in #587? If by chance the relay or receiving client is tracking gaps and detects no gaps upon receiving EoT, they can immediately close the track without having to wait a linger period for inflight groups/subgroups pending.

Based on object status EoT, I'm for us removing SUBSCRIBE_DONE as it really isn't needed for an ACK to unsubscribe and it doesn't add any more value than EoT. Unless.... you are saying that SUBSCRIBE_DONE is an abrupt close (reset/purge) of an active track... If we want to immediately close all inflight transmissions and not wait for them to finish, then that is a difference from EoT as I understand it , but I don't think that is the case here. IMO, more is clarification needed.

@martinduke
Copy link
Contributor Author

After more thought, I really think #628 (Alan's stream-counting PR) is a superior approach, as it gives the option for the client to short-cut its timer. Some were concerned about messing up counting; worst case is it falls back to using the timer, which is this proposal.

@ianswett ianswett added the Design Issues or PRs that change how MoQ works including the wire format. label Dec 10, 2024
@ianswett ianswett merged commit 8911b3c into main Dec 11, 2024
2 checks passed
ianswett added a commit that referenced this pull request Dec 11, 2024
This is an alternative to #609 that requires a dependency on
RESET_STREAM_AT.

Some lines reflowed so the delta looks larger than it is.
@englishm englishm mentioned this pull request Feb 25, 2025
39 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Design Issues or PRs that change how MoQ works including the wire format.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SUBSCRIBE_DONE logic is gruesome SUBSCRIBE_DONE final object SUBSCRIBE_DONE and terminal state
7 participants