Skip to content

Do something useful with SUBSCRIBE_DONE #609

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Dec 11, 2024
42 changes: 22 additions & 20 deletions draft-ietf-moq-transport.md
Original file line number Diff line number Diff line change
Expand Up @@ -739,7 +739,7 @@ For successful subscriptions, the publisher maintains a list of
subscribers for each track. Each new OBJECT belonging to the
track within the subscription range is forwarded to each active
subscriber, dependent on the congestion response. A subscription
remains active until the publisher of the track terminates the
remains active until soon after the publisher of the track terminates the
subscription with a SUBSCRIBE_DONE (see {{message-subscribe-done}}).

A caching relay saves Objects to its cache identified by the Object's
Expand Down Expand Up @@ -1357,8 +1357,7 @@ See {{priorities}}.
A subscriber issues a `UNSUBSCRIBE` message to a publisher indicating it is no
longer interested in receiving media for the specified track and Objects
should stop being sent as soon as possible. The publisher sends a
SUBSCRIBE_DONE to acknowledge the unsubscribe was successful and indicate
the final Object.
SUBSCRIBE_DONE to acknowledge the unsubscribe was successful.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems weird to me that it's named SUBSCRIBE_DONE for what is defined as UNSUBSCRIBE_ACK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just an UNSUBSCRIBE_ACK. It's also used when the SUBSCRIBE reaches its end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't that redundant with object status 0x4/end of track?

Copy link
Contributor

@TimEvens TimEvens Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I don't believe it adds much value to ack unsubscribe. It should be assumed that it was successful as it was transmitted over a reliable control stream, which had to be processed by the remote otherwise the control stream would be invalidated. If the remote doesn't do anything with the unsubscribe (e.g., noop the message) and continues to publish data, the relay should then respond with error. I don't recommend using protocol violation in this case as that would close a connection causing a potential large impact that spans more than one track. This is a track error and should, IMHO, only affect that track. With QUIC, either side can reset a stream regardless of being bidir or unidir, right? I would suggest that we need a track level error message that is followed by a RESET of the incoming stream. RESET of stream could indicate error code as well.

Out of scope for this PR but keep in mind as it relates, what happens when an app sends end of track or end of group object status and continues to send data that would violate that status???

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not redundant to end of track, as other objects could be delivered later and out of order, and sometimes the subscribe ends before the track does


The format of `UNSUBSCRIBE` is as follows:

Expand Down Expand Up @@ -1781,9 +1780,26 @@ FETCH_ERROR

## SUBSCRIBE_DONE {#message-subscribe-done}

A publisher sends a `SUBSCRIBE_DONE` message to indicate it is done publishing
Objects for that subscription. The Status Code indicates why the subscription ended,
and whether it was an error.
A publisher sends a `SUBSCRIBE_DONE` message to indicate it will not open any
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strongly prefer the "will not send any more objects" version of this.

additional streams or send any more datagrams for a subscription. When all
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So lets say I have a track with a single group and single sub group and a million objects in the group. (this might be a common patter for deliver software updates or AI models updates. This new text say the DONE would be sent as soon as the subscribe was received. This seems very wrong and will cause weird race conditions when the DONE was sent after the publisher opened the last stream, but the subscriber got the DONE before they new stream had been opened.

Copy link
Contributor Author

@martinduke martinduke Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not what it says at all. A few paragraphs down, it says:

"A sender MUST NOT send SUBSCRIBE_DONE until it has closed all streams it will
ever open, and has no pending datagrams, for a subscription."

Race conditions are inevitable, even with that rule, because we can't guarantee the order in which QUIC will deliver things. That's why there has to be either a timer or a careful accounting of what streams have been opened.

Closing streams gives the receiver all the information it needs, except if there are more streams pending, hence the SUBSCRIBE_DONE. Delaying sending it till the sender has closed the streams just reduces the race condition. It doesn't eliminate it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So

"A sender MUST NOT send SUBSCRIBE_DONE until it has closed all streams it will
ever open, and has no pending datagrams, for a subscription."

and

"A publisher sends a SUBSCRIBE_DONE message to indicate it will not open any
additional streams or send any more datagrams for a subscription."

Seem to contradict each other about when DONE is sent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is confusing and I'll fix it.

What I'm getting at is this: we don't need a FETCH_DONE because it's all on one stream, so the FIN/RESET is adequate to communicate the end of the FETCH. If the subscriber somehow knew for sure how many streams would open for a SUBSCRIBE, then stream FIN/RESETs would be adequate to know when to destroy subscription state.

SUBSCRIBE_DONE is only needed because it doesn't know for sure; the sender has to tell it that it's done opening streams, so once the current crop of streams is closed it's safe to tear down.

Unfortunately, it is quite likely that SUBSCRIBE_DONE outraces the last stream openings. So either there has to be a timer, or there has to be some indication of how many streams there were (which would levy new requirements on QUIC/WebTransport)

streams for the subscription are fully closed, each endpoint can destroy its
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by the phrase "fully closed" , do you mean stream fin or does it cover stream reset cases too ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It covers Reset -- I'm not sure how it could work just on FIN

subscription state.

Note that some objects in the subscribed groups might never be delivered,
because a stream was reset, or never opened in the first place, due to the
delivery timeout.

A sender MUST NOT send SUBSCRIBE_DONE until it has closed all streams it will
ever open, and has no pending datagrams, for a subscription. After sending
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ever open, and has no pending datagrams, for a subscription. After sending
ever open, and has no unsent datagrams, for a subscription. After sending

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC Datagrams can't be pending at the application level. If QUIC can't take them, they just drop on the floor?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quicr relay queues datagram packets at the applications layer and only passing them to QUIC layer when there is space in the CWIN for them. This allows the relay to control the times outs and priorities for object sent with datagram. I think that will deliver a much better user experience than something that just drops them on the floor (thought both implementations are valid spec compliant AFAIC )

SUBSCRIBE_DONE, the sender can immediately destroy subscription state, although
stream state may persist until delivery completes.

A subscriber that receives SUBSCRIBE_DONE SHOULD set a timer of at least 2 seconds
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 seconds sounds insanely fragile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you like to suggest a different timeout, or one of the other proposals discussed in this thread? I'm open to any of them but was reluctant to impose more requirements on the QUIC/WebTransport stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The effect of the timer expiring early is that some objects may be dropped on the floor instead of delivered to the application.

If the receiver is interested in objects beyond that, it should set a longer timer.

in case some datagrams or unopened streams are still inbound due to
prioritization or packet loss. Once the timer has expired, the receiver destroys
subscription state once all open streams for the subscription have closed. A
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Once the timer has expired, the receiver can cleanup its subscription state"

I don't think we need to have "once al open streams for the subscriptions have closed".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a stream were still open, it would be smart to continue, no?

subscriber MAY discard subscription state earlier, at the cost of potentially not
delivering some late objects to the application.

The format of `SUBSCRIBE_DONE` is as follows:

Expand All @@ -1795,9 +1811,6 @@ SUBSCRIBE_DONE Message {
Status Code (i),
Reason Phrase Length (i),
Reason Phrase (..),
ContentExists (8),
[Final Group (i)],
[Final Object (i)],
}
~~~
{: #moq-transport-subscribe-fin-format title="MOQT SUBSCRIBE_DONE Message"}
Expand All @@ -1808,17 +1821,6 @@ SUBSCRIBE_DONE Message {

* Reason Phrase: Provides the reason for subscription error.

* ContentExists: 1 if an object has been published for this subscription, 0 if
not. If 0, then the Final Group and Final Object fields will not be present.
Any other value is a protocol error and MUST terminate the session with a
Protocol Violation ({{session-termination}}).

* Final Group: The largest Group ID sent by the publisher in an OBJECT
message in this track.

* Final Object: The largest Object ID sent by the publisher in an OBJECT
message in the `Final Group` for this track.

## MAX_SUBSCRIBE_ID {#message-max-subscribe-id}

A publisher sends a MAX_SUBSCRIBE_ID message to increase the number of
Expand Down
Loading