Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(s2n-quic-transport): Fixes s2n-quic TLS state incongruity when s2n-tls returns pending frequently #2574

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

maddeleine
Copy link
Contributor

@maddeleine maddeleine commented Mar 28, 2025

Release Summary:

Resolved issues:

resolves #2484

Description of changes:

Basically #2484 hit a debug assert failure where the TLS implementation had confirmed the handshake, but the s2n-quic space manager had not yet discarded its handshake space. In a normal handshake, the function handle_handshake_packet() calls update_crypto_state(), which eventually completes the handshake. Then the function on_processed packet() discards the handshake space.
However, in a weird handshake with a bunch of wakeups and Pendings returned from the TLS implementation, the handshake completes during the call to update_crypto_state() from the on_wakeup() function. And in that case the handshake space doesn't get discarded.

To fix this, I added code to discard the handshake space in the on_wakeup() function.

Call-outs:

An alternative approach that works is to move the code to discard the handshake space from on_processed_packet() to update_crypto_state(). I don't have strong intuition on which approach is better.

Testing:

Added a new integ test that reproduces the issue. Therefore, we can assert that this fix does solve the issue.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@suresr-aws suresr-aws requested a review from Copilot March 28, 2025 15:44
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request addresses issue #2484 by ensuring that the handshake space is correctly discarded when a TLS handshake completes during unexpected wakeup events.

  • Added logic in on_wakeup() to discard the handshake space when confirmed
  • Inline documentation referencing RFC 9001 for context
Comments suppressed due to low confidence (1)

quic/s2n-quic-transport/src/connection/connection_impl.rs:1142

  • [nitpick] Consider adding a unit test that simulates multiple wakeups with pending TLS responses to verify that the handshake space is discarded as expected.
if self.space_manager.handshake().is_some() && self.space_manager.is_handshake_confirmed() {

@maddeleine maddeleine marked this pull request as ready for review April 2, 2025 17:46
@maddeleine maddeleine requested a review from camshaft April 7, 2025 18:58
Comment on lines +12 to +13
pub server_endpoint: Option<E>,
pub client_endpoint: Option<E>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to have 2 different types here. Nor do they need to be Options. I'd also prefer to make the field private with a new function, just so we have flexibility to add fields in the future.

Suggested change
pub server_endpoint: Option<E>,
pub client_endpoint: Option<E>,
endpoint: E,

}

fn max_tag_length(&self) -> usize {
todo!()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be delegated to the inner endpoint type, which is easier if you just have the one.

// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0

#[cfg(feature = "s2n-quic-tls")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also test the default one since these features aren't always enabled.

.expect("Server should exist")
.new_server_session(transport_parameters);
SlowSession {
defer: 10,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we picking 10? I think even 1 is probably good enough for most cases. Maybe we should take this as a parameter?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Poll::Pending from TLS Session impls breaks internal state assumptions
2 participants