-
Notifications
You must be signed in to change notification settings - Fork 643
QUIC: generate unique server_name for client connections #7260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QUIC: generate unique server_name for client connections #7260
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #7260 +/- ##
=======================================
Coverage 82.8% 82.8%
=======================================
Files 803 803
Lines 364439 364447 +8
=======================================
+ Hits 302068 302079 +11
+ Misses 62371 62368 -3 🚀 New features to boost your workflow:
|
Can you share details about token mixed up? What is the impact? |
@lijunwangs Agave clients have been using the same hostname for all servers it connects to. It also has a "TokenStore" that caches address validation tokens received by servers, but it is keyed by hostname. In other words, all servers share the same "TokenStore" entry. When dialing a new connection, the Agave client will send the last received address validation token in the Initial packet "Token" field. Note that this "Token" field can be interpreted ambiguously since the schema is defined by the server's implementation. Since all Agave servers use the same implementation, they all interpret this incoming garbage token as an address validation token, and just ignore it per spec. So the impact to Agave-Agave connections is zero. Firedancer, however, interprets this garbage token as a "Retry token". The spec mandates that the Initial is dropped or rejected in this case. So, Agave-Firedancer connection attempts fail frequently, but this condition is recoverable, since the Agave client will omit the token on the second conn attempt. Firedancer deployed a workaround (ignore these Agave tokens / parse them as address validation tokens). |
/// since Solana does not rely on DNS names, but we need to provide a unique | ||
/// one to ensure that we present correct QUIC tokens to the correct server. | ||
pub fn socket_addr_to_quic_server_name(peer: SocketAddr) -> String { | ||
format!("{}.{}.sol", peer.ip(), peer.port()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would have been cleaner to use the identity key here. But anything works, as long as it's unique
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Identity key stays unchanged during identity transfer, so we would be again connecting to a different host while using incorrect keys. So, can not use just pubkey.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From server name,
/// Attempt to make a ServerName from a string by parsing as a DNS name or IP address.
impl<'a> TryFrom<&'a str> for ServerName<'a> {
type Error = InvalidDnsNameError;
fn try_from(s: &'a str) -> Result<Self, Self::Error> {
match DnsName::try_from(s) {
Ok(dns) => Ok(Self::DnsName(dns)),
Err(InvalidDnsNameError) => match IpAddr::try_from(s) {
Ok(ip) => Ok(Self::IpAddress(ip)),
Err(_) => Err(InvalidDnsNameError),
},
}
}
}
, it is expecting either a DNS name or IP, can we simply use the IP address? Why do we need a port number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can not simply use the IP address because it is perfectly legitimate to run more than one node on the same IP address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Identity key stays unchanged during identity transfer, so we would be again connecting to a different host while using incorrect keys. So, can not use just pubkey.
@alexpyattaev I'm confused, sorry ... So in Agave, set-identity does not update keys for all its QUIC clients and servers? Isn't that an implementation bug? Since this is a network protocol change, shouldn't we fix the implementation than work around in the protocol?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set identity does update the keys, but on the host where identity is transplanted to. As a random client connecting via e.g. tpu-client-next, I'd have no idea that the identity has been transplanted, I'm just connecting to the server using SocketAddress. For the client it is also indistinguishable from a multihomed server changing its primary IP (in which case the old tokens would still work). But since we can not know for sure, we assume it is an identity transplant (i.e. old tokens are invalid).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ripatel-fd if you mean Admin RPC call to change the identity, it changes the all client identities.
I think Alex means that if a validator is running on host H_1
, it may decide to switch to the other host H_2
. He will use the same identity but on a different host simultaneously stopping the validator on the old host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, and now if I use identity as the key for tokens, this key will be the same while host is different => problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took me some time to review how the server name is used in Quinn and rustls
- Resumption token for crypto handshaking -- our configuration essentially disabled it.
- SNI extensions in the initial packet -- server will need to provide a certificate with the name when server certificate validation is done -- we are not validating it now. Our server always have the SAN (subject alternative name) to localhost.
- The token in the initial packet for address validations (retry and future connection address validation) mentioned by @ripatel-fd.
4 ECH (encrypted client hello) -- our implementation does not use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for fixing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to add some tests, but might be a bit elaborated to create one
Do you mean to test that when we transition to another server the token is reused? Yeah it would be a decent test around streamer crate (since there we have better access to the internals of the server to check if it actually had to validate the address). |
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis. |
generate unique server_name for QUIC connections (cherry picked from commit eb42788)
Problem
Summary of Changes
Fixes #7197