-
Notifications
You must be signed in to change notification settings - Fork 3
Custom TLS verification logic #30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Few observations about the
Since the verify callback will be called multiple times, we'll have to handle that some how. I think |
Remember that callback will have to be synchronous function. NAPI will be executing it from the rust side. |
I've been prototyping how to handle the callback. It has been a frustrating process since none of this is documented very well. I've identified 4-ish ways of calling a callback.
Method 1.This should be the simplest method. The callback can be defined and used as so. #[napi]
fn test_callback<T: Fn(String, String) -> Result<bool>>(callback: T) {
let result = callback("Hello".to_string(), "olleh".to_string()).unwrap();
println!("result: {}", result);
} It will work synchronously and should be thread-safe? I'm not sure. The problem with this however is that it doesn't work in a class factory method. @CMCDragonkai has made an issue for it at napi-rs/napi-rs#1374. So I don't think we can use this method here. Method 2.Method 2 is similar to 1 but uses We need thread safety to call the callback within the boring validate function. #[napi]
pub fn callback_two(env: Env, callback: JsFunction) -> napi::Result<()> {
let result = callback.call(None, &vec![env.create_string("hello").unwrap()]);
let val = result
.unwrap()
.coerce_to_number()
.unwrap()
.get_int64()
.unwrap();
println!("result was: {}", val);
Ok(())
} Method 3.I can't create a working example of this. The documentation gives examples of this using functions using the I don't think this method would work, I don't think it's thread-safe? Method 4.I think due to the thread safety, we have to use a #[napi]
pub fn call_threadsafe_function(callback: JsFunction) -> Result<()> {
let tsfn: ThreadsafeFunction<(u32, String), ErrorStrategy::CalleeHandled> = callback
.create_threadsafe_function(0, |ctx| {
let(num, s) = ctx.value;
println!("value: {}", num);
println!("value: {}", s);
ctx.env.create_string("Hello!").map(|v| vec![v])
})?;
println!("Making native call");
thread::spawn(move || {
tsfn.call(
Ok((100, "hello!".to_string())),
ThreadsafeFunctionCallMode::Blocking,
);
println!("Result!: {}", result.unwrap());
});
Ok(())
} We should be able to use this within boring's The |
Using |
Ideally we should not introduce any rust based IO runtimes like Tokio. If it cannot be used in a factory method, perhaps it should be set after construction of the rust struct? This is supposed into the config object right? Then why not call a method on the config object or does it have to be part of the boring SSL context? |
Acording to https://napi.rs/docs/compat-mode/concepts/thread-safe-function, any function has to be converted to a This means it's not possible to use any js callback functions within the
Note that this is only a problem if we want to get the result of the callback. It is possible to call the callback synchronously and provide information such as the cert PEMs and the internal validation result. So now I think we need to think about a work-around. Strictly we only need to override the verify failing due to a missing CA cert. The cert chain can be verified on the application level and any errors and connection rejection can be handled there. The callback can used to signal the secured event and provide the certificate information. So we can side-step this and move forward with the following steps.
|
It's not ideal but I have something workable.
The information being provided to the callback has the following format. A depth of {
preSuccess: true,
errorMessage: 'ok',
depth: 0,
length: 2,
cert: '-----BEGIN CERTIFICATE-----\n' +
'MIIDLjCCAhagAwIBAgIRAOP1Hv5+GXJMxyBvMi6IFSAwDQYJKoZIhvcNAQELBQAw\n' +
'EDEOMAwGA1UEAxMFcnNhQ0EwHhcNMjMwNDE5MDU0ODU4WhcNMjQwNDE4MDU0ODU3\n' +
'WjAPMQ0wCwYDVQQDEwRyc2ExMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKC\n' +
'AQEAu9DbhhyDmXQZhcUxmJMbUPd8w1YPnZjNTIDREWMtjRaLmeKylPL1cuL8Wmsj\n' +
'wvFMbx2FhDjSXsQDzjPmFNIbT3F+i2VmolAvKUHG4JqUyfcN3E+rty1Lytvu3vDV\n' +
'3OC8i1o0Y89QTXiRJIjfSBktPy3y8hskKJ6nAAbL0MQKRyKpzIV2k6epLd1Xx6dl\n' +
'RtA4QHihcD5HedbD8OaVRLBbeJ/nx38Kb/vW1G6Uivd9yVdLOi306epKlks60bSW\n' +
'1U5Ffy/2MkVW35/8JSyoAExOHBNix7LYaNB2ajCjLCjcrQI3AFcGuep8OPIS0MvD\n' +
'y/srxR/PR1nWl3muhClGChkP+wIDAQABo4GDMIGAMA4GA1UdDwEB/wQEAwIFoDAd\n' +
'BgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwHQYDVR0OBBYEFFFzBNyAicB+\n' +
'yUT8xh+isRcSixzkMB8GA1UdIwQYMBaAFPl6JPRYyNrrxk30v1MV8LTYwqLLMA8G\n' +
'A1UdEQQIMAaCBHJzYTEwDQYJKoZIhvcNAQELBQADggEBAEQydSNhxW1v+f5J6v0K\n' +
'vhWSqJmwTwhxhAvQctnophPPJb2cuyfcc0X1x+JlZFJD4eEHu5qvaYMVv7EY4Is8\n' +
'xliZbQj4L9x0246q7j44XuRlK67rj4moIoT/hLBMwRM300LIABVUHvZjAy0Fl+hs\n' +
'OuVl7m1iASHoJ07mTHoKkh4tdSjIYiMoF3Nbag4hu01iJYJIK48NzvrPxCzOb4zh\n' +
'eMhQG5C/2bijWMSPI112ksCV1PT9skuj4/R0bCuMzdJ/RWxOl1sF9pXACwpi/y6M\n' +
'dEPebTEWIOJtHB05IMb4nLy5DoOwW6o8ZrCjVZqdNps3LCJG2PrMkYVeKYYC3ecU\n' +
'Pug=\n' +
'-----END CERTIFICATE-----\n',
chain: [
'-----BEGIN CERTIFICATE-----\n' +
'MIIDLjCCAhagAwIBAgIRAOP1Hv5+GXJMxyBvMi6IFSAwDQYJKoZIhvcNAQELBQAw\n' +
'EDEOMAwGA1UEAxMFcnNhQ0EwHhcNMjMwNDE5MDU0ODU4WhcNMjQwNDE4MDU0ODU3\n' +
'WjAPMQ0wCwYDVQQDEwRyc2ExMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKC\n' +
'AQEAu9DbhhyDmXQZhcUxmJMbUPd8w1YPnZjNTIDREWMtjRaLmeKylPL1cuL8Wmsj\n' +
'wvFMbx2FhDjSXsQDzjPmFNIbT3F+i2VmolAvKUHG4JqUyfcN3E+rty1Lytvu3vDV\n' +
'3OC8i1o0Y89QTXiRJIjfSBktPy3y8hskKJ6nAAbL0MQKRyKpzIV2k6epLd1Xx6dl\n' +
'RtA4QHihcD5HedbD8OaVRLBbeJ/nx38Kb/vW1G6Uivd9yVdLOi306epKlks60bSW\n' +
'1U5Ffy/2MkVW35/8JSyoAExOHBNix7LYaNB2ajCjLCjcrQI3AFcGuep8OPIS0MvD\n' +
'y/srxR/PR1nWl3muhClGChkP+wIDAQABo4GDMIGAMA4GA1UdDwEB/wQEAwIFoDAd\n' +
'BgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwHQYDVR0OBBYEFFFzBNyAicB+\n' +
'yUT8xh+isRcSixzkMB8GA1UdIwQYMBaAFPl6JPRYyNrrxk30v1MV8LTYwqLLMA8G\n' +
'A1UdEQQIMAaCBHJzYTEwDQYJKoZIhvcNAQELBQADggEBAEQydSNhxW1v+f5J6v0K\n' +
'vhWSqJmwTwhxhAvQctnophPPJb2cuyfcc0X1x+JlZFJD4eEHu5qvaYMVv7EY4Is8\n' +
'xliZbQj4L9x0246q7j44XuRlK67rj4moIoT/hLBMwRM300LIABVUHvZjAy0Fl+hs\n' +
'OuVl7m1iASHoJ07mTHoKkh4tdSjIYiMoF3Nbag4hu01iJYJIK48NzvrPxCzOb4zh\n' +
'eMhQG5C/2bijWMSPI112ksCV1PT9skuj4/R0bCuMzdJ/RWxOl1sF9pXACwpi/y6M\n' +
'dEPebTEWIOJtHB05IMb4nLy5DoOwW6o8ZrCjVZqdNps3LCJG2PrMkYVeKYYC3ecU\n' +
'Pug=\n' +
'-----END CERTIFICATE-----\n',
'-----BEGIN CERTIFICATE-----\n' +
'MIIC8DCCAdigAwIBAgIRAJCwiiQ3TTulJhS9EK55cXUwDQYJKoZIhvcNAQELBQAw\n' +
'EDEOMAwGA1UEAxMFcnNhQ0EwHhcNMjMwNDE5MDU0ODEyWhcNMjQwNDE4MDU0ODEy\n' +
'WjAQMQ4wDAYDVQQDEwVyc2FDQTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoC\n' +
'ggEBAOPvjy4GlvFbX7iCAyt/jd0+eHlEFx/zMFwBPLZmqIA3HPJwj2cp5TY5NEWk\n' +
'MM42qQLzCOmO/K4f9m6WUZw1J9tKGIEc7b+RQ/JisHz8iiFpEDayROrV4Cs/Nd0r\n' +
'wvTautTXl6bd37rI4V8o0yHB0E/WcazAUJQDwvl7bVL942qB86LQqsEfxMIM7Ddq\n' +
'S+9dyr67RG1MMw5Qm0889l3yDKkKOKy3I6w4dAdfCZSeZFWXl7GeIVPj1Rc6Fxt8\n' +
'Tg79Cm5lgMnmqzGBZUS4eitJq6BhTyQfoylHayCkpqhcojVDbcN5D1bH/YTUh/38\n' +
'klDaOwqCajpdu6VUr1eRGrl55A0CAwEAAaNFMEMwDgYDVR0PAQH/BAQDAgEGMBIG\n' +
'A1UdEwEB/wQIMAYBAf8CAQEwHQYDVR0OBBYEFPl6JPRYyNrrxk30v1MV8LTYwqLL\n' +
'MA0GCSqGSIb3DQEBCwUAA4IBAQDIACzucf/ePefGgLVTWMVvZdfW9cXuf0Ib3zWC\n' +
'6RYbA+cBBIgn0SGToY613vIeKV1uPhFpzYJd+3eFVj9x1gVHTK7Hk6o5Tf780ZKN\n' +
'd4TkoTp9WER/boaOB+77nH9WopvE+ien1GM2HTgOBUrClPGCZ2OAJBq4xxDDeU2T\n' +
'rxlRD06692CdVh5uSUtCIsqoCUTi7s+C9p5WG1gOVIkafFmyR2i28E2n0rKlNzXQ\n' +
'UTo3936xFx8cdIJVzvPFOvMhKrOTzCCcMD8NiXXHlX51HMy3fua16W8dfLKhJp4N\n' +
'1PFD57chAacE/HKlnB4LugDLCl+MBhpuyxt+Ifk0geqeV15p\n' +
'-----END CERTIFICATE-----\n'
]
} The callback has the type Now to do verification on the application level we can do the following.
I think this is very similar to the validation process the proxy did. |
One problem with this is that the connection can establish before TLS validation is completed. On the application level we need to make sure not to start any streams until the following conditions.
I'm not sure of a good way to know that the peer has verified, possibly the first stream can signal this? |
We talked about this before. There is a trick about the exact quic packets that is received. On the whiteboard somewhere. |
It seems that the verify callback is called for each error in the chain. At minimum that means the callback the called at least once for each cert in the chain if it succeeded with error message 'ok'. Or multiple times if the cert had an error. This means that it's not clear that any callback is the last one to be made on the chain. Unless it's depth 0 and succeeded with an 'ok' message. I'll need to add a done flag or done callback to signal it. But I need a way in the internal callback to know. I'll do some more digging. This can be very useful. It means we can override specific verification errors such as 'self signed certificate in certificate chain', 'unable to get local issuer certificate' or 'unable to verify the first certificate'. I think given this, we can re-create our Polykey custom verification using a combination of two primitive overrides that we can hard code.
This should make verification failures part of quiches internal logic. but it doesn't solve a timing problem with the client connections. The client doesn't know when the server has finished verifying it. So in reality for any connections there are 3 events for when the connection can be considered fully established.
If I recall, the servers certs are verified early in the connection and happens before the connection is established? So we still need a way to know from both sides that both TLS verification steps have completed. This is a problem weather we hard-code the verification logic or handle it on the application level. Worst case we'll need to communicate this somehow on the application level. This could just be opening the first stream after verifying locally. |
Ah I remember that. Ok so we'll have a way of knowing that event on both ends. There is one thing I need your feedback on. I have outlined two was of handling the verification now.
Option 2. likely works better with the connection lifecycle events. but it's simple enough to do option 3. |
Useful reference for verification error messages. Error codes are also defined in https://github.com/google/boringssl/blob/e1b8685770d0e82e5a4a3c5d24ad1602e05f2e83/include/openssl/x509.h#L2654 So an error such as 'self signed certificate in certificate chain' refers to... |
Based on the discussion just now. We're going to cut out the verify_callback and allow failure override and handle the custom TLS verification on the application level. It needs the following changes
I'll need to look at the new native tests to divine the stage when certs should be available. Hopefully this doesn't vary based on certificate type or chain size. |
Client certs are not sent unless we try to verify the client. So |
How about this? Set verify peer to be true but then override the callback to return true if the peer's certificate is fine without the identity check. Then apply the application level check on top. Only do this if one gives a custom JS callback in the first place. |
That way the library is compatible outside PK and still works for what we need inside PK. |
Wait I don't get this. How does the client know not to send the client server if server doesn't enable verify peer? |
After reviewing the tests connection lifecycle tests I can see that After each side verifies the TLS and it succeeds, the send a short frame. If the TLS failed, it sends a handshake frame with the error instead. This means we can use receiving a short frame as an indication that the peer succeeded all TLS checks. The peer certs are available after receiving the initial frame. So the custom TLS verification can be done after receiving the initial frame. This needs to be done synchronously before sending the next frame out. If a connection receives a short frame then we can reasonably assume the peer will not fail any verification and the connection is fully established. Two changes need to be done
|
Ok, The custom TLS verification callback has been implemented. It is a simple callback that is called with the remote cert chain in PEM format Now that I think about it, It could just return We have a working secured event that triggers when the peer has verified without error. It is triggered by the first short header received that doesn't trigger the draining state. Creating the client awaits the secure event before resolving, resulting in an error if it failed. I need to update this so it only triggers a short frame has be received AND sent. Otherwise the server side connection can trigger the secured event even when client verification fails. In this case the client accepts the server's certs and sends the short frame. Ideally the connection is secure when both sides give the thumbs up. It's a simple change. I added some tests for demonstrating custom TLS success and failure. I also fixed two skipped tests that depended on the secured event. Some things that still need to be done.
One last thing. I've been working off of the staging branch here but the changes need to be applied to the PR. There are two ways I can go about this.
I'm leaning towards option 1 here. It keeps better history but is a little more annoying to PR. Not too bad though. |
When this is done I can go two ways.
Extra note: I need to update js-async-init version for locks dependency, same for polykey. async-init should manually publish. |
There may be a problem with OKP certs. If you fail the custom TLS verification callback on the client side, the server side doesn't respect the close frame with the error and will time out instead. I'm not sure why this happens, but it may be another issue with quiche. I'll dig into it more after transplanting the changes into the main PR. |
Can you check that the normal quiche based TLS failure works and examine the received failure code on the close frame and ensure that you are doing the exact code and whatever other information when you force a close frame from JS-based TLS failure. Based on my native tests the OKP works fine when quiche fails TLS normally. |
The only difference would be what the close frame/packet looks like to the other side. And I'm sure that if this is made exactly equal, we the same behaviour is expected since programs are deterministic. |
Unless we manually craft the packet with the error we can't get the same result. A normal TLS failure results in a handshake frame containing the failure. Us closing the connection with the same error code results in a CC frame to close the connection. |
That's strange, I thought during TLS failure we still get a CC frame? Did you check this on Wireshark? If truly there's no CC frame, then you want to force close if that what is happening. Check the native tests that I've done, there is certain conditions where it does wait to timeout. Alternatively we can craft the packet if we have all the information to send into the UDP socket. |
It's also because when the other side checks the actual error object, it gets the TLS code and other information. Where does information come from if not from a close frame? |
I need to test some expectation using the native connection tests. Based on the discussion, it seems that the handshake needs to complete before we can close the connection. What I was doing was closing the connection immediately after receiving the peer certs and verifying them. What I should do is wait for the handshake to complete before verifying and closing the connection. This means receiving and sending the initial frames. I'll need to expand the native tests to check this expectation for each type of cert. |
Curious, I wasn't aware of this but it seems that
|
I've updated the native TLS tests in
As an example, for a
For a
|
I think that is why it has to be called multiple times. And also even when 0, you still must call it.
21 June 2023 13:00:24 Brian Botha ***@***.***>:
…
Curious, I wasn't aware of this but it seems that *connection.timeout()* is actually aware of time passing. Calling it multiple times will return an updated time. Calling *connection.onTimeout()* will not cause a state transition unless the time has actually passed.
* console.log
----client before timeout----
established: true,
draining: true,
closed: false,
resumed: false,
earlyData: false,
peerCerts: Avaliable,
timeout: 404,
at Object.log (tests/native/quiche.tls.test.ts:2732:15)
console.log
----client after sleeping 50ms----
established: true,
draining: true,
closed: false,
resumed: false,
earlyData: false,
peerCerts: Avaliable,
timeout: 353,
at Object.log (tests/native/quiche.tls.test.ts:2735:15)
console.log
----client after sleeping timeout----
established: true,
draining: true,
closed: true,
resumed: false,
earlyData: false,
peerCerts: Avaliable,
timeout: null,
at Object.log (tests/native/quiche.tls.test.ts:2738:15)
*
—
Reply to this email directly, view it on GitHub[#30 (comment)], or unsubscribe[https://github.com/notifications/unsubscribe-auth/AAE4OHMRZOECMB2SYIL7SYLXMJPUPANCNFSM6AAAAAAZACLQJU].
You are receiving this because you were mentioned.[Tracking image][https://github.com/notifications/beacon/AAE4OHOCMBB2JVAWWXHLJ5TXMJPUPA5CNFSM6AAAAAAZACLQJWWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTS7LYJNM.gif]
|
In particular you have to call it wait 0, and call `onTimeout` possibly multiple times.
21 June 2023 15:29:22 Roger Qiu ***@***.***>:
… I think that is why it has to be called multiple times. And also even when 0, you still must call it.
21 June 2023 13:00:24 Brian Botha ***@***.***>:
>
> Curious, I wasn't aware of this but it seems that *connection.timeout()* is actually aware of time passing. Calling it multiple times will return an updated time. Calling *connection.onTimeout()* will not cause a state transition unless the time has actually passed.
>
> * console.log
>
> ----client before timeout----
> established: true,
> draining: true,
> closed: false,
> resumed: false,
> earlyData: false,
> peerCerts: Avaliable,
> timeout: 404,
>
> at Object.log (tests/native/quiche.tls.test.ts:2732:15)
>
> console.log
>
> ----client after sleeping 50ms----
> established: true,
> draining: true,
> closed: false,
> resumed: false,
> earlyData: false,
> peerCerts: Avaliable,
> timeout: 353,
>
> at Object.log (tests/native/quiche.tls.test.ts:2735:15)
>
> console.log
>
> ----client after sleeping timeout----
> established: true,
> draining: true,
> closed: true,
> resumed: false,
> earlyData: false,
> peerCerts: Avaliable,
> timeout: null,
>
> at Object.log (tests/native/quiche.tls.test.ts:2738:15)
>
>
> *
> —
> Reply to this email directly, view it on GitHub[#30 (comment)], or unsubscribe[https://github.com/notifications/unsubscribe-auth/AAE4OHMRZOECMB2SYIL7SYLXMJPUPANCNFSM6AAAAAAZACLQJU].
> You are receiving this because you were mentioned.[Tracking image][https://github.com/notifications/beacon/AAE4OHOCMBB2JVAWWXHLJ5TXMJPUPA5CNFSM6AAAAAAZACLQJWWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTS7LYJNM.gif]
>
|
Ok, Looks like everything is working now. The main changes were.
I'll need to review and clean the changes but I think this is addressed now. Changes are still in the |
Just looked at the spec, I really need to update it with details of the new changes. |
Closed by #26. |
Specification
We need the ability to override the TLS verification logic. This is essential for
Polykey
to check the identity of a connection by verifying the cert chain. We need to fail any connection that can't prove it's the Node we're trying to connect to.The boring
SslContextBuilder
has a method called set_verify_callback which we can use to set a verify function. We need to translate a rust function into an async call we provide in the js space.The method we provide it will be given the normal verification result as well as a x509 store reference we can use to inspect the cert chain. We return a bool corresponding to our verification result. This result will override the normal verification result.
The
set_verify_callback
also sets the verification mode. I think this would override whatset_verify
sets for the mode.Additional context
js-quic
integration and Agent migration Polykey#525Tasks
set_verify_callback
works.The text was updated successfully, but these errors were encountered: