Skip to content

Draft wasi_ephemeral_crypto.witx module #231

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

ueno
Copy link

@ueno ueno commented Feb 26, 2020

This adds a small set of interface functions that allow host implementations of cryptographic primitives for certification / acceleration purposes. Currently, our primary target is to enable TLS-like encrypted streams based on those primitives, namely: AEAD, hashing, MAC, HKDF, and shared secret derivation.

Comments appreciated.

Related to #65.

@ueno ueno force-pushed the wip/dueno/crypto-api branch from b741eb4 to e6749a9 Compare February 26, 2020 16:37
@tniessen
Copy link
Member

tniessen commented Feb 26, 2020

I have been thinking about this, too, and I imagined a very low-level API, that doesn't make strong assumptions about the algorithms themselves. Key management will also be a major issue.

  1. Currently, our primary target is to enable TLS-like encrypted streams based on those primitives

    I am not familiar with witx, but the current proposed AEAD API does not seem to provide any low-level streaming capabilities? Only the ability to reuse a key with different nonces.

  2. For AEAD, why is the key part of the handle creation, but the nonce part of the encryption/decryption process?

  3. For AEAD, the operation is described as "in-place encryption". How would that handle algorithms where the ciphertext is longer than the plaintext?

  4. For hash functions, would it not make sense to plan for modern XOF hash functions? I also understand if people don't consider them "real" hash functions.

@jedisct1
Copy link
Member

jedisct1 commented Feb 26, 2020

This is a really good starting point, and we definitely need such an interface.

However, the proposed interface seems a little bit limited, and not future-proof. Supporting modern constructions and proposals wouldn’t be possible without a major revision of this interface.

  • AEAD: maybe the nonce and AD should be part of aead_open() instead of the encryption/decryption operations, so that these can eventually support multi-part messages. Also, maybe we should not limit the parameters to a nonce and AD. Some constructions require additional parameters such as a context and a degree of parallelism (ex: keyak).
  • HASH: we definitely need to be able to provide more parameters here, or else even standard constructions such as cSHAKE cannot be supported. Some can apply to a wide range of functions (ex: parallelism), others only apply to some functions (ex: context/domain/personalization/customization, parameters for tree hashing as in e.g. the BLAKE2 function family).
  • MAC: same issues about the fact that some functions and standard constructions (e.g. kMAC) may require more parameters.
  • HKDF: this API is a little bit too specific, as key derivation is now being commonly done using XOFs, or more generally with keyed hash functions supporting variable output length and additional parameters (ex: BLAKE2 salt). Maybe we should define something more generic, in which the HKDF construction can fit.
  • key_share_derive: that seems like a very dangerous API to expose. If it does a scalar multiplication, it should probably be named after that operation instead, as the resulting point should generally never be used as a shared secret without hashing. As defined, the key_share_derive API is also incompatible with KEMs, which is quite of an issue. If it is only usable in a DH-like scenario, we should give it a less generic name.
  • Session-based encryption, and sponge-based constructions cannot be used with the proposed API. Maybe a dedicated object type would make sense in that context, but even if this is not present in the first version of the API, we should think about it early to avoid major breaking changes or API bloat later.

For AEAD/HASH/MAC, I’d suggest avoiding the open() function to accept any parameters at all beyond an algorithm identifier (not even a key, as some algorithms will produce a different output with an empty key and with no key), and add a function to set individual parameters.

Also, should we have ways to list the set of supported algorithms?

To summarize, before jumping the gun and implementing an API, we should make sure that it can support current and forthcoming algorithms. Or else, extending it to do so may be hacky and ugly.

The semantics also need some further discussion. For example, what happens after hash_digest() is called? Is the state cleared? Or can we keep updating it and squeezing more output later (something that spongy constructions and modern functions such as BLAKE3 safely support by design)? That can be discussed as a second step, but having the exact same behavior across all implementations is going to be critical for security.

@tniessen
Copy link
Member

When I first thought about this, I thought a low-level API might make more sense. Maybe something similar to what OpenSSL is doing internally: Generic APIs to create contexts, perform operations on contexts, and destroy contexts, where the context types and operations are not precisely specified and up to the algorithm.

Ideally, it should be possible to use existing crypto libraries which are implemented on top of the WASI crypto interface, instead of having to rewrite all applications and libraries to use this specific high-level API.

@jedisct1
Copy link
Member

Maybe something similar to what OpenSSL is doing internally: Generic APIs to create contexts, perform operations on contexts, and destroy contexts, where the context types and operations are not precisely specified and up to the algorithm.

Sounds like a sensible approach for a stable and future-proof API. And it doesn't prevent safer and simpler abstractions a la libsodium to be built later on top of such a low-level API, as a different module.

@tniessen
Copy link
Member

That's exactly my motivation. I had a somewhat detailed draft somewhere, that would allow most crypto operations to be implemented, but I struggled with asymmetric key management, and that is absolutely necessary.

@ueno ueno force-pushed the wip/dueno/crypto-api branch from e6749a9 to 7f72f16 Compare February 27, 2020 10:15
@ueno
Copy link
Author

ueno commented Feb 27, 2020

Thank you for the thoughtful comments!

@tniessen:

I would like to clarify that, with the terms "low-level" and "high-level", do you mean:

?

I'm a bit confused because a generic API is usually built upon a specific API in crypto library implementations. Each approach has pros and cons (flexibilty, complexity, error proneness, etc), so I was aiming at somewhere in the middle.

I am not familiar with witx, but the current proposed AEAD API does not seem to provide any low-level streaming capabilities?

The idea is to provide a way to directly feed the input/output of sock_recv and sock_send (scatter-gather I/O) to those encrypt and decrypt functions.

For AEAD, why is the key part of the handle creation, but the nonce part of the encryption/decryption process?

This is because IV can be per message, while the same key can be used for encrypting multiple messages. Conceptually, this can be seen as a simplified version of PKCS#11 message based encryption/decryption model, but I agree that moving nonce (and AAD) to the handle creation makes more sense if it is the typical use-case.

For AEAD, the operation is described as "in-place encryption". How would that handle algorithms where the ciphertext is longer than the plaintext?

Would you mind providing examples of such algorithms?

For hash functions, would it not make sense to plan for modern XOF hash functions? I also understand if people don't consider them "real" hash functions.

In PKCS#11, XOF falls into the category of key derivation mechanism, as those operations are quite similar.

@jedisct1:

key_share_derive: that seems like a very dangerous API to expose. If it does a scalar multiplication, it should probably be named after that operation instead, as the resulting point should generally never be used as a shared secret without hashing. As defined, the key_share_derive API is also incompatible with KEMs, which is quite of an issue. If it is only usable in a DH-like scenario, we should give it a less generic name.

Yes, this is meant to be used in a DH scenario only, to support the key_share extension negotiation in TLS. In the latest commit I concealed the private parameter with a new handle type, but I agree that the interface is not generic enough.

For AEAD/HASH/MAC, I’d suggest avoiding the open() function to accept any parameters at all beyond an algorithm identifier (not even a key, as some algorithms will produce a different output with an empty key and with no key), and add a function to set individual parameters.

I tend to agree, but if we go too further with this approach, the semantics of calling those function would become more complex; so I guess we need a compromise at some point.

Also, should we have ways to list the set of supported algorithms?

I think that makes sense.

The semantics also need some further discussion. For example, what happens after hash_digest() is called? Is the state cleared? Or can we keep updating it and squeezing more output later (something that spongy constructions and modern functions such as BLAKE3 safely support by design)?

I expanded the documentation of this particular case (hash_digest); it would be particularly useful that the hash keeps the state for implementing TLS transcript hash.

@jedisct1
Copy link
Member

The idea is to provide a way to directly feed the input/output of sock_recv and sock_send (scatter-gather I/O) to those encrypt and decrypt functions.

That would imply that the underlying encryption function would do far more than encryption, and implement a complete protocol handling fragmentation and rekeying. Essentially implementing something like TLS.

While it would be a nice thing to have and would solve your particular use case, it goes way beyond what the initial layer of a WASI crypto module should do.

Encrypting a new message should probably go through an open(), update(), finalize() and close() dance. open() can take the input and output pointers, while finalize() outputs the MAC. That would support in-place encryption, sessions, and the same interface can be used for signatures.

@jedisct1
Copy link
Member

While an API that doesn't require a context and uses a fixed set of parameters would be easier to use, WASI is not meant to be directly used by applications. It is rather akin to a set of system calls, and its API stability is critical, especially as we need implementers to build an ecosystem around it.

Which is why I'd strongly advocate for something that can be extended without breaking changes, or quirks (open(), open2(), open3(), ...), as well as standardizing low-level APIs rather than something very specific to a protocol or customer use case.

@jedisct1
Copy link
Member

For AEAD, the operation is described as "in-place encryption". How would that handle algorithms where the ciphertext is longer than the plaintext?

Would you mind providing examples of such algorithms?

Ignoring the MAC part, that would be the case for any block cipher used in a mode that is not counter-based or doesn't do ciphertext stealing. CBC is an example.

Depends if you allow traditional AE to be supported in the AEAD interface and if you trust all future AEADs to never require any padding.

@tniessen
Copy link
Member

I would like to clarify that, with the terms "low-level" and "high-level", do you mean

I would consider anything low-level that would allow existing libraries such as OpenSSL (and its variants), libsodium etc. to be implemented on top of it, without having to implement the primitives themselves, and in a future-proof way, meaning that the design shouldn't arbitrarily limit algorithm interfaces, options etc.

This is because IV can be per message, while the same key can be used for encrypting multiple messages.

Technically, you can also fix an IV and just use a different key for each message.

For AEAD, the operation is described as "in-place encryption". How would that handle algorithms where the ciphertext is longer than the plaintext?

Would you mind providing examples of such algorithms?

I don't think any of the "real" AEAD algorithms would be in that category, but my inability to come up with an example shouldn't be a justification for an arbitrary limitation.

While an API that doesn't require a context and uses a fixed set of parameters would be easier to use, WASI is not meant to be directly used by applications. It is rather akin to a set of system calls, and its API stability is critical, especially as we need implementers to build an ecosystem around it.

@jedisct1 I agree.

@npmccallum
Copy link
Contributor

@ueno This is a really great start and is definitely needed. Others have provided good feedback. So I'll focus on things that haven't already been said.

  1. We need the ability to incrementally provide authenticated plaintext for aead. Currently, you can only pass in the entire plaintext to aead_open().This is restrictive for contexts in which this data is produced incrementally.

  2. I'd like to see more type safety. Low-level crypto operations are basically a state machine. I'd like to see the states enforced by types. For example, in the current proposal there is a single aead type. I think we need at least four to cover the four states: protecting plaintext, verifying plaintext, encrypting, decrypting. Another example of this is in mac_set_nonce(). If one function needs to be called before another, there is an implicit state change. This should be an explicit type change.

  3. One important detail that pretty much every crypto implementation gets wrong is eager discarding of cryptographic material. Please be sure to define, for example, that key_share_derive() using ECDH returns both points, and not just x because that's what's used in TLS. Other algorithms require the use of both points.

  4. In general, I find the key_share API to be simultaneously too high-level and too low-level. If it were a high-level interface, I would expect key_share_derive() to return a key object (that I can't access the raw materials for) that can only be used for a particular key derivation algorithm (which in turn produces a key object that can only be used for my operation (i.e. encrypt, decrypt, hmac, etc.). If it were a low-level interface, I'd expect it to return a big int or an EC point that I could further manipulate.

This is a really great start, though.

The problem of key management is definitely something we should tackle sooner rather than later. For example, in the Enarx project, we're going to need to do crypto operations on keys that reside in hardware (i.e. we don't have access to the raw bytes). One strategy we might consider is to define (a) key type(s) that can be used everywhere and can always be constructed in the standard way using bytes. Then implementations can provide additional ways of constructing the key types for their specific use cases. I'm just brainstorming though.

@jedisct1
Copy link
Member

jedisct1 commented Mar 3, 2020

Cryptography modules for WASI require more discussions around their design in order to get the APIs as good as possible for the vast majority of applications.

If you are interested in working on WASI crypto APIs design and their implementations, please join the working group:

https://groups.google.com/d/forum/wasi-crypto see below.

@sunfishcode
Copy link
Member

@jedisct1 I certainly recognize the need for focused discussions for a Cryptography API. However, the Google Group is outside of the WebAssembly CG Umbrella, which may affect community participation and make it harder for us to be sure that contributions are covered by the W3C Community Group CLA.

If we created a new Github repository for this (within the WebAssembly organization), or set up a new mailing list for this, would those work for you?

@jedisct1
Copy link
Member

jedisct1 commented Mar 3, 2020

Yes, that would be perfect!

ueno added 2 commits March 4, 2020 10:56
This adds functions needed for implementing TLS 1.3 as a start,
namely: symmetric cipher, hashing, MAC, key generation and shared
secret derivation.
@ueno ueno force-pushed the wip/dueno/crypto-api branch from 7f72f16 to 1b97eb5 Compare March 4, 2020 09:57
@ueno
Copy link
Author

ueno commented Mar 4, 2020

While I'm not completely sure how better mailing list would fit than using github issues/PRs, I agree that this topic needs some high level discussion with interested parties before moving forward.

In any case, I've updated this PR to v3, with the following changes:

  1. introduced a $private_key type, that can represent a key used for either MAC, symmetric encryption, or key derivation. An open question here is to what degree we want to protect the underlying key data (e.g., do we want key wrapping?)
  2. the overall interface is now stateful, based on Nathaniel's comment
  3. the aead operation has been renamed to cipher, with the following changes:
    1. instead of requiring in-place operation, the encrypt and decrypt functions take distinct arguments for plaintext and ciphertext, which can overlap
    2. each function take a raw pointer to a byte array instead of iovec_array
  4. the generate_key and derive_key operations have been added to replace the hkdf and key_share operations to be more generic
  5. the hash operation has been modified to allow XOF, by adding the digest_len parameter to hash_digest

@jedisct1
Copy link
Member

jedisct1 commented Mar 4, 2020

Thanks for the update! This is looking really good.

A GitHub repository maybe a more convenient, then. So that we can start from a document, and keep improving it, similar to what is being done on the hash to curve document.

A single, very long thread such as comments on a PR makes the discussions very difficult to follow where multiple topics need to be addressed.

@sunfishcode would it be possible to get a dedicated wasi-crypto repository in the WebAssembly organization?

@sunfishcode
Copy link
Member

Yes, I think there's sufficient motivation here to create a new repo, so I've now requested and gotten approval for one, and have now created https://github.com/WebAssembly/WASI-crypto, so we can continue development there.

@npmccallum
Copy link
Contributor

@ueno I suspect you should re-submit this to the new repo. We can continue reviews there.

@sunfishcode
Copy link
Member

I believe this PR is subsumed by WebAssembly/wasi-crypto#5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants