[WIP] MSC3814: Dehydrated devices with SSSS #3814
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,224 @@ | ||
# MSC3814: Dehydrated Devices with SSSS | ||
|
||
[MSC2697](https://github.com/matrix-org/matrix-doc/pull/2697) introduces device | ||
dehydration -- a method for creating a device that can be stored in a user's | ||
account and receive megolm sessions. In this way, if a user has no other | ||
devices logged in, they can rehydrate the device on the next login and retrieve | ||
the megolm sessions. | ||
|
||
However, the approach presented in that MSC has some downsides, making it | ||
tricky to implement in some clients, and presenting some UX difficulties. For | ||
example, it requires that the device rehydration be done before any other API | ||
calls are made (in particular `/sync`), which may conflict with clients that | ||
currently assume that `/sync` can be called immediately after logging in. | ||
|
||
In addition, the user is required to enter a key or passphrase to create a | ||
dehydrated device. In practice, this is usually the same as the SSSS | ||
key/passphrase, which means that the user loses the advantage of verifying | ||
their other devices via emoji or QR code: either they will still be required to | ||
enter their SSSS key/passphrase (or a separate one for device dehydration), or | ||
else that client will not be able to dehydrate a device. | ||
|
||
This proposal introduces another way to use the dehydrated device that solves | ||
these problems by storing the dehydration key in SSSS, and by not changing the | ||
client's device ID. Rather than changing its device ID when it rehydrates the | ||
device, the client will keep its device ID and upload its own device keys. The client | ||
will separately rehydrate the device, fetch its to-device messages, and decrypt | ||
them to retrieve the megolm sessions. | ||
|
||
## Proposal | ||
|
||
### Dehydrating a device | ||
|
||
The dehydration process is the same as in MSC2697. For completeness, it is | ||
repeated here: | ||
|
||
To upload a new dehydrated device, a client will use `PUT /dehydrated_device`. | ||
Each user has at most one dehydrated device; uploading a new dehydrated device | ||
will remove any previously-set dehydrated device. | ||
Comment on lines +39 to +40: Why is there a limitation of one dehydrated device per user? |
||
|
||
`PUT /dehydrated_device` | ||
|
||
```jsonc | ||
{ | ||
"device_data": { | ||
"algorithm": "m.dehydration.v1.olm", | ||
"other_fields": "other_values" | ||
}, | ||
"initial_device_display_name": "foo bar" // optional | ||
} | ||
``` | ||
|
||
Result: | ||
|
||
```json | ||
{ | ||
"device_id": "dehydrated device's ID" | ||
} | ||
``` | ||
|
||
After the dehydrated device is uploaded, the client will upload the encryption | ||
keys using `POST /keys/upload/{device_id}`, where the `device_id` parameter is | ||
the device ID given in the response to `PUT /dehydrated_device`. The request | ||
and response formats for `POST /keys/upload/{device_id}` are the same as those | ||
for `POST /keys/upload` with the exception of the addition of the `device_id` | ||
path parameter. | ||
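
The two-step upload described above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the helper names `build_put_dehydrated_device` and `build_keys_upload` are hypothetical, the helpers only build request descriptions (they do no network I/O), and the key upload body is assumed to use the same `device_keys` and `one_time_keys` fields as `POST /keys/upload`.

```python
from urllib.parse import quote


def build_put_dehydrated_device(device_data, display_name=None):
    """Return (method, path, body) for uploading a dehydrated device."""
    body = {"device_data": device_data}
    # The display name is optional, so only include it when provided.
    if display_name is not None:
        body["initial_device_display_name"] = display_name
    return "PUT", "/dehydrated_device", body


def build_keys_upload(put_response, device_keys, one_time_keys):
    """Return (method, path, body) for uploading the dehydrated device's keys.

    The device ID comes from the response to PUT /dehydrated_device and is
    percent-encoded before being placed in the path.
    """
    device_id = put_response["device_id"]
    path = "/keys/upload/" + quote(device_id, safe="")
    body = {"device_keys": device_keys, "one_time_keys": one_time_keys}
    return "POST", path, body
```

A client would send the first request, read `device_id` from the response, and only then build and send the second request.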
|
||
|
||
Note: Synapse already supports `POST /keys/upload/{device_id}`, as it was used | ||
by some old clients. However, Synapse currently requires that the given device ID | ||
match the device ID of the client that made the call. This check will be | ||
relaxed to allow uploading keys for the dehydrated device. | ||
|
||
### Rehydrating a device | ||
|
||
To rehydrate a device, a client first calls `GET /dehydrated_device` to see if | ||
a dehydrated device is available. If a device is available, the server will | ||
respond with the dehydrated device's device ID and the dehydrated device data. | ||
|
||
`GET /dehydrated_device` | ||
|
||
Response: | ||
|
||
```json | ||
{ | ||
"device_id": "dehydrated device's ID", | ||
"device_data": { | ||
"algorithm": "m.dehydration.v1.olm", | ||
"other_fields": "other_values" | ||
} | ||
} | ||
``` | ||
|
||
If no dehydrated device is available, the server responds with an error code of | ||
`M_NOT_FOUND` and HTTP status code 404. | ||
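
As a rough sketch of how a client might interpret the two outcomes above: the function name `parse_dehydrated_device_response` is hypothetical, and the error body is assumed to carry the code in the standard Matrix `errcode` field.

```python
def parse_dehydrated_device_response(status_code, body):
    """Return (device_id, device_data), or None when no device is available.

    `status_code` is the HTTP status and `body` the parsed JSON response of
    GET /dehydrated_device.
    """
    # 404 with M_NOT_FOUND means the user has no dehydrated device.
    if status_code == 404 and body.get("errcode") == "M_NOT_FOUND":
        return None
    # Any other non-200 response is treated as an unexpected failure.
    if status_code != 200:
        raise RuntimeError(f"unexpected response: {status_code} {body.get('errcode')}")
    return body["device_id"], body["device_data"]
```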
|
||
If the client is able to decrypt the data and wants to use the dehydrated | ||
Comment: I wonder if we can say something like: the server is allowed to discard any non- |
||
device, the client retrieves the to-device messages sent to the dehydrated | ||
device by calling `POST /dehydrated_device/{device_id}/events`, where | ||
Comment: Why include a
Comment: I agree that the device ID is redundant. Though since this MSC has been written a new use-case for this endpoint has been found. In the sliding sync world we have split out the fetching of to-device events into a separate sync loop. Namely one of the biggest problems of the existing To-device events are one of those things that are not directly related to the things that a client will want to display in a room or room list, so putting it into a separate sync loop allows the main loop to quickly send updates while to-device moves along in the background. More info here: matrix-org/matrix-rust-sdk#1928 I think that old sync could handle such a split as well, so I would suggest here to rename the endpoint to become
Comment: The reason for the
Comment: This sounds quite racy to me -- how does the server know that one dehydrated device is claimed? How would the client know to make a new one instead of claim the old one?
Comment: It's OK for multiple clients to rehydrate the same device (unlike in the previous proposal), because it never becomes a real device. So the server can just wait until some client fetches all the events before dropping the device.
Making a new device and rehydrating an old one are two different use cases. Rehydration happens after you log in, and you're setting up encryption and trying to get keys. It only happens once in the device's lifetime. Creating a new dehydrated device would happen after you've already set up your encryption and already attempted to rehydrate a device. |
||
`{device_id}` is the ID of the dehydrated device. Since there may be many | ||
|
||
messages, the response can be sent in batches: the response can include a | ||
`next_batch` parameter, which can be used in a subsequent call to `POST | ||
/dehydrated_device/{device_id}/events` to obtain the next batch. | ||
|
||
``` | ||
POST /dehydrated_device/{device_id}/events | ||
Comment: Why is this a POST and not a GET like /sync and /messages?
Comment: IIRC, the rationale was because the call has side-effects (deleting the device).
Comment: It's a bit weird that it doesn't follow the pattern of /messages, /events or /sync imo. I'll try implementing it as a GET without the device deletion first and see how that works out, I think.
Comment: A GET endpoint with side-effects seems like a big no-no to me. Everyone expects a GET request to have approximately zero side-effects.
Comment: oh, but we're also proposing removing the side-effects? SGTM in that case
Comment: Yup, the current implementation no longer automatically deletes the device on the server side, but relies on the client to delete/create a new device. So we're going to try to make this a GET. |
||
{ | ||
"next_batch": "token from previous call" // (optional) | ||
} | ||
``` | ||
|
||
Response: | ||
|
||
```jsonc | ||
{ | ||
"events": [ | ||
// array of to-device messages, in the same format as in | ||
// https://spec.matrix.org/unstable/client-server-api/#extensions-to-sync | ||
], | ||
"next_batch": "token to obtain next events" // optional | ||
} | ||
``` | ||
|
||
Once a client calls `POST /dehydrated_device/{device_id}/events`, the server | ||
can delete the device (though not necessarily its to-device messages). Once a | ||
Comment: Why should the server delete the device? Shouldn't this rather be done by the client explicitly in a delete call? Imo it is not quite obvious that fetching the events should "break" the device. A client might fail to properly restore and now you lost all the intermediate sessions. Instead the client should replace the device once it is somewhat sure it restored successfully and has uploaded the megolm keys to online backup.
Comment: The idea is that when the client starts getting events, it means that the client is signalling its intention to use the dehydrated device, and it has been "claimed", so it shouldn't be used by anyone else. At this point, there isn't much that can be done if the client, e.g. fails to decrypt some messages. If it fails to decrypt messages with the dehydrated device, it's unlikely that leaving the device around will fix anything in the future -- any future attempts would likely fail as well. So the best thing to do is to replace the dehydrated device with a new one anyways. I'm not insistent on this endpoint deleting the dehydrated device, but I think that once you start using a dehydrated device, you'll want to create a new device no matter what.
Comment: It's more that if the device fetches the first few events but then the user closes the browser and it never gets to upload the devices, then you have effectively thrown the dried fish out the window without properly getting to use it. So in that case it should either only delete the device, when it deletes the first few messages (i.e. by the client paginating with a next token), or just wait for the user to send a new device. Since you CAN still use the same dried device from another device, I think. All of the messages will be PRE KEY messages, so you can decrypt them as long as you haven't deleted the one time keys from the pickled device. So even if a client downloads the first batch of messages and then starts with the next batch and the first batch gets deleted, a different client should still be able to pick up from there. I agree that you want to create a new device no matter what, but that can just be done by uploading a new one instead of implicitly doing it when receiving messages.
Comment: Yeah, the issue of a client starting to load events and then dying somehow is possible, but seems like it would be extremely rare. I think another consideration is that a client could forget to replace the dehydrated device. If the device gets deleted automatically, then it makes it obvious that the client didn't do that. In any event, I think it's fine to try it out with
Comment: There's also an issue if there's a connection problem during the
Comment: Actually, the idea is that the dehydrated device is "deleted" in the sense that no other client can claim it, and if a client queries for the dehydrated device, it won't be returned. But the events associated with it are still there and can be retrieved (until the events get deleted as described elsewhere in the MSC), so if the client re-tries the
Comment: Yes, the problem is just that if the client fails to replace the device, then a failed login attempt will break the device dehydration until the next successful login, since there is no way to receive messages in the meantime (while that would work fine if the device is just kept).
Comment: I actually ran into another race condition here in production. We currently have it implemented that the PUT of a new device removes the old device. However, since uploading a new device takes several requests (claim new device, upload keys, sign it, upload encrypted device), we run into a race condition, where the user closes the browser window during one of the steps and maybe only signs back in later. That means we have an unhydrateable device and we again lose messages over the gap. Ideally there would be some way to make this atomic to prevent this race condition.
Comment: I agree that having this be a As it stands the flows can't be resumed once you start fetching the to-device events, and the fetching of the to-device events will be by far the longest operation here. What would allow perfect resumption is:
No 2. would ensure that, even if a device that attempts a rehydration gets stopped and deleted mid-rehydration, another, new device can restart the rehydration process.
Agree here as well, |
||
client calls `POST /dehydrated_device/{device_id}/events` with a `next_batch` | ||
token, the server can delete any to-device messages delivered in previous | ||
batches. It is recommended that, for the last batch of messages, the server | ||
still send a `next_batch` token, and return an empty `events` array when called | ||
with that token, so that it knows that the client has successfully received all | ||
the messages. | ||
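
The batching behaviour described above could be driven by a client loop along these lines. This is only a sketch: `fetch_all_events` and its `post_events` argument are hypothetical names, where `post_events` stands for whatever function sends the JSON body to `POST /dehydrated_device/{device_id}/events` and returns the parsed JSON response.

```python
def fetch_all_events(post_events):
    """Collect all to-device events for a dehydrated device.

    `post_events` is a caller-supplied callable (hypothetical) that takes the
    request body as a dict and returns the response body as a dict.
    """
    events = []
    body = {}  # The first call carries no next_batch token.
    while True:
        response = post_events(body)
        batch = response.get("events", [])
        events.extend(batch)
        token = response.get("next_batch")
        # Stop when the server offers no further token, or when it confirms
        # the end of the stream by returning an empty batch.
        if token is None or not batch:
            return events
        body = {"next_batch": token}
```

Calling the endpoint once more with the final token, and receiving an empty `events` array, is what lets the server know that the previous batch arrived.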
|
||
|
||
### Device Dehydration Format | ||
|
||
TODO: define a format. Unlike MSC2697, we don't need to worry about the | ||
dehydrated device being used as a normal device, so we can omit some | ||
information. So we should be able to get by with defining a fairly simple | ||
standard format, probably just the concatenation of the private device keys and | ||
the private one-time keys. This will come at the expense of implementations | ||
such as libolm needing to implement extra functions to support dehydration, but | ||
will have the advantage that we don't need to figure out a format that will fit | ||
into every possible implementation's idiosyncrasies. The format will be | ||
encrypted, which leads to ... | ||
|
||
#### Encryption key | ||
|
||
The encryption key used for the dehydrated device will be randomly generated | ||
and stored/shared via SSSS using the name `m.dehydrated_device`. | ||
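
A minimal sketch of generating such a key and preparing it for storage follows. Two details are assumptions rather than part of this proposal: the 32-byte key length (the MSC does not fix a size), and the unpadded base64 encoding, which a review comment on this section reports seeing in one implementation. The function names are hypothetical.

```python
import base64
import secrets

KEY_LENGTH = 32  # Assumption: the proposal does not specify a key size.


def generate_dehydration_key():
    """Return a fresh random key from the OS's secure random source."""
    return secrets.token_bytes(KEY_LENGTH)


def encode_unpadded_base64(raw):
    """Encode bytes as base64 with the trailing '=' padding removed."""
    return base64.b64encode(raw).decode("ascii").rstrip("=")


def decode_unpadded_base64(text):
    """Restore the padding that was stripped, then decode."""
    padding = "=" * (-len(text) % 4)
    return base64.b64decode(text + padding)
```

The encoded string is what would be placed in secret storage under the `m.dehydrated_device` name; the secret storage encryption itself is out of scope for this sketch.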
Comment: if I'm reading the iOS implementation correctly, the key is encoded with unpadded base64 (as is done with the other keys in secret storage) |
||
|
||
## Potential issues | ||
|
||
The same issues as in | ||
[MSC2697](https://github.com/matrix-org/matrix-doc/pull/2697) are present for | ||
this proposal. For completeness, they are repeated here: | ||
|
||
### One-time key exhaustion | ||
|
||
The dehydrated device may run out of one-time keys, since it is not backed by | ||
an active client that can replenish them. Once a device has run out of | ||
one-time keys, no new olm sessions can be established with it, which means that | ||
devices that have not already shared megolm keys with the dehydrated device | ||
will not be able to share megolm keys. This issue is not unique to dehydrated | ||
devices; this also occurs when devices are offline for an extended period of | ||
time. | ||
|
||
This may be addressed by using fallback keys as described in | ||
[MSC2732](https://github.com/matrix-org/matrix-doc/pull/2732). | ||
|
||
|
||
To reduce the chances of one-time key exhaustion, if the user has an active | ||
client, it can periodically replace the dehydrated device with a new dehydrated | ||
device with new one-time keys. If a client does this, then it runs the risk of | ||
losing any megolm keys that were sent to the dehydrated device, but the client | ||
would likely have received those megolm keys itself. | ||
Comment: Are we doing this [replacing the dehydrated device periodically] or not? It seems like both have serious downsides. If we do replace it, we have a very racy operation that is certain to cause UTDs in practice. If we don't replace it, then we'll end up with no remaining OTKs at all, and an incredibly long list of to-device messages all of which have to be downloaded and decrypted by any new clients. |
||
|
||
Alternatively, the client could perform a `/sync` for the dehydrated device, | ||
Comment: Does this still work with v2? Can we still sync on the dehydrated device?
Comment: You can't sync as a different device in this proposal. You can fetch the events for that device, but this proposal implicitly deletes the device in that case, which means you can't keep the device alive after that. So imo your only option is to replace it (which is somewhat easy to do, but you might need to authenticate the new signature upload/device?). |
||
dehydrate the olm sessions, and upload new one-time keys. By doing this | ||
instead of overwriting the dehydrated device, the device can receive megolm | ||
keys from more devices. However, this would require additional server-side | ||
changes above what this proposal provides, so this approach is not possible for | ||
the moment. | ||
|
||
### Accumulated to-device messages | ||
|
||
If a dehydrated device is not rehydrated for a long time, then it may | ||
accumulate many to-device messages from other clients sending it megolm | ||
sessions. This may result in a slower initial sync when the device eventually | ||
does get rehydrated, due to the number of messages that it will retrieve. | ||
Again, this can be addressed by periodically replacing the dehydrated device, | ||
or by performing a `/sync` for the dehydrated device and updating it. | ||
|
||
## Alternatives | ||
|
||
As mentioned above, | ||
[MSC2697](https://github.com/matrix-org/matrix-doc/pull/2697) tries to solve | ||
the same problem in a similar manner, but has several disadvantages that are | ||
fixed in this proposal. | ||
|
||
Rather than keep the name "dehydrated device", we could change the name to | ||
something like "shrivelled sessions", so that the full expansion of this MSC | ||
title would be "Shrivelled Sessions with Secure Secret Storage and Sharing", or | ||
SSSSSS. However, despite the alliterative property, the term "shrivelled | ||
sessions" is less pleasant, and "dehydrated device" is already commonly used to | ||
refer to this feature. | ||
|
||
The alternatives discussed in MSC2697 are also alternatives here. | ||
|
||
|
||
## Security considerations | ||
Comment: This section should contain a discussion of malicious dehydrated devices injected by the server. The MSC in its current form is almost purely focused on the mechanics of creation and rehydration of dehydrated devices. There is very little discussion of how these devices should be treated by message senders. And yet, these are not ordinary devices, as evidenced by the fact that we are proposing special UI/UX for them in the section concerning the The long-term plan is to move to a model where a user's devices must be signed by the user's cryptographic identity in order to be considered valid (see MSC4153). Given that context, and the fact that dehydrated devices are a completely new feature, I strongly recommend that this MSC should require that a dehydrated device MUST be signed by a pinned (TOFU-trusted) user identity in order to be considered valid. If the dehydrated device is not signed, or is signed by a user identity which is not the one that is currently pinned by the client, the dehydrated device MUST be ignored by senders as if it does not exist. That is, clients MUST NOT send any to-device messages to such a device nor accept any to-device messages from it.
Comment: Dehydrated devices shouldn't be sending to-device messages, so it's probably safe to say that we should not accept any to-device messages from any devices marked as dehydrated. |
||
|
||
The security consideration in MSC2697 also applies to this proposal: If the | ||
dehydrated device is encrypted using a weak password or key, an attacker could | ||
access it and read the user's encrypted messages. | ||
|
||
## Unstable prefix | ||
|
||
While this MSC is in development, the `/dehydrated_device` endpoints will be | ||
reached at `/unstable/org.matrix.msc3814.v1/dehydrated_device`, and the | ||
`/dehydrated_device/{device_id}/events` endpoint will be reached at | ||
`/unstable/org.matrix.msc3814.v1/dehydrated_device/{device_id}/events`. The | ||
dehydration algorithm `m.dehydration.v1.olm` will be called | ||
`org.matrix.msc3814.v1.olm`. The SSSS name for the dehydration key will be | ||
`org.matrix.msc3814` instead of `m.dehydrated_device`. | ||
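
One way a client might switch between the stable and unstable names listed above is sketched here. The helper names `endpoint` and `identifier` and the `stable` flag are hypothetical; the paths are given relative to the client-server API base, exactly as written in this section.

```python
UNSTABLE_PREFIX = "/unstable/org.matrix.msc3814.v1"

# Stable identifiers mapped to their unstable equivalents from this section.
STABLE_TO_UNSTABLE = {
    "m.dehydration.v1.olm": "org.matrix.msc3814.v1.olm",
    "m.dehydrated_device": "org.matrix.msc3814",
}


def endpoint(path, stable=False):
    """Return the endpoint path, prefixed while the MSC is unstable."""
    return path if stable else UNSTABLE_PREFIX + path


def identifier(name, stable=False):
    """Return the algorithm or SSSS name to use on the wire."""
    return name if stable else STABLE_TO_UNSTABLE[name]
```

For example, `endpoint("/dehydrated_device/{device_id}/events")` yields the unstable events path, with the `{device_id}` placeholder still to be filled in by the caller.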
|
||
Comment: Client implementation: https://gitlab.com/famedly/company/frontend/famedlysdk/-/merge_requests/1111 Server implementation: matrix-org/synapse#13581 Both not merged yet and notably missing is the dehydrated device format. |
||
## Dependencies | ||
|
||
None |
Comment: we should maybe write down how we know if the server supports the feature. AIUI the sample impl calls `GET /dehydrated_device` and checks for an `M_UNRECOGNISED` error?
Comment: Is this intended to be an optional feature for servers?