Description
disclaimer: i am not an encryption expert. there are possibly big mistakes in this proposal, due to my lack of in-depth understanding of encryption mechanisms. maybe this proposal doesn’t make sense for security reasons.
as far as i understand how e2ee in matrix works, encryption keys depend on the devices (sessions) of each user in a room. this means that each time a user adds (and removes?) a device, encryption keys must change. i think that this causes many problems. i have been using matrix heavily for more than 3 years now, and here are the problems that i encountered (even recently), which are mainly caused by the fact that e2ee is handled per device.
current issues
encryption is brittle and breaks too easily
this is the main issue.
countless times, people i use matrix with have told me that they cannot decrypt messages. for some, it happens on a few messages from time to time. for others (recently, everyone i know on ios), all new messages suddenly became undecryptable.
there were people who just gave up on matrix completely because it just didn’t work for them (despite trying several times). the messages could never be decrypted.
for my part, i had some problems too, but because i use many devices, i was always able to eventually decrypt the messages. however, one time i had to ask a friend to send me his keys. my homeserver was offline for a short period of time (less than an hour), and during that time, a friend created a matrix account on another server and joined a room i was in. because his server could not reach mine, it had no access to my device list, so i could not decrypt any of his messages (although others could).
device verification is complex
when a user logs in with a new device, they have to verify it, or it will be marked as not trusted, and this will be visible to other users. the device verification process works well most of the time (by scanning a qr code or comparing emojis), but it is nevertheless a complex technical process, and it sometimes fails. several people i know had to reset their cross-signing state and start over. also, this process must be implemented by every client that wants to support end-to-end encryption.
the device list can be empty
if a user logs out of all of their devices, their device list will be empty. this means that all messages that are sent to them while their device list is empty will never be decryptable by them. this is, i think, a flaw in the protocol design.
encryption keys take too much space
after more than 3 years of matrix usage, i now have 7283 encryption keys, taking up 4.2 MiB in json format. this is simply too much for about 30 rooms and fewer than 50 people. if the encryption system does not change, this will continue to grow over the years, and i will need to keep all of these keys to be able to read older messages. as more and more people join matrix, this will grow faster and faster. is this really what we want?
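to put these numbers in perspective, here is a small back-of-the-envelope calculation. it only uses the two figures quoted above; the function name is made up for illustration, and the result is an average over my own export, not a property of the protocol.

```python
# figures quoted in the paragraph above (one user's key export)
KEY_COUNT = 7283
EXPORT_SIZE_MIB = 4.2


def average_key_size_bytes(key_count: int, size_mib: float) -> float:
    """Average size of one exported key in bytes (1 MiB = 1024 * 1024 bytes)."""
    return size_mib * 1024 * 1024 / key_count


if __name__ == "__main__":
    size = average_key_size_bytes(KEY_COUNT, EXPORT_SIZE_MIB)
    print(f"about {size:.0f} bytes per exported key")
```

this comes out at roughly 600 bytes of json per key, so the total size is driven by the number of keys rather than by any single key being large.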
device lists are a privacy issue
when you know the mxid of a user, you can access their device list (without asking for permission), which contains a human-readable description of each device (“app.element.io (firefox, ubuntu)”). this can be a privacy issue, as it reveals what kind of devices the user has and which clients they are using.
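as a rough sketch of what this looks like in practice: the client-server api has a `POST /_matrix/client/v3/keys/query` endpoint, which any logged-in user can call with another user's mxid. the helper names, the user id and the sample response below are made up for illustration, and the sample only shows the fields relevant here, not a complete response.

```python
import json


def build_keys_query(user_id: str) -> str:
    """JSON body for POST /_matrix/client/v3/keys/query.

    An empty device list means "all devices of this user".
    """
    return json.dumps({"device_keys": {user_id: []}})


def device_display_names(response: dict, user_id: str) -> dict:
    """Map each device id to its display name (None when the name is absent)."""
    devices = response.get("device_keys", {}).get(user_id, {})
    return {
        device_id: info.get("unsigned", {}).get("device_display_name")
        for device_id, info in devices.items()
    }


# hypothetical, trimmed-down response for a made-up user
SAMPLE_RESPONSE = {
    "device_keys": {
        "@alice:example.org": {
            "ABCDEFGHIJ": {
                "user_id": "@alice:example.org",
                "device_id": "ABCDEFGHIJ",
                "unsigned": {
                    "device_display_name": "app.element.io (firefox, ubuntu)"
                },
            }
        }
    }
}

if __name__ == "__main__":
    print(build_keys_query("@alice:example.org"))
    print(device_display_names(SAMPLE_RESPONSE, "@alice:example.org"))
```

no network call is made here; the point is only to show how little the requester needs to know (just the mxid) and which field carries the device description.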
improvements?
over the years, e2ee handling has already improved. 2 years ago, people had to manually verify all devices of all users on all their devices. thanks to the hard work of the element team, this is now a thing of the past (which surely nobody misses ☺). thanks to cross-signing, verifying a user now means verifying only one thing, regardless of the number of devices. this is already much better, but to me it still feels like a big workaround, which makes the whole system even more complicated.
what if we handled end-to-end encryption per user instead of per device? i’ve always felt that having devices show up in the protocol was too low-level and strange. what if there was no such thing as a device in the matrix protocol? this would be much simpler.
there could be only one key per user that would be used to create the megolm session. this would mean that a session would change only when a user joins or leaves a room, which happens much less often than a change in devices. having the session change less often would decrease the chance that messages could not be decrypted.
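to make the difference concrete, here is a toy model, not real matrix code. it assumes a very simplified view in which a room key is shared once per recipient, and it ignores everything else (olm sessions, rotation rules, key backup). all names and the example room are made up.

```python
def key_shares_per_device(devices_by_user: dict[str, list[str]]) -> int:
    """Simplified current model: one key share per device of every member."""
    return sum(len(devices) for devices in devices_by_user.values())


def key_shares_per_user(devices_by_user: dict[str, list[str]]) -> int:
    """Proposed model: one key share per member, whatever their devices are."""
    return len(devices_by_user)


def users_without_recipient(devices_by_user: dict[str, list[str]]) -> list[str]:
    """Members who receive no key at all in the per-device model."""
    return [user for user, devices in devices_by_user.items() if not devices]


# hypothetical room: carol has logged out of all her devices
ROOM = {
    "@alice:example.org": ["PHONE", "LAPTOP", "WEB"],
    "@bob:example.org": ["PHONE"],
    "@carol:example.org": [],
}

if __name__ == "__main__":
    print(key_shares_per_device(ROOM))
    print(key_shares_per_user(ROOM))
    print(users_without_recipient(ROOM))
```

in this model, the per-device count changes whenever anyone logs in or out, and a member with no devices gets no key at all, while the per-user count only changes when the membership of the room changes.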
this user key could be cached by the servers the user communicates with. this would avoid problems in case their homeserver is unreachable for some time.
what about security?
surely, the current system is more secure, as devices can be individually deleted or marked as not trusted. but do we really need this? isn’t all this working against us, causing more problems than it solves? with per-user e2ee, if a user thinks that one of their devices could be compromised, they could change their password and key, which would log out all of their devices, and then simply log in again on the devices they still control. this is similar to how most online services work.
how to transition to this?
this change is of course a breaking change, but if i understand correctly, it could be handled progressively by using a new room version.