I want to be able to build privacy focused applications that are as powerful and as user friendly as the “traditional” web apps we’re so used to today.
Privacy focused means the application developers never get to see the content their users are creating - your data is yours only. Powerful means real-time collaboration between multiple users, search, notifications. User friendly means it’s easy to access or recover your data on a new device and easy to invite collaborators.
And that’s really hard today. Really really hard.
Why do it?
Why should we even care about building privacy focused applications? Why does privacy matter? Privacy might sometimes feel like a philosophical topic, but it’s one with very real life altering and threatening implications (including the Holocaust). I won’t be tackling this question in more depth here and if you want to learn more about the importance of privacy I recommend reading Identity Reboot: Reimagining Data Privacy for the 21st Century by Arwen Smit.
“Privacy focused” can mean many things and is a really broad topic. This article looks specifically at collaborative content creation tools such as Google Docs and Slack - tools that individuals, teams and communities use to create private digital content. End-to-end encryption or removal of centralised servers could offer a technical solution to enhanced privacy in such applications. Public digital content creation like Wikipedia and Twitter, ambient content collection such as Amazon Alexa and Tesla car recordings and many other digital privacy issues are not considered in this article.
Building collaborative content creation applications such as Google Docs, Asana or Slack, but in a way where all the content is end-to-end encrypted and only accessible to the content creators is technically very difficult. We will look at some of the technical challenges in more depth in the remainder of the article, but here are a few:
- added UX challenges of handling encryption keys
- handling real-time collaborations on encrypted data
- search over encrypted data is difficult to impossible
- debugging, optimizing and supporting customers all become more difficult
Technical overhead alone removes a lot of the incentive for any company to even try. What’s more, having access to the raw data provides companies with extra value. Data can be used to power features (e.g. Gmail’s smart compose) and data is what created today’s most profitable businesses by fuelling Ad targeting.
There’s also the fact that end-to-end encryption is not applicable to every type of application. E.g. building a web search engine, Wikipedia or a social network does not require all of the content to be encrypted end-to-end. The fact that this problem is not universal further reduces the incentive to tackle it.
And so we have a chicken and egg problem, where lack of incentives means progress on technological solutions is slow, mostly being pushed forward by decentralisation hackers and academic researchers, while many of the world’s engineers are being paid to do the opposite - extracting more value from the data.
To make matters worse, end-to-end encryption tech could become more and more government regulated at any point. Perhaps anecdotal, but COVID-19 created an incentive to build a privacy focused exposure notification system (great!), but it’s also up to the governments to opt into it, and not all of them are doing so.
Encryption trade offs
To be clear, I’m not saying that encrypting all of the content is necessarily the best thing to do. Amassing data centrally can be of value not just to the companies, but to the users and society at large. While it might be possible to power a lot of innovative technology in a privacy focused manner, such as utilising on-device machine learning, it’s probably more efficient to develop say self driving cars if you can collect real video data from as many cars as possible.
And even if you’re not extracting value from data, storing unencrypted data is simpler and more efficient. It’s also more convenient for the users as the data is always backed up and easy to access from any device.
Given all that, coupled with the technical difficulty of creating end-to-end encrypted applications, it might well be the case that we will always have to lean on privacy policies and government regulations, and put trust in companies, when it comes to protecting our data.
Privacy focused software landscape
Nevertheless, high quality tools that are built in a privacy focused manner do exist. Perhaps most impressively - WhatsApp with its mass deployment of end-to-end encryption tech (Facebook ownership aside), as well as similar messaging apps, Signal and Wire. Then there’s ProtonMail providing the end-to-end encrypted email service. And 1Password, a cloud backed encrypted password manager.
Another, less known tool built on cutting edge tech is Scuttlebutt, a fully decentralised (and fully functional) p2p social network. Scuttlebutt, which is similar to Twitter, does not utilise end-to-end encryption (for the social networking functionality), but it’s built in a way that means it works entirely without any servers (!). It’s completely p2p and it’s impressive that they made it work so well. BitTorrent is p2p and works too, but that’s static file sharing, whereas Scuttlebutt is a living, breathing, dynamic social network with messages getting posted and propagated to the relevant peers all the time. I bring up Scuttlebutt because, as we will see later, it can inform the architecture of privacy focused collaborative software.
Tangent: Speaking of privacy, election meddling and Scuttlebutt… Scuttlebutt can be viewed as an alternative to Facebook’s centralised approach. Facebook’s data is behind a login and access is monitored and limited (not always true, as we’ve seen with Cambridge Analytica), while Scuttlebutt’s data is completely open for anyone to mine, archive and analyse. But Scuttlebutt doesn’t have profit motives or server costs and so does not need to create targeted ad tools or even the algorithmic feeds that were used in election meddling. Which approach is better?
Finally, Bear or Things take a different privacy focused approach. Even though these apps do not encrypt the data, they store it in your private iCloud account, which means the app developers never get to see it. However, these apps do not offer collaboration. They are meant for single user only (with the exception of having to sync data across all of your devices, which they don’t always handle gracefully).
Building a Google Docs alternative
Is it possible then to build a multi user, collaborative, end-to-end encrypted, privacy focused Google Docs alternative? I picked Google Docs because it’s so familiar and features what to this date remains such an impressive piece of tech - real-time collaborative editing with shared cursors and comments. I don’t think anybody has done collaborative text editing better than Google Docs since its launch, at least not in such a widely used tool or in such a useful way.
To be able to answer if this is possible, let’s make a list of requirements.
1. End-to-end encrypted
Nobody but the content creators gets to see the content.
2. Collaborative
It should be as close as possible to the real-time collaborative editing experience of Google Docs - shared cursors and commenting. Not only that, but documents, entire collections of documents and application state, such as document tags and folder structures, should be easily shareable between multiple people.
3. Easy sharing
Sharing a document, a collection of documents, or inviting other people into your workspace should be easy enough so that anyone can do it (e.g. invite by email address as opposed to copying and pasting public keys).
4. Search
I should be able to search through all of the documents I have access to.
5. Offline capable
I’d like to be able to access and edit documents offline, without internet connection and sync later. Including any text edits and comment interactions.
6. Backups
If you drop your phone in the mud with all the documents, you can restore them from an encrypted cloud backup.
7. High availability
If I make changes to a document on my phone, those changes should propagate to other collaborators reliably and quickly even if they were offline at the time of me typing those changes. That is, collaborators don’t have to be online at the same time to sync their changes. Similarly, if you invite a new member to your workspace with some documents shared, they should be able to retrieve and decrypt those documents without any other collaborator being online.
8. Stretch goal: p2p mode
Ideally, the software, accepting some limitations (i.e. no high availability), should be able to work completely peer to peer. That is you should be able to opt out of any server participation and still be able to use the application in full even if the service provider shuts down, a longevity oriented approach.
Not all of the above requirements are strictly necessary for building privacy focused software, but they do interconnect in unexpected ways. For example, since search can not be done server side on the encrypted data, that implies having to store all the data on the client, making offline capability a useful side effect. In addition to privacy, these requirements try to incorporate longevity, data ownership and other ideas of Local-first software.
I can think of at least three distinctly different approaches to building an application like this. Let’s see how each of the approaches holds against our requirements.
Traditional server/client web app
You can most definitely create Google Docs as a web app (it is already a web app, duh). But once you add the data privacy or end-to-end encryption requirement, that’s when things get tricky.
For starters, powering a feature like Search becomes nearly impossible with encrypted content on the server. To be able to search through document content you have to have access to it. This is, for example, why ProtonMail can not search through message contents, something other email providers, such as Gmail, can do with ease.
It’s not entirely true that you can’t search encrypted content. In my research I learned of Homomorphic encryption, a form of encryption that allows calculations to be performed on encrypted data and produces the results in encrypted form, without disclosing the encryption key (🤯). I don’t know what stage this research is at, but it’s definitely not available “out of the box”. Impressively, there exist npm modules such as node-seal produced by Morfix that allow you to create such encrypted data and privacy preserving computations today! It seems difficult to imagine being able to search over encrypted data, but even that seems to be an active area of research.
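To make the idea of computing on encrypted data a little more concrete, here is a minimal toy sketch of the textbook Paillier scheme, which is additively homomorphic. It is not what node-seal does (that library wraps a different, more capable family of schemes), the function names are made up for this illustration, and the hard-coded primes are far too small to be secure.

```python
import math
import secrets

# Toy key material: tiny hard-coded primes, far too small to be secure.
P, Q = 293, 433
N = P * Q                      # public key
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(p-1, q-1)
MU = pow(LAM, -1, N)           # private: modular inverse (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N using only the public key N (generator g = N + 1)."""
    while True:
        r = secrets.randbelow(N - 1) + 1   # random blinding factor in [1, N-1]
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Decrypt using the private values LAM and MU."""
    return ((pow(c, LAM, N_SQ) - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Could run on an untrusted server: multiplying ciphertexts adds plaintexts."""
    return (c1 * c2) % N_SQ
```

The server in this sketch only ever sees ciphertexts, yet the client that holds the private values can decrypt the sum. Going from addition to something like full text search is a much bigger leap, which is why this remains research territory.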
If the server can not search, can the client? Yes, but that means downloading and storing all of the documents in your browser’s local storage. To my knowledge, browsers, even with the modern persistent storage APIs have quotas around how much data can be stored making it somewhat impractical.
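As a rough illustration of what client side search involves, here is a minimal sketch of an inverted index built over documents after they have been decrypted locally. The class and method names are hypothetical, and a real application would also need persistence, ranking and incremental updates.

```python
import re
from collections import defaultdict


class LocalSearchIndex:
    """Inverted index kept on the client, built from locally decrypted text."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> set of document ids

    @staticmethod
    def _tokenize(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, doc_id, plaintext):
        for word in self._tokenize(plaintext):
            self._postings[word].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every word in the query."""
        words = self._tokenize(query)
        if not words:
            return set()
        matches = [self._postings.get(word, set()) for word in words]
        return set.intersection(*matches)
```

The index itself grows with the amount of text, which is exactly where the browser storage quotas mentioned above start to hurt.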
Similarly, while offline capabilities of browsers are better than ever with Service Workers, due to the storage limitations, the thought of building a fully offline capable, fast and reliable application to run entirely in the browser seems slightly out of reach.
All in all, here’s how this approach stacks against our requirements:
✅ 1. End-to-end encrypted ✅ 2. Collaborative ✅ 3. Easy sharing ❌ 4. Search ❌ 5. Offline capable ✅ 6. Backups ✅ 7. High availability ❌ 8. Stretch goal: p2p mode
I’m not saying it’s not possible to build this application in the browser; with certain trade-offs on features it might well be feasible. And in fact, many privacy focused applications could run very well in the browser (e.g. WhatsApp does). But ultimately, in the spirit of Local-first software, native desktop/mobile applications can unlock more powerful use cases (and it’s no coincidence Scuttlebutt is not a web app).
A wide range of efforts are under way to bring more privacy focused capabilities to the browsers. For example, Solid (and its spiritual predecessor Unhosted, happy 10th birthday!) are working on separating data storage from the web apps themselves. It’s unclear whether collaborative applications (like Google Docs) could be powered by the proposed protocols. Beaker Browser is another distinctly different approach, experimenting with enabling browsers to run a serverless p2p web.
Fully decentralised p2p app
We could instead try to build this as a Local-first app with p2p communication, without using any servers. One amazing benefit of a fully p2p approach would be that this could in theory (funding of development aside) be a free or open source product without any server hosting costs. We had a lot of open source productivity software created by the Linux (and many other) communities, but that software never got the networked collaboration capabilities to match the SaaS tools and therefore doesn’t come anywhere close in usage.
The p2p approach brings its challenges. Since there is no central server to hold on to the data, all peers have to be online at the same time to exchange the data. This is especially difficult with mobile devices (which might be one of the main reasons p2p tech did not take off more in the last decade). Say I make changes to a document on my phone, lock the phone, and then open my laptop and expect to see the latest changes - that is not going to work, because the phone can not run a permanent server in the background. Only with the help of a central server (or some decentralized “blockchain” computer..) could you hold on to the changes temporarily to propagate them to all the devices.
This is not a big problem for an application like Scuttlebutt. They have a large network of users that can share data amongst themselves (someone is always online), which is not the case for team focused private applications which have to work well even with a small number of collaborators. And secondly, the nature of the social network allows asynchronous style of communication. If I post some updates to my feed, it’s not that important how much delay there is before my friends see that post. Delay is much more important to avoid in a productivity tool like Google Docs.
Secondly, sending email notifications without a server component becomes challenging. If you leave a comment, or want to invite someone to collaborate on a document via email, that’s not something you can do from within a native app without the help of the server. At best you could copy the sharing link and paste that into an email you send manually.
✅ 1. End-to-end encrypted ✅ 2. Collaborative ❌ 3. Easy sharing ✅ 4. Search ✅ 5. Offline capable ✅ 6. Backups ❌ 7. High availability ✅ 8. Stretch goal: p2p mode
Both of these issues are something that could be addressed with time (goes back to the lack of incentives). For example, the previously mentioned Solid project is solving both personalised private storage as well as handling notifications.
Thick native client, thin central server
This brings us to the hybrid approach. A thick Local-first native client that can work offline and provide search, with a server component for facilitating sharing and high availability.
This model has been shown to work by WhatsApp. The WhatsApp messenger is a thick client that is using the server for high availability in a privacy preserving way by passing only encrypted data through the server. Trusting their servers and app implementations or their usage of the metadata is another topic for another day (is total privacy in today’s world an illusion, a lost privilege?).
The local-first approach also lets us consider a p2p mode. If we build the server-client communication in a way that also works peer-to-peer, the paid (or self hosted) server component could be entirely optional. The app could sync data peer to peer and backup to your personal cloud (iCloud, Google Drive, Time Capsule, etc.) or it could use the optional server to provide high availability and improved user experience.
✅ 1. End-to-end encrypted ✅ 2. Collaborative ✅ 3. Easy sharing ✅ 4. Search ✅ 5. Offline capable ✅ 6. Backups ✅ 7. High availability ✅ 8. Stretch goal: p2p mode
What tech do we need to build such an application? Ideally you want an out of the box, easy to plug, local-first database that can sync with a server and directly with other collaborators, allows merging application state and document edits without conflicts, supports permissioning of data access, end-to-end encryption and backups. That’s a doozy! And I don’t think that kind of database exists today.
I’m not the only one who wishes they had such a database. A recent article by Euandre, The database I wish I had, and the HN discussion around it propose a set of similar requirements. And a slightly older article by Jared Forsyth, In Search of a Local-First Database, reviews the pros and cons of some of the existing local-first database solutions.
Let’s break this down a bit.
At the core of collaborative (or even single user multi-device) software is the ability to reliably share and merge application state between multiple parties. A truly magical data structure, CRDTs (Conflict-free replicated data types) enable such collaboration. CRDTs are effectively a set of shared data types, where multiple people can update them at the same time and the changes can be reconciled in a consistent way. The most basic example would be starting with an empty array [], you pushing 1 locally, me pushing 2 locally, syncing up and both ending up with [1,2] (in that order on both devices). Regardless of when we exchange our updates, we will both eventually arrive at the same result, because the updates are sorted based on our client IDs and local update counters and applied in order.
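The array example can be sketched in a few lines. This is a deliberately simplified toy, assuming every update is a push and that ordering by (client ID, counter) is all we need; real sequence CRDTs such as the one in Yjs also handle inserts at arbitrary positions and deletions. The `Replica` class is a name invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One device's copy of a shared, push-only array."""
    client_id: int
    counter: int = 0
    ops: dict = field(default_factory=dict)  # (client_id, counter) -> value

    def push(self, value):
        """Apply a local push and return the update to send to other replicas."""
        self.counter += 1
        op_id = (self.client_id, self.counter)
        self.ops[op_id] = value
        return {op_id: value}

    def merge(self, update):
        """Apply a remote update. Order and duplicates do not matter."""
        self.ops.update(update)

    def value(self):
        """Materialise the array by sorting updates by (client_id, counter)."""
        return [self.ops[op_id] for op_id in sorted(self.ops)]
```

Because merging is just a set union of uniquely identified updates, it does not matter in which order, or how many times, the two of us exchange them.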
CRDTs are fairly new and were invented after Google Docs was created (which uses Operational Transformation and not CRDTs). There’s a good introduction of CRDTs and in particular a great research backed overview of their strong suitability for building collaborative apps in the Local-first software article (with further interesting insight in HN1, HN2). CRDTs are already enabling a new wave of products, such as Figma’s collaborative editor.
What makes CRDTs so appealing is that they do not require any central server to be merged into a globally consistent state. This enables p2p use cases because peers can communicate deltas of their updates amongst themselves and see a consistent world view. Furthermore, individual CRDT updates could be end-to-end encrypted even if a central server is involved. Imagine if instead of WhatsApp messages you were sending CRDT objects: the server doesn’t need to know what’s in them as it’s not involved in merging these messages, while still providing high availability and an authentication/authorization layer. Finally, CRDTs can work really well for shared document editing, not just for slow changing data. All in all, CRDTs tick a lot of boxes and could be a good foundation for building privacy focused applications.
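To illustrate how little the server needs to know, here is a minimal sketch of a relay that stores and forwards opaque blobs per document. The `ThinRelay` class and its methods are hypothetical; the blobs are assumed to be CRDT updates already encrypted on the client, and the authentication/authorization layer is left out entirely.

```python
from collections import defaultdict


class ThinRelay:
    """Stores opaque blobs per document; never parses or decrypts them."""

    def __init__(self):
        self._log = defaultdict(list)  # doc_id -> list of (sender, blob)

    def publish(self, doc_id: str, sender: str, blob: bytes) -> int:
        """Append a blob and return the new length of the document's log."""
        self._log[doc_id].append((sender, blob))
        return len(self._log[doc_id])

    def fetch(self, doc_id: str, cursor: int = 0):
        """Return blobs published after `cursor`, plus the new cursor."""
        entries = self._log[doc_id][cursor:]
        return [blob for _, blob in entries], len(self._log[doc_id])
```

A collaborator who was offline simply remembers their last cursor and fetches everything published since, which is where the high availability comes from. The merging happens on the clients after decryption.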
Conclusion: CRDTs are a great data structure that could power privacy focused collaborative applications. The local-first database should natively support this data type.
Data storage and sync
CRDTs are only a data representation. How this data gets stored on disk or gets synced to the server or other peers is another question.
You could store the CRDT update log (or compacted log) in SQLite, LevelDB or flat files to name a few examples. In addition to storing the raw CRDT data, you’d have to build and maintain secondary indices to be able to efficiently access this data. And finally, you’d need to be able to sync this data with other collaborators (either p2p or with the help of a thin server).
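As a rough sketch of the SQLite option, the update log could live in one table and a secondary index in another. The schema, table names and helper functions below are assumptions made for illustration, not something taken from an existing project.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a local store for CRDT updates and a title index."""
    db = sqlite3.connect(path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS updates (
            doc_id    TEXT    NOT NULL,
            client_id INTEGER NOT NULL,
            counter   INTEGER NOT NULL,
            payload   BLOB    NOT NULL,
            PRIMARY KEY (doc_id, client_id, counter)
        )""")
    # Secondary index: lets the app list and filter documents by title
    # without replaying every update log.
    db.execute("""
        CREATE TABLE IF NOT EXISTS doc_titles (
            doc_id TEXT PRIMARY KEY,
            title  TEXT NOT NULL
        )""")
    db.execute("CREATE INDEX IF NOT EXISTS idx_title ON doc_titles (title)")
    return db


def append_update(db, doc_id, client_id, counter, payload):
    # OR IGNORE makes re-delivery of an already stored update harmless.
    db.execute("INSERT OR IGNORE INTO updates VALUES (?, ?, ?, ?)",
               (doc_id, client_id, counter, payload))
    db.commit()


def updates_since(db, doc_id, client_id, counter):
    """Updates from one client newer than `counter`, e.g. to send to a peer."""
    return db.execute(
        "SELECT counter, payload FROM updates "
        "WHERE doc_id = ? AND client_id = ? AND counter > ? ORDER BY counter",
        (doc_id, client_id, counter)).fetchall()


def set_title(db, doc_id, title):
    db.execute("INSERT OR REPLACE INTO doc_titles VALUES (?, ?)", (doc_id, title))
    db.commit()


def find_docs(db, prefix):
    """Look up document ids by title prefix using the secondary index table."""
    rows = db.execute(
        "SELECT doc_id FROM doc_titles WHERE title LIKE ? ORDER BY doc_id",
        (prefix + "%",)).fetchall()
    return [row[0] for row in rows]
```

Keeping the index tables in step with the merged CRDT state is the part the application would have to maintain itself.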
A local-first CRDT based database should handle all of these requirements. How might one approach implementing such a database? Let’s look at the p2p scene for inspiration.
Hypercore is a protocol designed specifically for storing and securely transmitting data in a p2p network. I saw Matthias, the protocol development lead, once describe it as “BitTorrent 4.0”. That is, not only does Hypercore allow you seed data and download chunks of it from untrusted peers, it goes beyond that by allowing you to continuously update this data and propagate changes to peers in real-time.
Hypercore is built on the ideas of append only logs and merkle trees. A merkle tree is a tree of hashes of hashes of data. Merkle trees allow you to efficiently verify that chunks of data downloaded from untrusted peers are untampered with and are part of the data set you’re after. Verification is done by computing the hash of the data chunk and checking that it correctly hashes into the root hash of the merkle tree. Merkle trees are used in BitTorrent 2.0 and Bitcoin among many other applications.
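The verification step can be sketched with a small binary merkle tree. This is a generic illustration, not Hypercore’s actual tree layout or hashing rules; the function names, the prefix bytes and the choice to duplicate the last node on odd levels are assumptions of this sketch.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(chunks):
    """Return all tree levels, from leaf hashes up to the single root hash."""
    level = [_h(b"\x00" + chunk) for chunk in chunks]   # leaf hashes
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]                 # pad odd levels
        level = [_h(b"\x01" + level[i] + level[i + 1])  # hashes of hashes
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels, index):
    """Collect the sibling hashes needed to recompute the root for one chunk."""
    path = []
    for level in levels[:-1]:
        sibling_index = index ^ 1
        if sibling_index < len(level):
            sibling = level[sibling_index]
        else:
            sibling = level[index]                      # padded duplicate
        path.append((sibling, index % 2 == 0))          # True: we are the left child
        index //= 2
    return path


def verify(chunk, path, root):
    """Check that a chunk from an untrusted peer hashes into the trusted root."""
    node = _h(b"\x00" + chunk)
    for sibling, node_is_left in path:
        node = _h(b"\x01" + (node + sibling if node_is_left else sibling + node))
    return node == root
```

A peer only needs the root hash from a trusted source. Any chunk, along with its short proof, can then come from anyone.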
Hypercore is a solid battle tested implementation, but it is also a work in progress. For example, hypercore feeds are single user by default. Multi user systems are at a draft proposal stage. Some work on this has been done outside the core project. For example, kappa-db/multifeed is an implementation of multi user hypercore and powers a decentralised p2p Slack like application Cabal.
To create a multi user system you’d have to combine hypercore feeds from multiple people to reconstruct a unified view. Imagine a hypercore powered p2p Twitter. Everyone would post updates to their own hypercore feed, and you could then download feeds of all the people you follow from the p2p network and combine them into a unified feed view. Conceptually, this is in essence how Scuttlebutt works.
This same technique could work well in our Google Docs alternative. First, for a given document every collaborator would append CRDT updates to their personal per document feed. Then, every collaborator would download the personal feeds of other collaborators and keep them in sync (either in real-time or asynchronously, hypercore is capable of both). The CRDTs would then get combined to reconstruct a consistent view of the document. Additionally, in a p2p setting, hypercore’s merkle tree signatures would be useful for verifying that updates were not faked. We could make sure that someone with read only access to a document is not able to impersonate other users and fabricate updates, while still allowing any peer to seed the full data set to any other member of the workspace.
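The per collaborator feeds and the resharing between peers can be sketched as follows. This is a plain in-memory model with invented names, not the hypercore API, and it skips the signature verification that would stop a peer from fabricating entries in someone else’s feed.

```python
class Peer:
    """Holds its own append-only feed plus replicas of collaborators' feeds."""

    def __init__(self, name):
        self.name = name
        self.feeds = {name: []}  # feed owner -> replicated append-only log

    def append(self, payload):
        """Only the owner ever appends to their own feed."""
        own = self.feeds[self.name]
        own.append((self.name, len(own) + 1, payload))

    def sync_from(self, other):
        """Download entries we are missing from every feed the other peer holds."""
        for owner, log in other.feeds.items():
            if owner == self.name:
                continue  # nobody else can extend our own feed
            mine = self.feeds.setdefault(owner, [])
            mine.extend(log[len(mine):])

    def document_updates(self):
        """Combine all feeds into one deterministic list of updates."""
        return sorted(update for log in self.feeds.values() for update in log)
```

Note that a peer can pick up a collaborator’s updates from any other peer that already has them, even while the original author is offline.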
This all sounds almost too good to be true. And unfortunately that might well be the case. As mentioned previously, while CRDTs are conceptually an append only system, to make the storage efficient it is important to apply compacting techniques as done in Yjs. And hypercore feeds are immutable append only logs. This makes it difficult to reconcile the ideas behind hypercore and efficient implementations of CRDTs. If you write every individual CRDT update to a hypercore feed, such feeds will grow in size too quickly.
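The tension can be shown with a much simpler CRDT than the one in Yjs. The sketch below uses a last-writer-wins map as a stand-in (an assumption made purely for illustration, Yjs compacts its data differently): replaying the full log and replaying the compacted log give the same state, but compaction requires dropping old entries, which an immutable append only feed does not allow.

```python
def replay(log):
    """Rebuild the map state; the highest (clock, client_id) wins per key."""
    winners = {}
    for key, clock, client_id, value in log:
        current = winners.get(key)
        if current is None or (clock, client_id) > current[:2]:
            winners[key] = (clock, client_id, value)
    return {key: entry[2] for key, entry in winners.items()}


def compact(log):
    """Keep only the winning update per key, discarding superseded ones."""
    winners = {}
    for update in log:
        key, clock, client_id, _value = update
        current = winners.get(key)
        if current is None or (clock, client_id) > (current[1], current[2]):
            winners[key] = update
    return sorted(winners.values())
```

Typing a title one keystroke at a time produces one log entry per keystroke, yet only the last one matters for the current state.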
The idea of combining CRDTs and hypercore has been implemented by the Automerge team in Hypermerge. However, I’m not aware of any discussions around the storage implications mentioned above.
The Yjs team has also started work on a hypercore storage/transport implementation in y-dat, but the implementation does not actually use the core feature of hypercore, the append only merkle tree signed log, due to the mutable nature of the Yjs CRDT implementation. This removes the key benefits of hypercore discussed above, namely the ability to sync changes asynchronously and the redundancy provided by resharing feeds of other collaborators. Yjs has a separate mechanism for computing deltas between peer states, which can be used to efficiently sync with peers upon reconnection, but that’s a distinctly different approach from hypercore. To be more specific, if our Google Docs alternative had 100 users regularly using the application, all of them would have to be online at the same time to propagate all of the changes in a pure p2p mode if we had to diff deltas between each pair of peers. Hypercore solves this problem more elegantly: every peer adds redundancy and can reshare content of any other collaborator.
Conclusion: Hypercore provides a great approach for securely and efficiently syncing changing data among many users in a p2p network. However, compatibility between hypercore and CRDTs requires further investigation. Since our application can lean on a thin server for adding high availability, a sync via server approach could be used instead. The specifics of how to sync via server remain to be investigated.
Using a thin server for syncing data between collaborators and storing data backups will require end-to-end encryption if we don’t want the application developers to be able to access this data.
The thin server model has been shown to work well in WhatsApp with some technical details available in the WhatsApp Security Whitepaper.
I was surprised to find out that every individual message (more or less) sent in WhatsApp is encrypted with a unique ephemeral key using the Double ratchet algorithm.
This is done for added security, a property called Forward secrecy. If the same encryption key was being used to encrypt all messages, someone intercepting and recording all of the encrypted communication could one day hope to get a hold of the private key and decrypt all past (and future) messages. Forward secrecy is especially important in the https encryption of web traffic (where it became mandatory only recently, in TLS 1.3), because without it https traffic is protected by the long-lived private keys stored on the server. If those keys get compromised (as was the case with Heartbleed), without Forward secrecy in place the attackers are able to decrypt any past recorded traffic.
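The “unique key per message” idea can be sketched with a symmetric key chain built from HMAC-SHA256. This covers only the symmetric half of the Double ratchet (the real algorithm also mixes in fresh Diffie-Hellman exchanges), and the function names and the starting secret are assumptions of this illustration.

```python
import hashlib
import hmac


def ratchet_step(chain_key: bytes):
    """Derive a one-off message key and advance the chain key.

    Both outputs come from a one-way function, so someone who steals the
    current chain key cannot walk backwards to earlier message keys.
    """
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


def derive_message_keys(chain_key: bytes, count: int):
    """Run the chain forward `count` steps, returning the message keys."""
    keys = []
    for _ in range(count):
        chain_key, message_key = ratchet_step(chain_key)
        keys.append(message_key)
    return keys
```

Sender and receiver start from the same shared secret and step their chains in lockstep, deleting each message key after use.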
It’s worth noting that Hypercore has transport encryption built in using the Noise protocol (the same tech powering WhatsApp). This is important in hypercore since it’s designed for p2p use outside the browser and so is not protected by https as traditional web apps are. I’d be interested to find out if Hypercore provides Forward secrecy too.
It is unclear to me at this time if similar considerations need to be made when encrypting the CRDT updates passing through our thin server. If we were only passing some CRDT updates temporarily through our servers (like WhatsApp), then Forward secrecy would be an added benefit. But we have to also consider what happens when a new collaborator is added to a workspace, or if someone loses their phone in the mud and wants to restore all of the data. Traditional web apps provide this benefit with ease since they already store all data centrally and unencrypted. The privacy focused version of Google Docs could also store all of the data centrally, but encrypted. But since this data is stored permanently, that seems to remove the benefits of Forward secrecy.
Potentially a more private alternative would be to design our app in a way where the encrypted backups get stored in the user’s personal storage, such as iCloud or Google Drive, while using the server for propagating the CRDT updates only. This, however, leaves bootstrapping new users an open question.
Storing Yjs CRDT deltas efficiently when they’re encrypted is also difficult. You could store the full history of individually encrypted CRDT updates, but that’s inefficient as we’ve seen to be the case with hypercore.
Conclusion: End-to-end encryption in a high availability setting has been proven viable by WhatsApp. Reconciling encryption with efficient CRDT exchange and storage remains an open question. A combination of ephemeral encrypted CRDT streams with permanent compacted encrypted backups hints towards a possible solution, but requires further investigation.
Other notable projects
In my research I came across a number of other projects that have overlapping goals, but are either at proof of concept stage or do not satisfy enough of the requirements.
PouchDB - a database created ground up to power local-first applications and one of the most polished and production ready projects listed in this article. However, Pouch and Couch do not incorporate CRDTs for state merging or text editing and it is not clear at this time if it could address the needs of building collaborative multi user applications as listed in this article.
Fluid Framework has just been open sourced by Microsoft. Fluid is a library for building real-time collaborative web apps and powers some new features in their Office product. Some architectural details have been shared by Matt Aimonetti and Steven Lucco. While Fluid looks really interesting, it is designed for efficiently powering web apps and not local-first, offline capable software. If the offline requirement (and search functionality by association) were to be dropped from our list of requirements, Fluid could potentially be used for building privacy focused in browser web apps. Adding end-to-end encryption into Fluid remains to be investigated.
Earthstar - a distributed, syncable document database for making p2p apps.
Gun - a collection of libraries for building encrypted CRDT based apps. It provides a lot of the right components discussed in this article, but assembling them efficiently into a working application is left entirely to the developer and all of the challenges discussed thus far apply equally when using Gun.
Permissioning, or access control, remains a topic for further investigation. How do you share encryption keys between collaborators on a per document or per collection of documents basis? How do you reconcile read and write access control with the distributed nature of CRDTs? That is, the problem of making edits to a document offline where your write permissions were revoked by another user needs to be addressed.
In this article, we have looked at the viability of developing privacy focused collaborative applications, more specifically a privacy focused Google Docs alternative.
There exists a lot of prior work around building privacy focused web apps, local-first applications, p2p decentralised applications and end-to-end encrypted messaging. This article took a privacy centric view on combining ideas from existing projects with overlapping goals.
To achieve the total privacy of user created content, we need to utilise end-to-end encryption or transmit data p2p (or do both). Using either of the approaches puts restrictions on application architecture, leading to a conclusion that a native desktop/mobile local-first application would provide the best experience today. This is primarily due to the need of having to store all of the data locally in full.
We then looked at how a local-first CRDT based database could serve as the foundational core for enabling collaborative applications. We considered the challenges of efficiently and securely transmitting CRDTs between collaborators as well as efficiently storing backups of CRDT data. While building a local-first CRDT database with centralised sync and backup seems to be possible, adding p2p or encryption requirements introduces significant challenges.
All in all, the full set of requirements presented in this article is highly ambitious and the viability of such a project remains an open question. I am optimistic, however, given the truly impressive achievements of some of the existing projects presented in this article. I predict a new wave of local-first databases as well as real-time CRDT based web frameworks emerging in the near future. However, adding end-to-end encryption into the mix requires additional careful and dedicated effort.
Technology keeps delighting and surprising in how powerful and society shaping it is. Even if the incentives for creating more privacy focused software remain misaligned, I’m excited to see the field pushing ahead and I am looking forward to the next decade of innovation in the local-first, privacy focused, decentralised and collaborative tech.