Moving message content out of the sigchain
(aka #offchain-content aka #blob-content)
The building blocks of the scuttleverse are the users' sigchains. Each time you post a message, it is appended to your identity's sigchain, a sort of personal blockchain. This is accomplished by running the whole message - both metadata and content - through a hash function, and using those hashes as the cypherlinks that build the sigchain. This post talks about a possible change to that scheme: What if, instead of running the full message content through the hash function, you'd only use a hash of the content to compute the message's hash? This suggestion has been brought up independently at multiple points in the past. In this post, I'll summarize pros and cons of this alternate approach.
A tiny parenthesis on terminology: The #blob-content hashtag has been proposed to refer to this concept. The actual implementation will very likely not involve blobs at all, so I'm using #offchain-content instead, which is descriptive of what precisely the proposal is about, rather than referring to implementation details.
If ssb only wanted to do efficient data replication, it would not need a sigchain; any linked list would do. What the sigchain gives us is data integrity. Given any message in a feed, ssb can (and does) verify its authenticity by traversing the list of previous messages, checking that all cypherlinks are indeed correct. These checks necessarily need to access all the data that was used to compute those hashes. In particular, this currently means the content of all messages. All content has to be stored, else message verification cannot work. But if the cypherlink of a message only depended on its metadata and a hash of its content, then you'd only have to store the hash of the content to preserve verifiability of the whole sigchain. From this observation, we can derive the arguments that support offchain content.
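To make the observation above concrete, here is a minimal sketch of a chain whose links depend only on metadata plus a content hash. It is not the actual ssb message format: the field names (`previous`, `sequence`, `content_hash`), the sorted-key JSON encoding, and the use of plain SHA-256 are assumptions for illustration, and signatures are left out entirely.

```python
import hashlib
import json
from typing import List, Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_metadata(previous: Optional[str], sequence: int, content: bytes) -> dict:
    # The on-chain part of a message: only a hash of the content is included.
    return {
        "previous": previous,
        "sequence": sequence,
        "content_hash": sha256_hex(content),
    }


def message_id(metadata: dict) -> str:
    # Hypothetical canonical encoding: JSON with sorted keys.
    return sha256_hex(json.dumps(metadata, sort_keys=True).encode("utf-8"))


def build_chain(contents: List[bytes]) -> List[dict]:
    chain = []
    previous = None
    for sequence, content in enumerate(contents, start=1):
        metadata = make_metadata(previous, sequence, content)
        chain.append(metadata)
        previous = message_id(metadata)
    return chain


def verify_chain(chain: List[dict]) -> bool:
    # Walks the links using metadata only; no message content is needed.
    previous = None
    for sequence, metadata in enumerate(chain, start=1):
        if metadata["previous"] != previous or metadata["sequence"] != sequence:
            return False
        previous = message_id(metadata)
    return True


def content_matches(metadata: dict, content: bytes) -> bool:
    # Content obtained separately can still be checked against the chain.
    return metadata["content_hash"] == sha256_hex(content)
```

In this sketch, `verify_chain` never touches any content, while `content_matches` lets a peer check content whenever it does arrive.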
Pros
Selective Deletion
Sometimes, you may want to delete a message, maybe because it offends you, or because its content is illegal in your nation's jurisdiction. But under the current system, you cannot delete message content, so you'd have to delete the whole feed. This is not always appropriate. With offchain content, you'd get that ability. You could give a list of banned hashes to your ssb implementation, and whenever it receives a message whose content hashes to one of those banned values, it does not get saved. You could even get fancy and store socially curated banlists on ssb itself.
Users can be required by law to delete specific content. Right now, that would mean blocking the whole feed, so these laws would wipe out full ssb identities, even if they only posted a single message that violates these laws. Offchain content fixes that.
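The banned-hash idea could look roughly like the following. This is only a sketch: the `ContentStore` class and its method names are hypothetical, and SHA-256 stands in for whatever hash ssb would actually use.

```python
import hashlib
from typing import Dict, Iterable


class ContentStore:
    """Stores message content by hash and refuses banned content (hypothetical)."""

    def __init__(self, banned_hashes: Iterable[str] = ()):
        self.banned = set(banned_hashes)
        self.contents: Dict[str, bytes] = {}

    def receive(self, content: bytes) -> bool:
        # Returns True if the content was stored, False if it was rejected.
        digest = hashlib.sha256(content).hexdigest()
        if digest in self.banned:
            return False
        self.contents[digest] = content
        return True

    def ban(self, digest: str) -> None:
        # Ban a hash and drop the content if it is already stored locally.
        self.banned.add(digest)
        self.contents.pop(digest, None)
```

The sigchain itself is unaffected by a ban: only the content is dropped, while the metadata and the content hash stay in place.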
Faster Initial Sync
When downloading data from the scuttleverse for the first time, or after a longer offline period, you need to wait a long time for all downloads to finish. In many cases, you might want to look at the newest messages first. But ssb has to verify their integrity before displaying them, and that's a good thing: let's not optimize performance by forfeiting security guarantees. But to verify the new messages, we first have to download all previous messages. And in particular, we have to download all previous messages' content. With offchain content, the total amount of data needed for verification would shrink significantly. When syncing, the ssb server could first request all the metadata (including content hashes), and then the content in reverse chronological order. This could drastically reduce the time needed for apps to become functional when performing a large sync.
This proposed replication scheme is not appropriate in all cases, but there's nothing forcing us to always use it. And we could do clever rpcs like "Sync in chronological order, unless it is more than n messages, in that case send metadata first and then the reverse-chronological contents." Nothing forces us to implement any of these at once, but there is vast potential for speeding things up.
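As a rough illustration of that rpc, the ordering decision could be sketched like this. The function name `plan_sync` and the `("message" | "metadata" | "content", sequence)` request tuples are made up for this example; no such rpc exists in ssb today.

```python
from typing import List, Tuple


def plan_sync(missing: List[int], n: int) -> List[Tuple[str, int]]:
    """Order the requests for the missing sequence numbers of one feed."""
    ordered = sorted(missing)
    if len(ordered) <= n:
        # Small sync: full messages in plain chronological order.
        return [("message", seq) for seq in ordered]
    # Large sync: all metadata first (enough to verify the chain),
    # then the contents, newest first.
    metadata_phase = [("metadata", seq) for seq in ordered]
    content_phase = [("content", seq) for seq in reversed(ordered)]
    return metadata_phase + content_phase
```

For example, `plan_sync([1, 2, 3], n=2)` asks for the metadata of messages 1, 2 and 3 and then for the contents of 3, 2 and 1.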
Another useful potential replication rpc here: "Sync in chronological order, but skip all message contents larger than k bytes, I'll get those later on demand, if I really need them."
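A sketch of the size-based variant follows. It assumes that the metadata carries the content size in a `content_size` field, which is not something this post specifies; without some such field a peer could not know the size before downloading.

```python
from typing import Dict, List, Tuple


def split_by_size(metadata_list: List[Dict], k: int) -> Tuple[List[int], List[int]]:
    """Split sequence numbers into contents to fetch now and contents to defer."""
    eager = [m["sequence"] for m in metadata_list if m["content_size"] <= k]
    deferred = [m["sequence"] for m in metadata_list if m["content_size"] > k]
    return eager, deferred
```

The deferred contents stay verifiable later on, since their hashes are already part of the synced metadata.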
Lifting the Message Size Limit
There is currently a hard(-ish, unicode and ssb are weird) limit on the maximum size of ssb messages. That's the reason why I will have to split this post into multiple parts. The limit makes sense: a reasonably-sized sigchain is essential for ssb to work. But if we only stored a hash of the content, then the content size would not affect the sigchain size any more. For most users of the current ssb applications, the practical impact will be writing posts without having to worry about hitting a size cap. For developers in general, it simplifies the API for sending messages, since there is no need to handle the case of messages that are too large.
Whether it still makes sense to enforce a new limit, albeit much higher than the current one, will need more careful discussion. Blobs also currently have a size cap, even though they don't bloat the sigchains. I think #bpmux will make it possible to completely lift the limit (or rather to set it to 2^64 - 1), which in practice amounts to the same as having no message size limit at all, since the operating system would terminate any application that tried to deal with values of that size anyway.
Feed Compression, Garbage Collection, and Forgetting Content
Some feeds may get very large. But you probably won't be constantly accessing all the data from years ago. With the change, you could locally delete old, rarely accessed data, and simply download it from the network again, if you do happen to need access again.
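Forgetting and re-fetching could be sketched as below. `PrunableStore` and the `fetch_from_network` callback are hypothetical stand-ins for a local store and a request to peers. The important part is that re-downloaded content is checked against the hash you asked for, which in this proposal is the hash kept in the sigchain.

```python
import hashlib
from typing import Callable, Dict, Optional


class PrunableStore:
    """Local content cache that can forget content and fetch it again."""

    def __init__(self, fetch_from_network: Callable[[str], Optional[bytes]]):
        self._fetch = fetch_from_network
        self._local: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self._local[digest] = content
        return digest

    def forget(self, digest: str) -> None:
        # Only the content is dropped; the hash in the sigchain is untouched.
        self._local.pop(digest, None)

    def get(self, digest: str) -> bytes:
        content = self._local.get(digest)
        if content is None:
            content = self._fetch(digest)
            if content is None:
                raise KeyError(digest)  # nobody on the network has it any more
            if hashlib.sha256(content).hexdigest() != digest:
                raise ValueError("fetched content does not match its hash")
            self._local[digest] = content
        return content
```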
Now, there are risks involved with this - if everybody did this, data might get lost. So doing this on all devices might not be a good idea. But just having the option of doing this on devices with limited hard drive space expands the range of settings where ssb can be useful. Looked at another way, these very risks are a feature in the case of illegal content.
And maybe you want some content of yours to disappear from the network. Ssb is a distributed setting, so you can't force anyone to delete it. But you can ask nicely. And since ssb works by surrounding yourself with trusted "friends", there's a good chance they will respect your wish, delete the message content, and add its hash to their local list of banned hashes.
continued in the next post...