Merkle Trees for SSB Feeds
Or: Enabling partially subscribable feeds, but with a lot of words before I get to that point.
Like offchain content, this is another idea that has been floating through the scuttleverse at various points in time. A compact summary that is only useful if you already know what this is about: By changing the metadata of ssb messages, we could create merkle trees rather than linked lists. This allows verifying the k
-th message of a feed by traversing only log(k)
many messages, rather than the full k
messages of the linked list in the current model.
This is an issue that
- touches the very core of the protocol
- seems to be both rather abstract and hammering on pointless detail at first
- but turns out to have far-reaching consequences
This post will at times be very technical and somewhat computer-sciency, but I'll try to explain enough that anybody interested can follow the gist of it. It's also a pretty long read, but I hope a few of you can still take the time to read it. Some of it my look complicated at first, but once you gain some intuition, it all becomes manageable. It's definitely not too complicated for real-world use - dat uses a data structure like this internally.
Outline:
- background on ssb security guarantees
- why would we need merkle trees
- how merkle trees work
- implementing it via ssb metadata
- complexity analysis
- further optimizations
- conclusion
CC protocol humans: @Dominic, @dinosaur,
@Piet, @cel, @Christian Bundy, @cryptix, @keks
Background: SSB's Security Guarantees
In order to meet its security guarantees, ssb verifies certain properties for all messages. The most basic one is to check that it indeed belongs to its claimed feed. This is done by checking the signature attached to every message. If that were all the guarantees ssb provided (and such a protocol is interesting in its own right), then there would be no need for the backlinks in each message's metadata. The backlink is simply a secure hash of the previous message. The first property gained through backlinks that might come to mind is immutability. Even the feed owner can not retroactively change the feed, since that would break the linked list of hashes. Once somebody knows the first k
messages, the feed owner can not convince that person that any message up to k
was different than that person first thought.
At a first glance, sequence numbers seem to give immutability already. If there were no backlinks, but you had a set of k
messages, each with a different sequence number between 0
and k - 1
, then you could still restore the correct order and you could detect any retroactive changes by just checking whether two different messages claim the same sequence number, at which point you'd declare the feed broken (just like a forked feed in ssb's actual model is considered broken). So why go through the trouble of a linked list?
If ssb only used sequence numbers, then a malicious feed could hand out different copies of itself to different peers, and they could not tell that something weird was going on until they compared feeds. But they's never have an incentive to compare feeds, why would they request a message they already have?
Why do backlinks solve this, whereas sequence numbers fail? The list of hashes guarantees that the messages have actually been created in the order of their claimed sequence numbers.
Suppose you just asked me for the fifth message in my feed, and there were no backlinks. I could just make up some message content, give it seqnum 5
, sign it, and hand it over to you. I would not need to commit to the content of any previous messages. If some time later you asked for the third message in my feed, I could again freely invent something. You have no way of knowing that the third message has actually been authored at a later point in time.
Backlinks prevent this and forces everyone to adhere to the linear temporal order of messages. If I made up the fifth message, I'd also need to give it a backlink. But if I freely invented that backlink, and later you asked me for message four, I couldn't come up with a message that has that invented hash (if I could, the hash function would be broken and I could do much more profitable things than lying about ssb messages). So instead, to invent message five, I'd also need to invent the content of message four first. But since that one again has a backlink, I need to commit to the content of mesage three before that, and so on. As a result, ssb gives the guarantee that the order of messages in a feed corresponds to temporal order in real life. That's pretty amazing once you think about it.
This temporal order guarantee also translates into truly immutable feeds, where it is impossible to hand out more than one version of the feed to different parties.