Summary
I propose the following metadata format: <binary backlink><hsdt content><binary signature>. It is slightly more efficient and simpler than the current one.
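To make the layout concrete, here is a minimal sketch of how such a message could be framed and split again. The field sizes (a 32-byte raw sha256 backlink, a 64-byte raw ed25519 signature) and the function names are assumptions for illustration, not part of the proposal itself, and the hsdt content is treated as opaque bytes.

```python
# Sketch only: the sizes below are assumptions, not part of the proposal.
BACKLINK_LEN = 32   # assumed: raw sha256 digest of the previous message
SIGNATURE_LEN = 64  # assumed: raw ed25519 signature


def encode_message(backlink: bytes, content: bytes, signature: bytes) -> bytes:
    """Concatenate <binary backlink><hsdt content><binary signature>."""
    if len(backlink) != BACKLINK_LEN:
        raise ValueError("backlink must be %d bytes" % BACKLINK_LEN)
    if len(signature) != SIGNATURE_LEN:
        raise ValueError("signature must be %d bytes" % SIGNATURE_LEN)
    return backlink + content + signature


def decode_message(message: bytes):
    """Split a framed message back into (backlink, content, signature)."""
    if len(message) < BACKLINK_LEN + SIGNATURE_LEN:
        raise ValueError("message too short")
    content_end = len(message) - SIGNATURE_LEN
    backlink = message[:BACKLINK_LEN]
    content = message[BACKLINK_LEN:content_end]  # opaque hsdt bytes
    signature = message[content_end:]
    return backlink, content, signature
```

Because both metadata fields have a fixed size in this sketch, no length prefixes or delimiters are needed: everything between the backlink and the signature is content.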
On the surface, these proposals involve some radical changes to the very core of ssb. But keep in mind that this only changes the "implementation details" of the protocol, not its "api". Some of these ideas shift complexity to other parts of the ssb stack, e.g. by giving the database the responsibility to track feed ids and sequence numbers. Still, I think this is more than just moving complexity around. Keeping the core as simple as possible is more valuable than making e.g. the database easier. The database is an actual implementation detail, whereas the message format is not. In true implementation details, iterative improvement is much more realistic than in the core protocol. The more complexity is moved into these details, the more empowered we are to tackle the challenges.
And aside from this lofty, abstract bullshit, there are real performance gains. Just ballpark the current size of the metadata and compare it to the size of a single binary-encoded backlink and a binary-encoded signature. The difference might be a decimal order of magnitude. There are a lot of tiny messages (e.g. votes aka "likes", contact, etc.) where the metadata far outweighs the content. So with these changes, I get my conceptual purity, and @Dominic gets the performance. Everybody wins. Well, except the people who need to adapt the database and the rpc protocols. But that might include me, as I really want to see these changes incorporated.
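The ballpark can be reproduced with a few lines of Python. This is a rough sketch under stated assumptions: the JSON object mimics the metadata fields of a current message (without `content`), the keys and signature are all-zero placeholders of the right length, the sequence number and timestamp are made-up values, and the binary side assumes a 32-byte backlink plus a 64-byte signature.

```python
import base64
import json


def b64(n: int) -> str:
    """Base64 text for n zero bytes (placeholder for a real hash/key/signature)."""
    return base64.b64encode(bytes(n)).decode("ascii")


# Assumed shape of the current JSON metadata, with placeholder values.
legacy_metadata = {
    "previous": "%" + b64(32) + ".sha256",
    "author": "@" + b64(32) + ".ed25519",
    "sequence": 1234,            # made-up value
    "timestamp": 1514517067954,  # made-up value
    "hash": "sha256",
    "signature": b64(64) + ".sig.ed25519",
}

json_size = len(json.dumps(legacy_metadata, indent=2).encode("utf-8"))
binary_size = 32 + 64  # assumed: raw sha256 backlink + raw ed25519 signature

print("json metadata bytes:  ", json_size)
print("binary metadata bytes:", binary_size)
print("ratio:                ", round(json_size / binary_size, 1))
```

The exact ratio depends on these assumptions (whitespace, field sizes, and what counts as metadata), so treat the printed numbers as a rough estimate only.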
I'm deeply excited that a new hash-suffix gives us the opportunity to introduce all these changes in a backwards-compatible way.
A quick addendum regarding ssb's rpc protocol: Just like messages, muxrpc has both metadata of a fixed scheme and free-form content. The metadata could be optimized as discussed in this post, and the actual data could use hsdt. I'd prefer to postpone changes to muxrpc for now and focus on messages. I already have sketches for a format #bpmux that improves on packet-stream (more space-efficient, easier to encode/decode, built-in backpressure, more consistent, clearer separation between multiplexing and rpc layer, no type information, optional datagram-based implementation with truly independent backpressure of substreams). Since packet-stream and muxrpc are somewhat intermingled (e.g. backpressure was added as part of muxrpc, not packet-stream), it makes sense to tackle both of these at once. But I won't open this discussion right now; I'm already spamming the network enough with #hsdt. And the current discussion is already taking too much of my time anyways.