%wS07I+2kNlzrM92xRKifaUgK7omIdgWxVt+rlXxmhwc=.sha256

@aljoscha, 6 years ago

Feed ID and Backlink to Previous Message

The backlinks are the heart of the personal blockchain of each ssb user (aka the feed). Together with the message signatures, they deliver the cryptographic guarantees of ssb. So surely there's nothing to change about these, right? Wrong.

First of all, cypherlinks could be changed to a binary encoding: a byte tagging the type (currently done by an ASCII sigil), followed by a byte indicating the hash function (currently suffixed as a free-form string). In case of healthy paranoia, use only the first four of these 16 tag bits for the sigil. That leaves 12 bits for the hash function, giving us 16 possible sigils and 4096 possible hashes. If ssb ever uses all of those, it should be burned to the ground anyways. The hash function tag implicitly specifies the length of the hash, so the two tag bytes can be directly followed by the hash value (in binary, not in base64).

The binary encoding would be simpler to parse than the current encoding: no base64 fiddling and no indeterminate length (putting the suffix at the end makes parsing unnecessarily complicated). It would also be more efficient: the binary hash data can be dumped directly, its representation is more compact, and the metadata is much shorter. But this was more of a digression; the actual encoding of cypherlinks is orthogonal to the metadata encoding. It's just easier to roll out both at the same time.
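To make the layout concrete, here is a minimal sketch of the 4-bit sigil plus 12-bit hash function variant. The numeric tag values, the names `encode_cypherlink` and `decode_cypherlink`, and the big-endian packing are all assumptions made for illustration; the proposal above does not fix any of them.

```python
# All numeric tag values below are hypothetical; the post assigns none.
SIGIL_FEED, SIGIL_MESSAGE, SIGIL_BLOB = 0, 1, 2
HASH_SHA256 = 0

# The hash function tag implies the digest length in bytes.
HASH_LENGTHS = {HASH_SHA256: 32}


def encode_cypherlink(sigil, hash_fn, digest):
    """Pack 4 sigil bits and 12 hash function bits, then append the raw digest."""
    if not 0 <= sigil < 16:
        raise ValueError("sigil must fit in 4 bits")
    if hash_fn not in HASH_LENGTHS:
        raise ValueError("unknown hash function tag")
    if len(digest) != HASH_LENGTHS[hash_fn]:
        raise ValueError("digest length does not match the hash function")
    tag = (sigil << 12) | hash_fn  # sigil occupies the first four bits
    return tag.to_bytes(2, "big") + digest


def decode_cypherlink(data):
    """Parse one cypherlink from the front of data.

    Returns (sigil, hash_fn, digest, rest), where rest is the unparsed tail.
    """
    if len(data) < 2:
        raise ValueError("truncated tag")
    tag = int.from_bytes(data[:2], "big")
    sigil, hash_fn = tag >> 12, tag & 0x0FFF
    if hash_fn not in HASH_LENGTHS:
        raise ValueError("unknown hash function tag")
    end = 2 + HASH_LENGTHS[hash_fn]
    if len(data) < end:
        raise ValueError("truncated digest")
    return sigil, hash_fn, data[2:end], data[end:]
```

Because the tag comes first and fixes the length, the decoder never has to scan for a suffix. A sha256 link takes 34 bytes this way, compared to the 52 characters of the current text form (sigil, 44 base64 characters, and the `.sha256` suffix).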

But there's another thing we could improve, and that's related to the feed id metadata. I argue that we can completely drop it by making a tiny change to the backlink: the first backlink of a feed should not be null, but the cypherlink to the feed id. That way, given any message, you can theoretically determine the feed id by following the linked list. To get the abstract model of ssb working, it is not required to store feed ids with each message. This works because ssb assumes that all previous messages of a feed are always known (otherwise, full verification is impossible), so there can be no "broken" cypherlinks. Don't worry, I'll discuss out-of-order messages ("ooo") in a few paragraphs.

Traversing a cypherlinked list of messages to find the feed id is of course a bad idea in practice. A real implementation would store it, for example by keeping all feeds in a map from ids to sigchains, or by storing the feed id together with each message. That's how current implementations do it, because the current signature scheme mandates it. But by moving this out of the required metadata, implementations get to choose how to deal with this. The conceptual model becomes simpler, and concrete implementations gain the freedom to improve.
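As a rough illustration of the abstract model, the sketch below recovers the feed id purely by following backlinks. The `Link` and `Message` types, the `feed_id_of` function, and the use of a plain dict as message store are hypothetical stand-ins, not part of any ssb implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    kind: str      # "feed" or "message" (stand-in for the sigil)
    target: bytes  # feed id or message id


@dataclass(frozen=True)
class Message:
    previous: Link  # first message of a feed links to the feed id
    content: str


def feed_id_of(msg, store):
    """Follow backlinks until one points at a feed id instead of a message.

    store maps message ids to messages and is assumed to hold the whole feed.
    """
    link = msg.previous
    while link.kind == "message":
        link = store[link.target].previous
    return link.target


feed = b"feed-key"
m1 = Message(Link("feed", feed), "first")
m2 = Message(Link("message", b"id1"), "second")
m3 = Message(Link("message", b"id2"), "third")
store = {b"id1": m1, b"id2": m2, b"id3": m3}
print(feed_id_of(m3, store))
```

The walk is linear in the length of the feed, which is exactly why a real database would rather keep something like a map from feed ids to sigchains and never run this loop.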

Ssb does have a scenario where we can't traverse the full linked list, and that's ooo. A database can ask its peers for specific messages, without requiring the full feed of the author. Instead of verifying the message, you trust the identities that linked to it that it is indeed valid. Does this mean that the feed id just has to be in the metadata of each message after all? No, because it can be separately transmitted as part of the ooo response. Again, this moves the complexity out of the rigid conceptual model into the flexible implementation details. The rpcs are less flexible than the db implementations, because different peers need to agree on the same rpc calls. But they are still much more flexible than the hash-enforced representation of every single message on ssb up to this point.

Sequence Numbers

Everything I said about the feed id also applies to the sequence number, except that it is even simpler to replace: the sequence number is just the position of a message in the linked list, so the first backlink does not need to change at all. Let's get rid of sequence numbers!
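A small sketch of why the sequence number is redundant: it can be recomputed by counting backlinks. The `Message` type, the `sequence_of` function, the dict-based store, and the choice to count from 1 are assumptions of this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    previous: Optional[bytes]  # id of the previous message, None for the first
    content: str


def sequence_of(msg, store):
    """Count the message's position in its feed by walking the backlinks.

    store maps message ids to messages and is assumed to hold the whole feed.
    """
    seq = 1  # assumption: the first message of a feed has sequence number 1
    while msg.previous is not None:
        msg = store[msg.previous]
        seq += 1
    return seq


m1 = Message(None, "first")
m2 = Message(b"id1", "second")
m3 = Message(b"id2", "third")
store = {b"id1": m1, b"id2": m2, b"id3": m3}
print(sequence_of(m3, store))
```

As with the feed id, an implementation that stores each feed as an append-only list gets the same number for free from the list index.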

Hash

The hash metadata must match the hash suffix that is already included in the signature. The information is redundant, so drop it. It is already optional anyways.
Just for completeness: Since this is conceptually part of the signature, the current default representation of hash followed by content followed by signature is suboptimal for parsing reasons. Also for completeness: Just like cypherlinks could be encoded in binary, so could signatures, so that the hash data becomes a single byte or maybe two bytes of data.

Timestamp

There's already been discussion on these, both recent and some time ago. They are still mandatory, but the requirement for them to increase monotonically has been lifted. So now you can generate random timestamps, without the protocol punishing you. And if it can be done, then it will be done. There are enough people who randomize their git commit times for anonymization already.

Still including mandatory timestamps just gives client devs the illusion that they could rely on them, when they really shouldn't. Client devs have been burned by this in the past, and it will happen again, unless timestamps are completely removed.

In a distributed system, it is impossible to get true global timestamps, both logically and through limitations of physics. And why would ssb even want global timestamps, if it is about subjectivity? What we do want is a total order on the messages of a feed, which is why timestamps even use floats rather than integers. But we already have this total order: it is given by the sigchain. And unlike timestamps that can be made up, this order is cryptographically verified.
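A toy illustration of the difference, with made-up data: list position stands in for the verified sigchain order, and `claimed_timestamp` is a hypothetical field for whatever the author chose to write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    claimed_timestamp: float  # author-supplied and unverifiable
    content: str


# Position in this list stands for the position in the sigchain.
feed = [
    Message(1_700_000_300.0, "first"),
    Message(1_500_000_000.0, "second"),  # author made up an earlier time
    Message(1_700_000_100.0, "third"),
]

chain_order = [m.content for m in feed]
timestamp_order = [
    m.content for m in sorted(feed, key=lambda m: m.claimed_timestamp)
]

print(chain_order)      # ['first', 'second', 'third']
print(timestamp_order)  # ['second', 'third', 'first']
```

Nothing in the protocol can tell which of the three timestamps is honest, while the chain order cannot be changed without breaking the backlinks.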

Justifying timestamps because they are needed for database queries is dangerous. They might be made up, so the queries might not make sense. And not every message type needs this, so why enforce it on everyone? If you want timestamps for your message, add a timestamp to its content, and use a flume view for database indices. Also, there will be other implementations that might not want to provide timestamp-based queries (because they are completely useless in a distributed setting anyways), so the core ssb model should not force them upon devs.

If your ssb-related code relies on timestamp queries, it is already broken.

Content

Yeah, let's not remove that one. I do have some ideas for a better encoding than json though...

Signature

Let's not remove these either. But a binary representation would be sensible, following the same arguments and design as the binary cypherlinks. If cypherlinks use the encoding of 4 sigil bits and 12 hash function bits, then signatures would also need to have 12 hash function bits, thus requiring two bytes as a tag and leaving 4 bits unused. That's totally ok, just mentioning it for completeness.
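For completeness, a minimal sketch of such a signature encoding. The tag value 0, the 64-byte length (the size of an Ed25519 signature), the function names, and the decision to reject nonzero unused bits are assumptions made for this example only.

```python
# Hypothetical tag value 0, mapped to 64-byte signatures (Ed25519 size).
SIGNATURE_LENGTHS = {0: 64}


def encode_signature(tag, signature):
    """Write a two-byte tag (top 4 bits unused, 12 tag bits), then the raw bytes."""
    if tag not in SIGNATURE_LENGTHS:
        raise ValueError("unknown tag")
    if len(signature) != SIGNATURE_LENGTHS[tag]:
        raise ValueError("signature length does not match the tag")
    return tag.to_bytes(2, "big") + signature


def decode_signature(data):
    """Parse a tagged signature and return (tag, signature)."""
    if len(data) < 2:
        raise ValueError("truncated tag")
    raw = int.from_bytes(data[:2], "big")
    if raw >> 12:
        # Design choice of this sketch: the 4 unused bits must be zero.
        raise ValueError("unused tag bits must be zero")
    tag = raw & 0x0FFF
    if tag not in SIGNATURE_LENGTHS:
        raise ValueError("unknown tag")
    if len(data) != 2 + SIGNATURE_LENGTHS[tag]:
        raise ValueError("unexpected signature length")
    return tag, data[2:]
```

Keeping the 12 tag bits in the same position as in the cypherlink layout means both formats can share the tag table, at the cost of four bits that are always zero.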

continued in next post...