Compact Legacy Message Representation (CLMR)
I wrote up a spec for a compact representation of legacy messages. It uses CBOR for the content data and an efficient binary encoding for the metadata. The first byte also never matches the first byte of a JSON encoding, which might come in handy later (I intend to keep that property for the metadata redesign).
This is semantically identical to the currently used messages, but it takes up less space and is simpler to parse. Intended uses:
- replication RPCs can exchange this form of the data (CC @Dominic)
- the database can store this instead of JSON (CC @arj, @keks)
The format will not be used to compute signatures, so it is not deeply tied into the core of the protocol. That's why I simply went ahead and wrote up the full thing. None of it is set in stone (yet) either, so feel free to criticize.
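To illustrate why the distinct first byte could come in handy, here is a minimal sketch of how a reader might tell the two encodings apart without any extra framing. It assumes that a JSON-encoded legacy message always starts with `{` (no leading whitespace), which is my assumption and not something the spec text above states. The names `Encoding` and `sniff_encoding` are hypothetical and not part of the Rust implementation.

```rust
/// Which encoding a stored or received message appears to use.
#[derive(Debug, PartialEq, Eq)]
pub enum Encoding {
    Json,
    Clmr,
}

/// Guess the encoding from the first byte alone.
///
/// Assumption: a JSON-encoded legacy message is an object and therefore
/// starts with `{` (0x7B). Any other first byte is treated as CLMR here;
/// a real decoder would still have to validate the rest of the input.
pub fn sniff_encoding(bytes: &[u8]) -> Option<Encoding> {
    match bytes.first().copied() {
        // Empty input carries no information.
        None => None,
        Some(b'{') => Some(Encoding::Json),
        Some(_) => Some(Encoding::Clmr),
    }
}
```

A database that stores both forms during a migration could call something like this on each record and dispatch to the matching decoder.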
There's a Rust implementation of the serialization.
Size of my (somewhat recent) feed in bytes, for different encodings:
- signing encoding (JSON with whitespace): 2160151
- JSON without whitespace: 1951148
- CLMR: 1620807
Considering how much of my feed consists of JSON strings that CLMR can't compress further, that's not too bad. More importantly, CLMR is much simpler to both serialize and deserialize than JSON. The same binary varints, identifiers, etc. will also be used in the new metadata format. Ideally, JSON can be fully deprecated, except for signature computation and UIs.
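For anyone who wants the relative savings rather than raw byte counts, the figures above can be turned into percentages with a few lines. This is just arithmetic on the three numbers from the list; the function and constant names are made up for the example.

```rust
/// Byte counts for the feed, copied from the list above.
pub const SIGNING_JSON: u64 = 2_160_151;
pub const COMPACT_JSON: u64 = 1_951_148;
pub const CLMR: u64 = 1_620_807;

/// Percentage of bytes saved when going from `baseline` to `compact`.
pub fn percent_saved(baseline: u64, compact: u64) -> f64 {
    (1.0 - compact as f64 / baseline as f64) * 100.0
}
```

With these numbers, `percent_saved(SIGNING_JSON, CLMR)` comes out at roughly 25%, and `percent_saved(COMPACT_JSON, CLMR)` at roughly 17%.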
Coming soon: Rust deserialization, tests for the Rust implementation, and a test data set.
Coming soon-ish: JS bindings.
One small detail is still under active discussion: the handling of private boxes. For the Rust implementation, I simply encoded the private box format specifier as a single zero byte, which is what my suggested approach to the general multibox encoding would do.