Protocol Changes - A Todo List
For general transparency, added resilience against falling coconuts, and admittedly as a defensive reaction to this exchange, I typed up my current set of scribbled notes on what still needs doing to upgrade the ssb protocol. It's very rough, but I'd rather start working on the entries themselves than spend more time packaging them. There are a few open questions in there, a few more things I have yet to properly write about, and a bunch of preferred outcomes of mine.
Metadata
Timestamps
- keep timestamps as they are, make them optional, or drop them?
- properly engage with mix on optional timestamps
- give better background on the problems than just this post
- timestamps as floats or ints?
- ints please, since we dropped monotonicity, and floats suffer from the year 2038 problem
- how to encode optional timestamps in the binary metadata of hsdt messages
- omit, or special value? Depends on other metadata considerations
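To make the "omit, or special value?" question concrete, here is a minimal Python sketch of both options. It assumes a hypothetical layout in which a timestamp is a signed 64-bit big-endian integer of milliseconds since the Unix epoch. The presence flag byte and the reserved sentinel value are illustrative assumptions, not part of the hsdt proposal.

```python
import struct
from typing import Optional, Tuple

# Hypothetical layouts: timestamps are signed 64-bit big-endian integers
# (milliseconds since the Unix epoch). Nothing here is part of a real spec.
TIMESTAMP = struct.Struct(">q")
SENTINEL = -(2 ** 63)  # assumed reserved value meaning "no timestamp"


def encode_with_flag(ts: Optional[int]) -> bytes:
    """Option 1 (omit): a presence byte, then the timestamp only if present."""
    if ts is None:
        return b"\x00"
    return b"\x01" + TIMESTAMP.pack(ts)


def decode_with_flag(buf: bytes, offset: int = 0) -> Tuple[Optional[int], int]:
    """Return (timestamp or None, offset just past the field)."""
    flag = buf[offset]
    if flag == 0:
        return None, offset + 1
    if flag != 1:
        # Rejecting other flag values keeps the encoding canonical.
        raise ValueError("invalid presence flag")
    (ts,) = TIMESTAMP.unpack_from(buf, offset + 1)
    return ts, offset + 1 + TIMESTAMP.size


def encode_with_sentinel(ts: Optional[int]) -> bytes:
    """Option 2 (special value): always 8 bytes, one value means 'absent'."""
    if ts == SENTINEL:
        raise ValueError("sentinel value cannot be used as a real timestamp")
    return TIMESTAMP.pack(SENTINEL if ts is None else ts)


def decode_with_sentinel(buf: bytes, offset: int = 0) -> Tuple[Optional[int], int]:
    """Return (timestamp or None, offset just past the field)."""
    (ts,) = TIMESTAMP.unpack_from(buf, offset)
    return (None if ts == SENTINEL else ts), offset + TIMESTAMP.size
```

The flag variant costs one extra byte when a timestamp is present and saves seven when it is absent; the sentinel variant keeps the field at a fixed size, which matters if the rest of the metadata is laid out at fixed offsets.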
Other Metadata
- more in-depth writeup on possible paths forward (dedicated post)
- figure out cost/benefit ratios for the things we could roll out in one go, suggest a reasonable package, probably keeping feedid and sequencenumber
- figure out how to get type into the metadata while keeping it encryptable
- suggest increasing the size of the type field so that it can hold 512 bit hash digests once 256 bit hashes become broken
- use the changes to set a new message size limit, or at least to base its calculation on the actual number of bytes
- including or excluding metadata?
- content as "blob" (dedicated post)
- metadata could include hash of data, not the data itself
- personal blockchain then consists purely of metadata
- allows deleting and blocking specific messages without blocking a whole feed (GDPR!)
- replication rpcs can concatenate metadata and actual data, no need for additional blob roundtrips
- how does this work with the type field?
- how does this interact with encrypted messages?
- drawbacks:
- additional hash verification
- takes some work to implement
- changes rpcs
- changes db
- db can maybe "cheat" and store things as usual, converting as needed?
- additional failure mode in the client api: can have a message's metadata but not its content
- work out binary encodings for cypherlinks, hashes and signatures
- could use ipfs multihashes, but should use a different type table
- could roll our own where the hash type implies the length, and only future hash types must encode length explicitly
- can preallocate ids for as-of-yet undetermined hash functions of a specific digest size
- not a lot of work
- once all this is decided upon, find a binary encoding
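A minimal Python sketch of the "roll our own, where the hash type implies the length" option. Everything in it is an assumption for illustration: the one-byte type ids, the 0x80 boundary at and above which a type must carry an explicit length byte, and the type table itself (which deliberately does not reuse the multihash codes). Ids below the boundary include slots preallocated for not-yet-chosen hash functions of a given digest size, so every decoder already knows their lengths.

```python
import hashlib
from typing import Dict, Tuple

# Hypothetical type table. Ids below EXPLICIT_LENGTH_START have a length that
# every decoder knows in advance, including ids preallocated for hash
# functions that have not been chosen yet. All ids are illustrative.
EXPLICIT_LENGTH_START = 0x80
FIXED_LENGTHS: Dict[int, int] = {
    0x00: 32,  # sha256
    0x01: 32,  # reserved: some future 256-bit hash function
    0x02: 64,  # reserved: some future 512-bit hash function
}


def encode_hash(type_id: int, digest: bytes) -> bytes:
    if not 0 <= type_id <= 0xFF:
        raise ValueError("type id must fit in one byte")
    if type_id < EXPLICIT_LENGTH_START:
        expected = FIXED_LENGTHS.get(type_id)
        if expected is None:
            raise ValueError("unallocated fixed-length type id")
        if len(digest) != expected:
            raise ValueError("digest length does not match type id")
        return bytes([type_id]) + digest
    # Future types write out their length, so old decoders can skip them.
    if len(digest) > 0xFF:
        raise ValueError("digest too long")
    return bytes([type_id, len(digest)]) + digest


def decode_hash(buf: bytes, offset: int = 0) -> Tuple[int, bytes, int]:
    """Return (type id, digest, offset just past the encoded hash)."""
    if offset >= len(buf):
        raise ValueError("truncated type id")
    type_id = buf[offset]
    offset += 1
    if type_id < EXPLICIT_LENGTH_START:
        length = FIXED_LENGTHS.get(type_id)
        if length is None:
            raise ValueError("unknown fixed-length type id")
    else:
        if offset >= len(buf):
            raise ValueError("truncated length byte")
        length = buf[offset]
        offset += 1
    digest = buf[offset:offset + length]
    if len(digest) != length:
        raise ValueError("truncated digest")
    return type_id, digest, offset + length


# Example: a content hash as the metadata might carry it under the
# content-as-blob idea (1 type byte + 32 digest bytes).
content_hash = encode_hash(0x00, hashlib.sha256(b"some message content").digest())
assert len(content_hash) == 33
```

Because ids at or above 0x80 carry their own length, an older decoder can step over a hash from a function it doesn't know, while sha256 and the preallocated types pay only one byte of overhead.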
Data
HSDT
- add integers (yes)? add cypherlinks (yes maybe)? add sets (no maybe)? add arbitrary-key maps (no maybe for now)? (dedicated post)
- how can new data types be introduced to the ssb-client js api? (dedicated post)
- simply break things?
- preferred: use non-enumerable properties, maybe toJSON if needed
- this is solvable
- migrate stringly-typed cypherlinks to hsdt cypherlinks?
- find out how much new data types require changes to sbot's database
- Infinity, -Infinity, -0 and NaN are probably fine?
- what about integers?
- what about cypherlinks?
- find out whether some db data migration is needed when rolling out hsdt
- fix the last errors in the testing-area implementation
- update the implementation with all additional datatypes
- prepare a body of afl-generated test cases for other implementations to test against
- figure out the conflict between keeping the protocol extensible by ignoring certain data vs ensuring canonicity
- encode collection sizes in bytes or items? (dedicated post)
- leaning towards items, esp. since bytes don't really make decoding easier. When decoding an array, you'd like to know how many items to allocate, not their total size.
- considerations: async parsing, malicious size indication, highly nested input, when does encoding and decoding happen, compactness of data, ease of implementation, efficiency of implementation (benchmarks, ask arj or dominic for json data sets and convert json to cbor?), transport format framing data, does the binary metadata include the content size in bytes, streaming encoding and decoding
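The "leaning towards items" point, together with the malicious size consideration, can be illustrated with a small decoder sketch. It assumes a hypothetical layout that is not the hsdt wire format: an array is an unsigned LEB128 varint item count followed by its items, and each item is a varint-length-prefixed byte string. Since every item occupies at least one byte, a count larger than the remaining input can be rejected before anything is allocated, after which preallocating that many slots is safe.

```python
from typing import List, Tuple


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("negative varint")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, offset: int) -> Tuple[int, int]:
    """Return (value, offset just past the varint)."""
    result, shift = 0, 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            if byte == 0 and shift > 0:
                raise ValueError("non-minimal varint")  # keep it canonical
            return result, offset
        shift += 7
        if shift > 63:
            raise ValueError("varint too long")


def encode_array(items: List[bytes]) -> bytes:
    out = bytearray(encode_varint(len(items)))  # size in items, not bytes
    for item in items:
        out += encode_varint(len(item))
        out += item
    return bytes(out)


def decode_array(buf: bytes, offset: int = 0) -> Tuple[List[bytes], int]:
    count, offset = decode_varint(buf, offset)
    # Each item needs at least one byte (its length prefix), so a larger
    # count is corrupt or malicious: reject it before allocating.
    if count > len(buf) - offset:
        raise ValueError("item count exceeds remaining input")
    items: List[bytes] = [b""] * count  # safe to preallocate now
    for i in range(count):
        length, offset = decode_varint(buf, offset)
        end = offset + length
        if end > len(buf):
            raise ValueError("truncated item")
        items[i] = buf[offset:end]
        offset = end
    return items, offset
```

A byte count would bound memory just as well, but it wouldn't tell the decoder how many slots to allocate, which is the argument for items made above.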
Semicanonical Json Replacement
- all of the above again, with fewer decisions but more ugly workarounds
- not looking forward to this, but open to helping dominic with the design
RPCs
- read up on the replication rpcs
- how should hsdt messages be added?
- should clients be able to only request non-legacy msgs?
- figure out which rpc updates semicanonical json requires
- in general, try to make rpcs take encodings as args and work across all of them
- propose an encoding-agnostic rpc for asking for messages in the format that gets hashed, not the wire format
- new server impls only need to implement this and can then replicate the whole set of messages
- as efficiency demands it, they can update to direct handling
- nb: with the current hsdt proposal, hash encoding and transport encoding are identical
- nb: for legacy json messages, hash encoding and transport encoding differ, and obtaining the hash encoding is painful
- nb: future ssb versions might want to add more efficient transport encodings (there's always the option of compressing the data), so this rpc should work with future formats as well
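A rough Python sketch of why receiving messages in the encoding that gets hashed is enough for a new implementation to check a feed without knowing any transport encoding. It assumes a hypothetical layout in which every hash-encoded message begins with the 32-byte sha256 hash of its predecessor, with all zero bytes standing in for "no predecessor". The layout, the zero-hash convention and the function names are assumptions for illustration only, and the signature checks a real implementation also needs are left out.

```python
import hashlib
from typing import Iterable, List

# Hypothetical layout: every message, in the encoding that gets hashed,
# starts with the 32-byte sha256 hash of the previous message of its feed.
PREVIOUS_LEN = 32
GENESIS = bytes(PREVIOUS_LEN)  # assumed "no predecessor" marker


def message_id(hash_encoded: bytes) -> bytes:
    """The id is the hash of exactly the bytes received; no re-encoding."""
    return hashlib.sha256(hash_encoded).digest()


def verify_chain(messages: Iterable[bytes]) -> List[bytes]:
    """Check that each message points at its predecessor; return the ids.

    Signature verification is intentionally omitted from this sketch.
    """
    ids: List[bytes] = []
    expected_previous = GENESIS
    for msg in messages:
        if len(msg) < PREVIOUS_LEN:
            raise ValueError("message too short")
        if msg[:PREVIOUS_LEN] != expected_previous:
            raise ValueError("broken hash chain")
        expected_previous = message_id(msg)
        ids.append(expected_previous)
    return ids
```

The transport encoding never enters the picture here, so an implementation could start with just this and add direct handling of more efficient transport encodings later, as efficiency demands it.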
Other
- figure out a whole rollout plan, what needs to be done, and coordinate things so it gets done, doing as much as needed myself (but trying to stay clear of the database layer...)
- figure out how to present a summary of everything for dominic after burning man
- and write it (dedicated post)
- write my dream spec for ssb, ignoring backwards compatibility (suggested by @Piet) (dedicated post)
- write down a spec for leylines, similar to ssb, except: (dedicated post)
- feeds are trees rather than lists, allowing partial subscription and replication
- untrusted gossiping mode with temporary ids over a fully random overlay
- write down the full bpmux spec (thinking is already done) (dedicated post)
- work on personal projects, not just ssb
- draw more birds