I don't know what you mean. I thought we are designing the spec right here right now?
Ok, let me rephrase: The scuttlebutt protocol needs encodings for exactly two purposes: Message signing, and message exchange. Nothing else is strictly necessary.
but being able to copy-paste (and edit by hand) human-parseable multibox representations is important to me.
This would be a feature of your application (e.g. go-sbot), but not part of the protocol. I can write my own ssb implementation and have it talk to go-sbot, while completely ignoring your human-friendly representation.
I'm fine putting in some sort of "recommended human-friendly encoding" into the spec, and I like your suggestions for this purpose, but it would be purely that - an optional recommendation.
The only option for truly making this "first class" I can see at this point is mandating that all servers must be able to decode messages received in this format. But the protocol only describes data exchange between machines, and those should use the binary encoding instead. So this would feel very forced, and it would violate the principle of minimality: I'd strongly prefer keeping the minimum stuff you need to do to call your program an ssb implementation as small as possible.
This discussion is now also getting very close to the minimal feature set of an ssb server. If we want to specify a general interface for user-server interaction, then something like this starts making more sense.
I think of the ssb protocol(s) as three layers:
- message format and signing encoding
- communication between servers for replication
- communication between server and other processes
The topmost layer is the most important one. Two applications that agree on this can work with data from the scuttleverse, even if they communicate by carrier pigeon (i.e. ignore the lower layers). You could have a scuttleverse that uses a completely different replication layer, and that is fine.
Everything that implements the topmost and middle layer is part of the main scuttleverse (modulo shs appkey). We don't care how exactly @vendan authored or read data from their database, but still they were part of the scuttleverse.
The third layer is a domain where we can only give recommendations, but can't enforce anything. All the stuff we can enforce lives in the first two layers (and even then "enforcing" is not the correct term: we are just not communicating (directly) with those who implement their own second layer, but they can still be happily building their own islands that can bridge into the main verse). The human-readable format should in my opinion live in the third layer, not above. It makes a lot of sense to define one, to foster interoperability, but it just won't be as "first class" as the signing format (topmost layer) or the transport encoding (middle layer).
When I started this discussion, I wanted to define details of layer one ("Is it ok to restrict the multibox identifier space to 2^64 values?" and "Can we find a sensible way to encode those 2^64 identifiers in the signing format?"). I feel like you (and also @Dominic) are fine with an identifier space of size 2^64. I also feel like we agree that a simple bijection between those identifiers and short strings is the way to go for the signing encoding.
So at this point, I am sufficiently unblocked, and I think we don't need to finalize a decision of the human-friendly format just yet.
A short summary of the outcome (or at least the current state) from my perspective:
- a multibox consists of a cyphertext and a natural number between 0 and 2^64-1 inclusive that identifies the en/de-cryption algorithm
- private-box has identifier 0
- to allow people to use their own, non-official formats, we promise to never use an odd identifier for the main protocol(s)
- if there is a strict advantage in using the most-significant bit instead, then please tell me, I'm not aware of one
- using the least-significant bit however allows people who are choosing their identifiers non-randomly to use shorter ones
- the legacy signing encoding can use the hexadecimal encoding (lower-case) of the identifier
- with the metadata redesign we'll switch to a binary signing encoding and deprecate the legacy one, so in practice there won't be (m)any messages for which the larger string-length compared to base32/58/64 is an issue, but everyone still needs to support it, and base16 is simpler
- we will at some point define human-friendly representations for all the data formats. For multiboxes, the scheme proposed here looks reasonable.
If you have good reasons for partitioning the identifier space differently, or for not using hexadecimal for the identifier encoding, I'm open to switching - those are just my (slightly) preferred choices.
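To make the summary above concrete, here is a rough sketch of how I picture it in code. Nothing here is spec: the names (`Multibox`, `encode_id_hex`, and so on) are made up for illustration, and the canonical form of the hex string (shortest possible, no leading zeros, `"0"` for zero) is my assumption for how to get a bijection, since we haven't pinned that detail down yet.

```python
from dataclasses import dataclass

MAX_ID = 2**64 - 1      # identifier space: 0 .. 2^64-1 inclusive
PRIVATE_BOX_ID = 0      # private-box has identifier 0
HEX_DIGITS = "0123456789abcdef"


@dataclass(frozen=True)
class Multibox:
    """Hypothetical in-memory form: a format identifier plus a ciphertext."""
    format_id: int
    ciphertext: bytes

    def __post_init__(self):
        if not 0 <= self.format_id <= MAX_ID:
            raise ValueError("format identifier out of range")


def is_unofficial(format_id: int) -> bool:
    """Odd identifiers (least-significant bit set) are never used by the main protocol(s)."""
    return format_id & 1 == 1


def encode_id_hex(format_id: int) -> str:
    """Lower-case hexadecimal encoding of the identifier.

    Assumption: shortest form without leading zeros, so each identifier
    maps to exactly one string.
    """
    if not 0 <= format_id <= MAX_ID:
        raise ValueError("format identifier out of range")
    return format(format_id, "x")


def decode_id_hex(text: str) -> int:
    """Inverse of encode_id_hex; rejects every non-canonical string."""
    if not text or len(text) > 16:
        raise ValueError("expected 1 to 16 hex digits")
    if any(c not in HEX_DIGITS for c in text):
        raise ValueError("only lower-case hex digits are allowed")
    if len(text) > 1 and text[0] == "0":
        raise ValueError("leading zeros are not canonical")
    return int(text, 16)
```

With this, `encode_id_hex(PRIVATE_BOX_ID)` gives `"0"`, and someone picking their own format by hand can use a short odd identifier like `1` or `3` without ever clashing with the main protocol.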