@Dominic I lack the energy for a thorough response to this, but here are a few immediate reactions. I apologize in advance if these turn out too raw or direct. I hope this does not come across as too negative; I really do appreciate you taking the time to pick up this conversation.
fork from @aljoscha's proposal [...]
Just to clarify: This was not intended as a proposal for ssb, this was simply me going down a fun rabbit hole about choosing a syntax for a lisp. This started by thinking about highly dynamic languages with content-addressable module loading, see #ssb-annah for the origins of this (I remember mentioning a "lisp with an event loop" somewhere in that channel, that's what I was fleshing out yesterday). But I'll take the bait and consider the potential usage for ssb in this response.
Had I intended this as an ssb proposal, I would have suggested a binary format instead, together with a human-readable, non-canonical display format.
I think the SDN proposal just goes to show how difficult it is to design a cannonical format, [...]
I disagree with this being difficult, it took less than 24 hours to come up with that - all the difficult stuff has already been solved elsewhere. The equality relation and the total order take some space to write down, because they are inductively defined over all data types, and there's a bunch of data types (more than ssb would need btw). But none of the cases are really complicated - the sole exception being floats. But canonical floats become easy in a binary format - IEEE 754 defines binary exchange formats. Disallow all but one NaN representation, done.
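To make the NaN point concrete, here is a rough sketch of what I mean. The function names are made up for illustration, and picking the quiet NaN `0x7ff8000000000000` as the single allowed representation is just an assumption; any fixed bit pattern would do. Note that this sketch keeps `0` and `-0` distinct, which a real spec would have to decide on explicitly.

```javascript
// Hypothetical helper: encode a number as 8 big-endian bytes (IEEE 754 binary64),
// mapping every NaN to one fixed bit pattern so the encoding is canonical.
function encodeCanonicalFloat64(x) {
  const buf = new ArrayBuffer(8);
  const view = new DataView(buf);
  if (Number.isNaN(x)) {
    // Assumed canonical NaN: 0x7ff8000000000000
    view.setUint32(0, 0x7ff80000);
    view.setUint32(4, 0);
  } else {
    view.setFloat64(0, x); // DataView writes big-endian by default
  }
  return new Uint8Array(buf);
}

// Hypothetical helper: hex-print the bytes for inspection.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

console.log(toHex(encodeCanonicalFloat64(NaN))); // 7ff8000000000000
console.log(toHex(encodeCanonicalFloat64(2)));   // 4000000000000000
```

A decoder would then simply reject any NaN whose bytes differ from the chosen pattern.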
[...] and the benefit of that is just the ability to parse to an intermediate data structure, then serialize again, and have the hash still be the same.
There's a more important part: Any two programs which happen to generate the same data will serialize it to something with the same hash. That's one of the first properties I'd want a content-addressed system to have.
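As an illustration of that property, here is a minimal sketch for node. It is not ssb's actual encoding: `canonicalize` and `hashOf` are hypothetical names, the encoder only handles plain JSON values, and "sort the keys" stands in for whatever canonical order a real format would specify.

```javascript
const crypto = require('crypto');

// Hypothetical canonical encoder for plain JSON values:
// object keys are always emitted in sorted order.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return '[' + value.map(canonicalize).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort();
    return '{' + keys.map((k) => JSON.stringify(k) + ':' + canonicalize(value[k])).join(',') + '}';
  }
  return JSON.stringify(value); // strings, numbers, booleans, null
}

// Hypothetical helper: hash the canonical encoding.
function hashOf(value) {
  return crypto.createHash('sha256').update(canonicalize(value)).digest('hex');
}

// Two programs build the same data, but insert the keys in a different order.
const a = { type: 'post', text: 'hi' };
const b = { text: 'hi', type: 'post' };

console.log(JSON.stringify(a) === JSON.stringify(b)); // false
console.log(hashOf(a) === hashOf(b));                 // true
```

With plain `JSON.stringify` the two programs would produce different hashes for the same data, purely because of insertion order.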
Given the kind of data structures different languages and runtimes might prefer are not necessarily exactly the same enough to preserve all features required for cannonicity, this can be awkward...
ssb enforces data structures, no matter whether its data format has a canonical representation or not. It currently enforces a data structure where maps are ordered sequences of pairs, which is awkward.
as it was that the JSON implementation which did support json==JSON.stringify(JSON.parse(json)) cannonicity in V8 (but not in the JSON spec)
I'm not sure if I parsed this sentence correctly, but the property you state does not hold in V8, and everything would be horrible if it did hold. JSON.stringify(JSON.parse("2.00000000000000000000000000000001")) == "2", and that's a good thing. Otherwise, node would be forced to always remember the source string corresponding to any json it parses, which defeats some of the point of parsing.
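This is easy to check in node. The variable names below are only for illustration:

```javascript
// The round trip does not reproduce the source string.
const source = '2.00000000000000000000000000000001';
const roundTripped = JSON.stringify(JSON.parse(source));

console.log(roundTripped);            // 2
console.log(roundTripped === source); // false

// Insignificant whitespace is lost as well.
const spaced = '{ "a" : 1 }';
const respaced = JSON.stringify(JSON.parse(spaced));

console.log(respaced);                // {"a":1}
```

The round trip only returns the input unchanged if the input already happens to be in the exact form `JSON.stringify` would emit.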
If it doesn't require a lot of code, it requires a lot of specification...
Code in total, or code one needs to implement oneself?
On the other hand, a serialize once design is simpler overall (the signer is the only one to serialize a structure, other readers parse, but keeps a handle on the raw bytes, and just write that instead of reserializing) and doesn't introduce requirements to parse it into any particular type of structure.
This completely gives up on the idea that equivalent data structures should produce the same hash. And even stronger, it gives up on the idea that identical data structures should produce the same hash. To me, this is completely absurd.
Implementations can choose their own optimizations, but that is difficult with a cannonical format. And also, a non-cannonical format can be designed for other desirable features, such as read performance, in-place access, or ease-of-implementation.
Does a canonical format inherently conflict with performance and in-place access?
As for the ease of implementation: I don't think the complexity is too big (assuming non-textual floats). There just isn't a format with a readily available implementation (although XML comes close). In js, we have easy access to highly optimized json parsers. These optimizations might make them more complex than a straightforward implementation of something like sdn would be.