On the topic of alternatives: there's also the option of cleanly separating the encoding indicator from the hash function indicator in cypherlinks. The current proposal (well, it's not really a proposal yet, more an opinionated exploration of the design space) mashes the two together: a single id signifies both the hash function and the encoding that was used to turn the logical value into hashable bytes. That's also how we can introduce new formats in the current, string-based cypherlink encoding. But hsdt could instead decouple them completely, tracking separate ids for encodings and for hash functions. And now that I think about it, nothing stops the json-string encoding from appending two suffixes, one for the hash function and one for the encoding (with an empty encoding suffix indicating the default json hashing encoding).
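A minimal Python sketch of the two-suffix idea for the json-string encoding, purely illustrative. The suffix order (hash first, then encoding), the "." separator, and the encoding names "json" and "hsdt" are my assumptions, not anything settled. A missing or empty encoding suffix maps to the default json encoding, so existing links like %…=.sha256 keep parsing unchanged.

```python
from dataclasses import dataclass

# Assumption: name of the default (current) json hashing encoding.
DEFAULT_ENCODING = "json"


@dataclass(frozen=True)
class Cypherlink:
    sigil: str     # e.g. "%" for messages, "&" for blobs
    digest: str    # base64 hash; the base64 alphabet never contains "."
    hash_fn: str   # e.g. "sha256"
    encoding: str  # encoding used to turn the logical value into hashable bytes


def parse_link(s: str) -> Cypherlink:
    # Assumed layout: <sigil><digest>.<hash>[.<encoding>]
    if not s:
        raise ValueError("empty cypherlink")
    sigil, parts = s[0], s[1:].split(".")
    if len(parts) == 2:
        digest, hash_fn = parts
        encoding = DEFAULT_ENCODING  # no encoding suffix: default encoding
    elif len(parts) == 3:
        digest, hash_fn = parts[0], parts[1]
        encoding = parts[2] or DEFAULT_ENCODING  # empty suffix: default encoding
    else:
        raise ValueError(f"malformed cypherlink: {s!r}")
    if not digest or not hash_fn:
        raise ValueError(f"malformed cypherlink: {s!r}")
    return Cypherlink(sigil, digest, hash_fn, encoding)


def format_link(link: Cypherlink) -> str:
    base = f"{link.sigil}{link.digest}.{link.hash_fn}"
    if link.encoding == DEFAULT_ENCODING:
        return base  # default-encoded links stay identical to today's format
    return f"{base}.{link.encoding}"
```

Omitting the suffix for the default encoding keeps every existing link valid, while non-default encodings become explicit without needing a new combined hash name.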
Separating those would mean adding another varint to the generic hsdt encoding, so we'd get <sigil-varint><encoding-varint><hash-varint><length><content>. The cost of this would be a single additional byte (as long as there are fewer than 128 encodings, so the encoding id fits into a one-byte varint), but we'd reinforce the conceptual distinction. Also, having to manually assign every possible (encoding, hash) pair its own combined id would be really annoying, as would the ad-hoc disambiguation code this forces on implementations. In a sense, the same arguments for giving sigils their own varint apply here too. The more I think about this, the more I'm leaning toward separating encoding and hash.
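To make the byte layout concrete, here's a rough Python sketch of <sigil-varint><encoding-varint><hash-varint><length><content>. It assumes every varint, including <length>, is an unsigned LEB128 varint (the style multiformats uses), and the numeric ids in the test are hypothetical placeholders, not assigned values.

```python
from __future__ import annotations


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, position after it)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_link(sigil: int, encoding: int, hash_fn: int, digest: bytes) -> bytes:
    # Separate ids for sigil, encoding and hash function, then length + content.
    return (encode_varint(sigil) + encode_varint(encoding)
            + encode_varint(hash_fn) + encode_varint(len(digest)) + digest)


def decode_link(buf: bytes) -> tuple[int, int, int, bytes]:
    sigil, pos = decode_varint(buf, 0)
    encoding, pos = decode_varint(buf, pos)
    hash_fn, pos = decode_varint(buf, pos)
    length, pos = decode_varint(buf, pos)
    digest = buf[pos:pos + length]
    if len(digest) != length:
        raise ValueError("truncated content")
    return sigil, encoding, hash_fn, digest
```

With all ids below 128, the header takes four bytes instead of the three a combined (encoding, hash) id would need, which is where the single extra byte comes from. In exchange, a new hash function or encoding gets one table entry rather than one per combination.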
CC @Dominic, since this could be done with the json encoding as well.