A bunch of responses, in no particular order:
Regarding the identity function part: I didn't mean literally taking the bytes of the string, but I see how I failed to communicate this; "identity" was not the right word. What I actually meant is having an (essentially arbitrary) bijection between uint64s and some set of JSON strings, and only allowing those strings as suffixes. A reasonable way of doing that is to use the hexadecimal encoding of the uint64 without leading zeros.
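To make the bijection concrete, here is a rough sketch of what I have in mind. The function names are made up for illustration, and I'm assuming lowercase hex with "0" as the encoding of zero; the point is only that every uint64 has exactly one accepted string and vice versa.

```python
UINT64_MAX = 2**64 - 1
HEX_DIGITS = set("0123456789abcdef")


def id_to_suffix(n: int) -> str:
    """Encode a uint64 as lowercase hex without leading zeros (hypothetical helper)."""
    if not 0 <= n <= UINT64_MAX:
        raise ValueError("id out of uint64 range")
    return format(n, "x")


def suffix_to_id(s: str) -> int:
    """Decode a suffix, rejecting every non-canonical spelling (hypothetical helper)."""
    # At most 16 hex digits fit in a uint64; only lowercase digits are accepted.
    if not s or len(s) > 16 or any(c not in HEX_DIGITS for c in s):
        raise ValueError("not a canonical hex suffix")
    # Leading zeros would give several strings for the same id, so reject them
    # (the single string "0" is the only one allowed to start with a zero).
    if len(s) > 1 and s[0] == "0":
        raise ValueError("leading zeros are not allowed")
    return int(s, 16)
```

Rejecting "0ff" and "FF" rather than normalising them is what keeps the mapping one-to-one.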
Actually, I suggest reserving all ids that start with a 0 bit
Larger numbers get larger varint encodings, so that makes the non-preassigned ones second-class (in fact, all of them would need 9 bytes to encode). So what about reserving all ids that end with a 0 bit (i.e. even numbers)? And also, do we really need to reserve 2^63 identifiers?
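A quick sketch of the size argument. I'm assuming LEB128-style varints here (7 payload bits per byte) just to have something concrete; the exact maximum depends on the varint format in use (LEB128 tops out at 10 bytes for a uint64, formats with a full-width final byte top out at 9), but the shape of the argument is the same. `leb128_len` is a made-up helper name.

```python
def leb128_len(n: int) -> int:
    """Number of bytes an unsigned LEB128 varint needs for n (assumed format)."""
    if n < 0:
        raise ValueError("unsigned values only")
    length = 1
    while n >= 0x80:  # more than 7 bits left, so another byte is needed
        n >>= 7
        length += 1
    return length


# Ids with the top bit set are always at the maximum length ...
top_bit_ids = [2**63, 2**63 + 1, 2**64 - 1]
# ... whereas splitting on the lowest bit leaves small ids in both classes.
small_even_ids = [0, 2, 126]
small_odd_ids = [1, 3, 127]
```

With the low-bit split, both the reserved (even) and the non-reserved (odd) ids include values that fit in a single byte.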
For hashing, I think I would use the rightmost 53 bits though, so the hex is the same, except in 50% of the cases the first digit. Otherwise we would use the first 64 bits except the first bit (weird) or the first 63 bits - but prepending a 1 shifts all the bits and so the hex will look completely different (also weird).
I don't understand this part. Wouldn't you just take the string, hash it to an identifier, then encode the identifier in whatever format you need? Also, I don't get where the 53 comes from. And "the hex will look completely different" compared to what?
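Roughly what I picture, as a sketch only. SHA-256 is a placeholder for whatever hash we end up choosing, the function name is made up, and I'm assuming the "even ids are reserved" variant from above, so hashed ids get a 1 as their lowest bit.

```python
import hashlib


def hashed_id(name: str) -> int:
    """Derive an odd uint64 id from a string (hypothetical scheme)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Take the first 63 bits of the digest ...
    first_63_bits = int.from_bytes(digest[:8], "big") >> 1
    # ... and shift in a 1 as the lowest bit, so the id can never collide
    # with the reserved even ids.
    return (first_63_bits << 1) | 1


example_id = hashed_id("some-extension-name")
as_hex = format(example_id, "x")  # one possible encoding of the same id
```

The id is computed once; hex, varint, or any other format is then just a different encoding of that one number.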
I think just shifting in a 1 bit into the digest would work fine.