[...] the collision probability with 1000 64-bit hashes is pretty darn low (< 10^-13).
This is exactly the same as directly choosing a random number between 0 and 2^64-1. And pretty much the same as choosing a number between 2^32 and 2^64-1, if we want to reserve a few billion ids. Well, or between 255 and 2^64-1 (not that it makes much of a difference).
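These numbers are easy to check. Below is a sketch of the union (birthday) bound n(n-1)/2 / N and the exact collision probability, assuming ids are drawn uniformly from N values. It also compares the full 64-bit range with the range above 2^32:

```python
import math

def collision_bound(n: int, space: int) -> float:
    """Union (birthday) bound: n*(n-1)/2 pairs, each colliding with
    probability 1/space if ids are drawn uniformly from `space` values."""
    return n * (n - 1) / 2 / space

def collision_exact(n: int, space: int) -> float:
    """Exact P(at least one collision) = 1 - prod_{i<n} (1 - i/space),
    computed in log space so the tiny terms aren't lost to float rounding."""
    log_no_collision = sum(math.log1p(-i / space) for i in range(n))
    return -math.expm1(log_no_collision)

full = 2**64                  # ids in [0, 2^64 - 1]
reserved = 2**64 - 2**32      # ids in [2^32, 2^64 - 1]

for label, space in [("full range", full), ("above 2^32", reserved)]:
    print(f"{label}: bound={collision_bound(1000, space):.3e} "
          f"exact={collision_exact(1000, space):.3e}")
```

Both come out at roughly 2.7 × 10^-14. Excluding the lowest 2^32 values changes the result only by a factor of about 1 + 2^-32.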
As for the hashing, I just don't see how this provides any advantage over simply allowing any 64-bit id without caring about how it was created.
That way the database can compute the index number, even though it doesn't know the format string.
If you directly encode the index instead, you can also compute the index number: by applying the identity function. That's even simpler than hashing.
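To make the comparison concrete: in both schemes the hashing, if any, happens once, when the format is defined. At read time the database only sees the 64-bit number on the wire. A sketch, assuming SHA-256 truncated to 64 bits (the thread doesn't name a hash function) and a hypothetical `OFFICIAL_TABLE`:

```python
import hashlib

# Hypothetical official table of small numeric ids (names are illustrative).
OFFICIAL_TABLE = {"example-key": 1, "example-hash": 2}

def id_from_hash(format_string: str) -> int:
    """Hash-derived id: first 8 bytes of SHA-256 as a big-endian u64.
    Assumption: the thread doesn't specify which hash function would be used."""
    digest = hashlib.sha256(format_string.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def index_of(format_id: int) -> int:
    """What the database does with an id read off the wire. It never sees the
    format string, so for hashed and table ids alike this is the identity."""
    return format_id

hashed = id_from_hash("example-format-v1")  # hashing happens once, at design time
numeric = OFFICIAL_TABLE["example-key"]
print(index_of(hashed), index_of(numeric))
```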
The difference here is whether the set of subprotocols and formats is open or closed. Let's say that the hash id of a format is actually derived from the hash of its spec document, such that you can request that spec for protocols you don't understand.
How realistic is it that this will get implemented? How realistic is it that such a lookup will be simpler than a lookup by numeric id? And say we could look up the corresponding spec for an identifier. What then? If the machine can't act on it alone, then we didn't gain anything.
@keks if we reserved two bytes, we'd capture everything currently in ipfs's multiformats table
None of which is a multibox format. So why?
And if this is not about only multiboxes: Why should all formats share the same table? A key would need to reject any non-key formats anyway. Also, the clmr proposal uses compact type-length-value encodings that can skip encoding the length for certain identifiers. This isn't set in stone of course, but why should we drop that property?
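The length-skipping property can be sketched like this. The type ids and sizes in `FIXED_LENGTH` are hypothetical, and the clmr proposal's actual wire format may well differ. The point is only that a decoder which knows an id's fixed size can omit the length field, and that small ids keep the type field itself at one byte:

```python
# Hypothetical type table: ids listed here have a fixed payload length, so the
# encoder omits the length field for them. Ids and sizes are illustrative only.
FIXED_LENGTH = {1: 32, 2: 64}

def encode_varint(n: int) -> bytes:
    """Unsigned LEB128-style varint (7 payload bits per byte)."""
    if n < 0:
        raise ValueError("unsigned varints only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)

def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, position just after the varint)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7

def encode_tlv(type_id: int, payload: bytes) -> bytes:
    fixed = FIXED_LENGTH.get(type_id)
    if fixed is not None:
        if len(payload) != fixed:
            raise ValueError(f"type {type_id} expects {fixed} bytes")
        return encode_varint(type_id) + payload  # no length field
    return encode_varint(type_id) + encode_varint(len(payload)) + payload

def decode_tlv(buf: bytes, pos: int = 0) -> tuple[int, bytes, int]:
    """Return (type_id, payload, position just after the value)."""
    type_id, pos = decode_varint(buf, pos)
    length = FIXED_LENGTH.get(type_id)
    if length is None:  # unknown or variable-length type: read the length
        length, pos = decode_varint(buf, pos)
    payload = buf[pos:pos + length]
    if len(payload) != length:
        raise ValueError("truncated value")
    return type_id, payload, pos + length
```

Here `encode_tlv(1, bytes(32))` is 33 bytes (one type byte, no length byte), while an id the table doesn't list, such as 300, gets a length prefix.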
So I think the simpler way is an official table, but also allow hash based ids, so that experimenting with a new type can't be blocked by a committee.
The format does not need to care about this! It receives an unknown id, it stores the message, done. No need to care about whether that identifier was created by incrementing a counter, by choosing a random number, or by hashing some specific sequence of bytes.
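A sketch of that behaviour, with hypothetical names (`MessageStore`, `handlers`) that aren't from any spec: known ids are dispatched to a handler, every message is stored, and nothing inspects where an id came from.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class MessageStore:
    """Hypothetical store; names are illustrative, not from any spec."""
    # Handlers for the format ids this node understands.
    handlers: dict[int, Callable[[bytes], None]] = field(default_factory=dict)
    # Every received message, kept regardless of whether its id is known.
    messages: list[tuple[int, bytes]] = field(default_factory=list)

    def receive(self, format_id: int, payload: bytes) -> None:
        self.messages.append((format_id, payload))
        handler = self.handlers.get(format_id)
        if handler is not None:
            handler(payload)
        # Unknown id: nothing else to do. Whether it was assigned from a table,
        # picked at random, or derived from a hash is invisible at this point.

decoded = []
store = MessageStore(handlers={1: decoded.append})
store.receive(1, b"known")
store.receive(937931567, b"opaque")
```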
So I think the simpler way is an official table, but also allow hash based ids, so that experimenting with a new type can't be blocked by a committee.
This doesn't make sense. How is a committee able to block the numeric identifier 937931567 but unable to block the format whose hash is 937931567?
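A general note: Hashes distribute evenly over all 2^64 possible values, making varints useless for compression. It's actually worse than that: with the usual 7-bits-per-byte varint, a uniformly random 64-bit value almost always takes 9 or 10 bytes, more than a fixed 8-byte field. Assigning small numeric identifiers allows us to keep the encodings smaller via varints.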
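The size difference is easy to see with a common 7-bits-per-byte (LEB128-style) varint; whether the final encoding would use exactly this varint is an assumption. The sketch encodes a few small ids and a batch of uniformly random 64-bit ids standing in for hash-derived ones:

```python
import random

def encode_varint(n: int) -> bytes:
    """Unsigned LEB128-style varint: 7 payload bits per byte, high bit = "more"."""
    if n < 0:
        raise ValueError("unsigned varints only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)

# Small, table-assigned ids: everything below 128 fits in a single byte.
print([len(encode_varint(i)) for i in (1, 42, 127, 128, 300)])  # [1, 1, 1, 2, 2]

# Hash-like ids, uniform over [0, 2^64 - 1]. Only 1 in 256 fits in 8 bytes
# or fewer; the rest need 9 or 10, i.e. more than a plain fixed-width u64.
rng = random.Random(0)
sizes = [len(encode_varint(rng.getrandbits(64))) for _ in range(10_000)]
avg = sum(sizes) / len(sizes)
print(f"average varint size of a random 64-bit id: {avg:.2f} bytes")
```

The random ids average close to 9.5 bytes each, versus a single byte for any id below 128.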
And a final remark: This is such a tiny detail that it's just not worth adding any complexity. The gains are so ridiculously small, we should just go with the simplest possible solution.