...continued from previous post
VarU64
The VarU64 encoding for unsigned 64-bit integers works as follows:
first byte | value |
---|---|
0b0000_0000 - 0b1111_0111 | The numeric value of the byte |
0b1111_1000 | The byte is followed by an 8 bit unsigned big-endian integer |
0b1111_1001 | The byte is followed by a 16 bit unsigned big-endian integer |
0b1111_1010 | The byte is followed by a 24 bit unsigned big-endian integer |
0b1111_1011 | The byte is followed by a 32 bit unsigned big-endian integer |
0b1111_1100 | The byte is followed by a 40 bit unsigned big-endian integer |
0b1111_1101 | The byte is followed by a 48 bit unsigned big-endian integer |
0b1111_1110 | The byte is followed by a 56 bit unsigned big-endian integer |
0b1111_1111 | The byte is followed by a 64 bit unsigned big-endian integer |
Each integer must be encoded using the smallest possible number of bytes. When decoding, violations of this constraint must be reported as errors.
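To make the table and the minimal-length rule concrete, here is a rough Rust sketch of an encoder and decoder. The names `encode`, `decode` and `DecodeError` are made up for illustration and are not part of the format; this is one possible reading of the rules above, not a reference implementation.

```rust
/// Errors a decoder can report (hypothetical names, not from the spec).
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    /// The input ended before the whole value was read.
    UnexpectedEnd,
    /// The value was encoded with more bytes than necessary.
    NonCanonical,
}

/// Append the VarU64 encoding of `n` to `out`.
pub fn encode(n: u64, out: &mut Vec<u8>) {
    if n < 248 {
        // Small values are stored directly in the first byte.
        out.push(n as u8);
        return;
    }
    // Number of bytes needed to hold `n` (1..=8); `n` is nonzero here.
    let len = 8 - n.leading_zeros() as usize / 8;
    // 0b1111_1000 (248) announces 1 following byte, 0b1111_1111 announces 8.
    out.push(247 + len as u8);
    out.extend_from_slice(&n.to_be_bytes()[8 - len..]);
}

/// Decode one VarU64 from the front of `input`.
/// Returns the value and the bytes that follow it.
pub fn decode(input: &[u8]) -> Result<(u64, &[u8]), DecodeError> {
    let (&first, rest) = input.split_first().ok_or(DecodeError::UnexpectedEnd)?;
    if first < 248 {
        return Ok((first as u64, rest));
    }
    let len = (first - 247) as usize;
    if rest.len() < len {
        return Err(DecodeError::UnexpectedEnd);
    }
    // Right-align the big-endian bytes in an 8 byte buffer.
    let mut buf = [0u8; 8];
    buf[8 - len..].copy_from_slice(&rest[..len]);
    let n = u64::from_be_bytes(buf);
    // Smallest value that actually needs `len` following bytes.
    let min = if len == 1 { 248 } else { 1u64 << (8 * (len - 1)) };
    if n < min {
        return Err(DecodeError::NonCanonical);
    }
    Ok((n, &rest[len..]))
}
```

With this sketch, 256 would encode as the three bytes `0xF9 0x01 0x00`, while decoding `0xF8 0x05` would be rejected, since 5 fits into the first byte.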
This format (compared to the current ipfs varuint):
- restricts the domain to 64 bit unsigned integers
- indicates the length of the value in the first byte
- admits exactly one valid encoding per number
- can be parsed very efficiently
- optimizes for small values (can store 248 different values in a single byte, compared to 128 for the ipfs varuint)
- pays for these advantages by leaving quite a few byte strings unused (the encodings that do not use the smallest possible number of bytes)
- if it ever becomes absolutely necessary to extend this format to handle integers of larger (or even arbitrary) size, these unused values can enable such an extension
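For a feel of the size trade-off, the following sketch computes encoded lengths for both formats. It assumes the ipfs varuint is a LEB128-style encoding with 7 payload bits per byte (which is consistent with the 128 single-byte values mentioned above); `varu64_len` and `leb128_len` are hypothetical helper names.

```rust
/// Encoded size in bytes of `n` as a VarU64.
pub fn varu64_len(n: u64) -> usize {
    if n < 248 {
        1
    } else {
        // One tag byte plus the minimal big-endian representation of `n`.
        1 + (8 - n.leading_zeros() as usize / 8)
    }
}

/// Encoded size in bytes of `n` in a LEB128-style varint
/// (assumption: 7 payload bits per byte, one continuation bit).
pub fn leb128_len(n: u64) -> usize {
    // Zero still takes one byte, so treat it as having one significant bit.
    let bits = (64 - n.leading_zeros() as usize).max(1);
    (bits + 6) / 7
}
```

Under that assumption, values from 128 to 247 take one byte as VarU64 and two as a varuint, values from 256 to 16383 take three bytes as VarU64 and two as a varuint, and `u64::MAX` takes nine bytes versus ten.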
I'll revisit the #yamf formats as necessary once I need them in the Rust ssb implementation.