And an encoding:
Default Encoding
This encoding of hsdt values is based on a canonical subset of cbor, extended to express the non-cbor values (sets, multifeeds, multihashes). It is used for both message signing and the exchange of messages between processes.
The default encoding is defined as follows:
Encoding Null
This is identical to cbor: null is encoded as the byte 0b111_10110 (0xf6).
Encoding Booleans
This is identical to cbor: true is encoded as the byte 0b111_10101 (0xf5), and false is encoded as the byte 0b111_10100 (0xf4).
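Null and the booleans are single fixed bytes. The following is a minimal Python sketch of encoding and decoding them. The function names (encode_null, encode_bool, decode_simple) are hypothetical and not taken from any hsdt implementation.

```python
# Tag bytes copied from the text above (identical to cbor).
NULL = 0b111_10110   # 0xf6
TRUE = 0b111_10101   # 0xf5
FALSE = 0b111_10100  # 0xf4


def encode_null() -> bytes:
    return bytes([NULL])


def encode_bool(value: bool) -> bytes:
    return bytes([TRUE if value else FALSE])


def decode_simple(data: bytes):
    """Decode a one-byte null or boolean; reject anything else."""
    if len(data) != 1:
        raise ValueError("expected exactly one byte")
    if data[0] == NULL:
        return None
    if data[0] == TRUE:
        return True
    if data[0] == FALSE:
        return False
    raise ValueError(f"not a null or boolean tag: {data[0]:#04x}")
```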
Encoding Integers
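Integers are encoded with a tag byte indicating size and signedness, followed by the big-endian representation of the integer, using two's complement for signed integers. Note that this differs from how cbor handles integers: cbor encodes a negative integer n as -1 - n under a separate major type and lets small values share the initial byte, whereas here the width is determined by the integer's type and signed values use two's complement.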
The tag byte is taken from the following table:
Integer Type | Tag Binary | Tag Hex |
---|---|---|
u8 | 0b000_11000 | 0x18 |
u16 | 0b000_11001 | 0x19 |
u32 | 0b000_11010 | 0x1a |
u64 | 0b000_11011 | 0x1b |
i8 | 0b001_11000 | 0x38 |
i16 | 0b001_11001 | 0x39 |
i32 | 0b001_11010 | 0x3a |
i64 | 0b001_11011 | 0x3b |
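The table maps directly onto a lookup plus Python's built-in int.to_bytes / int.from_bytes, which handle big-endian order and two's complement. This is a minimal sketch; the type names "u8" through "i64" follow the table, but the function names are hypothetical.

```python
# Tag byte for each integer type, copied from the table above.
INT_TAGS = {
    "u8": 0x18, "u16": 0x19, "u32": 0x1A, "u64": 0x1B,
    "i8": 0x38, "i16": 0x39, "i32": 0x3A, "i64": 0x3B,
}


def encode_int(value: int, int_type: str) -> bytes:
    """Tag byte, then the big-endian bytes of `value` at the type's width.

    Signed types use two's complement. int.to_bytes raises OverflowError
    if the value does not fit the chosen width.
    """
    width = int(int_type[1:]) // 8          # "u16" -> 2 bytes
    signed = int_type.startswith("i")
    return bytes([INT_TAGS[int_type]]) + value.to_bytes(width, "big", signed=signed)


def decode_int(data: bytes):
    """Return (int_type, value) for a complete encoded integer."""
    tag, payload = data[0], data[1:]
    for int_type, t in INT_TAGS.items():
        if t == tag:
            width = int(int_type[1:]) // 8
            if len(payload) != width:
                raise ValueError("wrong payload length")
            signed = int_type.startswith("i")
            return int_type, int.from_bytes(payload, "big", signed=signed)
    raise ValueError(f"not an integer tag: {tag:#04x}")
```

For example, encode_int(-2, "i16") yields the bytes 0x39 0xff 0xfe, whereas cbor would encode -2 as the single byte 0x21.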
Encoding Floating Point Numbers
This is identical to cbor.
A 32 bit floating point number is encoded as the byte 0b111_11010 (0xfa), followed by the four bytes of the number (sign, exponent, fraction in that order).
A 64 bit floating point number is encoded as the byte 0b111_11011 (0xfb), followed by the eight bytes of the number (sign, exponent, fraction in that order).
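"Sign, exponent, fraction in that order" is the big-endian IEEE 754 layout, which Python's struct module produces with the ">f" and ">d" formats. A minimal sketch with hypothetical function names:

```python
import struct

F32_TAG = 0b111_11010  # 0xfa
F64_TAG = 0b111_11011  # 0xfb


def encode_f32(x: float) -> bytes:
    # ">f" packs an IEEE 754 binary32 big-endian: sign, exponent, fraction.
    return bytes([F32_TAG]) + struct.pack(">f", x)


def encode_f64(x: float) -> bytes:
    # ">d" packs an IEEE 754 binary64 big-endian.
    return bytes([F64_TAG]) + struct.pack(">d", x)


def decode_float(data: bytes) -> float:
    if data[0] == F32_TAG and len(data) == 5:
        return struct.unpack(">f", data[1:])[0]
    if data[0] == F64_TAG and len(data) == 9:
        return struct.unpack(">d", data[1:])[0]
    raise ValueError("not a float encoding")
```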
Encoding Multifeeds
A multifeed is encoded as the byte 0b111_11100 (0xfc), followed by the compact encoding of the multifeed.
Encoding Multihashes
A multihash is encoded as the byte 0b111_11101 (0xfd), followed by the compact encoding of the multihash.
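The compact encodings of multifeeds and multihashes are not defined in this section, so the following sketch assumes the caller already holds them as bytes and only shows the tag prefix. The function names are hypothetical. A decoder would have to parse the compact encoding itself to find where the value ends.

```python
MULTIFEED_TAG = 0b111_11100  # 0xfc
MULTIHASH_TAG = 0b111_11101  # 0xfd


def encode_multifeed(compact: bytes) -> bytes:
    # Assumption: `compact` is already the multifeed's compact encoding.
    return bytes([MULTIFEED_TAG]) + compact


def encode_multihash(compact: bytes) -> bytes:
    # Assumption: `compact` is already the multihash's compact encoding.
    return bytes([MULTIHASH_TAG]) + compact
```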
Encoding Byte Strings
Byte strings are encoded as in cbor, with the following restrictions:
- no indefinite length byte strings (additional type 31 is not allowed)
- byte strings must use the shortest possible encoding of their length
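Both restrictions concern the cbor initial byte: the top three bits hold the major type, and the low five bits hold either the length itself (0 to 23) or 24, 25, 26 or 27 to announce a 1, 2, 4 or 8 byte length. A minimal Python sketch of a shortest-form encoder, assuming cbor's major type numbering (2 for byte strings, 3 for utf-8 strings, 4 for arrays, 5 for maps); encode_header and encode_byte_string are hypothetical names.

```python
def encode_header(major: int, length: int) -> bytes:
    """cbor-style initial byte plus length, always in the shortest form.

    Additional type 31 (indefinite length) is never produced.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    first = major << 5
    if length < 24:
        return bytes([first | length])
    for additional, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if length < 1 << (8 * width):
            return bytes([first | additional]) + length.to_bytes(width, "big")
    raise ValueError("length does not fit in 64 bits")


def encode_byte_string(data: bytes) -> bytes:
    return encode_header(2, len(data)) + data
```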
Encoding Utf-8 Strings
Utf-8 strings are encoded as in cbor (called "text strings" there), with the following restrictions:
- no indefinite length utf-8 strings (additional type 31 is not allowed)
- utf-8 strings must use the shortest possible encoding of their length
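On the decoding side, these restrictions mean rejecting additional type 31 and any length that could have been written in fewer bytes. A minimal Python sketch, assuming cbor's major type 3 for utf-8 strings; decode_header and decode_text_string are hypothetical names, and decode_header is meant only for length-prefixed types.

```python
def decode_header(data: bytes, offset: int = 0):
    """Parse an initial byte and length at `offset`.

    Returns (major, length, next_offset). Rejects indefinite lengths
    (additional type 31), the reserved values 28-30, and non-shortest lengths.
    """
    first = data[offset]
    major, additional = first >> 5, first & 0x1F
    if additional < 24:
        return major, additional, offset + 1
    if additional == 31:
        raise ValueError("indefinite lengths are not allowed")
    if additional > 27:
        raise ValueError("reserved additional type")
    width = 1 << (additional - 24)            # 24->1, 25->2, 26->4, 27->8 bytes
    end = offset + 1 + width
    if end > len(data):
        raise ValueError("truncated length")
    length = int.from_bytes(data[offset + 1:end], "big")
    # Smallest length that actually needs this width.
    smallest = 24 if width == 1 else 1 << (8 * (width // 2))
    if length < smallest:
        raise ValueError("length not encoded in the shortest form")
    return major, length, end


def decode_text_string(data: bytes) -> str:
    major, length, start = decode_header(data)
    if major != 3:
        raise ValueError("not a utf-8 string")
    if start + length != len(data):
        raise ValueError("wrong payload length")
    return data[start:start + length].decode("utf-8")  # raises on invalid utf-8
```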
Encoding Arrays
Arrays are encoded as in cbor, with the following restrictions:
- no indefinite length arrays (additional type 31 is not allowed)
- arrays must use the shortest possible encoding of their length
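For arrays, the length in the header counts items rather than bytes. A minimal Python sketch that takes already-encoded items, assuming cbor's major type 4 for arrays; encode_header and encode_array are hypothetical names.

```python
def encode_header(major: int, length: int) -> bytes:
    # Shortest-form cbor initial byte plus length (never indefinite).
    if length < 24:
        return bytes([(major << 5) | length])
    for additional, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if length < 1 << (8 * width):
            return bytes([(major << 5) | additional]) + length.to_bytes(width, "big")
    raise ValueError("length does not fit in 64 bits")


def encode_array(encoded_items: list) -> bytes:
    """Encode an array whose items are already encoded as bytes."""
    return encode_header(4, len(encoded_items)) + b"".join(encoded_items)
```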
Encoding Sets
Sets are encoded just like arrays, except that the second-most-significant bit of the first byte is 1 rather than 0, so the three-bit major type is 0b110 instead of the array's 0b100 (e.g. the empty set is 0b110_00000 or 0xc0).
The entries of the set must be sorted lexicographically by their encoding. When decoding, encountering an entry that is not lexicographically greater than its predecessor must be treated as an error. Note that this also disallows duplicate entries.
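Python compares bytes objects lexicographically, so sorting the encoded entries yields the canonical order directly. A minimal sketch of the encoder and the matching decoder-side check; encode_header, encode_set and check_set_order are hypothetical names, and entries are assumed to be already encoded.

```python
def encode_header(major: int, length: int) -> bytes:
    # Shortest-form cbor initial byte plus length (never indefinite).
    if length < 24:
        return bytes([(major << 5) | length])
    for additional, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if length < 1 << (8 * width):
            return bytes([(major << 5) | additional]) + length.to_bytes(width, "big")
    raise ValueError("length does not fit in 64 bits")


def encode_set(encoded_entries: list) -> bytes:
    """Major type 0b110, entries sorted by their encoding, no duplicates."""
    entries = sorted(encoded_entries)  # bytes compare lexicographically
    for prev, cur in zip(entries, entries[1:]):
        if prev == cur:
            raise ValueError("duplicate set entry")
    return encode_header(0b110, len(entries)) + b"".join(entries)


def check_set_order(encoded_entries: list) -> None:
    """Decoder-side check: each entry must be strictly greater than the last."""
    for prev, cur in zip(encoded_entries, encoded_entries[1:]):
        if not prev < cur:
            raise ValueError("set entries not strictly increasing")
```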
Encoding Maps
Maps are encoded as in cbor, with the following restrictions:
- no indefinite length maps (additional type 31 is not allowed)
- maps must use the shortest possible encoding of their length
The keys of the map must be sorted lexicographically by their encoding. When decoding, encountering a key that is not lexicographically greater than its predecessor must be treated as an error. Note that this also disallows duplicate keys.
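A canonical map is sorted by encoded key only; each value simply follows its key. A minimal Python sketch, assuming cbor's major type 5 for maps, with the header length counting key/value pairs. The names are hypothetical, and keys and values are assumed to be already encoded.

```python
def encode_header(major: int, length: int) -> bytes:
    # Shortest-form cbor initial byte plus length (never indefinite).
    if length < 24:
        return bytes([(major << 5) | length])
    for additional, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if length < 1 << (8 * width):
            return bytes([(major << 5) | additional]) + length.to_bytes(width, "big")
    raise ValueError("length does not fit in 64 bits")


def encode_map(encoded_pairs: list) -> bytes:
    """encoded_pairs: list of (encoded_key, encoded_value) tuples of bytes."""
    pairs = sorted(encoded_pairs, key=lambda kv: kv[0])  # order by key encoding
    for (k1, _), (k2, _) in zip(pairs, pairs[1:]):
        if k1 == k2:
            raise ValueError("duplicate map key")
    body = b"".join(k + v for k, v in pairs)
    return encode_header(5, len(pairs)) + body
```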
Non-Canonic Encoding
The non-canonic hsdt encoding lifts the restriction that set entries and map keys must be sorted lexicographically. Note that:
- duplicate set entries or map keys are still disallowed
- no other restrictions are lifted (in particular, length indicators for strings/collections must still be as small as possible)
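A decoder supporting both encodings could switch between an ordering check and a plain duplicate check. This is a minimal sketch with a hypothetical check_keys function. It compares canonical encodings of the keys: under the non-canonic encoding, a key that itself contains a set or map can be encoded in more than one way, so such keys would need to be canonicalized (or compared as decoded values) before this duplicate check is meaningful.

```python
def check_keys(encoded_keys: list, canonical: bool = True) -> None:
    """Validate the set entries or map keys of one collection.

    canonical=True:  each key must be strictly greater than its predecessor.
    canonical=False: any order is accepted, but duplicates are still rejected.
    Assumes each key is given in its canonical encoding.
    """
    if canonical:
        for prev, cur in zip(encoded_keys, encoded_keys[1:]):
            if not prev < cur:
                raise ValueError("keys not strictly increasing")
    elif len(set(encoded_keys)) != len(encoded_keys):
        raise ValueError("duplicate key")
```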