%2OU+8wxtzrfZpbapZhuiHZyrdBMMkr2KAP51+Ngbpoc=.sha256

@aljoscha, 5 years ago

And an encoding:

Default Encoding

This encoding of hsdt values is based on a canonical subset of cbor, extended to express the non-cbor values (sets, multifeeds, multihashes). It is used for both message signing and the exchange of messages between processes.

The default encoding is defined as follows:

Encoding Null

This is identical to cbor.
null is encoded as the byte 0b111_10110 (0xf6).

Encoding Booleans

This is identical to cbor.
true is encoded as the byte 0b111_10101 (0xf5).
false is encoded as the byte 0b111_10100 (0xf4).
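
As a minimal sketch of the two sections above, the following Python function maps null and the booleans to their single bytes. The function name `encode_simple` is made up for illustration and is not part of hsdt.

```python
def encode_simple(value):
    """Encode None (null), True or False as the single byte given in the text."""
    if value is None:
        return bytes([0xF6])  # 0b111_10110
    if value is True:
        return bytes([0xF5])  # 0b111_10101
    if value is False:
        return bytes([0xF4])  # 0b111_10100
    raise TypeError("value is neither null nor a boolean")
```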

Encoding Integers

Integers are encoded with a tag byte indicating size and signedness, followed by the big-endian representation of the integer, using two's complement for signed integers. Note that this differs from how cbor handles integers.

The tag byte is taken from the following table:

Integer Type	Tag Binary	Tag Hex
u8	0b000_11000	0x18
u16	0b000_11001	0x19
u32	0b000_11010	0x1a
u64	0b000_11011	0x1b
i8	0b001_11000	0x38
i16	0b001_11001	0x39
i32	0b001_11010	0x3a
i64	0b001_11011	0x3b
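
The table can be turned into a small encoder. This is a sketch in Python, assuming the caller already knows which integer type a value has; the names `INT_TYPES` and `encode_int` are hypothetical. Python's `struct` module produces big-endian, two's complement bytes when the format string starts with `>`.

```python
import struct

# (tag byte, struct format) for each integer type, copied from the table above.
INT_TYPES = {
    "u8": (0x18, ">B"), "u16": (0x19, ">H"), "u32": (0x1A, ">I"), "u64": (0x1B, ">Q"),
    "i8": (0x38, ">b"), "i16": (0x39, ">h"), "i32": (0x3A, ">i"), "i64": (0x3B, ">q"),
}

def encode_int(value, int_type):
    """Return the tag byte followed by the big-endian bytes of value."""
    tag, fmt = INT_TYPES[int_type]
    # struct.pack raises struct.error if value does not fit the chosen type.
    return bytes([tag]) + struct.pack(fmt, value)
```

For example, `encode_int(258, "u16")` gives the bytes `19 01 02`, and `encode_int(-2, "i16")` gives `39 ff fe`.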

Encoding Floating Point Numbers

This is identical to cbor.

A 32 bit floating point number is encoded as the byte 0b111_11010 (0xfa), followed by the four bytes of the number (sign, exponent, fraction in that order).

A 64 bit floating point number is encoded as the byte 0b111_11011 (0xfb), followed by the eight bytes of the number (sign, exponent, fraction in that order).
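
A short Python sketch of both cases, with hypothetical function names. The `>f` and `>d` formats of `struct` emit big-endian IEEE 754 bytes, which puts sign, exponent and fraction in the order described above.

```python
import struct

def encode_f32(x):
    """0xfa followed by the four big-endian bytes of a 32 bit float."""
    return b"\xfa" + struct.pack(">f", x)

def encode_f64(x):
    """0xfb followed by the eight big-endian bytes of a 64 bit float."""
    return b"\xfb" + struct.pack(">d", x)
```

For example, `encode_f32(1.0)` gives `fa 3f 80 00 00`.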

Encoding Multifeeds

A multifeed is encoded as the byte 0b111_11100 (0xfc), followed by the compact encoding of the multifeed.

Encoding Multihashes

A multihash is encoded as the byte 0b111_11101 (0xfd), followed by the compact encoding of the multihash.
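
The compact encodings themselves are not defined in this post, so the following Python sketch only shows the framing for multifeeds and multihashes. It assumes the caller passes in bytes that are already a valid compact encoding; the function names are hypothetical.

```python
MULTIFEED_TAG = 0xFC  # 0b111_11100
MULTIHASH_TAG = 0xFD  # 0b111_11101

def encode_multifeed(compact):
    """Prefix an already compact-encoded multifeed with its tag byte."""
    return bytes([MULTIFEED_TAG]) + compact

def encode_multihash(compact):
    """Prefix an already compact-encoded multihash with its tag byte."""
    return bytes([MULTIHASH_TAG]) + compact
```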

Encoding Byte Strings

Byte strings are encoded as in cbor, with the following restrictions:

no indefinite length byte strings (additional information 31 is not allowed)
byte strings must use the shortest possible encoding of their length
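
The two restrictions can be sketched in Python as follows. The helper names `encode_head` and `encode_bytes` are made up for illustration. In cbor, byte strings are major type 2, and the length goes either into the low five bits of the first byte (for lengths below 24) or into 1, 2, 4 or 8 following bytes.

```python
def encode_head(major_type, length):
    """Return the shortest cbor head for the given major type and length."""
    prefix = major_type << 5
    if length < 24:
        return bytes([prefix | length])
    # Additional information 24..27 announces a 1, 2, 4 or 8 byte length.
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if length < 1 << (8 * size):
            return bytes([prefix | info]) + length.to_bytes(size, "big")
    raise ValueError("length does not fit in 64 bits")

def encode_bytes(data):
    """Encode a byte string with a definite, shortest-possible length."""
    return encode_head(2, len(data)) + data
```

Because the loop tries the sizes in ascending order, a length is never written in more bytes than it needs, and the indefinite-length form is never produced.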

Encoding Utf-8 Strings

Utf-8 strings are encoded as in cbor (called "text strings" there), with the following restrictions:

no indefinite length utf-8 strings (additional information 31 is not allowed)
utf-8 strings must use the shortest possible encoding of their length
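
On the decoding side, these restrictions mean that some inputs which are valid cbor have to be rejected. The following Python sketch shows one way to do that for utf-8 strings (cbor major type 3). The function name `decode_text` and its return shape are assumptions made for this example.

```python
def decode_text(buf):
    """Decode one utf-8 string at the start of buf.

    Returns (text, number of bytes consumed); raises ValueError on bad input.
    """
    if not buf or buf[0] >> 5 != 3:
        raise ValueError("not a utf-8 string")
    info = buf[0] & 0x1F
    if info < 24:
        length, pos = info, 1
    elif info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4 or 8 length bytes
        if len(buf) < 1 + size:
            raise ValueError("truncated length")
        length, pos = int.from_bytes(buf[1:1 + size], "big"), 1 + size
        # Smallest length that actually needs this many length bytes.
        minimum = 24 if info == 24 else 1 << (8 * (size // 2))
        if length < minimum:
            raise ValueError("length is not in its shortest encoding")
    else:
        raise ValueError("indefinite or reserved length")
    if len(buf) < pos + length:
        raise ValueError("truncated string")
    # bytes.decode raises UnicodeDecodeError (a ValueError) on invalid utf-8.
    return buf[pos:pos + length].decode("utf-8"), pos + length
```

For example, `decode_text(b"\x62hi")` returns `("hi", 3)`, while `b"\x78\x02hi"` is rejected because the length 2 would have fit into the first byte.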

Encoding Arrays

Arrays are encoded as in cbor, with the following restrictions:

no indefinite length arrays (additional information 31 is not allowed)
arrays must use the shortest possible encoding of their length

Encoding Sets

Sets are encoded just like arrays, except the second-most-significant bit of the first byte is 1 rather than 0 (e.g. the empty set is 0b110_00000 or 0xc0).

The entries of the set must be sorted lexicographically by their encoding. When decoding, encountering an entry that is not lexicographically greater than its predecessor must be treated as an error. Note that this also disallows duplicate entries.
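
A possible set encoder in Python is sketched below. It assumes the entries have already been encoded to bytes, and that "lexicographically" means plain bytewise comparison, which is how Python orders `bytes` objects. The names `encode_head` and `encode_set` are hypothetical.

```python
def encode_head(major_type, length):
    """Return the shortest cbor-style head for a major type and length."""
    prefix = major_type << 5
    if length < 24:
        return bytes([prefix | length])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if length < 1 << (8 * size):
            return bytes([prefix | info]) + length.to_bytes(size, "big")
    raise ValueError("length does not fit in 64 bits")

def encode_set(encoded_entries):
    """Encode a set from its already-encoded entries."""
    entries = sorted(encoded_entries)  # bytewise lexicographic order
    for previous, current in zip(entries, entries[1:]):
        if previous == current:
            raise ValueError("duplicate set entry")
    # Arrays start with 0b100; setting the next bit gives 0b110 for sets.
    return encode_head(0b110, len(entries)) + b"".join(entries)
```

For example, the set containing the u8 values 3 and 1 is encoded as `c2 18 01 18 03`.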

Encoding Maps

Maps are encoded as in cbor, with the following restrictions:

no indefinite length maps (additional information 31 is not allowed)
maps must use the shortest possible encoding of their length

The keys of the map must be sorted lexicographically by their encoding. When decoding, encountering a key that is not lexicographically greater than its predecessor must be treated as an error. Note that this also disallows duplicate keys.
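
The decoder-side check for key order could look like the following Python sketch. It assumes the decoder has collected the raw encoded bytes of each key in the order they appeared, and that the comparison is bytewise; `check_canonical_keys` is a made-up name.

```python
def check_canonical_keys(encoded_keys):
    """Raise ValueError unless each key is strictly greater than its predecessor."""
    previous = None
    for key in encoded_keys:
        # "<=" rejects both out-of-order keys and duplicates in one comparison.
        if previous is not None and key <= previous:
            raise ValueError("map key out of order or duplicated")
        previous = key
```

A real decoder would probably run this comparison while reading the map rather than afterwards, so it can fail as soon as the first bad key is seen.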

Non-Canonic Encoding

The non-canonic hsdt encoding lifts the restriction that set entries and map keys must be sorted lexicographically. Note that:

duplicate set entries or map keys are still disallowed
no other restrictions are lifted (in particular, length indicators for strings/collections must still be as small as possible)
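
Without the ordering rule, duplicates can no longer be caught by comparing each item with its predecessor. One way a decoder might handle this is sketched below in Python. It assumes that two entries or keys count as duplicates exactly when their encodings are byte-for-byte equal, which this post does not spell out; `check_no_duplicates` is a hypothetical name.

```python
def check_no_duplicates(encoded_items):
    """Raise ValueError if any encoded set entry or map key occurs twice."""
    seen = set()
    for item in encoded_items:
        if item in seen:
            raise ValueError("duplicate set entry or map key")
        seen.add(item)
```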