@aljoscha %2OU+8wxtzrfZpbapZhuiHZyrdBMMkr2KAP51+Ngbpoc=.sha256
Re: %BTnweXwHe

And an encoding:

Default Encoding

This encoding of hsdt values is based on a canonical subset of cbor, extended to express the non-cbor values (sets, multifeeds, multihashes). It is used for both message signing and the exchange of messages between processes.

The default encoding is defined as follows:

Encoding Null

This is identical to cbor.
null is encoded as the byte 0b111_10110 (0xf6).

Encoding Booleans

This is identical to cbor.
true is encoded as the byte 0b111_10101 (0xf5).
false is encoded as the byte 0b111_10100 (0xf4).

Encoding Integers

Integers are encoded with a tag byte indicating size and signedness, followed by the big-endian representation of the integer, using two's complement for signed integers. Note that this differs from how cbor handles integers.

The tag byte is taken from the following table:

Integer Type    Tag Binary     Tag Hex
u8              0b000_11000    0x18
u16             0b000_11001    0x19
u32             0b000_11010    0x1a
u64             0b000_11011    0x1b
i8              0b001_11000    0x38
i16             0b001_11001    0x39
i32             0b001_11010    0x3a
i64             0b001_11011    0x3b
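
As a rough illustration of the integer rules, here is a minimal Python sketch. The names `INT_TYPES` and `encode_int` are hypothetical and not part of hsdt, and the sketch assumes the integer type is supplied by the caller rather than inferred from the value.

```python
import struct

# Tag bytes copied from the table above, paired with struct format codes.
# ">" selects big-endian; the signed codes (b, h, i, q) use two's complement.
INT_TYPES = {
    "u8": (0x18, ">B"),
    "u16": (0x19, ">H"),
    "u32": (0x1A, ">I"),
    "u64": (0x1B, ">Q"),
    "i8": (0x38, ">b"),
    "i16": (0x39, ">h"),
    "i32": (0x3A, ">i"),
    "i64": (0x3B, ">q"),
}


def encode_int(value: int, int_type: str) -> bytes:
    """Encode value as the given integer type (hypothetical helper)."""
    tag, fmt = INT_TYPES[int_type]
    # struct.pack raises struct.error if value does not fit the type.
    return bytes([tag]) + struct.pack(fmt, value)
```

For example, `encode_int(-2, "i16")` gives the three bytes 0x39 0xff 0xfe: the i16 tag followed by -2 in 16-bit two's complement.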

Encoding Floating Point Numbers

This is identical to cbor.

A 32-bit floating point number is encoded as the byte 0b111_11010 (0xfa), followed by the four bytes of its IEEE 754 single-precision representation in big-endian order (sign, exponent, fraction in that order).

A 64-bit floating point number is encoded as the byte 0b111_11011 (0xfb), followed by the eight bytes of its IEEE 754 double-precision representation in big-endian order (sign, exponent, fraction in that order).
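
A small Python sketch of the two float encodings, assuming the usual cbor layout (IEEE 754, most significant byte first). The function names are hypothetical.

```python
import struct


def encode_f32(value: float) -> bytes:
    """Tag 0xfa followed by the 4-byte big-endian single-precision value."""
    return b"\xfa" + struct.pack(">f", value)


def encode_f64(value: float) -> bytes:
    """Tag 0xfb followed by the 8-byte big-endian double-precision value."""
    return b"\xfb" + struct.pack(">d", value)
```

For example, `encode_f32(1.0)` gives 0xfa 0x3f 0x80 0x00 0x00.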

Encoding Multifeeds

A multifeed is encoded as the byte 0b111_11100 (0xfc), followed by the compact encoding of the multifeed.

Encoding Multihashes

A multihash is encoded as the byte 0b111_11101 (0xfd), followed by the compact encoding of the multihash.

Encoding Byte Strings

Byte strings are encoded as in cbor, with the following restrictions:

  • no indefinite length byte strings (additional information 31 is not allowed)
  • byte strings must use the shortest possible encoding of their length
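
The shortest-length rule can be sketched in Python as follows. `encode_head` and `encode_bytes` are hypothetical names; the layout (3-bit major type, 5-bit additional information, optional 1, 2, 4 or 8 big-endian length bytes) is the one cbor defines.

```python
def encode_head(major_type: int, length: int) -> bytes:
    """Shortest cbor head for a major type (0..7) and a non-negative length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    top = major_type << 5
    if length < 24:
        # Lengths 0..23 are stored directly in the low five bits.
        return bytes([top | length])
    if length < 1 << 8:
        return bytes([top | 24]) + length.to_bytes(1, "big")
    if length < 1 << 16:
        return bytes([top | 25]) + length.to_bytes(2, "big")
    if length < 1 << 32:
        return bytes([top | 26]) + length.to_bytes(4, "big")
    if length < 1 << 64:
        return bytes([top | 27]) + length.to_bytes(8, "big")
    raise ValueError("length does not fit in 64 bits")


def encode_bytes(data: bytes) -> bytes:
    # Major type 2 is the cbor byte string type. Additional information 31
    # (indefinite length) is never produced.
    return encode_head(2, len(data)) + data
```

Because each branch is only taken when the smaller forms cannot hold the length, the encoder never emits a longer head than necessary.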

Encoding Utf-8 Strings

Utf-8 strings are encoded as in cbor (called "text strings" there), with the following restrictions:

  • no indefinite length utf-8 strings (additional information 31 is not allowed)
  • utf-8 strings must use the shortest possible encoding of their length
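
On the decoding side, both restrictions turn into explicit checks. The following Python sketch decodes a single utf-8 string that fills the whole buffer. `decode_text` is a hypothetical helper, and raising `ValueError` for every violation is an assumption of this sketch, not something the encoding prescribes.

```python
def decode_text(buf: bytes) -> str:
    """Decode one utf-8 string that fills buf exactly (hypothetical helper)."""
    if not buf or buf[0] >> 5 != 3:
        raise ValueError("not a utf-8 string (cbor major type 3)")
    info = buf[0] & 0x1F  # additional information, low five bits
    if info < 24:
        length, start = info, 1
    elif info <= 27:
        width = 1 << (info - 24)  # 1, 2, 4 or 8 length bytes
        if len(buf) < 1 + width:
            raise ValueError("truncated length")
        length = int.from_bytes(buf[1:1 + width], "big")
        start = 1 + width
        # Shortest-length rule: reject a length that fits a smaller form.
        smallest = 24 if width == 1 else 1 << (8 * (width // 2))
        if length < smallest:
            raise ValueError("length is not in its shortest encoding")
    else:
        # 28..30 are reserved in cbor; 31 would mean indefinite length.
        raise ValueError("indefinite or reserved length is not allowed")
    payload = buf[start:]
    if len(payload) != length:
        raise ValueError("length does not match payload")
    # The length counts bytes, not characters; invalid utf-8 raises
    # UnicodeDecodeError, which is a subclass of ValueError.
    return payload.decode("utf-8")
```

With this sketch, `decode_text(b"\x62hi")` returns `"hi"`, while `b"\x78\x02hi"` (the same string with a one-byte length field) is rejected.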

Encoding Arrays

Arrays are encoded as in cbor, with the following restrictions:

  • no indefinite length arrays (additional information 31 is not allowed)
  • arrays must use the shortest possible encoding of their length

Encoding Sets

Sets are encoded just like arrays, except the second-most-significant bit of the first byte is 1 rather than 0 (e.g. the empty set is 0b110_00000 or 0xc0).

The entries of the set must be sorted lexicographically by their encoding. When decoding, encountering an entry that is not lexicographically greater than its predecessor must be treated as an error. Note that this also disallows duplicate entries.
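
A minimal Python sketch of set encoding, taking entries that are already encoded. `encode_set` is a hypothetical name, the sketch is limited to fewer than 24 entries so the count fits in the first byte, and it assumes "lexicographically" means plain byte-wise comparison, which is how Python orders `bytes` objects.

```python
ARRAY_MAJOR = 0b100 << 5  # cbor array head, 0b100_nnnnn
SET_BIT = 0b010 << 5      # second-most-significant bit of the first byte


def encode_set(encoded_entries: list[bytes]) -> bytes:
    """Encode a set from already-encoded entries (hypothetical helper)."""
    # Dropping identical encodings, then sorting byte-wise, yields a
    # strictly increasing sequence, as the decoding rule requires.
    entries = sorted(set(encoded_entries))
    count = len(entries)
    if count >= 24:
        raise ValueError("sketch only handles sets with fewer than 24 entries")
    head = ARRAY_MAJOR | SET_BIT | count  # 0b110_nnnnn
    return bytes([head]) + b"".join(entries)
```

For example, `encode_set([])` gives the single byte 0xc0, and a set holding the u8 values 2 and 1 (encoded as 0x18 0x02 and 0x18 0x01) becomes 0xc2 0x18 0x01 0x18 0x02.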

Encoding Maps

Maps are encoded as in cbor, with the following restrictions:

  • no indefinite length maps (additional information 31 is not allowed)
  • maps must use the shortest possible encoding of their length

The keys of the map must be sorted lexicographically by their encoding. When decoding, encountering a key that is not lexicographically greater than its predecessor must be treated as an error. Note that this also disallows duplicate keys.
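
The ordering check a decoder has to perform can be sketched in Python like this. `check_key_order` is a hypothetical helper that receives the encoded keys in the order they appear in the input, and it assumes plain byte-wise comparison.

```python
def check_key_order(encoded_keys: list[bytes]) -> None:
    """Raise ValueError unless the keys are strictly increasing."""
    for previous, current in zip(encoded_keys, encoded_keys[1:]):
        # "Not greater than" covers both a smaller key and an equal one,
        # so duplicates are rejected by the same comparison.
        if not current > previous:
            raise ValueError("map key is not greater than its predecessor")
```

A decoder would typically run this comparison incrementally while reading the map, so that it can fail at the first offending key.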

Non-Canonic Encoding

The non-canonic hsdt encoding lifts the restriction that set entries and map keys must be sorted lexicographically. Note that:

  • duplicate set entries or map keys are still disallowed
  • no other restrictions are lifted (in particular, length indicators for strings/collections must still be as small as possible)
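
Without the ordering requirement, duplicates have to be detected some other way. The Python sketch below compares the encodings of entries or keys directly. `check_no_duplicates` is a hypothetical helper, and comparing encodings is only sufficient when the entries contain no nested sets or maps: in the non-canonic encoding, two equal nested collections may be written in different orders and would then need to be compared as values.

```python
def check_no_duplicates(encoded_items: list[bytes]) -> None:
    """Raise ValueError if two items have the same encoding."""
    seen = set()
    for item in encoded_items:
        if item in seen:
            raise ValueError("duplicate set entry or map key")
        seen.add(item)
```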