@aljoscha 6 years ago %tLojVSXLYahw/XFXim76QL6dxnOtuYag0Ff/zBrt32Q=.sha256

YAMF-Hash

Yet Another Multi Format - Hash Digest Edition

A future-proof binary encoding for hash digests, for use in structured data.

This is basically the same as yamf-pubkey. The only difference is that the minimum size of a secure hash is 256 bits (rather than the minimum 32 bits of a public key). Thus the length of a hash with a tag between 0 and 127 is computed as 2 ^ (8 + (tag >> 3)) (whereas yamf-pubkey uses 2 ^ (5 + (tag >> 3))).

A quick, self-contained definition of the binary encoding:

A binary-encoded yamf-hash begins with a canonical varint called the tag.

if tag >= 128, the bytes following the tag are a varint encoding of the length of the digest in bytes, followed by that many bytes of digest data
else, the digest length is 2 ^ (8 + (tag >> 3)) bits (so 256 bits, i.e. 32 bytes, for tag 0), and the tag byte is followed directly by the digest data
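The two rules above can be sketched as a small decoder. This is a sketch under two assumptions the post does not pin down: varints are unsigned LEB128 (the style multihash uses), and the implicit length 2 ^ (8 + (tag >> 3)) counts bits, which is the reading under which tag 0 fits a 32-byte sha256 digest. The function names are hypothetical.

```python
from typing import Tuple


def read_varint(data: bytes, offset: int) -> Tuple[int, int]:
    """Read an unsigned LEB128 varint (assumed format) at data[offset:].

    Returns (value, next_offset) and rejects non-shortest encodings,
    since yamf requires canonical varints.
    """
    value, shift, start = 0, 0, offset
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # high bit clear: this was the last byte
            break
    if offset - start > 1 and byte == 0:
        raise ValueError("non-canonical varint")
    return value, offset


def implicit_digest_len(tag: int) -> int:
    """Byte length for tags 0..127, assuming 2 ^ (8 + (tag >> 3)) is a bit count."""
    return (1 << (8 + (tag >> 3))) // 8


def decode_yamf_hash(data: bytes) -> Tuple[int, bytes, int]:
    """Parse a yamf-hash at the start of data; return (tag, digest, bytes_consumed)."""
    tag, offset = read_varint(data, 0)
    if tag >= 128:
        length, offset = read_varint(data, offset)  # explicit length, in bytes
    else:
        length = implicit_digest_len(tag)
    end = offset + length
    if end > len(data):
        raise ValueError("truncated digest")
    return tag, data[offset:end], end
```

Returning the number of consumed bytes lets a caller continue parsing whatever structured data follows the digest.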

The tag indicates what cryptographic algorithm has been used to compute the hash.

The only currently assigned tag is 0 for sha256 digests.

For human-readable encodings, the tag 0 is given the name "sha256".
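As an illustration of tag 0 in practice, the sketch below hashes a payload with Python's `hashlib` and produces both forms. Since tag 0 is below 128 and fits in one varint byte, the binary form is a single `0x00` byte followed by the 32-byte digest. The textual form is an assumption: the post only assigns the name "sha256", so this mirrors yamf-pubkey's `<name>.<unpadded base64url>` layout. Function names are hypothetical.

```python
import base64
import hashlib

SHA256_TAG = 0  # the only assigned yamf-hash tag


def yamf_sha256(payload: bytes) -> bytes:
    """Binary yamf-hash of payload: one tag byte (0) followed by the 32-byte digest."""
    return bytes([SHA256_TAG]) + hashlib.sha256(payload).digest()


def yamf_sha256_text(payload: bytes) -> str:
    """Assumed human-readable form, borrowed from yamf-pubkey: "sha256." + unpadded base64url."""
    digest = hashlib.sha256(payload).digest()
    return "sha256." + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


print(yamf_sha256(b"hello").hex())
print(yamf_sha256_text(b""))
```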


@aljoscha 6 years ago %kve6gSAQVk072++2ImbzpB8WnhtQyrjwg0iM+kWaMbk=.sha256

#yamf

Re: %cLnq+v5N6

Linking so I don't forget this when coming back to this: %gezYf2X...


@aljoscha 6 years ago %cLnq+v5N6771kniz97k/j52qh1F6mUssWCGrYPwKLkA=.sha256

#yamf

YAMF-Pubkey

Yet Another Multi Format - Public Key Edition

A future-proof binary encoding for public keys, for use in non-schemaless (schemafull?) data. Expected initial uses are in:

This is fairly unexciting, but it needs to be specced out somewhere. Note that it is conceptually distinct from encoding feed ids, as those may carry more information than just a public key.

Related work:

Requirements

Support different cryptographic primitives
- which may only become known in the future
- of arbitrary key length
Be efficient for machines to deal with
Admit a human-readable representation
Allow inferring the data length even if the cryptographic primitive is unknown to the implementation
Admit a bijection between logical values (key_type, data) and valid encodings

Non-goals:

Encode what the public key is used for
Encode whether the crypto scheme is symmetric or asymmetric
Any of the other stuff that kept the ipfs people busy for nearly three years without settling on a format
- type, length, value is all we need

Binary Encoding

Reuse the multihash format. Done. I told you this would be unexciting.

Ok, maybe there is a bit more.

To keep things unique, varints must use the shortest possible encoding (yeah, the ipfs varints are ambiguous...).
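The shortest-encoding rule can be sketched as follows, assuming unsigned LEB128 varints (7 payload bits per byte, high bit set on every byte except the last), which is the style multihash uses. A multi-byte varint whose last byte is zero could have been written shorter, so a decoder only needs that one check to reject ambiguous inputs such as `80 00` for the value 0. Function names are hypothetical.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint (always shortest form)."""
    if n < 0:
        raise ValueError("varints encode non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more 7-bit groups follow
        else:
            out.append(byte)
            return bytes(out)


def decode_canonical_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a varint at data[offset:], returning (value, next_offset).

    Rejects padded encodings such as b"\\x80\\x00", so every value has exactly one encoding.
    """
    value, shift, start = 0, 0, offset
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    # A zero final group in a multi-byte varint means a shorter encoding existed.
    if offset - start > 1 and byte == 0:
        raise ValueError("non-canonical varint")
    return value, offset
```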

As an optimization, we can omit the length for well-known primitives. E.g. the type tag that identifies ed25519 keys does not need to give a length, as all ed25519 public keys have a length of 32 bytes. We can also omit lengths for future primitives by preassigning lengths to certain type tags. I propose the following scheme:

if tag >= 128, the bytes following the tag are a varint encoding of the length of the key in bytes, followed by that many bytes of key data
else, the length is 2 ^ (5 + (tag >> 3)) bytes and the tag byte is followed by that many bytes of key data

This way, the 1-byte tags are split up to represent keys whose lengths are powers of two from 2 ^ 5 bytes (32 bytes, what ed25519 needs) to 2 ^ 20 bytes (1 MiB, enough for large post-quantum keys), with 8 different primitives per power of two.
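To make the partition concrete, the sketch below computes the implicit length for each one-byte tag and groups the tags by it. The function name is hypothetical; the formula is the one given above. It prints 16 lines, the first covering tags 0..7 at 32 bytes and the last tags 120..127 at 2 ^ 20 = 1,048,576 bytes.

```python
def implicit_key_len(tag: int) -> int:
    """Key length in bytes for one-byte tags (0..127): 2 ^ (5 + (tag >> 3))."""
    if not 0 <= tag < 128:
        raise ValueError("only tags 0..127 carry an implicit length")
    return 1 << (5 + (tag >> 3))


# Group the 128 one-byte tags by the key length they imply.
groups = {}
for tag in range(128):
    groups.setdefault(implicit_key_len(tag), []).append(tag)

for length, tags in groups.items():
    print(f"{length:>8} bytes: tags {tags[0]}..{tags[-1]}")
```

Shifting the tag right by 3 is what gives each power of two a block of 8 consecutive tags.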

The only currently assigned tag is 0 for ed25519 keys.

So the overall format is <canonical_varint_tag><varint_length_if_tag_at_least_128><raw_public_key>.
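A minimal encoder and decoder for this layout might look like the following. It assumes unsigned LEB128 varints (multihash style); `encode_pubkey` and `decode_pubkey` are hypothetical names. For tags below 128 the encoder refuses keys of the wrong size, since the length is implied by the tag and cannot be stored.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128 varint, shortest form (assumed varint format)."""
    if n < 0:
        raise ValueError("negative varint")
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, offset: int) -> Tuple[int, int]:
    """Decode a varint, rejecting non-canonical (padded) encodings."""
    value, shift, start = 0, 0, offset
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    if offset - start > 1 and byte == 0:
        raise ValueError("non-canonical varint")
    return value, offset


def implicit_len(tag: int) -> int:
    return 1 << (5 + (tag >> 3))  # bytes, for tags 0..127


def encode_pubkey(tag: int, key: bytes) -> bytes:
    """<canonical_varint_tag><varint_length_if_tag_at_least_128><raw_public_key>"""
    if tag < 128:
        if len(key) != implicit_len(tag):
            raise ValueError("key length does not match the length implied by the tag")
        return encode_varint(tag) + key
    return encode_varint(tag) + encode_varint(len(key)) + key


def decode_pubkey(data: bytes) -> Tuple[int, bytes, int]:
    """Return (tag, key, bytes_consumed); works even for tags this code does not know."""
    tag, offset = decode_varint(data, 0)
    if tag >= 128:
        length, offset = decode_varint(data, offset)
    else:
        length = implicit_len(tag)
    end = offset + length
    if end > len(data):
        raise ValueError("truncated key")
    return tag, data[offset:end], end
```

An ed25519 key therefore costs exactly 33 bytes on the wire, while an unknown tag such as 300 with a 3-byte key encodes as `ac 02 03` followed by the key.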

Human-Readable Encoding

Each well-known primitive is assigned a short, descriptive name string consisting of alphanumeric characters. For ed25519, this string is "ed25519" (surprise!).

The human-readable encoding is <type>.<data>, where <data> is the key encoded with the RFC 4648 "Base 64 Encoding with URL and Filename Safe Alphabet", without "=" padding, and <type> is obtained as follows:

if the primitive is well-known, use its name
else, use the decimal representation of the tag, followed by "~", followed by the decimal representation of the length (even for tags below 128 - the human-readable format strives for simplicity rather than optimizing space usage)

This choice of encoding is fairly arbitrary, but it is ASCII, URL-safe, mostly filename-safe (some legacy file systems might complain about the dot or the tilde), and covers all cases.
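The naming rule can be sketched with Python's standard `base64` module. The registry `WELL_KNOWN_NAMES` is a hypothetical stand-in for whatever list of well-known primitives an implementation ships with; here it only contains ed25519. Calling `to_human_readable(0, bytes(32))` gives "ed25519." followed by 43 "A" characters, and `to_human_readable(200, b"\xff\xff\xff")` gives "200~3.____".

```python
import base64

# Hypothetical registry of well-known primitives (tag -> name); only ed25519 is assigned so far.
WELL_KNOWN_NAMES = {0: "ed25519"}


def b64url_nopad(data: bytes) -> str:
    """RFC 4648 base64url encoding with the trailing '=' padding removed."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def to_human_readable(tag: int, key: bytes) -> str:
    """Render a (tag, key) pair as <type>.<data>."""
    name = WELL_KNOWN_NAMES.get(tag)
    if name is None:
        # Unknown primitive: spell out tag and length, even for tags below 128.
        name = f"{tag}~{len(key)}"
    return f"{name}.{b64url_nopad(key)}"


print(to_human_readable(0, bytes(32)))
print(to_human_readable(200, b"\xff\xff\xff"))
```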

Note that this encoding is not future-proof, since a decoder of this format might not know about a name string. Hence the next section.

Canonical Human-Readable Encoding

Same as the default human-readable encoding, but always stores tag and length rather than using names. This is actually more of a machine-readable format than a human-readable one, for use in URLs, URIs, file names, and other settings that disallow binary encoding, where it is important that the same key is always encoded the same way and that encoding and decoding stay future-proof.
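A sketch of the canonical form and a strict parser follows; the function names are hypothetical. Rather than validating each piece separately, the parser re-encodes what it decoded and requires an exact match, which rejects leading zeros like "00~32", stray characters, and base64 strings with non-zero trailing bits. That keeps one accepted spelling per (tag, key) pair, in line with the bijection requirement.

```python
import base64
from typing import Tuple


def to_canonical(tag: int, key: bytes) -> str:
    """Canonical form <tag>~<length>.<data>; never uses a name, even for ed25519."""
    data = base64.urlsafe_b64encode(key).rstrip(b"=").decode("ascii")
    return f"{tag}~{len(key)}.{data}"


def from_canonical(text: str) -> Tuple[int, bytes]:
    """Parse the canonical form; raise ValueError for any other spelling."""
    type_part, _, data = text.partition(".")
    tag_str, sep, len_str = type_part.partition("~")
    if not sep:
        raise ValueError("expected <tag>~<length>")
    tag = int(tag_str)  # raises ValueError on non-numeric input
    if tag < 0:
        raise ValueError("negative tag")
    # Restore the stripped padding before decoding; bad base64 raises binascii.Error (a ValueError).
    key = base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))
    if len(key) != int(len_str):
        raise ValueError("length does not match data")
    # Re-encoding must reproduce the input exactly.
    if to_canonical(tag, key) != text:
        raise ValueError("non-canonical spelling")
    return tag, key
```

Because the tag and length are always spelled out, a decoder can parse keys for primitives it has never heard of.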