YAMF-Pubkey
Yet Another Multi Format - Public Key Edition
A future-proof binary encoding for public keys, for use in non-schemaless (schemafull?) data. Expected initial uses are in:
This is fairly unexciting, but it needs to be specced out somewhere. Note that it is conceptually distinct from encoding feed ids, as those may carry more information than just a public key.
Related work:
Requirements
- Support different cryptographic primitives
  - which may only become known in the future
  - of arbitrary key length
- Be efficient for machines to deal with
- Admit a human-readable representation
- Allow inferring data length even if the cryptographic primitive is unknown to the implementation
- Admit a bijection between logical values (key_type, data) and valid encodings
Non-goals:
- Encode what the public key is used for
- Encode whether the crypto scheme is symmetric or asymmetric
- Any of the other stuff that kept the ipfs people busy for nearly three years without settling on a format
  - type, length, value is all we need
Binary Encoding
Reuse the multihash format. Done. I told you this would be unexciting.
Ok, maybe there is a bit more.
To keep things unique, varints must use the shortest possible encoding (yeah, the ipfs varints are ambiguous...).
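As a rough illustration of the shortest-encoding rule, here is a minimal sketch. It assumes the unsigned LEB128 layout (7 data bits per byte, high bit set on every byte except the last) that multihash varints use; the function names are made up for this example and are not part of the spec.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer in the shortest possible varint form."""
    if n < 0:
        raise ValueError("varints encode non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_canonical_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a varint starting at buf[pos]; return (value, next position).

    Raises ValueError for truncated input and for encodings that are
    longer than necessary.
    """
    value = 0
    shift = 0
    start = pos
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    # A multi-byte varint whose final byte is zero only adds leading zero
    # bits, so a shorter encoding of the same value exists.
    if pos - start > 1 and byte == 0:
        raise ValueError("non-canonical varint")
    return value, pos
```

With this check, the two-byte sequence 0x80 0x00 is rejected even though a lenient decoder would read it as 0, so every value has exactly one accepted encoding.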
As an optimization, we can omit the length for well-known primitives. E.g. the type tag that identifies ed25519 keys does not need to give a length, as all ed25519 public keys have a length of 32 bytes. We can also omit lengths for future primitives by preassigning lengths to certain type tags. I propose the following scheme:
- if tag >= 128, the bytes following the tag are a varint encoding of the length of the key
- else, the length is 2 ^ (5 + (tag >> 3)) and the tag byte is followed by that many bytes of key data
This way, the 1-byte tags are split up to represent keys whose lengths are powers of two from 2 ^ 5 (32 bytes, what ed25519 needs) to 2 ^ 20 (enough for large post-quantum keys), with 8 different primitives per power of two.
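The tag-to-length arithmetic can be sketched as follows. The function name `implied_key_length` is hypothetical; only the formula itself comes from the text.

```python
from typing import Optional


def implied_key_length(tag: int) -> Optional[int]:
    """Return the key length in bytes implied by a 1-byte tag.

    Returns None for tags >= 128, whose length is stored explicitly.
    """
    if tag < 0:
        raise ValueError("tag must be non-negative")
    if tag >= 128:
        return None
    # tag >> 3 groups the 128 one-byte tags into 16 groups of 8 tags each;
    # group g implies a key length of 2^(5 + g) bytes.
    return 1 << (5 + (tag >> 3))


if __name__ == "__main__":
    # Print one row per group of 8 tags.
    for first_tag in range(0, 128, 8):
        length = implied_key_length(first_tag)
        print(f"tags {first_tag:3d}..{first_tag + 7:3d}: {length} bytes")
```

Tags 0 through 7 all map to 32 bytes, tags 8 through 15 to 64 bytes, and so on up to tags 120 through 127 at 2 ^ 20 bytes.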
The only currently assigned tag is 0 for ed25519 keys.
So the overall format is <canonical_varint_tag><varint_length_if_tag_at_least_128><raw_public_key>.
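Putting the pieces together, a binary encoder and decoder might look roughly like the sketch below. It assumes LEB128-style varints as in multihash, and all function names are illustrative rather than part of the spec. The decoder returns the position after the key so that the encoding can be embedded in a larger buffer; that is a design choice of this sketch, not something the text requires.

```python
def _varint(n: int) -> bytes:
    """Shortest-form unsigned varint (assumed LEB128 layout)."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def _read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a varint at buf[pos], rejecting non-shortest encodings."""
    value = shift = 0
    start = pos
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:
            break
    if pos - start > 1 and byte == 0:
        raise ValueError("non-canonical varint")
    return value, pos


def encode_pubkey(tag: int, key: bytes) -> bytes:
    """Encode (tag, key) as tag, optional length, raw key bytes."""
    if tag < 0:
        raise ValueError("tag must be non-negative")
    if tag < 128:
        # Length is implied by the tag, so it is not written out.
        expected = 1 << (5 + (tag >> 3))
        if len(key) != expected:
            raise ValueError(f"tag {tag} requires exactly {expected} key bytes")
        return bytes([tag]) + key
    return _varint(tag) + _varint(len(key)) + key


def decode_pubkey(buf: bytes, pos: int = 0) -> tuple[int, bytes, int]:
    """Decode one key at buf[pos]; return (tag, key, next position)."""
    tag, pos = _read_varint(buf, pos)
    if tag < 128:
        length = 1 << (5 + (tag >> 3))
    else:
        length, pos = _read_varint(buf, pos)
    key = buf[pos:pos + length]
    if len(key) != length:
        raise ValueError("truncated key data")
    return tag, key, pos + length
```

Note that the decoder never needs to know what primitive a tag stands for: the length is always recoverable from the tag alone or from the explicit length field.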
Human-Readable Encoding
Each well-known primitive is assigned a short, descriptive name string consisting of alphanumeric characters. For ed25519, this string is "ed25519" (surprise!).
The human-readable encoding is <type>.<data>, where <data> is the RFC 4648 base64 encoding of the key using the URL and filename safe alphabet, without the trailing = padding characters, and <type> is obtained as follows:
- if the primitive is well-known, use its name
- else, use the decimal representation of the tag, followed by "~", followed by the decimal representation of the length (even for tags below 128 - the human-readable format strives for simplicity rather than optimizing space usage)
This choice of encoding is fairly arbitrary, but it is ASCII, URL safe, mostly filename safe (some legacy file systems might complain about the dot or the tilde), and covers all cases.
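A minimal sketch of the human-readable encoding, using Python's standard `base64` module. The `WELL_KNOWN_NAMES` table and the function names are assumptions made for this example; the only entry the text actually defines is tag 0 with the name "ed25519". For brevity the sketch does not check that a key's length matches what its tag implies.

```python
import base64

# tag -> name; "ed25519" for tag 0 is the only assignment given in the text.
WELL_KNOWN_NAMES = {0: "ed25519"}
NAMES_TO_TAGS = {name: tag for tag, name in WELL_KNOWN_NAMES.items()}


def encode_human(tag: int, key: bytes) -> str:
    """Return <type>.<data> with unpadded URL-safe base64 data."""
    data = base64.urlsafe_b64encode(key).rstrip(b"=").decode("ascii")
    if tag in WELL_KNOWN_NAMES:
        type_part = WELL_KNOWN_NAMES[tag]
    else:
        # Unknown primitive: fall back to "<tag>~<length>".
        type_part = f"{tag}~{len(key)}"
    return f"{type_part}.{data}"


def decode_human(text: str) -> tuple[int, bytes]:
    """Parse <type>.<data> back into (tag, key)."""
    type_part, sep, data = text.partition(".")
    if not sep:
        raise ValueError("expected <type>.<data>")
    # Re-add the padding that the encoding strips before decoding.
    key = base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))
    if "~" in type_part:
        tag_str, _, len_str = type_part.partition("~")
        tag = int(tag_str)
        if int(len_str) != len(key):
            raise ValueError("declared length does not match key data")
    elif type_part in NAMES_TO_TAGS:
        tag = NAMES_TO_TAGS[type_part]
    else:
        raise ValueError(f"unknown primitive name: {type_part!r}")
    return tag, key
```

The final `else` branch in `decode_human` is where a decoder fails when it meets a name it has never heard of.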
Note that this encoding is not future-proof, since a decoder of this format might not know about a name string. Hence the next section.
Canonic Human-Readable Encoding
Same as the default human-readable encoding, but always stores tag + length rather than using names. This is actually more of a machine-readable format than a human-readable one, for use in URLs, URIs, file names, and other settings that disallow binary encoding, where it is important that the same key is always encoded the same way and that future-proof encoding/decoding works.
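The canonical variant can be sketched in the same way. The function names are hypothetical, and the strictness checks in the decoder (re-encoding the result and comparing it to the input) are one possible way to enforce "the same key is always encoded the same way", not something the text prescribes.

```python
import base64


def encode_canonical(tag: int, key: bytes) -> str:
    """Return <tag>~<length>.<data>, never using a primitive name."""
    if tag < 0:
        raise ValueError("tag must be non-negative")
    if tag < 128 and len(key) != (1 << (5 + (tag >> 3))):
        raise ValueError("key length does not match the length implied by the tag")
    data = base64.urlsafe_b64encode(key).rstrip(b"=").decode("ascii")
    return f"{tag}~{len(key)}.{data}"


def decode_canonical(text: str) -> tuple[int, bytes]:
    """Parse <tag>~<length>.<data>, rejecting any non-canonical spelling."""
    head, dot, data = text.partition(".")
    tag_str, tilde, len_str = head.partition("~")
    if not dot or not tilde:
        raise ValueError("expected <tag>~<length>.<data>")
    tag, length = int(tag_str), int(len_str)
    key = base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))
    if len(key) != length:
        raise ValueError("declared length does not match key data")
    # Re-encoding catches leading zeros, stray characters, and base64
    # strings with non-zero trailing bits, all of which would otherwise
    # give one key several accepted spellings.
    if encode_canonical(tag, key) != text:
        raise ValueError("non-canonical encoding")
    return tag, key
```

For example, an all-zero 32-byte ed25519 key is written as "0~32." followed by 43 "A" characters, and that is the only string the decoder accepts for it.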