You are reading content from Scuttlebutt
@aljoscha %f7tqoQpSfPXt+poGLGlI27DFqRXI2BsaC1CPhy4Rak8=.sha256

Extending the ssb message format

Before you can talk about encodings of ssb messages, you need to know what to encode. Currently, the set of values in ssb messages is the following:

  • null (the single value of the unit type, representing absence of information)
  • a boolean (true or false)
  • a utf8 encoded string (may include null bytes)
  • an IEEE 754 double precision floating point number, except the NaNs
  • an ordered sequence of values, called an array, not necessarily homogeneous
  • an unordered mapping from strings to values, called an object. An object may not contain the same key multiple times. Values are not necessarily homogeneous
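
A minimal sketch of the value set above as a js predicate. `isSsbValue` is a hypothetical helper name, not part of any ssb library. It follows the list literally, so it rejects only NaN among the numbers and does not check strings for lone surrogates:

```javascript
// Hypothetical helper: returns true if `v` is built only from the
// value types listed above (null, boolean, string, non-NaN number,
// array, plain object with string keys).
function isSsbValue(v) {
  if (v === null) return true;
  switch (typeof v) {
    case 'boolean':
    case 'string':
      return true;
    case 'number':
      // Every double except the NaNs, as stated in the list.
      return !Number.isNaN(v);
    case 'object': {
      if (Array.isArray(v)) return v.every((item) => isSsbValue(item));
      // Only plain objects count; Dates, Maps, Buffers etc. do not.
      const proto = Object.getPrototypeOf(v);
      if (proto !== Object.prototype && proto !== null) return false;
      return Object.values(v).every((item) => isSsbValue(item));
    }
    default:
      // undefined, function, symbol, bigint
      return false;
  }
}
```

For example, `isSsbValue({ a: [1, 'x', true, null] })` holds, while `isSsbValue(undefined)` and `isSsbValue([NaN])` do not.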

Introducing a new encoding can serve as an opportunity to also extend that set of values. I invite everyone to think about how and which additional data types can improve the experience for people developing on top of the ssb protocol. CC #ssb-show-and-tell, #ssb-learing and #ssb-grants for visibility to devs working with ssb.

Adding a bunch of features makes ssb more complicated though, so features should be well-justified. Adding support for complex numbers probably isn't worth it.

Another consideration is how ssb-client APIs can expose the data to the user. In statically typed languages, there won't be any problems: you just define a type for everything. In dynamically typed languages like js, this is a different story. The current set of message values has been chosen because it has a direct mapping to built-in javascript types. Adding new types without a direct js equivalent will make the js API feel less natural.

On the one hand, ssb is currently tightly coupled to the flexible and rapid programming experience this allows. Adding e.g. various fixed-size integers would feel off in that environment. On the other hand, ssb is not tied to js in particular. In statically typed languages, the lack of different integer types (or of any integer type at all) is baffling and unreasonable.

I think adding binary strings to the message format would be worth it. Currently, you need to use base64 encoded strings for binary data in messages. The js API can simply hand buffer objects to consumers, no need to introduce non-standard types.
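
To illustrate the current workaround, here is a small sketch assuming Node.js and its built-in `Buffer`. `roundTripBase64` is a hypothetical name used only for this example:

```javascript
// Hypothetical helper: encodes binary data the way messages currently
// have to carry it (as a base64 string) and decodes it again.
function roundTripBase64(bytes) {
  const text = bytes.toString('base64');    // what goes into the message
  const back = Buffer.from(text, 'base64'); // what the consumer recovers
  return { text, back };
}

// 32 bytes, the size of a sha256 hash.
const sample = Buffer.alloc(32, 0xab);
const result = roundTripBase64(sample);
console.log(sample.length, result.text.length, result.back.equals(sample));
```

The string form needs 44 characters for 32 bytes, and every consumer has to decode it before use. With a native binary type, the js API could hand out the `Buffer` directly.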

Integers might be more controversial. So far we did fine without integers (floats can represent any signed 32 bit integer anyways), but not having any integer type might seem absurd to anyone not from js-land. I currently lean towards including a single integer type (signed, 64 bit), unless the js crowd surprises me by advocating for the full set (8, 16, 32, 64 bit, signed and unsigned).

There is a big caveat with integers though: The js API cannot parse them into normal js numbers.

Js numbers can store signed 32 bit integers without rounding errors. But if integers are stored as floats (as js does), parsing the integer 2 and the float 2.0 results in the same logical value, which would then have to be encoded the same way. So we can't do that. Adding integers would thus require js implementations to store them nonidiomatically. On the plus side, that means we could do 64 bit integers. And also, having different runtime representations for integers and floats would be idiomatic in most other languages.
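
A sketch of the problem and of one possible nonidiomatic representation. Using `BigInt` for integers is an assumption made for this example, not something proposed in this thread, and `sameAfterParsing` and `decodeNumber` are hypothetical names:

```javascript
// The integer 2 and the float 2.0 become indistinguishable once parsed
// into ordinary js numbers.
function sameAfterParsing(a, b) {
  return Object.is(JSON.parse(a), JSON.parse(b));
}

// Hypothetical decoder step: integers become BigInt, floats stay number,
// so the two kinds remain distinct at runtime and can be re-encoded
// the way they arrived.
function decodeNumber(kind, text) {
  if (kind === 'float') return Number(text);
  if (kind === 'int') {
    const value = BigInt(text);
    // Reject anything outside the signed 64 bit range.
    if (BigInt.asIntN(64, value) !== value) {
      throw new RangeError('integer does not fit into 64 bits');
    }
    return value;
  }
  throw new TypeError('unknown number kind: ' + kind);
}

console.log(sameAfterParsing('2', '2.0'));  // true
console.log(typeof decodeNumber('int', '2'), typeof decodeNumber('float', '2.0'));
```

The cost is visible right away: `decodeNumber('int', '2') === 2` is false, and mixing the two kinds in arithmetic throws a TypeError, which is exactly the "less natural" js API discussed above.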

Other data types to consider are sets (which are currently represented as arrays in many message types), and arbitrary precision numbers, both integers and rationals. But I'm most interested in your opinions on integers, as well as any sensible data types I completely missed.

@aljoscha %IMRY4WM/GqpmfGtnZEc/Ha7tbWuUW0vtpCjsyGp5Feo=.sha256

Here's another potential extension I did not mention here yet: Maps with arbitrary keys, not just strings. A nice thing about them is that we could emulate sets with them, by using null as the entries. Sets are great, because you can quickly test whether something is in them. In ssb messages, we only have that capacity for strings, by doing objects with null values. But arbitrary-key objects generalize sets, and could thus replace them. And of course, they are useful on their own.
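
A sketch of the idea with js `Map`s, assuming the js API would expose arbitrary-key objects that way. `makeSet` is a hypothetical helper:

```javascript
// Hypothetical helper: emulates a set as a map whose entries are all null.
function makeSet(keys) {
  return new Map(keys.map((key) => [key, null]));
}

const members = makeSet([1, '1', true]);
console.log(members.size);   // 3: the number 1 and the string '1' stay distinct
console.log(members.has(1)); // true

// With plain objects, every key is coerced to a string first.
const asObject = Object.fromEntries([[1, null], ['1', null]]);
console.log(Object.keys(asObject).length); // 1
```

One caveat for the js side: `Map` compares arrays and objects used as keys by identity, not by content, so compound keys would need extra care in an implementation.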

Cbor maps already support arbitrary keys, so this would not change the encoding at all. But then the js api could not use objects any more, but would have to use Maps. That would be a breaking change to the ssb-client js module. Would that be acceptable? I personally think that the protocol design should not be restricted because some implementation would need to change its api.

@aljoscha %I97qRXW3GuTaAyl0Vh1R1meRx18Jjlcedm0QFdi3Qho=.sha256

And another idea: a special type for cypherlinks.

These are special enough that a decoder should be able to recognize them and give them their own runtime type. Js currently just uses strings and needs awkward, text-based functions for detecting them and extracting data.

@Dominic %qSmbcexCBt2QGlu4ro9gvpTpvTxVe4g4cip8qn8ix14=.sha256

You could interpret the values represented as base64 as a binary type.
They all have a type suffix: .sha256, .ed25519, .box and .sig.ed25519. Types that are ids, i.e. refer to something, also have a sigil: & for blobs, @ for feeds, and % for messages. That seems to be enough to parse into a static type.
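
A rough sketch of such a parser for the sigil-prefixed ids, assuming Node.js. `parseCypherlink` and the shape of its result are hypothetical, and the regular expression is only an approximation of the textual format:

```javascript
// Hypothetical parser: splits "<sigil><base64>.<suffix>" into typed parts.
// Returns null for anything that does not look like an id.
function parseCypherlink(text) {
  const kinds = { '@': 'feed', '%': 'message', '&': 'blob' };
  const match = /^([@%&])([A-Za-z0-9+/]+={0,2})\.([a-z0-9]+)$/.exec(text);
  if (match === null) return null;
  const [, sigil, base64, suffix] = match;
  return {
    kind: kinds[sigil],                  // what the id refers to
    suffix,                              // e.g. 'sha256' or 'ed25519'
    data: Buffer.from(base64, 'base64'), // the raw bytes
  };
}

const link = parseCypherlink('%f7tqoQpSfPXt+poGLGlI27DFqRXI2BsaC1CPhy4Rak8=.sha256');
console.log(link.kind, link.suffix, link.data.length);
```

This is the kind of text-based detection a dedicated type would make unnecessary: a decoder could produce the `kind`, `suffix` and `data` parts directly.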

A binary encoding that couldn't encode binary strings would be silly. I think the solution is to have binary strings that are length delimited without escapes (whereas utf8 strings need to have escapes). These could start with a varint to indicate the type. After that whatever binary data represents that type. multiformats would be a good option here; I know using this would make the ipfs people happy.

A transitional format would have a few additional restrictions. I think you'd just use a subset of the expanded format, so that binary types would be restricted to the tags currently in use.

@aljoscha %nPJPwr0g15RzzdhPjT1chd+HekmiWaj/jAi90QvG7ZY=.sha256

Cbor and hsdt already have binary strings. I was pondering not representing cypherlinks as those, but via dedicated types.

@Dominic %d56qOTD9ddiw8DrTmb+HHgK4zoQMksI9+5ezIVXe+k0=.sha256

@aljoscha dedicated types in what name space? I'm okay with this as long as the object as a whole can still be parsed by something that does not understand the application specific types.

something like:

<extended_binary_type, length>: <extended_type_tag><extended_type_data>

where <extended_type_tag> is a varint or something like that.
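
A sketch of how this layout could work, assuming Node.js. Everything here is hypothetical: the function names, the choice of an unsigned LEB128-style varint for both the length and the tag, and the tag number in the example. The marker that identifies the value as an extended binary type is left out:

```javascript
// Unsigned varint: 7 bits per byte, least significant group first,
// high bit set on every byte except the last.
function encodeVarint(n) {
  if (!Number.isSafeInteger(n) || n < 0) throw new RangeError('bad varint');
  const bytes = [];
  while (n >= 0x80) {
    bytes.push((n % 0x80) | 0x80);
    n = Math.floor(n / 0x80);
  }
  bytes.push(n);
  return Buffer.from(bytes);
}

function decodeVarint(buf, offset) {
  let value = 0;
  let scale = 1;
  let pos = offset;
  for (;;) {
    if (pos >= buf.length) throw new RangeError('truncated varint');
    const byte = buf[pos++];
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) return { value, next: pos };
    scale *= 0x80;
  }
}

// <length> <tag> <data>, where length covers the tag and the data.
function encodeTagged(tag, data) {
  const body = Buffer.concat([encodeVarint(tag), data]);
  return Buffer.concat([encodeVarint(body.length), body]);
}

// A decoder that does not know `tag` can still jump to `next`.
function decodeTagged(buf, offset) {
  const length = decodeVarint(buf, offset);
  const end = length.next + length.value;
  if (end > buf.length) throw new RangeError('truncated value');
  const tag = decodeVarint(buf, length.next);
  return { tag: tag.value, data: buf.subarray(tag.next, end), next: end };
}
```

Because the length comes first, a generic parser can skip the whole value without understanding the application specific tag.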

@Dominic %1SzX8UqabnPEE8ORfo/pv3/u6zdIZxH4m2uxj/sy9nk=.sha256

@aljoscha very keen for back pressure rpc though!

@aljoscha %+4td00hm6klxesp9Lt8TgLbxKztabPABvypUvv8na9w=.sha256

"dedicated types in what name space?"

Dedicated types in the logical data model. Imagine json would suddenly gain support for integers. These integers would then have the same standing as the old data types (null, boolean, float, string, array, object). Unlike json where this can't happen, we can do this for hsdt. What I'm asking for in this thread: What types of things other than null, booleans, floats, strings, arrays, and objects should application devs have at their disposal?

Not thinking about this means accepting that the types provided by json form the best possible set of types, and everyone should only use those to build applications. Ssb currently restricts devs to them. But pretty much every developer who did not grow up in js-land would look at this and say "Wait a second, the fundamental datatype of every machine is the integer. A large subset of algorithmic problems consists of manipulating integers. Do you seriously expect me to build apps without integers? Sorry, but I'm leaving for ipfs".

There are some concerns with introducing new types, but for now I'd just like to survey which types might be sensible to have. Well, and I'd like to lobby for the inclusion of integers.

@ansuz %DCi/XiixGL3Fp8LlDC0ugifpJbpneI8N+FoyhuhWItQ=.sha256

Maybe I'm missing some context here and I shouldn't jump in, but doesn't JSON support integers (technically numbers, of which integers are a subset)?

JSON.stringify({x: 5}); // '{"x":5}'

As per json.org they fall into the value category.

@ansuz %ezVgMxRO0Ne5kH1nQzClzT5omg0eYcA2w5nI6hjQWDs=.sha256

Oh, yea, you listed floats. Don't mind me!

@aljoscha %kXrRuCJFJlKoylDvStKuwbDrar9bIdsX3t+hNqp82bg=.sha256

Note to self: What about 16 and 32 bit floats? Cbor supports them, would be trivial to add to hsdt. Don't know about database considerations. And whether we'd actually want them.
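
For reference, js can at least tell whether a given double survives a round trip through 32 bit precision, using the built-in `Math.fround`. `fitsFloat32` is a hypothetical helper, and whether an encoder should ever pick the width automatically is a separate question:

```javascript
// Hypothetical helper: true if `x` is exactly representable as a
// 32 bit float. NaN is not part of the value set and yields false.
function fitsFloat32(x) {
  return Math.fround(x) === x;
}

console.log(fitsFloat32(0.5)); // true
console.log(fitsFloat32(0.1)); // false
```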

@aljoscha %urfag1KCw+gqNawrG5PigBmwgZjWAXwrBN4UOwYCIWs=.sha256

@Dominic @arj @Piet @keks @cryptix

Can you give an estimate on the additional effort it would take for your databases to support more values than just those provided by json? I'm especially interested in the sbot side of things, so we can decide whether to roll out new data types with the initial hsdt release.

I'm currently leaning towards the following additions:

  • signed and unsigned 8, 16, 32, 64 bit integers (all supported by cbor)
  • 32 bit floating point numbers (supported by cbor)
  • ssb cypherlinks (not supported by cbor, but can be encoded via unassigned cbor primitives)

These give us the number types any modern statically typed programming language provides, and special treatment for cypherlinks seems appropriate since they have special meaning for the core protocol. None of these are inherently complicated, they are just simple pieces of data with a fixed length.
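
A sketch of the range check behind these fixed-size integer types, assuming integers are handled as `BigInt` on the js side (an assumption of this example, not a decision made in this thread). `fitsInteger` is a hypothetical helper:

```javascript
// Hypothetical helper: true if the BigInt `value` fits into an integer
// of the given width. asIntN/asUintN wrap out-of-range values around,
// so a value fits exactly when wrapping leaves it unchanged.
function fitsInteger(value, bits, signed) {
  const wrapped = signed
    ? BigInt.asIntN(bits, value)
    : BigInt.asUintN(bits, value);
  return wrapped === value;
}

console.log(fitsInteger(127n, 8, true));  // true
console.log(fitsInteger(128n, 8, true));  // false
console.log(fitsInteger(255n, 8, false)); // true
```

Each of the eight types is just this check with a different `bits` and `signed` pair, which matches the point that none of them are inherently complicated.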

Dropping cypherlinks would make this a strict subset of cbor, but canonicity requires specialized implementations anyways, so we might as well add them. I will write a high-quality (robust to malicious input, handles out-of-memory gracefully, supports partial input/output) C library implementation for hsdt in any case, that can be reused by other implementations. So I don't think departing from cbor is going to be a large drawback.
