Extending the ssb message format
Before you can talk about encodings of ssb messages, you need to know what to encode. Currently, the set of values in ssb messages is the following:
- `null` (the single value of the unit type, representing absence of information)
- a boolean (`true` or `false`)
- a utf8 encoded string (may include null bytes)
- an IEEE 754 double precision floating point number, except the `NaN`s
- an ordered sequence of values, called an array, not necessarily homogenous
- an unordered mapping from strings to values, called an object. An object may not contain the same key multiple times. Values are not necessarily homogenous
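To make the list above concrete, here is a rough sketch of a js check for whether a value belongs to that set. The name `isSsbValue` is made up for this post and is not part of any ssb library; the check also assumes that only plain objects count as objects.

```javascript
// Sketch of a validator for the value set listed above.
// `isSsbValue` is a hypothetical helper name, not an existing ssb API.
function isSsbValue(v) {
  if (v === null) return true;
  switch (typeof v) {
    case "boolean":
    case "string":
      return true;
    case "number":
      // Every double is allowed except the NaNs.
      return !Number.isNaN(v);
    case "object":
      if (Array.isArray(v)) return v.every((item) => isSsbValue(item));
      // Assumption: only plain objects (string keys mapping to values) qualify.
      if (Object.getPrototypeOf(v) !== Object.prototype) return false;
      return Object.values(v).every((item) => isSsbValue(item));
    default:
      // undefined, functions, symbols and bigints have no counterpart in the set.
      return false;
  }
}
```

For example, `isSsbValue({ a: [1, "x", true, null] })` returns `true`, while `isSsbValue([1, undefined])` and `isSsbValue(NaN)` return `false`.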
Introducing a new encoding can serve as an opportunity to also extend that set of values. I invite everyone to think about how and which additional data types can improve the experience for people developing on top of the ssb protocol. CC #ssb-show-and-tell, #ssb-learing and #ssb-grants for visibility to devs working with ssb.
Adding a bunch of features makes ssb more complicated though, so features should be well-justified. Adding support for complex numbers probably isn't worth it.
Another consideration is how ssb-client APIs can expose the data to the user. In statically typed languages, there won't be any problems: you just define a type for everything. In dynamically typed languages like js, this is a different story. The current set of message values has been chosen because it has a direct mapping to built-in javascript types. Adding new types without a direct js equivalent will make the js API feel less natural.
On the one hand, ssb is currently tightly coupled to the flexible and rapid programming experience that js allows. Adding e.g. various fixed-size integers would feel off in that environment. On the other hand, ssb is not tied to js in particular. In statically typed languages, the lack of different integer types (or of any integer type at all) is baffling and unreasonable.
I think adding binary strings to the message format would be worth it. Currently, you need to use base64 encoded strings for binary data in messages, which makes the data about a third larger and forces every consumer to decode it. With a binary string type, the js API could simply hand buffer objects to consumers, with no need to introduce non-standard types.
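As a small illustration of the status quo, this is roughly what shipping binary data in a message looks like today in Node. The field names `type` and `data` are invented for the example and do not refer to any real message type.

```javascript
// Six arbitrary bytes standing in for some binary payload.
const raw = Buffer.from([0x00, 0xff, 0x10, 0x80, 0x7f, 0x01]);

// Today: binary data has to travel as a base64 string inside the message.
// The field names `type` and `data` are made up for this example.
const msg = { type: "example", data: raw.toString("base64") };

// The consumer has to know by convention that `data` is base64 and decode it.
const decoded = Buffer.from(msg.data, "base64");

console.log(raw.length, msg.data.length); // 6 bytes become 8 characters
console.log(decoded.equals(raw)); // true
```

Nothing in the message itself says that `data` is binary rather than an ordinary string, which is the part a dedicated binary string type would fix.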
Integers might be more controversial. So far we did fine without integers (floats can represent any signed 32 bit integer anyways), but not having any integer type might seem absurd to anyone not from js-land. I currently lean towards including a single integer type (signed, 64 bit), unless the js crowd surprises me by advocating for the full set (8, 16, 32, 64 bit, signed and unsigned).
There is a big caveat with integers though: the js API cannot parse them into normal js numbers.
Js numbers can store signed 32 bit integers without rounding errors. But if integers were stored as floats (as js does), parsing the integer 2 and the float 2.0 would result in the same logical value, which would then have to be encoded the same way, so the distinction between the two would not survive a decode/encode round trip. So we can't do that. Adding integers would thus require js implementations to store them non-idiomatically. On the plus side, that means we could do 64 bit integers. And also, having different runtime representations for integers and floats would be idiomatic in most other languages.
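To show the problem in code: the first half below demonstrates the ambiguity with plain js numbers, and the second half sketches one possible non-idiomatic representation. The `Int64` wrapper class is purely hypothetical, just one way a js implementation might keep integers apart from floats, built on the standard `BigInt` type.

```javascript
// Plain js numbers cannot tell the integer 2 from the float 2.0.
const fromInt = JSON.parse("2");
const fromFloat = JSON.parse("2.0");
console.log(fromInt === fromFloat); // true
console.log(JSON.stringify(fromFloat)); // prints 2, the ".0" is gone

// Beyond 2^53, doubles can no longer represent every integer.
console.log(Number("9007199254740993")); // 9007199254740992

// Hypothetical wrapper: a distinct runtime type for signed 64 bit integers.
const I64_MIN = -(2n ** 63n);
const I64_MAX = 2n ** 63n - 1n;

class Int64 {
  constructor(value) {
    const big = BigInt(value); // throws for non-integral numbers such as 2.5
    if (big < I64_MIN || big > I64_MAX) {
      throw new RangeError("value does not fit into a signed 64 bit integer");
    }
    this.value = big;
  }
}

// An Int64 is distinguishable from a float at runtime and keeps full precision.
const two = new Int64(2);
console.log(two instanceof Int64, typeof 2.0); // true number
console.log(new Int64("9007199254740993").value); // 9007199254740993n
```

The cost is visible right away: consumers get a wrapper object instead of a number, so ordinary arithmetic operators no longer work on it directly.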
Other data types to consider are sets (which are currently represented as arrays in many message types), and arbitrary precision numbers, both integers and rationals. But I'm most interested in your opinions on integers, as well as any sensible data types I completely missed.