You are reading content from Scuttlebutt
@aljoscha %ZOosXgUtSqt4TtTwACKyLGoHypg6THxbXOBvpQhFunQ=.sha256

SSB Message Signing is Fun!

Turns out there is a whole new class of strings for which JSON.stringify(JSON.parse(str)) === str does not hold: Node does not preserve object key order in all of these cases. The one I (or rather the fuzzer) found is JSON.stringify(JSON.parse('{"!":2,"66[[":2,"6[[":2,"":3,"0[al|":3,"6[":2,"6":266}'), null, 2), which yields (cleaned up escaped quotes for readability)

{
  "6": 266,
  "!": 2,
  "66[[": 2,
  "6[[": 2,
  "": 3,
  "0[al|": 3,
  "6[": 2
}

instead of the expected

rs: {
  "!": 2,
  "66[[": 2,
  "6[[": 2,
  "": 3,
  "0[al|": 3,
  "6[": 2,
  "6": 266
}

If a non-js implementation used the order-preserving signing encoding to compute the signature, and then sent the message over the wire to sbot, then sbot would reject it as invalid. This is not good.

I see three ways out:

  • burn everything to the ground
    • as much as I want to do this, it's not really an option
  • reverse-engineer the js behavior and make it part of the spec by disallowing to send messages over the wire where the object order might change upon the JSON.parse -> JSON.stringify roundtrip.
    • this is horrible for obvious reasons
  • change the behavior of the js implementation to use a custom, order preserving parser when decoding json messages received over the wire
    • since this only affects the transport encoding, it is not a breaking change to the core protocol
    • I've heard rumours of a rust implementation that could serve as a drop-in replacement for JSON.parse and JSON.stringify once js bindings are written

CC @Dominic @dinosaur @Piet @arj

@cryptix, @keks, @cel: Could you check how your implementations deal with this message?


A bunch of test data for implementors:

Input:

{"!":2,"66[[":2,"6[[":2,"":3,"0[al|":3,"6[":2,"6":266}

Input as hex byte array:

[7b, 22, 21, 22, 3a, 32, 2c, 22, 36, 36, 5b, 5b, 22, 3a, 32, 2c, 22, 36, 5b, 5b, 22, 3a, 32, 2c, 22, 22, 3a, 33, 2c, 22, 30, 5b, 61, 6c, 7c, 22, 3a, 33, 2c, 22, 36, 5b, 22, 3a, 32, 2c, 22, 36, 22, 3a, 32, 36, 36, 7d]

Nodejs signing output:

{
  "6": 266,
  "!": 2,
  "66[[": 2,
  "6[[": 2,
  "": 3,
  "0[al|": 3,
  "6[": 2
}

Nodejs signing output as hex byte array:

[7b, a, 20, 20, 22, 36, 22, 3a, 20, 32, 36, 36, 2c, a, 20, 20, 22, 21, 22, 3a, 20, 32, 2c, a, 20, 20, 22, 36, 36, 5b, 5b, 22, 3a, 20, 32, 2c, a, 20, 20, 22, 36, 5b, 5b, 22, 3a, 20, 32, 2c, a, 20, 20, 22, 22, 3a, 20, 33, 2c, a, 20, 20, 22, 30, 5b, 61, 6c, 7c, 22, 3a, 20, 33, 2c, a, 20, 20, 22, 36, 5b, 22, 3a, 20, 32, a, 7d]

Order-preserving output:

rs: {
  "!": 2,
  "66[[": 2,
  "6[[": 2,
  "": 3,
  "0[al|": 3,
  "6[": 2,
  "6": 266
}

Order-preserving output as hex byte array:

[7b, a, 20, 20, 22, 21, 22, 3a, 20, 32, 2c, a, 20, 20, 22, 36, 36, 5b, 5b, 22, 3a, 20, 32, 2c, a, 20, 20, 22, 36, 5b, 5b, 22, 3a, 20, 32, 2c, a, 20, 20, 22, 22, 3a, 20, 33, 2c, a, 20, 20, 22, 30, 5b, 61, 6c, 7c, 22, 3a, 20, 33, 2c, a, 20, 20, 22, 36, 5b, 22, 3a, 20, 32, 2c, a, 20, 20, 22, 36, 22, 3a, 20, 32, 36, 36, a, 7d]
User has not chosen to be hosted publicly
@cryptix %sAFqnoYgwIJh6LUmv0PQyTQmrfTSdMC8zhZs0lk231g=.sha256

I made this test to check it, and yess. It produces the output you expected.

func ExampleEncode() {
    b := []byte(`{"!":2,"66[[":2,"6[[":2,"":3,"0[al|":3,"6[":2,"6":266}`)
    out, err := EncodePreserveOrder(b)
    fmt.Println("Err:", err)
    fmt.Println(string(out))
    // Output: Err: <nil>
    //{
    //   "!": 2,
    //   "66[[": 2,
    //   "6[[": 2,
    //   "": 3,
    //   "0[al|": 3,
    //   "6[": 2,
    //   "6": 266
    // }

}

but like keks said, we made an encoder specifically to preserve order.

@Dominic %PTfhJQBitYFEJOZ6SHG6Ltx1KWv7wznoOJssKdHxPe0=.sha256

correct. natural numbers (positive integers and zero) are first, so "6" is first, because it's a valid integer. the rest of those keys are not valid integers:

numbers that are not valid natural numbers are encoded as strings and lexiographically sorted.

> JSON.parse('{"-1": -1, "0.1": 10, "1.0": 10, "1": 0, "0": 1}')
{ '0': 1, '1': 0, '-1': -1, '0.1': 10, '1.0': 10 }
@aljoscha %i8kZo3h3awoj/t/aY33fdLtcz0fDOUfU3f6VeOrmfvo=.sha256

@Dominic

natural numbers (positive integers and zero) are first, so "6" is first, because it's a valid integer. the rest of those keys are not valid integers:

numbers that are not valid natural numbers are encoded as strings and lexicographically sorted.

The output on my test message is definitely not lexicographically sorted for the non-integer keys. Can you point to a source for the deserialization order you claimed? Is it part of the ECMAScript spec, or just an implementation detail of node? Does node guarantee that behavior, or might it change?


About the scope of this issue: This is not part of the core message format, instead this is a detail of the rpcs sbots use to send messages to each other. But in practice, this forces signature computation to use the exact same object entry ordering, or all sbots will reject the message, since they will compare the message's claimed signature against the signature they got by using the ordering emitted by JSON.stringify(JSON.parse(str), null, 2). If we don't change the sbot rpcs but don't understand the entry ordering procedure either, it becomes impossible to write non-js implementations (and the js one could break on us with any new node version, at which point the scuttleverse would fork).

We really should change the rpcs to use a decoding procedure we actually understand. And since we need to support those legacy messages for which we do not understand the entry order, we should let is simply use the entry-order of the input when computing the signing format.


This is a pretty serious issue, can I please get back an ack that you are fully aware of the consequences @Dominic? Your previous post reads like you don't see a problem here.

CC @dinosaur, @cryptix, @keks, @cel

@aljoscha %imnlJPrxalmcfCiKOVWHRMdBjxYodSbRkPJynL90+rQ=.sha256

I think the best way forward might be to deprecate all rpcs that use json encoding for message transfer, specify and implement order-preserving cbor-encoding for legacy messages, and provide new rpcs that use the cbor encoding. A few months after those have been rolled out, all the rpcs that send json-encoded messages can be removed.

If this issue forces us to deprecate and eventually delete those messages anyways, there's no real benefit in replacing them with order-preserving json, we can directly go to the cbor subset.

If we go for cbor-encoded messages, we can also do a compact encoding of the metadata rather than putting them into a self-describing format. So this involves some more design work than just doing a 1-to-1 conversion between json and cbor. But this is what I would have been working on soon anyways. I'd simply postpone the clean-up work (and completely drop the json-encoded metadata if we remove them anyways), and specify and implement the full cbor message handling instead.

@aljoscha %ZEJe7nZSE0f/uQzUTsQNbmzH+/pcNSSTZm58XfZxGJA=.sha256

Drafted a proposal for compact legacy metadata that can be converted back into json legacy metadata for signing encoding computation: %QTFcrr9...

@Dominic %fyQlMJ0MJxJOZoHhk6mjTYtyp/5mr6Gg9Dn670kTZRs=.sha256

@aljoscha sorry! I got some neurons crossed there, when I said "lexiographic order" I should have said "insertion order", I was pretty sleep deprived when I wrote that

@Dominic %sINR9l55abiivXi7+csVY+oROtRzgwhkK8eivgZBcns=.sha256

@aljoscha also I think you are greatly overstating the risk that the order of JSON fields will change. Firstly, that's not part of node.js, it's a part of V8, who are both more conservative and more competent.
They won't change it because it's been this way for ages and they'll break the web.

@aljoscha %vn1IyTU5EJTvfpmUPIpkbWCGYwgpJjHX85f4VNhNIro=.sha256

@Dominic

I got some neurons crossed there, when I said "lexiographic order" I should have said "insertion order"

Ok, that actually works. Input order is preserver, except all keys that are either 0 or match /[1-9][0-9]*/ are moved to the top, sorted lexicographically among themselves. Just implemented this, and the rust impl now passes the full test set.

@aljoscha also I think you are greatly overstating the risk that the order of JSON fields will change. Firstly, that's not part of node.js, it's a part of V8, who are both more conservative and more competent.
They won't change it because it's been this way for ages and they'll break the web.

Fair enough, but I'd still prefer if sbot explicitly implemented that behavior rather than relying on unspecified details, even if they realistically won't change. I'll clean up the rust implementation next, and then I can create js bindings as (nearly) drop-in replacements for JSON.stringify and JSON.parse, with the ordering semantics ssb needs.

Maybe you could already think about how those could be switched in, and whether/how to benchmark the performance impact. I honestly don't know whether they should be faster or slower than v8's JSON handling.

The js interface of the bindings fill be slightly different from JSON.foo: The parsing function won't support revivers, and the serialization function will come in two, unconfigurable varieties: One for whitespace-less encoding, and one for ssb's signing-format encoding.

@aljoscha %lAdPzOAshFmzhhJVzXZpfytv/fBCmd1XkpNdEorCiIg=.sha256

Ok, that actually works. Input order is preserver, except all keys that are either 0 or match /[1-9][0-9]*/ are moved to the top, sorted lexicographically among themselves. Just implemented this, and the rust impl now passes the full test set.

I lied, they are sorted by length first, using lexicographic order as a tie-breaker.

JSON.stringify(JSON.parse('{"300": 0, "20": 0, "1": 0, "200": 0, "250": 0, "201": 0, "30": 0}'), null, 2)

"{
  \"1\": 0,
  \"20\": 0,
  \"30\": 0,
  \"200\": 0,
  \"201\": 0,
  \"250\": 0,
  \"300\": 0
}"
@Anders %NOSuJbsMldlqiURriZzMCTltYllOUvWxKH/yZdFHshw=.sha256

@Aljoscha I have done a loooooooooot of json benchmarking over the last couple of weeks, so please ping me once you have something to test.

@Dominic %ekaV/dO2QwmfDQyfQq7HtKPaFeOCvs2YgYKYFgOJoks=.sha256

@aljoscha

I lied, they are sorted by length first, using lexicographic order as a tie-breaker.

in other words, they are numerically sorted.

@Dominic %jKI0Fl9KQKulVnvUMB7IRIaywixWC9AvzGY96AVF4JY=.sha256

@aljoscha it will indeed be interesting to see whether the js or rust is faster. we don't use reviver functions anywhere, so that is fine.

@cel %0sKmKyvWwM2/Qwh5ItbcMhglNSosB9scZ62Y2kF+p8g=.sha256

So this means that semicannonical-json is not just about the encoding of some values but also about the order of keys in some cases.

My implementation does not yet handle messages. But when it does, I was planning for message verification to just validate the signature on the message in its received encoding with whitespace expanded (and verify sequence number + previous id). If/when I implement signing/publishing of user-specified message content, I would probably then implement a function to detect and reject non-semicannonical-json. If I am feeling generous I could make it semicannonicalize JSON during publish. It is advantagous for implementations to agree on which messages are invalid: for example, @cft was able to recover their forked feed because no peers considered the messages on the losing fork valid ( %TPs53tQ...). But it is mostly on the publishers/signers to get it right I think.

@aljoscha %onEYdS5lX014Oj8fLzDYf599JIxuPwwj5P0CAe6y/tA=.sha256

@cel You might like the spec I am drafting, it contains all that info.

@cel %zBsDQtQ8nFs3N+Rc2U4zsmCIlMF8j8b6zF//em2L7Bs=.sha256

Blobbified: spec.md.html

@Dominic %7QY10rXTfLAasDFoRIl/hJMTaoscVVjb9WtGtFe9GRk=.sha256

@aljoscha I'm sorry to be the bearer of bad news, but I just realized that the rule about integer keys being numerically ordered only applies to ints, not integers.

> obj = {}
{}
> obj[Math.pow(2, 44)] = 1
1
> obj[Math.pow(2, 43)] = 2
2
> obj[Math.pow(2, 31)] = 3
3
> obj
{
  '2147483648': 3,
  '17592186044416': 1,
  '8796093022208': 2
}

but it gets worse... it's actually only integers from 0 - 0xfffffffe!
the 2^32-2! 0xffffffff, 4294967295 is the first integer key treated as a string.

However, I confirmed this in both firefox and chrome, weird but seems to be intended.

@aljoscha %PNnTRjvfQ1uoYX8VuxNbS3k6rSRZ0rXBZozekwIn4Ik=.sha256

Wow.

At least this is easy to fix... Until we run into the next exception =/

@Dominic %zwh+tGdcbq2M7y3mcDs4DNSc7bp6cgWz0YBdAtatw0M=.sha256

Surely the rate of exception discovery is decreasing.

This one sort of makes sense, I mean, I can imagine how this happened.
Although, not sure why 0xffffffff isn't allowed. surely that < could have been a <= ?

@aljoscha %cqSExpmrBcq+Rf1a5rRGuZ2kcoaMTw2ZN8u2DM2l2FY=.sha256

Updated the spec (and the rust implementation):

  • The order in which to serialize the entries s_i: v_i is not fully specified, but there are some constraints:
    • intuitively: Natural numbers less than 2^32 - 1 are sorted ascendingly
    • formally:
      • a key is called a more-or-less integer key if it is either the string "0", or a nonzero decimal digit (1 - 9 (0x31 - 0x39)) followed by zero or more arbitrary decimal digits (0 - 9 (0x30 - 0x39)) and consists solely of such digits
      • a key is called an int key if it:
        • is a more-or-less integer key
        • its numeric value is strictly less than 2^32 - 1
          • no, there's no off-by one error here: 0xfffffffe is an int key, whereas 0xffffffff is not
          • Why? We have no idea either
      • all entries with int keys must be serialized before all other entries
      • among themselves, the int keys are sorted:
        • by length first (ascending), using
        • numeric value as a tie breaker (the key whose raw bytes interpreted as a natural number are larger is serialized later)
          • note that this coincides with sorting the decimally encoded numbers by numeric value
    • all other entries may be serialized in an arbitrary order
@Dominic %sau0bx15RvoqPwbmFAr/j+UhmVPofnzNSXbQyqAV/Ag=.sha256

seems that there isn't agreement on whether 'natural numbers" includes zero, https://en.wikipedia.org/wiki/Natural_number "whole numbers"?

@Dominic %e3680R31uYDcA9IGgdAGnUPZ+MegrLUwVAkkBxZllwg=.sha256

update: javascript now has an officially specified key order, including 0->2^32-1 keys, and insertion for string keys. http://ecma-international.org/ecma-262/#sec-ordinaryownpropertykeys

@aljoscha %fe3LA76XocgNRsZkqfbRrwWK7IFjzOe2o0tNsTrQ6Ic=.sha256

Looks like that ecma spec treats numbers up to 2^53 - 1 as integers though (which makes a twisted sort of sense, up to 2^53 - 1, 64 bit floats can represent all integers). I'll look into that tomorrow and change the spec/rust impl again if necessary. But note that this order is not necessarily the iteration order of a for in loop, so it might not apply to our situation.

Join Scuttlebutt now