You are reading content from Scuttlebutt
@Dominic %SauuZIie5mLHH2zKl1vZmBXCHI/aNNt0NjoMKDx37Vw=.sha256

nuances of implicit signing format

While looking at ssb-validate because of this formats discussion with @aljoscha, I've been going through the details of the signing format, to see if there are wobbly bits we can shore up (and lay groundwork towards a better format)

field order (and signature position)

currently, field order of the top level object is not specified. I did a test:

{ 'previous,author,sequence,timestamp,hash,content,signature': 480644,
  'previous,sequence,author,timestamp,hash,content,signature': 49692 }

that's nearly 50k messages with sequence before author.
But, signature is always last!
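
A tally like the one above could be produced with something along these lines. This is only a sketch: `tallyKeyOrders` and its `messages` argument (an array of already-parsed message values) are hypothetical names, not part of ssb-validate.

```javascript
// Count how often each top-level key order occurs.
// Object.keys returns string keys in insertion order, and JSON.parse
// inserts keys in the order they appear in the source text.
function tallyKeyOrders(messages) {
  const counts = {};
  for (const msg of messages) {
    const order = Object.keys(msg).join(',');
    counts[order] = (counts[order] || 0) + 1;
  }
  return counts;
}
```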

This is important, because if you inserted signature some place else, it would have a different hash but the signature would be the same! That would mean someone could send a message for someone with the signature moved, and it would seem valid, but the hash would not be the correct previous hash on the next message, so you would drop any future messages on that feed.
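
To make the problem concrete, here is a rough sketch. The message below is made up, the signature is a placeholder string rather than a real signature, and the helper names (`signedPayload`, `messageHash`) are hypothetical. It assumes the signed payload is the JSON with the signature field deleted and the id is a sha256 over the whole JSON; utf8 is used for hashing here only because the example is all ascii.

```javascript
const crypto = require('crypto');

// Hypothetical message with a placeholder signature.
const original = {
  previous: null,
  author: '@example.ed25519',
  sequence: 1,
  timestamp: 1500000000000,
  hash: 'sha256',
  content: { type: 'post', text: 'hello' },
  signature: 'PLACEHOLDER.sig.ed25519'
};

// Same fields and values, but with the signature moved to the front.
const { signature, ...rest } = original;
const moved = { signature, ...rest };

// What gets signed: the message with the signature field deleted.
function signedPayload(msg) {
  const copy = { ...msg };
  delete copy.signature;
  return JSON.stringify(copy, null, 2);
}

// What gets hashed for the message id: the whole message, signature included.
function messageHash(msg) {
  return crypto.createHash('sha256')
    .update(JSON.stringify(msg, null, 2), 'utf8')
    .digest('base64');
}

// The signed bytes are identical, so the same signature verifies for both...
const samePayload = signedPayload(original) === signedPayload(moved);
// ...but the ids differ, so `previous` on the next message will not match.
const sameHash = messageHash(original) === messageHash(moved);
console.log({ samePayload, sameHash });
```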

This would be a DoS, which we could fix by checking that the signature field is always last. Instead of deleting the signature, we should have signed where the signature would go, or required that it's last. Requiring that it's last is the thing we can do now.

I propose at the very least, rejecting messages that do not have the signature last
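
A minimal sketch of such a check, assuming the message has been parsed with JSON.parse (which keeps string keys in source order). The function name `signatureIsLast` is hypothetical and this is not the actual ssb-validate code.

```javascript
// Returns true only if `signature` is the final top-level key.
function signatureIsLast(msg) {
  const keys = Object.keys(msg);
  return keys.length > 0 && keys[keys.length - 1] === 'signature';
}

// Usage: reject before doing any signature or hash verification.
const good = JSON.parse('{"author":"@a","content":{},"signature":"sig"}');
const bad = JSON.parse('{"signature":"sig","author":"@a","content":{}}');
console.log(signatureIsLast(good), signatureIsLast(bad));
```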

optional fields

Every current message has a hash field, but it's not checked. that means the current implementation would still accept messages missing the hash field.
We should make this required. But, perhaps we can take advantage of this opportunity to add a version field?

hash was included to enable upgrading the hash.

utf8 & hashes

this is the most embarrassing one. before node@6 (iirc) the default encoding for crypto.createHash('sha256').update(string) is "binary", which is an alias for "latin1" (defined in https://tools.ietf.org/html/rfc1345). if only ascii characters are used, this makes no difference. but if non-ascii characters are used, only the first byte of each character is used in the hash. however, since the signature uses new Buffer(string, 'utf-8'), the full characters are included in the signature. Thankfully, this means the signature is not mutable, but the hash is.

As it is, this is ugly but not a problem.
We could fix it by using a hash: utf8-sha256 in the hash field. We'd have to add code for handling that, then a few months later, make that the default way to make messages. (this might be good practice for upgrading formats)
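
The difference can be seen with a small sketch. The strings are made up, and `Buffer.from` stands in for the deprecated `new Buffer`. As far as I understand Node's latin1 handling, a character outside U+0000 to U+00FF is truncated to its low byte, so U+2603 and U+0003 end up as the same byte.

```javascript
const crypto = require('crypto');

// Hash a string using an explicit input encoding.
function sha256(str, encoding) {
  return crypto.createHash('sha256').update(str, encoding).digest('base64');
}

// Two different strings whose last characters share the same low byte (0x03).
const a = 'snowman: \u2603';
const b = 'snowman: \u0003';

// Legacy behaviour: "binary"/latin1 input encoding, so the hashes collide.
const sameLegacyHash = sha256(a, 'latin1') === sha256(b, 'latin1');
// Hashing the utf8 bytes keeps the strings distinct.
const sameUtf8Hash = sha256(a, 'utf8') === sha256(b, 'utf8');
// The signature covers the utf8 bytes, which differ.
const sameSignedBytes = Buffer.from(a, 'utf8').equals(Buffer.from(b, 'utf8'));

console.log({ sameLegacyHash, sameUtf8Hash, sameSignedBytes });
```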

@aljoscha %SHjA6SE3Q9piu56RuAoKmaeTvgxakCWRAmbY4Un5DKw=.sha256

Ugh, this is annoying. I will write a post today about using a binary format with an implicit schema for the non-freeform (i.e. non content) fields of messages. This can be done in the most simple and efficient way if the order of these fields is fixed. Messages with the new hash prefix required for hsdt can introduce that, but I hoped the same format could apply to the rpcs that send semi-canonical json as well. But more on that later today. I woke up to a page of notes scribbled at 3 A.M, there are a lot of posts incoming for you guys today...

I propose at the very least, rejecting messages that do not have the signature last

Yes, please. This is super important for the opt-out-of-legacy-verification-details mechanism I sketched at the end of this post (and which I plan to flesh out today).

Every current message has a hash field, but it's not checked. that means the current implementation would still accept messages missing the hash field.
We should make this required

Or drop it. What value does the field have if it is not required? Is it used anywhere at all? Or does it simply duplicate information already in the multihash suffix of the signature?

But, perhaps we can take advantage of this opportunity to add a version field?

Version of what (honest question, not rhetorical)? Don't we have the hash suffixes to indicate message formats already?

Thankfully, this means the signature is not mutable, but the hash is.

Can you clarify this? As I understood your post, the signature field always uses the correct value, but cypherlinks can contain garbage. Is this correct?

@Dominic %lHk2x2ElJNcTXocbkOwkVGzOf0NpHXzz/P7XtudFEEs=.sha256

by mutable hash but not signature, I mean you could take a valid message, and change some utf8 chars and the hash was the same but the signature wasn't. Oh, sorry just realized that "mutable hash" is not the right phrase for that.

The idea for the hash field, is to enable switching to a new hash algorithm at some point. The idea is that the expressed hash is the primary id, and if you want to refer to an old message with a new hash, you'd use the old hash concatenated with the new hash. Saying "of values with this (possibly collided) insecure hash, I mean the value with this secure hash"

Now that I think about it again, this idea may have included some assumptions about how the implementation might work. With flume style database it wouldn't be that hard to just add a second key index for a new hash.

Actually, second thoughts, I think we should focus only on defining the implicit format, not changing it. "hash": "sha256" should be required (so that implementations will drop messages if they have an unknown hash). Also the key order should be defined. Note: messages may have "author" and "sequence" positions switched, but all other fields are fixed order.
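
Roughly, those rules could be written down like this. It is a sketch only: `checkImplicitFormat` and `ALLOWED_ORDERS` are hypothetical names, and the two orders are the ones from the tally at the top of the thread.

```javascript
// The two top-level key orders seen in existing messages.
const ALLOWED_ORDERS = new Set([
  'previous,author,sequence,timestamp,hash,content,signature',
  'previous,sequence,author,timestamp,hash,content,signature'
]);

// Returns null if the message fits the implicit format,
// otherwise a string describing why it should be dropped.
function checkImplicitFormat(msg) {
  if (msg.hash !== 'sha256') {
    return 'unknown or missing hash: ' + msg.hash;
  }
  const order = Object.keys(msg).join(',');
  if (!ALLOWED_ORDERS.has(order)) {
    return 'unexpected field order: ' + order;
  }
  return null;
}
```

Since both allowed orders end in signature, this also covers the signature-last rule.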

@aljoscha %7k/41yCbyoHiB1MKIEEteIp7EyQK+NFIoRwOjN5G5ro=.sha256

Actually, second thoughts, I think we should focus only on defining the implicit format, not changing it.

I can agree that enforcing hash in the current format to improve consistency is a sensible choice. But if we do a complete metadata redesign for hsdt, I'd strongly prefer to drop it.

@Dominic %6Cv3kn3pjIaKKIvdtPZwTut8oaJFUsJEK67qge1ul3I=.sha256

But if we do a complete metadata redesign for hsdt, I'd strongly prefer to drop it.

ack

@cel %3nj+k7KgaNVaGUrg+l/odNsqSe6CPWHuCrUpIyQpxeg=.sha256

rejecting messages that do not have the signature last

:+1:

@Dominic %UmLEujtw+eNee97nfZlEFSzU7dBkYMOPDu42q9BOyY4=.sha256

I have a fix here: https://github.com/ssbc/ssb-validate/pull/10

btw, I'm gonna be completely offline for a week from tomorrow, I do think we should release this swiftly (after review) but I'm gonna leave it for someone else to merge.

@Dominic %A8aXizJHSn5Jm46Orbq2MBClJA5ap2YsM5b5B2C6k6M=.sha256

I merged this into ssb-validate@3.0.11 @matt @mix @andrestaltz please update any bundled releases!

@Christian Bundy %obJ+s1LODft2RVdakn8LHQ+oJEYcdoouwPAj1WWgKhU=.sha256

I've been reading a lot of posts on the issue of JSON validation and it seems intuitive to me that content as blobs (see %31vQzkA...) would resolve most of the issue, as we'd only have to deal with top-level props rather than arbitrarily nested JSON in the content property. Anyone have thoughts on whether my intuition is correct, or whether these are unrelated problems?

I'm going to try and track/wrangle this idea as #blob-content.

@mix %Aj6lpmafBcRhTczOnykhEbqIvTIPQK9Ib/6knL51OC4=.sha256

@dominic thanks, upgrading patchbay now

that's probably a fork-thread @christianbundy
