@aljoscha %o+3XFaGO6iSgnW2EEO946rZB5FaF2S1HNmro/lcVnR0=.sha256
Re: %Pnas64BJ6

More spec remarks, putting on a deeply black hat, so be warned:

author: ssb-bfe-encoded buttwoo feed ID, an ed25519 public key

The bipf spec heavily implies that its compound values have to contain further bipf data ("with schemaless json-like semantics", "OBJECT : 5 (101) // sequence of alternating bipf encoded key and value"), but it is ambiguous about arrays ("ARRAY : 4 (100) // sequence of any other value"). Assuming arrays must also contain bipf values, you cannot put an ssb-bfe-encoded feed ID into your top-level bipf array. So do I have to wrap these in a bipf buffer (of statically known length, nonetheless)? At the very least, this needs clarification in both specifications.

parent: ssb-bfe-encoded buttwoo message ID used for subfeeds. For the top feed this must be BFE nil.

See above.

Both author and parent must use redundant encoding, for example: why is the author ssb-bfe encoded, if you already know it has to be a buttwoo feed ID? Is an optimizing implementation allowed to just ignore the useless bytes and just look at the key instead? Or do I still have to check that they contain the only valid - i.e., completely pointless - byte pattern? This opens up room for mistakes or just peers that sow chaos, for no benefit.
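To make the cost concrete, here is a minimal Python sketch of the check a conforming parser would presumably still have to run on the author field. The two prefix byte values are placeholders rather than values taken from the ssb-bfe spec, and `parse_author` is a hypothetical helper, not part of any existing library.

```python
# Placeholder values: the real type/format bytes are defined by ssb-bfe.
BFE_FEED_TYPE = 0x00
BFE_BUTTWOO_FORMAT = 0x04
ED25519_KEY_LEN = 32  # an ed25519 public key is 32 bytes


def parse_author(field: bytes) -> bytes:
    """Return the raw ed25519 key, rejecting any other prefix pattern."""
    if len(field) != 2 + ED25519_KEY_LEN:
        raise ValueError("author must be 2 prefix bytes plus a 32-byte key")
    if field[0] != BFE_FEED_TYPE or field[1] != BFE_BUTTWOO_FORMAT:
        # Only one pattern is valid, yet it still has to be compared.
        raise ValueError("author is not a buttwoo feed ID")
    return field[2:]
```

An implementation that skipped the second `if` would accept bytes that a strict peer rejects, which is exactly the kind of divergence the redundant prefix invites.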

sequence: number for this message in the feed

Which data type is this? How is it encoded? "number" is not a bipf type.

timestamp: integer representing the UNIX epoch timestamp of message creation

This is a bipf int (32 bit) I assume? Considering that author/parent are not bipf values either, this should be specified. Any reason to not go for 64 bit? 32 bits run out in 16 years, your crypto primitives hopefully last longer.
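A quick way to see where the limit falls, assuming the timestamp is a signed 32-bit count of seconds since the UNIX epoch (the spec text quoted here does not say so explicitly):

```python
from datetime import datetime, timezone

MAX_INT32 = 2**31 - 1  # largest value a signed 32-bit integer can hold

# Last second representable as a signed 32-bit UNIX timestamp.
last_32bit_second = datetime.fromtimestamp(MAX_INT32, tz=timezone.utc)
print(last_32bit_second.isoformat())  # 2038-01-19T03:14:07+00:00
```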

tag: a byte with extensible tag information (the value 0x00 means a standard message, 0x01 means subfeed, 0x02 means end-of-feed). One can use other tags to mean something else. This could be used to carry for example files as content.

Please specify what an implementation must do with unknown tags.

contentLength: the length of the bipf-encoded content in bytes

hash: concatenation of 0x00 with the blake3 hash of the bipf-encoded content bytes

This contradicts that content is not necessarily bipf-encoded.

If the spec mandates a 0x00 byte at the start of the hash, then an implementation has to reject everything that does not have that byte. Any change that allows other starting bytes would be a breaking change, i.e., a whole new format. So as the spec is currently written, this byte is completely redundant and should be removed from the spec.

The content is a free form field. When unencrypted, it SHOULD be a bipf-encoded object.

Please specify what an implementation must do when it is not a bipf-encoded object.

A buttwoo message consists of a bipf-encoded array of 3 fields:

Metadata must be an bipf encoded array of 8 elements:

The only unknown about the length of a message is the length of the metadata. Since the metadata starts with its length, storing the message as a bipf array is redundant: you could simply concatenate metadata, signature and content instead. An efficient implementation would ignore the first byte of the message encoding, but it has to verify that the length is correct. This byte is redundant and forces redundant computations (branching even) on all implementations.

Similarly, the length of the metadata array is only influenced by whether the parent is null or not. Should you really encode this by prefixing the metadata with two different, fairly large, arbitrary looking integers? Especially since the parent encoding contains the information about whether it is null again.

Overall, the information whether the parent is null is stored in three different places, in three different ways, and while only one place needs to be checked to function correctly, the consistency of all three locations has to be verified. This violates DRY in a pretty horrible manner.
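The "arbitrary looking integers" come from how bipf builds its length prefix. As far as I understand the bipf format, the tag is an unsigned LEB128 varint of `(length << 3) | type`. The sketch below assumes that, and the two body sizes are hypothetical numbers chosen only to show the effect, not the real buttwoo metadata sizes.

```python
BIPF_ARRAY = 4  # "ARRAY : 4 (100)" in the bipf spec


def leb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def bipf_array_tag(body_len: int) -> bytes:
    """Tag for a bipf array, assuming varint((length << 3) | type)."""
    return leb128((body_len << 3) | BIPF_ARRAY)


# Hypothetical body sizes, differing only by a 32-byte parent key.
WITHOUT_PARENT = 90
WITH_PARENT = WITHOUT_PARENT + 32

for size in (WITHOUT_PARENT, WITH_PARENT):
    print(size, (size << 3) | BIPF_ARRAY, bipf_array_tag(size).hex())
```

Whatever the real sizes are, a parser ends up comparing the prefix against exactly two constants, and then has to confirm that the parent field agrees with whichever one it found.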


Also, just out of curiosity: did you consider VarU64 for bipf, and if so, why did you choose LEB128 over it? Adoption/implementation availability, or design reasons? CC @Dominic (?)

@Anders %bMYC966xWTKcagqz37wQJsomz4nB8KELEUFGzo8BMMw=.sha256

@Aljoscha good input. Thanks

So do I have to wrap these in a bipf buffer (of statically known length, nonetheless)

Yes. One could have just directly concatenated the bytes of the values, it would save a few bytes, but since we would like to use bipf for content anyway I wanted to keep this uniform.

Or do I still have to check that they contain the only valid - i.e., completely pointless - byte pattern?

Yes. I would like the values to make sense on their own. This format is not designed to be as minimal as possible. #tinyssb is a good example of pursuing that.

sequence: number for this message in the feed
Which data type is this? How is it encoded? "number" is not a bipf type.

Thanks that is good input, I have clarified that now.

timestamp: integer representing the UNIX epoch timestamp of message creation
This is a bipf int (32 bit) I assume?

That was a bug in the spec. It's encoded as a double. Fixed now. Thanks

Please specify what an implementation must do with unknown tags.

Good input.

Any change that allows other starting bytes would be a breaking change, i.e., a whole new format

Or a minor update to support a new hash

The content is a free form field. When unencrypted, it SHOULD be a bipf-encoded object.
Please specify what an implementation must do when it is not a bipf-encoded object.

Good input.

@aljoscha %GqnZ1w0tYBEzMDNK7ioRTC/3qTKbM5YTmAtK+8frceE=.sha256

@arj

Or a minor update to support a new hash

Is it really minor though? If it mandates implementations to treat certain messages differently than before the update, then all old implementations that have not implemented the update yet suddenly do not conform to the specification anymore. That's a breaking, i.e., major change as I see it; new behavior is not a strict superset of old behavior, but old behavior has to change.

At the very least, you have to very carefully specify how to handle as-of-yet unsupported hashes, and make sure that not knowing some hash format still leaves an implementation conformant to the specification. Following that argument, an implementation that does not know about any hash format should also be conformant (albeit useless). Otherwise, adding hashes clearly is a breaking change.
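One way such a rule could look, as a sketch only: validation returns three outcomes instead of two, so that "I do not know this hash format" is distinct from "this message is invalid". The names `Verdict` and `check_hash` are hypothetical, and since blake3 is not in the Python standard library, BLAKE2b is used purely as a stand-in hash function.

```python
import hashlib
from enum import Enum
from typing import Callable, Dict


class Verdict(Enum):
    VALID = "valid"
    INVALID = "invalid"
    UNSUPPORTED = "unsupported"  # hash format this peer does not know


def stand_in_hash(content: bytes) -> bytes:
    # Stand-in only: BLAKE2b from the standard library, NOT blake3.
    return hashlib.blake2b(content, digest_size=32).digest()


def check_hash(field: bytes, content: bytes,
               known: Dict[int, Callable[[bytes], bytes]]) -> Verdict:
    """Compare a prefixed hash field against the content bytes."""
    if not field:
        return Verdict.INVALID
    hasher = known.get(field[0])
    if hasher is None:
        # The spec would have to say what a peer does in this case.
        return Verdict.UNSUPPORTED
    return Verdict.VALID if field[1:] == hasher(content) else Verdict.INVALID


known_formats = {0x00: stand_in_hash}
content = b"hello"
good = bytes([0x00]) + stand_in_hash(content)

print(check_hash(good, content, known_formats))
print(check_hash(b"\x01" + good[1:], content, known_formats))
print(check_hash(good, content, {}))  # a peer that knows no hash format
```

With an empty `known` table every message comes back as unsupported, which is the "conformant but useless" implementation from the argument above. If the spec instead requires rejecting unknown prefixes, the third outcome disappears and adding a hash becomes a breaking change.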
