More spec remarks, putting on a deeply black hat, so be warned:
author: ssb-bfe-encoded buttwoo feed ID, an ed25519 public key
The bipf spec heavily implies that its compound values have to contain further bipf data ("with schemaless json-like semantics", "OBJECT : 5 (101) // sequence of alternating bipf encoded key and value"), but it is ambiguous about arrays ("ARRAY : 4 (100) // sequence of any other value"). Assuming arrays must also contain bipf values, you cannot put an ssb-bfe-encoded feed ID into your top-level bipf array. So do I have to wrap these in an bipf buffer (of tatically known length, nonetheless)? At the very least, this needs clarification in both specifications.
parent: ssb-bfe-encoded buttwoo message ID used for subfeeds. For the top feed this must be BFE nil.
Both author and parent must use redundant encoding, for example: why is the author ssb-bfe encoded, if you already know it has to be a buttwoo feed ID? Is an optimizing implementation allowed to just ignore the useless bytes and just look at the key instead? Or do I still have to check that they contain the only valid - i.e., completely pointless - byte pattern? This opens up room for mistakes or just peers that sow chaos, for no benefit.
sequence: number for this message in the feed
Which data type is this? How is it encoded? "number" is not a bipf type.
timestamp: integer representing the UNIX epoch timestamp of message creation
This is a bipf int (32 bit) I assume? Considering that author/parent are not bipf values either, this should be specified. Any reason to not go for 64 bit? 32 bit run out in 16 years, your crypto primitives hopefully last longer.
tag: a byte with extensible tag information (the value 0x00 means a standard message, 0x01 means subfeed, 0x02 means end-of-feed). One can use other tags to mean something else. This could be used to carry for example files as content.
Please specify what an implementation must do with unknown tags.
contentLength: the length of the bipf-encoded content in bytes
hash: concatenation of 0x00 with the blake3 hash of the bipf-encoded content bytes
This contradicts that content is not necessarily bipf-encoded.
If the spec mandates a 0x00 byte at the start of the hash, then an implementation has to reject everything that does not have that byte. Any change that allows other starting bytes would be a breaking change, i.e., a whole new format. So as the spec is currently written, this byte is completely redundant and should be removed from the spec.
The content is a free form field. When unencrypted, it SHOULD be a bipf-encoded object.
Please specify what implementation must do when it is not a bipf-encoded object.
A buttwoo message consists of a bipf-encoded array of 3 fields:
Metadata must be an bipf encoded array of 8 elements:
The only unknown about the length of a message is the length of the metadata. Since the metadata starts with its length, storing the message as a bipf array is redundant, you could simply concatenate metadata, signature and content instead. An efficient limitation would ignore the first bite of the message encoding - but it has to verify that the length is correct. This byte is reduntant and forces reduntant computations (branching even) on all implementations.
Similarly, the length of the metadata array is only influenced by whether the parent is null is not. Should you really encode this by prefexing the metadata with two different, fairly large, arbitrary looking integers? Especially since the parent encoding contains the information about whether it is null again.
Overall, the information whether the parent is null is stored in three different places, in three different ways, and while only one place needs to be checked to function correctly, the consistency of all three locations has to be verified. This violates DRY in a pretty horrible manner.
Also, just out of curiosity: did you condiser VarU64 for bipf, and if so, why did you choose LEB128 above it. Adoption/implementation availability, or design reasons? CC @Dominic (?)