Turns out writing is hard and takes time. For anyone willing to read it, here's a somewhat structured brain-dump:
Self-Describing Message Schemas for Scuttlebot
This post explores a way of building self-describing message schemas in the scuttleverse.
Background
Scuttlebot is a protocol for building up an eventually consistent peer-to-peer database. Unlike related projects (e.g. IPFS), each entry in the data store is tied to a user identity. Data is added to the store in the form of messages. Each message contains some metadata, a type string, and arbitrary data. The data may contain links to other messages. The database automatically creates indices based on these links, so the resulting message graph can be traversed in both directions.
For a developer, the primary interface to the database thus consists of the following operations:
- add a message to the database, signed by the current user
- retrieve messages of a certain type
- retrieve messages by a certain user
- look up messages referenced by a certain message
- look up messages referencing a certain message
- the expected mechanisms to filter data and to be notified of new data
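To make the list above concrete, here is a toy in-memory sketch of those operations in Python. It is not the real Scuttlebot API: the class name `ToyStore`, its method names, the explicit `links` argument, and the id format are all assumptions made for illustration, and signing, filtering, and notifications are left out.

```python
import hashlib
import json
from collections import defaultdict


class ToyStore:
    """In-memory stand-in for the message store (not the real Scuttlebot API)."""

    def __init__(self):
        self.messages = {}                  # id -> message dict
        self.by_type = defaultdict(list)    # type string -> message ids
        self.by_author = defaultdict(list)  # author -> message ids
        self.backlinks = defaultdict(list)  # target id -> ids of messages linking to it

    def add(self, author, type_, content, links=()):
        """Add a message and index it. Signing is omitted in this sketch."""
        body = {"author": author, "type": type_,
                "content": content, "links": list(links)}
        encoded = json.dumps(body, sort_keys=True).encode("utf-8")
        # Toy id: a content hash with a "%" prefix. Real ids are computed differently.
        msg_id = "%" + hashlib.sha256(encoded).hexdigest()
        if msg_id in self.messages:
            return msg_id  # identical message already stored
        self.messages[msg_id] = {"id": msg_id, **body}
        self.by_type[type_].append(msg_id)
        self.by_author[author].append(msg_id)
        for target in links:
            # Index the reverse direction so links can be followed both ways.
            self.backlinks[target].append(msg_id)
        return msg_id

    def of_type(self, type_):
        """Retrieve messages of a certain type."""
        return [self.messages[i] for i in self.by_type.get(type_, [])]

    def from_author(self, author):
        """Retrieve messages by a certain user."""
        return [self.messages[i] for i in self.by_author.get(author, [])]

    def links_from(self, msg_id):
        """Ids of messages referenced by the given message."""
        return list(self.messages[msg_id]["links"])

    def links_to(self, msg_id):
        """Ids of messages referencing the given message."""
        return list(self.backlinks.get(msg_id, []))
```

The `backlinks` index is the part that makes the graph walkable in both directions: every link is recorded once on the linking message and once under its target.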
The resulting ecosystem of applications has some interesting, unique properties. All data in the scuttleverse inhabits the same graph and is tied together by persistent user identities. This way, any application can use data produced by any other application. For example, an application might combine status updates à la Twitter and commits to git repositories into a single newsfeed.
As a consequence of this, any application is free to interpret messages as it sees fit. There is no authority on what messages to produce, or how to consume them or present them to the user. Interoperability happens because several people interpret messages in similar ways, not because a grand authority decides on how things are supposed to work.
Going Meta: Messages Describing Messages
- need for some structure (why?)
  - given a message without context, do something with it
  - type field
    - problems
      - conflicts
      - names as resource
      - difficult to find information on given message type
      - primitive
  - opportunities:
    - tooling, linkability
    - documentation
    - gather information about ecosystem
    - client-recommendations
    - type information
    - serialization
    - automatic API negotiation
    - plan interoperability
- requirements (what?)
  - powerful (should allow everything listed above, and anything one might come up with later)
  - subjective
  - flexible/evolutionary
  - future-proof
- how?
  - as simple as possible
  - minimal setup to allow self-description
    - type can be a hash
    - add meta-messages, and use their hashes as the types of normal messages
    - one special meta-message that is its own instance: %et1Dc8i...
  - no assumptions about what self-description should look like
    - meta-messages don't need any fields
    - but we can add a human-readable name just for good measure
      - optional and no prescribed functionality
      - certainly useful for building interfaces
    - it might turn out to be useful to add more information on meta-messages, who knows
  - doing something useful: post descriptive messages which link to a meta-message
    - what to post?
      - I don't care. Anything useful
      - free to experiment: everything is nothing but messages (which may be ignored by everyone)
    - who posts?
      - one approach: represent message consumers and producers (clients) as messages themselves
        - a usage description could link to a meta-message, a client-message, and contain some other information, e.g.:
          - a human-readable description of how the client deals with the message
          - or which library it uses for dealing with the message
          - or a structured representation of the fields and the types in the message
          - and so on
    - trust in these claims?
      - votes on these messages?
      - trust the author of the message?
      - different versions of clients might do different stuff
      - tie client identity to git-ssb repositories?
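The meta-message idea from the outline above can be sketched in a few lines of Python. Everything named here is an assumption for illustration: `ROOT_META_ID` is a placeholder (the truncated id from the outline is deliberately not completed), the `describes`, `client`, and `description` fields are made up, and the toy hash is not how real message ids are computed. How the root meta-message comes to be its own instance is not spelled out above, so the sketch just treats its id as a given constant.

```python
import hashlib
import json
from typing import Optional

# Placeholder for the id of the one meta-message that is its own instance.
ROOT_META_ID = "%root-meta-placeholder"


def make_message(type_: str, content: dict, author: str) -> dict:
    """Build a message whose id is a toy content hash (not the real id scheme)."""
    body = {"author": author, "type": type_, "content": content}
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return {"id": "%" + hashlib.sha256(encoded).hexdigest(), **body}


def make_meta_message(author: str, name: Optional[str] = None) -> dict:
    """A meta-message needs no fields; the human-readable name is optional."""
    content = {"name": name} if name else {}
    return make_message(ROOT_META_ID, content, author)


def is_meta_message(msg: dict) -> bool:
    """Meta-messages are instances of the root; the root counts as one too."""
    return msg["type"] == ROOT_META_ID or msg["id"] == ROOT_META_ID


def make_usage_description(usage_meta: dict, described_meta: dict,
                           client: dict, description: str, author: str) -> dict:
    """A descriptive message linking a meta-message to a client message.

    Its own type is the hash of yet another meta-message, so nothing outside
    the message graph is needed to describe it.
    """
    content = {
        "describes": described_meta["id"],  # hypothetical field name
        "client": client["id"],             # hypothetical field name
        "description": description,
    }
    return make_message(usage_meta["id"], content, author)


def descriptions_for(meta_id: str, messages: list) -> list:
    """All messages that claim to describe the given meta-message."""
    return [m for m in messages if m["content"].get("describes") == meta_id]
```

A normal message then simply uses a meta-message's id as its type, e.g. `make_message(post_meta["id"], {"text": "hello"}, "@alice")`. Anyone is free to ignore the usage descriptions; they are ordinary messages like everything else.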
Implementation
- not protocol-level, can be organically adopted (or left to die)
- create meta-messages for the currently used message types
  - ideally, apps would start using the corresponding hashes for their produced messages
  - apps should consume both the old, stringly-typed messages and the new hashly-typed ones
- what about protocol-relevant messages?
  - pub
  - contact
- implement a simple CLI for querying meta-messages
- start adding some information, e.g. human-readable documentation
  - no need to enforce this though, it's enough if people turn to this once they have genuine need for some message metadata
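Consuming both the old stringly-typed and the new hashly-typed messages could look roughly like the Python sketch below. The mapping `LEGACY_TO_META` and the ids in it are hypothetical placeholders; in practice each app would fill in the ids of whichever meta-messages it chooses to recognize.

```python
# Hypothetical mapping from legacy type strings to meta-message ids.
LEGACY_TO_META = {
    "post": "%meta-post-placeholder",
    "vote": "%meta-vote-placeholder",
}


def canonical_type(type_field: str) -> str:
    """Map a legacy type string to its meta-message id.

    Hashes and unknown strings pass through unchanged, so nothing is lost
    for types that have no meta-message yet.
    """
    return LEGACY_TO_META.get(type_field, type_field)


def messages_of_type(messages: list, wanted: str) -> list:
    """Select messages of one type, whichever way the type was written."""
    wanted_id = canonical_type(wanted)
    return [m for m in messages if canonical_type(m["type"]) == wanted_id]
```

With this, a query for `"post"` and a query for the corresponding hash return the same messages, so an app can switch what it produces without breaking what it consumes.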