You are reading content from Scuttlebutt
@aljoscha %ScgGMfSbd4WXBtH7ry+pjaaxMFF4GRMdf5ddWdM/uE0=.sha256

BPMUX (back-pressure multiplexing)

@Dominic:

@aljoscha very keen for back pressure rpc though!

I guess teasing about an alternative message format and then not giving any information about it would be somewhat mean. So here are some notes on the protocol.

At its core, there's an abstract specification of different communication patterns, single unidirectional messages, request/response exchanges, sinks, streams, duplexes (combined sink + stream). Everything involves backpressure, data can only be transmitted once the receiver has signaled that it wants to receive data. This abstract specification determines the API exposed to the programmer.
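The backpressure invariant above (data may only be transmitted once the receiver has granted credit) can be sketched as a minimal credit ledger. The names here are invented for illustration and don't come from any bpmux implementation:

```rust
/// Illustrative sketch of credit-based backpressure: the sender may only
/// transmit bytes for which the receiver has already granted credit.
struct CreditLedger {
    available: u64,
}

impl CreditLedger {
    fn new() -> Self {
        // Nothing may be sent until the receiver signals readiness.
        CreditLedger { available: 0 }
    }

    /// Called when the receiver grants `amount` more bytes of credit.
    fn grant(&mut self, amount: u64) {
        self.available += amount;
    }

    /// Try to send a packet of `len` bytes (metadata included);
    /// returns false if there is not enough credit yet.
    fn try_send(&mut self, len: u64) -> bool {
        if len <= self.available {
            self.available -= len;
            true
        } else {
            false
        }
    }
}
```

The same ledger shape applies at both levels of the protocol: once for top-level credit, and once per stream for inner credit.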

There can be different concrete implementations for these APIs, depending on the physical (or logical) network over which the multiplexing happens. So far I've only specced out bpmux/rel, which implements the abstract specification on a bidirectional, reliable, ordered communication channel (i.e. what you get with unix domain sockets or tcp). But there could be implementations for infrastructure other than reliable channels, e.g. bpmux/udp, bpmux/[sctp](https://en.wikipedia.org/wiki/Stream_Control_Transmission_Protocol) or bpmux/[dccp](https://en.wikipedia.org/wiki/Datagram_Congestion_Control_Protocol).

Here are some old notes of mine on the abstract model and the concrete bpmux/rel specification. I've iterated on the protocol design a few times since then, so they are not up to date. Key things I'd handle differently than described below are unordered streams of messages, and the details of how bpmux/rel encodes packet types. The core ideas (credit-based backpressure à la reactive-streams, packets à la packet-stream) are the same though.

The whole thing is pretty similar to packet-stream, just with mandatory backpressure on all levels, and a slightly different abstract communication model.

Here are the notes I jotted down a few months ago, I hope the above introduction helps you to make some sense of them.


Infodump

A protocol for multiplexing messages, requests and responses, and sinks, streams and duplexes over a bidirectional, reliable, order preserving channel. Provides backpressure for top-level packets as well as per-stream backpressure.

TODO provide a description of how things work.

Top-level packets are guaranteed to be delivered in the order in which they were sent.

Unless indicated otherwise, sending a packet consumes top-level credit (1 credit per byte sent, including metadata).

The VarU64 type is defined and implemented here.

When receiving a packet with an unknown first byte, treat the following byte(s) as a VarU64 and drop that many bytes after it, then continue processing the stream. If any new packets are added to the BPMux protocol, they will include a VarU64 indicating the remaining packet length directly after the first (type) byte. This allows backwards-compatible protocol additions.
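The skip rule can be sketched as follows. Note that the varint used here is a LEB128-style stand-in, not the actual VarU64 encoding, so only the control flow is meant to carry over:

```rust
/// Illustrative varint decoder standing in for VarU64 (LEB128-style,
/// NOT the actual VarU64 encoding). Returns the decoded value and the
/// number of bytes consumed.
fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &b) in buf.iter().enumerate() {
        if i >= 10 {
            return None; // too long to fit in a u64
        }
        value |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // input ended in the middle of a varint
}

/// Skip an unknown packet: `buf` starts right after the unknown type byte.
/// Returns the offset at which the stream resumes.
fn skip_unknown_packet(buf: &[u8]) -> Option<usize> {
    let (len, consumed) = decode_varint(buf)?;
    let end = consumed + len as usize;
    if end <= buf.len() {
        Some(end)
    } else {
        None // need to buffer more input before skipping
    }
}
```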

TODO:

  • close top-level with error
  • reject min-credit
  • include min-credit in CreateSink and CreateDuplex
  • add greedy sending versions of CreateSink and CreateDuplex
    • consume top-level credit to attach any number of items to directly send
    • peer replies how many of these items it accepted (the dropped ones should be resent using inner credit)
  • handshaking (give initial top-level credit, specify min-credit)

Top Level Packets

/// Allow the peer to send `credit` more bytes of top-level packets (does not consume credit)
MainCredit | credit: VarU64

/// Indicate that no more Message, Request, CreateStream, CreateSink or CreateDuplex packets will be sent (does not consume credit)
Close

/// Indicate the lowest credit at which you can still fully function.
Minimum | min_credit: VarU64

/// The same as sending RequestCancel and InnerCancel messages for all requests, streams and duplexes (does not consume credit)
CancelAll

/// Send a message to the peer (one-shot, unidirectional)
Message | length: VarU64 | payload: [u8; length]

/// The same as sending multiple messages. `total_length` is the remaining length of the packet (including the bytes for the length VarU64s)
Messages | total_length: VarU64 | (length: VarU64 | payload: [u8; length])*

/// Send a request to the peer (one-shot, receive a single response)
Request | req_id: VarU64 | length: VarU64 | payload: [u8; length]

/// The same as sending multiple requests. The requests are assigned consecutive ids, starting from `initial_req_id`. `total_length` is the remaining length of the packet (including the bytes for the length VarU64s)
Requests | initial_req_id: VarU64 | total_length: VarU64 | (length: VarU64 | payload: [u8; length])*

/// Indicate that you are no longer interested in the response to a request of a certain id (does not consume credit)
RequestCancel | req_id: VarU64

/// Send the response to a previously received request (does not consume credit)
Response | req_id: VarU64 | length: VarU64 | payload: [u8; length]

/// Answer a previously received request with an error (does not consume credit)
ResponseError | req_id: VarU64 | length: VarU64 | payload: [u8; length]

/// Create a stream from the peer, with initial credit
CreateStream | inner_id: VarU64 | credit: VarU64 | length: VarU64 | payload: [u8; length]

/// Create a sink to the peer
CreateSink | inner_id: VarU64 | length: VarU64 | payload: [u8; length]

/// Create a duplex with the peer, with initial credit for the stream half
CreateDuplex | inner_id: VarU64 | credit: VarU64 | length: VarU64 | payload: [u8; length]
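For concreteness, here is a hypothetical encoder for a `Message` packet following the `type | length | payload` grammar above. The notes pin down neither the type byte (0x04 below is invented) nor the final VarU64 encoding (a LEB128-style varint stands in for it):

```rust
/// LEB128-style varint encoder, standing in for the real VarU64 encoding.
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80); // continuation bit set
    }
}

/// Encode a `Message` packet: type byte, payload length, payload.
fn encode_message(payload: &[u8]) -> Vec<u8> {
    const MESSAGE_TYPE: u8 = 0x04; // hypothetical tag, not from the spec
    let mut out = vec![MESSAGE_TYPE];
    encode_varint(payload.len() as u64, &mut out);
    out.extend_from_slice(payload);
    out
}
```

Sending the resulting packet would consume `out.len()` bytes of top-level credit, since credit is charged per byte including metadata.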

Inner Packets

These consume inner credit, not top-level credit.

/// Allow the peer to send `credit` more bytes of stream-level packets (does not consume credit)
InnerCredit | inner_id: VarU64 | credit: VarU64

/// Indicate that you are no longer interested in more items over a stream (does not consume credit)
InnerCancel | inner_id: VarU64

/// Send an item over a stream
InnerItem | inner_id: VarU64 | length: VarU64 | payload: [u8; length]

/// Send multiple items over a stream. `total_length` is the remaining length of the packet (including the bytes for the length VarU64s)
InnerItems | inner_id: VarU64 | total_length: VarU64 | (length: VarU64 | payload: [u8; length])*

/// Signal that no more items will be sent over a stream
InnerEnd | inner_id: VarU64

/// Signal an error over a stream (same as InnerEnd, but with a payload)
InnerError | inner_id: VarU64 | length: VarU64 | payload: [u8; length]

@Dominic %1/mEC4NBU56gri2Tn86NZ+22uW5Lkbz97Ice0mF/O/Q=.sha256

This looks broadly similar to what I was working on with muxrpc@7. But when I tried to deploy it I realized I had missed one of the most important things: how much credit do you give?

This needs to take into account latency: back pressure has something like momentum. It takes at least a roundtrip to give more credit, so to keep things flowing smoothly, you need to give more credit before all the credit has been exhausted, so you need to estimate the amount that will arrive before they receive the credit.
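One way to make the momentum point concrete is the classic bandwidth-delay product: to keep a stream saturated, at least rate × round-trip-time bytes of credit must be outstanding at all times. A throwaway sketch, not part of any protocol:

```rust
/// Minimum credit (in bytes) that must be outstanding to keep data
/// flowing at `bytes_per_sec` when a credit grant takes `rtt_ms`
/// milliseconds to reach the sender: the bandwidth-delay product.
fn min_outstanding_credit(bytes_per_sec: u64, rtt_ms: u64) -> u64 {
    bytes_per_sec * rtt_ms / 1000
}
```

So a 1 MB/s transfer over a 200 ms round trip needs roughly 200 000 bytes of credit in flight; grant less and the stream stalls between credit updates.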

If you want something to go really fast and you give lots of credit, then later want to stop it, you'll still have to receive whatever that credit covers before the sender knows. But if something is only moving slowly, then it's easier to stop.

I got to here, then put it on the back burner after trying to deploy it and breaking lots of stuff, especially making blobs go really slowly. It would be great to have this though, because it would make ssb-tunnel work really well, which would enable bootstrapping other protocols via ssb.

@aljoscha %J8liMO0/pXLopBGQDGb+/dte8Lgikq5ftoULGUWOYDk=.sha256

Yup, it is similar to muxrpc, but the actual encoding will be much terser, and it will apply backpressure to nearly every byte sent, not just messages on substreams.

As for the latency: You can do an exponential backoff thing, where you specify the maximum amount cmax of credit you want to allow, and you send credit updates whenever you have received messages worth cmax / (1 - (1/2)), cmax / (1 - (1/4)), cmax / (1 - (1/8)), cmax / (1 - (1/16)) etc. of credit.

I also experimented with some optimistic messages that allow overusing credit, but all of these turned out broken in some way. Some packets can specify an initial credit for the peer, but when trying to send an unsolicited stream of data to the peer, you first have to wait for some credit. Which in some sense reflects the user-level information flows on ssb =D

Another aspect, not mentioned in the notes at all, are unordered transmissions. Sometimes you don't need to preserve order of messages, and implementations that don't run on a reliable channel can utilize that. Not sure if this is really necessary, but it will be worth it to spend some brain cycles trying to figure out if there's an elegant way of providing it in the abstract specification.

If you want something to go really fast, and you give lots of credit, then want to stop it

Completely stopping something is done via cancellations and errors. If you just want to slow things down, you have to be conservative with the credit you hand out in the first place. Assuming the network has no guarantees on latency, there's no way of taking back credit that is guaranteed to work. You could politely ask and hope that the network latencies work out in your favor and the peer does not pretend that your polite request came in too late, but that's all you can do. And this polite asking should perhaps be part of the application layer, not the protocol layer.

I'll stop now, I really don't want to invest the time into fleshing this protocol out before there's consensus on the whole message format.

User has chosen not to be hosted publicly
@aljoscha %szjQzZ2UpNtE0rsVr0BHyXyMU19I3Es35iBQuZcsfU0=.sha256

@alanz: And also QUIC and SCTP. There's a lot of related work out there, but ssb needs something that can be layered on top of a simple tcp connection. But rest assured, I'm not reinventing stuff without looking at what is already out there. Which, by the way, also includes libraries like zeromq and nanomsg.

@Dominic %aR90vDH/9Pz6hIFLn5NZKtS2f3Vt3iwgmuvhmPgJVe0=.sha256

Some packets can specify an initial credit for the peer, but when trying to send an unsolicited stream of data to the peer, you first have to wait for some credit. Which in some sense reflects the user-level information flows on ssb =D

Yeah that is great! Loading a blob is something that would make sense to give lots of credit for. In some cases, it might make sense to give the application layer some control. Like, if someone wants to send a really big file, let them do it but make them go slowly.

@aljoscha %kjVcquCp/RfaMF8gWL95j0jxfxT+KVQSYu3RmZZl1ps=.sha256

The concrete programming api I have in mind lets the programmer set the maximum credit that is handed out at once (let's call it cap). The implementation would then transparently send out more credit to the peer, using the exponential backoff calculation above, with cmax set to cap. The peer can send as much data as it wants, but only cap bytes are in transit at a given time.
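A hedged sketch of that cap mechanism; for simplicity it replenishes whenever outstanding credit drops below half the cap, rather than following the exponential schedule described earlier:

```rust
/// Illustrative cap-based credit manager: at most `cap` bytes of credit
/// are outstanding (granted to the peer but not yet used up by received
/// data). The half-cap replenishment trigger is a simplification of the
/// exponential schedule mentioned in the thread.
struct CreditManager {
    cap: u64,
    outstanding: u64,
}

impl CreditManager {
    fn new(cap: u64) -> Self {
        CreditManager { cap, outstanding: 0 }
    }

    /// Account for `len` bytes of received data, then decide how much
    /// fresh credit (possibly zero) to send back to the peer.
    fn on_receive(&mut self, len: u64) -> u64 {
        self.outstanding = self.outstanding.saturating_sub(len);
        if self.outstanding < self.cap / 2 {
            // Top back up to the cap and tell the peer about it.
            let grant = self.cap - self.outstanding;
            self.outstanding = self.cap;
            grant
        } else {
            0
        }
    }
}
```

Whatever the exact replenishment schedule, the invariant is the same as described above: the peer can send as much data as it wants over time, but never more than `cap` bytes are in transit at once.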
