@aljoscha %qMwAg6iW/W6IQH7o0K3+sNn7jaLV2ek/pQspURhJjTI=.sha256
Re: %G6FVdfLPI

Past me has spent some time on this.

First off, this can easily be generalized from tangles to arbitrary DAGs. There's no need to be restricted to a single root; simply transmit a set of nodes (let's call them sources) to collect all their transitive successors. So you send a set of sources, and then receive back everything that is reachable from them.
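
To make that concrete, here's a minimal sketch (in TypeScript, with made-up names like `DagStore` and `successors`, not any existing SSB API) of what the responding peer would compute:

```ts
type NodeId = string;

interface DagStore {
  // the outgoing cypherlinks of a node we have locally
  successors(id: NodeId): NodeId[];
}

// Collect everything reachable from the given sources via a depth-first traversal.
function collectReachable(store: DagStore, sources: NodeId[]): Set<NodeId> {
  const seen = new Set<NodeId>(sources);
  const stack = [...sources];
  while (stack.length > 0) {
    const current = stack.pop()!;
    for (const next of store.successors(current)) {
      if (!seen.has(next)) {
        seen.add(next);
        stack.push(next);
      }
    }
  }
  return seen;
}
```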

> If you are missing N messages, you must ask the network N times. That is gonna be slow, especially if you have bad latency.

That's technically not correct: if you have a node with 20 outgoing links and you ask for those 20 messages concurrently, then you still get the results after one round trip. The total number of round trips isn't proportional to the number of messages; it is proportional to the maximal depth of any node in the DAG (where the depth of a node is simply the length of the shortest path from a source to that node).
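
A sketch of that level-by-level approach, reusing the `NodeId` alias from above; `fetchSuccessors` stands in for a hypothetical network request that returns the outgoing links of one node:

```ts
// Fetch the DAG level by level. Each iteration sends all requests for the
// current frontier concurrently, so it costs one round trip; the total
// number of round trips is bounded by the maximal depth, not the node count.
async function fetchByLevels(
  sources: NodeId[],
  fetchSuccessors: (id: NodeId) => Promise<NodeId[]>
): Promise<Set<NodeId>> {
  const seen = new Set<NodeId>(sources);
  let frontier = sources;
  while (frontier.length > 0) {
    // one round trip: all requests for this level go out in parallel
    const results = await Promise.all(frontier.map((id) => fetchSuccessors(id)));
    const next: NodeId[] = [];
    for (const links of results) {
      for (const id of links) {
        if (!seen.has(id)) {
          seen.add(id);
          next.push(id);
        }
      }
    }
    frontier = next; // the next level of the DAG
  }
  return seen;
}
```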

The idea of tasking the peer to send a node and everything reachable from there is similar to promise pipelining.

This probably needs some sort of mechanism to specify which cypherlinks are of interest (should be traversed) and which ones should be ignored. In effect, this would end up as a database query language. Cap'n Proto comes to mind, since it provides this combination of promise pipelining and filtering; another example is GraphQL. In addition, it might be nice to specify a maximal recursion depth to prevent being flooded with far more messages than expected. If the peer signals that more data would have been available but the maximal depth has been reached, you can simply send new requests for the cut-off nodes. That way you basically interpolate between promise pipelining and request/reply.
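
Purely as an illustration of what such a query and its reply could look like (all field names here are invented for this sketch, not an existing protocol):

```ts
interface DagQuery {
  sources: NodeId[];         // start the traversal at these nodes
  followLinkTypes: string[]; // which kinds of cypherlinks to traverse; others are ignored
  maxDepth: number;          // stop traversing after this many hops from a source
}

interface DagReply {
  messages: unknown[]; // everything reached within maxDepth
  frontier: NodeId[];  // nodes at the depth limit whose successors were withheld;
                       // follow-up queries can use these as new sources
}
```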

Transmitting a set of nodes one already has can be regarded as simply another mechanism for filtering/restricting the result on the peer's side. It's good to keep in mind that this is not always an optimization, since it can actually make things worse: suppose the peer only knows about 5 messages in the whole graph, but you just sent them 800 messages to filter out. That exchange would have been more efficient if you hadn't sent a blacklist at all. So really this becomes a game of heuristics; sometimes it might be a good idea to send only a subset of the nodes that you already know about.
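
Here's a sketch of how the responding peer might apply such a blacklist, again using the hypothetical `DagStore` from above:

```ts
// Apply a blacklist: nodes the requester already has are not sent back,
// but the traversal still continues through them, so anything new that is
// only reachable via a known node is still delivered.
function collectWithBlacklist(
  store: DagStore,
  sources: NodeId[],
  alreadyHave: Set<NodeId>
): NodeId[] {
  const visited = new Set<NodeId>(sources);
  const stack = [...sources];
  const toSend: NodeId[] = [];
  while (stack.length > 0) {
    const current = stack.pop()!;
    if (!alreadyHave.has(current)) {
      toSend.push(current); // only nodes new to the requester are sent
    }
    for (const next of store.successors(current)) {
      if (!visited.has(next)) {
        visited.add(next);
        stack.push(next);
      }
    }
  }
  return toSend;
}
```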

It doesn't matter asymptotically whether we send the set losslessly or use an AMQ (an approximate membership query structure such as a bloom filter), so for simplicity I tend to think about this in terms of regular sets rather than bloom filters.

Rather than sending a set of nodes to ignore, one could also send a set of nodes for which you already have all the messages reachable from them. This actually still works with bloom filters. When the server traverses its DAG locally, it would cut off the traversal whenever a node is in the set (whereas with a blacklist it would skip over the node but still traverse its successors). This does of course amplify the effect of false positives in an AMQ: a single false positive prunes an entire subgraph.
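
For contrast with the blacklist sketch above, a sketch of the cut-off variant; the `Membership` interface is invented here so the same code works whether the set is exact or a bloom filter:

```ts
// Either an exact set or an AMQ such as a bloom filter.
interface Membership {
  mightContain(id: NodeId): boolean;
}

// Cut-off semantics: if the requester claims to already have everything
// reachable from a node, the traversal does not descend into that node at all.
function collectWithCutoff(
  store: DagStore,
  sources: NodeId[],
  completeBelow: Membership
): NodeId[] {
  const visited = new Set<NodeId>(sources);
  const stack = [...sources];
  const toSend: NodeId[] = [];
  while (stack.length > 0) {
    const current = stack.pop()!;
    if (completeBelow.mightContain(current)) {
      continue; // prune: nothing at or below this node needs to be sent
    }
    toSend.push(current);
    for (const next of store.successors(current)) {
      if (!visited.has(next)) {
        visited.add(next);
        stack.push(next);
      }
    }
  }
  return toSend;
}
```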

Just for completeness, all of this has been discussing declarative query specifications. Conceptually it would also be possible to send some procedural code to the peer that then does the filtering locally.
