@aljoscha %uJX/grT0vK9opXvlmf0eQB+SCrpWirZRzkrCcCxmQU8=.sha256

Reachability in Databases

Forking this post about a database that allows efficient querying of the causal order of ssb messages to not derail the thread.

@mix:

SQUEEE!!!

Indeed.


TLDR: Building a database that supports querying the causal order is not trivial, but possible.

Context: The discussion about removing mandatory timestamps from ssb messages has led us to the realization that we'd need the database layer to support efficient queries about the causal order of messages, i.e. whether some message is cryptographically guaranteed to be older than some other message. The causal order is determined by cypherlinks: If msg1 refers to the hash of msg2, msg1 must be newer. We can treat the messages as the vertices of a graph, and cypherlinks as directed edges. The transitive closure of the edge relation (aka the reachability relation) is the causal order we are interested in. When asking whether some msg x is newer than another msg y, we are asking "Is there a directed path from x to y?".

This is a well-studied problem in database theory and implementation. As a theoretical introduction I'd recommend this summary, but it lacks any newer results (published 1990). For an overview of the current state of the art, I recommend the related-work section of this paper. But here is a summary of the basics, for your convenience.

All solutions to the reachability query problem have to balance three factors: time complexity of query processing, space complexity of database indices, and time complexity of creating those indices. The two trivial solutions are:

  • not storing any indices and performing a depth-first search on the graph to answer a query (see the sketch after this list). Query time is then linear in the number of vertices. This is not acceptable for ssb.
  • storing the full reachability relation in indices. Query time is then logarithmic in the number of messages (or constant in the average case by using hash tables). This, however, requires storage space quadratic in the number of messages. It might be sufficient to test out how ssb without timestamps could work, but it is not acceptable in the long run.
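
A minimal sketch of the first approach (no indices, just a depth-first search), assuming a hypothetical out_links(msg_id) lookup that returns the hashes a message links to:

function is_reachable(from, target) {
  const seen = new Set();
  const stack = [from];

  while (stack.length > 0) {
    const current = stack.pop();
    if (current === target) {
      return true; // found a directed path, so target is older than from
    }
    if (seen.has(current)) {
      continue;
    }
    seen.add(current);
    for (const linked of out_links(current)) {
      stack.push(linked);
    }
  }

  return false; // no directed path from `from` to `target`
}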

The paper I linked to does a good job of roughly sketching the known approaches to interpolate between these two extremes, so I won't go into all of these. The gist is: They are good enough, and they get implemented in real databases. But there are a few ssb-specific properties we could use to obtain even better solutions:

We'd want efficient incremental updating of the indices; we can't recompute all indices from scratch whenever a message arrives. Luckily, we rarely lose edges or vertices from the message graph (only in the case of blocking). And we never add arbitrary edges, we only need to handle adding a single vertex and its outgoing edges (i.e. receiving a new message). These properties should allow us to find more efficient solutions than in the general case. N.B.: I did not find any literature on incrementally maintaining reachability indices in general, so I did not find any on append-only databases in particular either.

We know that some subsets of the messages are in a total order, namely the messages from the same feed. No message is in more than one of these total orders. This allows a compact representation of reachability information that only takes up O(n log n) space: sequence numbers. Incrementally updating those takes constant time as well. These can be used to derive new algorithms from those for arbitrary dags. I also like how this implies that a sophisticated ssb database will always store sequence numbers explicitly, even if they are not part of the metadata.

Some further observations: The message graphs ssb produces will be fairly sparse for the most part. That's good, the heuristics in the literature work well on sparse graphs. Theoretically, we even have graphs of bounded degree, because there's a message size limit. So in theory, there might be some funky fpt algorithm. In practice, you can fit too many cypherlinks into a single message for that to actually matter.

User has not chosen to be hosted publicly
User has not chosen to be hosted publicly
User has not chosen to be hosted publicly
User has not chosen to be hosted publicly
User has not chosen to be hosted publicly
@aljoscha %SZzAqMDv1lqh0fo8CeWITAsp2QAozYDxz9HZA298Y8s=.sha256

@elavoie:

[...] I am really excited by the research prospects here.

This is funny, for me this is one of the areas of CS I try to avoid as much as possible. Too many heuristics, too much dependence on real-world caching behavior, too many trade-offs between time and space complexity. As fascinating as algorithmics are, I tend to back off once too many unclear tradeoffs have to be made. Oh, and everything that has to deal with file systems usually turns out to be rather unpleasant.

A really serious implementation attempt at an ssb-specific database would be some sort of append-only schemaless graph database. Being append-only is super useful, but schemalessness and supporting arbitrary graphs are both really painful to get efficient.


@Piet Feel free to ask about anything you'd like me to explain in less computer sciency jargon. I'm just defaulting to it, because it is the most compact way of getting these notes out.

For boring ssb, or just for experimenting, you might want to look at graph databases. There are a lot of implementations for RDF data, and SPARQL queries are relatively easy to get into. RDF is (somewhat) schemaless, so it might be the closest "real-world" thing to ssb where we could benefit from lots of previous engineering. I did not find any serious attempts at or information on append-only implementations, only this abandoned and unused one.

Assume that the sorting algorithm will almost always be given data that's already sorted, as we'll already have a partial ordering from the last time it was sorted

The problem with this is that we don't want to store the complete partial order, as that requires too much space (num_msgs ^ 2). So all scalable approaches involve some form of compression ((usually) not by running a literal data compression algorithm, but by carefully deciding which parts of the partial order to store, and which ones to recompute on the fly). A new run of the algorithm would need to recompute which parts to store and which parts to drop. So in general, this approach does not really work.

For (initial implementations for) ssb, it might be performant enough to naively store the whole partial order, in which case your observations do apply.

An ssb-specific representation can use the total order on messages from a single feed for some optimizations. Answering a reachability query "did msg1 happen after msg2" can be thought of as trying to find a path to msg2.feed that reaches that feed at a sequence number >= msg2.sequence_number. This still leaves enough problems to require a fairly general solution (intuitively, you might have to hop along many feeds before you reach the correct one, and when trying to find it, you do not know which other feed to hop to). But it might turn out that the average query on ssb databases can benefit a lot from our structural knowledge of the graph, even if technically it isn't in any graph class that guarantees better algorithms.

When considering links between two feeds A and B, we do not even need to store all links. E.g. if message number 0 from feed A (let's write this as A::0) links to B::2 and A::1 links to B::1, we can drop the latter information. So we are not even storing the original edge set in the graph used for reachability queries, but a subset of it that has the same transitive closure (i.e. following all cypherlinks still leads to the same partial reachability order). In some sense this means approximating the transitive reduction (exact computation of it is infeasible in our context). This does not give us any "hard" improvements, but it is very cheap and will improve the situation in practice. We might as well use any structural knowledge about the message graph that we have. And the partition of the vertex set into disjoint total orders (i.e. feeds) is super useful for optimizations.
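
A small sketch of this pruning, under the assumption that the cross-feed links of a feed are processed in ascending sequence number order and are given as objects { from_seq, to_feed, to_seq } (illustrative names, not an existing API):

function prune_links(links_in_feed_order) {
  const highest_so_far = new Map(); // per target feed: highest to_seq kept so far
  const kept = [];

  for (const link of links_in_feed_order) {
    const best = highest_so_far.get(link.to_feed);
    if (best === undefined || link.to_seq > best) {
      // this link reaches further into the target feed than any earlier one
      kept.push(link);
      highest_so_far.set(link.to_feed, link.to_seq);
    }
    // otherwise an earlier message of this feed already reaches the same or a
    // newer message of the target feed, so the link is implied (A::1 -> B::1
    // in the example above)
  }

  return kept;
}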


@moid:

This is the hard part, the incremental update.

Yes, and it seems to be completely under-researched. Which strikes me as odd; don't real-world databases get updated from time to time? Well, it might also signal that this is really, really hard.

Many approaches do something like "let's encode all information in a matrix and then do a bunch of multiplications", which is about as unincremental as it gets. So I fear incremental updates will involve a bunch of heuristics. But being append-only (periodically running garbage collection on blocked feeds should allow us to disregard the fact that technically there may be deletion of vertices) and having the partition into feeds makes me optimistic that a solution can be found. I just don't really want to be the person to look for it.

I'm sorry about the paywalled links, I wrote this from university and didn't realize these were not open access. Here are open links:


While searching for these, I found this paper, which seems to deal with dynamic graphs specifically. Adding it to my reading list.

@aljoscha %20YTYK97A3MojxoDHD3kgHh2Zj1Dy7EF3NjlCXY54GA=.sha256

And another paper for graphs that get updated: SCISSOR (didn't read it yet)

User has not chosen to be hosted publicly
@cryPhone📱 %fzInqqvna35/f835uXQ2QqKkxdCGqwH1EaLo2Fr73YU=.sha256

For each Scuttlebutt app there is a finite number of valid JSON schemas.

You can publish any form of message with sbot. There is no contract of schemas between sbot and its (local) clients. It's up to the apps to verify the data they create and query.

@aljoscha %kjB05XmeipfsNZi7wFLEo6SrNB2F7l72MXDY8jUNf3A=.sha256

@Jan van Brügge @cryptixInTheCloud

Theoretically, there could be a server implementation that allows applications to define schemas up front, and that could use those for better database performance. But this is a whole other can of worms with its own set of problems.

But fundamentally, all apps need to work with free-form data. This is deeply ingrained in the values of ssb, because data is not tied to any applications. You can freely write your own programs that build upon the data already there, and there is no application lock-in.

Also note that we are currently considering extending the set of values that can be stored, so thinking of it as just json will not be sufficient any longer.

User has chosen not to be hosted publicly
@aljoscha %Q3T7XeYjJEmlKpKxQIjtdvSSve0bQ1UiTkDG4uQZffE=.sha256

More random notes:

The problems Dominic encountered when trying to dynamically update the friends graph are related to those discussed here: whom to replicate can be reframed as reachability over paths of bounded length (which should be easier to answer than general reachability, since the length bound reduces the search space).
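
As a sketch, bounded-length reachability over the follow graph amounts to a breadth-first search cut off after k hops (follows(id) is a hypothetical lookup of the ids an identity follows):

function within_hops(start, k) {
  let frontier = new Set([start]);
  const reachable = new Set([start]);

  for (let hop = 0; hop < k; hop++) {
    const next = new Set();
    for (const id of frontier) {
      for (const followed of follows(id)) {
        if (!reachable.has(followed)) {
          reachable.add(followed);
          next.add(followed);
        }
      }
    }
    frontier = next;
  }

  return reachable;
}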

This paper discusses dynamic updates to a label-based index approach. Almost all complications stem from node deletion problems - which we don't have in an append-only database. We also only add outgoing edges from a newly added vertex, but never incoming edges to a newly added vertex, and never edges between preexisting vertices. The latter is the most complicated case, but we get to ignore it. So this paper hides - in a lot of noise we are free to ignore - a scheme for incrementally updating (2-hop label based) reachability indices in an append-only, acyclic db. So apparently (cc @moid) this is not as big a problem as we feared.

This paper ("Efficient Creation and Incremental Maintenance of the HOPI Index for Complex XML Document Collections") describes how to include distances in the indices, which might be helpful for scalable computation of whom to replicate

If we want to incrementally support vertex deletion (i.e. blocking), these papers describe a method, but they come at the cost of larger indices. Since keeping reachability information from a blocked feed in the graph does not hurt (it will only improve the partial order), I suspect assuming an append-only model and doing the occasional cleanup recomputation is the better solution.

Cypherlinks always form (directed) acyclic graphs, which spares us some trouble, as that is what all of these algorithms assume as inputs. For the purpose of reachability computation, arbitrary directed graphs can be converted into equivalent dags. But in a dynamic setting, keeping that representation up to date is not trivial (though possible).

User has not chosen to be hosted publicly
User has not chosen to be hosted publicly
@Dominic %dxlyuIE6o5wvD91/zHE/ijLomkIpA5I6uOWpAEDfZjc=.sha256

I think this is a really interesting problem, but as I found when working on the friends-algorithm rewrite, incremental graph algorithms are a lot more difficult than bulk algorithms.

One thing that would make this really interesting is it would allow us to say whether a message has been acknowledged by the network (move over, Proof of Work, Proof of Stake, here comes Social Proof!)

But I do think this would be really hard to implement efficiently enough.


We have an algorithm in ssb-sort, which is used to sort threads (and calculate the branch property) but it wouldn't be efficient to sort the entire database.

One thing you can do is just give each message a monotonically increasing number, so that it's always at least 1 more than the number of any message it links to. There could be multiple messages with the same number, of course. This means that if a message Y has a number greater than another message X's, Y cannot have happened before X, but it doesn't mean that X happened before Y. They could also be on parallel timelines (concurrent). We had these numbers in ssb-sort, but they turned out not to be useful.
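
A minimal sketch of such a numbering (illustrative names; `numbers` is assumed to already hold the numbers of all known older messages, and msg.links the ids it cypherlinks to):

function assign_number(msg, numbers) {
  let n = 0;
  for (const linked of msg.links) {
    const linked_number = numbers.get(linked);
    if (linked_number !== undefined && linked_number >= n) {
      n = linked_number + 1; // stay at least 1 above every linked message
    }
  }
  numbers.set(msg.id, n);
  return n;
}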

Maybe this would be enough for feeds? (not sure...)

ssb-sort primarily uses causal ordering, and disambiguates with claimed timestamps.


@piet btw, I actually really like SQL, but just not schemas. MFR is sqlish (map=where, filter=select, reduce=group-by)

User has not chosen to be hosted publicly
@aljoscha %3moSS9xHmfYNPHTK0xKFEPMps2d51WKbsj8XQJVFDYQ=.sha256

@Piet Traditional vector clocks are unsuitable for ssb. We have an unbounded number of feeds, and thus no upper limit on the size of the vectors. And no single participant can even know how many other participants there are. There are generalizations that deal with this, for example interval tree clocks. But these mechanisms tend to get pretty complicated. As usual, I try to push the inherently complicated parts into the layer that is least tied to ssb itself, i.e. the database.

@Dominic:

But I do think this would be really hard to implement efficiently enough.

Summarizing the relevant papers listed in this thread is something I want to get to, once the message format updates are done. There's a lot of densely packed information in there, but I think we can distill that into an algorithm that isn't too complicated. I'm happy to take the title of "Dude who reads through specs and papers so others don't have to".

@aljoscha %wtCU/eSLEGcIDvDeWDAajS0kzWgTaheOwKR8h9hNi+E=.sha256

This writeup suggests a few simple heuristics for storing reachability indices, leveraging special properties of ssb. They are not yet suitable for real-world application, but they can serve as the basis for more fully-featured indices.

CC @Piet @dinosaur @moid @Jan van Brügge @Dominic @alanz


Causal Ordering of SSB Messages

Secure scuttlebutt (ssb) is an append-only data store, where all messages form a directed acyclic graph (dag). The messages themselves are the nodes, and references between messages (by means of cryptographically secure hashes) form the edges. An important problem that arises in dealing with this data structure in the context of ssb is the question of causal ordering: Given two messages, which one is older? Message x is (strictly) older than message y (written as x < y) if there is a (directed) path from y to x. Since messages form a dag, it is never the case that both x < y and y < x. It is however possible that there is no path in either direction, in which case it is impossible to say which messages are older. Mathematically speaking, the causal ordering relation is a partial order, not a linear one.

The main focus of this writeup lies on describing strategies for how an ssb database can efficiently answer questions about the causal order between two messages. Other, related questions, such as finding (and possibly filtering) all messages older than some message, or computing topological sortings, are discussed briefly at the end. This writeup describes multiple approaches, starting with very simple ones and advancing to more sophisticated methods. All approaches stay rather simple; they serve as demonstrations of basic concepts rather than as real-world indexing schemes.

Because causal order is determined by the existence of paths, this problem is nothing but the reachability problem on dags. This problem has been widely studied in the database literature; for a survey of existing approaches, study the related work section of Jin, R., Ruan, N., Dey, S., & Xu, J. Y. (2012, May). SCARAB: scaling reachability computation on large graphs. This document does not recapitulate the previous research; instead it examines how some unique properties of ssb can be leveraged for simpler and more efficient solutions than the general case.

A word of warning: Almost all big-O complexity claims in this text are very handwavy. The exact worst-case complexity often depends on many factors, including the total number of messages, the number of edges, the maximum out-degree, the number of feeds, the largest feed size, etc. So take them with a grain of salt, or better yet: Ignore them except as a nice intuition to have.

Basic Reachability

This section gives a short overview of the problem space.

Given the message dag, it is almost trivial to answer whether some vertex v is reachable from another node u: Simply explore all paths originating from u and check whether v is eventually reached. This requires no database indices, but it is too slow for a practical implementation (worst case O(n + m)).

Time complexity can be reduced to O(log(n)) by precomputing the transitive closure of the edge relation, because v < u iff (u, v) is an element of the transitive closure. But the full transitive closure can take up to O(n^2) space, which is too much for practical implementations.

In practice, instead of storing the transitive closure as a complete reachability matrix, one can store at each vertex v the set of all vertices reachable from it (reach(v)). We will refer to such a set of indices as ind(v). This is still in O(n^2) in the worst case, but it is a sparse representation. Since ssb message graphs are relatively sparse as well, this scheme will serve as the basis for the indexing schemes described later.
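
As a sketch, this representation and its use might look as follows (reach is a hypothetical in-memory store; a real database would persist it):

const reach = new Map(); // message id -> Set of ids reachable from it

// Messages are added oldest-first (reverse topological order), so the reach
// sets of all linked messages already exist when a message is added.
function add_message(msg_id, out_links) {
  const r = new Set();
  for (const linked of out_links) {
    r.add(linked);
    for (const transitively_reachable of reach.get(linked) || []) {
      r.add(transitively_reachable);
    }
  }
  reach.set(msg_id, r);
}

// Is x older than y, i.e. reachable from y?
function older_than(x, y) {
  const r = reach.get(y);
  return r !== undefined && r.has(x);
}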

Precomputing the transitive closure is the solution that minimizes lookup time by using as much space as (reasonably) possible. Exploring paths in the message graph however is not the solution that minimizes space usage. To minimize space, you could store the transitive reduction of the message graph, rather than the full graph itself. Reachability queries would then again be answered by following paths, but on the transitive reduction.

It might seem silly to store fewer edges than the original graph has, because those need to be kept somewhere in the database anyways. But at least in this discussion, we will assume a dedicated datastructure just for handling reachability queries.

All solutions are basically interpolating between storing the transitive reduction and storing the transitive closure. Or viewed in another way, all solutions choose a different degree of compression of the transitive closure. The more compressed it is, the less space is required, but the longer it takes to answer queries. Ideally, we'd like to find a solution that allows answering queries in sublinear time, while needing only quasilinear space for indices.

Query time and index space usage are the two main restrictions, but not the only ones. Another important one is the time complexity of computing the indices. And yet another consideration for ssb is that these indices need to be updated often. It should be possible to do so incrementally rather than recomputing indices for the whole database from scratch.

This concludes the short introduction to the reachability problem. The database literature provides approaches for efficiently handling the case of arbitrary graphs, but ssb can rely on a few special properties.

continued in next post...

@aljoscha %6UJXY6pNwSArXWSksLdj8jl5j0euQB7c7sh+E2xlUEs=.sha256

...continuation of previous post

Feeds and Sequence Numbers

Each ssb message is published by an identity, and appended to what amounts to a blockchain, called a feed. As a data structure, this is nothing but a linked list of messages. Any message can thus be uniquely identified by the feed's public key, and its sequence number (the index in the linked list). Notation: a_i refers to the i'th message in feed a. Every message belongs to exactly one feed, and the causal order inside such a feed is a linear order. This is what distinguishes ssb merkle dags from e.g. the ipfs merkle dags. And this partition of the vertex set into totally ordered subsets can be leveraged for compressing the transitive closure of the edge relation.

A first (and somewhat trivial) observation: Given two messages from the same feed, causal ordering can be determined by comparing their sequence numbers, which is much faster than traversing the linked list. This is an optimization every implementation can and should perform. Another, still rather trivial observation: A backlink from a message to another message of the same feed does not need to be stored, as the causal order is implied by the sequence numbers anyways.

A generalization of the latter observation can be intuitively formulated as "A very new message in one feed linking to a very old message in another one often provides no information regarding the causal order". Put formally: For all natural numbers i < j and k < l: if (a_i, b_l) in E and (a_j, b_k) in E, then (a_j, b_k) is not part of the transitive reduction. Proof sketch: There's a path from a_j to a_i since i < j, and there's a path from b_l to b_k since k < l; these paths, together with the edge (a_i, b_l), form a path from a_j to b_k.

This has far reaching consequences, not only for index creation, but also during dynamic recomputation of the reachability relation. When computing the set of nodes reachable from some message a_i, it is sufficient to store only one reachable message per feed, namely the one with the highest sequence number.

The Full Indexing Scheme

Taking these observations into consideration yields a very simple indexing scheme: For each vertex, store which other feeds can be reached, and store the highest sequence number at which they can be reached. The above observations make it possible to quickly restore the full transitive closure. A query only needs to look at the data stored at a single vertex, so query time is essentially optimal (a single lookup of the target feed in that vertex's index).

We define some terminology to make the following explanations more succinct:

  • a vertex a_i shadows a vertex a_j if i >= j
  • a set V of vertices shadows a vertex a_j if there is an i such that a_i in V and a_i shadows a_j
  • a vertex a_i shadows a vertex set V if there is j such that a_j in V and a_i shadows a_j
  • the shadow set sh(V) of some vertex set V is the largest subset of V such that no element in sh(V) shadows another element in sh(V)

The above indexing scheme thus simply becomes "Each vertex v stores sh(reach(v))".
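
To make these definitions concrete, here is a minimal sketch (illustrative names, not an existing ssb API) that represents a shadow set as a Map from feed id to the highest stored sequence number for that feed - by definition, a shadow set never contains two vertices of the same feed:

// vertices are plain objects { feed, seq }

// does vertex a shadow vertex b?
function vertex_shadows(a, b) {
  return a.feed === b.feed && a.seq >= b.seq;
}

// does the set shadow the vertex, i.e. does some element of the set shadow it?
function set_shadows_vertex(set, v) {
  return set.has(v.feed) && set.get(v.feed) >= v.seq;
}

// does the vertex shadow some element of the set?
function vertex_shadows_set(v, set) {
  return set.has(v.feed) && v.seq >= set.get(v.feed);
}

// insert a vertex, keeping only the newest entry per feed (so the set stays a shadow set)
function add_to_shadow_set(set, v) {
  if (!set.has(v.feed) || set.get(v.feed) < v.seq) {
    set.set(v.feed, v.seq);
  }
}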

To create the indices, first compute a reverse topological sorting, so that the vertices can be iterated from oldest to newest. For each vertex, the set of indices is the shadow set of the union of all its out-neighbors' indices (together with the out-neighbors themselves). This simple computation can be used when adding a new message to the database, resulting in efficient incremental updates.
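
A sketch of incremental updates and queries for this scheme, reusing the helpers from the previous sketch (get_index is a hypothetical lookup of the stored index of a message; the out-neighbors of a new message are the messages it links to, which in ssb includes the previous message of its own feed):

// compute the index of a newly arrived message from the indices of the
// messages it links to
function build_full_index(out_neighbors) {
  const ind = new Map();
  for (const n of out_neighbors) {
    add_to_shadow_set(ind, n); // the out-neighbor itself is reachable
    for (const [feed, seq] of get_index(n.feed, n.seq)) {
      add_to_shadow_set(ind, { feed, seq }); // ...and so is everything it reaches
    }
  }
  return ind;
}

// is x causally older than y?
function happened_before(x, y) {
  if (x.feed === y.feed) {
    return x.seq < y.seq; // same feed: just compare sequence numbers
  }
  const ind = get_index(y.feed, y.seq);
  return ind.has(x.feed) && ind.get(x.feed) >= x.seq;
}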

This approach still basically stores the full transitive closure, so space usage is fairly high (although already much better on typical data sets than naively storing all reachable vertices). The following section introduces an indexing scheme that has slightly worse lookup time, but needs less space for the indices.

continued in next post...

@aljoscha %2g/bxlJjCjnehn+K9b9cLH4Z9iay6iPm8S0owMgo4YE=.sha256

...continuation of previous post

O(sqrt(len_a)) Indexing Schemes

Suppose a is a feed of length len_a. In the previous scheme, the vertices store the full (shadow) set of vertices reachable from them. The central idea behind the new scheme is to only cache a subset of this information, and restore the full reachability information via an iterative graph traversal. For this, we need another definition:

  • the shadow difference shdiff(V, W) between two vertex sets V and W is the set containing all vertices v in V for which there is no vertex w in W such that w shadows v

The sqrt-1 Indexing Scheme

The new indexing scheme: Each vertex a_i stores shdiff(reach(a_i), ind(a_(i - sqrt(len_a)))). Just like the previous scheme, the indices can be computed by iterating in reverse topological order. Instead of computing the full reach(a_i), it is sufficient to use the union of the index sets of all out-neighbors of a_i (plus the out-neighbors themselves) as the first argument to shdiff. This also works for incrementally updating vertices.
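
A sketch of index construction for "sqrt-1", again with illustrative helper names (get_index looks up the stored index of a message, add_to_shadow_set is the helper from the earlier sketch):

// shdiff(V, W): keep only the entries of V that are not shadowed by W
function shadow_difference(v, w) {
  const out = new Map();
  for (const [feed, seq] of v) {
    if (!(w.has(feed) && w.get(feed) >= seq)) {
      out.set(feed, seq);
    }
  }
  return out;
}

// stored index of a_i: the shadow set of the union of the out-neighbors'
// stored indices (plus the out-neighbors themselves), minus everything already
// shadowed by the index stored sqrt(len_a) positions earlier in the same feed
function build_sqrt1_index(msg, out_neighbors, feed_length) {
  const union = new Map();
  for (const n of out_neighbors) {
    add_to_shadow_set(union, n);
    for (const [feed, seq] of get_index(n.feed, n.seq)) {
      add_to_shadow_set(union, { feed, seq });
    }
  }
  const back = msg.seq - Math.floor(Math.sqrt(feed_length));
  const earlier = back >= 0 ? get_index(msg.feed, back) : new Map();
  return shadow_difference(union, earlier);
}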

Only storing the shadow differences can greatly reduce space usage. Answering whether b_j < a_i is done via the following algorithm:

// Returns `true` if b_j is older than a_i. A return value of `false` means that
// b_j is not older than a_i, but not necessarily that b_j is newer than a_i
// (the two messages may be incomparable).
function older(b_j, a_i) {
  let current = a_i;

  while (current != null) {
    if (shadows(current.ind, b_j)) {
      // the cache knows a message of b_j's feed with a sequence number >= j,
      // so b_j is reachable
      return true;
    } else if (shadows(b_j, current.ind)) {
      // the cache only knows a message of b_j's feed with a lower sequence
      // number, so b_j is not reachable
      return false;
    } else {
      // the cache says nothing about b_j's feed: hop back to the vertex this
      // cache's shadow difference was computed against
      current = get_message(current.feed, current.seqnum - Math.floor(Math.sqrt(current.feed.length))); // null if the sequence number becomes negative
    }
  }

  return false;
}

In the worst case, this requires len_a / sqrt(len_a) = sqrt(len_a) iterations.

Whenever the feed length reaches a new square number, all indices need to be recomputed. This yields amortized O(sqrt(n)) incremental index update costs (every sqrt(n) messages, all n messages need to be scanned). Where this is undesirable, the scheme can instead use the current sequence number for determining how much data to cache in some vertex. Vertex a_i would store shdiff(reach(a_i), ind(a_(i - floor(sqrt(i))))); the query algorithm needs to be updated accordingly.

The sqrt-2 Indexing Scheme

Now we will introduce another concept for significantly reducing index size, while keeping roughly the same average-case query time at the cost of worse worst-case query time. The core idea is that not every vertex needs to store index information, instead it suffices if only some deliberately placed "cache vertices" store indices.

The "sqrt-1" scheme is modified to only store data in every sqrt(a_len)-th vertex. Intuitively, to answer a query, you first traverse the feed until you hit a cache vertex, and then you traverse the cache vertices. Finding the first cache vertex only takes sqrt(a_len) iterations in the worst case, so the overall time complexity stays the same. The total index size however is divided by sqrt(a_len), which makes an actual difference (O(a_len) turns into O(a_len / sqrt(a_len)) = O(sqrt(a_len))).

The previous paragraph omits an important fact that worsens the query time: In addition to finding the first cache vertex, we also need to consider everything reachable from the starting vertex up until the first cache vertex. This can include outgoing edges to vertices on other feeds, which might not be cache vertices either. To answer a query, you need to recursively compute temporary index sets, i.e. shadow sets of the union of all out-neighbors' index sets (which may again get computed on the fly). In the worst case, this can cascade across many feeds and messages. It is even possible that reachability information for multiple vertices from the same feed needs to be computed.

This scheme of scattering cache vertices is thus only reasonable if we can assume that message graphs which result in computationally expensive queries are unlikely. Which they are: such queries only happen if a large swath of new messages from different feeds reference each other, and the new messages also happen to not include any cache vertices. So high query times are unlikely to happen by accident, and they would require a high number of malicious feeds to use them for denial of service attacks, which could be mitigated by simple blocking.

There are also a variety of mechanisms to improve these queries: When choosing between different out-neighbors of a vertex, implementations should prefer vertices with a low distance to the next cache vertex, to increase the probability that the query can be directly answered from that cache (and its preceding caches). Additionally, implementations can explore the graph in a breadth-first search (rather than the depth-first search you'd get from the obvious recursive implementation) to minimize the probability that the same feed needs to be visited multiple times. To protect against malicious input, it might make sense to include some randomization. And finally, if answering a query has required traversing many nodes, the implementation can simply add the newly computed shadow sets to appropriate vertices, speeding up any subsequent queries in the same area of the graph. This last point increases the complexity, but also makes the whole approach much more viable.

Due to these considerations, any indexing scheme presented in this document can be modified to only store indices every so often. Since this is an engineering concern rather than a real design question, we will ignore it from now on.

continued in next post...

@aljoscha %WpD965FnDSGXxBoWous/kepl1JcG9VqhlKnpINmUydk=.sha256

...continuation of previous post

O(log(len_a)) Indexing Schemes

The previous schemes saved a lot of index space, but query time became worse than logarithmic. In this section, we'll propose two indexing schemes that can answer queries in logarithmic time, while still taking up much less space than the "full" indexing scheme.

"sqrt-2" worked by storing shadow differences to previous vertices in regular distances. Choosing any distance other than sqrt(len_a) gives worse results: For smaller distances, more iterations are needed to hop between cache vertices, for larger distances, it takes more iterations to find the first cache vertex. Even for "sqrt-1", larger distances are a problem, they incur larger cache sizes. Intuitively speaking, there's redundancy among the caches of successive vertices, and the distance hops serve as barriers against that redundancy. Increase the distances, and much more redundant data is stored.

To design a scheme with logarithmic lookup time, it is thus not sufficient to simply store shadow differences to a_(i / 2), as that would bloat indices. But still, logarithmic lookup time requires skipping over many vertices during the query. To solve this, we choose a scheme where only a few vertices allow jumping far back during queries, and all other vertices jump towards those. It can be thought of as the traversal of a binary tree from an arbitrary node towards its smallest leaf, which takes at worst twice the height of the tree.

The log Indexing Schemes

"log-1": Each vertex a_i stores shdiff(reach(a_i), ind(a_(traverse(i)))), where traverse(i) is defined as follows:

  • if i is a power of two, traverse(i) = i / 2 (i.e. a bitshift to the right)
  • else, traverse(i) is a bitshift to the left: Shift the binary representation of i one bit to the left (inserting a 0 as the new least significant bit), dropping the most significant bit (see the sketch after this list).
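
As a sketch (assuming i >= 1):

// traverse(i) as defined above: halve powers of two, otherwise drop the most
// significant bit and shift left, e.g. traverse(13) = 10 (1101 -> 1010)
function traverse(i) {
  if ((i & (i - 1)) === 0) {
    return i >> 1; // i is a power of two: bitshift to the right
  }
  const msb = 1 << Math.floor(Math.log2(i)); // highest set bit of i
  return (i - msb) << 1; // drop it, then shift left with a 0 as the new least significant bit
}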

Index creation and queries proceed analogously to "sqrt-1". Unlike "sqrt-1", the "log-1" scheme does not require any reindexing when appending a vertex to the feed. Analogously to how "sqrt-2" is derived from "sqrt-1", you can derive "log-2" from "log-1" by using every log_2(len_a)-th vertex as a cache vertex.

There's really not a lot to say about this scheme that hasn't been said about the "sqrt" schemes, except that it is surprisingly elegant and powerful.

Reality Checks

Everything written so far assumes a very simple model: Append-only dags where new edges are only introduced as outgoing edges of a new vertex, and no queries other than binary reachability queries. The real setting of an ssb database is more complicated.

Blocking and Selective Deletion

Blocking a feed means removing all the feed's messages from the message graph. Selective message deletion (enabled by #offchain-content) allows removing arbitrary messages from the message graph. Both of these affect indices. It is simple to remove edges from the graph, but the problem is that their reachability information stays cached in other vertices' indices. When querying information about new vertices, they will iteratively ask those same caches, so the information would linger in the system forever.

There are two trivial solutions to this: First, you can simply ignore the problem. As long as those indices are only used for causal ordering, the additional edges don't hurt - they only tighten the partial order. The other trivial approach is to rebuild all indices in the background whenever a vertex/feed is deleted. This approach is expensive however, making vertex deletion at least O(n).

A smarter solution would start by deleting the vertex and its edges, and then checking locally which caches are affected by this, updating them as needed. The computationally expensive aspect of this is finding out whether removing an edge really removes a vertex from a cache, or whether it can still be reached along different paths. Also, there can be many caches affected by a single vertex deletion.

If blocks/deletions are expected to be rare events and to primarily happen on messages with relatively few outgoing links (such as human-readable texts), it might be ok to keep the indices simple and have deletion be somewhat expensive (especially since it can happen in a background thread). Otherwise, it can make sense to increase the space usage of caches by storing information about how many paths there are that allowed reaching any particular message. Or you could use bloom filters to store the set of messages that refer to each message. There are probably more sophisticated approaches for leveraging ssb's properties for simpler vertex deletion, but this document is already getting long enough.

Out Of Order Messages

Out of order messages are messages for which the full feed is not known. These would need to be handled differently than normal feeds.

Messages can contain cypherlinks to other messages that the local database may not know about. Once such a message becomes known, it may provide more reachability information, complicating incremental index maintenance.

Topological Sorting

The database should be able to efficiently output a topological sorting of the vertices based on causal order, which the proposed indices do not support out of the box. An additional consideration is storing backlinks, which may make it easier to efficiently compute topological sortings (or reverse topological sortings) on the fly. Incidentally, this sort of additional information would also speed up vertex deletion.
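
For illustration, a generic Kahn-style sketch that emits messages oldest-first - it is not derived from the indexing schemes above, and assumes hypothetical out_links(id) and backlinks(id) lookups over a closed set of known messages:

function topological_sort(all_ids) {
  const pending = new Map(); // id -> number of linked (older) messages not yet emitted
  const ready = [];
  for (const id of all_ids) {
    const links = out_links(id).length;
    pending.set(id, links);
    if (links === 0) {
      ready.push(id); // messages without cypherlinks can be emitted first
    }
  }

  const order = [];
  while (ready.length > 0) {
    const id = ready.pop();
    order.push(id);
    for (const newer of backlinks(id)) {
      const left = pending.get(newer) - 1;
      pending.set(newer, left);
      if (left === 0) {
        ready.push(newer); // everything this message links to has been emitted
      }
    }
  }
  return order;
}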

Custom Indices

In addition to reachability data, these indexing schemes could be used to store custom indices, such as reachability via only certain message types, or only via certain entries of a message. If the database exposes the ability to create such custom indices, it becomes more important to correctly and efficiently handle vertex deletion.

Another possibility for such custom indices is k-reachability (reachability where the shortest path connecting two vertices may have length at most k).

Conclusion

As the previous subsections show, the indexing schemes presented here are not sophisticated enough for a real-world ssb implementation. But they demonstrate how feed structure can be leveraged for efficient indexing schemes, and can serve as the basis for more complex ones.

User has not chosen to be hosted publicly
@aljoscha %nV/58IwSCEniUelpp0I7nUzjF/IgM4FepQhgxJennXI=.sha256

@moid One intuition I utilized to think about this is a sort of feed hopping. Within a single feed, you can quickly find all information; the hard part is finding a path to a new-ish message in another feed, which is done by traversing other feeds. Once you do this dynamically, you get pretty bad running times, so the indices cache all of that. I thought a lot about what you were suggesting, but couldn't find a good scheme. It amounts to dynamically iterating across feeds, rather than caching everything that crosses feed boundaries. But since a single message can have many out-neighbors among many feeds, you don't really get a useful upper bound on running time. And it is trickier to sparsely place caches.

The best I could come up with were randomized schemes where there is a certain probability (dynamically adjusted depending on factors like total vertex number, feed number, average feed length) that instead of adding information from a cross-feed link to the cache, you'd have to follow that link.

There's a lot of room for engineering/optimizing here, but it didn't make it into the introductory post.

Quick aside: Instead of the follow graph (which is not really a core feature of ssb), you can simply impose an arbitrary order on cypherlinks, e.g. lexicographic order.


By the way, I think the conclusion that these were not suitable for real-world settings was overly pessimistic. Handling dangling cypherlinks is somewhat tricky but fine complexity-wise (time and space); for topological sorting you only need to store the set of messages without incoming cypherlinks (bounded by the number of feeds); for reverse topological sorting you'd need to store backlinks and the set of messages without outgoing links (again bounded by the number of feeds). While the reachability indices can ignore redundant cypherlinks, it can still make sense to store them in the same data structure, to allow the usual queries like "Give me all messages linked to by message foo"; they just wouldn't be cached. And as for deletions, when only using this for order queries and topological sorting, ignoring those (except for some obvious optimizations) really is a viable option.
