@cryptix %i6pDs9WVWuG6QvcUIDejRumMj0Q0pFH6XD5SDq7vSUQ=.sha256

about time I start my #dev-diary for #ngipointer, too.

A lot has happened already since I started. The areas I'm mostly investing my time in are private-groups and EBT for #go-ssb, scheduled to be done for January.

I'm getting a bit anxious about the latter since the database changes to get private-groups usable ate a lot more time than I would have liked. So I might cut some corners in EBT by rigging something up that is functional but a bit more wasteful than I would have liked. More on that later.

First I want to talk about what is happening with groups and the indexing system.

@cryptix %74dHfE8OI267N16sAVuSsYL2MZLXiz84kFzgK2ELmCk=.sha256

So... to the point: are private-groups usable in go-ssb? Sadly, the answer is yes and no.

I took over some decryption code for groups that keks wrote around October 2019. He already did the key derivation and (un)boxing of bytes back then, but what was missing was automatic decryption and (re)indexing. As you join or are invited to groups, the database needs to re-read all the messages by that other member, find invites to other people and read those as well. Care must also be taken when crafting invite messages to a group to point to the latest tangle state.

Additionally, the old indexing system had no notion of encrypted messages at all. private.read was just an index orthogonal to all the others. Meaning you could read all the decrypted messages, but you couldn't query: all the messages in this thread, unboxed, please.

So I did all of that, and there is now an API to craft those invites as well as normal post messages to a group.

The code currently lives here, on the appropriately titled Private groups mega PR. The current breakdown is 58 commits with +7,688 −2,105 changes. At some point it just became too big, so to add more to it I started Leftovers for private groups #67, which does most of the things above minus the re-indexing. Its text will be the basis for the next post, before I talk about the re-indexing.

@cryptix %bd2oPU8YVFzora4rY8u4i1no9mDHRLt8CaXzlSAMxEU=.sha256

Some of these are more cosmetic, but they annoyed me for a while, and I hope to make the codebase more approachable by having consistent names and the like.

move keys to private/keys (bed7170)

This was confusing since there is also a keys.go in the root of the repo for the signing keypair.

Right now these keys are used only for private messages, so move the package there to make this clearer.

rename sbot.RootLog to sbot.ReceiveLog (ad164d7)

the stream of messages as they arrive

The old term wasn't saying much and also collided with tangle root.

cc https://github.com/ssb-ngi-pointer/jitdb/issues/50

sort by timestamp support (61c5c03)

supports three domains: claimed, received and feed sequence(*)

also supports createFeedStream and messagesByType sorting by claimed timestamp. (TODO: createUserStream)

*: this will become important later once feeds are received out of order.

ditch func (r ref) StorageRef() librarian.Addr from refs package and use TFK instead

See 5ce3d02 for the gory details. tldr: the refs package has fewer internal specifics from go-ssb.
(not merged yet, uses with-branches branch. to be merged with the tangle sort fixes)

fix tangle sort, add Heads() MessageRefs (69e6790)

go.mindeco.de/ssb-refs@v0.1.0 can sort slices of TangledPost now. It's a bit cumbersome to use because of Go's type system: a slice of refs.Message needs to be copied into a slice of refs.TangledPost before it can be assigned to (refs.ByPrevious).Items to be sorted.
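A minimal sketch of that copy step. It assumes the concrete message type also implements refs.TangledPost and that a refs.ByPrevious value can be handed to sort.Sort as-is; any extra setup the real type wants (like a tangle name) is glossed over here:

```go
package tangleutil // hypothetical helper, not part of ssb-refs

import (
	"sort"

	refs "go.mindeco.de/ssb-refs"
)

// SortByTangle shows the copy step: Go has no implicit conversion from
// []refs.Message to []refs.TangledPost, so every element is copied
// (and asserted) by hand before sorting.
func SortByTangle(msgs []refs.Message) []refs.TangledPost {
	tangled := make([]refs.TangledPost, len(msgs))
	for i, m := range msgs {
		// assumption: the concrete message type also implements TangledPost
		tangled[i] = m.(refs.TangledPost)
	}
	sort.Sort(refs.ByPrevious{Items: tangled})
	return tangled
}
```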

start interop testing against ssb-tribes

I added two cases to the tests that run against the javascript implementation. One where the JS side creates a group and invites the Go side, the other one does it in reverse (Go creates and invites JS).

This helped me iron out some problems in the DM key derivation (used for 1:1 messages and group invites), and to catch that the cloaked groupID is actually derived from the read key, not the group key. So this works now, which is great!

On the negative side, I found two issues that feel like races on the JS side. They both only show up if you run the tests over and over again (simply done using go test -count 100). I opened ssbc/ssb-db#316 and ssbc/ssb-db#317 respectively.

add private:bool query parameter

Added to messagesByType and tangles.replies, but there is lots of TODO about it still. Before I add it to createLogStream I want to find a solution to the following, but that can happen later.
First, the code is terribly repetitive and doesn't fold in nicely with the existing margaret query code; similar to the live:bool code, there are now two versions to drain a query to muxrpc. Consequently it isn't straightforward to add this to the internal indexing, to support private contact messages for instance.
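To make the shape concrete, the argument object for such a query might look roughly like this on the Go side (a hypothetical struct, not the actual go-ssb plugin type):

```go
package query // hypothetical

// messagesByTypeArgs sketches the argument object for such a query;
// the json field names mirror the muxrpc parameters discussed above.
type messagesByTypeArgs struct {
	Type    string `json:"type"`
	Private bool   `json:"private"` // also return unboxed private messages?
	Live    bool   `json:"live"`    // keep the stream open for new arrivals?
	Limit   int64  `json:"limit"`
}
```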

Another missing feature from the application perspective is how to publish. As it stands, an application needs to use groups.publishTo(groupID, { /* content */ }), and this doesn't play nice with existing clients which use publish({content, recps}), looking at the recps array to do the right thing. Using the existing code would leak messages since they would just be posted as-is. (edit: this is no longer the case; as of f531eb9 the muxrpc publish call snoops into the content object to look for an array of recps and does the right thing.)
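A rough sketch of what such snooping can look like, with illustrative names only (the actual f531eb9 code will differ):

```go
package publishutil // hypothetical; not the actual go-ssb publish path

import "encoding/json"

// recpsOf peeks into a draft message's content for a recps array.
// If one is found, the publish path boxes the message for those
// recipients instead of posting it as-is.
func recpsOf(content json.RawMessage) ([]string, bool) {
	var probe struct {
		Recps []string `json:"recps"` // simplification: real recps can hold more than plain strings
	}
	if err := json.Unmarshal(content, &probe); err != nil {
		return nil, false
	}
	return probe.Recps, len(probe.Recps) > 0
}
```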

@cryptix %lcjr+sF+ZLtEwnIfIObuf99nG5ps86qPzW8VbA3yP/Y=.sha256

What I totally skipped in that enumeration was the change to the combined indexing setup.

As a refresher, before this change indexes were loaded/mounted as plugins. There was one for root:%yyy, one for author:@xxx, one for content.type:zzz. This worked mostly okay as long as they were truly orthogonal to each other, meaning they don't influence or depend on each other. The only one for which you can somewhat say this is author: no amount of other information will change its mapping. root: and type:, however, look vastly different depending on the content of a private message. Additionally, the contact graph and about/names indexes would ideally define their filter criteria on type: to read fewer messages, but in the old system all the indexes read all the messages in the receive log individually, resulting in poor performance because of lots of overhead.

While I'm sure it is possible to design an abstract stream-processing system in which this can be cleanly modeled, I took the opposite approach.

There now is what I called a combined application index, which is in charge of filling author, type and root while also supporting unboxing of private messages on all of those. This means it's finally possible to do messagesByType contact private:true and put follow/block messages in groups, or tangles.replies %theThread private:true to read a private thread..! I'm quite excited about this since the lack of it has always blocked go-ssb from being used as a real application bot, not just a replication server.

Since the combined index fills all the sub-indexes at once, this also removed the huge read overhead, resulting in about 1 minute to index what took at least 5 times as long before.
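To sketch the single-pass idea (all names below are made up, not the actual go-ssb types): each receive-log entry is unboxed at most once, and every sub-index is fed from the result, instead of every index re-reading the whole log on its own.

```go
package combined // hypothetical sketch, not the actual go-ssb package

import (
	"encoding/json"

	"github.com/RoaringBitmap/roaring"
)

// combinedIndex keeps one bitmap of receive-log sequences per author,
// per content type and per tangle root, plus one for everything we
// managed to decrypt.
type combinedIndex struct {
	byAuthor, byType, byRoot map[string]*roaring.Bitmap
	private                  *roaring.Bitmap
}

func get(m map[string]*roaring.Bitmap, k string) *roaring.Bitmap {
	if m[k] == nil {
		m[k] = roaring.New()
	}
	return m[k]
}

// process is called once per receive-log entry; unboxing is attempted
// a single time and all sub-indexes are updated from the result.
func (idx *combinedIndex) process(rxSeq uint32, author string, raw []byte) {
	content, decrypted := tryUnbox(raw) // stand-in for the real box/box2 code
	if decrypted {
		idx.private.Add(rxSeq)
	}
	get(idx.byAuthor, author).Add(rxSeq)

	var hint struct {
		Type string `json:"type"`
		Root string `json:"root"`
	}
	if json.Unmarshal(content, &hint) != nil {
		return // still boxed or not JSON: nothing more to index
	}
	get(idx.byType, hint.Type).Add(rxSeq)
	if hint.Root != "" {
		get(idx.byRoot, hint.Root).Add(rxSeq)
	}
}

// tryUnbox is a placeholder; the real thing attempts private-message
// decryption and reports whether it succeeded.
func tryUnbox(raw []byte) ([]byte, bool) { return raw, false }
```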

@cryptix %ZYwmPHNdXXhzgskG5Aticp1YRMiB22DEKZHEviik0/c=.sha256

At this point you might ask, but why did you say "no" to usable then?

The answer and unsolved problem lies in the timestamp sorting I convinced myself I had to add to have a usable database as a whole. While this is functional, there are a few bugs around nulling and re-fetching feeds, which happens if you block someone, for instance. I haven't been able to reproduce them in tests yet.

These bugs might cripple your queries until you delete the index folder manually for re-indexing, and putting it out there in this state seemed like the opposite of what I was shooting for (a usable database for clients). So I'm haunted a bit by wanting too much, and stretched by the need to get started on EBT.

OTOH this feature was also needed for partial replication, so it seemed like a given to just do it. Not the timestamps (we all agree claimed ts is evil, of course) but the feed sequence itself. Once feeds are saved out-of-order in the receive log we need a way to assemble their intended order again.

@arj shared his solution for jitdb with me and I must say it's glaringly simple and awesomely fast. For each entry in the receive offset log you hold the value you want to sort or paginate by in a gigantic array, and serialize that to and from disk on startup. I implemented three domains for now: claimed ts, received ts and feed sequence.
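A minimal sketch of that idea, with hypothetical names (this is not the jitdb or go-ssb code), assuming one fixed-size value per receive-log entry:

```go
package tssort // hypothetical package and type names

import (
	"encoding/binary"
	"io"
)

// Sorter holds one value per receive-log entry (claimed ts, received
// ts or feed sequence); the slice index is the receive-log sequence.
type Sorter struct {
	vals []int64
}

// Append records the sort value for the next receive-log entry.
func (s *Sorter) Append(v int64) { s.vals = append(s.vals, v) }

// Save writes the whole array to disk; Load reads it back on startup.
// If entries disagrees with the receive-log length, the index is
// inconsistent and needs a rebuild (the failure case mentioned below).
func (s *Sorter) Save(w io.Writer) error {
	return binary.Write(w, binary.LittleEndian, s.vals)
}

func (s *Sorter) Load(r io.Reader, entries int) error {
	s.vals = make([]int64, entries)
	return binary.Read(r, binary.LittleEndian, s.vals)
}
```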

I think and hope I just messed up the serialization somewhere, and it then hits a consistency failure case (len(rxlog) != len(seqsorter)).

I hope to get this done done once I have made some headway on EBT, but right now there is too much ground to cover to keep trailing this for another week.

@cryptix %S95tAx58EWXVOCjDbRO9wl9CgVc/TUGvjXUSeRPLSQ0=.sha256

one more thing on the re-indexing, from PR68:

This takes advantage of set operations on roaring bitmaps that we keep per author, per message type, and per decryptable-by-us. This way we can construct the set of which messages of this author are box2 but not yet readable, and just reindex those.
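In terms of the roaring library (github.com/RoaringBitmap/roaring), that reindex set is roughly the following intersection and difference; the bitmap names and values are illustrative:

```go
package main

import (
	"fmt"

	"github.com/RoaringBitmap/roaring"
)

func main() {
	// receive-log sequences per category (values are made up)
	byAuthor := roaring.BitmapOf(1, 4, 7, 9) // messages by this author
	box2 := roaring.BitmapOf(4, 7, 8, 9)     // box2-encrypted messages
	readable := roaring.BitmapOf(4)          // already decryptable by us

	// box2 messages by this author that we can't read yet
	reindex := roaring.And(byAuthor, box2)
	reindex.AndNot(readable)

	fmt.Println(reindex.ToArray()) // [7 9]
}
```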

The first attempt showed a problem with the bitmap indexes where the order is not kept as (previous) messages are added. I.e., the first message is A and the set is [A]. As a message B which comes before A is discovered (decrypted), it gets added to the same set, but since its receive sequence is smaller than A's the set looks like [B, A], and reading the 2nd message from that set will result in processing A twice (see the small demo below).

As a workaround I'm loading an older badger multilog index which keeps the order (the set is [A,B]). A better approach would be to use the serialized array functionality which we use for the timestamp sorting.
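A tiny demonstration of that pitfall, again using the roaring library: bitmaps iterate in ascending value order, not insertion order.

```go
package main

import (
	"fmt"

	"github.com/RoaringBitmap/roaring"
)

func main() {
	set := roaring.New()
	set.Add(7) // message A, indexed first
	set.Add(3) // message B, decrypted later but earlier in the receive log
	fmt.Println(set.ToArray()) // [3 7]: reading "the 2nd" message yields A again
}
```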

@cryptix %KOSpV4TCmDaYbn0TUpfVbAEP/jS25NTuLxcZ8TXKFLc=.sha256

And another note-to-self / dragon-territory warning is the whole sort-by-timestamp code. None of it folded cleanly into the existing margaret/luigi API, and so the functions that support it (like messagesByType and createLogStream) now basically have two giant branches, one implementing sorting with bitmap operations and one for the classic margaret streaming-query API.

At some point I'd like to revisit this and make a much nicer iterator API, which would also address the major refactors around all the empty interfaces, but that is sadly out of scope for the NGI grant.
