You are reading content from Scuttlebutt
@andrestaltz %HqvC29wmrW64/mbsL+6Je4mNwMNazxW9iSBb3Wya70c=.sha256

My #dev-diary for #ngipointer

Quite similar to arj's dev diary, I'll begin tracking here what has kept me busy in this project. We've done quite a lot already, so I'll try not to describe the past, but only the present, one day at a time. As a summary, we've built so far ssb-fixtures, jitdb, and ssb-db2.

After getting familiar with the bitvector magic @arj has done in jitdb, I spent the last 24 hours coding an idea for jitdb that I call the "radix idea". I got some preliminary results on a local branch, and performance seems quite fast. I wanted to experiment with the radix idea sooner rather than later, because if it works out well, we can (and should) refactor several other parts of ssb-db2. Last night I got the basics done; after I hit publish here I'll continue polishing the radix idea, its corner cases, and tests.

@andrestaltz %CdfeqdnNUOJhtkOSiJEb6axt0fdDwgoRMLyjoJpX1lY=.sha256

2 hours later: draft PR for prefix indexes in JITDB!

@andrestaltz %hTIcItt1J9Heiw4qczDHL2rJHkW9snkfpAwx2mTL3vs=.sha256

Finished my PR for prefix indexes, and I'm getting good vibes about this idea; I made it run even faster with some performance tricks. Also helped arj fix a weird `this` and secret-stack bug in db2 here.
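To illustrate the general idea behind a prefix index (this is a simplified sketch, not the actual jitdb implementation): instead of indexing full values such as 44-character message IDs, you store only the first few bytes of each value in a small fixed-width table. A lookup scans the tiny prefix table to find candidates, then confirms each candidate against the full value to rule out prefix collisions.

```javascript
// Simplified sketch of a prefix index. PREFIX_LEN and the function
// names are illustrative, not taken from jitdb.
const PREFIX_LEN = 4;

function buildPrefixIndex(values) {
  // One small fixed-width slot per record; tiny compared to full values.
  return values.map((v) => v.slice(0, PREFIX_LEN));
}

function lookup(prefixIndex, values, target) {
  const targetPrefix = target.slice(0, PREFIX_LEN);
  const results = [];
  for (let i = 0; i < prefixIndex.length; i++) {
    // Cheap prefix comparison first, then confirm against the full
    // value so prefix collisions don't produce false positives.
    if (prefixIndex[i] === targetPrefix && values[i] === target) {
      results.push(i);
    }
  }
  return results;
}

const msgIds = ['%Hqv...a=.sha256', '%Cdf...b=.sha256', '%HqX...c=.sha256'];
const idx = buildPrefixIndex(msgIds);
console.log(lookup(idx, msgIds, '%Cdf...b=.sha256')); // → [1]
```

The win is that the prefix table is small enough to keep in memory and scan quickly, while the (rare) full-value confirmations are the only expensive reads.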

@Anders %OaT6TgLAAg1098bGtZqr6tDn9mNUm2/3Ke2idmYtIDc=.sha256


There are many areas where we have improved performance. Starting from the lowest level, the flume log, we have these numbers:

async flumelog:

| name | ops/second | mb/second | ops | total-mb | seconds |
|------|-----------:|----------:|----:|---------:|--------:|
| append | 923964.807 | 138.002 | 4620748 | 690.149 | 5.001 |
| stream | 1059075.865 | 158.182 | 4620748 | 690.149 | 4.363 |
| stream no cache | 1102803.818 | 164.713 | 4620748 | 690.149 | 4.19 |
| stream10 | 2540947.641 | 379.51 | 12714902 | 1899.068 | 5.004 |
| random | 39715.656 | 5.931 | 198618 | 29.664 | 5.001 |

flumelog offset:

| name | ops/second | mb/second | ops | total-mb | seconds |
|------|-----------:|----------:|----:|---------:|--------:|
| append | 306180.037 | 45.74 | 3064556 | 457.817 | 10.009 |
| stream | 294511.348 | 43.997 | 2945408 | 440.017 | 10.001 |
| stream no cache | 327724.949 | 48.959 | 3064556 | 457.817 | 9.351 |
| stream10 | 452973.302 | 67.67 | 4530186 | 676.776 | 10.001 |
| random | 28774.712 | 4.298 | 287891 | 43.008 | 10.005 |

Next there is how we handle encrypted messages, where the overhead is more or less constant, meaning it doesn't matter how many indexes you have. And for private groups the improvements should be even bigger.

Then there is the way the indexes are handled. Currently we have more specialized indexes, though I don't have raw numbers for those. This will get even better once we incorporate the prefix indexes. Overall these index improvements also mean smaller indexes, so we save some disk space.

Overall I think we could end up with something like a 10x improvement once things are settled. We should know more in around 3 weeks. I plan on using the excellent fixtures to see exactly how much we gain. Currently head-deep in the implementation of all of this.

Another thing that I'm quite excited about is how the go implementation is coming along. I did some benchmarks a few months ago and the improvements for replication are quite good. I'm also looking forward to changing my server over to the go bot once we have EBT. @cryptix is working on that now.

Remote go:

full feeds: 2.797s
partial feeds: 30.752s

Remote js:

full feeds: 4.026s
partial feeds: 51.746s
@andrestaltz %5GffWUxRfoYP1YyOkXv84uE3Vr5B2jwCUEyyjVgVA1A=.sha256

Today's diary entry: that PR got merged, prefix indexes are going to be useful in ssb-db2, and late at night I sketched some ideas about inverted indexes.

@andrestaltz %t9oUFgY6/V/yH8DngO0BJN1I2Y5pMTWoOCKL1FJ46/Y=.sha256

Yesterday I made ssb-fixtures support generating messages for private groups (ssb-tribes), because we'll need that to test what performance looks like, including when decrypting messages for either box1 or box2.

I also did some refactors of jitdb and ssb-db2, making jitdb support null and undefined, and removing some leveldb indexes from ssb-db2, converting them to prefix indexes.

@andrestaltz %Y6LwwMA7iYZ8KIlJzN23lXREw/So07ZcJAxZazSNHUA=.sha256

Yesterday I worked 10 hours (hold onto your seats; precious hands, don't go RSI-ing again!) on various ssb-db2 things, bug fixes, etc. Surprisingly, the bugs just keep coming one after the other.

I also wanted to quickly experiment with leveldb alternatives. I looked into cacache, which is a persisted cache built by the npm team, supposedly "really fast, really good at concurrency, and it will never give you corrupted data". I tried using cacache for storing the mentions of each feed. So key=feedid, value=array-of-msg-sequences. Nope, it wasn't nearly as fast as leveldb; in fact the query didn't complete within 60 seconds. So I dropped that.

Then I gave level-rocksdb a shot (RocksDB is a fork of LevelDB built by Facebook, meant for servers). It was basically 10% slower than normal leveldb. Then I tried level-mem, which has no persistence and does everything in memory; I wanted to see whether working with things in memory makes anything faster. Nope. So I decided I'm done with leveldb experiments. And cacache too, by the way.

I began thinking about inverted indexes again, but it was a lot of work to experiment with that. I kept thinking about the problem of mentions. With cacache, it behaved a bit like your blobs folder: a bunch of nested folders for various hex prefixes, e.g. ./a8/77/theactualfile. And with that I think it made sense to me that each of the files should be small, so it's quick to save as well as quick to load into memory, and doesn't occupy a lot of memory when loaded. And that reminded me of jitdb bitvector indexes, which are approximately 150KB each.

So that brought me back to basic jitdb bitvectors, and I realized that the primary difficulty with mentions is that they're an array, while fields like root are just plain strings, which allow you to do a simple EQUAL comparison. So I built an INCLUDES operation in jitdb that checks whether an array contains something. To build that, I had to fix some tiny issues in bipf. It worked! Then, because mentions are usually not an array of strings but an array of {link,name} objects, I made an additional pluck function that allows you to compare a field inside each array item. And that worked too! I hacked it into ssb-db2 almost at midnight, and ripped out leveldb. I'm now thinking what else can be done with jitdb...
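The "includes with pluck" idea can be sketched in plain JS (this is a conceptual sketch, not the actual jitdb API): since mentions is an array of {link,name} objects, a plain EQUAL comparison can't match it; instead we check whether the array includes a value, optionally plucking one field out of each item before comparing.

```javascript
// Conceptual sketch of INCLUDES with a pluck function; the names and
// shapes here are illustrative, not jitdb's real operator interface.
function includes(arr, target, pluck) {
  if (!Array.isArray(arr)) return false;
  // Compare either each raw item, or one plucked field per item.
  return arr.some((item) => (pluck ? pluck(item) : item) === target);
}

const msg = {
  content: {
    type: 'post',
    mentions: [
      { link: '@alice', name: 'alice' },
      { link: '@bob', name: 'bob' },
    ],
  },
};

// Pluck the `link` field of each mention before comparing:
console.log(includes(msg.content.mentions, '@bob', (m) => m.link)); // → true
console.log(includes(msg.content.mentions, '@carol', (m) => m.link)); // → false
```

In jitdb the same comparison runs against bipf-encoded records while building a bitvector, but the matching logic per record is essentially this.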

@andrestaltz %sSxg8Ijf55lXEJ14+F2LC9e9pWaq8mr7ZiZ8m7ZzqJo=.sha256

Since last thursday, I did this:

  • a bunch of refactors and tiny fixes to ssb-db2 and jitdb
  • completed that INCLUDES idea, made it official in jitdb, and used it in ssb-db2 for mentions
  • found a 2000x (yes 2000x) speedup fix in jitdb:
  • began putting ssb-db2 in Manyverse just for likes, and ran into a bunch of bugs, began solving those bugs
@andrestaltz %TaE6He6/wAQ5gIyGmh5SHHUZyrvLMUNFe/PneR+4Hos=.sha256

I began writing about Rooms 2.0, I don't have much to show yet.

In the meanwhile, I stumbled upon LMDB, an alternative to LevelDB (I can't get over LevelDB, huh?). See its Node.js bindings. Also see this benchmark page of (a fork of) LevelDB versus LMDB. I must resist the temptation to try using it.

@andrestaltz %ay+J81QWbeT3SPhjeBP8C5GtZGKSUxsVV5quc4TsPoU=.sha256

Wrote a lot about Rooms 2.0 in my new markdown wiki powered by Obsidian, looking for feedback (of any kind, open a GitHub issue or GitHub PR, or comment here).

@andrestaltz %RRucLOlV5I11uRIj+rELu9NnL37o8OK5YZProxR70Cs=.sha256

So, I spent 80% of this week on non-coding work: writing room docs, preparing a talk for FOSDEM, communicating, coordinating, and in general engaging online ever since Trump's coup.

But on Thursday and Friday I coded, basically polished some final bits of JITDB and ssb-db2 (such as adding support for the not() operator), and then I converted ssb-threads to ssb-db2 and put it on my Manyverse to see how well it ran. It ran okay. It seems fast-ish, but there are some bugs (some content doesn't load sometimes). But I feel like I made progress. What's nice is that in the process I deleted a bunch of code in ssb-threads, so hopefully once it all works it'll be even less code.

@andrestaltz %tcG2peSbLoIvajruowUHl09ASeJGQptbE4YJSgSwUkI=.sha256

> I converted ssb-threads to ssb-db2 and put it on my Manyverse to see how well it ran. It ran, okay. It seems fast-ish, but there are some bugs (some content doesn’t load sometime).

Current status: continuing the above, fixing bugs, and I'm starting to feel... nervous. I found a bug that means we did the old-log-to-new-log migration wrong, requiring a reset and re-migration. Found more crashes. And worse: I don't yet see ssb-db2 being faster. That's the biggest worry, and we really need to make it significantly faster before adoption. :(

Sure, I have a backlog of ideas and tasks to do, going to go through them now, but I really hope this mess gets untangled and things work butter smooth.

@andrestaltz %VTIjqrsjr4rCMdHBWL+3Am8r6vciZov2w085QLMqsW4=.sha256
Content warning: Animated GIFs

Worked on some bug fixing: db2 130, jitdb 94, and db2 135

> Sure, I have a backlog of ideas and tasks to do, going to go through them now

Some of those ideas were good, such as putting a maximum on the CPU load. This is something I've tried to do many times. A long time ago I built pull-drain-gently, and I've been putting it here and there and checking whether it makes a difference. So far it hasn't, and that's a bummer.

This time I made a similar module, but for push-streams (Dominic's new stream library that is half finished; it's much faster, and it's what we use in ssb-db2 for scanning the log): push-gently. But sadly, I couldn't just hook it up in the stream pipeline; I think there are bugs in push-stream. :P (By the way, check the bottom of the readme on that module!)

I ended up going for a more general solution and built this module: too-hot. Then I put it in db2, and put that in Manyverse. I got some nice results!
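The core idea of capping CPU load can be sketched as a tiny governor (this is an illustrative sketch, not the actual too-hot implementation): track how much of a sliding time window was spent doing work, and tell the caller to yield to the event loop when the busy fraction exceeds a cap.

```javascript
// Hypothetical sketch of a CPU governor. makeCpuGovernor and its
// parameters are made up for illustration; too-hot's real API differs.
function makeCpuGovernor(maxCpuFraction, windowMs) {
  let busyMs = 0;
  let windowStart = 0;
  return function tooHot(now, workDurationMs) {
    if (now - windowStart > windowMs) {
      // Start a fresh measurement window.
      windowStart = now;
      busyMs = 0;
    }
    busyMs += workDurationMs;
    // true means: pause and let other tasks (e.g. UI events) run.
    return busyMs / windowMs > maxCpuFraction;
  };
}

const tooHot = makeCpuGovernor(0.85, 100); // cap CPU at 85% per 100ms window
console.log(tooHot(0, 50));  // 50ms busy out of 100ms → false, keep working
console.log(tooHot(10, 40)); // 90ms busy → true, time to yield
```

In an indexing loop you would check the governor between batches and `await` a timer (or `setImmediate`) whenever it returns true, which is what guarantees the UI thread gets a turn.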

What this accomplishes is simply to guarantee that other CPU tasks will have an opportunity to run. This is important for all apps, but mobile apps suffer more visibly. For instance, Patchwork also suffers from this problem! When it's super busy indexing, it doesn't react to clicks, see this GIF:


In comparison, Manyverse using too-hot, capping CPU at 85%, allows all UI interactions to run immediately, even though indexing is happening in the background:


@andrestaltz %z1mHHo6bLQPAnVUGJv2xwclbUeTLblhMOVt+QrBD0uM=.sha256

Some updates on what I've been working on:

Besides the day-to-day bug fixing and PR reviewing of ssb-db2/jitdb, I began working on progress stats for indexing and migrating in ssb-db2, and together with @David Gómez we figured out a nice way of displaying loading bars and spinners in Manyverse that indicate what is going on as well as its progress. It's working well! This is basically ready for release in the next version of Manyverse.

ssb-db2 in Manyverse still feels slow, but I think that's because we have both ssb-db1 and ssb-db2 running concurrently. arj and I are almost done porting everything to db2, and maybe soon we can rip out ssb-db1 entirely and finally test the performance of db2 alone in Manyverse. We're expecting migration to take 20~30 seconds and indexing ~9 seconds. If that turns out to be true, it'll be mission accomplished. Otherwise, it'll still be better than db1, but will leave some room for us to ponder how to make it faster.

@andrestaltz %zICAmC5uuZpyq26oxchguJy7dZiUBIuMc52ikS6kVRw=.sha256

Weekly update: 😬🥵😟🥺😨😖

@andrestaltz %bJ29+pHH+YlR4Rt/tGx/D8yOnQ0tyuey3mFBvK+DKtU=.sha256

Okay, less dread now. Things were really dark at some point, but in several pair coding sessions with @arj we poked around everything, and we found horrible things. The feeling is that we've been pushing a rock uphill for months (literally) and then we finally reach the top of the hill. Things are supposed to go butter smooth now downhill, but we're so tired of pushing the rock uphill that we're just taking a break up here on the hill, having a picnic and chatting about how shitty the uphill was before gathering strength to do the easy downhill.

The biggest thing is pull-weird in muxrpc. That was the big culprit, although there are other culprits responsible for other types of slowdowns. You know, Dominic always admitted that there are funky streams in muxrpc that are neither pull-streams nor push-streams; they're just weird streams. Specifically, what was happening in Manyverse was:

The frontend asks the backend for a pull-stream of threads, and there's a pull.take(3) on the frontend side (this detail is important). What you would expect is that the frontend pulls one thread from the backend, updates the take counter to 1, pulls another thread and updates the counter to 2, pulls another thread and updates the counter to 3, then stops. What ACTUALLY happened was that the frontend pulled one thread from the backend, and the backend SPAMMED the frontend with an infinite stream of threads. The frontend then proceeded to fetch "about" msgs for each of those incoming threads. Yes, a ton of wasted work. No wonder it was slow.

Here's the catch, though. It was slow because it was fast. And with ssb-db1, it was fast because it was slow. I know that's unintuitive, but it's true, and I'll explain how.

With ssb-db1, fetching one thread was slow enough that the frontend had time to fetch "about" msgs for the received thread before the next thread arrived. So after it received 3 threads, it "closed" or killed the backend stream, which under pull-weird semantics is allowed. pull-weird doesn't give you per-item backpressure (like normal pull-streams do); it only gives you a "shut up now please" backpressure to terminate the stream. So with ssb-db1, things were slow enough that the frontend had time to terminate the stream before it could spam us more. So it was fast because it was slow.

With ssb-db2, each stream emission was friggin fast, and spammed the frontend with so many threads that it could barely manage anything at all. This also explains a super old bug in Manyverse! The raw DB screen was always very slow, and this always gave me WTFs, and I've tried to fix it multiple times. Because the raw DB screen is just a straightforward scan of the log, it should be the fastest query. Yet it was the slowest thing. If you've ever opened that screen, you know what I'm talking about. Now we know the reason! Scanning the raw DB is friggin fast, and that's why the frontend got choked with too much work. It was slow because it was fast!
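The per-item backpressure that normal pull-streams give you (and pull-weird doesn't) can be demonstrated in a minimal sketch: the source produces a value ONLY when the sink asks for one, so a take(3) means exactly 3 values are ever produced. This is a from-scratch illustration of the pull-stream contract, not code from muxrpc or ssb-db2.

```javascript
// A source that counts how many values it has produced; with real
// per-item backpressure it only produces when pulled.
let produced = 0;

function source(end, cb) {
  if (end) return cb(end);
  produced += 1;
  cb(null, produced); // emit exactly one value per pull
}

// A minimal take(n) through-stream: after n pulls it terminates the
// upstream source instead of pulling more.
function take(n, src) {
  let taken = 0;
  return function (end, cb) {
    if (end) return src(end, cb);
    if (taken >= n) return src(true, () => cb(true)); // abort upstream
    taken += 1;
    src(null, cb);
  };
}

const limited = take(3, source);
const results = [];
(function drain() {
  limited(null, (end, value) => {
    if (end) return;
    results.push(value);
    drain();
  });
})();

console.log(results, produced); // the source never over-produces
```

With pull-weird over muxrpc, the equivalent of `take(3)` on the frontend cannot throttle the backend per item; the backend keeps emitting until the "shut up" signal crosses the wire, which is exactly the spamming described above.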

Now you see why I was in deep hell. The whole of Manyverse was behaving like the old raw DB screen, even though I was expecting things to be much faster. It was very strange, because in our microbenchmarks everything behaved very fast. And that's because in our benchmarks we didn't have any muxrpc boundary crossing a pull-weird bridge.

It's not new that pull-weird has this strange behavior; it has always affected createHistoryStream over the network, for instance, and @cryptix has felt it more often than he'd want to.

I found a simple and hacky solution. The basic idea is that a muxrpc source is considered harmful; only use it when you really know what you're doing (e.g. "live" sources that are unlikely to spam you with thousands of items are an okay use). The solution is to use only muxrpc async APIs. So I devised a secret-stack plugin called deweird that presents a "source" API to you, which under the hood translates to only "async" calls to the backend. It works! I'll prepare the package and put it on npm.
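The deweird idea can be sketched roughly like this (the names getBatch and fetchFirst are made up for illustration; this is not deweird's real API): instead of exposing a muxrpc source that may spam the frontend, expose only an async call that returns one bounded batch per request. Backpressure then falls out of the request/response round-trip, because the backend can never send more than the frontend asked for.

```javascript
// Backend exposing only an async, batch-limited API (cb-style, like
// muxrpc "async" methods). Purely illustrative.
function makeBackend(items) {
  return {
    getBatch(offset, limit, cb) {
      cb(null, items.slice(offset, offset + limit));
    },
  };
}

// Frontend helper: repeatedly request small batches until it has the
// `total` items it wants, then stop. No unbounded stream involved.
function fetchFirst(backend, total, batchSize, cb) {
  const acc = [];
  (function next(offset) {
    backend.getBatch(offset, batchSize, (err, batch) => {
      if (err) return cb(err);
      acc.push(...batch);
      if (batch.length < batchSize || acc.length >= total) {
        return cb(null, acc.slice(0, total));
      }
      next(offset + batch.length);
    });
  })(0);
}

const backend = makeBackend(['a', 'b', 'c', 'd', 'e']);
fetchFirst(backend, 3, 2, (err, items) => {
  console.log(items); // never more than requested
});
```

A plugin like deweird can then wrap this request loop so the consumer still sees a familiar "source" interface while only async calls cross the muxrpc boundary.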

Next up: case-by-case optimizations of queries, and fixing/improving the migration part to take care of both log.offset and log.bipf.

@andrestaltz %lgC6ePiP4ncf7e+ApXGoIlLKVQryJfJBH1uaYEHHS7E=.sha256


@cel-desktop %gz4t3IHIf86PsciZ7ZcOYGGQ1tl2bnqrZ0BeDQi70hI=.sha256

Nice write-up and great solution (ssb-deweird). I'm looking forward to seeing Manyverse users enjoy the speed gains.

In other SSB apps, the issue of no backpressure on DB query streams has sometimes been dealt with by making multiple calls to the underlying source with a limit option, for example using pull-next-query (or pull-more or pull-next-step), e.g. as seen in %nLSxO7z... by @mixmix.

@andrestaltz %pic9tt7wgvlWZbvPPo3ToR8jsR8433at4qkJZzud7eg=.sha256

The dragon is slain. We're sitting on the ground, one arm resting on the dragon, the other arm has just dropped the sword on the ground. I'm pretty sure the dragon is dead. I have mixed feelings: on one hand, too tired to celebrate, on the other hand, weirdly excited that there is no more dragon killing to do.

After deweird, we had several tiny issues to fix, some of them quite important, such as mysterious race conditions and deadlocks with the use of pausable push-stream sinks in async-append-only-log, and new sorting algorithms in jitdb. There was one annoying baby dragon, though: migration from the old log to the new log still took 3 hours on mobile while heating up the phone. That wasn't good. Arj and I scratched our heads until hair came off, then we killed the baby dragon 5 minutes ago. Now migration takes just 5 minutes.

On Manyverse, it now takes me just 9 minutes to do migration plus initial indexing. Without migration, initial indexing takes 4 minutes, the phone does not heat up, and there are progress bars telling you what's happening the whole time. It used to take hours, in the plural, with the phone heating up like a frying pan. This is what we fought for, for 5 months, and it is now royally, gloriously Done with an uppercase D.

@Dominic %R9ikNjh7xe2j6aDL4vd7SBhQIc0h/xAZud2v41iqrKE=.sha256

I just checked patchwork and saw this last message and then read the whole thread. What an amazing journey. It was quite an emotional ride. Even just as a story, very dramatic. Especially the part where it looked like it was having a significant impact on your health! Congratulations on slaying that dragon!

@Daan %n5EfcNC540c0+hx/gU/ZB3Nr9MN3D+OFETj/syAxjnk=.sha256

I seem to just have received this post. This is absolutely amazing news! Congrats @andrestaltz and @arj, this is gonna be game-changing! 🎉

@mix %RGjWFmVWzonlNwiwxLf4GE5e7AO5Y7pUIVEYmYuFiUA=.sha256
Voted: "The dragon is slain. We're sitting on the ground, one arm resting on the dr…"
@andrestaltz %+EeMWE9+1TA1blfX+0u9jhPLQNqxDxmieCGd25D6/kQ=.sha256

It's been a while since I posted here. Ever since slaying dragons, things have been easier but still just as busy/hectic.

After launching Manyverse, I built a couple new modules that help either the upcoming room 2.0 servers, or Manyverse with ssb-db2 or both:

  • ssb-room-client
    • The module ssb-room had code for both the room server and the room client, which meant that client repositories pulled in a bunch of unnecessary dependencies. ssb-room-client is nice in that it doesn't pull in all those dependencies, is (re)written in TypeScript, has tests, and is beginning to receive room 2.0 features
  • ssb-uri-spec
    • Lots of back and forth and design thoughts culminated in this specification for SSB URIs, which will be important for rooms 2.0, but obviously for other use cases as well
  • ssb-suggest-lite
    • Alternative to ssb-suggest (for autocompleting mentions when you're typing) that's tailored for ssb-db2 and uses lighter choices (doesn't pull a ton of data from the database). In TypeScript and has tests
@andrestaltz %kzdKb/bd89IkbjagnhMD7NBoNbAxQ68mZzqCLLRIUXk=.sha256

Ah, and there have been a bunch of updates to the rooms 2.0 document. What's nice is that @cblgh came along and did a pull request to render it as a website. There have been some changes to alias registration and SSB URIs. Now it's hosted here:

@andrestaltz %ze2MhLuywHmjytZdNtSTxlku6KG8wZtoJx2G2nBHK7U=.sha256

Oh my, this thread has been gathering dust, I'm sorry.

Basically in the past 2 months I've been working on go-ssb-room, writing Go and HTML templates and CSS styles with Tailwind. Picking up Go was easy, it looks like a conventional imperative language, and the way it handles errors was similar to a style I was doing in JS with promisify-tuple.
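The Go-style error handling mentioned above can be sketched in JS in the spirit of promisify-tuple (this is an illustrative sketch, not the module's actual API): await a promise and get back an [error, value] tuple, so errors are handled with a plain `if`, much like Go's `if err != nil`.

```javascript
// Go-flavored error handling for promises. The helper name `to` is
// illustrative; promisify-tuple's real interface may differ.
async function to(promise) {
  try {
    return [null, await promise];
  } catch (err) {
    return [err, undefined];
  }
}

async function main() {
  // Success case: err is null, value holds the result.
  const [err, value] = await to(Promise.resolve(42));
  if (err) {
    console.error('failed:', err);
    return;
  }
  console.log('got:', value); // → got: 42
}

main();
```

Compared to try/catch, this keeps the happy path and the error path at the same indentation level, which is the stylistic similarity to Go.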

I can't detail all the rest of the 2 months, so I'll just skip to the recent past. go-ssb-room is basically done, and I recently worked on the client code for rooms, which means ssb-room-client (to support aliases), ssb-http-auth-client (to sign in with your SSB ID), ssb-http-invite-client (to accept weblink-based invites), ssb-uri2 (similar to ssb-uri, but supports a few more types of SSB URIs). We are very close to doing something we call a "community launch", which will be videos, tutorials, and clear explanations of what Rooms 2.0 are about. Looking forward to that. :)

NGI Pointer also asked us to participate in a pitching contest, and I wasn't sure whether it was an important thing or not, but I volunteered anyway. You can find it among the 9 videos here:

@andrestaltz %R8gkAlWIdegMquiUmdGHhQyOw+7zP8mIj3nLROWCncU=.sha256

Here we go again! After releasing rooms 2 and updating Manyverse a couple of times, I began a new task in ssb-ngi-pointer: an important refactor/rewrite of replication-related modules before we add support for partial replication.

Arj and I call it "The Bermuda Triangle": ssb-friends, ssb-replicate, ssb-ebt. These three modules are very important to SSB in Node.js, yet they are deeply intertwined with each other to a degree that is maddening. I made a diagram of which of these modules calls the other module, and it looks like this:


We're about to build replication of metafeeds and its index feeds, so we have to add logic to some of these modules, and it would be horrible to just cram in more code into this mess. We need to be able to write code and be confident that it'll do the right thing.

It doesn't take an expert to see that something is wrong when these two files exist: ssb-friends/glue/replicate.js and ssb-ebt/legacy.js.

So I took a deep breath and started refactoring these modules. It started with modernizing the tests, making sure they run in GitHub CI on Node.js 12, 14, and 16 and use easy-to-understand async primitives like async-await, not continuables. This was important because when refactoring you don't want to break existing logic. There were a ton of tests to update and fix. However, mistakes happen; sorry @Mix, I'll get to the bottom of that soon.

ssb-replicate is a little odd. It's legacy now that most/all peers have EBT, but you're still supposed to install ssb-replicate and configure config.replicate.legacy = false just to get an empty shell which ssb-ebt will monkey-patch. That sounds very unnecessary to me, and the obvious conclusion is that I'd like to deprecate ssb-replicate, in the sense of signalling in its README that you shouldn't use it anymore. It should still function as an npm package, just in case someone out there depends on it existing. Then ssb-ebt needs to be changed to not do monkey patching. I started that already: see also

The basic plan is that I'll make ssb-ebt and ssb-friends become "passive" modules. They don't trigger or orchestrate replication, they just provide low-level APIs. Then, a new module ssb-replication-scheduler will use ssb-friends and ssb-ebt and be the "proactive" module responsible for causing replication.

In the process, we bumped into bugs left and right, which is kind of good because we don't want bugs, but also kind of scary to see bugs in something you thought was solid.

Also, as I test ssb-ebt (without ssb-replicate) in Manyverse, I have a big doubt. It seems to replicate less content than ssb-replicate, but which one is correct? Does ssb-replicate fetch too much content (e.g. from blocked peers, or peers beyond the hops range), or does ssb-ebt fetch too little content (e.g. not fetching something that it should fetch)? It's very hard to discover the answer in large-scale deployments, due to the p2p nature. We hope netsim by @cblgh will help us answer these questions.

I joked in our internal chat that I should put "Proficient in Refactoring Dominic's code" in my CV.

@andrestaltz %yB2zFKLR8BsOY/sGEGmuRKmcHbotrIgPnXb5pPEhOMg=.sha256

I didn't think I would find a bug in epidemic-broadcast-trees (the underlying logic in ssb-ebt), but here we are:

@Mix Android, I believe you that ssb-replicate might be more stable with regards to not missing any data to be fetched, but I think ssb-ebt is the way forward and we really need to push to start using it for real. ssb-replicate is horrible for performance, and ssb-ebt is basically ready; it's work we don't have to do, we just have to make sure it's polished and production-ready. Once we use only ssb-ebt, without ssb-replicate, it's going to be a huge deal for performance, user experience, and especially onboarding user experience.

@andrestaltz %M6iTUeb1a28FOl65OSa8FfH/ghAtlhzPlglCSd2hK+U=.sha256

No more Bermuda Triangle! I finished refactoring ssb-ebt and ssb-friends; I'd appreciate it if folks could take a look at these two PRs: ssb-ebt PR 45 & ssb-friends PR 67

I also tested them plus the latest ssb-replication-scheduler in Manyverse and it's been great! It replicates everything I want it to replicate, and does so with good performance (just based on my perception; I have not run macrobenchmarks yet).

@andrestaltz %+n2G1C0scwB+lFIbyXUYVRO38dWVEFsJFZWON8+pxIM=.sha256

Been a while since I posted here, so I'll give a high-level overview of what's been keeping me busy, and maybe a bit of low-level detail if I can remember it. I worked on:
