@Anders %AQNGc9G+6mWgilOxQFQJXcRdougXYLmqcHExBPFXXmk=.sha256

This is the first post in my #dev-diary for #NGIPointerPeach #NGIPointer

We (as in Andre and I) have been working quite intensely on getting jitdb and ssb-db2 into a state where we can start using them in manyverse and #ssb-browser. The reasoning for doing an ssb-db2 can be summed up as:

  • Performance: the database stores data in bipf (see the sketch after this list)
  • Replace flume with jitdb and specialized indexes
  • Run in the browser
  • Work well with partial replication
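
A minimal sketch of the bipf point (assuming the bipf npm module; the record shape here is simplified, not ssb-db2's actual storage code): because bipf is an in-place binary format, the database can seek into a stored record and decode a single field without parsing the whole message, which makes scanning cheap.

const bipf = require('bipf')

// encode once, at write time
const msg = { author: '@abc.ed25519', content: { type: 'post', text: 'hi' } }
const buf = Buffer.alloc(bipf.encodingLength(msg))
bipf.encode(msg, buf, 0)

// later: read one field without decoding the whole record
const pContent = bipf.seekKey(buf, 0, Buffer.from('content'))
const pType = bipf.seekKey(buf, pContent, Buffer.from('type'))
console.log(bipf.decode(buf, pType)) // => 'post'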

Jitdb is getting into pretty good shape: the API is stable, and we recently added live support, so you can do:

const pull = require('pull-stream')
// and, votesFor, live, toPullStream come from the jitdb/ssb-db2 operator helpers

pull(
  db.query(
    and(votesFor(postMsg.key)),
    live(),
    toPullStream()
  ),
  pull.drain((result) => {
    console.log("got a new vote on", postMsg.key)
  })
)

As Andre has also discovered, atomic-file is not as atomic as one would like, so I made a small wrapper module around atomically called atomically-universal, which lets it work in the browser and should behave better on node.
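
For context, the failure mode is a process dying mid-write and leaving a truncated file behind. The standard fix, which atomically implements (along with fsync handling and platform quirks), is write-temp-then-rename. A minimal sketch of the pattern, not the module's actual code:

const fs = require('fs')
const path = require('path')

function writeFileAtomic(file, data, cb) {
  // write to a temp file in the same directory (rename across devices fails)
  const tmp = path.join(path.dirname(file), '.' + path.basename(file) + '.tmp')
  fs.writeFile(tmp, data, (err) => {
    if (err) return cb(err)
    // rename is atomic on POSIX filesystems, so a reader sees either the
    // old complete file or the new complete file, never a partial write
    fs.rename(tmp, file, cb)
  })
}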

Private messages are coming along nicely and should be done soon. From there, the bigger missing pieces are about, friends and private groups before you can run a "normal" ssb social app on top of this.

If you are interested in following along, or even better helping 😏, our work lives here.

@cryPhone📱 %VStXP3ctf8233/qxfamyXm+odXPEL2lufSiqM2CylTE=.sha256
Voted This is the first post in my #dev-diary for #NGIPointerPeach #NGIPointer
@Anders %Mha/KQPNexXVHm0NymVYWUmulipsI2cdlQA4nqQ1MH0=.sha256

Another week, another update.

Over the last week we merged a PR, so private box1 unboxing is now working. I also did some experiments with private groups. Did a PR for sharing log streaming across multiple indexes, which made indexing a bit faster. Then we merged some PRs to use the prefix indexes that Andre has made; this had a significant impact on the storage requirements, meaning index space for 1.4 million messages is roughly 80 MB now (see the sketch below the listing). My .ssb flume folder for roughly the same data:

du --si ~/.ssb/flume/*
316M    /home/arj/.ssb/flume/backlinks-6CAxOI3f-
29M    /home/arj/.ssb/flume/clock
63M    /home/arj/.ssb/flume/feed
95M    /home/arj/.ssb/flume/keys
17M    /home/arj/.ssb/flume/keys.ht
3,2M    /home/arj/.ssb/flume/last.json
413M    /home/arj/.ssb/flume/links
480M    /home/arj/.ssb/flume/links2
123k    /home/arj/.ssb/flume/private-6CAxOI3f-
233M    /home/arj/.ssb/flume/query
23M    /home/arj/.ssb/flume/time

So to sum up, we have close to constant overhead for private messages and indexes.
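
A minimal sketch of the prefix-index idea behind these numbers (illustrative only, not jitdb's actual code): instead of storing each full value, say a ~50-byte message key, per record, store only its first 4 bytes and verify candidates on lookup. Storage becomes a small constant per message, which is where the savings come from.

// one fixed-size prefix per record instead of the full value
function buildPrefixIndex(values) {
  return values.map((v) => v.readUInt32BE(0)) // first 4 bytes as a number
}

function lookup(index, values, wanted) {
  const prefix = wanted.readUInt32BE(0)
  const results = []
  for (let i = 0; i < index.length; i++) {
    if (index[i] !== prefix) continue // cheap filter
    if (values[i].equals(wanted)) results.push(i) // prefixes can collide, so verify
  }
  return results
}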

@Anders %sqVE2AIvO1eXrJpwumaGhQ0FaIU11z1btJeRDCcOY/E=.sha256

Did some benchmarking of ssb-db against ssb-db2 today with the following results:

Indexing 10.000 messages:

db1: 4.658s
db2: 939.209ms

roughly 4.9x speed up

Indexing 100.000 messages:

db1: 36.836s
db2: 4.994s

roughly 7.3x speed up

Indexing 1.000.000 messages:

db1: 4:23.873 (m:ss.mmm)
db2: 26.754s

roughly 9.9x speed up
@Anders %aYUnt2V7wU5V/TbBVyjEJz3meOZ1byMMnovNmpv/YDg=.sha256

Been a while since I updated this thread. I spent the last couple of weeks ironing out bugs in ssb-db2, async-append-only-log and atomically-universal, so they should now be ready for production use.

@Anders %7T4vmwkPvAcKcpc21d/1C7f2GAUEkTNBbCBuoBFwyDs=.sha256

I keep forgetting to update this dev diary ;-)

Been busy preparing a talk for fosdem. Turns out OBS Studio is a pretty nice program for recording talks :)

Did a bunch of fixes to async flumelog and db2. Been testing EBT in go against the browser; really happy to see that coming along. Spent quite a while yesterday trying to create an inverted index for prefix indexes. It seems to work pretty well with vote queries.

Also started porting ssb-about and ssb-suggest over to db2. Getting some help finishing that from Kyle, who has been doing a ton of work on ssb-browser. More on that later.

@Anders %F15gPyccFswdAljjqCyGoOAEn7gLR1QwG0G4RU42SiI=.sha256

The inverted index for prefix indexes turned out to work great for some of our queries. This means we got roughly 10-20x improvements on common queries (key, vote links, etc.). All of these are now < 1ms on a full database (1.5 million messages).
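
A rough sketch of what "inverted" means here (names are mine, not jitdb internals): the plain prefix index above still scans every record per query; inverting it into a map from prefix to record offsets turns a query into a single lookup plus verification of a few candidates.

function invertPrefixIndex(prefixIndex) {
  const map = new Map()
  prefixIndex.forEach((prefix, offset) => {
    if (!map.has(prefix)) map.set(prefix, [])
    map.get(prefix).push(offset) // all records sharing this prefix
  })
  return map
}

// query cost is O(candidates) instead of O(records)
function lookupInverted(map, values, wanted) {
  const candidates = map.get(wanted.readUInt32BE(0)) || []
  return candidates.filter((offset) => values[offset].equals(wanted))
}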

@Anders %moyNLgMwROai8MKeyLWCBHocOncrcdXdLQbjeEABa+s=.sha256

Fixed an issue in epidemic broadcast trees that has been bugging me for a very long time: if you lose your state, there was no way to make the other end acknowledge that and send you everything again. Really nice to have @cryptix there to pair on a solution with.

Tried converting ssb-social-index and ssb-suggest to ssb-db2. Landed some PRs; sadly it's not there yet. We worked a bunch on this about issue. The question is: do you want the names assigned by the network, or only self-assigned names? Maybe with the ability to assign names yourself. Right now I'm back to a level index of self-assigned names in browser-core (something like the sketch below).
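
For the self-assigned variant, the reduce is simple. A hedged sketch (field names follow the common about message shape; this is not the browser-core code):

// reduce a batch of about messages to each feed's latest self-assigned name
function latestSelfAssignedNames(aboutMsgs) {
  const names = new Map()
  for (const msg of aboutMsgs) {
    const { author, content, timestamp } = msg.value
    if (!content || content.type !== 'about') continue
    if (content.about !== author) continue // only names people gave themselves
    if (typeof content.name !== 'string') continue
    const prev = names.get(author)
    if (!prev || timestamp > prev.timestamp) {
      names.set(author, { name: content.name, timestamp })
    }
  }
  return names
}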

Worked on supporting db2 in ssb-friends. That also turned into untangling the dependencies between replicate and ssb-friends a bit, meaning you can use ssb-friends to only care about the graph stuff and not all the replication stuff. This way it is possible to run friends without the replication plugin, as we do in browser-core.

@Anders %i1ZdHr+QjBoN618J4zTZHCg/lXvimYH5W9NjvCfLu8M=.sha256

Been working on getting meta feeds into shape. Still about 27% done, so lots of work to do on that one.

Other stuff:

  • Helped Andre with getting the manyverse release ready
  • Paired with @mixmix on private groups. We used ssb-fixtures (see this PR) to generate a bunch of groups and messages in these, and found a few different bugs. If you want to help getting private groups ready for prime time, that is a very good place to start :)
  • Worked on a faster sort in jitdb that has some nice improvements for queries like "get me the latest 10 public posts" over my whole database. Before, it had to sort all the possible results before it could return the latest 10 based on timestamp. I tried a bunch of stuff including radix sort, but nothing really helped until I thought: wait! Lemire must have thought about this problem before, and yes, of course he has. He even made a nice library that we could use (see the sketch after this list).
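
The library in question might be Lemire's FastPriorityQueue.js; whether that is the exact one is my assumption, but it demonstrates the trick: keep a heap of only the k best candidates instead of sorting everything, making the scan O(n log k) rather than O(n log n).

const FastPriorityQueue = require('fastpriorityqueue')

// top-k selection: the heap never grows beyond k elements
function latestK(messages, k) {
  // min-heap by timestamp: the root is the oldest message still in the top k
  const heap = new FastPriorityQueue((a, b) => a.timestamp < b.timestamp)
  for (const msg of messages) {
    if (heap.size < k) heap.add(msg)
    else if (msg.timestamp > heap.peek().timestamp) {
      heap.poll() // evict the oldest of the current top k
      heap.add(msg)
    }
  }
  const result = []
  while (!heap.isEmpty()) result.push(heap.poll())
  return result.reverse() // newest first
}
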
@mix %qYb/Kc0MKn0LTeWv6xQ0Vn0Z3PjueYtka0nHr4ysz+E=.sha256
Voted Been working on getting meta feeds into shape. Still about 27% done, so lot
@Anders %aWJ8ANr7veK/SP/t3AfNOeo29s68EIpqi8J61a404Fo=.sha256

Did a bunch of work and thinking related to ssb meta feeds and wrote a post about it. I'm pretty happy about the current state and would welcome feedback before we start working on this next month.

@Anders %Z8zhXeHQKZK3mKkk+6vDrC3i49bVTZK9d7o9aS0dCDE=.sha256

Made a PR for JITDB that improves the performance of prefix indexes by up to 10x for queries like getting the latest 10 messages matching and(author(x), type('type')). That query is now 15ms on my laptop running against a full (1.5e6 msgs) database.

Had a call with Mix about fusing together feedless identities with the work done in Ahau on identities. Very excited about this. Seems like we'll be moving forward with defining a proper spec. Notes from the meeting in case anyone is interested.

@Anders %vEFXoLLyosnmQV3EfRbhdEv6CnbtDnkF8/EjRc21FQg=.sha256
  • Started working on meta feeds in JS, just the messages, seed values etc. of the spec. Feels good to be coding again ⌨️ 😏
  • Extracted the core of what @Mix Android and I have been working on over the last month for the fusion identity spec. Still some things to work out, but I feel it is getting into good shape. Scoping this was quite important, otherwise you could spec this thing forever.
  • Supporting @cblgh on the network simulator and @glyph on rusty validation. I'm so happy working with them; not only are they excellent humans, they also produce great code :)
@Anders %dUUBkabmOIVktnT6jYf1iiO5fZmrsCsQ7+yO6hA5GgA=.sha256

Moving forward with bendy butt (the meta feed feed format). Thanks to test vectors from @cryptix, we have the happy path of encoding and decoding covered. Been working on a module to use that, so it looks something like this:

sbot.metafeeds.getOrCreate((err, mf) => {
  // let's create a new chess feed
  mf.getOrCreateFeed('chess', 'classic', (err, feed) => {
    sbot.db.publishAs(feed, {
      type: 'chess-move',
      ...
    }, (err) => {
      if (err) console.error(err)
    })
  })
})

Still a long way before this can be used in production, but at least we can persist these new messages in the database and get them out again. We have been talking about building small example applications to show how these things can be used.

The work on bendy butt meant we had to come up with a spec for binary encoding SSB concepts. Let a thousand butt formats bloom 🌷
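
The core of that spec (ssb-bfe) is a tiny framing idea: every SSB value is encoded as a type byte, a format byte, then the raw data, so a new format just claims a new format byte under an existing type. A sketch for a classic feed ID; treat the exact 0x00/0x00 codes as my reading of the spec:

// '@<base64 of 32-byte ed25519 key>.ed25519'  ->  type-format bytes + raw key
function encodeClassicFeedId(feedId) {
  const base64 = feedId.slice(1, -'.ed25519'.length)
  const key = Buffer.from(base64, 'base64')
  return Buffer.concat([
    Buffer.from([0x00, 0x00]), // type: feed, format: classic
    key,
  ])
}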

As for replication, EBT has supported binary transfers since version 8, and we have added some RPCs to be used together with meta feeds for partial replication.

All of this is very much in flux; I just thought I would share what I have been up to since the last update.

@Anders %Mzhh+egKZTD/zkIwXZddZnuHJu8xKZPxjkmOZqCjZSc=.sha256

Ugh, so patchwork tells me it has been 2 months since the last update. Summertime took a good chunk of that. Was good spending time with family in the 🌞 🏖️

Anyway, so what has been up? The last weekly meeting was more or less exactly 2 months before the deadline, so with that in mind we decided to reduce the scope of the partial replication a bit, in order to get something working properly instead of a ton of half-working code. This means we will be focused on only doing indexes for your own main feed. Another one of the motivations is that manyverse on the desktop has been moving steadily forward, and with that, the need to index other peers' feeds decreases quite a bit. Adding a new feed format is a ton of work, it's pretty crazy actually. I have come to appreciate what can be built using meta feeds, not only for partial replication. The remaining tasks that we are working on in JS land are in this repo. The major tasks left are:

  • finalize meta-feeds module including validation
  • finalize bendy butt support in db2
  • feed replicator (to use the indexes)
  • netsim simulations of different scenarios for performance and cross platform testing
  • example applications

I'm really happy that we can start sharing more code between ahau and #ssb-ngi-pointer. I recently refactored some of the code from ssb-tribes out into a separate module called ssb-private-group-keys, and the ssb-bfe spec has been getting a lot of attention to make sure it is something we can use in both projects. This means box2 support in db2 got a lot closer to becoming reality.

Lastly, I'm working on an example application that will be showcased at #svendsolar 🦀 using meta feeds to build fun stuff that is not directly tied to the ssb social app use case. More on that later. Currently in ⌨️ 😼 mode.

@Anders %N0NWULSuWf0LCptTmZn8cGB+kRpHT36QGekFfNg8KmE=.sha256

Demoed the 8k demo an hour ago at #svendsolar

8k-demo.jpg

8K demo is an example SSB application running purely in the browser using ssb-browser-core. The aim of the application is to showcase some of the things we have been building during the ngi pointer grant.

8K demo works by allowing users to create their own applications and share them, together with the data, with other users connected to the same room. The name 8k comes from the message size limit, meaning applications can be no larger than 8192 bytes, and demo refers to the golden demoscene days where packing a ton of stuff into a tiny file was an art in itself.

8K demo uses meta feeds to store information in different feeds. By doing so it allows partial replication: you selectively download only the feeds of a peer that you are interested in.

There are some sample applications here for inspiration of what can be built, including a news app and a mixtape built by @SoapDog.

You can try it out here.

@Anders %INA23Wiru+IbE0/QcpOQhc+laZXyAzDFhQn1PymLj7E=.sha256

Final month of the #ngipointer grant. The pieces are starting to fall into place.

Bendy butt, meta feeds and index feeds are more or less complete. This means we are focusing on getting partial replication between nodes working with these building blocks. I just finished a quite big refactor of EBT to allow it to replicate things other than classic messages, including bendy butt (in binary) and indexes. Next up is using the replication scheduler to do the orchestration of which feeds need to be replicated and how. And finally putting that into netsim for proper performance numbers.

While it might be a bit silent here, we are busy getting the last pieces in before the deadline. I can't think of a better group of people to be working together with on this. It's so great to see the amount of teamwork happening behind the scenes.

@Anders %gKfGZwOWq//PtfG55hf3sRMz6237Yy0ylpvPd60GGKI=.sha256
  • Added transaction support to ssb-db2 and async-append-only-log. What database doesn't have transactions these days? 😏

  • Wrestled a bit with ssb-ebt to shake out the (hopefully) last bugs. For the gory details on how promises might bite you in the 🍑

  • Updated 8k-demo to the latest dependencies and made it a bit more stable with 10+ connections. Let's see. This bullet point doesn't end with an emoji, so 🌼

@Anders %fhmYMdKiwMtl0kQlix+3opzlFugHutRLD2vA5Z9bVX8=.sha256

Still deep in the netsim testing of the partial replication changes. It does look like there is light at the end of the tunnel, so we should have some numbers to share soon.

In other news, today I merged and released the EBT changes for partial replication as 8.0.0. Go is also getting similar EBT changes soon.

I updated the main site so that it now includes all the specs, videos and audits & research. Worth noting are Eileen's dev UX report and @keks and Justin's audit report on partial replication, including a formal analysis of the fusion identity protocol using the Tamarin prover tool.

@Anders %W1a3g4D/ojzBiBD7VB7zZdBkGefwVOOHrDVQL1eL9po=.sha256

Thought it would be good with an update on this. On a local machine with basically no latency, partial replication is roughly the same speed as full replication, except it replicates only around 30% of the number of messages. This is with indexes, so there is 2x overhead in the number of messages. If this was a new feed format instead, where contacts would be neatly put in their own feed under the metafeed, this would be roughly ½ the number of messages.
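
To make the overhead arithmetic concrete (illustrative numbers, not from the measurements): if a peer's main feed has 10,000 messages and you only care about the 1,500 contact messages, index-based partial replication fetches 1,500 index messages plus the 1,500 messages they point at, i.e. 3,000 messages (30% of the feed, with every wanted message counted twice). A feed format with contacts in their own feed would fetch just the 1,500, half as many.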

Why is this not faster? First of all, this is a raw throughput test. So while full replication is just as fast, it consumes a lot more CPU and bandwidth. Partial replication, on the other hand, for each feed first needs to discover the meta feed, download that, find the index feeds and replicate those. When we tested this in manyverse, the app was quite usable while doing an initial sync. Secondly, this is on a local device; it would be very interesting to test this on wifi between a phone and a desktop computer, or to use netem.

Overall I think this mostly works (90-95% there). It is very dependent on how many machines in the network start using meta feeds, which is why manyverse on the desktop will be a big deal.
