You are reading content from Scuttlebutt
@andrestaltz %pKCvhtNtn2xgLorsYZXApZGL6YsDQsol6/TtGylBMm0=.sha256

Onboarding versus Sustainability, a timebomb

Or: why we need money to tame database growth before we can move on with anything else.

TL;DR I want to check with the community whether it would be okay for us to use Scuttlebutt Maintenance OC funds or leftover Handshake council funds cc @elavoie to work on improvements to ssb-db2 that we need in Manyverse before introducing partial replication, private groups, and brokerAuth or shs2. The goal is to deploy these things in production in Manyverse, but the work's output should benefit anyone using ssb-db2 to develop apps.

Concretely, this means that I would work on this nearly full time, while someone else would take over Manyverse UI maintenance on 5 platforms (we are looking for someone who could do this, as a paid role funded by Manyverse OC!) temporarily. There is also a chance that @Mix would work on either ssb-db2 things or Manyverse UI, but I'll let Mix himself comment on that. @arj would be a perfect person to work with me on ssb-db2, but he has limited availability lately.

Having addressed the concrete request, let me finally give y'all the context. :smile: In the Manyverse newsletter I laid out that we need to significantly improve Onboarding, Sustainability, and Safety, but improving Onboarding will necessarily make the two other areas worse. So we have to solve for Sustainability first.

To give an idea of how that's true, consider that currently SSB grows at a rate of 6 bytes per second (on average). There is actually a hard limit of 4.3GB for the maximum size of the log. Do the math and this means that apps will stop working in approximately 15 years (considering that current log sizes are approx 1GB). We could of course improve the database to use uint64 (thanks @Dominic for the ideas!), but that's only doable in Manyverse. Patchwork will not be updated, so it would definitely die in 15 years. But there are other limits. I just discovered today (with a reproducible test) that ssb-db starts giving out-of-memory crashes at about 2.4 million msgs, which is ~1.9GB. At the current rate of 6 bytes/second, Patchwork is going to be dead from out-of-memory crashes in 5 years!
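A rough sketch of that arithmetic in Python, assuming the log keeps growing linearly from about 1GB. The function name and constants are illustrative only, and treating the 4.3GB limit as the uint32 ceiling (2^32 bytes) is an assumption. With these inputs it gives roughly 17 years for the hard limit and a bit under 5 years for the out-of-memory threshold, the same ballpark as the estimates above.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_until(limit_bytes, current_bytes, bytes_per_second):
    """Years until a log growing linearly from current_bytes reaches limit_bytes."""
    if bytes_per_second <= 0:
        raise ValueError("growth rate must be positive")
    remaining = max(limit_bytes - current_bytes, 0)
    return remaining / bytes_per_second / SECONDS_PER_YEAR


CURRENT_LOG = 1.0e9    # approx 1GB, from the post
GROWTH = 6.0           # bytes per second, from the post
HARD_LIMIT = 2 ** 32   # assumption: the 4.3GB limit is the uint32 offset ceiling
OOM_THRESHOLD = 1.9e9  # approx log size at 2.4 million msgs, from the post

hard_limit_years = years_until(HARD_LIMIT, CURRENT_LOG, GROWTH)
oom_years = years_until(OOM_THRESHOLD, CURRENT_LOG, GROWTH)
print(f"hard limit in {hard_limit_years:.1f} years, OOM in {oom_years:.1f} years")
```

The estimate is very sensitive to the starting log size and to the growth rate, which is the whole point: anything that multiplies the rate divides the remaining time by the same factor.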

So, if you improve onboarding, then there will be lots more people joining, thus more content. The rate of growth could double, or triple. We don't know exactly. Supposing it would triple to 18 bytes/sec, then Patchwork will be dead approximately September 2023, which is very soon in my opinion! And this is excluding calculations from large numbers of people joining via #planetary or other apps. This means I want to take it easy with novel onboarding methods. First, we have to make SSB storage use stable and support automatic pruning of the past, what we call "database compactions".

Database compactions are also important before we introduce partial replication, because index feeds will actually (paradoxically) double the amount of messages on the log, overnight. That's because you need to write 1 index msg on the log for every "post", "vote", "contact" msg, and so forth.
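To make the doubling concrete, here is a minimal sketch that counts log entries once every indexed message also gets one index message. The set of indexed types and the function name are hypothetical, chosen only to mirror the examples above; this is not the ssb-db2 or index feed API.

```python
# Hypothetical set of message types that get an index message (illustrative only).
INDEXED_TYPES = {"post", "vote", "contact"}


def log_entries_with_index_feeds(message_types, indexed_types=INDEXED_TYPES):
    """Count log entries when each indexed message also writes one index message."""
    originals = len(message_types)
    index_msgs = sum(1 for t in message_types if t in indexed_types)
    return originals + index_msgs


sample = ["post", "vote", "vote", "contact", "about"]
total = log_entries_with_index_feeds(sample)
print(len(sample), "messages become", total, "log entries")
```

If nearly every message type is indexed, the total approaches twice the original count.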

I made a chart to outline these priorities as a roadmap, and it looks like this:

top-bottom graph with: database compactions, index feeds in production, private groups, brokerAuth invites or shs2, fusion identity, and cross-app bootstrapping

Everything depends on getting database compactions done first! So my request to the community is whether I can use any of the aforementioned fund buckets (approximately 2 months' worth of 1 developer) to pay myself to work on this while I guide some other developer to work on Manyverse bug fixes and features.

User has not chosen to be hosted publicly
@andrestaltz %12SU7MwJRpVBJbzLlj5vRxtQ8cV2gWBX8tZju34LMqI=.sha256

Would these improvements also solve the 4gb max log size issue or will that stay?

At first, no. The 4gb limit is not the most pressing limit (given that we'll hit it in the far future); the other limits have priority, such as entering out-of-memory territory concerning indexes, or social limits such as people's patience to wait for a 2GB log to replicate and index. I definitely support the idea of extending the 4gb limit in the way that Dominic suggested ("pages" with u64 checkpoints but offsets still expressed in u32), sometime later.
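One possible reading of that "pages" idea, as a sketch only: each page records a u64 checkpoint (the absolute position where the page starts), and positions inside the page stay u32. This is not Dominic's actual design; the class name, the helper, and the fixed page size are all hypothetical.

```python
from dataclasses import dataclass

U32_MAX = 2 ** 32 - 1
U64_MAX = 2 ** 64 - 1


@dataclass(frozen=True)
class PagedOffset:
    checkpoint: int  # u64: absolute position where the page starts
    relative: int    # u32: position inside the page

    def __post_init__(self):
        if not 0 <= self.checkpoint <= U64_MAX:
            raise ValueError("checkpoint must fit in u64")
        if not 0 <= self.relative <= U32_MAX:
            raise ValueError("relative offset must fit in u32")

    def absolute(self):
        """Absolute position in the log."""
        return self.checkpoint + self.relative


def to_paged(absolute, page_size=2 ** 32):
    """Split an absolute position into a page checkpoint and an in-page offset."""
    page_index, relative = divmod(absolute, page_size)
    return PagedOffset(page_index * page_size, relative)


p = to_paged(5 * 10 ** 9)  # a position beyond what a single u32 can address
print(p.checkpoint, p.relative, p.absolute())
```

The appeal of a scheme like this is that existing u32 offsets keep their meaning inside a page, and only the checkpoints need the wider integer.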

Are you proposing to take the money away from that process?

@Matt Lorentz I'm not proposing changing that process at all, I think @elavoie can lead that. I'm just suggesting one goal or "priority" as to where those funds could go.

User has not chosen to be hosted publicly
@andrestaltz %HSN2ehhSI7AmscPUMMlrfhUpJ0TosI7KORa5Ukw1M9E=.sha256

Good questions @cblgh!

do you have a rough timeline / time allocation (other than the already mentioned ~FT) in mind? e.g. 1 person @ full time for 2 months

I'd like to start working on database compactions as soon as possible, but that probably means first we find some full-stack developer to work on Manyverse. It could take a while to find someone who: (1) is interested and available for 60%+ of full time, (2) is comfortable or willing to debug issues in Electron, Android, iOS, (3) is aligned with SSB principles, (4) accepts the payment terms from Manyverse's open collective (i.e. freelance, not employment, and roughly 2200€/month). Powersource was the first person I asked, but his time availability is quite limited.

Timeline would probably be at least 6 weeks, maybe up to 10 weeks.

do you have a rough budget range in mind? knowing estimates of these probably makes decisions easier :)

At 22€/h, the 6–10 weeks translates to 5000€–9000€.
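The range comes from simple multiplication. A quick sketch, assuming a 40-hour full-time week (that figure is an assumption, as are the names used here):

```python
HOURLY_RATE_EUR = 22  # from the post
HOURS_PER_WEEK = 40   # assumption: a full-time week


def budget_eur(weeks, hourly_rate=HOURLY_RATE_EUR, hours_per_week=HOURS_PER_WEEK):
    """Total cost in euros for one developer working the given number of weeks."""
    return weeks * hours_per_week * hourly_rate


low = budget_eur(6)
high = budget_eur(10)
print(f"{low}€ to {high}€")
```

Rounded, that gives the 5000€–9000€ range.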

posting about the fundraising need in the ssbc newsletter could help spread awareness! similarly so via e.g. manyverse twitter or other prominent channels

Yes! I tweeted yesterday and some ~5 people started donating recurrently. That's going to help at least better fund the Manyverse developer.

maybe this is a decent use case for trying to fund it via e.g. gitcoin?

Maybe I missed the opportunity when they ran that quadratic funding thing with Open Collective. (I missed it back then because Manyverse was not on Open Source Collective, it was on Open Collective Europe). I took a look at their page now and didn't find anything similar to that campaign they ran back then. Right now they have "grants" which seem to be like "Open Collective" but with crypto being the only way of receiving funds. I know that money in crypto now is EsSeNtiALLy rAiNiNg DoWn FoR fReE but actually dealing with declaring taxes with crypto is a pain in the ass (in Finland), especially if the crypto is received freely as a gift or donation, and I'd be super happy to not declare crypto taxes anymore. It basically subtracts from my time that I could be productive elsewhere. (I once spent nearly 4 full days on understanding crypto taxes in Finland)

maybe there are other companies


@andrestaltz %lI9LR0TSsL4qXmx+2FrsSNL5DN45fOHvZpv2XumTEZE=.sha256

In this scenario pw would die in 2023 regardless of this work getting done or not, since it would keep using db1.

Maybe to reduce FUD, let me try to explain in more details what "dying" means.

The test I ran was with ssb-fixtures on my 16GB RAM computer; it was the simplest tool I had at hand to create synthetic SSB databases of arbitrary size. I asked ssb-fixtures to create a db with 4 million records. It ran for ~1 hour and then crashed when the record count was something like 2.4 million records. Then I ran it again and it crashed again at around the same number (I should have verified if it was exactly the same number or not). If it were the same number, it would mean that some deterministic bug happened, but if it were not the same number, it was probably a legitimate out-of-memory (OOM) crash.

ssb-fixtures runs just a bare minimum ssb-db, it doesn't even have any extra indexes. So it's kind of the same thing as running ssb-db alone, creating as many msgs as we possibly can.

Maybe ssb-fixtures got an OOM because it was creating msgs at the highest rate possible. Like, maybe there is a memory leak in ssb-db due to creating msgs, and thus the OOM is hit only because so many msgs were created within a single execution of the process. That is not the case with Patchwork, where messages are created at a slow rate, and by many peers, and across several app "sessions". So it could be that Patchwork doesn't suffer the OOM at 2.4 million msgs, and could still survive 5+ years easily. But this is just one explanation, I don't know exactly why the OOM happens.

There is an aggravator in Patchwork though: it runs many more indexes than pure ssb-db does. By definition it should use more memory than ssb-fixtures does.

(Hey, this just gave me an idea. I'll generate those 2.4 million records again and load them up in Patchwork and see how it handles that)

Also, Patchwork wouldn't die suddenly for everyone. It would first die for those who have the most data. And there are ways to reduce the amount of data: you can start a new feed and follow only very few people, so as to keep your database size as small as possible.

If the hypothesis I put up holds, then "death" just means that people's Patchwork installations start randomly crashing with OOM. It might be a showstopper (like crash on startup) or it might be tolerable (like crash every 2 hours). Anyway, I wanted to take preventive/precautionary measures and at least make Manyverse "patchwork-parity" before that starts happening.

User has not chosen to be hosted publicly
@andrestaltz %cEvg7udIzpx4KY3svmFDF4e7SPG/HrcYIosFxPxsDWU=.sha256

@elavoie When can I post the proposal? Isn't the process still being defined? I would be ready to start in March, but this is not so sure because we first have to find a developer to work on Manyverse.

@SoapDog (Macbook Air M1) %AzVOjBnScVPp1wPoBHhSgr1MWcCOm+nZ120MG8hx6Tw=.sha256


There is no harm if the proposal stays on SSB for a short while before @elavoie and others can kickstart the process. Better for it to be there and ready, than to wait for people to be ready and then write it down and post.

I suspect that the first round of the last batch of funding started with me simply posting a proposal for Patchfox, then others posted their proposals as well IIRC.


User has not chosen to be hosted publicly