@Gordon %Kw9x8uLauZfbEe7D+5n1ORB6E7BljEMdKhUeieQE1j8=.sha256

Hullo @arj @andrestaltz. I have a question regarding #ssbdb2

First of all, ssb-db2 is really cool and thanks for working on it! Yay for ssb in the browser and performance improvements!

One difference between ssb-db2 and ssb-db that I'm struggling with while experimenting with ssb-chess in the browser: with the old ssb-query / messagesByType APIs, the result pull-streams emit a 'sync' message when the stream goes live. I don't think ssb-db2 emits anything similar.

{'sync': true}

I believe this applies to the majority of modules in the JavaScript ecosystem that return a pull-stream and accept a 'live: <boolean>' parameter. I think the behaviour comes from the lower-level flumedb libraries.

I use this in ssb-chess to decide when to begin rendering things that build up their state from a pull-stream, so the UI doesn't flicker from re-rendering while the initial state is still being built. I believe Patchwork does something similar in a lot of places. I'm not sure about Patchfox / Manyverse / oasis, etc.
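
For concreteness, here's roughly what that looks like against the old API (the handler names here are made up):

    const pull = require('pull-stream')

    pull(
      // old ssb-db API: with live: true the stream replays the old
      // messages, then emits { sync: true }, then goes live
      sbot.messagesByType({ type: 'chess_move', live: true }),
      pull.drain(msg => {
        if (msg.sync) {
          // the initial state is fully built; safe to render without flicker
          renderBoard()
          return
        }
        applyMove(msg)
      })
    )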

Could we add something similar to ssb-db2?

I think 'no' could be a reasonable answer here, but I thought it was worth discussing. One reason for a 'no' might be that it's not intuitive to expect a sync message with a different schema, and it's common to forget about it (or not know about it) and get null pointer exceptions.
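
For example, code like this (made-up handler) works fine on the old messages and then throws once the stream goes live:

    const pull = require('pull-stream')

    pull(
      sbot.messagesByType({ type: 'post', live: true }),
      // { sync: true } has no .value, so this throws a TypeError the
      // moment the sync marker comes through
      pull.drain(msg => handlePost(msg.value.content))
    )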

Perhaps as an alternative it could be attached to a field within the last message that is emitted before the stream goes live...

{
  "meta": { "sync": true },
  // The usual fields go here...
  "value": {
    "content": ...
  }
}

That doesn't feel super elegant either, but it at least avoids null pointer exceptions, and it does make it possible to know when a stream turns live...
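
On the consuming side it might look something like this (hypothetical, of course, since the meta field and a live({ old: true })-style query are exactly what's being proposed):

    const pull = require('pull-stream')
    const { live, toPullStream } = require('ssb-db2/operators')

    pull(
      sbot.db.query(live({ old: true }), toPullStream()),
      pull.drain(msg => {
        // every item is a normal message, so no special-casing needed
        applyMessage(msg)
        if (msg.meta && msg.meta.sync) startRendering()
      })
    )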

Workarounds / Alternatives

cat(old, sync, new)

I've seen this pattern in a couple of places where you first grab the old messages, then follow up with the live messages:

ssb-db2:

https://github.com/ssb-ngi-pointer/ssb-db2/blob/master/compat/log-stream.js

    const pull = require('pull-stream')
    const cat = require('pull-cat')
    const { live, toPullStream } = require('ssb-db2/operators')

    const old$ = pull(sbot.db.query(toPullStream()), pull.map(format))
    const sync$ = pull.values([{ sync: true }])
    const live$ = pull(sbot.db.query(live(), toPullStream()), pull.map(format))

    cat([old$, sync$, live$])

Manyverse:

https://github.com/staltz/manyverse/blob/324262cd50b1e9d6301c4bf88390fab76823d2a4/src/backend/plugins/aliasUtils.ts#L85

https://github.com/staltz/manyverse/blob/3fc39193d1a6336d214ebe0e80027d2e90e76a67/src/backend/plugins/friendsUtils.ts#L33

I considered just using this too, but I'm concerned it might lead to missed messages: if a message is written to the database after the 'old' stream ends but before the 'live' stream starts, it is never emitted on either stream.

I'm not sure if this is an issue in practice. I don't know enough about the Node.js / JavaScript event loop internals to know whether, between the 'old' stream ending and the 'live' stream beginning, the event loop could switch to a callback that is reading a new record off a socket and writing a newly gossiped record to the database. I think there could be a race though... opening up the live stream is IO, and there could be a 'record written' callback queued ahead of it in the event loop's list of callbacks to process.

So I think this is a bug in Manyverse, even if it never actually occurs (because the timings happen to work out, etc.).

pull-pushable for the live stream

This is the only other approach I've thought of. It's a lot of code compared to having a 'sync' flag to look for...

// Start buffering the live stream before doing anything else...

const pull = require('pull-stream')
const cat = require('pull-cat')

// https://github.com/pull-stream/pull-abortable
const Abortable = require('pull-abortable')
let aborter1 = Abortable()
let aborter2 = Abortable(() => aborter1.abort())

// Updated as the 'old' messages are processed, so duplicates can be skipped
let timestamp = 0

// https://www.npmjs.com/package/pull-pushable
const liveStream = require('pull-pushable')()

// Stops pulling only when the aborter2 callback aborts the stream.
// Note: 'old: false' means only live values are emitted, never 'old' values
pull(pullLive({ old: false }), aborter1, pull.drain(msg => liveStream.push(msg)))

const processOld = pull(oldStream, ... )

// If the message timestamp is <= timestamp we know we have already processed it
const processLive = pull(liveStream, pull.filter(msg => msg.timestamp > timestamp), pull.map(...))

const syncMsg = pull.values([{ sync: true }])

// aborter2 is necessary here to listen for when downstream ends the stream,
// at which point we need to abort the standalone live stream populating the pushable...
pull(cat([processOld, syncMsg, processLive]), aborter2)

I think this would work, but it's complicated enough that it's probably bug-prone, and I'd no doubt get sick of repeating the pattern... Maybe it could be abstracted into a library function that deals with the boilerplate though; something like the sketch below.
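
Something like this, say, where keyOf is whatever uniquely identifies a message (e.g. msg => msg.key) and pull-pushable's onClose callback takes over the teardown job that aborter2 does above:

    const pull = require('pull-stream')
    const cat = require('pull-cat')
    const Pushable = require('pull-pushable')
    const Abortable = require('pull-abortable')

    // Buffer the live stream *before* draining the old one, so nothing
    // can land in the gap between them; then emit old, { sync: true }, live
    function oldThenLive(oldSource, liveSource, keyOf) {
      const seen = new Set()
      const aborter = Abortable()
      // when the consumer ends the combined stream, tear down the drain too
      const buffer = Pushable(() => aborter.abort())

      pull(liveSource, aborter, pull.drain(msg => buffer.push(msg)))

      return cat([
        pull(oldSource, pull.through(msg => seen.add(keyOf(msg)))),
        pull.values([{ sync: true }]),
        // drop anything the old stream already emitted
        pull(buffer, pull.filter(msg => !seen.has(keyOf(msg))))
      ])
    }

Usage would then be a one-liner: pull(oldThenLive(old$, live$, msg => msg.key), pull.drain(render)).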

I'm probably going to use this while we think about whether we want ssb-db2 to expose something...

Question

Do you think it'd be worth adding something like the {'sync': true} message to ssb-db2? I'd be happy to try and modify ssb-db2 to do so, but I think the harder part here is 'making the right decision' rather than making the code change.

Maybe it's just me and my wee side project that's feeling the absence of this, haha.

@andrestaltz %xnrMoL1uoMMBepG4cre/fl4fO8U3RL/DHg/pF1b+480=.sha256

@happy0 👏 superb intuition for bugs! I was reading and was about to suggest cat([old,sync,new]) but saw that you wrote about that already.

About the race condition, it's possible that your intuition is correct, but we need to get hard evidence on that first. A test case would be great, and we have several test cases already. I hope our tests are well written so that it's easy to use them as examples and write a new one. In this case, we would need this tested in jitdb which is the module that does the actual logic for this specific live-vs-old concern.
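
Even though the real fix belongs in jitdb, a quick end-to-end sketch at the ssb-db2 level might look like this (I'm writing the setup from memory, so treat the details as approximate):

    const test = require('tape')
    const SecretStack = require('secret-stack')
    const caps = require('ssb-caps')
    const ssbKeys = require('ssb-keys')
    const pull = require('pull-stream')
    const { live, toPullStream } = require('ssb-db2/operators')

    test('message published between old and live streams is not lost', t => {
      const sbot = SecretStack({ appKey: caps.shs })
        .use(require('ssb-db2'))
        .call(null, { keys: ssbKeys.generate(), path: '/tmp/db2-race-test' })

      sbot.db.publish({ type: 'post', text: 'before' }, () => {
        pull(
          sbot.db.query(toPullStream()),
          pull.collect(() => {
            // the 'old' stream has ended; write one more message before
            // the live stream below exists
            sbot.db.publish({ type: 'post', text: 'raced' }, () => {
              pull(
                sbot.db.query(live(), toPullStream()),
                pull.drain(msg => {
                  if (msg.value.content.text === 'raced') {
                    t.pass('live stream caught the raced message')
                    sbot.close(() => t.end())
                  }
                })
              )
              // if 'raced' never arrives, this hangs until the test
              // timeout instead of failing cleanly, but it shows the window
            })
          })
        )
      })
    })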

Doing {sync:true} in jitdb would not be a quick and easy solution, because we need to modify the live function to support the old case, and that function is a bit of a monster. And even if we do that, we would still need a test case that makes sure that our changes solve the race condition. Otherwise, the addition of {sync:true} is no guarantee that it solves it.

So before we decide on which API is preferable (to have sync:true or not), I think it's important that we reproduce this race condition first. About the specifics of reproducing it: we need to check what the gt byte offset threshold is in the live stream, and whether it matches the largest offset in the old case, which is handled by paginate.
