You are reading content from Scuttlebutt
@aljoscha %OxnLiHXOxulJ4Gtbx1Xv/gN03yQw8Dt3YioDq37GF0o=.sha256
Re: %l9kfaeS8X

@andrestaltz

it would still make sense to change the gossip scheduler so that it disconnects as soon as (plus a few seconds) it's done downloading the recent updates from a pub, and then go on to visit the next pub, and keep rotating.

There are a lot of different options here, with a lot of tradeoffs. For example, EBT wants long-lasting connections. Here are a few components off the top of my head that a well-engineered solution should probably satisfy:

  • keep a (small-ish) pool of active connections
  • keep a larger pool of addresses for which the peer is probably still online/reachable, but without an active connection
  • keep a large, persistent db of all/most/many previous gossip connections, together with some statistics (estimated uptime, pub?, followed?, following?, overlap in feed subscriptions, common networking layers (ip, filesystem, cjdns etc.))
  • when introducing a new active connection, move one of the previously active connections to the passive address pool
  • introduce new active connections:
    • when peers connect to you
      • might have a heuristic for rejecting them
    • when a connection is dropped
      • with few current connections, make an attempt that has a high probability of succeeding; use your db to pick the peer
    • through the occasional random connection attempt to a peer from the passive pool or the db
      • this one is really important to fight network partitions and local optima
  • when evicting connections from the active pool, use a heuristic that might include the following factors:
    • try to keep a certain number of highly-trusted peers in the connection pool
    • once that threshold has been reached, be more adventurous with new connections
    • how many feeds do you and the peer share an interest in?
    • how recently did these feeds update?
    • do you usually send data to them, or the other way around? (it's fine for a single connection to be asymmetric, but the active pool as a whole should be balanced)
    • are they an eager-push-peer or a lazy-push-peer (i.e. is this connection part of EBT's spanning tree)?
    • how long has this connection already been active?
    • how long has it been since the last data transfer over this connection?
    • am I configured to always keep this connection (example: my pub or my mirror)
    • are we on the same LAN?
    • if I drop this connection, are there any feeds that I am interested in but that are not covered by other connections?
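
To make the eviction bullets a bit more concrete, here is a minimal Python sketch of one possible scoring heuristic. Everything in it is an assumption for illustration: the `PeerStats` fields, the `keep_score` weights and the `min_trusted` threshold are hypothetical, not part of any existing SSB implementation. It also only covers some of the factors above: pinned connections, a protected core of trusted peers, shared and recently updated feeds, EBT eager-push status, idleness, LAN locality, and feeds that no other connection covers.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class PeerStats:
    """Hypothetical per-connection statistics; field names are illustrative."""
    feed_id: str
    trusted: bool = False            # e.g. a peer we follow
    pinned: bool = False             # configured to always keep (my pub, my mirror)
    shared_feeds: int = 0            # feeds both sides are interested in
    recently_updated: int = 0        # how many of those updated recently
    eager: bool = False              # eager-push peer in EBT's spanning tree
    seconds_since_transfer: float = 0.0
    same_lan: bool = False
    uniquely_covered_feeds: int = 0  # wanted feeds no other connection covers

def keep_score(p: PeerStats) -> float:
    """Higher means more worth keeping. The weights are arbitrary placeholders."""
    score = 1.0 * p.shared_feeds + 2.0 * p.recently_updated
    score += 5.0 if p.eager else 0.0
    score += 3.0 if p.same_lan else 0.0
    score += 10.0 * p.uniquely_covered_feeds
    score -= p.seconds_since_transfer / 60.0  # each idle minute costs 1 point
    return score

def choose_eviction(active: list[PeerStats], min_trusted: int) -> PeerStats | None:
    """Pick the active connection to move to the passive pool, or None."""
    trusted_count = sum(1 for p in active if p.trusted)
    candidates = []
    for p in active:
        if p.pinned:
            continue  # always-keep connections are never evicted
        if p.trusted and trusted_count <= min_trusted:
            continue  # protect the trusted core until the threshold is exceeded
        candidates.append(p)
    if not candidates:
        return None
    return min(candidates, key=keep_score)

pool = [
    PeerStats("@pub", pinned=True),
    PeerStats("@friend", trusted=True, shared_feeds=20, eager=True),
    PeerStats("@neighbour", shared_feeds=1, same_lan=True),
    PeerStats("@stranger", shared_feeds=2, seconds_since_transfer=600),
]
print(choose_eviction(pool, min_trusted=1).feed_id)  # prints @stranger
```

A single weighted sum is just the simplest thing that works; the "don't drop the last connection covering a feed" rule could just as well be a hard constraint like `pinned`.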

What I'm trying to say: There's no "right" way of doing it, and you can get as sophisticated as you'd like. The only really important thing: Whatever you do, add some randomization to escape from local optima.
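
A cheap way to get that randomization is an epsilon-greedy rule (a standard technique swapped in here, not something SSB prescribes). Most of the time you pick the candidate most likely to accept, as in the "connection is dropped" case above, and with a small probability you pick a uniformly random peer from the passive pool or the db. `Candidate`, `est_uptime` and `explore_prob` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Candidate:
    """Hypothetical entry from the passive pool or the persistent db."""
    feed_id: str
    est_uptime: float  # estimated probability that the peer is reachable (0..1)

def pick_candidate(
    candidates: Sequence[Candidate], explore_prob: float, rng: random.Random
) -> Optional[Candidate]:
    """Epsilon-greedy: usually exploit the db statistics, sometimes explore."""
    if not candidates:
        return None
    if rng.random() < explore_prob:
        # Random attempt: the escape hatch from partitions and local optima.
        return rng.choice(candidates)
    # Otherwise take the peer most likely to accept the connection.
    return max(candidates, key=lambda c: c.est_uptime)

# A seeded RNG keeps runs reproducible for testing.
rng = random.Random(1)
pool = [Candidate("@a", 0.9), Candidate("@b", 0.3), Candidate("@c", 0.1)]
print(pick_candidate(pool, explore_prob=0.0, rng=rng).feed_id)  # prints @a
```

`explore_prob` trades connection success rate against exploration; something around 0.05 to 0.1 sounds plausible, but that is a guess, not a measured value.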

Also, while I'm at it: In my opinion, the protocol should have a mode where there is a connection maintained between two peers, but it is not actively used for replication. It's just available as a backup, since connection establishment is fairly expensive (TCP handshake + SHS handshake, also hopefully NAT traversal in the future). Keeping a connection alive is comparatively cheap. Instead of closing an active connection that gets replaced, you could just place it in a passive view of "standby" connections. In addition to this, these passive connections could still be used to exchange some metadata for optimizing gossip.
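
A minimal sketch of the standby idea, with hypothetical names throughout (`ConnectionPools`, `activate`, `on_dropped`, and a `close` callback standing in for real teardown; connections are opaque placeholders). A replaced active connection is demoted to a bounded standby pool instead of being closed. Only the oldest standby connection is closed when that pool overflows, and a dropped active connection is replaced by promoting a standby one, which skips the TCP and SHS handshakes.

```python
from collections import OrderedDict
from typing import Callable, List, Optional

class ConnectionPools:
    """Active connections replicate; standby ones stay open but idle.

    Connections are opaque placeholders; `close` is a hypothetical callback
    standing in for actually tearing a connection down.
    """

    def __init__(self, max_active: int, max_standby: int,
                 close: Callable[[object], None]):
        self.max_active = max_active
        self.max_standby = max_standby
        self.close = close
        self.active = OrderedDict()   # peer id -> connection
        self.standby = OrderedDict()  # peer id -> connection, oldest first

    def activate(self, peer: str, conn: object,
                 evict: Callable[[List[str]], str]) -> None:
        """Make `peer` active, demoting the peer chosen by `evict` if full."""
        if peer in self.active:
            return
        if peer in self.standby:
            conn = self.standby.pop(peer)  # reuse it: no TCP/SHS handshake
        if len(self.active) >= self.max_active:
            self._demote(evict(list(self.active)))
        self.active[peer] = conn

    def _demote(self, peer: str) -> None:
        self.standby[peer] = self.active.pop(peer)
        if len(self.standby) > self.max_standby:
            _, oldest = self.standby.popitem(last=False)
            self.close(oldest)  # only now is a connection actually closed

    def on_dropped(self, peer: str) -> Optional[str]:
        """An active connection died: promote a standby one if possible."""
        self.active.pop(peer, None)
        if self.standby and len(self.active) < self.max_active:
            promoted, conn = self.standby.popitem(last=True)  # newest standby
            self.active[promoted] = conn
            return promoted
        return None

closed = []
pools = ConnectionPools(max_active=2, max_standby=1, close=closed.append)
oldest_first = lambda ids: ids[0]  # placeholder eviction policy
for peer in ["@a", "@b", "@c", "@d"]:
    pools.activate(peer, "conn-" + peer[1:], oldest_first)
print(list(pools.active), list(pools.standby), closed)
# prints ['@c', '@d'] ['@b'] ['conn-a']
print(pools.on_dropped("@c"), list(pools.active))
# prints @b ['@d', '@b']
```

Promoting the most recently demoted connection is an arbitrary choice; ranking standby connections with the same heuristic as the active pool would probably work better.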
