@Aljoscha actually, additional comment: even if we don't use mirrors, it would still make sense to change the gossip scheduler so that it disconnects as soon as (plus a few seconds) it's done downloading the recent updates from a pub, and then goes on to visit the next pub, and keeps rotating.
There are a lot of different options here, each with its own tradeoffs. For example, EBT (epidemic broadcast trees) wants long-lasting connections. Here are a few components, off the top of my head, that a well-engineered solution should probably include:
- keep a (small-ish) pool of active connections
- keep a larger pool of addresses for which the peer is probably still online/reachable, but without an active connection
- keep a large, persistent db of all/most/many previous gossip connections, together with some statistics (estimated uptime, pub?, followed?, following?, overlap in feed subscriptions, common networking layers (ip, filesystem, cjdns etc.))
- when introducing a new active connection, move one of the previously active connections to the passive address pool
- introduce new active connections via:
  - peers that connect to you
    - might have a heuristic for rejecting them
  - a connection is dropped
    - with few current connections, try a connection attempt that has a high probability of succeeding, use your db for that
  - the occasional random connection attempt to a peer from the passive pool or the db
    - this one is really important to fight network partitions and local optima
- when evicting connections from the active pool, use a heuristic that might include the following factors:
  - try to keep a certain number of highly-trusted peers in the connection pool
    - once that threshold has been reached, be more adventurous with new connections
  - how many feeds do you and the peer share an interest in?
  - how recently did these feeds update?
  - do you usually send data to them, or the other way around? (it's fine for a single connection to be asymmetric, but the active pool as a whole should be balanced)
  - are they an eager-push-peer or a lazy-push-peer (i.e. is this connection part of EBT's spanning tree)?
  - how long has this connection already been active?
  - how long since the last data transfer over this connection?
  - am I configured to always keep this connection (example: my pub or my mirror)?
  - are we on the same LAN?
  - if I drop this connection, are there any feeds that I am interested in but that are not covered by other connections?
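
To make the active/passive pools and the eviction heuristic a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Peer` fields, the weights in `eviction_score`, and the `ACTIVE_LIMIT` value are made up, and it only covers a handful of the factors listed above. A real scheduler would need to tune or replace all of them.

```python
from dataclasses import dataclass, field

ACTIVE_LIMIT = 3  # hypothetical size of the active pool


@dataclass
class Peer:
    """Hypothetical per-peer record; field names are illustrative only."""
    address: str
    shared_feeds: int = 0      # feeds both sides are interested in
    idle_seconds: float = 0.0  # time since the last data transfer
    pinned: bool = False       # configured to always keep (my pub, my mirror)
    same_lan: bool = False
    eager: bool = False        # connection is part of EBT's spanning tree


def eviction_score(peer: Peer) -> float:
    """Higher score means a better eviction candidate. Weights are arbitrary."""
    if peer.pinned:
        return float("-inf")   # pinned peers are never evicted
    score = peer.idle_seconds / 60.0
    score -= 2.0 * peer.shared_feeds
    if peer.same_lan:
        score -= 5.0
    if peer.eager:
        score -= 5.0
    return score


@dataclass
class Pools:
    active: list = field(default_factory=list)
    passive: list = field(default_factory=list)

    def add_active(self, peer: Peer) -> bool:
        """Add a peer; if the active pool is full, demote the worst one first.

        Returns False if the pool is full and every active peer is pinned.
        """
        if len(self.active) >= ACTIVE_LIMIT:
            victim = max(self.active, key=eviction_score)
            if victim.pinned:
                return False
            self.active.remove(victim)
            self.passive.append(victim)  # demoted, not forgotten
        self.active.append(peer)
        return True
```

With a pinned pub, a peer sharing ten feeds, and a peer that has been idle for ten minutes in the active pool, adding a fourth peer moves the idle one to the passive pool. Note that the evicted peer's address is kept, so it stays available for later (possibly random) reconnection attempts.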
What I'm trying to say: There's no "right" way of doing it, and you can get as sophisticated as you'd like. The only really important thing: Whatever you do, add some randomization to escape from local optima.
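
One simple way to get that randomization is an epsilon-greedy choice: most of the time take the candidate your heuristic likes best, but with a small probability take a uniformly random one. The sketch below assumes a hypothetical `score` callable and an arbitrary `explore_prob` of 0.1; neither comes from any existing scheduler.

```python
import random


def pick_next_peer(candidates, score, explore_prob=0.1, rng=random):
    """Pick the best-scoring candidate, or occasionally a random one.

    `candidates` is a list of peers (any objects), `score` maps a peer to a
    number (higher is better), and `explore_prob` is the chance of ignoring
    the score entirely. All three are illustrative assumptions.
    """
    if not candidates:
        return None
    if rng.random() < explore_prob:
        return rng.choice(candidates)   # exploration: escape local optima
    return max(candidates, key=score)   # exploitation: trust the heuristic
```

Passing a seeded `random.Random` instance as `rng` makes the behavior reproducible in tests.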
Also, while I'm at it: In my opinion, the protocol should have a mode where a connection between two peers is maintained but not actively used for replication. It's just available as a backup, since connection establishment is fairly expensive (TCP handshake + shs handshake, also hopefully NAT traversal in the future). Keeping a connection alive is comparatively cheap. Instead of closing an active connection that gets replaced, you could just place it in a passive view of "standby" connections. In addition to this, these passive connections could still be used to exchange some metadata for optimizing gossip.
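
As a rough illustration of such a standby mode, here is a small state-machine sketch in Python. It is not part of any existing protocol: the state names, the `demote`/`promote` methods, and the `"keepalive"`/`"metadata"` message kinds are all hypothetical.

```python
from enum import Enum


class ConnState(Enum):
    ACTIVE = "active"    # used for replication
    STANDBY = "standby"  # kept alive, but only for keepalives and metadata
    CLOSED = "closed"


class Connection:
    """Hypothetical wrapper around an already established connection."""

    # Message kinds still allowed on a standby connection (assumed names).
    STANDBY_KINDS = frozenset({"keepalive", "metadata"})

    def __init__(self, peer_id: str):
        self.peer_id = peer_id
        self.state = ConnState.ACTIVE

    def demote(self) -> None:
        """Stop replicating, but keep the underlying connection open."""
        if self.state is ConnState.ACTIVE:
            self.state = ConnState.STANDBY

    def promote(self) -> None:
        """Resume replication without paying for a new handshake."""
        if self.state is ConnState.STANDBY:
            self.state = ConnState.ACTIVE

    def close(self) -> None:
        self.state = ConnState.CLOSED

    def may_send(self, message_kind: str) -> bool:
        """Return whether a message of this kind is allowed in the current state."""
        if self.state is ConnState.ACTIVE:
            return True
        if self.state is ConnState.STANDBY:
            return message_kind in self.STANDBY_KINDS
        return False
```

The point of the design is that `demote` followed by `promote` never touches the transport, so swapping a standby connection back in costs nothing beyond a state change.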