You are reading content from Scuttlebutt
@cel %nZX9OSMJNPbolioRyaVPMDENahfNnwKgw5QWT2XxR+I=.sha256

Distribution of size of SSB feeds

Scroll down for graphs.

The sbot RPC command "latest" lists the latest sequence number and timestamp of each feed your sbot has replicated. Since sequence numbers start at 1 and increase by one per message, a feed's latest sequence number is also the number of messages in that feed. We can use this data to look at the distribution of messages over feeds.

sbotc latest | jq -r .sequence | sort -rn > counts

In my data there are 9286 feeds, containing 617362 messages:

$ wc -l counts
9286 counts
$ awk '{sum+=$1} END{print sum}' counts
617362
$ flumecat -z ~/.ssb/flume/log.offset | grep -zc .
617362

The longest feed has 20080 messages. The most common feed length is 3, occurring in 1072 feeds:

$ head -1 counts
20080
$ uniq -c counts | sort -rn | head -1
   1072 3

The median count of messages per feed (rounded down) is 9.
Half of the feeds (4643 feeds) contain 9 or fewer messages (totaling 20092 or 3.25% of messages).
The other half of feeds (4643 feeds) contain 9 or more messages (totaling 597270 or 96.75% of messages):

$ bc -lq
9286/2
4643.00000000000000000000
$ sed -n 4643p counts
9
$ head -4643 counts | awk '{sum+=$1} END{print sum}'
597270
$ tail -4643 counts | awk '{sum+=$1} END{print sum}'
20092
$ bc -lq
597270+20092
617362
20092/617362 * 100
3.25449250196805115900
597270/617362 * 100
96.74550749803194884000
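
The median and half-split steps above can also be done in one awk pass, without typing the totals into bc by hand. This is only a sketch: half_split is a made-up helper name, and it assumes the input is one count per line, sorted descending, like the counts file. Unlike the sed lookup above, it averages the two middle values when the number of feeds is even.

```sh
# half_split: hypothetical helper, not part of sbotc or any SSB tool.
# Reads per-feed message counts on stdin, one per line, sorted descending.
half_split() {
  awk '
    { c[NR] = $1; total += $1 }
    END {
      if (NR == 0 || total == 0) exit 1
      half = int(NR / 2)
      # Median: middle value, or the mean of the two middle values.
      if (NR % 2) median = c[half + 1]
      else median = (c[half] + c[half + 1]) / 2
      # Sorted descending, so the first half of the lines is the longer feeds.
      # With an odd number of feeds the middle feed lands in the bottom half.
      for (i = 1; i <= half; i++) top += c[i]
      bottom = total - top
      printf "median=%g top=%d (%.2f%%) bottom=%d (%.2f%%)\n",
        median, top, 100 * top / total, bottom, 100 * bottom / total
    }
  '
}
```

Usage would be: half_split < counts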

The longest 10 feeds (0.11%) each contain at least 7112 messages, and in total account for 19.48% (120255) of messages:

$ head -1 counts
20080
$ sed -n 10p counts
7112
$ head -10 counts | awk '{sum+=$1} END{print sum}'
120255
$ bc -lq
10/9286 * 100
.10768899418479431400
120255/617362 * 100
19.47884709457336214400

50% of messages (308681) are from the top 78 feeds (0.84% of feeds):

$ awk '{sum+=$1} sum > 617362/2 {print NR; exit}' counts
78
$ bc -lq
78/9286 * 100
.83997415464139564900
617362/2
308681.00000000000000000000
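
The same question can be asked for any share of messages, not only 50%. Here is a sketch that computes the total from the input instead of hardcoding 617362. The name feeds_for_share is hypothetical, the input is assumed to be sorted descending like the counts file, and it stops at "at least" the requested share (>=), where the one-liner above uses a strict >.

```sh
# feeds_for_share FRACTION: hypothetical helper, not part of any SSB tool.
# Reads descending-sorted per-feed counts on stdin and reports how many of
# the longest feeds are needed to hold at least FRACTION of all messages.
feeds_for_share() {
  awk -v frac="$1" '
    { c[NR] = $1; total += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        sum += c[i]
        if (sum >= frac * total) {
          printf "%d feeds (%.2f%% of feeds) hold %d of %d messages\n",
            i, 100 * i / NR, sum, total
          exit
        }
      }
    }
  '
}
```

Usage would be: feeds_for_share 0.5 < counts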

The top 1.00% of feeds (93 feeds) each contain at least 918 messages, and in total contain 52.52% (324264) of messages:

$ bc -lq
9286 * 0.01
92.86
93/9286 * 100
1.00150764591858712000
$ sed -n 93p counts
918
$ head -93 counts | awk '{sum+=$1} END{print sum}'
324264
$ bc -lq
324264/617362 * 100
52.52412684940116171700

The longest 10.00% of feeds (929 feeds) each contain at least 74 messages, and in total contain 81.61% (503853) of messages:

$ bc -lq
9286 * 0.10
928.60
929/9286 * 100
10.00430755976739177200
$ sed -n 929p counts
74
$ head -929 counts | awk '{sum+=$1} END{print sum}'
503853
$ bc -lq
503853/617362 * 100
81.61386674268905439500
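
The top-1% and top-10% calculations follow the same pattern, so they can be folded into one helper. This is a sketch under the same assumptions as before (descending-sorted counts on stdin); top_share is a made-up name. It rounds the number of feeds up, the way 92.86 became 93 and 928.60 became 929 above.

```sh
# top_share PERCENT: hypothetical helper, not part of any SSB tool.
# Reads descending-sorted per-feed counts on stdin and reports the smallest
# count and the total message share of the longest PERCENT of feeds.
top_share() {
  awk -v pct="$1" '
    { c[NR] = $1; total += $1 }
    END {
      if (NR == 0 || total == 0) exit 1
      n = NR * pct / 100
      k = int(n); if (k < n) k++     # round up to a whole number of feeds
      if (k < 1) k = 1
      if (k > NR) k = NR
      for (i = 1; i <= k; i++) sum += c[i]
      printf "top %d feeds: at least %d messages each, %d messages total (%.2f%%)\n",
        k, c[k], sum, 100 * sum / total
    }
  '
}
```

Usage would be: top_share 1 < counts, or top_share 10 < counts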

Here is a graph of the longest 10% of feeds:

pngspark -s 0 -h 512 counts.png < counts
convert counts.png -crop 10%x100%+0+0 feed-message-counts-10pct.png

feed-message-counts-10pct.png

Here is a graph of the length of all feeds, on a log scale:

awk '{print log($1)}' counts | pngspark -s 0 -h 1536 counts-log.png
convert counts-log.png -geometry 960x feed-message-counts-log.png

feed-message-counts-log.png

Links for tools used:
sbotc, pngspark, flumecat, convert

Note: there seems to be a bug in pngspark where it doesn't display the lowest value on the right (which is nonzero).

I'm sure people more practiced with statistics could produce better, more informative graphs, tables and info. This is just a start on the data you can get from scuttlebot and what we can learn from it.

Other things to do:

  • Weight messages by their size in bytes
  • Study length of feeds in terms of time

#ssb datavisualization

@Christian Bundy %YD+0aNRbRJdrjZSZPsWxqGuiz45JSSA2YR2Yc7R04Ao=.sha256

@cel This is really cool! Another thing I've wondered about is the size of a feed, including both the size of its messages and any referenced blobs.

@andrestaltz %g6ehUpDfQ9s7iqLx3oMRK7FLgPO/uvLRmyxB8JgqGzY=.sha256

Awesome stats @cel! I'm also curious about edge count, beyond node count (re: %Rr/M6sh... ). I think it would be rather straightforward to measure that with messages of type contact with following:true, except we'd also need to account for unfollowing or blocking afterwards. Or use friends.all.
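
A rough sketch of the "account for unfollowing afterwards" part: keep only the most recent state for each (author, contact) pair and count the pairs that end up as following. The input format here is an assumption, not something sbot outputs directly: tab-separated lines of author, contact, and true or false, in log order, with an unfollow or block written as false. count_follow_edges is a hypothetical name.

```sh
# count_follow_edges: hypothetical helper, not part of any SSB tool.
# Assumed input on stdin: author<TAB>contact<TAB>true|false, oldest first.
count_follow_edges() {
  awk -F '\t' '
    # A later contact message overwrites the earlier state for the same pair.
    { state[$1 SUBSEP $2] = $3 }
    END {
      for (k in state) if (state[k] == "true") edges++
      print edges + 0
    }
  '
}
```

Producing that three-column stream from contact messages is left out, since it depends on how the log is exported.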

@andrestaltz %8QfMSqRg0XzGQMSfOA9K+VmeVLPRwQDK+Pxy4Hsuo7k=.sha256

News! I generated a new graphml file with the social graph on my computer, and it's:

  • Nodes: 10022
  • Edges: 91818
  • On 2018-10-31

That's about 2000 new nodes since late August. The Gephi program finds more feeds than SSB does because it also counts feeds that are only mentioned in contact messages, but not hosted as feed logs.

That's about 33 new accounts and 280 new follows every day!

@Frédéric Guilbault %wJ0jkNN+98KXgLvN8iKLCd7ij295t+gVaSy1AhbEqfg=.sha256

Amazing. Monitoring decentralised networks is a challenge :D

@Dominic %emZb/ySq9dHBP32L1nqh5GB4vCqquWuHIDWYTY7tWPs=.sha256

This is funny because yesterday someone asked me how many users there were and I said "probably 10k", but I hadn't seen this thread!
