You are reading content from Scuttlebutt
@mix.exe %O3tQKuu5bD4uLCtRYwg3AQg5FR5v0XYn6kLoeOsgH8o=.sha256

was wondering about compressing msg content (before encrypting it), so had a quick poke with brotli

const log = console.log
const round = num => Math.floor(num * 100) / 100

const brotli = require('brotli')

const content = {
  type: 'profile/person',
  preferredName: { set: 'mix' },
  description: { set: 'a father of two, and programmer of scuttlebutt, ahau. enjoys sci-fi, video games, and planting native trees' },
  authors: {
    '@ye2312309187102931l238172387123017298=.ed25519': {
      350: 1
    }
  },

  tangles: {
    profile: { root: null, previous: null },
    group: {
      root: '%asdajsed123qkajsdlXCSDCIUjazskjd=.sha256',
      previous: [
        '%3234axaseadDEEascdcawseasdDWSDjd=.sha256'
      ]
    }
  },
  recps: [
    '%asdajsed123qkajsdlXCSDCIUjazskjd=.cloaked'
  ]
}

/* compress */
const buf = Buffer.from(JSON.stringify(content), 'utf8')
const jsonSize = JSON.stringify(content).length
log('initial size', { json: jsonSize, buf: buf.length }, '\n')

console.time('compressed')
const output = brotli.compress(buf, {
  mode: 1, // 0 = generic, 1 = text, 2 = font (WOFF2)
  quality: 8, // 0 - 11
  lgwin: 22 // window size
})
console.timeEnd('compressed')

// brotli.compress returns a Uint8Array, whose toString() ignores the encoding
// argument and yields a comma-separated byte list, so wrap it in a Buffer first
const base64Size = Buffer.from(output).toString('base64').length
log('compressed size:', { base64: base64Size, buf: output.length })
log('ratio:', round(jsonSize / base64Size), '\n')

/* decompress */
console.time('decompressed')
const arrayBuffer = brotli.decompress(output)
const original = Buffer.from(arrayBuffer)
console.timeEnd('decompressed')

if (!original.equals(buf)) log('error!')

Output:

initial size { json: 484, buf: 484 }

compressed: 122.854ms
compressed size: { base64: 388, buf: 291 }
ratio: 1.24

decompressed: 1.456ms
@mix.exe %ascu3TmlXkKjG/1oGh87YczwkJYRXa9hPOetWBXd+ts=.sha256

I don't really know anything about benchmarking, so would love feedback.

What I have in mind, is taking the box2 encoding (which uses the envelope spec on a JSON string) but adding e.g. brotli before passing it in for encryption. Call it something like... box3
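A rough sketch of that compress-then-encrypt ordering, assuming Node's built-in `zlib` brotli bindings rather than the `brotli` package used above. The `encryptStandIn` / `decryptStandIn` helpers are hypothetical placeholders built on AES-256-GCM; they are not the envelope spec and only mark where box2-style encryption would sit. Compression has to happen first, since well-encrypted bytes look random and will not compress.

```javascript
const zlib = require('zlib')
const crypto = require('crypto')

// Hypothetical stand-in for the real encryption step (NOT the envelope spec).
// Output layout: 12-byte nonce | 16-byte auth tag | ciphertext
function encryptStandIn (plaintext, key) {
  const nonce = crypto.randomBytes(12)
  const cipher = crypto.createCipheriv('aes-256-gcm', key, nonce)
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()])
  return Buffer.concat([nonce, cipher.getAuthTag(), body])
}

function decryptStandIn (boxed, key) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, boxed.subarray(0, 12))
  decipher.setAuthTag(boxed.subarray(12, 28))
  return Buffer.concat([decipher.update(boxed.subarray(28)), decipher.final()])
}

// "box3"-style pipeline: JSON -> brotli -> encrypt
function boxWithCompression (content, key) {
  const json = Buffer.from(JSON.stringify(content), 'utf8')
  const compressed = zlib.brotliCompressSync(json, {
    params: {
      [zlib.constants.BROTLI_PARAM_MODE]: zlib.constants.BROTLI_MODE_TEXT,
      [zlib.constants.BROTLI_PARAM_QUALITY]: 8,
      [zlib.constants.BROTLI_PARAM_LGWIN]: 22
    }
  })
  return encryptStandIn(compressed, key)
}

// Reverse pipeline: decrypt -> brotli decompress -> JSON
function unboxWithCompression (boxed, key) {
  const compressed = decryptStandIn(boxed, key)
  return JSON.parse(zlib.brotliDecompressSync(compressed).toString('utf8'))
}

const key = crypto.randomBytes(32)
const content = {
  type: 'profile/person',
  preferredName: { set: 'mix' },
  recps: ['%asdajsed123qkajsdlXCSDCIUjazskjd=.cloaked']
}

const boxed = boxWithCompression(content, key)
const roundTripped = unboxWithCompression(boxed, key)
console.log('round trip ok:', JSON.stringify(roundTripped) === JSON.stringify(content))
```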

It seems ... like this could save a bunch of space, and at pretty insignificant cost? I'm running this on a beefy desktop computer, so would need to try a phone I guess?
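On measuring the cost: a single `console.time` run can include one-off warm-up work, so it may overstate the steady-state time. A minimal sketch of a repeated measurement follows, assuming Node's built-in `zlib` brotli bindings (not the `brotli` package from the first post). `meanMs` is a hypothetical helper, and the iteration counts are arbitrary.

```javascript
const zlib = require('zlib')

// Run `fn` a few times to warm up, then return the mean milliseconds per call.
function meanMs (fn, iterations = 200, warmup = 20) {
  for (let i = 0; i < warmup; i++) fn()
  const start = process.hrtime.bigint()
  for (let i = 0; i < iterations; i++) fn()
  const elapsedNs = process.hrtime.bigint() - start
  return Number(elapsedNs) / 1e6 / iterations
}

const sample = Buffer.from(JSON.stringify({
  type: 'profile/person',
  preferredName: { set: 'mix' },
  tangles: { profile: { root: null, previous: null } }
}), 'utf8')

const options = { params: { [zlib.constants.BROTLI_PARAM_QUALITY]: 8 } }
const compressed = zlib.brotliCompressSync(sample, options)

const compressMs = meanMs(() => zlib.brotliCompressSync(sample, options))
const decompressMs = meanMs(() => zlib.brotliDecompressSync(compressed))
console.log({ compressMs, decompressMs })
```

Running the same script on a phone-class device would give the comparison asked about here. The numbers will vary by machine, so none are shown.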

Another question I have is how does this interact with db2? If I understand correctly, that's based on a binary format for the log, but how does that work when you've got encrypted content? Is the log plain-text? That can't be correct, because how would you rebuild the database to decrypt messages once you had a new group key?

cc @andrestaltz @arj

@andrestaltz %NrPByCpOI0/bOfQxurHcf2fXtCtWCkwauxlxSn54Xu0=.sha256

@mix.desktop @Mix Android brotli is a great idea!

What are you seeking to optimize though? Disk usage or network bandwidth? If it's disk, then we can also compress the entire log file, and decompress when reading it. Or compress the "blocks" (a concept used in both flumelog-offset and async-append-only-log as far as I remember).

Compressing the content may be more useful for optimizing network bandwidth.
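The block-level option could look roughly like the sketch below. The block format here (newline-delimited JSON compressed as one unit) is a made-up assumption for illustration and is not how flumelog-offset or async-append-only-log lay out their blocks. `writeBlock` and `readBlock` are hypothetical names, and the compression uses Node's built-in `zlib`.

```javascript
const zlib = require('zlib')

// Hypothetical block format: one JSON document per line, compressed as a unit.
// JSON.stringify escapes newlines inside strings, so '\n' is a safe separator.
function writeBlock (messages) {
  const raw = Buffer.from(messages.map(msg => JSON.stringify(msg)).join('\n'), 'utf8')
  return zlib.brotliCompressSync(raw)
}

// Decompress the whole block, then parse each line back into a message.
function readBlock (block) {
  const text = zlib.brotliDecompressSync(block).toString('utf8')
  if (text === '') return []
  return text.split('\n').map(line => JSON.parse(line))
}

const block = writeBlock([
  { type: 'profile/person', preferredName: { set: 'mix' } },
  { type: 'profile/person', description: { set: 'line one\nline two' } }
])
console.log(readBlock(block))
```

The trade-off in this shape is that reading one message means decompressing its whole block.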

@Anders %ra10FOwEPCjwju70sneOe2EdZZtFCq1QJAT0lt5gGfs=.sha256

@Mix Android

I wonder if this compresses so well because some of the same words appear multiple times (set, previous, ids). Basically a lot of this tangle stuff. So what I would try is doing the same for your full log.
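One way to try that, as a sketch: compress each message on its own, compress the whole batch at once, and compare the totals. The messages are synthetic (same keys, different hash-derived ids via the hypothetical `fakeId` helper), and Node's built-in `zlib` brotli is assumed, so the sizes only hint at what a real log would do.

```javascript
const zlib = require('zlib')
const crypto = require('crypto')

// Hypothetical id generator: deterministic, but looks like a message id.
const fakeId = seed =>
  '%' + crypto.createHash('sha256').update(seed).digest('base64') + '.sha256'

// 50 synthetic messages sharing the same shape and keys.
const messages = Array.from({ length: 50 }, (_, i) => ({
  type: 'profile/person',
  preferredName: { set: 'person ' + i },
  tangles: {
    profile: { root: fakeId('root'), previous: [fakeId('msg' + i)] }
  }
}))

// Size in bytes of the brotli-compressed JSON encoding of `value`.
const compressedSize = value =>
  zlib.brotliCompressSync(Buffer.from(JSON.stringify(value), 'utf8')).length

const rawSize = Buffer.byteLength(JSON.stringify(messages), 'utf8')
const perMessageTotal = messages.reduce((sum, msg) => sum + compressedSize(msg), 0)
const wholeLogSize = compressedSize(messages)

console.log({ rawSize, perMessageTotal, wholeLogSize })
```

With repeated keys like these, the whole-batch figure should come out smaller than the per-message total, because the repeated structure is only paid for once.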

Another question I have is how does this interact with db2?

The log is stored encrypted. In the case of this brotli, you would have to know that the feed is compressed, so you have to decompress first of all to validate. Then whenever you get it from the database you decompress as well. Because the compression time is so large, I don't think it makes sense to store it decompressed.

One thing you can do with meta feeds is attach metadata to a feed, so you could say that this feed is compressed with brotli, and the other end, if it supports it, could then choose to replicate and handle that feed accordingly.

@Anders %qPsNU52fWtIhzazDSn0hEZRi+6RSrFICg5nwxBnAfDY=.sha256

Yeah so in order to keep the same shape of things, encrypted messages are decrypted and returned as bipf so they can be read in-place as any other message. This is all handled in ssb-db2/indexes/private.js.
