You are reading content from Scuttlebutt
@cel %5Pmvr9fijzjX/ouPf3pq+RfkvH5KX7RqYqN8RMltCDc=.sha256

flumefind.c

Given a flumelog-offset file and a message id, finds the offset of that message in the log, by scanning in reverse through the log file.

#flume

@Anders %VvatUmklNtD468ese9HL68v1cgREmJnyMZorKy9CQaA=.sha256
Voted # [flumefind.c](&FHKB0tMX2SolDWG6Bxvo5C5GWUAQUD5d2w/3ICrCg2k=.sha256) Give
@SoapDog %Vf6qQ35i4BSmatS3663JnzX9xVKn8EUchfQRoX3P3O0=.sha256

flumefind.rkt

@cel Thanks a lot for that C code. In this message I mentioned that I wanted to try playing with flume, so I've picked your code and reimplemented that flumefind utility using #racket. It is much slower than your code of course, not only because it is interpreted but because it parses the json for each frame. Still, it works and I am happy.

@Dominic thanks for the help with format description. @dinoƧ𝔸Ⓤᖇ thanks for the Rust code even though my baby knowledge of Rust got me stumbling on some parts.

The main loop is quite cute:

(define (flumefind pos)
  (define f (read-frame pos))
  (if (equal? (message-id) (frame-key f))
      (displayln (frame-start-position f))
      (flumefind (- (frame-start-position f) 4))))

(flumefind i)

Regarding speed, this are the results from running on my Surface Go under WSL (which has a notoriously slow filesystem access):

Cel's C version:

soapdog@FafiGo$ time ./flumefind ./log.offset %5Pmvr9fijzjX/ouPf3pq+RfkvH5KX7RqYqN8RMltCDc=.sha256
676860406

real    0m0.256s
user    0m0.047s
sys     0m0.141s

SoapDog's Racket version:

soapdog@FafiGo$ time racket ./flumefind.rkt --id %5Pmvr9fijzjX/ouPf3pq+RfkvH5KX7RqYqN8RMltCDc=.sha256 ./log.offset
676860406

real    0m15.977s
user    0m11.422s
sys     0m4.547s
User has chosen not to be hosted publicly
@Dominic %DugblrV7StE7mYZOgnX8AVeMFAW26Zq+FNxuZYGP3Q8=.sha256

@SoapDog you can surely speed this up by testing the key with a regular expression and then parsing the json only if that works. That way you save the json parsing time, but also avoid allocating memory for the parsed structure.

@SoapDog %jTmulhb/jAhJtryGFW+dpnYFnVIEl138izdskqFOD+c=.sha256

@Dominic and @m455,

Made a new version that allows you to either decode json or use RegEx. The speed gain is great. Below is the version running with decode-json? set to true:

> (time (flume-find "%5Pmvr9fijzjX/ouPf3pq+RfkvH5KX7RqYqN8RMltCDc=.sha256" #:decode-json? #t))
676860406
cpu time: 19453 real time: 19810 gc time: 4096

And this is the version using RegEx:

> (time (flume-find "%5Pmvr9fijzjX/ouPf3pq+RfkvH5KX7RqYqN8RMltCDc=.sha256"))
676860406
cpu time: 7640 real time: 7772 gc time: 1000

I'm running from inside DrRacket so it doesn't count the overhead of starting Racket but it is surely impacted by the other stuff DrRacket is running inside it. Still, the RegEx version is using less than half the time to find the message, and I am not being clever in the code. I'm sure that if I knew more Racket I could speed this even more.

@SoapDog %jX0iTTwP9vIcQ/ct4m9uKbo7PIguqAGncVOfUcpf6QM=.sha256

I wonder how a version built with SBCL (which is CL not Racket) or Chicken Scheme (which is Scheme not Racket) would run. Both are able to compile down to native machine code which should enable it to run in near C speed.

User has not chosen to be hosted publicly
@SoapDog %dk4kyaWbAWY/6BFHxHdpxxXmXibfyF7rRe1sQcIZ0/I=.sha256

It would be fun if people recreated flumefind.c in their favorite language so that we had more examples to look at...

@cryptix %A7GNPlRai8+PX/6xqFQzFiX9xeDlHlQ4lVvDoibyxhw=.sha256

@SoapDog challenge accepted!

@cel %xgqsk1EniWTuONdZ3SIzs7dj0jI2+yJ+OYDmhgszlXw=.sha256

Just a reminder, if you want the most efficient lookup of a message in ssb's flumelog, you can use the hashtable stored at file keys.ht to look up the offset of a message in the log by its id. That is what ssbget.c does. flumefind.c just linearly scans the log in reverse to find a message. So it works best for recent messages. This fits the use case I had for it, which was to find the offset of a message I recently published, so that I could truncate the log file to that length and then regenerate indexes, thus attempting to delete that message (this being done while offline or before gossipping the message). I made the tool because more than once I had executed that procedure and it was a little tedious to manually calculate the offset. (And I didn't want to erase the whole flumelog).

Also another pointer: some of the code in flumefind.c and ssbget.c came from flumecat.c.

I'm glad this can serve as a cross-language reference "Rosetta Code"-style task. The flumelog-offset format is simple and useful for many SSB users so it is nice to see it implemented in more languages. Nice work @SoapDog.

@cel %MeVLPAJ8+pam3axy3TGlw7a8vFUnPLRyQiL6kQyy0TI=.sha256

Also, part of the reason the C code is fast is because it is using memory-mapped I/O (mmap). Although with I/O caching maybe this doesn't matter. Good luck with the challenge.

@Soapy mcSoap %LSr2V8MLTAXwq4G3Xo7l0ieIpKZqh5B771OW33/YN8U=.sha256

@cryptix have you cooked some go amazing gizmo for this? :-P

@Soapy mcSoap %MIAR4CZiA/Yz4aK05aM96MTl6YZeCUja51/k/0I6g9Y=.sha256

Languages that I'd love to see an implementation of this:

  • OCaml
  • Haskell
  • Go
  • Common Lisp
  • Zig

I will send a digital thank you note with an original bad haiku about the language you used for anyone who sends an implementation in any of these languages :-P

Join Scuttlebutt now