You are reading content from Scuttlebutt
@jer %JgYgMCLlFHFm6dIPpg+BNIxtiOE995kah7jAiyzRzNs=.sha256

One of the things I like about programming out of my comfort zone, is I get great exposure to new things. Even in your comfort zone, you find new things that make some things that you have to do in various applications over and over, simpler. This week, for me, that was the gb_trees module in Erlang.

So I've been working on a hash ring implementation for a financial trading bot app, and as part of that implementation you've gotta figure out how you want to build the data structure to slot each node shard into the ring. Key things you're looking for:

  • Consistent Hashing -- I want the data to be distributed at a certain location based on how it hashes
  • As time passes, new larger nodes come online, I want to be able to weight them differently, so older nodes that can't take as much load don't get as much ... load :)
  • I want to keep data replicas in different zones
  • I don't want very much (< 0.5%) to be outside of the zone caring for it
  • I need a property that if a node is unavailable, the data will be saved to the next highest virtual node; and if there is no next highest virtual node, wrap to the first virtual node in the ring, thus completing the ring

As it turns out, I can use gb_trees for the heavy lifting in my hash ring code, and it performs really well. So the entire hash ring is about 90 lines of code, and supports adding/removing nodes, getting all nodes, and finding nodes to place data on when given a key. This is somewhat unfortunate because it means I have to add 3 nodes at a time to increase capacity.

I'm still thinking about whether or not its worthwhile to spend more time to build the hash ring a different way, such that I can have 1 ring for all my nodes, while guaranteeing good distribution of data on nodes in different zones with little over/under...

The application it's being built for is using this hashring to store data on nodes for order books of financial assets (crypto coins, forex pairs), and data is read out of nodes by my quantitative siganlling service to perform statistical analysis, so it has to be as fast as possible for the signals to be actionable by the order execution service, such that orders can actually get filled. So far, entire latency through the system (from the time the data comes in through the scraper to the time an order is placed) is 114 milliseconds. The slowest part of this whole pipeline is actually the REST call that I have to make to place the order, at 80 ms avg ttl.

This is totally not fast enough, but the ring isn't my bottleneck, the exchange is; so that's at least a good as I can get right now.

@jer %W+hWDy1hRX2sGonU3O48JFZRE9OTMK0vU9JOsDf/JLo=.sha256

I did say "next highest virtual node" when I meant "next highest node" Oops.

Join Scuttlebutt now