my thinking is that we're on the right path, but the journey still requires more hard work before we reap all the sweet rewards.
what i mean is that our data model is "event sourcing" ("every change to the data is an event") and our database model is "kappa architecture" ("we persist the events to an append-only log, then use the events to construct views that can efficiently and effectively answer questions (queries) about the data"). to this end, @dominic made flumedb (the most similar database is Apache Kafka, but Flume is for a single machine, not a distributed cluster), which provides abstract interfaces for a "FlumeLog" (how to persist the events to an append-only log) and a "FlumeView" (how to index the events to answer queries about the data).
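as a rough sketch of the Log and View split described above: this is not flumedb's actual API, it's a toy in-memory version, and the names `Log`, `CountByAuthorView`, `stream`, `catch_up`, and `since` are all made up for illustration.

```python
from typing import Any, Dict, Iterator, List, Tuple


class Log:
    """Hypothetical in-memory append-only log (a stand-in for a FlumeLog)."""

    def __init__(self) -> None:
        self._events: List[Dict[str, Any]] = []

    def append(self, event: Dict[str, Any]) -> int:
        """Persist an event and return its sequence number."""
        self._events.append(event)
        return len(self._events) - 1

    def stream(self, since: int = -1) -> Iterator[Tuple[int, Dict[str, Any]]]:
        """Yield (seq, event) pairs with seq greater than `since`."""
        for seq in range(since + 1, len(self._events)):
            yield seq, self._events[seq]


class CountByAuthorView:
    """Hypothetical view (a stand-in for a FlumeView): events per author."""

    def __init__(self) -> None:
        self.since = -1  # last sequence number this view has indexed
        self.counts: Dict[str, int] = {}

    def catch_up(self, log: Log) -> None:
        """Index only the events appended since the last catch-up."""
        for seq, event in log.stream(self.since):
            author = event["author"]
            self.counts[author] = self.counts.get(author, 0) + 1
            self.since = seq

    def get(self, author: str) -> int:
        """Answer the query: how many events has this author published?"""
        return self.counts.get(author, 0)


log = Log()
log.append({"author": "alice", "type": "post", "text": "hello"})
log.append({"author": "bob", "type": "post", "text": "hi"})
log.append({"author": "alice", "type": "post", "text": "again"})

view = CountByAuthorView()
view.catch_up(log)
```

the point of the split is that the Log only knows how to append and stream, while the View only knows how to fold events into something queryable and remember how far it has read.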
to me, i'd be very disappointed if we stopped using the flume abstractions (Log and View), since i think they are necessary complexity for our system, and so this is something i continue to maintain as our direction for the #sunrise-choir.
however, i think there are some implementation details that we can improve on with regard to using flume:
- using a FlumeLog doesn't mean we have to use `flumelog-offset`, i care about the abstract interface but not the current implementation
- similarly, i find the current JavaScript FlumeViews (`flumeview-level`, etc) leave much to be desired, i reckon SQL is a match made in heaven for our FlumeViews
- a FlumeLog's purpose is to persist the log, a FlumeView's purpose is to index the log. this means the FlumeView can be rebuilt at any time using the FlumeLog's persisted data. a common concern i've heard is that we're storing the data twice, in the Log and in the View(s), so it's inefficient and duplicating and why not just have one database for everything. then let's solve that problem without throwing out the baby with the bathwater: if the purpose of a Log is simply to persist, why not have a Log which heavily compresses the data? because to me it's fundamental that we can change Views at any time for any reason without complicated migrations.
- and yes, every app (or ecosystem of apps who create a shared agreement) should have its own database, i don't think we should make decisions based on this still being the case
- we should also expect a Scuttlebutt future with partial replication. last night at Art~Hack i was introduced to the latest thinking on this by @joshalja, and i'm very excited for what could be possible in the future. i think this question of how to index Views out-of-order will continue to require effort, but i think the data concerns will be minimized.
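to make the "compressed Log plus rebuildable SQL View" idea from the list above a bit more concrete, here is a minimal sketch using only Python's standard library (`zlib`, `json`, `sqlite3`). everything in it is an assumption for illustration: `CompressedLog`, `build_sql_view`, the `posts` table, and the event shapes are hypothetical, and none of it is flumedb's or Scuttlebutt's real format.

```python
import json
import sqlite3
import zlib
from typing import Any, Dict, Iterator, List, Tuple


class CompressedLog:
    """Hypothetical append-only log that stores each event zlib-compressed."""

    def __init__(self) -> None:
        self._records: List[bytes] = []

    def append(self, event: Dict[str, Any]) -> int:
        """Serialize, compress, and persist an event; return its sequence number."""
        raw = json.dumps(event, sort_keys=True).encode("utf-8")
        self._records.append(zlib.compress(raw, 9))
        return len(self._records) - 1

    def stream(self) -> Iterator[Tuple[int, Dict[str, Any]]]:
        """Decompress and yield every (seq, event) pair in log order."""
        for seq, record in enumerate(self._records):
            yield seq, json.loads(zlib.decompress(record).decode("utf-8"))


def build_sql_view(log: CompressedLog) -> sqlite3.Connection:
    """Build a throwaway SQL view from scratch by replaying the whole log."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (seq INTEGER PRIMARY KEY, author TEXT, text TEXT)")
    db.execute("CREATE INDEX posts_by_author ON posts (author)")
    for seq, event in log.stream():
        if event.get("type") == "post":  # this view only indexes posts
            db.execute(
                "INSERT INTO posts (seq, author, text) VALUES (?, ?, ?)",
                (seq, event["author"], event["text"]),
            )
    db.commit()
    return db


event_log = CompressedLog()
event_log.append({"type": "post", "author": "alice", "text": "hello"})
event_log.append({"type": "vote", "author": "bob", "link": 0})
event_log.append({"type": "post", "author": "alice", "text": "again"})

sql_view = build_sql_view(event_log)
alice_posts = sql_view.execute(
    "SELECT text FROM posts WHERE author = ? ORDER BY seq", ("alice",)
).fetchall()
```

changing the View here means editing `build_sql_view` and replaying the Log, with no migration of the old tables. one caveat: compressing each tiny record on its own rarely saves much, so a real Log would more likely compress in larger blocks.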