Caveat: this post may be more ranty than practically applicable. Recently I’ve been toying with some of the implications of Nathan Marz’s excellent Big Data book and the Lambda Architecture discussed within it. The basic premise and concepts in the book revolve around a few principles: you have various systems that are great at the things they are good at, and pigeon-holing them into multiple domains doesn’t really make sense; systems that recover easily from failures are those that store all of the data needed to recreate the state prior to a failure; and finally, if you constantly regenerate your data from scratch, your system is virtually self-maintaining.
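To make that last principle concrete, here’s a minimal sketch (names and the counting use case are my own, not from the book): instead of mutating derived state in place, append immutable events to a log and recreate the state by replaying them.

```python
# Illustrative only: an append-only event log plus a replay function.
# The derived store can always be regenerated because the log is never mutated.

from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    user: str
    delta: int  # e.g. a page-view count increment

def replay(log):
    """Recreate the derived state purely from the event log."""
    state = {}
    for e in log:
        state[e.user] = state.get(e.user, 0) + e.delta
    return state

log = [Event("alice", 1), Event("bob", 2), Event("alice", 3)]
state = replay(log)  # after any failure of the derived store, replay(log) rebuilds it
```

The point is that `replay` is the only code path that produces state, so recovery is the same operation as normal regeneration.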
In a nutshell: use some big data store to hold all the actions required to rebuild the data your application needs to serve, treat the generated output as a static cache, periodically regenerate that cache by running a map-reduce or other batch-processing task over the big data, and finally serve the data that’s not yet in the cache from an easy-to-maintain tiny cache fed by the same transformation the batch job runs. I’m sure I’m not being entirely thorough here, so I encourage you to read the book, or at least the first chapter, to give it a full eval.
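A toy version of that batch + speed split might look like the following (all names are made up for illustration; the real thing would be a map-reduce job and separate stores, not two dicts): a periodic batch job rebuilds a static view from the master log, a tiny speed cache absorbs events that arrived after the last batch run, and queries merge the two.

```python
# Illustrative sketch of the batch view / speed layer split.

def batch_view(master_log):
    # stands in for the batch (map-reduce) job over the big data store
    view = {}
    for user, n in master_log:
        view[user] = view.get(user, 0) + n
    return view

def query(user, batch, speed):
    # serve from the static batch view, topped up by the tiny speed cache
    return batch.get(user, 0) + speed.get(user, 0)

master_log = [("alice", 1), ("alice", 2)]
batch = batch_view(master_log)  # regenerated periodically
speed = {"alice": 5}            # events since the last batch run
answer = query("alice", batch, speed)
```

Each batch run subsumes what the speed cache held, which is why the speed layer can stay tiny and disposable.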
What this looks like in practice is Big Data Store -> Serving Layer Store (SQL, HBase, ES) -> Speed Layer (Elasticsearch, SQL, Redis, Mongo…), plus, in all likelihood, some queueing system and lots of logs. I think this is generally great; my only beef with it is the operational complexity of teaching people to deal with potentially 4-5 different systems that have to stay more or less in lock step. I think you can tone down that complexity: first, don’t perpetually regenerate the index/cache, but do it only when you have an issue; and second, perhaps be OK with generating the caching layer as data comes in, rather than relying on batch processes to complete before updating the cache. P.S. With this approach your tiny cache can be a cache more in sync with your needs, i.e. with a caching policy based on something other than time.
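The simplification I’m arguing for can be sketched like so (again, hypothetical names): keep a single transform, apply it incrementally as events arrive, and keep a full rebuild from the log around only as the recovery path, rather than as a perpetually running batch job.

```python
# Illustrative sketch: one shared transform, incremental updates by default,
# full rebuild from the durable log only when something goes wrong.

def apply_event(view, event):
    """The single transform shared by the incremental and rebuild paths."""
    user, n = event
    view[user] = view.get(user, 0) + n
    return view

def rebuild(log):
    """The 'recovery button': regenerate the view from scratch."""
    view = {}
    for event in log:
        apply_event(view, event)
    return view

log, view = [], {}
for event in [("alice", 1), ("bob", 2), ("alice", 3)]:
    log.append(event)         # durable master copy of every action
    apply_event(view, event)  # cache updated as data comes in
```

Because both paths share `apply_event`, the incrementally maintained view and a from-scratch rebuild always agree, which is what makes skipping the perpetual batch job tolerable.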
This will ultimately make you less resilient to failure, but it might make your life a little easier to deal with while you crank out more performance, and you can sleep well at night knowing that, if you really need to, you can hit the recovery button.