Webinar Notes: Typesafe William Hill Omnia Patrick Di Loreto
My friend Oliver White is doing his usual bang-up job in his new gig at Typesafe. One aspect is the humble webinar. Here are my notes for one that caught my eye, Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization. This is a very dense but well-delivered presentation by Patrick Di Loreto, who helped develop a new platform for his employer, the online gambling service William Hill.
Morally, I am sensitive to the real damage that gambling does to real lives and families, so I will include a link to an organization that offers help: 1-800-GAMBLER. That said, this is just another instance of the ancient tradition of technology development being driven by something traditionally seen as a vice. (For a humorous, NSFW and prophetic Onion article, search Google for “theonion internet andreessen viewing device”. I’m old enough to have first read that in an actual physical newspaper!)
Now, on to the raw notes. YMMV of course, but if nothing else this will help you overcome the annoying problem of the slides not being synced to the audio.
Context: presentation by Patrick Di Loreto (@patricknoir), R&D Engineering lead for William Hill online betting. The presentation was done for Typesafe as a webinar on 14 June 2015. They have a new betting platform they call Omnia.
- Need to handle massive amount of data
- Based on Lambda Architecture from Nathan Marz <http://lambda-architecture.net/>.
- Omnia is a platform that includes four different components
  * Chronos - Data Source
  * Fates - Batch Layer
  * NeoCortex - Speed layer
  * Hermes - Serving layer
03:47 definition of Lambda Architecture
“All the data must come from a unique place (data source)”
They separate access to the data source into two different modes based on timeliness requirements.
NeoCortex (Speed Layer) is to access the data in real time, but without some consistency and correctness guarantees. Optimized for speed. It has only recent data.
Fates (Batch Layer) is to access the data not in real time, but with more (complete?) consistency.
05:00 Reactive Manifesto slide
06:15 importance of elasticity for them
06:47 Chronos Data Source: “It’s nothing else than a container for active streams”
“Chronos is a sort of middleware. It can talk to the outside world and bring the data into their system.”
Organize the data into a stream of observable events, called “incidents”.
Can have different viewpoints for different concerns
  * Internal (stuff they need to implement the system itself)
  * Product centric (which of the WH products, such as “sports”, “tweets”, “news”)
  * External (“social media sharing”)
  * Customer centric
10:12 Chronos streams connect to the external system and bring its data into Chronos
  * Adapter: Understand all the possible protocols that other systems implement. Connect to the other system.
  * Converter: Transform the incoming data into their internal format
  * Persistence Manager: Make the converted data durable.
11:22 Chronos clustering
Benefits from the Akka Framework. Distributes the streams across the cluster.
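To make the 10:12 description of a Chronos stream (adapter, then converter, then persistence manager) a bit more concrete for myself, here is a minimal sketch in Python. It is not William Hill's code: the real system is built on Akka, and every name here (`Incident`, `json_adapter`, `to_incident`, `InMemoryPersistence`, `Stream`) is something I made up for illustration. The message shape with `type` and `data` keys is also an assumption, and the "persistence" is just an in-memory list standing in for a durable store.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Incident:
    """Hypothetical internal event format; the talk does not show the real schema."""
    source: str
    kind: str
    payload: dict


def json_adapter(raw: str) -> dict:
    """Adapter: speaks the external system's protocol (assumed here: JSON text)."""
    return json.loads(raw)


def to_incident(source: str) -> Callable[[dict], Incident]:
    """Converter: maps a decoded external message onto the internal format."""
    def convert(message: dict) -> Incident:
        return Incident(source=source, kind=message["type"], payload=message["data"])
    return convert


class InMemoryPersistence:
    """Persistence manager stand-in: appends to a list instead of a durable store."""

    def __init__(self) -> None:
        self.stored: List[Incident] = []

    def save(self, incident: Incident) -> None:
        self.stored.append(incident)


class Stream:
    """One stream wires the three stages together for a single external source."""

    def __init__(self, adapter, converter, persistence) -> None:
        self.adapter = adapter
        self.converter = converter
        self.persistence = persistence

    def on_raw(self, raw: str) -> Incident:
        # adapter -> converter -> persistence, in the order given in the talk
        incident = self.converter(self.adapter(raw))
        self.persistence.save(incident)
        return incident


store = InMemoryPersistence()
stream = Stream(json_adapter, to_incident("tweets"), store)
stream.on_raw('{"type": "mention", "data": {"user": "alice"}}')
```

Keeping the three stages as separate, swappable pieces is the point: a new external source would presumably need only a new adapter and converter, while the persistence stage stays the same.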
When failover happens, stream connection to outside source is migrated to another node via Akka. Keeps referential transparency.
Each stream is an Actor which “supervises” its children: adapter, converter, persistence manager.
12:41 Supervising (Slides diverged from audio) (Slide 12)
Supervision is key to allowing “error kernel pattern”. <http://danielwestheide.com/blog/2013/03/20/the-neophytes-guide-to-scala-part-15-dealing-with-failure-in-actor-systems.html>
Basically, it is just a simple guideline you should always try to follow, stating that if an actor carries important internal state, then it should delegate dangerous tasks to child actors, so as to prevent the state-carrying actor from crashing. Sometimes, it may make sense to spawn a new child actor for each such task, but that