Acunu and Hailo: a realtime analytics case study on Cassandra
DESCRIPTION
A use case (Hailo, taxi e-hailing app) for Acunu's realtime analytics on Cassandra NoSQL
TRANSCRIPT
@daiclegg @acunu
Hailo - a case study for Cassandra & Acunu
Dai Clegg
October 2013
JAX London
What is Hailo?
‣ The world’s highest-rated taxi app – over 11,000 five-star reviews
‣ Over 500,000 registered passengers
‣ A Hailo hail is accepted around the world every 4 seconds
‣ Hailo operates in 15 cities on 3 continents, from Tokyo to Toronto, after nearly 2 years of operation
The Adoption of Cassandra & Acunu at Hailo
‣ Launched on AWS
‣ Two PHP/MySQL web apps plus a Java backend
‣ Mostly built by a team of 3 or 4 backend engineers
‣ MySQL multi-master for single availability zone resilience
‣ Get/create/update entity
‣ Analytics
‣ Text search
The Adoption of Cassandra & Acunu at Hailo
‣ A desire for greater resilience – “become a utility”
‣ Cassandra is designed for high availability
‣ Plans for international expansion around a single consumer app
‣ Cassandra is good at global replication
‣ Expected growth
‣ Cassandra scales linearly for both reads and writes
‣ Prior experience
‣ successful in-team experience with Cassandra
The Adoption of Cassandra & Acunu at Hailo
‣ Replacement of key consumer app functionality,
‣ split PHP/MySQL web app into:
‣ a mixture of PHP/Java services
‣ backed by a Cassandra data store
‣ Launched into production in September 2012
‣ originally just powering North American expansion,
‣ gradually switching over Dublin and London
The Adoption of Cassandra & Acunu at Hailo
‣ Further decompose functionality into Go/Java SOA
‣ Migrating:
‣ Entity databases to Cassandra
‣ Analytics to Acunu
‣ Search to Elasticsearch
Cassandra
“Cassandra just works”
Dom W, Senior Engineer, Hailo
Some Considerations for Data Modeling
‣ Do not read the entire entity, update one property and then write back a mutation containing every column
‣ Only mutate columns that have been set
‣ This avoids read-before-write race conditions
‣ Choose row key carefully, since this partitions the records
‣ Think about how many records you want in a single row
‣ Denormalise on write into many indexes/views
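The points above can be illustrated with a minimal in-memory sketch. It does not use a real Cassandra driver: `ColumnStore`, the row-key tuples, and the `driver` and `drivers_by_city` names are hypothetical stand-ins chosen for illustration. It shows how writing back a whole entity from a stale read can clobber a concurrent change, how a mutation carrying only the changed column avoids that, and how an index row might be maintained at write time.

```python
from collections import defaultdict


class ColumnStore:
    """Toy stand-in for a column family: row key -> {column name: value}."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def mutate(self, row_key, columns):
        # Upsert semantics: only the columns named in the mutation change.
        self.rows[row_key].update(columns)

    def read(self, row_key):
        return dict(self.rows.get(row_key, {}))


def write_whole_entity(store, row_key, stale_snapshot, column, value):
    """Anti-pattern: read, change one property, write every column back."""
    entity = dict(stale_snapshot)
    entity[column] = value
    store.mutate(row_key, entity)


def write_changed_column(store, row_key, column, value):
    """Preferred: the mutation carries only the column that was set."""
    store.mutate(row_key, {column: value})


def save_driver(store, driver_id, columns):
    """Denormalise on write: update the entity row and an index row together."""
    store.mutate(("driver", driver_id), columns)
    if "city" in columns:
        # One index row per city; each driver is a column in that row.
        store.mutate(("drivers_by_city", columns["city"]), {driver_id: ""})


def demo():
    key = ("driver", "d1")

    # Two writers read the same snapshot, then each changes one property.
    racy = ColumnStore()
    racy.mutate(key, {"phone": "old", "status": "offline"})
    snapshot = racy.read(key)
    write_whole_entity(racy, key, snapshot, "phone", "new")
    write_whole_entity(racy, key, snapshot, "status", "online")  # clobbers phone

    # Same two changes, but each mutation names only its own column.
    safe = ColumnStore()
    safe.mutate(key, {"phone": "old", "status": "offline"})
    write_changed_column(safe, key, "phone", "new")
    write_changed_column(safe, key, "status", "online")

    save_driver(safe, "d2", {"city": "London", "status": "online"})
    return racy.read(key), safe.read(key), safe.read(("drivers_by_city", "London"))
```

In this sketch the index row is keyed by city, so the number of columns in one row grows with the number of drivers in that city, which is the kind of row-size question the row-key bullet raises. Moving a driver between cities would also need the old index entry removed, which is omitted here.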
not obvious!
Some Considerations for Data Modeling
[Chart: average years' experience per team member, MySQL vs Cassandra]
whoops!
Some Repercussions of Data Modeling
Some Considerations for Application Development
People who can attempt to query MySQL
People who can attempt to query Cassandra
Some Considerations for Application Development
Acunu Analytics
Hailo needed to understand system performance/business SLAs
Acunu Analytics
‣ Raw Cassandra lacks analytic primitives
‣ e.g. COUNT, SUM, AVG, GROUP BY
‣ Acunu Analytics provides a platform for real-time analytics
‣ for pre-planned query templates
‣ It uses Cassandra as the store
‣ so it is highly available, resilient and globally distributed
‣ Integration is straightforward
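To make the idea of a pre-planned query template concrete, here is a minimal sketch of answering an AVG ... GROUP BY question without scanning raw rows: COUNT and SUM are kept per group at write time, so the average is a lookup. This is an illustration only, not Acunu's implementation; `RunningAverages`, the group names, and the latency values are assumptions made for the example.

```python
from collections import defaultdict


class RunningAverages:
    """Keeps COUNT and SUM per group so AVG is a lookup, not a scan."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def record(self, group, value):
        # Called once per incoming event; cost is independent of history size.
        self.count[group] += 1
        self.total[group] += value

    def avg(self, group):
        # Returns None for a group that has never been recorded.
        n = self.count.get(group, 0)
        return self.total[group] / n if n else None


def demo():
    latencies = RunningAverages()
    # Hypothetical (service, response time in ms) events.
    for service, ms in [("api", 120), ("api", 80), ("dispatch", 300)]:
        latencies.record(service, ms)
    return latencies
```

The trade-off is that the grouping has to be chosen before the events arrive, which is why the slide speaks of pre-planned templates rather than ad hoc queries.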
Acunu Analytics: technology
Real-time incremental cubing provides instant answers to Big Data questions
build cube from history
Acunu Analytics: technology
Apache Cassandra is the repository
Acunu Analytics: an example
Define aggregate cubes:
CREATE CUBE APPROX TOP(keyword) WHERE browser, time GROUP BY time
New events update cubes
Rich instant queries over cubes:
SELECT TOP(keyword) FROM table WHERE browser = 'chrome' AND time BETWEEN .. GROUP BY d1, d2, ... JOIN ... HAVING .. ORDER BY ..
Drill down to raw events
Populate new cubes from historic data
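The cube lifecycle on this slide can be sketched in a few lines. This is a toy model under stated assumptions, not Acunu's engine: `KeywordCube`, the event field names, and the hourly time bucket are hypothetical, and the counts are exact rather than the approximate TOP that the cube definition asks for. It shows new events updating one cell each, a TOP query answered from the cells alone, and a new cube populated by replaying historic events.

```python
from collections import Counter, defaultdict


class KeywordCube:
    """Toy cube: (browser, hour) -> keyword counts, updated as events arrive."""

    def __init__(self):
        self.cells = defaultdict(Counter)

    def ingest(self, event):
        # Each new event increments exactly one cell.
        hour = event["time"] // 3600
        self.cells[(event["browser"], hour)][event["keyword"]] += 1

    def top(self, browser, first_hour, last_hour, n=3):
        """Top keywords for one browser over an inclusive range of hours."""
        total = Counter()
        for hour in range(first_hour, last_hour + 1):
            total.update(self.cells.get((browser, hour), {}))
        return total.most_common(n)


def populate(cube, history):
    """Populate a new cube by replaying stored raw events."""
    for event in history:
        cube.ingest(event)
    return cube


def demo():
    # Hypothetical raw events; time is seconds since some epoch.
    history = [
        {"time": 10, "browser": "chrome", "keyword": "taxi"},
        {"time": 20, "browser": "chrome", "keyword": "taxi"},
        {"time": 3700, "browser": "chrome", "keyword": "cab"},
        {"time": 30, "browser": "firefox", "keyword": "taxi"},
    ]
    cube = populate(KeywordCube(), history)
    # A live event arriving after the cube was built from history.
    cube.ingest({"time": 7300, "browser": "chrome", "keyword": "taxi"})
    return cube
```

A query such as `cube.top("chrome", 0, 2)` reads at most three cells regardless of how many raw events were ingested, which is the property that lets cube queries return quickly.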
Overview of the workflow
Acunu Analytics: summary
define aggregation cubes with DDL or infer from self-service queries
define connector: either from library, toolkit or REST
define pre-processors: programmatic, Java or JavaScript; or AQL query
develop queries in AQL, query builder or self-service data explorer
invoke queries from within applications with JSON query API
populate new cubes from historic data
define event schema with DDL or infer from sample events
fill cube from history
define alerts to be raised on trigger conditions
some sample screenshots
Acunu Analytics at Hailo
“drill-across” to see breakdown of data and in-depth analysis
use cases
Acunu Analytics at Hailo
‣ Infrastructure and Application monitoring
‣ Real-time A/B testing of app layout and incentives
‣ Real-time geo-view of supply/demand for drivers
‣ More in the pipeline
Conclusions
Conclusions
‣ Solid Cassandra design
‣ High availability characteristics
‣ Easy multi-data centre setup
‣ Simplicity of operation
‣ With Acunu
‣ SQL-like rich queries
‣ easier data modeling
Choosing the Platform
Conclusions
‣ Have an advocate
‣ sell the dream
‣ Learn the fundamentals
‣ get the best out of Cassandra
‣ Invest in tools to make life easier
‣ Keep management in the loop
‣ explain the trade-offs
Exploiting the platform
Apache, Apache Cassandra, Cassandra and the eye logo are trademarks of the Apache Software Foundation.
Thank You.