Polyglot Persistence in the Real World: Cassandra + S3 + MapReduce
DESCRIPTION
This talk focuses on building a system from scratch, showing how to perform analytical queries in near real time while still getting the benefits of Cassandra's high-performance database engine. The key subjects of the talk are:
● The splendors and miseries of NoSQL
● Apache Cassandra use cases
● Difficulties of using MapReduce directly with Cassandra
● Amazon cloud solutions: Elastic MapReduce and S3
● "Real-enough" time analysis
In particular, the talk dives into ways of handling different kinds of semi-ad-hoc queries when using Cassandra, and the pitfalls of designing a schema around a specific analytics use case. Some attention is paid to dealing with time-series data in particular, which can present a real problem when using column-family or key-value store databases.
TRANSCRIPT
Polyglot Persistence in the Real World
Anton Yazovskiy Thumbtack Technology
● Software Engineer at Thumbtack Technology
● an active user of various NoSQL solutions
● consulting with a focus on scalability
● a significant part of my work is advising people on which solutions to use and why
● big fan of BigData and clouds
● NoSQL – not a silver bullet
● Choices that we make
● Cassandra: operational workload
● Cassandra: analytical workload
● The best of both worlds
● Some benchmarks
● Conclusions
• well-known ways to scale
• scale in/out, scale by function, data denormalization
• really works
• each has disadvantages
• mostly manual process (NewSQL)
http://qsec.deviantart.com
● solves exactly this kind of problem
● rapid application development
● aggregate
● schema flexibility
● auto-scale-out
● auto-failover
● amount of data it is able to handle
● shared-nothing architecture, no SPOF
● performance
● splendors and miseries of the aggregate
● CAP theorem dilemma
[Slide: CAP theorem triangle – Consistency, Availability, Partition Tolerance]

[Slide: Analytical vs. Operational workloads mapped against Consistency, Availability, Performance, Reliability]
I want it all
(released by Facebook in 2008)
● elastic scalability & linear performance *
● dynamic schema
● very high write throughput
● tunable per-request consistency
● fault-tolerant design
● multiple datacenter and cloud readiness
● CAS (compare-and-set) transaction support *
* http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra
● Large data sets on commodity hardware
● Tradeoff between speed and reliability
● Heavy-write workloads
● Time-series data
[Slide: Cassandra positioned on the Operational side (Reliability, Performance), away from Analytical]
Small demo after this slide
TIMESTAMP → FIELD 1 … (e.g. 12344567 → DATA)
SERVER 1: 12326346 → DATA, 13124124 → DATA, 13237457 → DATA
SERVER 2: 13627236 → DATA
● expensive range queries across the cluster
● unless sharded by timestamp…
● …which becomes a bottleneck for heavy-write workloads
select * from table where timestamp > 12344567 and timestamp < 13237457
● all columns are sorted by name
● row – the aggregate item (never sharded)
Column Family
row key 1 | column 1: value 1.1 | column 2: value 1.2 | column 3: value 1.3 | … | column N: value 1.N
row key 2 | column 1: value 2.1 | column 2: value 2.2 | … | column M: value 2.M
Super columns are discouraged and omitted here
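The column-family model above can be pictured as a nested sorted map: row key → (column name → value), with columns always kept sorted by name. A minimal sketch, with plain Java collections standing in for Cassandra storage (all names here are illustrative):

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class ColumnFamilySketch {
    public static void main(String[] args) {
        // Column family ~ Map<rowKey, SortedMap<columnName, value>>;
        // columns inside a row are always sorted by name.
        SortedMap<String, SortedMap<String, String>> columnFamily = new TreeMap<>();

        SortedMap<String, String> row1 = new TreeMap<>();
        row1.put("column 1", "value 1.1");
        row1.put("column 2", "value 1.2");
        row1.put("column 3", "value 1.3");
        columnFamily.put("row key 1", row1);

        SortedMap<String, String> row2 = new TreeMap<>();
        row2.put("column 1", "value 2.1");
        row2.put("column 2", "value 2.2");
        columnFamily.put("row key 2", row2);

        // A row is the aggregate unit: it lives on one replica set, never sharded.
        System.out.println(columnFamily.get("row key 1").firstKey()); // column 1
    }
}
```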
● get key
● get slice
● get range
+ combinations of these queries
+ composite columns
SERVER 1:
  row key 1: timestamp, timestamp, timestamp, timestamp
  row key 2: timestamp, timestamp, timestamp
  row key 3: timestamp
  row key 4: timestamp, timestamp, timestamp, timestamp
SERVER 2:
  row key 5: timestamp, timestamp
get_slice(row_key, from, to, count)

get_slice("row key 1", from: "timestamp 1", null, 11)
Next page:  get_slice("row key 1", from: "timestamp 11", null, 11)
Prev. page: get_slice("row key 1", null, to: "timestamp 11", 11)
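The paging pattern above can be sketched against a sorted map standing in for a single Cassandra row; get_slice is simulated with tailMap, and the last column of one page becomes the cursor for the next (column names and page size are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class SlicePaging {
    // Simulates get_slice(rowKey, from, to, count): columns are sorted by name,
    // so a page is just the first `count` columns at or after `from`.
    static List<String> getSlice(SortedMap<String, String> row, String from, int count) {
        SortedMap<String, String> tail = (from == null) ? row : row.tailMap(from);
        List<String> page = new ArrayList<>();
        for (String column : tail.keySet()) {
            if (page.size() == count) break;
            page.add(column);
        }
        return page;
    }

    public static void main(String[] args) {
        SortedMap<String, String> row = new TreeMap<>();
        for (int i = 10; i < 35; i++) row.put("timestamp " + i, "event " + i);

        // First page: 11 columns; the 11th column is the cursor for the next page.
        List<String> page1 = getSlice(row, null, 11);
        // Next page starts from the last column of the previous page (overlaps by one).
        List<String> page2 = getSlice(row, page1.get(page1.size() - 1), 11);

        System.out.println(page1.get(0)); // timestamp 10
        System.out.println(page2.get(0)); // timestamp 20
    }
}
```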
● Time-range with filter:
  ● "get all events for User J from N to M"
  ● "get all success events for User J from N to M"
  ● "get all events for all users from N to M"
events::success::User_123 → timestamp 1: value 1
events::success           → timestamp 1: value 1
events::User_123          → timestamp 1: value 1
● Counters:
  ● "get # of events for User J grouped by hour"
  ● "get # of events for User J grouped by day"
events::success::User_123 → 1380400000: 14,  1380403600: 42
events::User_123          → 1380400000: 842, 1380403600: 1024
(group by day – same but in different column family for TTL support)
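One common way to produce the hourly counter columns above is to truncate the epoch-seconds timestamp to its hour; a minimal sketch (the exact bucket boundaries on the slide are illustrative, not necessarily modulo-aligned):

```java
public class HourBucket {
    // Truncate an epoch-seconds timestamp to the start of its hour;
    // the result is used as the counter column name.
    static long hourBucket(long epochSeconds) {
        return epochSeconds - (epochSeconds % 3600);
    }

    public static void main(String[] args) {
        System.out.println(hourBucket(7200L));  // 7200 (already aligned)
        System.out.println(hourBucket(7205L));  // 7200 (same hour bucket)
        System.out.println(hourBucket(10800L)); // 10800 (next hour bucket)
    }
}
```

Incrementing the counter column named hourBucket(now) then gives "# of events grouped by hour" for free on read.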
● the row key should be a combination of fields with high cardinality:
  ● name, id, etc.
● boolean values are a bad option
● composite columns are a good option for those
● a timestamp may help to spread historical data
● otherwise, scalability will not be linear
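The row-key advice above can be sketched as a small key builder: high-cardinality fields joined by a separator, plus a day bucket so historical data spreads across the cluster instead of piling onto one row (separator, field order, and bucket size are illustrative):

```java
public class RowKeys {
    // Compose a row key from high-cardinality fields; appending a day bucket
    // spreads historical data across many rows (and therefore many nodes).
    static String rowKey(String event, String status, String userId, long epochSeconds) {
        long dayBucket = epochSeconds - (epochSeconds % 86400);
        return event + "::" + status + "::" + userId + "::" + dayBucket;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("events", "success", "User_123", 1380403600L));
    }
}
```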
In theory – possible in real time:
● average, 3-dimensional filters, group by, etc.

But:
● hard to tune the data model
● lack of aggregation options
● aggregation over historical data
“I want interactive reports”
Cassandra
“Reports could be a little bit out of date, but I want to control this delay value”
Auto update somehow
● Impact on the production system, or
● Higher total cost of ownership
● Difficulties with scalability
  ● hard to support with multiple clusters
http://www.datastax.com/docs/0.7/map_reduce/hadoop_mr
http://aws.amazon.com
● Hadoop tech stack
● Automatic deployment
● Management API
● Transient clusters
● Amazon S3 as data storage *

* copy from S3 to EMR HDFS and back
// Launch a cluster and run a job (AWS SDK for Java; the Hadoop version
// and instance types below are placeholders):
JobFlowInstancesConfig instances = new JobFlowInstancesConfig();
instances.setHadoopVersion("1.0.3");
instances.setInstanceCount(dataNodeCount + 1); // data nodes + 1 master
instances.setMasterInstanceType("m1.medium");
instances.setSlaveInstanceType("m1.medium");

RunJobFlowRequest req = new RunJobFlowRequest(name, instances);
req.setSteps(Arrays.asList(new StepConfig(name, new HadoopJarStepConfig(jar))));

AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient(credentials);
emr.runJobFlow(req);
Execute a job on a running cluster:

StepConfig stepConfig = new StepConfig(name, new HadoopJarStepConfig(jar));
AddJobFlowStepsRequest addReq = new AddJobFlowStepsRequest();
addReq.setJobFlowId(jobFlowId);
addReq.setSteps(Arrays.asList(stepConfig));

AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient(credentials);
emr.addJobFlowSteps(addReq);
● cluster lifecycle: Long-Running or Transient
● cold start = ~20 min
● tradeoff: cluster cost vs. availability
● compression and Combiner tuning can speed up jobs significantly
● common problems for all big-data processing tools – monitoring, testability, and debugging (MRUnit, local Hadoop, smaller data sets)
// dual writes with manual rollback (not atomic across the two stores)
long txId = -1;
try {
    txId = cassandra.persist(entity);
    sql.insert(some);
    sql.update(someElse);
    cassandra.commit(txId);
    sql.commit();
} catch (Exception e) {
    sql.rollback();
    cassandra.rollback(txId);
}
insert into CHANGES (key, committed, data)
values ('tx_id-58e0a7d7-eebc', 'false', ..)

update CHANGES set committed = 'true'
where key = 'tx_id-58e0a7d7-eebc'

delete from CHANGES
where key = 'tx_id-58e0a7d7-eebc'
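The CHANGES-marker pattern above (record the pending change with a false committed flag, flip it once the other store commits, then delete the marker) can be sketched with in-memory maps standing in for the two stores; all names here are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class ChangesMarker {
    // In-memory stand-ins for the two stores.
    static Map<String, String> changes = new HashMap<>(); // CHANGES: key -> committed flag
    static Map<String, String> data = new HashMap<>();

    static String persist(String entity) {
        // 1. Record the pending change with committed = 'false'.
        String txId = "tx_id-" + UUID.randomUUID();
        changes.put(txId, "false");
        data.put(txId, entity);
        return txId;
    }

    static void commit(String txId) {
        // 2. Flip the flag once the other store has committed...
        changes.put(txId, "true");
        // 3. ...then delete the marker; a missing marker means the change is durable.
        changes.remove(txId);
    }

    public static void main(String[] args) {
        String txId = persist("entity-1");
        commit(txId);
        System.out.println(changes.containsKey(txId)); // false: no marker left behind
        System.out.println(data.size());               // 1: the data itself remains
    }
}
```

Any marker found later with committed = 'false' identifies a change that needs cleanup or replay.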
non-production setup:
• 3 nodes (Cassandra)
• m1.medium EC2 instances
• 1 data center
• 1 app instance
Numbers
real-time metrics update (sync):
● average latency – 60 msec
● processes > 2,000 events per second
● generates > 1,000 reports per second

real-time metrics update (async):
● processes > 15,000 events per second

uploading to AWS S3: slow, but multi-threading helps *

more than enough, but what if…
● distributed systems force you to make decisions
● systems like Cassandra trade consistency for speed
● the CAP theorem is oversimplified
  ● you have many more options
● polyglot persistence can make this world a better place
  ● do not try to hammer every nail with the same hammer
● Cassandra – great for time-series data and heavy-write workloads…
● …but use cases should be clearly defined
● Amazon S3 is great
  ● simple, slow, but predictable storage
● Amazon EMR
  ● integration with S3 – great
  ● very good API, but…
  ● …it isn't a magic trick; it requires Hadoop knowledge and skill for effective usage
/** [email protected] @yazovsky www.linkedin.com/in/yazovsky
*/
/** http://www.thumbtack.net http://thumbtack.net/whitepapers
*/