dc area spark interactive nov 2015
TRANSCRIPT
![Page 1: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/1.jpg)
© Cloudera, Inc. All rights reserved.
Jean-Daniel Cryans on behalf of the Kudu team
Kudu: a new storage engine for fast analytics on fast data
![Page 2: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/2.jpg)
Myself
• Software Engineer at Cloudera
• On the Kudu team for 2 years
• Apache HBase committer and PMC member since 2008
• Previously at StumbleUpon
![Page 3: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/3.jpg)
Motivation and Goals
Why build Kudu?
![Page 4: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/4.jpg)
Motivating Questions
• Are there user problems that we can’t address because of gaps in Hadoop ecosystem storage technologies?
• Are we positioned to take advantage of advancements in the hardware landscape?
![Page 5: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/5.jpg)
Questions to the crowd
• What is Apache Parquet good at?
• What is Apache HBase good at?
![Page 6: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/6.jpg)
Current Storage Landscape in Hadoop
HDFS excels at:
• Efficiently scanning large amounts of data
• Accumulating data with high throughput
HBase excels at:
• Efficiently finding and writing individual rows
• Making data mutable
Gaps exist when these properties are needed simultaneously
![Page 7: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/7.jpg)
Kudu Design Goals
• High throughput for big scans (columnar storage and replication)
  Goal: within 2x of Parquet
• Low latency for short accesses (primary key indexes and quorum replication)
  Goal: 1 ms read/write on SSD
• Database-like semantics (initially single-row ACID)
• Relational data model
  • SQL query
  • “NoSQL” style scan/insert/update (Java client)
![Page 8: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/8.jpg)
Questions to the crowd
• How much RAM do typical servers have today?
• What about in 2 years?
• How many of you have hybrid SSD/HDD servers?
• Who’s currently running, planning to run, or would like to run with Non-Volatile Memory Express (NVMe) drives?
![Page 9: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/9.jpg)
Changing hardware landscape
• Spinning disk -> solid state storage
  • NAND flash: up to 450k read and 250k write IOPS, about 2 GB/sec read and 1.5 GB/sec write throughput, at a price of less than $3/GB and dropping
  • 3D XPoint memory (1000x faster than NAND, cheaper than RAM)
• RAM is cheaper and more abundant:
  • 64 -> 128 -> 256 GB over the last few years
• Takeaway 1: The next bottleneck is CPU, and current storage systems weren’t designed with CPU efficiency in mind.
• Takeaway 2: Column stores are feasible for random access
![Page 10: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/10.jpg)
Kudu Usage
• Table has a SQL-like schema
  • Finite number of columns (unlike HBase/Cassandra)
  • Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP
  • Some subset of columns makes up a possibly-composite primary key
  • Fast ALTER TABLE
• Java and C++ “NoSQL” style APIs
  • Insert(), Update(), Delete(), Scan()
• Integrations with MapReduce, Spark, and Impala
  • More to come!
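The usage model on this slide (a fixed, typed schema, a composite primary key, and insert/update/delete/scan operations) can be illustrated with a toy in-memory table. This is only a sketch: `ToyTable` and its methods are hypothetical names invented for illustration and are not the Kudu client API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in: fixed typed schema, composite primary key."""
    columns: dict       # column name -> Python type, fixed up front
    key_columns: tuple  # subset of columns that forms the primary key
    rows: dict = field(default_factory=dict)

    def _key(self, row):
        return tuple(row[c] for c in self.key_columns)

    def _check(self, row):
        # Every row must carry exactly the declared columns, with the declared types.
        if set(row) != set(self.columns):
            raise ValueError("row does not match the table schema")
        for name, value in row.items():
            if not isinstance(value, self.columns[name]):
                raise TypeError(f"column {name} expects {self.columns[name].__name__}")

    def insert(self, row):
        self._check(row)
        key = self._key(row)
        if key in self.rows:
            raise KeyError(f"duplicate primary key {key}")
        self.rows[key] = dict(row)

    def update(self, key, **changes):
        if any(c in self.key_columns for c in changes):
            raise ValueError("primary key columns cannot be updated")
        candidate = {**self.rows[key], **changes}  # KeyError if the row is missing
        self._check(candidate)
        self.rows[key] = candidate

    def delete(self, key):
        del self.rows[key]

    def scan(self, projection, predicate=lambda row: True):
        # Rows come back in primary key order, restricted to the projected columns.
        for key in sorted(self.rows):
            row = self.rows[key]
            if predicate(row):
                yield tuple(row[c] for c in projection)


metrics = ToyTable(
    columns={"host": str, "metric": str, "timestamp": int, "value": float},
    key_columns=("host", "metric", "timestamp"),
)
metrics.insert({"host": "a", "metric": "cpu", "timestamp": 1, "value": 0.5})
metrics.insert({"host": "a", "metric": "cpu", "timestamp": 2, "value": 0.7})
metrics.update(("a", "cpu", 2), value=0.9)
metrics.delete(("a", "cpu", 1))
result = list(metrics.scan(["timestamp", "value"]))
```

After these calls, `result` holds the single surviving row, projected to `timestamp` and `value`.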
![Page 11: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/11.jpg)
Use cases and architectures
![Page 12: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/12.jpg)
Kudu Use Cases
Kudu is best for use cases requiring a simultaneous combination of sequential and random reads and writes
● Time Series
  ○ Examples: Stream market data; fraud detection & prevention; risk monitoring
  ○ Workload: Inserts, updates, scans, lookups
● Machine Data Analytics
  ○ Examples: Network threat detection
  ○ Workload: Inserts, scans, lookups
● Online Reporting
  ○ Examples: Operational Data Store (ODS)
  ○ Workload: Inserts, updates, scans, lookups
![Page 13: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/13.jpg)
Questions to the crowd
• How do you manage incoming data when running reports on data stored in Parquet?
![Page 14: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/14.jpg)
Real-Time Analytics in Hadoop Today
Fraud Detection in the Real World = Storage Complexity
Considerations:
● How do I handle failure during this process?
● How often do I reorganize incoming streaming data into a format appropriate for reporting?
● When reporting, how do I see data that has not yet been reorganized?
● How do I ensure that important jobs aren’t interrupted by maintenance?
[Diagram labels: Incoming Data (Messaging System); HBase; "Have we accumulated enough data?"; Reorganize HBase file into Parquet; Parquet File; New Partition; Most Recent Partition; Historic Data; Impala on HDFS; Reporting Request]
• Wait for running operations to complete
• Define new Impala partition referencing the newly written Parquet file
![Page 15: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/15.jpg)
Real-Time Analytics in Hadoop with Kudu
Improvements:
● One system to operate
● No cron jobs or background processes
● Handle late arrivals or data corrections with ease
● New data available immediately for analytics or operations
[Diagram labels: Incoming Data (Messaging System); Storage in Kudu; Historical and Real-time Data; Reporting Request]
![Page 16: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/16.jpg)
How it works
![Page 17: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/17.jpg)
Tables and Tablets
• Table is horizontally partitioned into tablets
  • Range or hash partitioning
  • PRIMARY KEY (host, metric, timestamp) DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS
• Each tablet has N replicas (3 or 5), with Raft consensus
  • Allow read from any replica, plus leader-driven writes with low MTTR
• Tablet servers host tablets
  • Store data on local disks (no HDFS)
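To make the `DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS` clause concrete, here is a minimal sketch of hash partitioning. It assumes CRC32 as a stand-in hash function; Kudu's actual hash function and key encoding are not described on the slide, and `bucket_for` is a hypothetical helper.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 100  # matches "INTO 100 BUCKETS" on the slide


def bucket_for(timestamp: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Map the hashed column to a bucket. CRC32 is an assumed stand-in hash."""
    digest = zlib.crc32(timestamp.to_bytes(8, "big", signed=True))
    return digest % num_buckets


# Count how many of 10,000 consecutive timestamps land in each bucket.
start = 1_446_336_000
counts = Counter(bucket_for(ts) for ts in range(start, start + 10_000))
```

The point of hashing a monotonically increasing column such as a timestamp is that consecutive writes spread across tablets rather than all landing on the tablet that holds the newest range.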
![Page 18: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/18.jpg)
Questions to the crowd
• How many failure(s) can a quorum of 3 nodes tolerate?
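The arithmetic behind this question can be sketched as follows, assuming the majority-quorum rule that Raft uses: an operation needs acknowledgement from more than half of the replicas. The function names are illustrative only.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can fail while a majority remains available."""
    return replicas - majority(replicas)


# replicas -> (majority size, failures tolerated)
table = {n: (majority(n), tolerated_failures(n)) for n in (3, 4, 5)}
```

Under this rule, going from 3 to 4 replicas does not raise the number of tolerated failures, which is one reason odd replica counts such as 3 or 5 are the usual choice.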
![Page 19: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/19.jpg)
![Page 20: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/20.jpg)
Questions to the crowd
• What are the main advantages of using an LSM tree when inserting/mutating rows, like in HBase and Cassandra?
• In which situations can column stores read data faster than row stores?
![Page 21: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/21.jpg)
Kudu trade-offs
• Random updates will be slower
  • HBase model allows random updates without incurring a disk seek
  • Kudu requires a key lookup before update, bloom lookup before insert, may incur seeks
• Single-row reads may be slower
  • Columnar design is optimized for scans
  • Especially slow at reading a row that has had many recent updates (e.g. YCSB “zipfian”)
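The "bloom lookup before insert" trade-off can be sketched with a small Bloom filter placed in front of an expensive key lookup. This is a simplified illustration, not Kudu's implementation: `BloomFilter`, `ToyRowSet`, the filter size, and the SHA-256 based hashing are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: bytes):
        # Derive several bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class ToyRowSet:
    """Hypothetical stand-in for stored data guarded by a Bloom filter."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.keys = set()           # pretend a lookup here costs a disk seek
        self.expensive_lookups = 0

    def insert(self, key: bytes) -> bool:
        if self.bloom.might_contain(key):
            self.expensive_lookups += 1  # only pay when the filter says "maybe"
            if key in self.keys:
                return False             # duplicate primary key, reject
        self.keys.add(key)
        self.bloom.add(key)
        return True


rowset = ToyRowSet()
first = rowset.insert(b"host1|cpu|1")   # new key: the empty filter rules it out
second = rowset.insert(b"host1|cpu|1")  # same key: filter says "maybe", lookup confirms
```

The filter lets most inserts of new keys skip the expensive existence check, while an insert of an existing key still has to pay for the lookup.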
![Page 22: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/22.jpg)
Benchmarks
![Page 23: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/23.jpg)
TPC-H (Analytics benchmark)
• 75 TS (tablet servers) + 1 master cluster
  • 12 (spinning) disks each, enough RAM to fit dataset
  • Using Kudu 0.5.0, Impala 2.2 with Kudu support, CDH 5.4
  • TPC-H Scale Factor 100 (100GB)
• Example query:
  SELECT n_name, sum(l_extendedprice * (1 - l_discount)) as revenue
  FROM customer, orders, lineitem, supplier, nation, region
  WHERE c_custkey = o_custkey
    AND l_orderkey = o_orderkey
    AND l_suppkey = s_suppkey
    AND c_nationkey = s_nationkey
    AND s_nationkey = n_nationkey
    AND n_regionkey = r_regionkey
    AND r_name = 'ASIA'
    AND o_orderdate >= date '1994-01-01'
    AND o_orderdate < '1995-01-01'
  GROUP BY n_name
  ORDER BY revenue desc;
![Page 24: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/24.jpg)
- Kudu outperforms Parquet by 31% (geometric mean) for RAM-resident data
- Parquet likely to outperform Kudu for HDD-resident data (larger IO requests)
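For readers unfamiliar with the metric, here is a minimal sketch of how a geometric-mean comparison across queries can be computed. The runtimes below are hypothetical values chosen for illustration; they are not the benchmark's measurements, and the slide does not say exactly how its 31% figure was derived.

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed via logs for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-query runtimes in seconds (NOT the benchmark's numbers).
parquet_secs = [4.0, 9.0, 2.0]
kudu_secs = [2.0, 9.0, 4.0]

# Per-query ratio: a value above 1 means Kudu was faster on that query.
ratios = [p / k for p, k in zip(parquet_secs, kudu_secs)]
speedup = geometric_mean(ratios)
```

With these made-up numbers a 2x win and a 2x loss cancel out exactly, which is why the geometric mean is commonly preferred over the arithmetic mean when averaging ratios.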
![Page 25: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/25.jpg)
What about Apache Phoenix?
• 10 node cluster (9 worker, 1 master)
• HBase 1.0, Phoenix 4.3
• TPC-H LINEITEM table only (6B rows)
[Bar chart, log scale: Time (sec) for Load, TPCH Q1, COUNT(*), COUNT(*) WHERE…, and single-row lookup; series: Phoenix, Kudu, Parquet]
![Page 26: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/26.jpg)
What about NoSQL-style random access? (YCSB)
• YCSB 0.5.0-snapshot
• 10 node cluster (9 worker, 1 master)
• HBase 1.0
• 100M rows, 10M ops
![Page 27: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/27.jpg)
Apache Spark
![Page 28: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/28.jpg)
Motivational Quotes
• “seeing kudu as a great fit for Spark Streaming workloads”
• “One unexpected benefit I am seeing from Kudu is the filter performance on non-key columns - subsec on 2 million rows of meetup data... (on small ec2 instance)”
  • Kudu user in our Slack channel, 11/03/15
![Page 29: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/29.jpg)
Spark Streaming
[Diagram labels: Live ingestion; Get latest data; Build models]
![Page 30: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/30.jpg)
Spark SQL
• Great fit for DataFrames
  • Kudu natively supports types
• Can replace Parquet storage
  • Kudu’s goal is to be almost as fast as Parquet
  • Predicate pushdown + lazy materialization
  • Plus makes it possible to mutate data
• A better alternative to other “NoSQL” storage engines
  • Aims to offer sequential consistency, 2nd only to strict consistency
  • Consistency can be relaxed to enable faster reading of stale data
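The "predicate pushdown + lazy materialization" bullet can be illustrated with a toy column store: the filter is evaluated against one column first, and the remaining columns are read only for the row positions that match. This is a conceptual sketch with made-up data and a hypothetical `scan` function, not how Kudu or Spark SQL implement it internally.

```python
# Toy columnar layout: one list per column, aligned by row position.
columns = {
    "host": ["a", "b", "a", "c"],
    "metric": ["cpu", "cpu", "mem", "cpu"],
    "value": [0.5, 0.9, 0.3, 0.7],
}


def scan(columns, predicate_column, predicate, projection):
    # Step 1 (pushdown): evaluate the predicate by reading a single column.
    matching = [i for i, v in enumerate(columns[predicate_column]) if predicate(v)]
    # Step 2 (lazy materialization): build rows only for matching positions,
    # and only from the projected columns.
    return [tuple(columns[c][i] for c in projection) for i in matching]


rows = scan(columns, "metric", lambda m: m == "cpu", ["host", "value"])
```

Here the `mem` row is discarded after reading only the `metric` column, so its `host` and `value` entries are never touched.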
![Page 31: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/31.jpg)
Code snippets

// Create a context bound to the Kudu master
val kuduContext = new KuduContext(sparkContext, kuduMaster)

// Read column "c1" of a Kudu table as an RDD
val kuduRdd = kuduContext.kuduRDD(tableName, "c1").map(r => {
  val row = r._2
  val c1 = row.getLong(0)
  // etc…
})

// Write from a stream, with a Kudu client available in each partition
someStream.kuduForeachPartition(kuduContext, (it, kuduClient, asyncClient) => {
  val table = kuduClient.openTable(tableName)
  // etc…
})

// Expose a Kudu table to Spark SQL
sqlContext.load("org.kududb.spark",
  Map("kudu.table" -> "example_table",
      "kudu.master" -> kuduMaster)).registerTempTable("example_table")
![Page 32: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/32.jpg)
Demo
![Page 33: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/33.jpg)
Demo
• Code currently at https://github.com/tmalaska/SparkOnKudu/
[Diagram labels: Gamer data points; Producer sends data points to Kafka; Spark pulls from Kafka; Spark loads base data from Kudu; Aggregates are stored back into Kudu; Live queries come from Impala]
![Page 34: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/34.jpg)
Demo
![Page 35: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/35.jpg)
Getting started
![Page 36: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/36.jpg)
Project status
• Public Beta released September 28th 2015, version 0.5.0
  • Not ready for production
  • No security
  • Feedback/jiras/patches welcome
• Next release this month (0.6.0):
  • Mac OS X support for single node deployment
  • Initial Apache Spark integration
  • Lots of small fixes and improvements
• GA sometime next year (hopefully!)
![Page 37: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/37.jpg)
Getting started as a user
• http://getkudu.io
• kudu-[email protected]
• http://getkudu-slack.herokuapp.com/
• Quickstart VM
  • Easiest way to get started
  • Impala and Kudu in an easy-to-install VM
• CSD and Parcels
  • For installation on a Cloudera Manager-managed cluster
![Page 38: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/38.jpg)
Getting started as a developer
• http://github.com/cloudera/kudu
  • All commits go here first
• Public gerrit: http://gerrit.cloudera.org
  • All code reviews happen here
• Public JIRA: http://issues.cloudera.org
  • Includes bugs going back to 2013. Come see our dirty laundry!
• kudu-[email protected]
• Apache 2.0 license open source
  • Contributions are welcome and encouraged!
![Page 39: dc area spark interactive nov 2015](https://reader034.vdocuments.us/reader034/viewer/2022042800/58a1ab191a28ab64388bc818/html5/thumbnails/39.jpg)
http://getkudu.io/
@getkudu