intro to apache kudu (short) - big data application meetup

1© Cloudera, Inc. All rights reserved.

Intro to Apache Kudu (incubating) Hadoop Storage for Fast Analytics on Fast DataMike Percy, Kudu committer & PPMC [email protected] | @mike_percy

mailto:[email protected]


Agenda

• Why Kudu? Motivations and design goals• Use cases• Technical design and selected internals • Performance• How to get started


Previous storage landscape of the Hadoop ecosystem

HDFS (GFS) excels at:• Batch ingest only (eg hourly)• Efficiently scanning large amounts

of data (analytics)HBase (BigTable) excels at:

• Efficiently finding and writing individual rows

• Making data mutable

Gaps exist when these properties are needed simultaneously


• High throughput for big scansGoal: Within 2x of Parquet

• Low-latency for short accesses Goal: 1ms read/write on SSD

• Relational data model (tables, schemas)• Enable easy integration with SQL engines

(including Impala, Hive, and Drill)• Still provide “NoSQL” style scan/insert/update APIs

(for client programs written in Java, C++, Python)

Kudu design goals


Where Kudu fits in the Hadoop big data stackStorage for fast (low latency) analytics on fast (high throughput) data

• Simplifies the architecture for building analytic applications on changing data

• Optimized for fast analytic performance

• Natively integrated with the Hadoop ecosystem of components

FILESYSTEMHDFS

NoSQLHBASE

INGEST – SQOOP, FLUME, KAFKA

DATA INTEGRATION & STORAGE

SECURITY – SENTRY

RESOURCE MANAGEMENT – YARN

UNIFIED DATA SERVICES

BATCH STREAM SQL SEARCH MODEL ONLINE

DATA ENGINEERING DATA DISCOVERY & ANALYTICS DATA APPS

SPARK, HIVE, PIG

SPARK IMPALA SOLR SPARK HBASE

RELATIONALKUDU


Kudu: Scalable and fast tabular storage

• Tabular• Store tables like a “normal” database• Individual record-level access to 100+ billion row tables

• Scalable• Tested up to 275 nodes (~3PB cluster)• Designed to scale to 1000s of nodes, tens of PBs

• Fast• Millions of read/write operations per second across cluster• Multiple GB/second read throughput per node


Kudu: Strong schema, NoSQL + SQL (via integrations)

• Each table has a SQL-like schema• Finite number of columns (unlike HBase/Cassandra)• Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY,

TIMESTAMP• Some subset of columns makes up a possibly-composite primary key• Fast ALTER TABLE

• Java, Python, and C++ “NoSQL” style APIs• Insert(), Update(), Delete(), Scan()

• Integrations with MapReduce, Apache Spark, and Apache Impala (incubating)• Apache Drill work-in-progress


Kudu use cases

Kudu is best for use cases requiring a simultaneous combination ofsequential and random reads and writes

• Time series• Examples: Streaming market data, fraud detection / prevention, risk monitoring• Workload: Insert, updates, scans, lookups

• Machine data analytics• Example: Network threat detection• Workload: Inserts, scans, lookups

• Online reporting• Example: Operational data store (ODS)• Workload: Inserts, updates, scans, lookups


Xiaomi use case

• World’s 4th largest smart-phone maker (most popular in China)• Gather important RPC tracing events from mobile app and backend service. • Service monitoring & troubleshooting tool.

High write throughput• >5 Billion records/day and growing

Query latest data and quick response• Identify and resolve issues quickly

Can search for individual records• Easy for troubleshooting


Xiaomi big data analytics pipelineBefore Kudu

Large ETL pipeline delays● High data visibility latency

(from 1 hour up to 1 day)

● Data format conversion woes

Ordering issues● Log arrival (storage) not

exactly in correct order

● Must read 2 – 3 days of data to get all of the data points for a single day


Xiaomi big data analytics pipelineSimplified with Kudu

Low latency ETL pipeline● ~10s data latency

● For apps that need to avoid direct backpressure or need ETL for record enrichment

Direct zero-latency path● For apps that can tolerate

backpressure and can use the NoSQL APIs

● Apps that don’t need ETL enrichment for storage / retrieval

OLAP scanSide table lookupResult store


Xiaomi benchmark

• 6 real queries from application trace analysis application• Q1: SELECT COUNT(*)• Q2: SELECT hour, COUNT(*) WHERE module = ‘foo’ GROUP BY HOUR• Q3: SELECT hour, COUNT(DISTINCT uid) WHERE module = ‘foo’ AND app=‘bar’

GROUP BY HOUR• Q4: analytics on RPC success rate over all data for one app• Q5: same as Q4, but filter by time range• Q6: SELECT * WHERE app = … AND uid = … ORDER BY ts LIMIT 30 OFFSET 30


Xiaomi benchmark results

Q1 Q2 Q3 Q4 Q5 Q6

1.4 2.0 2.3 3.1 1.3 0.9 1.3

2.8 4.0

5.7 7.5

16.7 kuduparquet

Query latency (seconds):

• HDFS parquet file replication = 3• Kudu table replication = 3• Each query run 5 times then averaged


How it works (whirlwind tour)


Kudu tables are partitioned into “tablets” (regions)Partitioning based on primary key (PK)• Native support for

range partitioning and/or hash partitioning (salting)

• Hash example: PRIMARY KEY (tweet_id) DISTRIBUTE BY HASH(tweet_id) INTO 100 BUCKETS


Each tablet is fault tolerant via Raft consensus

• A single Tablet Server can host many tablets

• Metadata is stored on just another tablet, but only Master Server processes host that tablet

• Raft consensus:• Strong consistency• Leader election on failure• Replication factor 3 or 5

is typical


Kudu: Cluster metadata management

• Replicated master• Acts as a tablet directory• Acts as a catalog (which tables exist, etc)• Acts as a load balancer (tracks TS liveness, re-replicates under-replicated

tablets)• Caches all metadata in RAM for high performance

• Under heavy load, 99.99% response times still in microseconds (650us)• Client configured with master addresses

• Client asks master for tablet locations as needed and caches them locally


Example usage: Spark & Impala


Spark DataSource integration (work-in-progress)

sqlContext.load("org.kududb.spark", Map("kudu.table" -> “foo”, "kudu.master" -> “master.example.com”)) .registerTempTable(“mytable”)df = sqlContext.sql( “select col_a, col_b from mytable “ + “where col_c = 123”)


Impala integration

• CREATE TABLE … DISTRIBUTE BY HASH(col1) INTO 16 BUCKETS AS SELECT … FROM …

• INSERT/UPDATE/DELETE

• Optimizations like predicate pushdown, scan parallelism, plans for more on the way


Benchmarks


TPC-H (analytics benchmark)

• 75 server cluster• 12 (spinning) disks each, enough RAM to fit dataset• TPC-H Scale Factor 100 (100GB)

• Example query:• SELECT n_name, sum(l_extendedprice * (1 - l_discount)) as revenue FROM customer, orders, lineitem, supplier, nation, region WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey AND l_suppkey = s_suppkey AND c_nationkey = s_nationkey AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'ASIA' AND o_orderdate >= date '1994-01-01' AND o_orderdate < '1995-01-01’ GROUP BY n_name ORDER BY revenue desc;


• Kudu outperforms Parquet by 31% (geometric mean) for RAM-resident data


Versus other NoSQL storage• Apache Phoenix: OLTP SQL engine built on HBase• 10 node cluster (9 worker, 1 master)• TPC-H LINEITEM table only (6B rows)

Load TPCH Q1 COUNT(*)COUNT(*)WHERE…

single-rowlookup

0.01

0.1

1

10

100

1000

100002152

21976

131

0.04

1918

13.2

1.7

0.7 0.15

155

9.3

1.4 1.5 1.37

PhoenixKuduParquet

Tim

e in

sec (

log

scal

e)


Getting started with Kudu


Kudu: Project status

• First open source beta released by Cloudera in September ‘15• Latest version 0.6.0 released end of November

• Usable for many applications• Have not experienced data loss, reasonably stable (almost no crashes reported)• Still requires some expert assistance, and you’ll probably find some bugs

• Version 0.7 planned for February• Now a part of the Apache Software Foundation (ASF) Incubator


Getting started as a user

• Home page: http://getkudu.io• User mailing list: [email protected]• Chat room: http://getkudu-slack.herokuapp.com/ • Technical whitepaper: http://getkudu.io/kudu.pdf• Quickstart VM

• Easiest way to get started: Impala and Kudu in an easy-to-install VM• Yum / APT repos• CSD and Parcels

• For automated installation on a Cloudera Manager-managed cluster

http://getkudu.io/


http://getkudu-slack.herokuapp.com/

http://getkudu.io/kudu.pdf


Getting started as a developer

• Source code: http://github.com/cloudera/kudu• All commits go upstream, nothing is held back

• Public gerrit: http://gerrit.cloudera.org• All code reviews happening here

• Public JIRA: http://issues.cloudera.org• Includes bugs going back to 2013. Come see our dirty laundry!

• Developer mailing list: [email protected]

• Apache 2.0 licensed open source and an ASF incubator project.• Contributions are welcome and encouraged!

http://github.com/cloudera/kudu



http://gerrit.cloudera.org/

http://issues.cloudera.org/



Questions?http://getkudu.io

@ApacheKudu

intro to apache kudu (short) - big data application meetup

Data & Analytics