Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop

Apache Tajo: A Big Data Warehouse System on Hadoop

Hyunsik Choi

Director of Research, Gruter

Big Data Camp LA 2014

Talk Outline

• Introduction to Apache Tajo

• What you can do with Tajo

• Why you should use Tajo

• Current Status of Tajo Project

• Demonstration

About Me

• Hyunsik Choi (pronounced “Hyeon-shick Cheh”)
• PhD (Computer Science & Engineering, 2013), Korea Univ.
• Director of Research, Gruter Corp

• Open-source Involvement
  – Full-time contributor to Apache Tajo (2013.6 ~ )
  – Apache Tajo PMC member and committer (2013.3 ~ )
  – Apache Giraph PMC member and committer (2011.8 ~ )

• Contact Info
  – Email: [email protected]
  – LinkedIn: http://linkedin.com/in/hyunsikchoi/


Apache Tajo

• Open-source “SQL-on-Hadoop” big data warehouse (“Big DW”) system

• Apache Top-level project since March 2014

• Supports SQL standards

• Supports both low-latency queries and long-running batch queries

• Features
  – Supports joins (inner and all outer), group-by, and sort
  – Window functions
  – Most SQL data types supported (except for DECIMAL)

• Recent 0.8.0 release
  – https://blogs.apache.org/tajo/entry/apache_tajo_0_8_0

Overall Architecture

What You Can Do with Tajo

• Batch queries
  – Long-running queries (~ hours)
    • Dynamic scheduling
    • Fault tolerance
  – ETL workloads

• Interactive ad-hoc queries
  – Very low latency (100 ms ~)
  – A few seconds on a dataset of several TB if your cluster capacity is sufficient

Why You Should Use Tajo

• SQL Standards
  – Non-standard features: PostgreSQL and Oracle

• Simple Installation and Operation
  – http://tajo.apache.org/docs/0.8.0/getting_started.html

• Simple Software Stack Requirement
  – No MapReduce and no Tez
  – YARN support, but not mandatory
  – Tajo + Linux system for a single-node cluster
  – Tajo + HDFS for a distributed cluster


• Mature SQL Feature Set
  – Fully distributed query execution
    • Inner join, and left/right/full outer join
    • Group-by, sort, multiple distinct aggregation, window function
  – SQL data types
    • CHAR, BOOL, INT, BIGINT, REAL, DOUBLE, and TEXT
    • TIMESTAMP, DATE, TIME, and INTERVAL
    • DECIMAL (in progress)
  – Various file formats
    • Text file (CSV), RCFile, Parquet (flat schema), and Avro (flat schema)


• Fully community-driven open source

• Stable development team
  – 5 full-time contributors plus many other contributors

• Performance and speed
  – 1.5 to 10 times faster than Hive 0.10
  – Tajo vs. Hive 0.13?
  – Tajo vs. Impala?


• Integration with Hadoop Ecosystem
  – Hadoop 2.2.0 – 2.4.0 support
  – Can connect to Hive Metastore
  – Directly processes tables managed by Hive
  – YARN support (backport)
    • Enables Tajo to deploy and run on a YARN cluster
    • Allows users to add nodes to, or remove nodes from, a Tajo cluster at runtime
    • Contributed by Min Zhou (committer), LinkedIn engineer
    • https://github.com/coderplay/tajo-yarn

Current Status – Overall

• Beta stage: the majority of key features are nearly ready

• Most SQL features implemented

• Working on hundreds of clusters for production
  – Collaboration with the biggest telco in S. Korea

• We have just started work on low-level optimization
  – Runtime byte code generation (v0.9)
  – Unsafe-based hash table for hash aggregation/join
  – Vectorized execution engine

Current Status – Logical Plan Optimizer

• Basic Rewrite Rules
  – Common subexpression elimination
  – Constant folding (CF) and null propagation

• Projection Push Down (PPD)
  – Push expressions down to the lowest possible operators
  – Narrow the set of columns read
  – Remove duplicated expressions
    • when several expressions share a common subexpression

• Filter Push Down (FPD)
  – Reduce the rows to be processed as early as possible

• Extensible Rewrite Rules
  – Allow developers to write their own rewrite rules


Original:

SELECT item_id, order_id, sum_price * (1.2 * 0.3) AS total
FROM (
  SELECT item_id, order_id, sum(price) AS sum_price
  FROM ITEMS
  GROUP BY item_id, order_id
) a
WHERE item_id = 17234

Rewritten (CF + PPD, then FPD):

SELECT item_id, order_id, sum(price) * (0.36) AS total
FROM ITEMS
WHERE item_id = 17234
GROUP BY item_id, order_id
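The constant folding (CF) step can be illustrated with a minimal sketch. This is not Tajo's optimizer code; the expression classes `Const`, `Column`, and `BinOp` and the function `fold_constants` are hypothetical names chosen for illustration. The idea is that any sub-expression built only from constants, such as `1.2 * 0.3`, is evaluated once at planning time rather than once per row.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Column, BinOp]

# Supported arithmetic operators (illustrative subset).
_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def fold_constants(expr: Expr) -> Expr:
    """Bottom-up pass: replace constant-only sub-trees with one Const."""
    if isinstance(expr, BinOp):
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Const) and isinstance(right, Const):
            return Const(_OPS[expr.op](left.value, right.value))
        return BinOp(expr.op, left, right)
    # Constants and column references are already in simplest form.
    return expr


# sum_price * (1.2 * 0.3): only the right-hand side is constant.
expr = BinOp("*", Column("sum_price"), BinOp("*", Const(1.2), Const(0.3)))
folded = fold_constants(expr)
```

After the pass, `folded` multiplies the column by a single constant, so the per-row work drops from two multiplications to one.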


• Cost-based Join Order (since v0.2)
  – No need to guess the right join order anymore
  – Greedy heuristic algorithm
    • Results in a bushy join tree instead of a left-deep join tree

[Figure: left-deep join tree vs. bushy join tree]
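A greedy join-ordering heuristic of this kind can be sketched as follows. This is an illustration under stated assumptions, not Tajo's actual algorithm or cost model: the function `greedy_join_order`, the row counts, and the selectivities are all hypothetical, and cost is taken to be simply the estimated number of output rows. At each step the sketch joins the pair of sub-plans with the smallest estimated result, which can produce a bushy tree.

```python
from itertools import combinations


def greedy_join_order(sizes, selectivity):
    """sizes: {relation: rows}; selectivity: {frozenset({a, b}): fraction}.

    Returns a nested tuple describing the join tree.
    """
    # Each entry is (tree, base relations covered, estimated rows).
    trees = [(name, frozenset([name]), float(rows)) for name, rows in sizes.items()]
    while len(trees) > 1:
        best = None
        for i, j in combinations(range(len(trees)), 2):
            _, lrels, lrows = trees[i]
            _, rrels, rrows = trees[j]
            # Multiply the selectivities of all predicates linking the two sides.
            sel, connected = 1.0, False
            for a in lrels:
                for b in rrels:
                    key = frozenset((a, b))
                    if key in selectivity:
                        sel *= selectivity[key]
                        connected = True
            if not connected:
                continue  # skip cross products while a real join exists
            est = lrows * rrows * sel
            if best is None or est < best[0]:
                best = (est, i, j)
        if best is None:
            # No join predicate left: cross-join the two smallest inputs.
            i, j = sorted(sorted(range(len(trees)), key=lambda k: trees[k][2])[:2])
            best = (trees[i][2] * trees[j][2], i, j)
        est, i, j = best
        merged = ((trees[i][0], trees[j][0]), trees[i][1] | trees[j][1], est)
        trees = [t for k, t in enumerate(trees) if k not in (i, j)] + [merged]
    return trees[0][0]


# Hypothetical statistics: a-b and c-d are selective joins, b-c is not.
sizes = {"a": 1000, "b": 1000, "c": 10, "d": 10}
selectivity = {
    frozenset(("a", "b")): 0.000005,
    frozenset(("c", "d")): 0.1,
    frozenset(("b", "c")): 0.5,
}
plan = greedy_join_order(sizes, selectivity)
```

With these numbers the sketch joins `a` with `b` and `c` with `d` first, then joins the two intermediate results, giving the bushy plan `(("a", "b"), ("c", "d"))` rather than a left-deep chain.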

Current Status – Window Function

• OVER clause
  – row_number() and rank()
  – Aggregation function support
  – PARTITION BY and ORDER BY clauses

SELECT depname, empno, salary, enroll_date
FROM (
  SELECT depname, empno, salary, enroll_date,
         rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
  FROM empsalary
) AS ss
WHERE pos < 3;
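To make the semantics of the query concrete, here is a minimal pure-Python sketch of what `rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno)` followed by `pos < 3` computes. It does not use Tajo; the function `top_ranked` and the sample rows are hypothetical, and the `enroll_date` column is left out for brevity.

```python
from itertools import groupby


def top_ranked(rows, limit=3):
    """Keep rows whose rank within their department is below `limit`."""
    # Sorting by partition key first, then by the window ORDER BY keys.
    ordered = sorted(rows, key=lambda r: (r["depname"], -r["salary"], r["empno"]))
    result = []
    for _, group in groupby(ordered, key=lambda r: r["depname"]):
        prev_key, rank = None, 0
        for position, row in enumerate(group, start=1):
            key = (row["salary"], row["empno"])
            if key != prev_key:
                # rank() jumps to the row position, leaving gaps after ties.
                rank = position
                prev_key = key
            if rank < limit:
                result.append({**row, "pos": rank})
    return result


# Hypothetical sample data.
empsalary = [
    {"depname": "dev", "empno": 1, "salary": 5000},
    {"depname": "dev", "empno": 2, "salary": 6000},
    {"depname": "dev", "empno": 3, "salary": 4000},
    {"depname": "sales", "empno": 4, "salary": 3000},
]
top = top_ranked(empsalary)
```

Here the two best-paid employees of each department are kept, so employee 3 in `dev` is dropped.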

Current Status – Join

• Join
  – NATURAL, INNER, OUTER (LEFT, RIGHT, FULL)
  – SEMI, ANTI join (planned for v0.9)

• Join Predicates
  – WHERE and ON predicates
  – De facto standard outer join behavior with both predicates

SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num WHERE t2.value = 'xxx';

SELECT * FROM t1 LEFT JOIN t2 WHERE t1.num = t2.num AND t2.value = 'xxx';
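Why the placement of a predicate matters in an outer join can be shown with a small runnable sketch. It uses SQLite through Python's `sqlite3` module as a stand-in for standard SQL behavior, not Tajo itself, and the table contents are made up for illustration. A filter in `WHERE` runs after the join and discards the NULL-padded rows; the same filter in `ON` only decides which right-hand rows match, so unmatched left rows survive.

```python
import sqlite3


def outer_join_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE t1 (num INTEGER);
        CREATE TABLE t2 (num INTEGER, value TEXT);
        INSERT INTO t1 VALUES (1), (2);
        INSERT INTO t2 VALUES (1, 'xxx'), (2, 'yyy');
        """
    )
    # Filter applied after the join: the row for num = 2 is removed.
    where_rows = conn.execute(
        "SELECT t1.num, t2.value FROM t1 LEFT JOIN t2 ON t1.num = t2.num "
        "WHERE t2.value = 'xxx' ORDER BY t1.num"
    ).fetchall()
    # Filter applied as part of the join condition: num = 2 is kept with NULL.
    on_rows = conn.execute(
        "SELECT t1.num, t2.value FROM t1 LEFT JOIN t2 "
        "ON t1.num = t2.num AND t2.value = 'xxx' ORDER BY t1.num"
    ).fetchall()
    conn.close()
    return where_rows, on_rows


where_rows, on_rows = outer_join_demo()
```

The `WHERE` variant returns one row, while the `ON` variant returns both left rows, the second padded with NULL.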

Current Status – Table Partitions

• Column Value Partition
  – Hive-compatible partition

• Range Partition (planned for 1.0)
  – Table will be partitioned by disjoint ranges
  – Will remove the partition granularity problem of Hive partitions

CREATE TABLE T1 (C1 INT, C2 TEXT)
  USING PARQUET WITH ('parquet.compression' = 'SNAPPY')
  PARTITION BY COLUMN (C3 INT, C4 TEXT);
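A rough sketch of how column value partitioning lays data out, assuming the Hive convention of one `column=value` directory level per partition column. The helper names `partition_path` and `group_by_partition`, the `/warehouse/t1` directory, and the sample rows are hypothetical, and escaping of special characters in values is ignored. It also hints at the granularity problem: every distinct value combination gets its own directory.

```python
from collections import defaultdict


def partition_path(table_dir, partition_values):
    """Build a Hive-style path such as <table>/c3=7/c4=x."""
    parts = [f"{col}={val}" for col, val in partition_values]
    return "/".join([table_dir.rstrip("/")] + parts)


def group_by_partition(rows, partition_cols, table_dir):
    """Group rows by partition directory; partition columns live in the path."""
    groups = defaultdict(list)
    for row in rows:
        path = partition_path(table_dir, [(c, row[c]) for c in partition_cols])
        data = {k: v for k, v in row.items() if k not in partition_cols}
        groups[path].append(data)
    return dict(groups)


# Hypothetical rows for table T1 (C1, C2) partitioned by (C3, C4).
rows = [
    {"c1": 1, "c2": "a", "c3": 7, "c4": "x"},
    {"c1": 2, "c2": "b", "c3": 7, "c4": "x"},
    {"c1": 3, "c2": "c", "c3": 8, "c4": "y"},
]
layout = group_by_partition(rows, ["c3", "c4"], "/warehouse/t1")
```

A query that filters on `c3` or `c4` then only needs to read the matching directories.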

Future Works

• Multi-tenant Scheduler (v0.9)
  – Support multiple users and multiple queries

• Runtime byte code generation for expressions (v0.9)
  – Eliminate the interpretation overhead of expression evaluation

• Authentication and SQL Standard Access Control

• JIT-based Vectorized Processing Engine
  – Refer to the Hadoop Summit 2014 slides (http://goo.gl/jWghhp)

Get Involved!

• We are recruiting contributors!

• General
  – http://tajo.apache.org

• Getting Started
  – http://tajo.apache.org/docs/0.8.0/getting_started.html

• Downloads
  – http://tajo.apache.org/docs/0.8.0/getting_started/downloading_source.html

• Jira (Issue Tracker)
  – https://issues.apache.org/jira/browse/TAJO

• Join the mailing list
  – [email protected]
  – [email protected]





Demonstration
