Big Data SQL Support in Apache Apex / Hadoop

SQL on Apache Apex

Chinmay Kolhatkar ([email protected]), January 24th 2017




Apache Apex - Stream Processing

Easily Operable - Exposes an easy API for developing Operators (part of an application) and Applications

Highly Scalable - Scales statically as well as dynamically

Highly Performant - Can reach single-digit millisecond end-to-end latency

Fault Tolerant - Automatically recovers from failures without manual intervention

Stateful - Guarantees that no state will be lost

Apex Malhar - Open source library of reusable operators

YARN Native - Uses the Hadoop YARN framework for resource negotiation
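The fault-tolerance and statefulness claims above rest on checkpointing operator state and replaying the input that arrived after the last checkpoint. The following is a toy, single-process Java model of that idea, not Apex's implementation: `CheckpointSketch`, `CountingOperator` and the window layout are hypothetical, and state is checkpointed after every window only to keep the example short. In Apex the mechanism involves periodic checkpoints at streaming-window boundaries and replay of buffered upstream data, with many details omitted here.

```java
import java.util.List;

// Toy model of checkpoint-and-replay recovery (hypothetical; not Apex's implementation).
public class CheckpointSketch {
    // A stateful operator whose only state is a running tuple count.
    static final class CountingOperator {
        private long count;

        void process(String tuple) { count++; }

        long checkpoint() { return count; }               // snapshot the state

        void restore(long snapshot) { count = snapshot; } // reload a snapshot

        long count() { return count; }
    }

    // Runs the windows through one operator. If failInWindow >= 0, the operator
    // "crashes" partway through that window and a fresh instance takes over.
    static long run(List<List<String>> windows, int failInWindow) {
        CountingOperator op = new CountingOperator();
        long lastCheckpoint = op.checkpoint();
        for (int w = 0; w < windows.size(); w++) {
            List<String> window = windows.get(w); // assumed retained upstream until checkpointed
            if (w == failInWindow && !window.isEmpty()) {
                op.process(window.get(0));        // partial progress, lost in the crash
                op = new CountingOperator();      // replacement instance
                op.restore(lastCheckpoint);       // recover the last checkpointed state
            }
            for (String tuple : window) {
                op.process(tuple);                // (re)process the whole window
            }
            lastCheckpoint = op.checkpoint();     // checkpoint at the window boundary
        }
        return op.count();
    }

    public static void main(String[] args) {
        List<List<String>> windows = List.of(List.of("a", "b"), List.of("c", "d", "e"), List.of("f"));
        System.out.println(run(windows, -1)); // prints 6 (no failure)
        System.out.println(run(windows, 1));  // prints 6 (failure during the second window)
    }
}
```

Both runs count every tuple exactly once, because the replacement operator resumes from the last checkpoint and the interrupted window is processed again from the start.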


Apex Platform Overview



An Apex Application is a DAG (Directed Acyclic Graph)

A DAG is composed of vertices (Operators) and edges (Streams). A Stream is a sequence of data tuples that connects operators at end-points called Ports. An Operator takes one or more input streams, performs computations and emits one or more output streams.

● Each operator is either the user's business logic or a built-in operator from our open source library
● An operator may have multiple instances that run in parallel
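To make this vocabulary concrete, here is a minimal single-threaded Java sketch of operators (vertices) wired together by streams (edges) through ports. All names (`MiniDag`, `OutputPort`, `addStream`, `MapOperator`, `FilterOperator`) are hypothetical. The real Apex API builds the DAG with calls such as `dag.addOperator(...)` and `dag.addStream(...)` inside `StreamingApplication.populateDAG` and runs operators in separate YARN containers; neither that deployment nor parallel operator instances are modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Toy, single-threaded model of an Apex-style DAG (hypothetical; not the Apex API).
public class MiniDag {
    // Output port: tuples emitted here are delivered to every connected input port.
    static final class OutputPort<T> {
        private final List<Consumer<T>> sinks = new ArrayList<>();

        void emit(T tuple) {
            for (Consumer<T> sink : sinks) {
                sink.accept(tuple);
            }
        }
    }

    // Stream (edge): connects an upstream output port to a downstream input port.
    static <T> void addStream(OutputPort<T> source, Consumer<T> inputPort) {
        source.sinks.add(inputPort);
    }

    // Operator (vertex) that transforms every tuple with user logic.
    static final class MapOperator<I, O> {
        final OutputPort<O> output = new OutputPort<>();
        private final Function<I, O> logic;

        MapOperator(Function<I, O> logic) { this.logic = logic; }

        void input(I tuple) { output.emit(logic.apply(tuple)); } // input port
    }

    // Operator (vertex) that forwards only tuples matching a predicate.
    static final class FilterOperator<T> {
        final OutputPort<T> output = new OutputPort<>();
        private final Predicate<T> keep;

        FilterOperator(Predicate<T> keep) { this.keep = keep; }

        void input(T tuple) {                                    // input port
            if (keep.test(tuple)) {
                output.emit(tuple);
            }
        }
    }

    // DAG: parser -> filter -> formatter -> collecting sink
    static List<String> run(List<String> lines) {
        List<String> results = new ArrayList<>();
        MapOperator<String, String[]> parser = new MapOperator<>(line -> line.split(","));
        FilterOperator<String[]> filter = new FilterOperator<>(f -> Integer.parseInt(f[0]) > 3);
        MapOperator<String[], String> formatter = new MapOperator<>(f -> f[0] + ":" + f[1]);

        addStream(parser.output, filter::input);
        addStream(filter.output, formatter::input);
        addStream(formatter.output, results::add);

        for (String line : lines) {
            parser.input(line); // stands in for an input operator reading a source
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("2,brush", "5,paint", "9,roller"))); // prints [5:paint, 9:roller]
    }
}
```

Each operator only knows its own ports; the graph shape lives entirely in the `addStream` calls, which is what lets a platform relocate or replicate operators without changing their logic.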


Typical application example


Brief about SQL

1969 - CODASYL (network database)
1979 - First commercial SQL RDBMSs
1990 - Acceptance - transaction processing on SQL
1993 - Multi-dimensional databases
1996 - SQL EDWs
2006 - Hadoop and other “big data” technologies
2008 - NoSQL
2011 - SQL on Hadoop
2014 - Interactive analytics on {Hadoop, NoSQL, RDBMS} using SQL

SQL remains popular. Why?

“SQL on everything, in memory” by Julian Hyde, Strata NYC, Oct 16 2014


Brief about Calcite: Traditional Architecture

“SQL on everything, in memory” by Julian Hyde, Strata NYC, Oct 16 2014


Brief about Calcite: Calcite Architecture

“SQL on everything, in memory” by Julian Hyde, Strata NYC, Oct 16 2014


Brief about Calcite: Expression Tree

“SQL on everything, in memory” by Julian Hyde, Strata NYC, Oct 16 2014


Brief about Calcite: Expression Tree (Optimized)

“SQL on everything, in memory” by Julian Hyde, Strata NYC, Oct 16 2014
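The two expression-tree slides above contrast a query plan before and after optimization. The Java sketch below is a hypothetical, heavily simplified model of one such rewrite: pushing a Filter below a Project when the filtered column passes through the projection unchanged. Calcite ships a rule in the same spirit (`FilterProjectTransposeRule`), but the classes here are not Calcite's `RelNode` API. In real plans the larger payoff usually comes from pushing filters below joins or into the data source; a Project is used only to keep the example short.

```java
import java.util.List;

// Hypothetical mini relational-algebra tree; not Calcite's RelNode API.
public class FilterPushdownSketch {
    abstract static class Rel {
        abstract String explain(String indent);

        @Override
        public String toString() { return explain(""); }
    }

    static final class Scan extends Rel {
        final String table;
        Scan(String table) { this.table = table; }
        String explain(String indent) { return indent + "Scan(" + table + ")\n"; }
    }

    static final class Filter extends Rel {
        final Rel input;
        final String column;    // column tested by the predicate
        final String condition; // e.g. "> 3", kept as text for display
        Filter(Rel input, String column, String condition) {
            this.input = input;
            this.column = column;
            this.condition = condition;
        }
        String explain(String indent) {
            return indent + "Filter(" + column + " " + condition + ")\n" + input.explain(indent + "  ");
        }
    }

    static final class Project extends Rel {
        final Rel input;
        final List<String> columns; // columns passed through unchanged
        Project(Rel input, List<String> columns) {
            this.input = input;
            this.columns = columns;
        }
        String explain(String indent) {
            return indent + "Project(" + String.join(", ", columns) + ")\n" + input.explain(indent + "  ");
        }
    }

    // Rewrites Filter(Project(x)) into Project(Filter(x)) whenever the filtered
    // column passes through the projection, applied recursively over the tree.
    static Rel optimize(Rel rel) {
        if (rel instanceof Filter) {
            Filter f = (Filter) rel;
            Rel input = optimize(f.input);
            if (input instanceof Project && ((Project) input).columns.contains(f.column)) {
                Project p = (Project) input;
                return new Project(optimize(new Filter(p.input, f.column, f.condition)), p.columns);
            }
            return new Filter(input, f.column, f.condition);
        }
        if (rel instanceof Project) {
            Project p = (Project) rel;
            return new Project(optimize(p.input), p.columns);
        }
        return rel; // Scan: leaf, nothing to rewrite
    }

    public static void main(String[] args) {
        Rel plan = new Filter(new Project(new Scan("ORDERS"), List.of("ID", "PRODUCT")), "ID", "> 3");
        System.out.print(plan);           // Filter above Project
        System.out.print(optimize(plan)); // Project above Filter
    }
}
```

The rule only fires when it is safe: if the projection dropped or recomputed the filtered column, the tree is returned unchanged.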


Apex-Calcite API

[Figure: DAG generated from the SQL statement below: Kafka Input → CSV Parser → Filter → Project → Formatter → LineWriter (file output)]

SQLExecEnvironment.getEnvironment()
    // Input table ORDERS: CSV messages read from a Kafka topic
    .registerTable("ORDERS", new KafkaEndpoint(conf.get("broker"), conf.get("topic"),
        new CSVMessageFormat(conf.get("schemaInDef"))))
    // Output table SALES: CSV lines written to a file
    .registerTable("SALES", new FileEndpoint(conf.get("destFolder"), conf.get("destFileName"),
        new CSVMessageFormat(conf.get("schemaOutDef"))))
    // SQL function APEXCONCAT, backed by the method apex_concat_str of this class
    .registerFunction("APEXCONCAT", this.getClass(), "apex_concat_str")
    // Plan the streaming query and add the resulting operators and streams to the DAG
    .executeSQL(dag, "INSERT INTO SALES "
        + "SELECT STREAM ROWTIME, FLOOR(ROWTIME TO DAY), "
        + "APEXCONCAT('OILPAINT', SUBSTRING(PRODUCT, 6, 7)) "
        + "FROM ORDERS WHERE ID > 3 AND PRODUCT LIKE 'paint%'");
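The statement on this slide registers a Kafka-backed input table and a file-backed output table and lets Calcite plan the query into Apex operators. To show what the Filter and Project stages compute for each tuple, here is a plain-Java sketch of the same logic over in-memory rows. It is not Apex Malhar code: the `Order` class and its column types, ROWTIME as epoch milliseconds, UTC day boundaries, the CSV output layout, and the body of `apex_concat_str` (plain string concatenation) are all assumptions. `LIKE 'paint%'` is modeled with `startsWith`, which is only equivalent because the pattern has a single trailing `%`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-tuple model of the ORDERS -> SALES query (not Apex Malhar code).
public class OrdersToSalesSketch {
    static final long MILLIS_PER_DAY = 86_400_000L;

    // Assumed row shape of ORDERS; the real columns come from schemaInDef.
    static final class Order {
        final long rowtime; // assumed: event time in epoch milliseconds
        final int id;
        final String product;
        Order(long rowtime, int id, String product) {
            this.rowtime = rowtime;
            this.id = id;
            this.product = product;
        }
    }

    // Assumed body of the function registered as APEXCONCAT.
    public static String apex_concat_str(String s1, String s2) {
        return s1 + s2;
    }

    // SQL SUBSTRING(s, start, length): 1-based start, clipped at the end of the string.
    static String sqlSubstring(String s, int start, int length) {
        int from = Math.min(s.length(), Math.max(0, start - 1));
        int to = Math.min(s.length(), from + length);
        return s.substring(from, to);
    }

    // FLOOR(ROWTIME TO DAY) for epoch milliseconds, using UTC day boundaries.
    static long floorToDay(long millis) {
        return millis - Math.floorMod(millis, MILLIS_PER_DAY);
    }

    // WHERE clause (Filter), SELECT list (Project), then CSV formatting.
    static List<String> run(List<Order> orders) {
        List<String> sales = new ArrayList<>();
        for (Order o : orders) {
            if (o.id > 3 && o.product.startsWith("paint")) {
                String label = apex_concat_str("OILPAINT", sqlSubstring(o.product, 6, 7));
                sales.add(o.rowtime + "," + floorToDay(o.rowtime) + "," + label);
            }
        }
        return sales;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
            new Order(90_000_000L, 2, "paint1"),  // dropped: ID <= 3
            new Order(90_000_000L, 5, "paint2"),  // kept
            new Order(90_000_000L, 7, "brush1")); // dropped: not LIKE 'paint%'
        run(orders).forEach(System.out::println); // prints 90000000,86400000,OILPAINT2
    }
}
```

In the Apex application the same work is split across the Filter and Project operators shown in the DAG figure, with the Kafka input, CSV parsing and file output handled by the registered endpoints.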


Demo


Resources


• Apache Apex - http://apex.apache.org/
• Subscribe to forums
  ᵒ Apex - http://apex.apache.org/community.html
  ᵒ DataTorrent - https://groups.google.com/forum/#!forum/dt-users
• Download - https://datatorrent.com/download/
• Twitter
  ᵒ @ApacheApex; Follow - https://twitter.com/apacheapex
  ᵒ @DataTorrent; Follow - https://twitter.com/datatorrent
• Meetups - http://meetup.com/topics/apache-apex
• Webinars - https://datatorrent.com/webinars/
• Videos - https://youtube.com/user/DataTorrent
• Slides - http://slideshare.net/DataTorrent/presentations
• Startup Accelerator - Free full-featured enterprise product
  ᵒ https://datatorrent.com/product/startup-accelerator/
• Big Data Application Templates Hub - https://datatorrent.com/apphub


We Are Hiring


[email protected]
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders
