supercharge sqoop with magical jdbc drivers | strata+hadoop world 2015

14
© 2014 Progress Software Corporation. All rights reserved. 1 Supercharge Sqoop with magical JDBC drivers Sumit Sarkar, Principal Systems Engineer www.linkedin.com/in/meetsumit @SAsinSumit

Upload: sumit-sarkar

Post on 16-Jul-2015

446 views

Category:

Technology


8 download

TRANSCRIPT

Page 1: Supercharge Sqoop with magical JDBC drivers | Strata+Hadoop World 2015

© 2014 Progress Software Corporation. All rights reserved.1

Supercharge Sqoop with magical JDBC drivers

Sumit Sarkar, Principal Systems Engineer

www.linkedin.com/in/meetsumit

@SAsinSumit

Page 2: Supercharge Sqoop with magical JDBC drivers | Strata+Hadoop World 2015

© 2014 Progress Software Corporation. All rights reserved.2

Supercharge Sqoop with magical JDBC drivers

What is Sqoop?

Who is Progress DataDirect and their magical JDBC drivers?

How do you supercharge Sqoop?

Page 3: Supercharge Sqoop with magical JDBC drivers | Strata+Hadoop World 2015

© 2014 Progress Software Corporation. All rights reserved.3

What is Sqoop?

Page 4: Supercharge Sqoop with magical JDBC drivers | Strata+Hadoop World 2015

© 2014 Progress Software Corporation. All rights reserved.4

What is Sqoop?

Apache Sqoop

Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache

Hadoop and structured datastores such as relational databases.

Sqoop successfully graduated from the Incubator in March of 2012 and is now a Top-Level

Apache project

http://sqoop.apache.org/

Page 5: Supercharge Sqoop with magical JDBC drivers | Strata+Hadoop World 2015

© 2014 Progress Software Corporation. All rights reserved.5

What’s the deal with DataDirect?

Page 6: Supercharge Sqoop with magical JDBC drivers | Strata+Hadoop World 2015

© 2014 Progress Software Corporation. All rights reserved.6

We connecting the world’s application to the world’s data using industry standards

20+ years experience

100,000+ of end users

250+ ISV’s

200 team members

Relational

Databases

SaaS

Application

Data

NoSQL,

NewSQL,

Big Data

Custom

Connectivity

EDI / XML

Transformation

Page 7: Supercharge Sqoop with magical JDBC drivers | Strata+Hadoop World 2015

© 2014 Progress Software Corporation. All rights reserved.7

Disrupting the disruption from the Hadoop Ecoystem

Big Data ODBC drivers are for workloads too large for traditional drivers

Hive 0.8-0.13 CDH 4.0-5.1 HDP 1.3-2.1

Elastic Map Reduce

Hive 0.8-0.11 HDP 1.2-2.0

Page 8: Supercharge Sqoop with magical JDBC drivers | Strata+Hadoop World 2015

© 2014 Progress Software Corporation. All rights reserved.9

How do you supercharge Sqoop?

Page 9: Supercharge Sqoop with magical JDBC drivers | Strata+Hadoop World 2015

© 2014 Progress Software Corporation. All rights reserved.10

Closer look at magical JDBC with Sqoop

Unique support for SQL Server NTLM authentication from Linux

Breadth of JDBC drivers well beyond relational (NoSQL,SaaS,Cloud,NewSQL,etc)

Improve performance on LOAD and EXPORT

For EXPORT, hearing impressive performance w/ DataDirect bulk load for Sqoop

• Sample #s: 39 GB loaded in < 10 minutes (SQL Server)

For LOAD, we’ve seen 2x performance (Oracle)

For EXPORT/LOAD, single driver supports multiple versions (Oracle 9i-12c; SQL

Server 2000-2014; DB2 across z/OS; iSeries, Windows/Unix

Page 10: Supercharge Sqoop with magical JDBC drivers | Strata+Hadoop World 2015

© 2014 Progress Software Corporation. All rights reserved.11

Magical JDBC drivers for use with Sqoop

Big Data/NoSQL

Pivotal HAWQ

MongoDB

Cassandra

Data Warehouses

Amazon Redshift

SAP Sybase IQ

Teradata

Oracle Exadata

Pivotal Greenplum

Relational

Oracle DB

Microsoft SQL Server

IBM DB2

MySQL

MemSQL

PostgreSQL

IBM Informix

SAP Sybase

Pervasive SQL

Progress OpenEdge

Progress Rollbase

SaaS/Cloud

Salesforce.com

Database.com

FinancialForce

Veeva CRM

ServiceMAX

Hubspot

Marketo

Microsoft Dynamics CRM

Microsoft SQL Azure

Oracle Eloqua

Oracle Service Cloud

Google Analytics

Page 11: Supercharge Sqoop with magical JDBC drivers | Strata+Hadoop World 2015

© 2014 Progress Software Corporation. All rights reserved.12

Example of DataDirect JDBC bulk load for Sqoop

sqoop export --connect 'jdbc:datadirect:sqlserver://nc-

sqlserver:1433;database=test;user=test01;password=test01;EnableBulkLoad=true;Bulk

LoadBatchSize=1024;BulkLoadOptions=0' --driver

com.ddtek.jdbc.sqlserver.SQLServerDriver --table 'blah_1024MB' --export-dir

/user/hdfs/blah_1024MB/ --input-lines-terminated-by "n" --input-fields-terminated-by ',' --

batch -m 10Notes:

–batch mode is used for underlying insert statement execution.

–driver must be specified when using a Generic JDBC connector.

–connect is the JDBC URL. “EnableBulkLoad=true” authorizes the DataDirect SQL Server driver to utilize the bulk load protocol for the inserting of rows. The “BulkLoadBatchSize” value indicates to

the driver the number of rows it will attempt to bulk load on a single roundtrip to the server. If this value is less than the sqoop.export.records.per.statement value, then each call to “executeBatch”

will result in more than one round trip to the server in order to insert the batch of rows.

–table: the table to be populated in the target relational database as data is transferred from HDFS

–export-dir: identifies the HDFS directory which contains the Hadoop table to be exported.

–input-lines-terminated-by: identifies the character which separates rows in the HDFS files.

–input-fields-terminated-by: identifies the character which separates columns in the HDFS files.

-D sqoop.export.records.per.statement is not recommended nor the equivalent of JDBC batch size. Rather, it specifies the number of rows per SQL statements for data sources that support multi

row inserts such as Postgres.

Page 12: Supercharge Sqoop with magical JDBC drivers | Strata+Hadoop World 2015

© 2014 Progress Software Corporation. All rights reserved.13

Apache Hive ODBC Driver

Page 13: Supercharge Sqoop with magical JDBC drivers | Strata+Hadoop World 2015

© 2014 Progress Software Corporation. All rights reserved.14

Progress DataDirect“We have on life outside of data connectivity”

Page 14: Supercharge Sqoop with magical JDBC drivers | Strata+Hadoop World 2015