SQOOP HCatalog Integration Venkat Ranganathan Sqoop Meetup 10/28/13

SQOOP HCatalog Integration

Venkat Ranganathan

Sqoop Meetup

10/28/13

Agenda

• HCatalog Overview

• Sqoop HCatalog Integration Goals

• Features

• Demo

• Benefits

HCatalog Overview

• Table and Storage Management Service for Hadoop
  – Enables Pig/MR and Hive to more easily share data on the grid

• Uses the Hive metastore.

• Abstracts location and format of the data

• Supports reading and writing files in any format for which there is a Hive SerDe available.

• Now part of Hive.

Sqoop HCatalog Integration Goals

• Support HCatalog features consistent with Sqoop usage.
  – Support both imports into and exports from HCatalog tables
  – Enable Sqoop to read and write data in various formats
  – Automatic table schema mapping
  – Data fidelity
  – Support for static and dynamic partition keys

Support imports and exports

• Allows the HCatalog tables to be either the source or destination of a Sqoop job.

• In an HCatalog import, target-dir and warehouse-dir are replaced with the HCatalog table name.

• Similarly for exports, the export directory is substituted with the HCatalog table name.
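
As a rough illustration of the substitution described above, the sketch below assembles Sqoop argument lists in Python. The --hcatalog-table and --hcatalog-database options come from the Sqoop 1.4.4 User Guide; the helper names, the JDBC URL, and the table names are hypothetical placeholders, and nothing here actually runs Sqoop.

```python
from typing import List, Optional


def hcatalog_import_args(connect: str, source_table: str, hcat_table: str,
                         hcat_database: Optional[str] = None) -> List[str]:
    """Argument list for a Sqoop import that lands in an HCatalog table."""
    args = [
        "sqoop", "import",
        "--connect", connect,
        "--table", source_table,
        # No --target-dir / --warehouse-dir: the HCatalog table is the destination.
        "--hcatalog-table", hcat_table,
    ]
    if hcat_database is not None:
        args += ["--hcatalog-database", hcat_database]
    return args


def hcatalog_export_args(connect: str, target_table: str, hcat_table: str,
                         hcat_database: Optional[str] = None) -> List[str]:
    """Argument list for a Sqoop export that reads from an HCatalog table."""
    args = [
        "sqoop", "export",
        "--connect", connect,
        "--table", target_table,
        # No --export-dir: the HCatalog table is the source.
        "--hcatalog-table", hcat_table,
    ]
    if hcat_database is not None:
        args += ["--hcatalog-database", hcat_database]
    return args


if __name__ == "__main__":
    # Hypothetical connection string and table names, for illustration only.
    url = "jdbc:mysql://db.example.com/sales"
    print(" ".join(hcatalog_import_args(url, "ORDERS", "orders")))
    print(" ".join(hcatalog_export_args(url, "ORDERS_COPY", "orders", "default")))
```

The resulting lists could be handed to subprocess.run on a machine where Sqoop is installed; database credentials and other job options are left out of the sketch.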

File format support

• HCatalog integration into Sqoop now enables Sqoop to
  – Import/export files in any format for which a Hive SerDe has been created
  – Text files, SequenceFiles, RCFiles, ORCFiles, …
  – This makes Sqoop agnostic of the file format used, which can change over time based on new innovations/needs.

Automatic table schema mapping

• Sqoop allows a Hive table to be created based on the enterprise data store schema.

• This is enabled for HCatalog table imports as well.

• Automatic mapping with optional user overrides.

• Ability to provide storage options for the newly created table.

• All HCatalog primitive types supported
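
The table-creation features above map onto a handful of command-line options. The following is a minimal sketch, assuming the --create-hcatalog-table, --hcatalog-storage-stanza, and --map-column-hive options as described in the Sqoop 1.4.4 User Guide; the helper name, the column name, and the chosen storage stanza are hypothetical example values.

```python
from typing import Dict, List, Optional


def with_table_creation(base_args: List[str],
                        storage_stanza: Optional[str] = None,
                        type_overrides: Optional[Dict[str, str]] = None) -> List[str]:
    """Extend an HCatalog import argument list so Sqoop creates the table.

    storage_stanza: storage clause for the new table, e.g. "stored as sequencefile".
    type_overrides: optional user overrides of the automatic column type mapping.
    """
    args = list(base_args) + ["--create-hcatalog-table"]
    if storage_stanza:
        args += ["--hcatalog-storage-stanza", storage_stanza]
    if type_overrides:
        # --map-column-hive takes a comma-separated list of column=type pairs.
        mapping = ",".join(f"{col}={hive_type}"
                           for col, hive_type in type_overrides.items())
        args += ["--map-column-hive", mapping]
    return args


if __name__ == "__main__":
    # Hypothetical base import; see the Sqoop User Guide for the full option set.
    base = ["sqoop", "import", "--connect", "jdbc:mysql://db.example.com/sales",
            "--table", "ORDERS", "--hcatalog-table", "orders"]
    print(" ".join(with_table_creation(base, "stored as sequencefile",
                                       {"order_total": "double"})))
```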

Data fidelity

• With text-based imports (as with Sqoop's --hive-import option), text values have to be massaged so that delimiters are not misinterpreted.

• Sqoop provides two options to handle this:
  --hive-delims-replacement
  --hive-drop-import-delims

• This is error-prone, and the data is modified in order to be stored in Hive.
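
To make the delimiter problem concrete, here is a small Python simulation. It is not Sqoop's implementation; it only mimics the documented effect of the two options on the characters \n, \r, and \01 in string fields. The function names and the sample row are hypothetical.

```python
from typing import List

# Characters that Hive's default text format treats as delimiters.
HIVE_DELIMS = ("\n", "\r", "\x01")
FIELD_SEP = "\x01"   # default field delimiter for Hive text tables
RECORD_SEP = "\n"    # default record delimiter


def drop_delims(value: str) -> str:
    """Mimic --hive-drop-import-delims: remove delimiter characters."""
    for delim in HIVE_DELIMS:
        value = value.replace(delim, "")
    return value


def replace_delims(value: str, replacement: str) -> str:
    """Mimic --hive-delims-replacement: substitute a user-defined string."""
    for delim in HIVE_DELIMS:
        value = value.replace(delim, replacement)
    return value


def to_text_record(fields: List[str]) -> str:
    """Join fields the way a delimited text row would be written."""
    return FIELD_SEP.join(fields)


if __name__ == "__main__":
    row = ["42", "first line\nsecond line"]
    # Unmodified: the embedded newline makes one row look like several records.
    print(len(to_text_record(row).split(RECORD_SEP)))
    # Massaged: a single record again, but the stored value no longer
    # matches the source data.
    print(to_text_record([replace_delims(f, " ") for f in row]).split(RECORD_SEP))
```

Either option restores a one-row-per-record layout, at the cost of changing the stored value.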

Data fidelity

• With HCatalog table imports into file formats such as RCFile and ORCFile, there is no need to strip these delimiters from column values.

• Data is preserved without any massaging.

• If the target HCatalog table file format is Text, then the two options can still be used as before:
  --hive-delims-replacement
  --hive-drop-import-delims

Support for static and dynamic partitioning

• HCatalog table partition keys can be dynamic or static.

• Static partitioning keys have values provided as part of the DML (known at query compile time).

• Dynamic partitioning keys have values provided at execution time.
  – Based on the value of a column being imported
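
The difference between the two kinds of key can be sketched in a few lines of Python. This is a conceptual model only, not HCatalog code: the static key's value is fixed before any row is read, while the dynamic key's value is taken from each row. The function name, column names, and sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
PartitionSpec = Tuple[Tuple[str, object], ...]


def assign_partitions(rows: List[Row], dynamic_key: str,
                      static: Optional[Dict[str, object]] = None
                      ) -> Dict[PartitionSpec, List[Row]]:
    """Group rows by partition.

    static: key/value pairs known up front, identical for every row.
    dynamic_key: column whose per-row value selects the partition.
    """
    partitions: Dict[PartitionSpec, List[Row]] = defaultdict(list)
    for row in rows:
        spec = dict(static or {})
        spec[dynamic_key] = row[dynamic_key]  # resolved at execution time
        partitions[tuple(sorted(spec.items()))].append(row)
    return dict(partitions)


if __name__ == "__main__":
    rows = [
        {"id": 1, "country": "US"},
        {"id": 2, "country": "IN"},
        {"id": 3, "country": "US"},
    ]
    for spec, members in assign_partitions(
            rows, "country", {"load_date": "2013-10-28"}).items():
        print(spec, [r["id"] for r in members])
```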

Support for static and dynamic partitioning

• Both types of tables supported during import.

• Multiple partition keys per table are supported.

• Only one static partition key can be specified (Sqoop restriction).

• Only a table with one partitioning key can be created automatically.
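
On the command line, the single static partition key is supplied through the existing Hive partition options. The sketch below assumes the --hive-partition-key and --hive-partition-value options as listed for HCatalog jobs in the Sqoop 1.4.4 User Guide; the helper name and the example key and value are hypothetical.

```python
from typing import List


def with_static_partition(base_args: List[str], key: str, value: str) -> List[str]:
    """Append the one static partition key/value pair that Sqoop accepts.

    Any other partition keys of the table are left to be filled in
    dynamically from the imported columns.
    """
    if not key or not value:
        raise ValueError("a static partition needs both a key and a value")
    return list(base_args) + [
        "--hive-partition-key", key,
        "--hive-partition-value", value,
    ]


if __name__ == "__main__":
    # Hypothetical base import into an existing partitioned HCatalog table.
    base = ["sqoop", "import", "--connect", "jdbc:mysql://db.example.com/sales",
            "--table", "ORDERS", "--hcatalog-table", "orders"]
    print(" ".join(with_static_partition(base, "load_date", "2013-10-28")))
```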

Benefits

• Future-proof your Sqoop jobs by making them agnostic of the file formats used

• Remove additional steps before taking data to the target table format

• Preserve data contents

Availability & Documentation

• Part of Sqoop 1.4.4 release

• A chapter devoted to HCatalog integration in the User Guide

• URL: https://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_sqoop_hcatalog_integration

© Hortonworks Inc. 2013

DEMO

Questions?