An Introduction to Apache Hadoop Hive

Upload: semtech-solutions-ltd

Post on 16-Apr-2017

Apache Hadoop Hive

What is it?

Architecture

Related Projects

Hive DDL

Hive DML

HiveQL Examples

Business Intelligence

Hive - What is it?

A data warehouse for Hadoop

Open source, written in Java

Holds metadata in a relational database

Allows SQL-like queries

Supports big data sets

Offers built-in and user-defined functions

Has indexing
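
As a rough illustration of the built-in functions mentioned above, the Hive shell can list and describe them, and they can be used directly in a query. The query assumes the customer table created in the DDL section of these slides.

hive> SHOW FUNCTIONS;

hive> DESCRIBE FUNCTION upper;

hive> SELECT upper(a.address), length(a.address) FROM customer a LIMIT 10;

returns up to ten addresses in upper case, each with its length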

Hive Architecture

Where does Hive sit in the Hadoop architecture?

Hive Architecture

Given an existing HDFS and Hadoop cluster

Then add Hive and its metadata store (the metastore)

Use Flume and Sqoop to move data

Use Hive LOAD DATA command to load from flat files

Use ODBC for connectivity to your BI layer

Hive Related Projects

Apache Flume - move large data sets to Hadoop

Apache Sqoop - command-line tool, move RDBMS data to Hadoop

Apache HBase - non-relational database

Apache Pig - analyse large data sets

Apache Oozie - workflow scheduler

Apache Mahout - machine learning and data mining

Apache Hue - Hadoop user interface

Apache ZooKeeper - distributed configuration and coordination service

Hive - DDL

Create table

hive> CREATE TABLE customer (age INT, address STRING);

Partitions

hive> CREATE TABLE customer (age INT, address STRING) PARTITIONED BY (sdate STRING);

Show table

hive> SHOW TABLES ;

Describe table

hive> DESCRIBE customer;
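
A short sketch of how the partitioned form of the table might be used; the date value is only an illustrative assumption. The partition column sdate is declared in PARTITIONED BY and not in the main column list, and each partition value maps to its own subdirectory under the table's directory in HDFS.

hive> ALTER TABLE customer ADD PARTITION (sdate='2008-08-15');

registers a partition for that date

hive> SHOW PARTITIONS customer;

lists the partitions that currently exist for the table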

Hive - DDL

Alter table

hive> ALTER TABLE customer ADD COLUMNS (name STRING);

Drop table

hive> DROP TABLE customer;

Hive - DML

Loading flat files into Hive

hive> LOAD DATA LOCAL INPATH './data/home/x1a.txt' OVERWRITE INTO TABLE customer;

Incoming data is not verified against the table schema at load time
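
If customer is the partitioned table from the DDL slides, the load has to name a target partition. A possible sketch, reusing the file path above and assuming a partition date of 2008-08-15:

hive> LOAD DATA LOCAL INPATH './data/home/x1a.txt' OVERWRITE INTO TABLE customer PARTITION (sdate='2008-08-15');

Because the file is not checked when it is loaded, problems only show up when the data is read. For a text table, a field that cannot be parsed as the declared type comes back as NULL, so a query like this one can give a rough count of suspect rows:

hive> SELECT COUNT(*) FROM customer a WHERE a.sdate='2008-08-15' AND a.age IS NULL;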

HiveQL Examples

HiveQL, an SQL-like language

hive> SELECT a.age FROM customer a WHERE a.sdate ='2008-08-15';

selects the age column for the rows in one partition; the result is printed to the console but not stored

hive> INSERT OVERWRITE DIRECTORY '/data/hdfs_file' SELECT a.* FROM customer a WHERE a.sdate='2008-08-15';

writes all columns of the customer rows in that partition to an HDFS directory
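
HiveQL also supports aggregation. As a sketch using the same customer table, this query counts the rows and averages the age for each partition date:

hive> SELECT a.sdate, COUNT(*), AVG(a.age) FROM customer a GROUP BY a.sdate;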

Hive Business Intelligence

Use ODBC to connect Hive to your BI layer

Now you can use BI tools like Business Objects

Create a universe over the Hive instance

Create reports against the universe

Create ad hoc queries against the universe

Contact Us

Feel free to contact us at

www.semtech-solutions.co.nz

[email protected]

We offer IT project consultancy

We are happy to hear about your problems

You pay only for the hours you need to solve your problems