8/14/2019 1.Apache Hive
1/23
Execution Environments for Distributed Computing
Apache Hive
EEDC 34330
Master in Computer Architecture, Networks and Systems (CANS)
Homework number: 3
Group number: EEDC-1
Group members: Hugo Pérez, Sergio Mendoza, Fenoy
-
Outline
Introduction
Hive Database (Data Model, Query Language)
Hive Architecture
Conclusions
-
Introduction
Origins at Facebook...
Facebook generates 500,000,000 logs per day
Facebook users share a billion pieces of content daily
Facebook stores a vast amount of data
-
Introduction
What's the problem?
250 million photos per day
2.7 billion likes and comments per day
2 billion total registered users
100 billion friendships
...
TOO MUCH DATA!!
-
Introduction
What is Apache Hive?
Hive is a data warehouse infrastructure
And what is a data warehouse (DW)?
A DW is a database designed for reporting and data analysis
-
Introduction
How does Apache Hive work?
Hive is built on top of Hadoop
Hive stores data in HDFS (the Hadoop Distributed File System)
Hive compiles SQL-like queries into MapReduce jobs and runs the jobs on the cluster
http://en.wikipedia.org/wiki/MapReduce
-
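To make the "queries become MapReduce jobs" idea concrete, the sketch below simulates in a single Python process how a query such as SELECT school, COUNT(1) FROM profiles GROUP BY school splits into map, shuffle and reduce phases. It is a hypothetical illustration, not Hive or Hadoop code; the rows of "profiles" and the number of reducers are made up.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical rows of a table profiles(userid, school).
ROWS = [(1, "UPC"), (2, "MIT"), (3, "UPC"), (4, "UPC"), (5, "MIT")]


def map_phase(rows: Iterable[Tuple[int, str]]) -> Iterator[Tuple[str, int]]:
    """Map: emit one (school, 1) pair per input row."""
    for _userid, school in rows:
        yield school, 1


def shuffle(pairs, num_reducers: int = 2) -> List[List[Tuple[str, int]]]:
    """Shuffle: route each pair to a reducer by key hash, then sort by key."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[hash(key) % num_reducers].append((key, value))
    return [sorted(part) for part in partitions.values()]


def reduce_phase(partition) -> Iterator[Tuple[str, int]]:
    """Reduce: add up the values of each key, i.e. COUNT(1) ... GROUP BY school."""
    for key, group in groupby(partition, key=itemgetter(0)):
        yield key, sum(value for _key, value in group)


def run_query(rows) -> Dict[str, int]:
    result: Dict[str, int] = {}
    for partition in shuffle(map_phase(rows)):  # each partition = one reducer's input
        result.update(reduce_phase(partition))
    return result


if __name__ == "__main__":
    print(run_query(ROWS))
```

Because every pair with the same key lands in the same partition, each reducer can count its keys independently; this is what lets the real system spread the GROUP BY across many machines.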
Introduction
How does Apache Hive work?
HiveQL query
-
Introduction
How does a simple web app work?
MySQL query
-
Outline
Introduction
Hive Database (Data Model, Query Language)
Hive Architecture
Conclusions
-
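HiveQL
Hive defines a simple SQL-like query language called QL (HiveQL):
- Supports DDL (data definition) and DML (data manipulation) statements.
- Users can embed custom map-reduce scripts.
- Supports user-defined functions (UDF), user-defined aggregate functions (UDAF) and user-defined table-generating functions (UDTF).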
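The three kinds of user-defined functions differ in how many rows go in and come out. Real Hive UDFs are written in Java against Hive's own interfaces; the Python below is only an analogy of the three shapes. The built-ins upper(), avg() and explode() are real Hive examples of each kind, while every Python name here is hypothetical.

```python
from typing import Iterator, List, Optional, Sequence, Tuple


def udf_upper(value: str) -> str:
    """UDF shape: one value in, one value out (like Hive's built-in upper())."""
    return value.upper()


class UdafAverage:
    """UDAF shape: many rows in, one value out (like avg()).

    The method names loosely follow the phases of Hive's GenericUDAFEvaluator
    (iterate, terminatePartial, merge, terminate); this is a Python analogy only.
    """

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def iterate(self, value: float) -> None:  # consume one raw row
        self.total += value
        self.count += 1

    def terminate_partial(self) -> Tuple[float, int]:  # map side: ship a partial result
        return self.total, self.count

    def merge(self, partial: Tuple[float, int]) -> None:  # reduce side: combine partials
        total, count = partial
        self.total += total
        self.count += count

    def terminate(self) -> Optional[float]:  # final answer (None for no rows)
        return self.total / self.count if self.count else None


def udtf_explode(values: List[str]) -> Iterator[str]:
    """UDTF shape: one row in, zero or more rows out (like explode())."""
    yield from values


def distributed_average(splits: Sequence[Sequence[float]]) -> Optional[float]:
    """Each split plays the role of one mapper; one final evaluator merges them."""
    final = UdafAverage()
    for split in splits:
        partial = UdafAverage()
        for value in split:
            partial.iterate(value)
        final.merge(partial.terminate_partial())
    return final.terminate()


if __name__ == "__main__":
    print(udf_upper("hive"), distributed_average([[1, 2], [3]]), list(udtf_explode(["a", "b"])))
```

Splitting an aggregate into partial and merge steps is what allows it to be computed partly on the map side and finished on the reduce side.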
-
HiveQL Example
REDUCE subq2.school, subq2.meme, subq2.cnt
  USING 'top10.py' AS (school, meme, cnt)
FROM (SELECT subq1.school, subq1.meme, COUNT(1) AS cnt
      FROM (MAP b.school, a.status
              USING 'meme-extractor.py' AS (school, meme)
            FROM status_updates a JOIN profiles b ON (a.userid = b.userid)
           ) subq1
      GROUP BY subq1.school, subq1.meme
      DISTRIBUTE BY school, meme
      SORT BY school, meme, cnt desc
     ) subq2;
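The query relies on two external scripts whose contents the slide does not show. With MAP/REDUCE ... USING, Hive streams each row to the script's standard input as one line of tab-separated columns and reads tab-separated lines back from its standard output. The sketch below is a hypothetical version of both scripts: the rule that a "meme" is a #hashtag is purely an assumption, and the logic is written as functions over lines so it can run without Hive.

```python
from itertools import groupby
from typing import Iterable, Iterator


def meme_extractor(lines: Iterable[str]) -> Iterator[str]:
    """Mapper script: reads 'school<TAB>status' lines, writes 'school<TAB>meme'.

    Assumption (hypothetical): a "meme" is any #hashtag in the status text.
    """
    for line in lines:
        school, _, status = line.rstrip("\n").partition("\t")
        for word in status.split():
            if word.startswith("#") and len(word) > 1:
                yield f"{school}\t{word.lower()}"


def top10(lines: Iterable[str], k: int = 10) -> Iterator[str]:
    """Reducer script: reads 'school<TAB>meme<TAB>cnt' lines (sorted by school
    thanks to SORT BY) and writes the k most frequent memes per school."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    parsed = ((school, meme, int(cnt)) for school, meme, cnt in rows)
    for school, group in groupby(parsed, key=lambda row: row[0]):
        best = sorted(group, key=lambda row: row[2], reverse=True)[:k]
        for _school, meme, cnt in best:
            yield f"{school}\t{meme}\t{cnt}"


if __name__ == "__main__":
    # Inside Hive, each script would instead loop over sys.stdin and print each output line.
    sample = ["UPC\tloving #Hive and #hadoop", "MIT\t#hive rocks"]
    for out in meme_extractor(sample):
        print(out)
```

Since the query distributes rows by (school, meme), one school's rows may be spread over several reducers, so each reducer computes its top 10 over its own share of the data.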
-
Outline
Introduction
Hive Database (Data Model, Query Language)
Hive Architecture
Conclusions
-
Architecture
-
Architecture
External Interfaces: provide both user interfaces, such as the command line (CLI) and a web UI, and application programming interfaces (APIs), such as JDBC and ODBC.
Thrift Server: exposes a very simple client API to execute HiveQL statements.
Metastore: the system catalog. All other components of Hive interact with the metastore.
-
Architecture
Driver: manages the life cycle of a HiveQL statement during compilation, optimization and execution.
Compiler: translates statements into a plan, which consists of a DAG of map-reduce jobs.
The driver submits the individual map-reduce jobs from the DAG to the Execution Engine in topological order.
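Submitting jobs "in topological order" means a job is only handed to the execution engine after every job whose output it reads has been submitted. A minimal sketch using Python's standard graphlib module (Python 3.9+); the job names and the shape of the DAG are hypothetical, and submit() merely records the order instead of launching anything.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical DAG from the compiler: each job maps to the set of jobs whose
# output it consumes. Two independent aggregations feed a join, then a sort.
PLAN: Dict[str, Set[str]] = {
    "aggregate_status": set(),
    "aggregate_profiles": set(),
    "join_results": {"aggregate_status", "aggregate_profiles"},
    "sort_final": {"join_results"},
}


def submit(job: str, submitted: List[str]) -> None:
    """Stand-in for handing one job to the execution engine; here it is only recorded."""
    submitted.append(job)


def run_plan(plan: Dict[str, Set[str]]) -> List[str]:
    """Submit every job only after all the jobs it depends on."""
    submitted: List[str] = []
    for job in TopologicalSorter(plan).static_order():
        submit(job, submitted)
    return submitted


if __name__ == "__main__":
    print(run_plan(PLAN))
```

static_order() yields one valid sequence; to dispatch independent jobs concurrently, TopologicalSorter also provides prepare(), get_ready() and done().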
-
Metastore
The metastore is the system catalog, which contains metadata about the tables stored in Hive.
Database: a namespace for tables.
Table: the metadata for a table contains the list of columns and their types, the owner, and storage and SerDe information.
Partition: each partition can have its own columns, SerDe and storage information.
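The three levels of the metastore data model can be sketched as plain Python dataclasses. This is a hypothetical in-memory model, not the metastore's real schema (which is kept in a relational database). The warehouse root, the "<db>.db" directory name and the field names are assumptions; the key=value subdirectory per partition follows Hive's usual layout, and the SerDe name is used only as a label.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed warehouse root directory (Hive's usual default, configurable).
WAREHOUSE_DIR = "/user/hive/warehouse"


@dataclass
class Partition:
    values: Dict[str, str]                            # partition key -> value
    location: str                                     # storage location of this partition
    columns: Optional[List[Tuple[str, str]]] = None   # None: same columns as the table
    serde: Optional[str] = None                       # None: same SerDe as the table


@dataclass
class Table:
    name: str
    owner: str
    columns: List[Tuple[str, str]]                    # (column name, type)
    serde: str                                        # how rows are (de)serialized
    partition_keys: List[str] = field(default_factory=list)
    location: str = ""                                # storage information
    partitions: List[Partition] = field(default_factory=list)


@dataclass
class Database:
    name: str                                         # a namespace for tables
    tables: Dict[str, Table] = field(default_factory=dict)

    def create_table(self, table: Table) -> Table:
        """Register a table; the default location is an assumed layout."""
        if not table.location:
            table.location = f"{WAREHOUSE_DIR}/{self.name}.db/{table.name}"
        self.tables[table.name] = table
        return table


def add_partition(table: Table, values: Dict[str, str],
                  serde: Optional[str] = None) -> Partition:
    """Store a partition in a key=value subdirectory of its table."""
    subdir = "/".join(f"{key}={values[key]}" for key in table.partition_keys)
    part = Partition(values=values, location=f"{table.location}/{subdir}", serde=serde)
    table.partitions.append(part)
    return part


def effective_serde(table: Table, part: Partition) -> str:
    """A partition's own SerDe overrides the table's, if it has one."""
    return part.serde or table.serde
```

Letting a partition override the table's SerDe and columns is what allows older partitions to keep their original format after the table definition changes.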
-
Query Compiler
Parser: transforms a query string into a parse tree representation.
Semantic Analyzer: transforms the parse tree into a block-based internal query representation.
Logical Plan Generator: converts the internal query representation into a logical plan, which consists of a tree of logical operators.
Optimizer: performs multiple passes over the logical plan and rewrites it in several ways (for example, column pruning, predicate pushdown and partition pruning).
Physical Plan Generator: converts the logical plan into a physical plan, consisting of a DAG of map-reduce jobs.
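The last three compiler stages can be illustrated with a toy pipeline that starts after parsing and semantic analysis. The sketch below is hypothetical: a hard-coded logical plan for SELECT school, COUNT(1) FROM profiles GROUP BY school, one optimizer pass (column pruning), and a physical plan generator that cuts the operator chain into map-reduce jobs at each ReduceSink, the operator Hive uses to mark where data is shuffled to reducers. Operator names are modeled on Hive's, but the data structures are simplified.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Op:
    kind: str                                  # operator type, e.g. "TableScan"
    args: List[str] = field(default_factory=list)


def logical_plan(table: str, group_col: str) -> List[Op]:
    """Toy logical plan for: SELECT group_col, COUNT(1) FROM table GROUP BY group_col."""
    return [
        Op("TableScan", [table]),
        Op("Select", ["*"]),                   # naive: read every column
        Op("ReduceSink", [group_col]),         # shuffle rows by the grouping key
        Op("GroupBy", [group_col, "count(1)"]),
        Op("FileSink", ["result"]),
    ]


def prune_columns(ops: List[Op], needed: List[str]) -> List[Op]:
    """One optimizer pass: replace SELECT * with only the columns used later."""
    return [Op("Select", list(needed)) if op.kind == "Select" and op.args == ["*"] else op
            for op in ops]


def physical_plan(ops: List[Op]) -> List[Dict[str, List[Op]]]:
    """Cut the pipeline into map-reduce jobs at each ReduceSink: operators up to
    and including a ReduceSink run in the map phase, the rest in the reduce phase."""
    jobs: List[Dict[str, List[Op]]] = []
    map_ops: List[Op] = []
    reduce_ops: List[Op] = []
    in_reduce = False
    for op in ops:
        if op.kind == "ReduceSink":
            if in_reduce:  # a second shuffle needs a new job reading the intermediate output
                jobs.append({"map": map_ops, "reduce": reduce_ops})
                map_ops, reduce_ops = [Op("TableScan", ["intermediate"])], []
            map_ops.append(op)
            in_reduce = True
        elif in_reduce:
            reduce_ops.append(op)
        else:
            map_ops.append(op)
    jobs.append({"map": map_ops, "reduce": reduce_ops})
    return jobs


if __name__ == "__main__":
    plan = prune_columns(logical_plan("profiles", "school"), ["school"])
    for i, job in enumerate(physical_plan(plan), start=1):
        print(f"job {i}: map={[o.kind for o in job['map']]} reduce={[o.kind for o in job['reduce']]}")
```

A query that needs a second shuffle (for example, a GROUP BY followed by a global ORDER BY) produces a second ReduceSink and therefore a second job, which is how the physical plan becomes a DAG of several map-reduce jobs.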
-
Outline
Introduction
Hive Database (Data Model, Query Language)
Hive Architecture
Conclusions
-
Links:
http://i.stanford.edu/~ragho/hive-icde2010.pdf
http://www.vldb.org/pvldb/2/vldb09-938.pdf
http://hive.apache.org/
https://cwiki.apache.org/Hive/languagemanual-transform.html
http://biggdata.blogspot.com/2011/04/refreshing-trendingtopics-website-data.html
http://code.google.com/p/hive-mrc/wiki/AboutHiveCore