Big Data Hadoop Analytic and Data Warehouse Comparison Guide


Upload: danairat-thanabodithammachari

Post on 14-Apr-2017

Page 1: Big data Hadoop Analytic and Data warehouse comparison guide

Big Data Hadoop – Hands On Workshop

Data Processing Solutions – Comparison Guide

Big Data Workshop Series

Danairat T.

[Figure: Core Hadoop (Data Inputs to Results in steps 1-2, cloud based) vs. Traditional Data Warehouse (Data Inputs to Results in steps 1-6 across the Data Source, Data Staging, Data Warehouse, Data Mart, Cube, and Analytic Results layers, with multiple staging areas, data marts, and cubes)]

Page 2

Solution 1. Core Hadoop processing

No data staging transformation and no data movement required.

[Diagram: Data Inputs to Analytic Results in 2 steps; Cloud Ready]

Top Benefits
1. Cloud- and IoT-ready architecture roadmap
2. No data duplication, which reduces data storage cost
3. Fast data processing, with fault tolerance built into all processing
4. Aligns with a unified data architecture and data governance
5. Fewer data processing steps than a traditional DWH

The Effort Investment
1. Learn core Hadoop

Page 3

Solution 2. Using BI Tools to analyze Hadoop data

Requires a single transformation to CSV raw text, stored in Hadoop HDFS, so that BI tools can connect to it and visualize the data.

[Diagram: Data Inputs to Hadoop HDFS (CSV raw text) to Results in 3 steps; Cloud Ready]

Top Benefits
1. Lower cost with a cloud- and IoT-ready architecture
2. Fast data processing, with fault tolerance built into all processing
3. Fewer data processing steps than a traditional DWH

The Effort Investment
1. Learn Hadoop
2. Requires transformation to CSV raw text for BI tools

Page 4

Solution 3. Creating data warehouse in Hadoop

Requires a single transformation, with a DWH set up on Hadoop for BI tools.

[Diagram: Data Inputs to Hadoop HDFS to Hadoop DWH (Hive, or Cassandra, HBase) to Results in 4 steps; Cloud Ready]

Top Benefits
1. Lower cost with a cloud- and IoT-ready architecture
2. Fast data processing, with fault tolerance built into all processing
3. Fewer data processing steps than a traditional DWH

The Effort Investment
1. Learn core Hadoop
2. Requires transformation to CSV raw text for BI tools
3. Requires a DWH setup on Hadoop (Hive, Cassandra, HBase)

Page 5

Solution 4. Implementing traditional data warehouse

[Diagram: Data Inputs to Results in 6 steps across the Data Source, Data Staging, Data Warehouse, Data Mart, Cube, and Analytic Results layers, with multiple staging areas, data marts, and cubes. Callout: the more the data grows, the slower the data processing.]

Top Concerns from Traditional Data Warehouse Architecture
1. Extensive data duplication drives up data storage cost
2. Very slow data processing, and a job must be restarted or rolled back if any step fails
3. Data security risk from keeping too many copies of the data in various formats

Page 6

Benefits Comparison Summary

Benefit criteria:
(a) Cloud-ready architecture
(b) Built-in parallel processing
(c) IoT architecture roadmap
(d) Without DB cube investment
(e) Without data mart investment
(f) Without DWH investment
(g) Without staging data (raw text)
(h) Unstructured and raw source content processing

Solution                                        (a)  (b)  (c)  (d)  (e)  (f)   (g)   (h)
1. Core Hadoop                                  Yes  Yes  Yes  Yes  Yes  Yes   Yes   Yes
2. Hadoop and Pentaho/Power BI                  Yes  Yes  Yes  Yes  Yes  Yes   No*   No*
3. Hadoop and Cognos, RapidMiner, BO, Tableau   Yes  Yes  Yes  Yes  Yes  No**  No**  No**
4. Traditional Data Warehouse                   No   No   No   No   No   No    No    No

*  Requires CSV
** Requires Hive connector

Page 7

Appendix

Page 8

Pentaho supports Big Data Inputs

Page 9

PowerBI supports Big Data Inputs

Page 10

Tableau supports Big Data Inputs

Page 11

RapidMiner supports Big Data Inputs

Page 12

Hadoop Cluster Installation and Excel Parser Processing

Page 13

Clone the Hadoop master to slave1 and slave2

[Screenshot: master, slave1, slave2]

Page 14

At master node: Edit the hosts file

Page 15

At master node: Copy key file to slave1 and slave2

scp /home/ubuntu/.ssh/id_dsa.pub ip-172-31-1-8:/home/ubuntu/.ssh/master.pub
scp /home/ubuntu/.ssh/id_dsa.pub 172.31.15.16:/home/ubuntu/.ssh/master.pub

Page 16

After this slide, three cascaded windows represent the master node, the slave1 node, and the slave2 node.

[Screenshot: master node, slave1 node, slave2 node]

Page 17

At slave1 and slave2: append the master's public key to authorized_keys

$ cat /home/ubuntu/.ssh/master.pub >> /home/ubuntu/.ssh/authorized_keys

Page 18

At master: Test ssh to slave1 and slave2

$ ssh ip-172-31-1-8
$ exit
$ ssh ip-172-31-15-16
$ exit

Pages 19-20

At master: add slave1 and slave2 to the Hadoop slaves file

Page 21

At master: edit hdfs-site.xml

Page 22

At master: edit hdfs-site.xml for 2 replication servers
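Assuming "2 replication servers" on this slide means an HDFS replication factor of 2 (the dfs.replication property in hdfs-site.xml), the storage effect can be sketched with simple arithmetic. The 300 MB file size and the 128 MB block size below are illustrative assumptions, not values from the slides, and ReplicationSketch and its methods are hypothetical names.

```java
// Sketch: storage effect of an HDFS replication factor of 2.
// Assumptions (not from the slides): a 128 MB block size and a 300 MB file.
class ReplicationSketch {

    /** Number of HDFS blocks needed for a file (ceiling division). */
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /** Total block replicas stored across the cluster. */
    static long replicaCount(long fileBytes, long blockBytes, int replication) {
        return blockCount(fileBytes, blockBytes) * replication;
    }

    /** Raw bytes used on DataNode disks; the last block is stored at its real size. */
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long file = 300 * mb;   // assumed file size
        long block = 128 * mb;  // assumed block size
        int replication = 2;    // assumed meaning of "2 replication servers"
        System.out.println("blocks   = " + blockCount(file, block));
        System.out.println("replicas = " + replicaCount(file, block, replication));
        System.out.println("raw MB   = " + rawBytes(file, replication) / mb);
    }
}
```

With these assumed numbers the file splits into 3 blocks, is stored as 6 block replicas, and occupies 600 MB of raw DataNode disk.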

Page 23

At all nodes: remove the namenode and datanode directories

Pages 24-25

At master: format namenode

Page 26

At master: Execute start-dfs.sh

Page 27

At slave1: check the jps result; you will see that DataNode has been started

Page 28

At slave2: check the jps result; you will see that DataNode has been started

Page 29

At master: Execute start-yarn.sh

Page 30

At slave1: check the jps result; you will see that NodeManager has been started

Page 31

At slave2: check the jps result; you will see that NodeManager has been started

Page 32

Importing data into HDFS Cluster

Page 33

At master: import data into HDFS

Page 34

At slave1: review the imported data in HDFS

Page 35

At slave2: review the imported data in HDFS

Page 36

Running MapReduce in Cluster Mode

Page 37

At master: execute YARN mapreduce program
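The slides do not show the MapReduce program itself; the output path used later in the deck (/outputs/wordcount_output_dir01) suggests a word count job. As a rough sketch of that logic only, the following plain Java mimics the map and reduce phases in memory. It uses no Hadoop classes, and WordCountSketch and its method names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of word count logic; it does not run on a cluster.
class WordCountSketch {

    /** Map phase: split one input line into (word, 1) pairs. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    /** Shuffle and reduce phases: group the pairs by word and sum their counts. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>(); // keys kept in sorted order
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    /** Runs both phases over all input lines and returns word -> count. */
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countWords(Arrays.asList("big data hadoop", "hadoop data"));
        // One "word<TAB>count" line per key, similar in shape to a part-r-00000 file.
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the framework performs the grouping between the two phases; here both phases run in a single process for illustration.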

Page 38

At slave1, slave2: you will see the Application Master and the YARN child container

Pages 39-40

At master: review the output file from HDFS

Page 41

At slave1, slave2: review the output file from HDFS using the command:

hdfs dfs -cat /outputs/wordcount_output_dir01/part-r-00000

Pages 42-45

At master: review output result data from web console

Page 46

Process Excel Worksheet

Page 47

1. Create a Java Class using the Apache POI Libraries

Page 48

2. Traverse Data in Excel Spreadsheet

import java.util.Iterator;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// inputStream is an already opened InputStream on the .xlsx file
Workbook workbook = new XSSFWorkbook(inputStream);
Sheet firstSheet = workbook.getSheetAt(0);
Iterator<Row> iterator = firstSheet.iterator();

// Visit every row, then every cell in that row
while (iterator.hasNext()) {
    Row nextRow = iterator.next();
    Iterator<Cell> cellIterator = nextRow.cellIterator();
    while (cellIterator.hasNext()) {
        Cell cell = cellIterator.next();
        // (the loop body continues on the next slide)

Page 49

3. Extract Data from Excel Spreadsheet

        // Print each cell according to its type.
        // These int constants come from older POI releases; newer releases use the CellType enum.
        switch (cell.getCellType()) {
            case Cell.CELL_TYPE_STRING:
                System.out.print(cell.getStringCellValue());
                break;
            case Cell.CELL_TYPE_BOOLEAN:
                System.out.print(cell.getBooleanCellValue());
                break;
            case Cell.CELL_TYPE_NUMERIC:
                System.out.print(cell.getNumericCellValue());
                break;
        }
    } // end of the cell loop
} // end of the row loop

For further integration into HDFS, please emit data to output collector.
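One way to read "emit data to output collector" is to join each row's cell values into a CSV line (the raw format Solution 2 stores in HDFS) and hand that line to a collector. The sketch below assumes the cell values have already been read into strings. The BiConsumer parameter is a hypothetical stand-in for the Hadoop output collector, not a Hadoop type, and ExcelRowEmitSketch and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

// Sketch only: plain Java, no POI or Hadoop classes.
class ExcelRowEmitSketch {

    /** Quotes a value that contains a comma, a double quote, or a line break. */
    static String escape(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (needsQuotes) {
            // Double any embedded quotes, then wrap the whole value in quotes.
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    /** Joins the cell values of one spreadsheet row into a single CSV line. */
    static String toCsvLine(List<String> cellValues) {
        List<String> escaped = new ArrayList<>();
        for (String value : cellValues) {
            escaped.add(escape(value));
        }
        return String.join(",", escaped);
    }

    /** Emits (row number, CSV line) to the collector stand-in. */
    static void emitRow(int rowNumber, List<String> cellValues,
                        BiConsumer<Integer, String> collector) {
        collector.accept(rowNumber, toCsvLine(cellValues));
    }

    public static void main(String[] args) {
        // Here the collector only prints; in a MapReduce job it would write to the job output.
        BiConsumer<Integer, String> printer = (k, v) -> System.out.println(k + "\t" + v);
        emitRow(0, Arrays.asList("id", "name", "note"), printer);
        emitRow(1, Arrays.asList("1", "Hadoop", "fast, fault tolerant"), printer);
    }
}
```

Keeping the CSV formatting separate from the emit step means the same row-to-line function could be reused whether the rows end up in a MapReduce output or in a local file.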

Page 50

4. Close Excel Spreadsheet

workbook.close();

inputStream.close();
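As an alternative to the explicit close calls above, Java's try-with-resources statement closes resources automatically, in reverse order of declaration, even when an exception is thrown. The sketch below demonstrates that ordering with a hypothetical Tracked class standing in for the stream and the workbook; whether it applies directly depends on the POI version in use declaring Workbook as Closeable, which is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: try-with-resources closes resources in reverse order of declaration.
class CloseOrderSketch {

    /** Hypothetical stand-in for a stream or workbook; records when it is closed. */
    static class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("closed " + name);
        }
    }

    /** Opens two tracked resources, does some work, and returns the event log. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (Tracked inputStream = new Tracked("inputStream", log);
             Tracked workbook = new Tracked("workbook", log)) {
            log.add("reading rows");
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The log shows the workbook stand-in closed before the input stream stand-in, which matches the order of the two explicit close calls on the slide.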

Page 51

Excel Processing Results in Hadoop

Page 52

Stopping Hadoop Cluster

Page 53

At master: execute stop-yarn.sh

Page 54

At slave1: use jps to confirm that NodeManager has been stopped

Page 55

At slave2: use jps to confirm that NodeManager has been stopped

Page 56

At master: execute stop-dfs.sh

Page 57

At slave1: use jps to confirm that DataNode has been stopped

Page 58

At slave2: use jps to confirm that DataNode has been stopped

Page 59

Thank you very much