Hadoop Conf 2014 - Hadoop BigQuery Connector

Hadoop BigQuery Connector Simon Su & Sunny Hu @ MiCloud

Upload: simon-su

Post on 21-Nov-2014


DESCRIPTION

Hadoop Conference Taiwan 2014 Presentation.

TRANSCRIPT

Page 1: Hadoop Conf 2014 - Hadoop BigQuery Connector

Hadoop BigQuery Connector
Simon Su & Sunny Hu @ MiCloud

Page 2: Hadoop Conf 2014 - Hadoop BigQuery Connector

I am Simon Su

var simon = {};
simon.aboutme = 'http://about.me/peihsinsu';
simon.nodejs = 'http://opennodes.arecord.us';
simon.googleshare = 'http://gappsnews.blogspot.tw';
simon.nodejsblog = 'http://nodejs-in-example.blogspot.tw';
simon.blog = 'http://peihsinsu.blogspot.com';
simon.slideshare = 'http://slideshare.net/peihsinsu/';
simon.email = '[email protected]';
simon.say('Good luck to everybody!');

Page 3: Hadoop Conf 2014 - Hadoop BigQuery Connector

I am Sunny Hu

var sunny = {};
sunny.aboutme = 'https://plus.google.com/u/0/+sunnyHU/posts';
sunny.email = '[email protected]';
sunny.language = ['Java', '.NET', 'NodeJS', 'SQL'];
sunny.skill = ['Project management', 'System analysis', 'System design', 'Car ho lan'];
sunny.say('Writing code is too dreary, so keep your mood sunny');

Page 4: Hadoop Conf 2014 - Hadoop BigQuery Connector

● We are the Su & Hu duo ...

Page 5: Hadoop Conf 2014 - Hadoop BigQuery Connector

● 2011/11 MiCloud Launch

● 2013/2 Google Apps Partner

● 2013/9 Google Cloud Partner

● 2014/4 Google Cloud Launch

We are MiCloud

Page 6: Hadoop Conf 2014 - Hadoop BigQuery Connector
Page 7: Hadoop Conf 2014 - Hadoop BigQuery Connector
Page 8: Hadoop Conf 2014 - Hadoop BigQuery Connector
Page 9: Hadoop Conf 2014 - Hadoop BigQuery Connector

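Background

● Dremel (BigQuery) can provide stable service at massive scale
● 2013, average daily service volume: 5,922,000,000 requests
● 2012, average daily service volume: 5,134,000,000 requests
● 2011, average daily service volume: 4,717,000,000 requests
● 2010, average daily service volume: 3,627,000,000 requests
● 2009, average daily service volume: 2,610,000,000 requests
● 2008, average daily service volume: 1,745,000,000 requests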

Page 10: Hadoop Conf 2014 - Hadoop BigQuery Connector

What are the components of Hadoop...

HDFS: Persistent storage for parallel access, better with good performance...

MapReduce: Mass computing power to load and process the requirements in parallel

Strategy: Your idea for filtering information from the given datasets

Page 11: Hadoop Conf 2014 - Hadoop BigQuery Connector

You have better choices in the cloud...

HDFS: Object storage services, like Google Cloud Storage, AWS S3...

MapReduce: Cloud machines with unlimited resources, better with lower and scalable pricing...

Strategy: Nothing can replace a good idea…, but fast...

Page 12: Hadoop Conf 2014 - Hadoop BigQuery Connector

● The fast way to run Hadoop: Docker

Page 13: Hadoop Conf 2014 - Hadoop BigQuery Connector

Resources Google Provides

Page 14: Hadoop Conf 2014 - Hadoop BigQuery Connector

● GCE Hadoop Utility

Page 15: Hadoop Conf 2014 - Hadoop BigQuery Connector

● GCE Cluster Tool - bdutil

Page 16: Hadoop Conf 2014 - Hadoop BigQuery Connector
Page 17: Hadoop Conf 2014 - Hadoop BigQuery Connector

Before the demo… prepare

1. Install google_cloud_sdk
2. Install bdutil

Page 18: Hadoop Conf 2014 - Hadoop BigQuery Connector

Google Cloud SDK
curl https://sdk.cloud.google.com | bash

Page 19: Hadoop Conf 2014 - Hadoop BigQuery Connector

● Authenticate the gcloud utility

Page 20: Hadoop Conf 2014 - Hadoop BigQuery Connector

● Set up the default project

● Test the configuration...

Page 21: Hadoop Conf 2014 - Hadoop BigQuery Connector

Using bdutil...
https://developers.google.com/hadoop/setting-up-a-hadoop-cluster

Page 22: Hadoop Conf 2014 - Hadoop BigQuery Connector

bdutil scopes

● Designed to create a Hadoop cluster fast
● Quickly run a Hadoop task
● Quickly integrate Google's resources
● Quickly clean up finished resources

Page 23: Hadoop Conf 2014 - Hadoop BigQuery Connector

Demo first...

Page 24: Hadoop Conf 2014 - Hadoop BigQuery Connector

● Configure your bdutil environment

Page 25: Hadoop Conf 2014 - Hadoop BigQuery Connector

● bdutil deploy -e bigquery_env.sh

Page 26: Hadoop Conf 2014 - Hadoop BigQuery Connector
Page 27: Hadoop Conf 2014 - Hadoop BigQuery Connector
Page 28: Hadoop Conf 2014 - Hadoop BigQuery Connector
Page 29: Hadoop Conf 2014 - Hadoop BigQuery Connector

● Checking the result...

Page 30: Hadoop Conf 2014 - Hadoop BigQuery Connector

● The Administration console

Page 31: Hadoop Conf 2014 - Hadoop BigQuery Connector

TeraSort
https://www.mapr.com/fr/company/press/mapr-and-google-compute-engine-set-new-world-record-hadoop-terasort

Page 32: Hadoop Conf 2014 - Hadoop BigQuery Connector
Page 33: Hadoop Conf 2014 - Hadoop BigQuery Connector

You can win the game, too...

…. (skip)

Page 34: Hadoop Conf 2014 - Hadoop BigQuery Connector

BigQuery Connector
https://developers.google.com/hadoop/running-with-bigquery-connector

Page 35: Hadoop Conf 2014 - Hadoop BigQuery Connector

Cluster nodes: hadoop-m (master), hadoop-w-0 and hadoop-w-1 (workers)

Page 36: Hadoop Conf 2014 - Hadoop BigQuery Connector

Demo first...

Page 37: Hadoop Conf 2014 - Hadoop BigQuery Connector

Run a BigQuery Connector job...

Page 38: Hadoop Conf 2014 - Hadoop BigQuery Connector

Workflow...

1. Dump sample data from [publicdata:samples.shakespeare]
2. Run MapReduce to count word occurrences
3. Write the result to a specified BigQuery table
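The three workflow steps can be sketched in plain Python, without Hadoop or BigQuery, to show the shape of the computation. This is only an illustration: the sample rows are made up, the input field names `word` and `word_count` are assumptions, and the helper names `map_row`, `reduce_word`, and `run_job` are hypothetical, not part of the connector.

```python
from collections import defaultdict

# Hypothetical rows standing in for records dumped from the sample table.
# The field names "word" and "word_count" are assumptions for illustration.
SAMPLE_ROWS = [
    {"word": "king", "word_count": 3},
    {"word": "queen", "word_count": 2},
    {"word": "King", "word_count": 5},
]


def map_row(row):
    """Mapper: emit one (word, count) pair per input row."""
    yield row["word"].lower(), int(row["word_count"])


def reduce_word(word, counts):
    """Reducer: sum every count emitted for one word."""
    return {"Name": word, "Number": sum(counts)}


def run_job(rows):
    """Group mapper output by key (the shuffle), then reduce each group."""
    grouped = defaultdict(list)
    for row in rows:
        for word, count in map_row(row):
            grouped[word].append(count)
    return [reduce_word(word, counts) for word, counts in sorted(grouped.items())]


# Rows in the shape that would be written to the output table.
RESULT = run_job(SAMPLE_ROWS)
```

In the real job the grouping step is done by Hadoop's shuffle, and the reading and writing are handled by the connector's input and output formats rather than by in-memory lists.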

Page 39: Hadoop Conf 2014 - Hadoop BigQuery Connector

Look into source code...

● BigQueryInputFormat class
● Input parameters
● Mapper
● BigQueryOutputFormat class
● Output parameters
● Reducer

Page 40: Hadoop Conf 2014 - Hadoop BigQuery Connector

BigQueryInputFormat

● Using a user-specified query to select the appropriate BigQuery objects.

● Splitting the results of the query evenly among the Hadoop nodes.

● Parsing the splits into Java objects to pass to the mapper
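To make "splitting evenly" concrete, here is a minimal sketch of dividing a list of records into near-equal contiguous chunks, one per node. It is not the connector's actual splitting logic, which the slides do not show; the function name `split_evenly` is hypothetical.

```python
def split_evenly(records, num_nodes):
    """Divide records into num_nodes contiguous chunks whose sizes differ by at most one."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    base, extra = divmod(len(records), num_nodes)
    splits = []
    start = 0
    for index in range(num_nodes):
        # The first `extra` chunks take one additional record each.
        size = base + (1 if index < extra else 0)
        splits.append(records[start:start + size])
        start += size
    return splits
```

With seven records and three nodes, the chunk sizes come out as 3, 2, and 2.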

Page 41: Hadoop Conf 2014 - Hadoop BigQuery Connector

Input parameters

● Project Id: GCP project id, e.g. hadoop-conf-2014
● Input Table Id: [optional projectId]:[datasetId].[table id]
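The table id format `[optional projectId]:[datasetId].[table id]` can be illustrated with a small parser. This is a sketch under assumptions: `TableRef` and `parse_table_id` are hypothetical names, and falling back to a default project when the prefix is missing is an assumed behavior, not something the slides confirm.

```python
from typing import NamedTuple, Optional


class TableRef(NamedTuple):
    project_id: Optional[str]
    dataset_id: str
    table_id: str


def parse_table_id(qualified: str, default_project: Optional[str] = None) -> TableRef:
    """Split '[projectId:]datasetId.tableId' into its parts."""
    # Split on the last colon so the optional project prefix is peeled off first.
    project, _, rest = qualified.rpartition(":")
    dataset, dot, table = rest.partition(".")
    if not dot or not dataset or not table:
        raise ValueError(f"expected [projectId:]datasetId.tableId, got {qualified!r}")
    # Assumption: an omitted project falls back to the job's own project id.
    return TableRef(project or default_project, dataset, table)
```

For example, `parse_table_id("publicdata:samples.shakespeare")` yields project `publicdata`, dataset `samples`, and table `shakespeare`.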

Page 42: Hadoop Conf 2014 - Hadoop BigQuery Connector
Page 43: Hadoop Conf 2014 - Hadoop BigQuery Connector

BigQueryOutputFormat class

● Provides Hadoop with the ability to write JsonObject values directly into a BigQuery table

● An extension of the Hadoop OutputFormat class

Page 44: Hadoop Conf 2014 - Hadoop BigQuery Connector

Output parameters

● Project Id: GCP project id, e.g. hadoop-conf-2014
● Output Table Id: [optional projectId]:[datasetId].[table id]
● Output Table Schema: [{'name': 'Name','type': 'STRING'}, {'name': 'Number','type': 'INTEGER'}]
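As a rough illustration of what the output schema implies for each record, the sketch below loads the schema text from the slide and checks a record against it. The helper names `load_schema` and `conforms` are hypothetical, and the mapping from BigQuery types to Python types covers only the two types that appear in the slide.

```python
import json

# Schema text as shown on the slide (single-quoted, so not strict JSON).
SCHEMA_TEXT = "[{'name': 'Name','type': 'STRING'}, {'name': 'Number','type': 'INTEGER'}]"

# Assumed mapping for illustration; only the types used in the slide are covered.
PYTHON_TYPES = {"STRING": str, "INTEGER": int}


def load_schema(text):
    """Convert the single-quoted schema text to strict JSON and parse it."""
    return json.loads(text.replace("'", '"'))


def conforms(record, schema):
    """Return True if the record has exactly the schema's fields with matching types."""
    names = [field["name"] for field in schema]
    if sorted(record) != sorted(names):
        return False
    return all(
        isinstance(record[field["name"]], PYTHON_TYPES[field["type"]])
        for field in schema
    )


SCHEMA = load_schema(SCHEMA_TEXT)
```

A reducer emitting `{"Name": "king", "Number": 8}` would pass this check, while a record whose `Number` is a string would not.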

Page 45: Hadoop Conf 2014 - Hadoop BigQuery Connector
Page 46: Hadoop Conf 2014 - Hadoop BigQuery Connector

bdutil housekeeping...
https://developers.google.com/hadoop/setting-up-a-hadoop-cluster

Page 47: Hadoop Conf 2014 - Hadoop BigQuery Connector

● Game over - delete the Hadoop cluster

Page 48: Hadoop Conf 2014 - Hadoop BigQuery Connector

● Check the project...

Page 49: Hadoop Conf 2014 - Hadoop BigQuery Connector

Your cost in this lab...

VM (n1-standard-1) × machines × hours

$0.070 USD/hour × 24 × 1
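The cost formula on this slide is a straight product of three figures. The sketch below works it out with `Decimal` to avoid floating-point rounding. Reading 24 as the machine count and 1 as the hours follows the column order on the slide and is an assumption; the product is the same either way. The function name `lab_cost` is hypothetical.

```python
from decimal import Decimal


def lab_cost(hourly_rate_usd, machines, hours):
    """Return the total cost in USD: hourly rate x machine count x hours."""
    return Decimal(hourly_rate_usd) * machines * hours


# Figures taken from the slide: $0.070 per hour, 24, and 1.
TOTAL = lab_cost("0.070", 24, 1)
```

With the slide's figures the total comes to 1.68 USD.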

Page 50: Hadoop Conf 2014 - Hadoop BigQuery Connector

Today’s Demo

Using Docker...

Page 51: Hadoop Conf 2014 - Hadoop BigQuery Connector

● Using the Google-optimized Docker container image

localhost:~$ gcloud compute instances create simon-docker \
> --image https://www.googleapis.com/compute/v1/projects/google-containers/global/images/container-vm-v20140522 \
> --zone asia-east1-a \
> --machine-type f1-micro
localhost:~$ gcloud compute ssh simon-docker
simonsu@simon-docker:~$ sudo docker search bdutil
simonsu@simon-docker:~$ docker run -it peihsinsu/bdutil bash

Page 53: Hadoop Conf 2014 - Hadoop BigQuery Connector

http://goo.gl/PbHdDc

Page 54: Hadoop Conf 2014 - Hadoop BigQuery Connector

http://micloud.tw

Page 55: Hadoop Conf 2014 - Hadoop BigQuery Connector

http://jsdc-tw.kktix.cc/events/jsdc2014