Exploring BigData with Google BigQuery

Dharmesh Vaya @DRVaya http://drvaya.wordpress.com/

Upload: dharmesh-vaya

Post on 12-Jul-2015


Category:

Technology



TRANSCRIPT

Page 1: Exploring BigData with Google BigQuery

Dharmesh Vaya @DRVaya

http://drvaya.wordpress.com/

Page 2: Exploring BigData with Google BigQuery

Agenda

● What is Big Data ?

● Available Big Data Solutions & Issues

● Why Google BigQuery ?

● Inside BigQuery

● Features & Components

● RESTful API

● Development with BigQuery (Live Demo)

○ Query History, Projects, DataSets, Public Datasets, Table Details, Writing Queries, Save Results.

○ Integration with Applications.

● BigQuery Tools

● Big Data Solution with BigQuery & Google Cloud Platform

● Pricing Model

● Any questions ?

Page 3: Exploring BigData with Google BigQuery

What is Big Data ?

Is it a Data Type ? No

It's a buzzword for a massive volume of structured and/or unstructured data.

It is so large that it is difficult to process/analyze using traditional databases.

Page 4: Exploring BigData with Google BigQuery

What is Big Data ?

Data that has the following attributes can be ‘Big Data’

Page 5: Exploring BigData with Google BigQuery

So how Big is B - I - G ?

Page 6: Exploring BigData with Google BigQuery

So how Big is B - I - G ?

Library of Congress - Textual Data

20 Terabytes

(20 000 000 000 000 bytes)

Page 7: Exploring BigData with Google BigQuery

So how Big is B - I - G ?

Amazon.com - Inventory & Customer Data

42 Terabytes

(42 000 000 000 000 bytes)

Page 8: Exploring BigData with Google BigQuery

So how Big is B - I - G ?

YouTube.com - Media Data

100+ Terabytes

(100 000 000 000 000 bytes)

Page 9: Exploring BigData with Google BigQuery

So how Big is B - I - G ?

Google.com - Search, Mail, Media & anything you can think of !!

850+ Terabytes

(850 000 000 000 000 bytes)

(Speculated Figures)

Page 10: Exploring BigData with Google BigQuery

So how Big is B - I - G ?

World Data Center for Climate - Meteorology Data

6.2 Petabytes

(6 200 000 000 000 000 bytes)

Page 11: Exploring BigData with Google BigQuery

Available Big Data Solutions & Issues

- Highly scalable, distributed computing.
- Storage (HDFS) optimized for high throughput.

- Security is disabled by default.
- MapReduce is batch based, hence no real-time operations.
- Costly to maintain.

- Highly scalable, talks of handling petabytes.
- Elastic set of resources to return result sets.
- Almost 10x faster than Hadoop.

- High costs of data migration and integration.
- Operations/maintenance cost may shoot up.

Page 12: Exploring BigData with Google BigQuery

Why Google BigQuery ?

Hadoop (with Hive)

Amazon Redshift

Google BigQuery

= 1.4 TB

On average, it's within 8-10 seconds !!

Page 13: Exploring BigData with Google BigQuery

Inside Google BigQuery

● BigQuery is based on Dremel, a technology pioneered by Google and used extensively within the company.

● It uses columnar storage and multi-level execution trees to achieve interactive performance for queries against multi-terabyte datasets.

● BigQuery's performance advantage comes from its parallel processing architecture.

● The query is processed by thousands of servers in a multi-level execution tree structure, with the final results aggregated at the root. BigQuery stores the data in a columnar format so that only data from the columns being queried is read.

● All this and more is now offered as a public service for any business or developer to use. This release made it possible for those outside of Google to use the power of Dremel for their Big Data processing requirements.
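The columnar-storage point can be illustrated with a toy sketch. It is not how Dremel or BigQuery actually lay out data; the records, the `to_columns` helper and the `total` helper are made up for illustration. It only shows why a query over one column does not need to touch the others.

```python
# Toy illustration only: real Dremel/BigQuery storage is far more elaborate.
# The records below are invented sample data.
rows = [
    {"name": "a", "country": "IN", "bytes_sent": 120},
    {"name": "b", "country": "US", "bytes_sent": 300},
    {"name": "c", "country": "IN", "bytes_sent": 80},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def total(columns, column_name):
    """Sum a single column; the other columns are never read."""
    return sum(columns[column_name])


columns = to_columns(rows)
print(total(columns, "bytes_sent"))  # 500
```

With a row-oriented layout the same sum would have to walk every full record, including the `name` and `country` values it never uses.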

Page 14: Exploring BigData with Google BigQuery

Columnar Storage & Trees
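A minimal sketch of the execution-tree idea, assuming a simple average query: leaf servers compute partial results over their own shard, intermediate servers merge them, and the root produces the final answer. The function names, the fan-out and the shard data are hypothetical and are not taken from Dremel itself.

```python
def leaf_scan(shard):
    """Leaf server: compute a partial (count, sum) over its own shard."""
    return (len(shard), sum(shard))


def merge(partials):
    """Intermediate or root server: combine partial results from its children."""
    count = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    return (count, total)


def run_tree(shards, fanout=2):
    """Merge leaf results level by level until a single root result remains."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not shards:
        raise ValueError("at least one shard is required")
    level = [leaf_scan(shard) for shard in shards]
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


# Invented sample data: four shards, each held by a different leaf.
shards = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
count, total = run_tree(shards)
print(count, total, total / count)  # 10 55 5.5
```

Each level only passes small partial results upward, which is why the work can be spread over many servers in parallel.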

Page 15: Exploring BigData with Google BigQuery

Inside Google BigQuery

Hey, you mentioned Dremel. Isn’t MapReduce based on it ?

There’s a difference

● Dremel is designed as an interactive data analysis tool for large datasets.

● MapReduce is designed as a programming framework to batch process large datasets.

Page 16: Exploring BigData with Google BigQuery

Features & Components

Features:

● Web GUI for BigQuery

● Affordable

● Run in Background

● Easy Data Importation

● Flexible (Addition of Columns, Native Support for Timestamp Type of Data)

● REST API Support

● More than just Standard SQL

Components:

● Project

● Tables

● DataSets

● Jobs

Page 17: Exploring BigData with Google BigQuery

RESTful API

For Datasets

Method    HTTP Request
delete    DELETE /projects/projectId/datasets/datasetId
get       GET /projects/projectId/datasets/datasetId
insert    POST /projects/projectId/datasets
list      GET /projects/projectId/datasets
patch     PATCH /projects/projectId/datasets/datasetId
update    PUT /projects/projectId/datasets/datasetId
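As a rough sketch, the dataset methods can be turned into concrete HTTP requests by filling in the path placeholders. The base URL is assumed from the v2 upload URL shown on the Jobs slide; the project and dataset names are hypothetical. Nothing is sent over the network here.

```python
from urllib.parse import quote

# Assumed base URL for the v2 REST API.
BASE_URL = "https://www.googleapis.com/bigquery/v2"

# (HTTP verb, path template) for each dataset method in the table.
DATASET_METHODS = {
    "delete": ("DELETE", "/projects/{projectId}/datasets/{datasetId}"),
    "get": ("GET", "/projects/{projectId}/datasets/{datasetId}"),
    "insert": ("POST", "/projects/{projectId}/datasets"),
    "list": ("GET", "/projects/{projectId}/datasets"),
    "patch": ("PATCH", "/projects/{projectId}/datasets/{datasetId}"),
    "update": ("PUT", "/projects/{projectId}/datasets/{datasetId}"),
}


def dataset_request(method, project_id, dataset_id=""):
    """Return the (verb, URL) pair for one dataset method."""
    verb, template = DATASET_METHODS[method]
    path = template.format(
        projectId=quote(project_id, safe=""),
        datasetId=quote(dataset_id, safe=""),
    )
    return verb, BASE_URL + path


# "my-project" and "sales" are made-up identifiers.
print(dataset_request("list", "my-project"))
print(dataset_request("get", "my-project", "sales"))
```

A real call would also need an OAuth 2.0 access token in the request headers.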

Page 18: Exploring BigData with Google BigQuery

RESTful API

For Jobs

Method             HTTP Request
get                GET /projects/projectId/jobs/jobId
getQueryResults    GET /projects/projectId/queries/jobId
insert             POST https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs and POST /projects/projectId/jobs
list               GET /projects/projectId/jobs
query              POST /projects/projectId/queries

Similar methods for -

● Projects

● Tables

● TableData
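A hedged sketch of how the `query` and `getQueryResults` methods fit together: the client posts the SQL, and if the response says the job is not complete yet, it polls the results URL built from the returned job reference. The base URL, project ID, job ID, token placeholder and example query (legacy SQL bracket syntax against a public sample table) are assumptions for illustration. The request is built but not sent.

```python
import json
from urllib.request import Request

BASE_URL = "https://www.googleapis.com/bigquery/v2"  # assumed v2 base URL


def build_query_request(project_id, sql, access_token, timeout_ms=10000):
    """Build (but do not send) the 'query' call: POST /projects/projectId/queries."""
    body = json.dumps({"query": sql, "timeoutMs": timeout_ms}).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/projects/{project_id}/queries",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )


def results_url(response):
    """URL for 'getQueryResults' when the first response is not complete yet."""
    ref = response["jobReference"]
    return f"{BASE_URL}/projects/{ref['projectId']}/queries/{ref['jobId']}"


# Illustrative query only.
SQL = (
    "SELECT word, SUM(word_count) AS n "
    "FROM [publicdata:samples.shakespeare] "
    "GROUP BY word ORDER BY n DESC LIMIT 5"
)
request = build_query_request("my-project", SQL, "ACCESS_TOKEN_PLACEHOLDER")
print(request.get_method(), request.full_url)

# Made-up response for a query that is still running.
pending = {
    "jobComplete": False,
    "jobReference": {"projectId": "my-project", "jobId": "job_123"},
}
if not pending["jobComplete"]:
    print("GET", results_url(pending))
```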

Page 19: Exploring BigData with Google BigQuery

Demo using Web Interface

Page 20: Exploring BigData with Google BigQuery

Demo : Excel Connector

+

Page 21: Exploring BigData with Google BigQuery

BigQuery Tools

BigQuery Excel Connector

bq Command Line

BigQuery Browser Tool

Visualization & BI Tools

ETL Tools

ODBC Connector

Page 22: Exploring BigData with Google BigQuery

Big Data Solution with BigQuery

Page 23: Exploring BigData with Google BigQuery

Big Data Solution with BigQuery

Data Pipeline - transforming and loading data into BigQuery

To load data into BigQuery through the Google Cloud Platform, first upload the CSV or JavaScript Object Notation (JSON) files to Google Cloud Storage, then load the data from there into BigQuery. Alternatively, the REST API can be used to integrate loading programmatically into the current computing environment.
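The second step of that pipeline, loading files that already sit in Google Cloud Storage, is submitted as a job. Below is a minimal sketch of what the job body might look like, assuming the v2 load configuration fields (`sourceUris`, `sourceFormat`, `destinationTable`, `schema`). The bucket, project, dataset, table and column names are all hypothetical.

```python
import json

ALLOWED_FORMATS = ("CSV", "NEWLINE_DELIMITED_JSON")


def build_load_job(project_id, dataset_id, table_id, gcs_uris, fields,
                   source_format="CSV"):
    """Build the JSON body for a load job (to be sent to POST /projects/projectId/jobs)."""
    if source_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported source format: {source_format}")
    return {
        "configuration": {
            "load": {
                # Files must already be uploaded to Google Cloud Storage.
                "sourceUris": list(gcs_uris),
                "sourceFormat": source_format,
                "schema": {"fields": list(fields)},
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }


# All identifiers below are made up for illustration.
job = build_load_job(
    project_id="my-project",
    dataset_id="sales",
    table_id="orders",
    gcs_uris=["gs://my-bucket/orders-2015.csv"],
    fields=[
        {"name": "order_id", "type": "INTEGER"},
        {"name": "placed_at", "type": "TIMESTAMP"},
        {"name": "amount", "type": "FLOAT"},
    ],
)
print(json.dumps(job, indent=2))
```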

Data Visualization - performing data analysis on BigQuery and visualizing the results

A custom, web-based dashboard can be built on Google App Engine, using the BigQuery REST API to execute the queries and Google Chart Tools to visualize the results.
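A dashboard like this needs a small glue step between the two services: query results come back from the REST API as rows of cells with string values, while a chart wants a header row followed by typed data rows. The sketch below assumes the v2 response shape (`schema.fields` plus `rows` of `f`/`v` cells). The helper names and the sample response are made up.

```python
def convert(value, field_type):
    """Cell values arrive as strings (or null); cast numeric types."""
    if value is None:
        return None
    if field_type == "INTEGER":
        return int(value)
    if field_type == "FLOAT":
        return float(value)
    return value


def to_chart_table(response):
    """Turn a query response into [header_row, data_row, ...]."""
    fields = response["schema"]["fields"]
    table = [[field["name"] for field in fields]]
    for row in response.get("rows", []):
        cells = row["f"]
        table.append(
            [convert(cell["v"], field["type"]) for cell, field in zip(cells, fields)]
        )
    return table


# Invented sample response, not real query output.
sample = {
    "schema": {
        "fields": [
            {"name": "word", "type": "STRING"},
            {"name": "n", "type": "INTEGER"},
        ]
    },
    "rows": [
        {"f": [{"v": "alpha"}, {"v": "3"}]},
        {"f": [{"v": "beta"}, {"v": "1"}]},
    ],
}
print(to_chart_table(sample))  # [['word', 'n'], ['alpha', 3], ['beta', 1]]
```

The resulting list of lists can be serialized to JSON and handed to the charting code in the browser.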

Page 24: Exploring BigData with Google BigQuery

Pricing Model

Action            Example
Loading Data      Loading files/data into BigQuery
Exporting Data    Exporting data, saving results from BigQuery
Table Reads       Browsing through data
Table Copies      Copy existing table to new table

Storage Action       Cost
Storage              $0.020 per GB, per month.
Streaming Inserts    Free until January 1, 2015. After January 1, 2015, $0.01 per 100,000 rows

Query Pricing        Cost
On-demand            $5 per TB
Reserved Capacity    5 GB per second, $20k/month

Wow, that’s like 800 MB for 1 Rupee; even Internet ain’t that cheap here.
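The rates on this slide can be turned into a quick estimate. The sketch below uses only the storage, streaming and on-demand query prices quoted above, and assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes), matching the byte figures used earlier in the deck. Actual billing rules may differ, so treat the output as a rough guide.

```python
# Rates as quoted on the pricing slide (2015).
STORAGE_USD_PER_GB_MONTH = 0.020
STREAMING_USD_PER_100K_ROWS = 0.01
QUERY_USD_PER_TB = 5.0

# Assumption: decimal units, as in the byte counts earlier in the deck.
BYTES_PER_GB = 10**9
BYTES_PER_TB = 10**12


def storage_cost(bytes_stored, months=1):
    """Monthly storage charge for the given number of bytes."""
    return bytes_stored / BYTES_PER_GB * STORAGE_USD_PER_GB_MONTH * months


def streaming_cost(rows):
    """Charge for streaming inserts, billed per 100,000 rows."""
    return rows / 100_000 * STREAMING_USD_PER_100K_ROWS


def query_cost(bytes_scanned):
    """On-demand query charge for the bytes a query reads."""
    return bytes_scanned / BYTES_PER_TB * QUERY_USD_PER_TB


# Scanning the 1.4 TB from the comparison slide, on demand:
print(f"${query_cost(1.4 * BYTES_PER_TB):.2f}")          # $7.00
# Storing 20 TB (the Library of Congress example) for one month:
print(f"${storage_cost(20 * BYTES_PER_TB):.2f}")          # $400.00
# Streaming one million rows:
print(f"${streaming_cost(1_000_000):.2f}")                # $0.10
```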

Page 25: Exploring BigData with Google BigQuery

Where to use ?

● Not a replacement for traditional systems, but it complements the ecosystem !!

● Major strength is Handling Large DataSets

● Major usage in Data Analytics

● Important component of Google Cloud Platform

● People are interested in numbers/data, and they want them quickly.

Google BigQuery is the future of Analytics!!

Page 26: Exploring BigData with Google BigQuery

Any questions ?

What we covered ...

✓ What is Big Data ?

✓ Available Big Data Solutions & Issues

✓ Why Google BigQuery ?

✓ Features, Components & Tools

✓ RESTful API

✓ Demo using Web Interface

✓ BigQuery Tools

✓ Big Data Solution with BigQuery

✓ Pricing Model

✓ Usage

Page 27: Exploring BigData with Google BigQuery

https://bigquery.cloud.google.com

No registration, just sign in with your Google account

About the presenter

Follow Dharmesh Vaya on @DRVaya or subscribe to http://drvaya.wordpress.com/

You can also add me on +DharmeshVaya