google bigquery

24
Google BigQuery

Upload: matthias-feys

Post on 07-Jan-2017

1.979 views

Category:

Technology


4 download

TRANSCRIPT

Page 1: Google BigQuery

Google BigQuery

Page 2: Google BigQuery

About myselfMatthias Feys

work @Datatonic: - big data (with Google Cloud) - machine learning - data visualizations (Tableau/Spotfire)

Google Qualified Cloud Developer

contact:

- @FsMatt- [email protected]

Page 3: Google BigQuery

About DatatonicDatatonic is a team of data science experts that help corporations unleash the power of data. They use Google Cloud Platform, data visualisation technologies (like Tableau or Spotfire) and machine learning to build breakthrough solutions. Either as expert advisors, or built as a fully managed solution with support end-to-end (A3S).

Some references:

@teamdatatonic

Page 4: Google BigQuery

● What is BigQuery?● How does it scale/work?● How does it compare to:

- NoSQL datastores- MapReduce

● Demo● Pricing Model● Best Practices

This talk

Page 5: Google BigQuery

What is BigQuery?

“BigQuery is a fully-managed and cloud-based interactive query service for massive datasets.”

It’s the externalization of Dremel, one of

Google’s core technologies

Page 6: Google BigQuery

What is BigQuery? (2)BigQuery Service is available via:

● Web UI (bigquery.cloud.google.com)● console (gcloud)● API (+ client libraries)● external tools (Tableau, Excel, …)● ODBC connector

Page 7: Google BigQuery

How Does it Scale?

Page 8: Google BigQuery

Dremel Architecture

Data Model/Storage:- Columnar Storage- Nested/Repeated Fields- No Index! -> Single Full Table Scan (from disk)

Query Execution:- Tree Architecture- Using tens of thousands machines over fast Google network (+1Petabit/s)

Page 9: Google BigQuery

Columnar Storage

● Traffic minimization:○ only read selected

columns

● Higher Compression Ratio:○ Similar values in the

same column○ From 1:3 → 1:10

Page 10: Google BigQuery

Tree Architecture- root server: ->receives query + reads table metadata ->rewrites the query(s) ->sends queries to the next level <-returns final query results

- intermediate servers: ->(similar steps) <-parallel partial aggregation

- leaf servers: ->actually scan (parts) of the table <-send data to intermediate servers

Page 11: Google BigQuery

NoSQL Datastore vs. BigQuery?

NoSQL Datastore

● Index based (expected queries)

● Read-write

BigQuery

● Non-index based (ad hoc queries)

● Read-only (append-only)

Page 12: Google BigQuery

MapReduce vs. BigQuery?

MapReduce

● High latency● Flexible (complex)

batch processing● Unstructured data

BigQuery

● Low latency ● SQL-like queries

● Structured data

Page 13: Google BigQuery

Demo’s

Page 15: Google BigQuery

Pricing Model

Category Price Note

Storage Cost $0.020 per GB, per month

Query Cost $5 per TB 1st TB per month is free

Page 16: Google BigQuery

Best Practices

Page 17: Google BigQuery

Denormalize / Pre-Join Where Possible● Best performance● Only pay for the columns you need● Nested/repeated fields!

Relational Database Design Denormalized Nested/Repeated (JSON)

Page 18: Google BigQuery

Table Sharding● You pay for what you read

→ Read less, pay less● Table wildcards allow for easy reading over multiple tables

https://cloud.google.com/bigquery/query-reference#tablewildcardfunctions

Page 19: Google BigQuery

Optimize for Query vs. Storage CostsCommon Queries?

- Materialized views(save intermediate results in tables)with pre-aggregated data:

→ faster + cheaper queries

- Store data in multiple tables:- table for daily data- table for weekly data- table for monthly data

Page 20: Google BigQuery

Narrow the Table ScansYou only pay for the columns you read

Don’t use “SELECT *” !!!

Page 21: Google BigQuery

Table DecoratorsOnly way to avoid doing full table scans!

Allows undeleting tables

options:

● snapshot decorator + range decorator● relative value + absolute values

https://cloud.google.com/bigquery/table-decorators

https://cloud.google.com/bigquery/table-decorators

Page 22: Google BigQuery

Query optimizationsQuery Plan

https://cloud.google.com/bigquery/query-plan-explanation

Page 23: Google BigQuery

Big Data Reference Architecture

Page 24: Google BigQuery

Questions?You can reach me at:

- mail: [email protected] Twitter: @FsMatt