redshift vs bigquery

40
BigQuery Redshift VS

Upload: kostas-pardalis

Post on 23-Jan-2018

363 views

Category:

Data & Analytics


0 download

TRANSCRIPT

BigQueryRedshiftVS

Amazon RedshiftReleased on 2012 (beta)

based on ParAccel (PostgreSQL clone)

Designed for OLAP and BI applications

Relational and Columnar database

Petabyte to Exabyte scale (Spectrum)

Google BigQueryevolution of Dremel (2006)

Initially launched in 2010

Web Service on top of Dremel Technology

More of a hybrid system (columnar + nested data)

Petabyte scale

Amazon Redshift Google BigQueryBuild on top of a proven technology

Relational

SQL

Analysts

Build something from scratch

Nested data structures are a first class citizen

NoSQL

Developers

VS

Loading Data

Amazon Redshift Google BigQueryS3

Kinesis

CSV, Avro, JSON

Google Cloud Storage

Streaming Inserts

Google Analytics Premium

CSV, Avro, JSON

VS

Data Modeling

Amazon Redshift Google BigQuerySchemas

Tables

Datasets

Tables

VS

Data Types

Data Types

Redshift: Closer to the Standard SQL data types (e.g. INT4, INT8) but does not support the full range of PostgreSQL data types

BigQuery: Smaller set of data types supported. But...

Data Types

Redshift: Very basic support for JSON

BigQuery: Support for Array and STRUCT types. Nested data structures are first class citizens.

Working with Data

Data ManipulationBigQuery used to be append only, now it supports Updates and Deletes (DML). But still limited.

Redshift always had this Supported via SQL but with a catch (Vacuum)

Table Manipulation

BigQuery: Limited and expensive via standard SQL, or via HTTP API (but you have to unload and reload the table).

Redshift: Supported via SQL

Both support views but not materialized

Data Consistency

Data Consistency

Redshift supports transactions VS BigQuery No Deduplication harder to be achieved on BigQuery (costly also).Even more complex when we go streaming.

Data Consistency

Kinesis: At least once semantics

BigQuery: best-effort deduplication time window + insertId.

Cluster Management

Cluster ManagementHere is where BigQuery really shines. It is fully managed with support for HA.

Redshift does not abstract completely the hardware from the user and it is difficult to implement it as a HA service.

This changes with Spectrum.

Cluster Management Connectivity

Connectivity

Redshift: API + Full JDBC/ODBC support. & access to all the standard PostgreSQL tools.

BigQuery: Mainly through the REST API, JDBC/ODBC drivers only for queries.

Authentication

Redshift: AWS IAM

BigQuery: OAuth

Cluster Management Quotas

Amazon Redshift Google BigQueryResources capped by your cluster size

No quotas related to inserts/updates etc

2,000 slots per account

Encourages the append only model with strict DML quotas

Both have a limit of 50 concurrent QueriesCluster resizing a pain with Redshift.

Cluster Management Optimization

Optimization

Redshift: Distribution Keys (Partitioning), Sort Keys, Column Compression

BigQuery: You don’t have to worry about all these, but it allows you to define time based partitioning.

Optimization

Redshift: Distribution Keys (Partitioning), Sort Keys, Column Compression

With great power comes great responsibility: Vacuuming

Optimization

With BigQuery you can only optimize your queries on the statement level and partition over time.

Cluster Management Costs

Costs

BigQuery: more difficult to estimate costs but costs scale based on usage. $5 / TB (on demand)

Redshift: you know exactly how much you will pay but you pay regardless of usage.

Ecosystem

Performance

BigQueryRedshiftOR

??

Amazon Redshift Google BigQueryVS

similarities are greater than the differences

Amazon Redshift Google BigQueryMore predictable costs

More intuitive data modeling (Analysts)

Options for optimizations

Easier & cheaper to start with

Good for nested data

Easier to work with time series

VS

[email protected]

www.blendo.co

Get the free Amazon Redshift guide http://bit.ly/redshift-guide