Download - Redshift VS BigQuery
Amazon RedshiftReleased on 2012 (beta)
based on ParAccel (PostgreSQL clone)
Designed for OLAP and BI applications
Relational and Columnar database
Petabyte to Exabyte scale (Spectrum)
Google BigQueryevolution of Dremel (2006)
Initially launched in 2010
Web Service on top of Dremel Technology
More of a hybrid system (columnar + nested data)
Petabyte scale
Amazon Redshift Google BigQueryBuild on top of a proven technology
Relational
SQL
Analysts
Build something from scratch
Nested data structures are a first class citizen
NoSQL
Developers
VS
Amazon Redshift Google BigQueryS3
Kinesis
CSV, Avro, JSON
Google Cloud Storage
Streaming Inserts
Google Analytics Premium
CSV, Avro, JSON
VS
Data Types
Redshift: Closer to the Standard SQL data types (e.g. INT4, INT8) but does not support the full range of PostgreSQL data types
BigQuery: Smaller set of data types supported. But...
Data Types
Redshift: Very basic support for JSON
BigQuery: Support for Array and STRUCT types. Nested data structures are first class citizens.
Data ManipulationBigQuery used to be append only, now it supports Updates and Deletes (DML). But still limited.
Redshift always had this Supported via SQL but with a catch (Vacuum)
Table Manipulation
BigQuery: Limited and expensive via standard SQL, or via HTTP API (but you have to unload and reload the table).
Redshift: Supported via SQL
Both support views but not materialized
Data Consistency
Redshift supports transactions VS BigQuery No Deduplication harder to be achieved on BigQuery (costly also).Even more complex when we go streaming.
Data Consistency
Kinesis: At least once semantics
BigQuery: best-effort deduplication time window + insertId.
Cluster ManagementHere is where BigQuery really shines. It is fully managed with support for HA.
Redshift does not abstract completely the hardware from the user and it is difficult to implement it as a HA service.
This changes with Spectrum.
Connectivity
Redshift: API + Full JDBC/ODBC support. & access to all the standard PostgreSQL tools.
BigQuery: Mainly through the REST API, JDBC/ODBC drivers only for queries.
Amazon Redshift Google BigQueryResources capped by your cluster size
No quotas related to inserts/updates etc
2,000 slots per account
Encourages the append only model with strict DML quotas
Both have a limit of 50 concurrent QueriesCluster resizing a pain with Redshift.
Optimization
Redshift: Distribution Keys (Partitioning), Sort Keys, Column Compression
BigQuery: You don’t have to worry about all these, but it allows you to define time based partitioning.
Optimization
Redshift: Distribution Keys (Partitioning), Sort Keys, Column Compression
With great power comes great responsibility: Vacuuming
Optimization
With BigQuery you can only optimize your queries on the statement level and partition over time.
Costs
BigQuery: more difficult to estimate costs but costs scale based on usage. $5 / TB (on demand)
Redshift: you know exactly how much you will pay but you pay regardless of usage.
Performance
Benchmark by PanoplyDebunking Misleading Benchmarks Of Redshift vs BigQueryHN DiscussionBenchmark by Periscope Data
Amazon Redshift Google BigQueryMore predictable costs
More intuitive data modeling (Analysts)
Options for optimizations
Easier & cheaper to start with
Good for nested data
Easier to work with time series
VS