Amazon Redshift to BigQuery SQL translation reference

Posted on 17-Nov-2021

Contents

● About this SQL translation reference
● Data types
   ● Implicit conversion types
   ● Explicit conversion types
● Query syntax
   ● SELECT statement
   ● FROM clause
   ● JOIN types
   ● WITH clause
   ● Set operators
   ● ORDER BY clause
   ● Conditions
   ● Functions
      ● Aggregate functions
      ● Bitwise aggregate functions
      ● Window functions
      ● Conditional expressions
      ● Date and time functions
      ● Mathematical operators
      ● Math functions
      ● String functions
      ● Data type formatting functions
● DML syntax
   ● INSERT statement
   ● COPY statement
   ● UPDATE statement
   ● DELETE, TRUNCATE statements
   ● MERGE statement
      ● Merge operation by replacing existing rows
● DDL syntax
   ● SELECT INTO statement
   ● CREATE TABLE statement
      ● Temporary tables
   ● CREATE VIEW statement
● User-defined functions (UDFs)
   ● CREATE FUNCTION syntax
   ● DROP FUNCTION syntax
   ● UDF components
● Metadata and transaction SQL statements
   ● Multi-statement and multi-line SQL statements
● Procedural SQL statements
   ● CREATE PROCEDURE statement
   ● Variable declaration and assignment
   ● Error condition handlers
   ● Cursor declarations and operations
   ● Dynamic SQL statements
   ● Flow-of-control statements
● Consistency guarantees and transaction isolation
   ● Transactions
   ● Rollback
● Database limits

About this SQL translation reference

This document details the similarities and differences in SQL syntax between Amazon Redshift and BigQuery, to help accelerate the planning and execution of moving your enterprise data warehouse (EDW) to BigQuery. Redshift data warehousing is designed to work with Redshift-specific SQL syntax, so scripts written for Redshift might need to be altered before you can use them in BigQuery, because the SQL dialects vary between the services.

Note: In some cases, there is no direct mapping between a SQL element in Redshift and BigQuery. However, in most cases, you can achieve the same functionality in BigQuery that you can in Redshift using alternative means, as shown in the examples in this document. 

 This document is part of a series that discusses migrating data from Redshift to BigQuery. It is a companion to the following document: 

● Amazon Redshift to BigQuery migration guide  

Highlights

Purpose: To detail common similarities and differences in SQL syntax between Redshift and BigQuery, and to help accelerate the planning and execution of moving your enterprise data warehouse (EDW) to BigQuery.

Intended audience: Enterprise architects, DBAs, application developers, and IT security.

Key assumptions: The audience is familiar with Redshift and is looking for guidance on transitioning to BigQuery.

 

Data types

This section shows equivalents between data types in Redshift and in BigQuery.

Redshift data type | Redshift alias | BigQuery | Notes
SMALLINT | INT2 | INT64 | Redshift's SMALLINT is 2 bytes, whereas BigQuery's INT64 is 8 bytes.
INTEGER | INT, INT4 | INT64 | Redshift's INTEGER is 4 bytes, whereas BigQuery's INT64 is 8 bytes.
BIGINT | INT8 | INT64 | Both Redshift's BIGINT and BigQuery's INT64 are 8 bytes.
DECIMAL | NUMERIC | NUMERIC |
REAL | FLOAT4 | FLOAT64 | Redshift's REAL is 4 bytes, whereas BigQuery's FLOAT64 is 8 bytes.
DOUBLE PRECISION | FLOAT8, FLOAT | FLOAT64 |
BOOLEAN | BOOL | BOOL | Redshift's BOOLEAN can use TRUE, t, true, y, yes, and 1 as valid literal values for true. BigQuery's BOOL data type uses case-insensitive TRUE.
CHAR | CHARACTER, NCHAR, BPCHAR | STRING |
VARCHAR | CHARACTER VARYING, NVARCHAR, TEXT | STRING |
DATE | | DATE |
TIMESTAMP | TIMESTAMP WITHOUT TIME ZONE | DATETIME |
TIMESTAMPTZ | TIMESTAMP WITH TIME ZONE | TIMESTAMP | In BigQuery, time zones are used when parsing timestamps or formatting timestamps for display. A string-formatted timestamp might include a time zone, but when BigQuery parses the string, it stores the timestamp in the equivalent UTC time. When a time zone is not explicitly specified, the default time zone, UTC, is used. Time zone names or offsets from UTC using (-|+)HH:MM are supported, but time zone abbreviations such as PDT are not supported.
GEOMETRY | | GEOGRAPHY | Support for querying geospatial data.

BigQuery also has the following data types that do not have a direct Redshift analog:

● ARRAY
● BYTES
● TIME
● STRUCT
Implicit conversion types 

When migrating to BigQuery, you need to convert most of your Redshift implicit conversions to BigQuery's explicit conversions. BigQuery performs implicit conversions only for the following data types:

From BigQuery type | To BigQuery type
INT64 | FLOAT64
INT64 | NUMERIC
NUMERIC | FLOAT64

BigQuery also performs implicit conversions for the following literals:

From BigQuery type | To BigQuery type
STRING literal (e.g. "2008-12-25") | DATE
STRING literal (e.g. "2008-12-25 15:30:00") | TIMESTAMP
STRING literal (e.g. "2008-12-25T07:30:00") | DATETIME
STRING literal (e.g. "15:30:00") | TIME

Explicit conversion types 

You can convert Redshift data types that BigQuery doesn’t implicitly convert using BigQuery’s CAST(expression AS type) function or any of the DATE and TIMESTAMP conversion functions.  When migrating your queries, change any occurrences of the Redshift CONVERT(type, expression) function (or the :: syntax) to BigQuery’s CAST(expression AS type) function, as shown in the table in the Data type formatting functions section.  
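As a concrete illustration, here is one cast translated each way. The `sales` table and `price_text` column are hypothetical, and BigQuery's SAFE_CAST is shown as an option for tolerating bad input:

```sql
-- Redshift: CONVERT function or :: shorthand
SELECT CONVERT(DECIMAL(10,2), price_text) AS price,
       price_text::DECIMAL(10,2)          AS price_alt
FROM sales;

-- BigQuery: CAST (or SAFE_CAST, which returns NULL instead of
-- raising an error when the value cannot be converted)
SELECT CAST(price_text AS NUMERIC)      AS price,
       SAFE_CAST(price_text AS NUMERIC) AS price_safe
FROM sales;
```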

Query syntax

This section addresses differences in query syntax between Redshift and BigQuery.

  


 

SELECT statement 

Most Redshift SELECT statements are compatible with BigQuery. The following table contains a list of minor differences.   

Redshift  BigQuery 

SELECT TOP number expression FROM table 

SELECT expression FROM table ORDER BY expression DESC LIMIT number 

SELECT  x/total AS probability,   ROUND(100 * probability, 1) AS pct FROM raw_data  Note: Redshift supports creating and referencing an alias in the same SELECT statement. 

SELECT  x/total AS probability,   ROUND(100 * (x/total), 1) AS pct FROM raw_data 

 BigQuery also supports the following expressions in SELECT statements, which do not have a Redshift equivalent: 

● EXCEPT ● REPLACE 

FROM clause 

A FROM clause in a query lists the table references that data is selected from. In Redshift, possible table references include tables, views, and subqueries. All of these table references are supported in BigQuery.   BigQuery tables can be referenced in the FROM clause using the following: 

● [project_id].[dataset_id].[table_name] ● [dataset_id].[table_name] ● [table_name] 

 BigQuery also supports additional table references: 

● Historical versions of the table definition and rows using FOR SYSTEM_TIME AS OF. ● Field paths, or any path that resolves to a field within a data type (such as a STRUCT). ● Flattened arrays. 
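A sketch of the BigQuery-only table references listed above; the dataset, table, and column names are hypothetical:

```sql
-- Historical version of a table (time travel)
SELECT *
FROM mydataset.orders
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);

-- Flattening an ARRAY column into one row per element
SELECT o.order_id, item
FROM mydataset.orders AS o, UNNEST(o.items) AS item;
```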

 

  


 

JOIN types 

Both Redshift and BigQuery support the following types of join: 

● [INNER] JOIN ● LEFT [OUTER] JOIN ● RIGHT [OUTER] JOIN ● FULL [OUTER] JOIN ● CROSS JOIN and the equivalent implicit comma cross join 

 The following table contains a list of minor differences.   

Redshift  BigQuery 

SELECT col1  FROM table1 NATURAL INNER JOIN table2 

SELECT col1 FROM table1 INNER JOIN table2 USING (col1, col2 [, ...])  Note: In BigQuery, JOIN clauses require a JOIN condition unless the clause is a CROSS JOIN or one of the joined tables is a field within a data type or an array. 

 

WITH clause 

A BigQuery WITH clause contains one or more named subqueries that execute each time a subsequent SELECT statement references them. Redshift WITH clauses behave the same as BigQuery's, except that in Redshift the clause can be evaluated once and its results reused.
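The WITH syntax itself translates unchanged; a minimal sketch with a hypothetical dataset and table:

```sql
WITH totals AS (
  SELECT customer_id, SUM(amount) AS total
  FROM mydataset.orders
  GROUP BY customer_id
)
SELECT customer_id
FROM totals
WHERE total > 100;
-- Because BigQuery may re-execute the named subquery at each
-- reference, a heavily reused WITH clause can instead be
-- materialized into a temporary table before querying.
```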

Set operators 

There are some minor differences between Redshift set operators and BigQuery set operators. However, all set operations that are feasible in Redshift are replicable in BigQuery.   

Redshift  BigQuery 

SELECT * FROM table1 UNION SELECT * FROM table2 

SELECT * FROM table1 UNION DISTINCT SELECT * FROM table2  Note: Both BigQuery and Redshift support the UNION ALL operator. 

SELECT * FROM table1  INTERSECT SELECT * FROM table2 

SELECT * FROM table1 INTERSECT DISTINCT SELECT * FROM table2 

  


 

SELECT * FROM table1 EXCEPT SELECT * FROM table2 

SELECT * FROM table1 EXCEPT DISTINCT SELECT * FROM table2 

SELECT * FROM table1 MINUS SELECT * FROM table2 

SELECT * FROM table1 EXCEPT DISTINCT SELECT * FROM table2 

SELECT * FROM table1 UNION SELECT * FROM table2 EXCEPT SELECT * FROM table3 

SELECT * FROM table1 UNION ALL (  SELECT * FROM table2  EXCEPT  SELECT * FROM table3 )  Note: BigQuery requires parentheses to separate different set operations. If the same set operator is repeated, parentheses are not necessary. 

 

ORDER BY clause 

There are some minor differences between Redshift ORDER BY clauses and BigQuery ORDER BY clauses.   

Redshift  BigQuery 

In Redshift, NULLS are ranked last by default (ascending order). 

In BigQuery, NULLS are ranked first by default (ascending order). 

SELECT * FROM table ORDER BY expression LIMIT ALL 

SELECT * FROM table ORDER BY expression  Note: BigQuery does not use the LIMIT ALL syntax, but ORDER BY sorts all rows by default, resulting in the same behavior as Redshift’s LIMIT ALL clause. We highly recommend including a LIMIT clause with every ORDER BY clause. Ordering all result rows unnecessarily degrades query execution performance. 

SELECT * FROM table ORDER BY expression OFFSET 10 

SELECT * FROM table ORDER BY expression LIMIT count OFFSET 10  Note: In BigQuery, OFFSET must be used together with a LIMIT count. Make sure to set the count INT64 value to the 

  


 

minimum necessary ordered rows. Ordering all result rows unnecessarily degrades query execution performance. 
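Because the default NULL placement differs, the safest translation states it explicitly; BigQuery's ORDER BY accepts NULLS FIRST and NULLS LAST modifiers. A sketch with hypothetical table and column names:

```sql
-- Reproduce Redshift's default (NULLs last in ascending order):
SELECT *
FROM mydataset.t
ORDER BY col ASC NULLS LAST
LIMIT 1000;  -- bound the sort, per the recommendation above
```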

Conditions 

The following table shows Redshift conditions, or predicates, that are specific to Redshift and must be converted to their BigQuery equivalent.  

Redshift  BigQuery 

a = ANY (subquery), a = SOME (subquery) | a IN (subquery)

a <> ALL (subquery), a != ALL (subquery) | a NOT IN (subquery)

a IS UNKNOWN | a IS NULL

expression ILIKE pattern  LOWER(expression) LIKE LOWER(pattern) 

expression LIKE pattern ESCAPE 'escape_char' 

expression LIKE pattern  Note: BigQuery does not support custom escape characters. You must use two backslashes \\ as escape characters for BigQuery. 

expression [NOT] SIMILAR TO pattern 

IF(
  LENGTH(
    REGEXP_REPLACE(
      expression,
      pattern,
      ''
    )
  ) = 0,
  True,
  False
)

Note: If NOT is specified, wrap the above IF expression in a NOT expression: NOT(IF(LENGTH(...)

expression [!] ~ pattern  [NOT] REGEXP_CONTAINS(  expression,  regex ) 
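Two of the predicate rewrites above, spelled out as full queries against a hypothetical table `t` with a column `name`:

```sql
-- Redshift: WHERE name ILIKE 'ab%'
SELECT * FROM mydataset.t
WHERE LOWER(name) LIKE LOWER('ab%');

-- Redshift: WHERE name ~ '^[A-Z]'
SELECT * FROM mydataset.t
WHERE REGEXP_CONTAINS(name, r'^[A-Z]');
```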

 

  


 

Functions 

The following sections list Redshift functions and their BigQuery equivalents.  

Aggregate functions

The following table shows mappings between common Redshift aggregate, aggregate analytic, and approximate aggregate functions and their BigQuery equivalents.

 

Redshift  BigQuery 

APPROXIMATE COUNT(DISTINCT expression) APPROX_COUNT_DISTINCT(expression) 

APPROXIMATE PERCENTILE_DISC(  percentile ) WITHIN GROUP (ORDER BY expression) 

APPROX_QUANTILES(expression, 100)[OFFSET(CAST(TRUNC(percentile * 100) as INT64))] 

AVG([DISTINCT] expression)  AVG([DISTINCT] expression) 

COUNT(expression)  COUNT(expression) 

LISTAGG(  [DISTINCT] aggregate_expression  [, delimiter] ) [WITHIN GROUP (ORDER BY order_list)] 

STRING_AGG(  [DISTINCT] aggregate_expression  [, delimiter]  [ORDER BY order_list] )  

MAX(expression)  MAX(expression) 

MEDIAN(median_expression)  PERCENTILE_CONT(  median_expression,  0.5 ) OVER() 

MIN(expression)  MIN(expression) 

PERCENTILE_CONT(  percentile ) WITHIN GROUP (ORDER BY expression) 

PERCENTILE_CONT(  median_expression,  percentile ) OVER()  Note: Does not cover aggregation use cases. 

STDDEV([DISTINCT] expression)  STDDEV([DISTINCT] expression) 

STDDEV_SAMP([DISTINCT] expression)  STDDEV_SAMP([DISTINCT] expression) 

STDDEV_POP([DISTINCT] expression)  STDDEV_POP([DISTINCT] expression) 

SUM([DISTINCT] expression)  SUM([DISTINCT] expression) 

VARIANCE([DISTINCT] expression)  VARIANCE([DISTINCT] expression) 

VAR_SAMP([DISTINCT] expression)  VAR_SAMP([DISTINCT] expression) 

VAR_POP([DISTINCT] expression)  VAR_POP([DISTINCT] expression) 

  


 

 BigQuery also offers the following aggregate, aggregate analytic, and approximate aggregate functions, which do not have a direct analogue in Redshift: 

● ANY_VALUE ● APPROX_TOP_COUNT ● APPROX_TOP_SUM ● ARRAY_AGG ● ARRAY_CONCAT_AGG ● COUNTIF ● CORR ● COVAR_POP ● COVAR_SAMP 
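As one worked example from the table above, the LISTAGG-to-STRING_AGG rewrite moves the ordering inside the function call. The `emp` table and its columns are hypothetical:

```sql
-- Redshift
SELECT dept,
       LISTAGG(name, ', ') WITHIN GROUP (ORDER BY name) AS names
FROM emp
GROUP BY dept;

-- BigQuery
SELECT dept,
       STRING_AGG(name, ', ' ORDER BY name) AS names
FROM mydataset.emp
GROUP BY dept;
```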

 

Bitwise aggregate functions

The following table shows mappings between common Redshift bitwise aggregate functions and their BigQuery equivalents.

Redshift  BigQuery 

BIT_AND(expression)  BIT_AND(expression) 

BIT_OR(expression)  BIT_OR(expression) 

BOOL_AND(expression)  LOGICAL_AND(expression) 

BOOL_OR(expression)  LOGICAL_OR(expression) 

 BigQuery also offers the following bit-wise aggregate function, which does not have a direct analogue in Redshift: 

● BIT_XOR  

Window functions

The following table shows mappings between common Redshift window functions and their BigQuery equivalents. Windowing functions in BigQuery include analytic aggregate functions, aggregate functions, navigation functions, and numbering functions.

Redshift  BigQuery 

AVG(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list  frame_clause] ) 

AVG(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list]  [frame_clause] ) 

  


 

COUNT(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list  frame_clause] ) 

COUNT(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list]  [frame_clause] ) 

CUME_DIST() OVER (  [PARTITION BY partition_expression]  [ORDER BY order_list] ) 

CUME_DIST() OVER (  [PARTITION BY partition_expression]  ORDER BY order_list ) 

DENSE_RANK() OVER (  [PARTITION BY expr_list]  [ORDER BY order_list] ) 

DENSE_RANK() OVER (  [PARTITION BY expr_list]  ORDER BY order_list ) 

FIRST_VALUE(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list  frame_clause] ) 

FIRST_VALUE(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list]  [frame_clause] ) 

LAST_VALUE(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list  frame_clause] ) 

LAST_VALUE(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list]  [frame_clause] ) 

LAG(value_expr [, offset]) OVER (  [PARTITION BY window_partition]  ORDER BY window_ordering ) 

LAG(value_expr [, offset]) OVER (  [PARTITION BY window_partition]  ORDER BY window_ordering ) 

LEAD(value_expr [, offset]) OVER (  [PARTITION BY window_partition]  ORDER BY window_ordering ) 

LEAD(value_expr [, offset]) OVER (  [PARTITION BY window_partition]  ORDER BY window_ordering ) 

LISTAGG(  [DISTINCT] expression  [, delimiter] 

STRING_AGG(  [DISTINCT] aggregate_expression  [, delimiter] 

  


 

) [WITHIN GROUP (ORDER BY order_list)] OVER (  [PARTITION BY partition_expression] ) 

) OVER  (  [PARTITION BY partition_list]  [ORDER BY order_list] ) 

MAX(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list  frame_clause] ) 

MAX(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list]  [frame_clause] ) 

MEDIAN(median_expression) OVER (  [PARTITION BY partition_expression] ) 

PERCENTILE_CONT(  median_expression,  0.5 ) OVER (  [PARTITION BY partition_expression] ) 

MIN(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list  frame_clause] ) 

MIN(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list]  [frame_clause] ) 

NTH_VALUE(expression, offset) OVER (  [PARTITION BY window_partition]  [ORDER BY window_ordering   frame_clause] ) 

NTH_VALUE(expression, offset) OVER (  [PARTITION BY window_partition]  ORDER BY window_ordering  [frame_clause] ) 

NTILE(expr) OVER (  [PARTITION BY expression_list]  [ORDER BY order_list]  ) 

NTILE(expr) OVER (  [PARTITION BY expression_list]  ORDER BY order_list ) 

PERCENT_RANK() OVER (  [PARTITION BY partition_expression]  [ORDER BY order_list]  ) 

PERCENT_RANK() OVER (  [PARTITION BY partition_expression]  ORDER BY order_list ) 

  


 

PERCENTILE_CONT(percentile) WITHIN GROUP (ORDER BY expr) OVER (  [PARTITION BY expr_list] ) 

PERCENTILE_CONT(expr, percentile) OVER (  [PARTITION BY expr_list] ) 

PERCENTILE_DISC(percentile) WITHIN GROUP (ORDER BY expr) OVER (  [PARTITION BY expr_list] ) 

PERCENTILE_DISC(expr, percentile) OVER (  [PARTITION BY expr_list] ) 

RANK() OVER (  [PARTITION BY expr_list]  [ORDER BY order_list] ) 

RANK() OVER (  [PARTITION BY expr_list]  ORDER BY order_list ) 

RATIO_TO_REPORT(ratio_expression) OVER (  [PARTITION BY partition_expression] ) 

ratio_expression/SUM(ratio_expression) OVER (  [PARTITION BY partition_expression] ) 

ROW_NUMBER() OVER (  [PARTITION BY expr_list]  [ORDER BY order_list] ) 

ROW_NUMBER() OVER (  [PARTITION BY expr_list]  [ORDER BY order_list] ) 

STDDEV(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list  frame_clause] ) 

STDDEV(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list]  [frame_clause]  ) 

STDDEV_SAMP(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list  frame_clause] ) 

STDDEV_SAMP(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list]  [frame_clause]  ) 

  


 

STDDEV_POP(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list  frame_clause] ) 

STDDEV_POP(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list]  [frame_clause]  ) 

SUM(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list  frame_clause] ) 

SUM(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list]  [frame_clause]  ) 

VAR_SAMP(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list  frame_clause] ) 

VAR_SAMP(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list]  [frame_clause]  )  

VAR_POP(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list  frame_clause] ) 

VAR_POP(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list]  [frame_clause]  ) 

VARIANCE(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list  frame_clause] ) 

VARIANCE(expression) OVER (  [PARTITION BY expr_list]  [ORDER BY order_list]  [frame_clause]  ) 

 

Conditional expressions

The following table shows mappings between common Redshift conditional expressions and their BigQuery equivalents.

Redshift  BigQuery 

CASE expression  WHEN value THEN result  [WHEN...]  [ELSE else_result]  END 

CASE expression  WHEN value THEN result  [WHEN...]  [ELSE else_result]  END 

COALESCE(expression1[, ...])  COALESCE(expression1[, ...]) 

DECODE(
  expression,
  search1, result1
  [, search2, result2...]
  [, default]
)

CASE expression
  WHEN value1 THEN result1
  [WHEN value2 THEN result2]
  [ELSE default]
END

GREATEST(value [, ...])  GREATEST(value [, ...]) 

LEAST(value [, ...])  LEAST(value [, ...]) 

NVL(expression1[, ...])  COALESCE(expression1[, ...]) 

NVL2(  expression,  not_null_return_value,  null_return_value ) 

IF(  expression IS NULL,  null_return_value,  not_null_return_value ) 

NULLIF(expression1, expression2)  NULLIF(expression1, expression2) 

 BigQuery also offers the following conditional expressions, which do not have a direct analogue in Redshift: 

● IF ● IFNULL 
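Note that the NVL2-to-IF rewrite above swaps the order of the two return values, because the IF condition tests `expression IS NULL`. An example with a hypothetical `customers` table:

```sql
-- Redshift: second argument is returned when phone is NOT NULL
SELECT NVL2(phone, 'has phone', 'no phone') AS status
FROM customers;

-- BigQuery: the NULL branch comes first because the condition is IS NULL
SELECT IF(phone IS NULL, 'no phone', 'has phone') AS status
FROM mydataset.customers;
```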

 

Date and time functions

The following table shows mappings between common Redshift date and time functions and their BigQuery equivalents. BigQuery date and time functions include date functions, datetime functions, time functions, and timestamp functions.

Keep in mind that functions that seem identical in Redshift and BigQuery might return different data types.

Redshift  BigQuery 

ADD_MONTHS(date, integer)  CAST(  DATE_ADD(  date,  INTERVAL integer MONTH  ) AS TIMESTAMP ) 

  


 

timestamptz_or_timestamp AT TIME ZONE timezone 

PARSE_TIMESTAMP(  "%c%z",  FORMAT_TIMESTAMP(  "%c%z",  timestamptz_or_timestamp,  timezone  ) )  Note: Time zones are used when parsing timestamps or formatting timestamps for display. A string-formatted timestamp might include a time zone, but when BigQuery parses the string, it stores the timestamp in the equivalent UTC time. When a time zone is not explicitly specified, the default time zone, UTC, is used. Time zone names or offset from UTC (-HH:MM) are supported, but time zone abbreviations (such as PDT) are not supported.  

CONVERT_TIMEZONE(  [source_timezone],  target_timezone,  timestamp ) 

PARSE_TIMESTAMP(  "%c%z",  FORMAT_TIMESTAMP(  "%c%z",  timestamp,  target_timezone  ) )  Note: source_timezone is UTC in BigQuery. 

CURRENT_DATE  Note: Returns start date for the current transaction in the current session time zone (UTC by default). 

CURRENT_DATE()  Note: Returns start date for the current statement in UTC time zone. 

DATE_CMP(date1, date2)  CASE  WHEN date1 = date2 THEN 0  WHEN date1 > date2 THEN 1  ELSE -1  END 

  


 

DATE_CMP_TIMESTAMP(date1, date2)  CASE  WHEN date1 = CAST(date2 AS DATE)    THEN 0  WHEN date1 > CAST(date2 AS DATE)  THEN 1  ELSE -1  END 

DATE_CMP_TIMESTAMPTZ(date, timestamptz)  CASE WHEN date > DATE(timestamptz) THEN 1 WHEN date < DATE(timestamptz) THEN -1 ELSE 0  END 

DATE_PART_YEAR(date)  EXTRACT(YEAR FROM date) 

DATEADD(date_part, interval, date)  CAST(  DATE_ADD(  date,  INTERVAL interval datepart  ) AS TIMESTAMP ) 

DATEDIFF(  date_part,  date_expression1,  date_expression2 ) 

DATE_DIFF(  date_expression1,  date_expression2,  date_part ) 

DATE_PART(date_part, date)  EXTRACT(date_part FROM date) 

DATE_TRUNC('date_part', timestamp)  TIMESTAMP_TRUNC(timestamp, date_part) 

EXTRACT(date_part FROM timestamp)  EXTRACT(date_part FROM timestamp) 

GETDATE()  PARSE_TIMESTAMP(  "%c",  FORMAT_TIMESTAMP(  "%c",  CURRENT_TIMESTAMP()  ) ) 

INTERVAL_CMP(interval_literal1, interval_literal2) 

For intervals in Redshift, there are 360 days in a year. In BigQuery, you can use the following UDF to parse a Redshift interval and translate it to seconds:

CREATE TEMP FUNCTION parse_interval(interval_literal STRING) AS (
  (SELECT SUM(CASE
       WHEN unit IN ('minutes', 'minute', 'm') THEN num * 60
       WHEN unit IN ('hours', 'hour', 'h') THEN num * 60 * 60
       WHEN unit IN ('days', 'day', 'd') THEN num * 60 * 60 * 24
       WHEN unit IN ('weeks', 'week', 'w') THEN num * 60 * 60 * 24 * 7
       WHEN unit IN ('months', 'month') THEN num * 60 * 60 * 24 * 30
       WHEN unit IN ('years', 'year') THEN num * 60 * 60 * 24 * 360
       ELSE num END)
   FROM (
     SELECT
       CAST(REGEXP_EXTRACT(value, r'^[0-9]*\.?[0-9]+') AS NUMERIC) num,
       SUBSTR(value, LENGTH(REGEXP_EXTRACT(value, r'^[0-9]*\.?[0-9]+')) + 1) unit
     FROM UNNEST(SPLIT(REPLACE(interval_literal, ' ', ''), ',')) value
   ))
);

To compare interval literals, perform:

IF(
  parse_interval(interval_literal1) > parse_interval(interval_literal2),
  1,
  IF(
    parse_interval(interval_literal1) < parse_interval(interval_literal2),
    -1,
    0
  )
)

LAST_DAY(date)  DATE_SUB(  DATE_ADD(  DATE_TRUNC(date, MONTH),  INTERVAL 1 MONTH  ),  INTERVAL 1 DAY ) 

  


 

MONTHS_BETWEEN(date1, date2)  DATE_DIFF(  date1,  date2,  MONTH ) 

NEXT_DAY(date, day)  DATE_ADD(  DATE_TRUNC(  date,  WEEK(day)  ),  INTERVAL 1 WEEK ) 

SYSDATE  Note: Returns start timestamp for the current transaction in the current session time zone (UTC by default). 

CURRENT_TIMESTAMP()  Note: Returns start timestamp for the current statement in UTC time zone.  

TIMEOFDAY()  FORMAT_TIMESTAMP("%a %b %d %H:%M:%E6S %E4Y %Z", CURRENT_TIMESTAMP()) 

TIMESTAMP_CMP(  timestamp1,  timestamp2 ) 

CASE  WHEN timestamp1 = timestamp2  THEN 0  WHEN timestamp1 > timestamp2  THEN 1  ELSE -1  END 

TIMESTAMP_CMP_DATE(  timestamp,  date ) 

CASE  WHEN EXTRACT(DATE FROM timestamp) = date THEN 0  WHEN EXTRACT(DATE FROM timestamp) > date  THEN 1  ELSE -1  END 

TIMESTAMP_CMP_TIMESTAMPTZ(  timestamp,  timestamptz )  Note: Redshift compares timestamps in the user session defined time zone. Default user session time zone is UTC. 

CASE  WHEN timestamp = timestamptz  THEN 0  WHEN timestamp > timestamptz  THEN 1  ELSE -1  END  Note: BigQuery compares timestamps in UTC time zone. 

TIMESTAMPTZ_CMP(  CASE 

  


 

timestamptz1,  timestamptz2 )  Note: Redshift compares timestamps in the user session defined time zone. Default user session time zone is UTC. 

WHEN timestamptz1 = timestamptz2  THEN 0  WHEN timestamptz1 > timestamptz2  THEN 1  ELSE -1  END  Note: BigQuery compares timestamps in UTC time zone. 

TIMESTAMPTZ_CMP_DATE(  timestamptz,  date )  Note: Redshift compares timestamps in the user session defined time zone. Default user session time zone is UTC. 

CASE  WHEN EXTRACT(DATE FROM timestamptz) = date THEN 0  WHEN EXTRACT(DATE FROM timestamptz) > date  THEN 1  ELSE -1  END  Note: BigQuery compares timestamps in UTC time zone. 

TIMESTAMPTZ_CMP_TIMESTAMP(  timestamptz,  Timestamp )  Note: Redshift compares timestamps in the user session defined time zone. Default user session time zone is UTC. 

CASE  WHEN timestamp = timestamptz  THEN 0  WHEN timestamp > timestamptz  THEN 1  ELSE -1  END  Note: BigQuery compares timestamps in UTC time zone. 

TIMEZONE(  timezone,  Timestamptz_or_timestamp ) 

PARSE_TIMESTAMP(  "%c%z",  FORMAT_TIMESTAMP(  "%c%z",  timestamptz_or_timestamp,  timezone  ) )  Note: Time zones are used when parsing timestamps or formatting timestamps for display. A string-formatted timestamp might include a time zone, but when BigQuery parses the string, it stores the timestamp in the equivalent UTC time. When a time zone is not explicitly specified, the default time zone, UTC, is used. Time zone names or offset from UTC (-HH:MM) are 

  


 

supported but time zone abbreviations (such as PDT) are not supported.  

TO_TIMESTAMP(timestamp, format) 

PARSE_TIMESTAMP(  format,  FORMAT_TIMESTAMP(  format,  timestamp  ) )  Note: BigQuery follows a different set of format elements. Time zones are used when parsing timestamps or formatting timestamps for display. A string-formatted timestamp might include a time zone, but when BigQuery parses the string, it stores the timestamp in the equivalent UTC time. When a time zone is not explicitly specified, the default time zone, UTC, is used. Time zone names or offset from UTC (-HH:MM) are supported in the format string but time zone abbreviations (such as PDT) are not supported.  

TRUNC(timestamp)  CAST(timestamp AS DATE) 

BigQuery also offers the following date and time functions, which do not have a direct analogue in Redshift:

● EXTRACT
● DATE
● DATE_SUB
● DATE_ADD (returning DATE data type)
● DATE_FROM_UNIX_DATE
● FORMAT_DATE
● PARSE_DATE
● UNIX_DATE
● DATETIME
● DATETIME_ADD
● DATETIME_SUB
● DATETIME_DIFF
● DATETIME_TRUNC
● FORMAT_DATETIME
● PARSE_DATETIME
● CURRENT_TIME
● TIME
● TIME_ADD
● TIME_SUB
● TIME_DIFF
● TIME_TRUNC
● FORMAT_TIME
● PARSE_TIME
● TIMESTAMP_SECONDS
● TIMESTAMP_MILLIS
● TIMESTAMP_MICROS
● UNIX_SECONDS
● UNIX_MILLIS
● UNIX_MICROS
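One subtlety when applying the DATEDIFF mapping in the table above: the date_part moves from the first argument to the last, and the operand order has the opposite sense. Redshift's DATEDIFF(part, a, b) counts boundaries from a to b, while BigQuery's DATE_DIFF(a, b, part) computes a minus b, so verify the sign after translating. A sketch:

```sql
-- Redshift: DATEDIFF(day, '2021-11-01', '2021-11-17') returns 16.
-- BigQuery: put the later date first to get the same positive result.
SELECT DATE_DIFF(DATE '2021-11-17', DATE '2021-11-01', DAY) AS days_between;  -- 16
```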

 

Mathematical operators

The following table shows mappings between common Redshift mathematical operators and their BigQuery equivalents.

  


 

 

Redshift  BigQuery 

X + Y  X + Y 

X - Y  X - Y 

X * Y  X * Y 

X / Y  Note: If the operator is performing integer division (in other words, if X and Y are both integers), an integer is returned. If the operator is performing non-integer division, a non-integer is returned. 

If integer division: CAST(FLOOR(X / Y) AS INT64)  If not integer division: CAST(X / Y AS INT64)  Note: Division in BigQuery returns a non-integer.  To prevent errors from a division operation (division by zero error), use SAFE_DIVIDE(X, Y) or IEEE_DIVIDE(X, Y). 

X % Y  MOD(X, Y)  Note: To prevent errors from a division operation (division by zero error), use SAFE.MOD(X, Y). SAFE.MOD(X, 0) results in NULL. 

X ^ Y  POW(X, Y)  POWER(X, Y)  Note: Unlike Redshift, the ^ operator in BigQuery performs Bitwise xor. 

|/ X  SQRT(X)  Note: To prevent errors from a square root operation (negative input), use SAFE.SQRT(X). Negative input with SAFE.SQRT(X) results in NULL. 

||/ X  SIGN(X) * POWER(ABS(X), 1/3)  Note: BigQuery’s POWER(X, Y) returns an error if X is a finite value less than 0 and Y is a noninteger. 

@ X  ABS(X) 

X << Y  X << Y  

  


 

Note: This operator returns 0 or a byte sequence of b'\x00' if the second operand Y is greater than or equal to the bit length of the first operand X (for example, 64 if X has the type INT64). This operator throws an error if Y is negative. 

X >> Y  X >> Y 

Note: Shifts the first operand X to the right. This operator does not do sign bit extension with a signed type (it fills vacant bits on the left with 0). This operator returns 0 or a byte sequence of b'\x00' if the second operand Y is greater than or equal to the bit length of the first operand X (for example, 64 if X has the type INT64). This operator throws an error if Y is negative. 

X & Y  X & Y 

X | Y  X | Y 

~X  ~X 

 BigQuery also offers the following mathematical operator, which does not have a direct analog in Redshift: 

● X ^ Y (Bitwise xor)  
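The division-safety notes above in practice, using constant operands only:

```sql
-- Plain division by zero raises an error in BigQuery:
-- SELECT 1 / 0;                 -- error: division by zero
SELECT SAFE_DIVIDE(1, 0) AS a,   -- NULL
       IEEE_DIVIDE(1, 0) AS b,   -- +inf (FLOAT64 semantics)
       CAST(FLOOR(7 / 2) AS INT64) AS c,  -- replicates Redshift's integer 7 / 2
       MOD(7, 2) AS d;           -- replicates Redshift's 7 % 2
```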

Math functions 

Redshift  BigQuery 

ABS(number)  ABS(number) 

ACOS(number)  ACOS(number) 

ASIN(number)  ASIN(number) 

ATAN(number)  ATAN(number) 

ATAN2(number1, number2)  ATAN2(number1, number2) 

CBRT(number)  POWER(number, 1/3) 

CEIL(number)  CEIL(number) 

CEILING(number)  CEILING(number) 

CHECKSUM(expression)  FARM_FINGERPRINT(expression) 

COS(number)  COS(number) 

COT(number)  1/TAN(number) 

DEGREES(number)  number*180/ACOS(-1) 

DEXP(number)  EXP(number) 

DLOG1(number)  LN(number) 

  


 

DLOG10(number)  LOG10(number) 

EXP(number)  EXP(number) 

FLOOR(number)  FLOOR(number) 

LN(number)  LN(number) 

LOG(number)  LOG10(number) 

MOD(number1, number2)  MOD(number1, number2) 

PI  ACOS(-1) 

POWER(expression1, expression2)  POWER(expression1, expression2) 

RADIANS(number)  ACOS(-1)*(number/180) 

RANDOM()  RAND() 

ROUND(number [, integer])  ROUND(number [, integer]) 

SIN(number)  SIN(number) 

SIGN(number)  SIGN(number) 

SQRT(number)  SQRT(number) 

TAN(number)  TAN(number) 

TO_HEX(number)  FORMAT('%x', number) 

TRUNC(number [, integer])  TRUNC(number [, integer]) 
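Several of the mappings above are identities rather than one-to-one function swaps: CBRT via POWER with a sign correction (see the ||/ operator earlier), COT via 1/TAN, and PI, DEGREES, and RADIANS built from ACOS(-1). A small Python sketch, using the math module as a neutral stand-in for both dialects (illustrative only), confirms the identities agree numerically:

```python
import math

# CBRT: POWER(number, 1/3) errors for negative input in BigQuery, so the
# guide uses SIGN(X) * POWER(ABS(X), 1/3).
def cbrt(x: float) -> float:
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

# COT(number) -> 1/TAN(number)
def cot(x: float) -> float:
    return 1.0 / math.tan(x)

# PI -> ACOS(-1); DEGREES and RADIANS reuse the same identity.
def degrees(x: float) -> float:
    return x * 180.0 / math.acos(-1)

def radians(x: float) -> float:
    return math.acos(-1) * (x / 180.0)

print(round(cbrt(-27.0), 9))   # -3.0
print(round(degrees(math.pi), 9))  # 180.0
```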

 

String functions 

Redshift  BigQuery 

string1 || string2  CONCAT(string1, string2) 

BPCHARCMP(string1, string2)  CASE   WHEN string1 = string2 THEN 0  WHEN string1 > string2 THEN 1  ELSE -1  END 

BTRIM(string [, matching_string])  TRIM(string [, matching_string]) 

BTTEXT_PATTERN_CMP(string1, string2)  CASE   WHEN string1 = string2 THEN 0  WHEN string1 > string2 THEN 1  ELSE -1  END 

CHAR_LENGTH(expression)  CHAR_LENGTH(expression) 

CHARACTER_LENGTH(expression)  CHARACTER_LENGTH(expression) 

CHARINDEX(substring, string)  STRPOS(string, substring) 

CHR(number)  CODE_POINTS_TO_STRING([number]) 

  


 

CONCAT(string1, string2)  CONCAT(string1, string2)  Note: BigQuery’s CONCAT(...) supports concatenating any number of strings. 

CRC32  Custom user-defined function 

FUNC_SHA1(string)  SHA1(string) 

INITCAP  Custom user-defined function 

LEFT(string, integer)  SUBSTR(string, 0, integer) 

RIGHT(string, integer)  SUBSTR(string, -integer) 

LEN(expression)  LENGTH(expression) 

LENGTH(expression)  LENGTH(expression) 

LOWER(string)  LOWER(string) 

LPAD(string1, length[, string2])  LPAD(string1, length[, string2]) 

RPAD(string1, length[, string2])  RPAD(string1, length[, string2]) 

LTRIM(string, trim_chars)  LTRIM(string, trim_chars) 

MD5(string)  MD5(string) 

OCTET_LENGTH(expression)  BYTE_LENGTH(expression) 

POSITION(substring IN string)  STRPOS(string, substring) 

QUOTE_IDENT(string)  CONCAT('"',string,'"') 

QUOTE_LITERAL(string)  CONCAT("'",string,"'") 

REGEXP_COUNT(  source_string,  pattern  [,position] ) 

ARRAY_LENGTH(  REGEXP_EXTRACT_ALL(  source_string,  pattern  ) )  If position is specified: ARRAY_LENGTH(  REGEXP_EXTRACT_ALL(  SUBSTR(source_string, IF(position <= 0, 1, position)),  pattern  ) )  Note: BigQuery provides regular expression support using the re2 library; see that documentation for its regular expression syntax. 

  


 

REGEXP_INSTR(  source_string,  pattern  [, position  [, occurrence]] ) 

IFNULL(  STRPOS(  source_string,    REGEXP_EXTRACT(  source_string,  pattern)  ), 0)  If position is specified: IFNULL(  STRPOS(  SUBSTR(source_string, IF(position <= 0, 1, position)),    REGEXP_EXTRACT(  SUBSTR(source_string, IF(position <= 0, 1, position)),  pattern)  ) + IF(position <= 0, 1, position) - 1, 0)  If occurrence is specified: IFNULL(  STRPOS(  SUBSTR(source_string, IF(position <= 0, 1, position)),    REGEXP_EXTRACT_ALL(  SUBSTR(source_string, IF(position <= 0, 1, position)),  pattern  )[SAFE_ORDINAL(occurrence)]  ) + IF(position <= 0, 1, position) - 1, 0)  Note: BigQuery provides regular expression support using the re2 library; see that documentation for its regular expression syntax. 

REGEXP_REPLACE(  source_string,  pattern   [, replace_string   [, position]] ) 

REGEXP_REPLACE(  source_string,  pattern,  '' )  If replace_string is specified: REGEXP_REPLACE(  source_string,  pattern,  replace_string )  If position is specified: CASE  WHEN position > LENGTH(source_string) THEN source_string WHEN position <= 0 THEN  REGEXP_REPLACE(  source_string,  pattern,  ''  ) ELSE  CONCAT(  SUBSTR(  source_string, 1, position - 1),  REGEXP_REPLACE(  SUBSTR(source_string, position),  pattern,  replace_string  )  ) END  Note: BigQuery provides regular expression support using the re2 library; see that documentation for its regular expression syntax. 

REGEXP_SUBSTR(  source_string,  pattern  [, position  [, occurrence]] ) 

REGEXP_EXTRACT(  source_string,  pattern )  If position is specified: REGEXP_EXTRACT(  SUBSTR(source_string, IF(position <= 0, 1, position)),  pattern )  If occurrence is specified: REGEXP_EXTRACT_ALL(  SUBSTR(source_string, IF(position <= 0, 1, position)),  pattern )[SAFE_ORDINAL(occurrence)]  Note: BigQuery provides regular expression support using the re2 library; see that documentation for its regular expression syntax. 

REPEAT(string, integer)  REPEAT(string, integer) 

REPLACE(string1, old_chars, new_chars) REPLACE(string1, old_chars, new_chars) 

REPLICATE(string, integer)  REPEAT(string, integer) 

REVERSE(expression)  REVERSE(expression) 

RTRIM(string, trim_chars)  RTRIM(string, trim_chars) 

SPLIT_PART(string, delimiter, part)  SPLIT(  string,  delimiter )[SAFE_ORDINAL(part)] 

STRPOS(string, substring)  STRPOS(string, substring) 

STRTOL(string, base)   

SUBSTRING(  string,  start_position,  number_characters ) 

SUBSTR(  string,  start_position,  number_characters ) 

TEXTLEN(expression)  LENGTH(expression) 

TRANSLATE(  expression,  characters_to_replace,   characters_to_substitute ) 

Can be implemented using UDF: CREATE TEMP FUNCTION translate(expression STRING, characters_to_replace STRING, characters_to_substitute STRING) AS (  IF(LENGTH(characters_to_replace) < LENGTH(characters_to_substitute) OR LENGTH(expression) < LENGTH(characters_to_replace), expression,  (SELECT  STRING_AGG(  IFNULL(  (SELECT ARRAY_CONCAT([c], SPLIT(characters_to_substitute, ''))[SAFE_OFFSET((  SELECT IFNULL(MIN(o2) + 1, 0) FROM UNNEST(SPLIT(characters_to_replace, '')) AS k WITH OFFSET o2  WHERE k = c))]  ),  ''),  '' ORDER BY o1)  FROM UNNEST(SPLIT(expression, '')) AS c WITH OFFSET o1  )) ); 

TRIM([BOTH] string)  TRIM(string) 

TRIM([BOTH] characters FROM string)  TRIM(string, characters) 

UPPER(string)  UPPER(string) 
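The REGEXP_COUNT, REGEXP_INSTR, and SPLIT_PART rewrites above all hinge on the same conventions: positions are 1-based, a nonpositive position is treated as 1, no match yields 0, and SAFE_ORDINAL returns NULL when the index is out of range. A Python sketch of that logic, using the re module as a stand-in for re2 (illustrative only; the two libraries differ on some pattern syntax):

```python
import re

def regexp_instr(source: str, pattern: str, position: int = 1) -> int:
    """1-based offset of the first match at or after `position`; 0 if none."""
    start = 1 if position <= 0 else position  # IF(position <= 0, 1, position)
    m = re.search(pattern, source[start - 1:])
    return 0 if m is None else m.start() + start  # shift back to 1-based

def regexp_count(source: str, pattern: str, position: int = 1) -> int:
    """Number of matches at or after `position`."""
    start = 1 if position <= 0 else position
    return len(re.findall(pattern, source[start - 1:]))

def split_part(s: str, delimiter: str, part: int):
    """SPLIT(...)[SAFE_ORDINAL(part)]: 1-based; None when out of range."""
    parts = s.split(delimiter)
    return parts[part - 1] if 1 <= part <= len(parts) else None

print(regexp_instr("banana", "an"))     # 2
print(regexp_instr("banana", "an", 3))  # 4
print(regexp_count("banana", "an"))     # 2
print(split_part("a,b,c", ",", 2))      # b
print(split_part("a,b,c", ",", 9))      # None
```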

 

Data type formatting functions 

Redshift  BigQuery 

CAST(expression AS type)  CAST(expression AS type) 

expression :: type  CAST(expression AS type) 

CONVERT(type, expression)  CAST(expression AS type) 

TO_CHAR(  timestamp_expression,  format ) 

FORMAT_TIMESTAMP(  format,  timestamp_expression )  Note: BigQuery and Redshift differ in how to specify a format string for timestamp_expression. 

TO_CHAR(  numeric_expression,  format ) 

FORMAT(format, numeric_expression)  Note: BigQuery and Redshift differ in how to specify a numeric format string. 

TO_DATE(date_string, format)  PARSE_DATE(format, date_string)  Note: BigQuery and Redshift differ in how to specify a format string for date_string. 

TO_NUMBER(string, format)  CAST(  FORMAT(  format,  numeric_expression  ) AS INT64 )  Note: BigQuery and Redshift differ in how to specify a numeric format string. 

 BigQuery also supports SAFE_CAST(expression AS typename), which returns NULL if BigQuery is unable to perform a cast; for example, SAFE_CAST('apple' AS INT64) returns NULL. 
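SAFE_CAST's contract (returning NULL where CAST would raise an error) maps naturally onto a try/except. The following Python sketch is an analogy only, not BigQuery behavior in every edge case:

```python
from typing import Optional

def safe_cast_int64(expression: str) -> Optional[int]:
    """Mimics SAFE_CAST(expression AS INT64): None instead of an error."""
    try:
        return int(expression)
    except (ValueError, TypeError):
        return None  # SAFE_CAST returns NULL where CAST would fail

print(safe_cast_int64("123"))    # 123
print(safe_cast_int64("apple"))  # None
```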

DML syntax This section addresses differences in data management language syntax between Redshift and BigQuery. 

INSERT statement 

Redshift offers a configurable DEFAULT keyword for columns. In BigQuery, the DEFAULT value for nullable columns is NULL, and DEFAULT is not supported for required columns. Most Redshift INSERT statements are compatible with BigQuery. The following table shows exceptions.  Redshift  BigQuery 

INSERT INTO table (column1 [, ...]) DEFAULT VALUES 

INSERT [INTO] table (column1 [, ...]) VALUES (DEFAULT [, ...]) 

INSERT INTO table (column1, [,...]) VALUES (  SELECT ...  FROM ... )  

INSERT [INTO] table (column1, [,...]) SELECT ... FROM ... 

 BigQuery also supports inserting values using a subquery (where one of the values is computed using a subquery), which is not supported in Redshift. For example:  

INSERT INTO table (column1, column2) VALUES ('value_1', (  SELECT column2  FROM table2 )) 

 

COPY statement 

Redshift’s COPY command loads data into a table from data files or from an Amazon DynamoDB table. BigQuery does not use the SQL COPY command to load data, but you can use any of several non-SQL tools and options to load data into BigQuery tables. You can also use data pipeline sinks provided in Apache Spark or Apache Beam to write data into BigQuery. 

UPDATE statement 

Most Redshift UPDATE statements are compatible with BigQuery. The following table shows exceptions.  Redshift  BigQuery 

UPDATE table SET column = expression [,...] [FROM ...]  

UPDATE table SET column = expression [,...] [FROM ...] WHERE TRUE  Note: All UPDATE statements in BigQuery require a WHERE keyword, followed by a condition. 

UPDATE table SET column = DEFAULT [,...] [FROM ...] [WHERE ...] 

UPDATE table SET column = NULL [, ...] [FROM ...] WHERE ...  Note: BigQuery’s UPDATE command does not support DEFAULT values.   If the Redshift UPDATE statement does not include a WHERE clause, the BigQuery UPDATE statement should be conditioned WHERE TRUE. 

 

DELETE, TRUNCATE statements 

The DELETE and TRUNCATE statements are both ways to remove rows from a table without affecting the table schema or indexes.   In Redshift, TRUNCATE is recommended over an unqualified DELETE because it is faster and does not require a VACUUM and ANALYZE afterward. However, you can use DELETE statements to achieve the same effect.  In BigQuery, the DELETE statement must have a WHERE clause. For more information about DELETE in BigQuery, see the BigQuery DELETE examples in the DML documentation.  Redshift  BigQuery 

  


 

DELETE [FROM] table_name   TRUNCATE [TABLE] table_name  

DELETE FROM table_name WHERE TRUE  BigQuery DELETE statements require a WHERE clause. 

DELETE FROM table_name USING other_table WHERE table_name.id=other_table.id 

DELETE FROM table_name WHERE table_name.id IN (  SELECT id  FROM other_table )   DELETE FROM table_name WHERE EXISTS (  SELECT id  FROM other_table  WHERE table_name.id = other_table.id )  In Redshift, USING allows additional tables to be referenced in the WHERE clause. This can be achieved in BigQuery by using a subquery in the WHERE clause. 
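The USING-to-subquery rewrite preserves semantics: a target row is deleted exactly when its key appears in the other table. A Python sketch over in-memory rows (the id column and the row values are hypothetical) makes the equivalence concrete:

```python
# Hypothetical rows: each dict is a row with an "id" key.
table_rows = [{"id": 1}, {"id": 2}, {"id": 3}]
other_table = [{"id": 2}, {"id": 4}]

# DELETE FROM table_name
# WHERE table_name.id IN (SELECT id FROM other_table)
other_ids = {row["id"] for row in other_table}
table_rows = [row for row in table_rows if row["id"] not in other_ids]

print(table_rows)  # [{'id': 1}, {'id': 3}]
```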

 

MERGE statement 

The MERGE statement can combine INSERT, UPDATE, and DELETE operations into a single upsert statement and perform the operations atomically. The MERGE operation must match at most one source row for each target row.   Redshift does not support a single MERGE command. However, a merge operation can be performed in Redshift by performing INSERT, UPDATE, and DELETE operations in a transaction.  

Merge operation by replacing existing rows In Redshift, an overwrite of all of the columns in the target table can be performed using a DELETE statement and then an INSERT statement. The DELETE statement removes rows that should be updated, and then the INSERT statement inserts the updated rows. BigQuery tables are limited to 1,000 DML statements per day, so you should consolidate INSERT, UPDATE, and DELETE statements into a single MERGE statement as shown in the following table.   

  


 

 Redshift  BigQuery 

See Performing a merge operation by replacing existing rows.  CREATE TEMP TABLE temp_table;  INSERT INTO temp_table SELECT * FROM source WHERE source.filter = 'filter_exp';  BEGIN TRANSACTION;  DELETE FROM target USING temp_table WHERE target.key = temp_table.key;  INSERT INTO target SELECT * FROM temp_table;  END TRANSACTION;  DROP TABLE temp_table; 

MERGE target USING source ON target.key = source.key WHEN MATCHED AND source.filter = 'filter_exp' THEN  UPDATE SET   target.col1 = source.col1,  target.col2 = source.col2,  ...  Note: All columns must be listed if updating all columns. 

See Performing a merge operation by specifying a column list.  CREATE TEMP TABLE temp_table;  INSERT INTO temp_table SELECT * FROM source WHERE source.filter = 'filter_exp';  BEGIN TRANSACTION;  UPDATE target SET   col1 = temp_table.col1,  col2 = temp_table.col2 FROM temp_table WHERE target.key=temp_table.key;  INSERT INTO target SELECT * FROM temp_table WHERE temp_table.key NOT IN (SELECT key FROM target);  END TRANSACTION;  DROP TABLE temp_table; 

MERGE target USING source ON target.key = source.key WHEN MATCHED AND source.filter = 'filter_exp' THEN  UPDATE SET   target.col1 = source.col1,  target.col2 = source.col2 
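Both patterns in this table implement a keyed upsert: source rows replace matching target rows and are added otherwise. A Python sketch of that net effect (keys and column names are hypothetical; this is an analogy for the semantics, not BigQuery code):

```python
# Keyed upsert: source rows replace matching target rows and are appended
# otherwise, mirroring MERGE ... WHEN MATCHED THEN UPDATE /
# WHEN NOT MATCHED THEN INSERT.
target = {1: {"col1": "a"}, 2: {"col1": "b"}}
source = {2: {"col1": "B"}, 3: {"col1": "c"}}

for key, row in source.items():
    target[key] = row  # update when matched, insert when not

print(sorted(target.items()))
# [(1, {'col1': 'a'}), (2, {'col1': 'B'}), (3, {'col1': 'c'})]
```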

 

  


 

DDL syntax This section addresses differences in data definition language syntax between Redshift and BigQuery. 

SELECT INTO statement 

In Redshift, the SELECT INTO statement can be used to insert the results of a query into a new table, combining table creation and insertion.  Redshift  BigQuery 

SELECT expression, ... INTO table FROM ... 

INSERT table SELECT expression, ... FROM ... 

WITH subquery_table AS (  SELECT ... ) SELECT expression, ... INTO table FROM subquery_table ... 

INSERT table WITH subquery_table AS (  SELECT ... ) SELECT expression, ... FROM subquery_table ... 

SELECT expression INTO TEMP table FROM ...  SELECT expression INTO TEMPORARY table FROM ... 

BigQuery offers several ways to emulate temporary tables. See the Temporary tables section for more information. 

 

CREATE TABLE statement 

Most Redshift CREATE TABLE statements are compatible with BigQuery, except for the following syntax elements, which are not used in BigQuery:  Redshift  BigQuery 

CREATE TABLE table_name (  col1 data_type1 NOT NULL,  col2 data_type2 NULL,  col3 data_type3 UNIQUE,  col4 data_type4 PRIMARY KEY,  col5 data_type5 ) 

CREATE TABLE table_name (  col1 data_type1 NOT NULL,  col2 data_type2,  col3 data_type3,  col4 data_type4,  col5 data_type5 ) 

Note: UNIQUE and PRIMARY KEY constraints are informational and are not enforced by the Redshift system. 

CREATE TABLE table_name (  col1 data_type1[,...]  table_constraints )  where table_constraints are:  [UNIQUE(column_name [, ... ])]  [PRIMARY KEY(column_name [, ...])]  [FOREIGN KEY(column_name [, ...])    REFERENCES reftable [(refcolumn)]]  Note: UNIQUE and PRIMARY KEY constraints are informational and are not enforced by the Redshift system. 

CREATE TABLE table_name (  col1 data_type1[,...] ) PARTITION BY column_name CLUSTER BY column_name [, ...]  Note: BigQuery does not use UNIQUE, PRIMARY KEY, or FOREIGN KEY table constraints. To achieve similar optimization that these constraints provide during query execution, partition and cluster your BigQuery tables. CLUSTER BY supports up to 4 columns. 

CREATE TABLE table_name LIKE original_table_name 

Reference this example to learn how to use the INFORMATION_SCHEMA tables to copy column names, data types, and NOT NULL constraints to a new table.  

CREATE TABLE table_name (  col1 data_type1 ) BACKUP NO  Note: In Redshift, the BACKUP NO setting is specified to save processing time and reduce storage space. 

The BACKUP NO table option is not used or needed because BigQuery automatically keeps up to 7 days of historical versions of all of your tables with no effect on processing time or billed storage.  

CREATE TABLE table_name (  col1 data_type1 ) table_attributes  where table_attributes are:  [DISTSTYLE {AUTO|EVEN|KEY|ALL}]   [DISTKEY (column_name)]  [[COMPOUND|INTERLEAVED] SORTKEY   (column_name [, ...])]    

BigQuery supports clustering, which allows storing keys in sorted order. 

CREATE TABLE table_name AS SELECT ... 

CREATE TABLE table_name AS SELECT ... 

CREATE TABLE IF NOT EXISTS table_name ... 

CREATE TABLE IF NOT EXISTS table_name ... 

  


 

 BigQuery also supports the DDL statement CREATE OR REPLACE TABLE, which overwrites a table if it already exists.  BigQuery’s CREATE TABLE statement also supports the following clauses, which do not have a Redshift equivalent: 

● PARTITION BY partition_statement ● CLUSTER BY clustering_column_list ● OPTIONS(table_options_list) 

 For more information about CREATE TABLE in BigQuery, see the BigQuery CREATE TABLE examples in the DML documentation.  

Temporary tables Redshift supports temporary tables, which are only visible within the current session. There are several ways to emulate temporary tables in BigQuery: 

● Dataset TTL: Create a dataset that has a short time to live (for example, one hour) so that any tables created in the dataset are effectively temporary because they won’t persist longer than the dataset’s time to live. You can prefix all of the table names in this dataset with temp to clearly denote that the tables are temporary. 

● Table TTL: Create a table that has a table-specific short time to live using DDL statements similar to the following: 

 CREATE TABLE temp.name (col1, col2, ...) OPTIONS(expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)); 

CREATE VIEW statement 

The following table shows equivalents between Redshift and BigQuery for the CREATE VIEW statement.   

Redshift  BigQuery 

CREATE VIEW view_name AS SELECT ...  CREATE VIEW view_name AS SELECT ... 

CREATE OR REPLACE VIEW view_name AS SELECT ... 

CREATE OR REPLACE VIEW view_name AS  SELECT ... 

CREATE VIEW view_name  (column_name, ...) AS SELECT ... 

CREATE VIEW view_name  AS SELECT ... 

  


 

Not supported.  CREATE VIEW IF NOT EXISTS view_name OPTIONS(view_option_list) AS SELECT …  Creates a new view only if the view does not exist in the specified dataset. 

CREATE VIEW view_name  AS SELECT ... WITH NO SCHEMA BINDING  In Redshift, a late binding view is required in order to reference an external table. 

In BigQuery, to create a view, all referenced objects must already exist.  BigQuery allows you to query external data sources. 

User-defined functions (UDFs) A UDF lets you create functions for custom operations. These functions accept columns of input, perform actions, and return the result of those actions as a value.  Both Redshift and BigQuery support UDFs written using SQL expressions. Additionally, in Redshift you can create a Python-based UDF, and in BigQuery you can create a JavaScript-based UDF.  Refer to the Google Cloud BigQuery utilities GitHub repository for a library of common BigQuery UDFs. 

CREATE FUNCTION syntax 

The following table addresses differences in SQL UDF creation syntax between Redshift and BigQuery.  

Redshift  BigQuery 

CREATE [OR REPLACE] FUNCTION function_name  ([sql_arg_name sql_arg_data_type[,..]]) RETURNS data_type IMMUTABLE AS $$  sql_function_definition $$ LANGUAGE sql  

CREATE [OR REPLACE] FUNCTION function_name  ([sql_arg_name sql_arg_data_type[,..]]) AS    sql_function_definition  Note: In BigQuery SQL UDF, return data type is optional. BigQuery infers the result type of the function from the SQL function body when a query calls the function.  

  


 

CREATE [OR REPLACE] FUNCTION  function_name  ([sql_arg_name sql_arg_data_type[,..]]) RETURNS data_type { VOLATILE | STABLE | IMMUTABLE }  AS $$  sql_function_definition $$ LANGUAGE sql  

CREATE [OR REPLACE] FUNCTION function_name  ([sql_arg_name sql_arg_data_type[,..]]) RETURNS data_type AS sql_function_definition   Note: Function volatility is not a configurable parameter in BigQuery. All BigQuery UDF volatility is equivalent to Redshift’s IMMUTABLE volatility (that is, it does not do database lookups or otherwise use information not directly present in its argument list).  

CREATE [OR REPLACE] FUNCTION  function_name  ([sql_arg_name sql_arg_data_type[,..]]) RETURNS data_type IMMUTABLE AS $$  SELECT_clause $$ LANGUAGE sql  Note: Redshift supports only a SQL SELECT clause as function definition. Also, the SELECT clause cannot include any of the FROM, INTO, WHERE, GROUP BY, ORDER BY, and LIMIT clauses.  

CREATE [OR REPLACE] FUNCTION function_name  ([sql_arg_name sql_arg_data_type[,..]]) RETURNS data_type AS sql_expression   Note: BigQuery supports any SQL expressions as function definition. However, referencing tables, views, or models is not supported. 

CREATE [OR REPLACE] FUNCTION  function_name  ([sql_arg_name sql_arg_data_type[,..]]) RETURNS data_type IMMUTABLE AS $$  sql_function_definition $$ LANGUAGE sql  

CREATE [OR REPLACE] FUNCTION function_name  ([sql_arg_name sql_arg_data_type[,..]]) RETURNS data_type AS sql_function_definition   Note: Language literal need not be specified in BigQuery SQL UDF. BigQuery interprets the SQL expression by default. Also, the Redshift dollar quoting ($$) is not supported in BigQuery.   

CREATE [OR REPLACE] FUNCTION function_name  (integer, integer) RETURNS integer IMMUTABLE AS $$  SELECT $1 + $2 $$ LANGUAGE sql 

CREATE [OR REPLACE] FUNCTION function_name  (x INT64, y INT64) RETURNS INT64 AS    SELECT x + y  Note: BigQuery UDFs require all input arguments to be named. The Redshift argument variables ($1, $2, …) are not supported in BigQuery.  

  


 

CREATE [OR REPLACE] FUNCTION function_name  (integer, integer) RETURNS integer IMMUTABLE AS $$  SELECT $1 + $2 $$ LANGUAGE sql  Note: Redshift does not support ANY TYPE for SQL UDFs. However, it supports using ANYELEMENT data type in python based UDFs.  

CREATE [OR REPLACE] FUNCTION function_name  (x ANY TYPE, y ANY TYPE) AS    SELECT x + y  Note: BigQuery supports using ANY TYPE as argument type. The function accepts an input of any type for this argument. For more information, see templated parameter in BigQuery.  

 BigQuery also supports the CREATE FUNCTION IF NOT EXISTS statement, which treats the query as successful and takes no action if a function with the same name already exists.  BigQuery’s CREATE FUNCTION statement also supports creating TEMPORARY or TEMP functions, which do not have a Redshift equivalent.  See calling UDFs for details on executing a BigQuery persistent UDF. 

DROP FUNCTION syntax 

The following table addresses differences in DROP FUNCTION syntax between Redshift and BigQuery.  

Redshift  BigQuery 

DROP FUNCTION  function_name  ( [arg_name] arg_type [, ...] ) [ CASCADE | RESTRICT ]  

DROP FUNCTION dataset_name.function_name  Note: BigQuery does not require using the function’s signature for deleting the function. Also, removing function dependencies is not supported in BigQuery.  

 BigQuery also supports the DROP FUNCTION IF EXISTS statement, which deletes the function only if the function exists in the specified dataset.  BigQuery requires that you specify the project_name if the function is not located in the current project. 

  


 

UDF components 

This section highlights the similarities and differences in UDF components between Redshift and BigQuery.  

Component  Redshift  BigQuery 

Name  Redshift recommends using the prefix _f for function names to avoid conflicts with existing or future built-in SQL function names. 

In BigQuery, you can use any custom function name. 

Arguments   Arguments are optional. You can use name and data types for Python UDF arguments and only data types for SQL UDF arguments. In SQL UDF, you must refer to arguments using $1, $2, and so on. Redshift also restricts the number of arguments to 32. 

Arguments are optional, but if you specify arguments, they must use both name and data types for both JavaScript and SQL UDFs. The maximum number of arguments for a persistent UDF is 256. 

Data type  Redshift supports a different set of data types for SQL and Python UDFs.   For a Python UDF, the data type might also be ANYELEMENT.  You must specify a RETURN data type for both SQL and Python UDFs.  See Data types in this document for equivalents between data types in Redshift and in BigQuery.  

BigQuery supports a different set of data types for SQL and JavaScript UDFs.  For a SQL UDF, the data type might also be ANY TYPE. For more information, see templated parameters in BigQuery.  The RETURN data type is optional for SQL UDFs.  See SQL type encodings in JavaScript for information on how BigQuery data types map to JavaScript data types.  

Definition  For both SQL and Python UDFs, you must enclose the function definition using dollar quoting, as in a pair of dollar signs ($$), to indicate the start and end of the function statements.  For SQL UDF, Redshift supports only a SQL SELECT clause as the function definition. Also, the SELECT clause cannot include any of the FROM, INTO, WHERE, GROUP BY, ORDER BY, and LIMIT clauses.  For Python UDF, you can write a Python program using the Python 2.7 Standard Library or import your custom modules by creating one using the CREATE LIBRARY command. 

In BigQuery, you need to enclose the JavaScript code in quotes. See Quoting rules for more information.  For SQL UDF, you can use any SQL expression as the function definition. However, BigQuery doesn’t support referencing tables, views, or models.  For JavaScript UDF, you can include external code libraries directly using the OPTIONS section. You can also use the BigQuery UDF test tool to test your functions. 

Language  You must use the LANGUAGE literal to specify the language as either sql for SQL UDF or plpythonu for Python UDF. 

You need not specify LANGUAGE for SQL UDF but must specify the language as js for JavaScript UDF. 

State  Redshift does not support creating temporary UDFs.   Redshift provides an option to define the volatility of a function using VOLATILE, STABLE, or IMMUTABLE literals. This is used for optimization by the query optimizer. 

BigQuery supports both persistent and temporary UDFs. You can reuse persistent UDFs across multiple queries, whereas you can only use temporary UDFs in a single query.   Function volatility is not a configurable parameter in BigQuery. All BigQuery UDF volatility is equivalent to Redshift’s IMMUTABLE volatility. 

Security and privileges 

To create a UDF, you must have permission for usage on language for SQL or plpythonu (Python). By default, USAGE ON LANGUAGE SQL is granted to PUBLIC, but you must explicitly grant USAGE ON LANGUAGE PLPYTHONU to specific users or groups.  Also, you must be a superuser to replace a UDF. 

Granting explicit permissions for creating or deleting any type of UDF is not necessary in BigQuery. Any user assigned a role of BigQuery Data Editor (having bigquery.routines.* as one of the permissions) can create or delete functions for the specified dataset.   BigQuery also supports creating custom roles. This can be managed using Cloud IAM. 

Limits  See Python UDF limits.  See UDF limits. 

  

   

  


 

Metadata and transaction SQL statements 

Redshift  BigQuery 

SELECT * FROM STL_ANALYZE WHERE name = 'T'; 

Not used in BigQuery. You don’t need to gather statistics in order to improve query performance. To get information about your data distribution, you can use approximate aggregate functions. 

ANALYZE [[ table_name[(column_name [, ...])]] 

Not used in BigQuery. 

LOCK TABLE table_name;  Not used in BigQuery. 

BEGIN TRANSACTION; SELECT ... END TRANSACTION; 

BigQuery uses snapshot isolation. For details, see Consistency guarantees elsewhere in this document. 

EXPLAIN ...  Not used in BigQuery.  Similar features are the query plan explanation in the BigQuery Cloud Console and the slot allocation, and in audit logging in Cloud Monitoring. 

SELECT * FROM SVV_TABLE_INFO WHERE table = 'T'; 

SELECT  * EXCEPT(is_typed) FROM mydataset.INFORMATION_SCHEMA.TABLES;  For more information see Introduction to BigQuery INFORMATION_SCHEMA. 

VACUUM [table_name]  Not used in BigQuery. BigQuery clustered tables are automatically sorted.  

Multi-statement and multi-line SQL statements 

Redshift supports transactions (sessions) and therefore supports statements separated by semicolons that are consistently executed together.   BigQuery scripting enables you to send multiple statements to BigQuery in one request, to use variables, and to use control flow statements such as IF and WHILE. In BigQuery, a script is a SQL statement list to be executed in sequence. A SQL statement list is a list of any valid BigQuery statements that are separated by semicolons. The scripting feature in BigQuery runs these statements as individual queries without rollback. 

  


 

Procedural SQL statements 

CREATE PROCEDURE statement 

Redshift  BigQuery 

CREATE or REPLACE PROCEDURE  CREATE PROCEDURE if a name is required. Otherwise, use inline with BEGIN or in a single line with CREATE TEMP FUNCTION. 

CALL  CALL  

Variable declaration and assignment 

Redshift  BigQuery 

DECLARE  DECLARE Declares a variable of the specified type. 

SET  SET Sets a variable to have the value of the provided expression, or sets multiple variables at the same time based on the result of multiple expressions. 

 

Error condition handlers 

In Redshift, an error encountered during the execution of a stored procedure ends the execution flow, ends the transaction, and rolls back the transaction. This occurs because subtransactions are not supported. In a Redshift stored procedure, the only supported handler_statement is RAISE. In BigQuery, error handling is a core feature of the main control flow, similar to what other languages provide with TRY ... CATCH blocks.  

Redshift  BigQuery 

BEGIN ... EXCEPTION WHEN OTHERS THEN 

BEGIN ... EXCEPTION WHEN ERROR THEN 

RAISE  RAISE 

[ <<label>> ] [ DECLARE  declarations ] BEGIN  statements EXCEPTION  WHEN OTHERS THEN  handler_statements END; 

BEGIN  BEGIN  ...  EXCEPTION WHEN ERROR THEN  SELECT 1/0;  END; EXCEPTION WHEN ERROR THEN  -- The exception thrown from the inner exception handler lands here. END; 
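BigQuery's BEGIN ... EXCEPTION WHEN ERROR THEN construct behaves like a TRY ... CATCH block: the handler runs when a statement fails, and an error raised inside a handler propagates to the enclosing block. The same control flow in Python, as an analogy only:

```python
def run():
    try:          # outer BEGIN
        try:      # inner BEGIN
            raise RuntimeError("statement failed")
        except RuntimeError:  # inner EXCEPTION WHEN ERROR THEN
            _ = 1 / 0         # SELECT 1/0; raises a new error
    except ZeroDivisionError:
        # The exception thrown from the inner handler lands here.
        return "handled in outer block"

print(run())  # handled in outer block
```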

Cursor declarations and operations 

Because BigQuery doesn’t support cursors or sessions, the following statements aren’t used in BigQuery: 

● DECLARE cursor_name CURSOR [FOR] ... ● PREPARE plan_name [ (datatype [, ...] ) ] AS statement ● OPEN cursor_name FOR SELECT ...    ● FETCH [ NEXT | ALL | {FORWARD [ count | ALL ] } ] FROM cursor_name ● CLOSE cursor_name; 

 If you’re using the cursor to return a result set, you can achieve similar behavior using temporary tables in BigQuery. 

Dynamic SQL statements 

The scripting feature in BigQuery supports dynamic SQL statements like those shown in the following table.  

Redshift  BigQuery 

EXECUTE  EXECUTE IMMEDIATE 

Flow-of-control statements 

Redshift  BigQuery 

IF..THEN..ELSIF..THEN..ELSE..END IF  IF condition THEN stmts ELSE stmts END IF 

name CURSOR [ ( arguments ) ] FOR query 

Cursors or sessions are not used in BigQuery. 

[<<label>>] LOOP  statements END LOOP [ label ]; 

LOOP  sql_statement_list END LOOP; 

WHILE condition LOOP stmts END LOOP  WHILE condition DO stmts END WHILE 

EXIT  BREAK 

  


 

Consistency guarantees and transaction isolation Both Redshift and BigQuery are atomic—that is, ACID-compliant on a per-mutation level across many rows. 

Transactions 

Redshift supports serializable isolation by default for transactions. Redshift lets you specify any of the four SQL standard transaction isolation levels but processes all isolation levels as serializable.  BigQuery helps ensure optimistic concurrency control (first to commit wins) with snapshot isolation, in which a query reads the last committed data before the query starts. This approach guarantees the same level of consistency on a per-row, per-mutation basis and across rows within the same DML statement, yet avoids deadlocks. In the case of multiple DML updates against the same table, BigQuery switches to pessimistic concurrency control. Load jobs can run completely independently and append to tables. However, BigQuery does not yet provide an explicit transaction boundary or session. 

Rollback 

If Redshift encounters any error while running a stored procedure, it rolls back all changes made in the transaction. Additionally, you can use the ROLLBACK transaction control statement in a stored procedure to discard all changes. BigQuery has no concept of an explicit rollback because it has no explicit transaction boundary. The workarounds are table decorators or using FOR SYSTEM_TIME AS OF. 

Database limits Check the BigQuery public documentation for the latest quotas and limits. Many quotas for large-volume users can be raised by contacting the Cloud support team. The following table shows a comparison of the Redshift and BigQuery database limits.   

Limit  Redshift  BigQuery 

Tables per database for large and xlarge cluster node types 

9,900   Unrestricted 

Tables per database for 8xlarge cluster node types 

20,000  Unrestricted 

User-defined databases you can create per cluster 

60  Unrestricted 

Maximum row size  4 MB  100 MB 

 

  
