amazon redshift to bigquery sql translation reference
TRANSCRIPT
Amazon Redshift to BigQuery SQL translation reference
About this SQL translation reference 3
Data types 3 Implicit conversion types 5 Explicit conversion types 5
Query syntax 5 SELECT statement 6 FROM clause 6 JOIN types 7 WITH clause 7 Set operators 7 ORDER BY clause 8 Conditions 9 Functions 10
Aggregate functions 10 Bitwise aggregate functions 11 Window functions 11 Conditional expressions 15 Date and time functions 16 Mathematical operators 22 Math functions 24 String functions 25 Data type formatting functions 30
DML syntax 31 INSERT statement 31 COPY statement 31 UPDATE statement 32 DELETE, TRUNCATE statements 32 MERGE statement 33
Merge operation by replacing existing rows 33
DDL syntax 35 SELECT INTO statement 35 CREATE TABLE statement 35
Temporary tables 37 CREATE VIEW statement 37
User-defined functions (UDFs) 38
1
CREATE FUNCTION syntax 38 DROP FUNCTION syntax 40 UDF components 41
Metadata and transaction SQL statements 43 Multi-statement and multi-line SQL statements 43
Procedural SQL statements 44 CREATE PROCEDURE statement 44 Variable declaration and assignment 44 Error condition handlers 44 Cursor declarations and operations 45 Dynamic SQL statements 45 Flow-of-control statements 45
Consistency guarantees and transaction isolation 46 Transactions 46 Rollback 46
Database limits 46
2
About this SQL translation reference This document details the similarities and differences in SQL syntax between Redshift and BigQuery. The document can help accelerate the planning and execution of moving your enterprise data warehouse (EDW) to BigQuery. Redshift data warehousing is designed to work with Redshift-specific SQL syntax. Scripts written for Redshift might need to be altered before you can use them in BigQuery because the SQL dialects vary between the services.
Note: In some cases, there is no direct mapping between a SQL element in Redshift and BigQuery. However, in most cases, you can achieve the same functionality in BigQuery that you can in Redshift using alternative means, as shown in the examples in this document.
This document is part of a series that discusses migrating data from Redshift to BigQuery. It is a companion to the following document:
● Amazon Redshift to BigQuery migration guide
Highlights
Purpose To detail common similarities and differences in SQL syntax between Redshift and BigQuery and help accelerate the planning and execution of moving your enterprise data warehouse (EDW) to BigQuery.
Intended audience Enterprise architects, DBAs, application developers, and IT security.
Key assumptions That the audience is familiar with Redshift and is looking for guidance on transitioning to BigQuery.
Data types This section shows equivalents between data types in Redshift and in BigQuery.
Redshift BigQuery Notes
Data type Alias
SMALLINT INT2 INT64 Redshift’s SMALLINT is 2 bytes, whereas BigQuery’s INT64 is 8 bytes.
3
INTEGER INT, INT4
INT64 Redshift’s INTEGER is 4 bytes, whereas BigQuery’s INT64 is 8 bytes.
BIGINT INT8 INT64 Both Redshift’s BIGINT and BigQuery’s INT64 are 8 bytes.
DECIMAL NUMERIC NUMERIC
REAL FLOAT4 FLOAT64 Redshift’s REAL is 4 bytes, whereas BigQuery’s FLOAT64 is 8 bytes.
DOUBLE PRECISION
FLOAT8, FLOAT
FLOAT64
BOOLEAN BOOL BOOL Redshift’s BOOLEAN can use TRUE, t, true, y, yes, and 1 as valid literal values for true. BigQuery’s BOOL data type uses case-insensitive TRUE.
CHAR CHARACTER, NCHAR, BPCHAR
STRING
VARCHAR CHARACTER VARYING, NVARCHAR, TEXT
STRING
DATE DATE
TIMESTAMP TIMESTAMP WITHOUT TIME ZONE
DATETIME
TIMESTAMPTZ TIMESTAMP WITH TIME ZONE
TIMESTAMP Note: In BigQuery, time zones are used when parsing timestamps or formatting timestamps for display. A string-formatted timestamp might include a time zone, but when BigQuery parses the string, it stores the timestamp in the equivalent UTC time. When a time zone is not explicitly specified, the default time zone, UTC, is used. Time zone names or offset from UTC using (-|+)HH:MM are supported, but time zone abbreviations such as PDT are not supported.
GEOMETRY GEOGRAPHY Support for querying geospatial data. BigQuery also has the following data types that do not have a direct Redshift analog:
● ARRAY ● BYTES ● TIME ● STRUCT
4
Implicit conversion types
When migrating to BigQuery, you need to convert most of your Redshift implicit conversions to BigQuery’s explicit conversions except for the following data types, which BigQuery implicitly converts. BigQuery performs implicit conversions for the following data types:
From BigQuery type To BigQuery type
INT64 FLOAT64
INT64 NUMERIC
NUMERIC FLOAT64 BigQuery also performs implicit conversions for the following literals:
From BigQuery type To BigQuery type
STRING literal (e.g. "2008-12-25")
DATE
STRING literal (e.g. "2008-12-25 15:30:00")
TIMESTAMP
STRING literal (e.g. "2008-12-25T07:30:00")
DATETIME
STRING literal (e.g. "15:30:00”)
TIME
Explicit conversion types
You can convert Redshift data types that BigQuery doesn’t implicitly convert using BigQuery’s CAST(expression AS type) function or any of the DATE and TIMESTAMP conversion functions. When migrating your queries, change any occurrences of the Redshift CONVERT(type, expression) function (or the :: syntax) to BigQuery’s CAST(expression AS type) function, as shown in the table in the Data type formatting functions section.
Query syntax This section addresses differences in query syntax between Redshift and BigQuery.
5
SELECT statement
Most Redshift SELECT statements are compatible with BigQuery. The following table contains a list of minor differences.
Redshift BigQuery
SELECT TOP number expression FROM table
SELECT expression FROM table ORDER BY expression DESC LIMIT number
SELECT x/total AS probability, ROUND(100 * probability, 1) AS pct FROM raw_data Note: Redshift supports creating and referencing an alias in the same SELECT statement.
SELECT x/total AS probability, ROUND(100 * (x/total), 1) AS pct FROM raw_data
BigQuery also supports the following expressions in SELECT statements, which do not have a Redshift equivalent:
● EXCEPT ● REPLACE
FROM clause
A FROM clause in a query lists the table references that data is selected from. In Redshift, possible table references include tables, views, and subqueries. All of these table references are supported in BigQuery. BigQuery tables can be referenced in the FROM clause using the following:
● [project_id].[dataset_id].[table_name] ● [dataset_id].[table_name] ● [table_name]
BigQuery also supports additional table references:
● Historical versions of the table definition and rows using FOR SYSTEM_TIME AS OF. ● Field paths, or any path that resolves to a field within a data type (such as a STRUCT). ● Flattened arrays.
6
JOIN types
Both Redshift and BigQuery support the following types of join:
● [INNER] JOIN ● LEFT [OUTER] JOIN ● RIGHT [OUTER] JOIN ● FULL [OUTER] JOIN ● CROSS JOIN and the equivalent implicit comma cross join
The following table contains a list of minor differences.
Redshift BigQuery
SELECT col1 FROM table1 NATURAL INNER JOIN table2
SELECT col1 FROM table1 INNER JOIN table2 USING (col1, col2 [, ...]) Note: In BigQuery, JOIN clauses require a JOIN condition unless the clause is a CROSS JOIN or one of the joined tables is a field within a data type or an array.
WITH clause
A BigQuery WITH clause contains one or more named subqueries that execute when a subsequent SELECT statement references them. Redshift WITH clauses behave the same as BigQuery’s with the exception that you can evaluate the clause once and reuse its results.
Set operators
There are some minor differences between Redshift set operators and BigQuery set operators. However, all set operations that are feasible in Redshift are replicable in BigQuery.
Redshift BigQuery
SELECT * FROM table1 UNION SELECT * FROM table2
SELECT * FROM table1 UNION DISTINCT SELECT * FROM table2 Note: Both BigQuery and Redshift support the UNION ALL operator.
SELECT * FROM table1 INTERSECT SELECT * FROM table2
SELECT * FROM table1 INTERSECT DISTINCT SELECT * FROM table2
7
SELECT * FROM table1 EXCEPT SELECT * FROM table2
SELECT * FROM table1 EXCEPT DISTINCT SELECT * FROM table2
SELECT * FROM table1 MINUS SELECT * FROM table2
SELECT * FROM table1 EXCEPT DISTINCT SELECT * FROM table2
SELECT * FROM table1 UNION SELECT * FROM table2 EXCEPT SELECT * FROM table3
SELECT * FROM table1 UNION ALL ( SELECT * FROM table2 EXCEPT SELECT * FROM table3 ) Note: BigQuery requires parentheses to separate different set operations. If the same set operator is repeated, parentheses are not necessary.
ORDER BY clause
There are some minor differences between Redshift ORDER BY clauses and BigQuery ORDER BY clauses.
Redshift BigQuery
In Redshift, NULLS are ranked last by default (ascending order).
In BigQuery, NULLS are ranked first by default (ascending order).
SELECT * FROM table ORDER BY expression LIMIT ALL
SELECT * FROM table ORDER BY expression Note: BigQuery does not use the LIMIT ALL syntax, but ORDER BY sorts all rows by default, resulting in the same behavior as Redshift’s LIMIT ALL clause. We highly recommend including a LIMIT clause with every ORDER BY clause. Ordering all result rows unnecessarily degrades query execution performance.
SELECT * FROM table ORDER BY expression OFFSET 10
SELECT * FROM table ORDER BY expression LIMIT count OFFSET 10 Note: In BigQuery, OFFSET must be used together with a LIMIT count. Make sure to set the count INT64 value to the
8
minimum necessary ordered rows. Ordering all result rows unnecessarily degrades query execution performance.
Conditions
The following table shows Redshift conditions, or predicates, that are specific to Redshift and must be converted to their BigQuery equivalent.
Redshift BigQuery
a = ANY (subquery) a = SOME (subquery)
a IN subquery
a <> ALL (subquery) a != ALL (subquery)
a NOT IN subquery
a IS UNKNOWN a IS NULL
expression ILIKE pattern LOWER(expression) LIKE LOWER(pattern)
expression LIKE pattern ESCAPE 'escape_char'
expression LIKE pattern Note: BigQuery does not support custom escape characters. You must use two backslashes \\ as escape characters for BigQuery.
expression [NOT] SIMILAR TO pattern
IF( LENGTH( REGEXP_REPLACE( expression, pattern, '' ) = 0, True, False ) Note: If NOT is specified, wrap the above IF expression in a NOT expression as shown below: NOT( IF( LENGTH(... )
expression [!] ~ pattern [NOT] REGEXP_CONTAINS( expression, regex )
9
Functions
The following sections list Redshift functions and their BigQuery equivalents.
Aggregate functions The following table shows mappings between common Redshift aggregate, aggregate analytic, and approximate aggregate functions with their BigQuery equivalents.
Redshift BigQuery
APPROXIMATE COUNT(DISTINCT expression) APPROX_COUNT_DISTINCT(expression)
APPROXIMATE PERCENTILE_DISC( percentile ) WITHIN GROUP (ORDER BY expression)
APPROX_QUANTILES(expression, 100)[OFFSET(CAST(TRUNC(percentile * 100) as INT64))]
AVG([DISTINCT] expression) AVG([DISTINCT] expression)
COUNT(expression) COUNT(expression)
LISTAGG( [DISTINCT] aggregate_expression [, delimiter] ) [WITHIN GROUP (ORDER BY order_list)]
STRING_AGG( [DISTINCT] aggregate_expression [, delimiter] [ORDER BY order_list] )
MAX(expression) MAX(expression)
MEDIAN(median_expression) PERCENTILE_CONT( median_expression, 0.5 ) OVER()
MIN(expression) MIN(expression)
PERCENTILE_CONT( percentile ) WITHIN GROUP (ORDER BY expression)
PERCENTILE_CONT( median_expression, percentile ) OVER() Note: Does not cover aggregation use cases.
STDDEV([DISTINCT] expression) STDDEV([DISTINCT] expression)
STDDEV_SAMP([DISTINCT] expression) STDDEV_SAMP([DISTINCT] expression)
STDDEV_POP([DISTINCT] expression) STDDEV_POP([DISTINCT] expression)
SUM([DISTINCT] expression) SUM([DISTINCT] expression)
VARIANCE([DISTINCT] expression) VARIANCE([DISTINCT] expression)
VAR_SAMP([DISTINCT] expression) VAR_SAMP([DISTINCT] expression)
VAR_POP([DISTINCT] expression) VAR_POP([DISTINCT] expression)
10
BigQuery also offers the following aggregate, aggregate analytic, and approximate aggregate functions, which do not have a direct analogue in Redshift:
● ANY_VALUE ● APPROX_TOP_COUNT ● APPROX_TOP_SUM ● ARRAY_AGG ● ARRAY_CONCAT_AGG ● COUNTIF ● CORR ● COVAR_POP ● COVAR_SAMP
Bitwise aggregate functions The following table shows mappings between common Redshift bitwise aggregate functions with their BigQuery equivalents.
Redshift BigQuery
BIT_AND(expression) BIT_ADD(expression)
BIT_OR(expression) BIT_OR(expression)
BOOL_AND(expression) LOGICAL_AND(expression)
BOOL_OR(expression) LOGICAL_OR(expression)
BigQuery also offers the following bit-wise aggregate function, which does not have a direct analogue in Redshift:
● BIT_XOR
Window functions The following table shows mappings between common Redshift window functions with their BigQuery equivalents. Windowing functions in BigQuery include analytic aggregate functions, aggregate functions, navigation functions, and numbering functions.
Redshift BigQuery
AVG(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list frame_clause] )
AVG(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list] [frame_clause] )
11
COUNT(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list frame_clause] )
COUNT(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list] [frame_clause] )
CUME_DIST() OVER ( [PARTITION BY partition_expression] [ORDER BY order_list] )
CUME_DIST() OVER ( [PARTITION BY partition_expression] ORDER BY order_list )
DENSE_RANK() OVER ( [PARTITION BY expr_list] [ORDER BY order_list] )
DENSE_RANK() OVER ( [PARTITION BY expr_list] ORDER BY order_list )
FIRST_VALUE(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list frame_clause] )
FIRST_VALUE(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list] [frame_clause] )
LAST_VALUE(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list frame_clause] )
LAST_VALUE(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list frame_clause] )
LAG(value_expr [, offset]) OVER ( [PARTITION BY window_partition] ORDER BY window_ordering )
LAG(value_expr [, offset]) OVER ( [PARTITION BY window_partition] ORDER BY window_ordering )
LEAD(value_expr [, offset]) OVER ( [PARTITION BY window_partition] ORDER BY window_ordering )
LEAD(value_expr [, offset]) OVER ( [PARTITION BY window_partition] ORDER BY window_ordering )
LISTAGG( [DISTINCT] expression [, delimiter]
STRING_AGG( [DISTINCT] aggregate_expression [, delimiter]
12
) [WITHIN GROUP (ORDER BY order_list)] OVER ( [PARTITION BY partition_expression] )
) OVER ( [PARTITION BY partition_list] [ORDER BY order_list] )
MAX(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list frame_clause] )
MAX(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list] [frame_clause] )
MEDIAN(median_expression) OVER ( [PARTITION BY partition_expression] )
PERCENTILE_CONT( median_expression, 0.5 ) OVER ( [PARTITION BY partition_expression] )
MIN(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list frame_clause] )
MIN(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list] [frame_clause] )
NTH_VALUE(expression, offset) OVER ( [PARTITION BY window_partition] [ORDER BY window_ordering frame_clause] )
NTH_VALUE(expression, offset) OVER ( [PARTITION BY window_partition] ORDER BY window_ordering [frame_clause] )
NTILE(expr) OVER ( [PARTITION BY expression_list] [ORDER BY order_list] )
NTILE(expr) OVER ( [PARTITION BY expression_list] ORDER BY order_list )
PERCENT_RANK() OVER ( [PARTITION BY partition_expression] [ORDER BY order_list] )
PERCENT_RANK() OVER ( [PARTITION BY partition_expression] ORDER BY order_list )
13
PERCENTILE_CONT(percentile) WITHIN GROUP (ORDER BY expr) OVER ( [PARTITION BY expr_list] )
PERCENTILE_CONT(expr, percentile) OVER ( [PARTITION BY expr_list] )
PERCENTILE_DISC(percentile) WITHIN GROUP (ORDER BY expr) OVER ( [PARTITION BY expr_list] )
PERCENTILE_DISC(expr, percentile) OVER ( [PARTITION BY expr_list] )
RANK() OVER ( [PARTITION BY expr_list] [ORDER BY order_list] )
RANK() OVER ( [PARTITION BY expr_list] ORDER BY order_list )
RATIO_TO_REPORT(ratio_expression) OVER ( [PARTITION BY partition_expression] )
ratio_expression/SUM(ratio_expression) OVER ( [PARTITION BY partition_expression] )
ROW_NUMBER() OVER ( [PARTITION BY expr_list] [ORDER BY order_list] )
ROW_NUMBER() OVER ( [PARTITION BY expr_list] [ORDER BY order_list] )
STDDEV(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list frame_clause] )
STDDEV(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list] [frame_clause] )
STDDEV_SAMP(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list frame_clause] )
STDDEV_SAMP(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list] [frame_clause] )
14
STDDEV_POP(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list frame_clause] )
STDDEV_POP(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list] [frame_clause] )
SUM(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list frame_clause] )
SUM(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list] [frame_clause] )
VAR_SAMP(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list frame_clause] )
VAR_SAMP(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list] [frame_clause] )
VAR_POP(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list frame_clause] )
VAR_POP(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list] [frame_clause] )
VARIANCE(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list frame_clause] )
VARIANCE(expression) OVER ( [PARTITION BY expr_list] [ORDER BY order_list] [frame_clause] )
Conditional expressions The following table shows mappings between common Redshift conditional expressions with their BigQuery equivalents.
Redshift BigQuery
CASE expression WHEN value THEN result [WHEN...] [ELSE else_result] END
CASE expression WHEN value THEN result [WHEN...] [ELSE else_result] END
COALESCE(expression1[, ...]) COALESCE(expression1[, ...])
DECODE( CASE expression
15
expression, search1, result1 [, search2, result2...] [, default] )
WHEN value1 THEN result1 [WHEN value2 THEN result2] [ELSE default] END
GREATEST(value [, ...]) GREATEST(value [, ...])
LEAST(value [, ...]) LEAST(value [, ...])
NVL(expression1[, ...]) COALESCE(expression1[, ...])
NVL2( expression, not_null_return_value, null_return_value )
IF( expression IS NULL, null_return_value, not_null_return_value )
NULLIF(expression1, expression2) NULLIF(expression1, expression2)
BigQuery also offers the following conditional expressions, which do not have a direct analogue in Redshift:
● IF ● IFNULL
Date and time functions The following table shows mappings between common Redshift date and time functions with their BigQuery equivalents. BigQuery data and time functions include date functions, datetime functions, time functions, and timestamp functions. Keep in mind that functions that seem identical in Redshift and BigQuery might return different data types.
Redshift BigQuery
ADD_MONTHS(date, integer) CAST( DATE_ADD( date, INTERVAL integer MONTH ) AS TIMESTAMP )
16
timestamptz_or_timestamp AT TIME ZONE timezone
PARSE_TIMESTAMP( "%c%z", FORMAT_TIMESTAMP( "%c%z", timestamptz_or_timestamp, timezone ) ) Note: Time zones are used when parsing timestamps or formatting timestamps for display. A string-formatted timestamp might include a time zone, but when BigQuery parses the string, it stores the timestamp in the equivalent UTC time. When a time zone is not explicitly specified, the default time zone, UTC, is used. Time zone names or offset from UTC (-HH:MM) are supported, but time zone abbreviations (such as PDT) are not supported.
CONVERT_TIMEZONE( [source_timezone], target_timezone, timestamp )
PARSE_TIMESTAMP( "%c%z", FORMAT_TIMESTAMP( "%c%z", timestamp, target_timezone ) ) Note: source_timezone is UTC in BigQuery.
CURRENT_DATE Note: Returns start date for the current transaction in the current session time zone (UTC by default).
CURRENT_DATE() Note: Returns start date for the current statement in UTC time zone.
DATE_CMP(date1, date2) CASE WHEN date1 = date2 THEN 0 WHEN date1 > date2 THEN 1 ELSE -1 END
17
DATE_CMP_TIMESTAMP(date1, date2) CASE WHEN date1 = CAST(date2 AS DATE) THEN 0 WHEN date1 > CAST(date2 AS DATE) THEN 1 ELSE -1 END
DATE_CMP_TIMESTAMPTZ(date, timestamptz) CASE WHEN date > DATE(timestamptz) THEN 1 WHEN date < DATE(timestamptz) THEN -1 ELSE 0 END
DATE_PART_YEAR(date) EXTRACT(YEAR FROM date)
DATEADD(date_part, interval, date) CAST( DATE_ADD( date, INTERVAL interval datepart ) AS TIMESTAMP )
DATEDIFF( date_part, date_expression1, date_expression2 )
DATE_DIFF( date_expression1, date_expression2, date_part )
DATE_PART(date_part, date) EXTRACT(date_part FROM date)
DATE_TRUNC('date_part', timestamp) TIMESTAMP_TRUNC(timestamp, date_part)
EXTRACT(date_part FROM timestamp) EXTRACT(date_part FROM timestamp)
GETDATE() PARSE_TIMESTAMP( "%c", FORMAT_TIMESTAMP( "%c", CURRENT_TIMESTAMP() ) )
INTERVAL_CMP(interval_literal1, interval_literal2)
For intervals in Redshift, there are 360 days in a year. In BigQuery, you can use the following UDF to parse a Redshift interval and translate it to seconds. CREATE TEMP FUNCTION parse_interval(interval_literal STRING) AS (
18
(select sum(case when unit in ('minutes', 'minute', 'm' ) then num * 60 when unit in ('hours', 'hour', 'h') then num * 60 * 60 when unit in ('days', 'day', 'd' ) then num * 60 * 60 * 24 when unit in ('weeks', 'week', 'w') then num * 60 * 60 * 24 * 7 when unit in ('months', 'month' ) then num * 60 * 60 * 24 * 30 when unit in ('years', 'year') then num * 60 * 60 * 24 * 360 else num end) from ( select cast(regexp_extract(value, r'^[0-9]*\.?[0-9]+') as numeric) num, substr(value, length(regexp_extract(value, r'^[0-9]*\.?[0-9]+')) + 1) unit from UNNEST(SPLIT(replace(interval_literal, ' ', ''), ',')) value ))); To compare interval literals, perform: IF( parse_interval(interval_literal1) > parse_interval(interval_literal2), 1, IF( parse_interval(interval_literal1) > parse_interval(interval_literal2), -1, 0 ) )
LAST_DAY(date) DATE_SUB( DATE_ADD( date, INTERVAL 1 MONTH ), INTERVAL 1 DAY )
19
MONTHS_BETWEEN(date1, date2) DATE_DIFF( date1, date2, MONTH )
NEXT_DAY(date, day) DATE_ADD( DATE_TRUNC( date, WEEK(day) ), INTERVAL 1 WEEK )
SYSDATE Note: Returns start timestamp for the current transaction in the current session time zone (UTC by default).
CURRENT_TIMESTAMP() Note: Returns start timestamp for the current statement in UTC time zone.
TIMEOFDAY() FORMAT_TIMESTAMP("%a %b %d %H:%M:%E6S %E4Y %Z", CURRENT_TIMESTAMP())
TIMESTAMP_CMP( timestamp1, timestamp2 )
CASE WHEN timestamp1 = timestamp2 THEN 0 WHEN timestamp1 > timestamp2 THEN 1 ELSE -1 END
TIMESTAMP_CMP_DATE( timestamp, date )
CASE WHEN EXTRACT(DATE FROM timestamp) = date THEN 0 WHEN EXTRACT(DATE FROM timestamp) > date THEN 1 ELSE -1 END
TIMESTAMP_CMP_TIMESTAMPTZ( timestamp, timestamptz ) Note: Redshift compares timestamps in the user session defined time zone. Default user session time zone is UTC.
CASE WHEN timestamp = timestamptz THEN 0 WHEN timestamp > timestamptz THEN 1 ELSE -1 END Note: BigQuery compares timestamps in UTC time zone.
TIMESTAMPTZ_CMP( CASE
20
timestamptz1, timestamptz2 ) Note: Redshift compares timestamps in the user session defined time zone. Default user session time zone is UTC.
WHEN timestamptz1 = timestamptz2 THEN 0 WHEN timestamptz1 > timestamptz2 THEN 1 ELSE -1 END Note: BigQuery compares timestamps in UTC time zone.
TIMESTAMPTZ_CMP_DATE( timestamptz, date ) Note: Redshift compares timestamps in the user session defined time zone. Default user session time zone is UTC.
CASE WHEN EXTRACT(DATE FROM timestamptz) = date THEN 0 WHEN EXTRACT(DATE FROM timestamptz) > date THEN 1 ELSE -1 END Note: BigQuery compares timestamps in UTC time zone.
TIMESTAMPTZ_CMP_TIMESTAMP( timestamptz, Timestamp ) Note: Redshift compares timestamps in the user session defined time zone. Default user session time zone is UTC.
CASE WHEN timestamp = timestamptz THEN 0 WHEN timestamp > timestamptz THEN 1 ELSE -1 END Note: BigQuery compares timestamps in UTC time zone.
TIMEZONE( timezone, Timestamptz_or_timestamp )
PARSE_TIMESTAMP( "%c%z", FORMAT_TIMESTAMP( "%c%z", timestamptz_or_timestamp, timezone ) ) Note: Time zones are used when parsing timestamps or formatting timestamps for display. A string-formatted timestamp might include a time zone, but when BigQuery parses the string, it stores the timestamp in the equivalent UTC time. When a time zone is not explicitly specified, the default time zone, UTC, is used. Time zone names or offset from UTC (-HH:MM) are
21
supported but time zone abbreviations (such as PDT) are not supported.
TO_TIMESTAMP(timestamp, format)
PARSE_TIMESTAMP( format, FORMAT_TIMESTAMP( format, timestamp ) ) Note: BigQuery follows a different set of format elements. Time zones are used when parsing timestamps or formatting timestamps for display. A string-formatted timestamp might include a time zone, but when BigQuery parses the string, it stores the timestamp in the equivalent UTC time. When a time zone is not explicitly specified, the default time zone, UTC, is used. Time zone names or offset from UTC (-HH:MM) are supported in the format string but time zone abbreviations (such as PDT) are not supported.
TRUNC(timestamp) CAST(timestamp AS DATE)
BigQuery also offers the following date and time functions, which do not have a direct analogue in Redshift:
● EXTRACT ● DATE ● DATE_SUB ● DATE_ADD (returning DATE
data type) ● DATE_FROM_UNIX_DATE ● FORMAT_DATE ● PARSE_DATE ● UNIX_DATE ● DATETIME
● DATETIME_ADD ● DATETIME_SUB ● DATETIME_DIFF ● DATETIME_TRUNC ● FORMAT_DATETIME ● PARSE_DATETIME ● CURRENT_TIME ● TIME ● TIME_ADD ● TIME_SUB
● TIME_DIFF ● TIME_TRUNC ● FORMAT_TIME ● PARSE_TIME ● TIMESTAMP_SECONDS ● TIMESTAMP_MILLIS ● TIMESTAMP_MICROS ● UNIX_SECONDS ● UNIX_MILLIS ● UNIX_MICROS
Mathematical operators The following table shows mappings between common Redshift mathematical operators with their BigQuery equivalents.
22
Redshift BigQuery
X + Y X + Y
X - Y X - Y
X * Y X * Y
X / Y Note: If the operator is performing integer division (in other words, if X and Y are both integers), an integer is returned. If the operator is performing non-integer division, a non-integer is returned.
If integer division: CAST(FLOOR(X / Y) AS INT64) If not integer division: CAST(X / Y AS INT64) Note: Division in BigQuery returns a non-integer. To prevent errors from a division operation (division by zero error), use SAFE_DIVIDE(X, Y) or IEEE_DIVIDE(X, Y).
X % Y MOD(X, Y) Note: To prevent errors from a division operation (division by zero error), use SAFE.MOD(X, Y). SAFE.MOD(X, 0) results in 0.
X ^ Y POW(X, Y) POWER(X, Y) Note: Unlike Redshift, the ^ operator in BigQuery performs Bitwise xor.
|/ X SQRT(X) Note: To prevent errors from a square root operation (negative input), use SAFE.SQRT(X). Negative input with SAFE.SQRT(X) results in NULL.
||/ X SIGN(X) * POWER(ABS(X), 1/3) Note: BigQuery’s POWER(X, Y) returns an error if X is a finite value less than 0 and Y is a noninteger.
@ X ABS(X)
X << Y X << Y
23
Note: This operator returns 0 or a byte sequence of b'\x00' if the second operand Y is greater than or equal to the bit length of the first operand X (for example, 64 if X has the type INT64). This operator throws an error if Y is negative.
X >> Y X >> Y
Note: Shifts the first operand X to the right. This operator does not do sign bit extension with a signed type (it fills vacant bits on the left with 0). This operator returns 0 or a byte sequence of b'\x00' if the second operand Y is greater than or equal to the bit length of the first operand X (for example, 64 if X has the type INT64). This operator throws an error if Y is negative.
X & Y X & Y
X | Y X | Y
~X ~X
BigQuery also offers the following mathematical operator, which does not have a direct analog in Redshift:
● X ^ Y (Bitwise xor)
Math functions
Redshift BigQuery
ABS(number) ABS(number)
ACOS(number) ACOS(number)
ASIN(number) ASIN(number)
ATAN(number) ATAN(number)
ATAN2(number1, number2) ATAN2(number1, number2)
CBRT(number) POWER(number, 1/3)
CEIL(number) CEIL(number)
CEILING(number) CEILING(number)
CHECKSUM(expression) FARM_FINGERPRINT(expression)
COS(number) COS(number)
COT(number) 1/TAN(number)
DEGREES(number) number*180/ACOS(-1)
DEXP(number) EXP(number)
DLOG1(number) LN(number)
24
DLOG10(number) LOG10(number)
EXP(number) EXP(number)
FLOOR(number) FLOOR(number)
LN(number) LN(number)
LOG(number) LOG10(number)
MOD(number1, number2) MOD(number1, number2)
PI ACOS(-1)
POWER(expression1, expression2) POWER(expression1, expression2)
RADIANS(number) ACOS(-1)*(number/180)
RANDOM() RAND()
ROUND(number [, integer]) ROUND(number [, integer])
SIN(number) SIN(number)
SIGN(number) SIGN(number)
SQRT(number) SQRT(number)
TAN(number) TAN(number)
TO_HEX(number) FORMAT('%x', number)
TRUNC(number [, integer]) TRUNC(number [, integer])
String functions
Redshift BigQuery
string1 || string2 CONCAT(string1, string2)
BPCHARCMP(string1, string2) CASE WHEN string1 = string2 THEN 0 WHEN string1 > string2 THEN 1 ELSE -1 END
BTRIM(string [, matching_string]) TRIM(string [, matching_string])
BTTEXT_PATTERN_CMP(string1, string2) CASE WHEN string1 = string2 THEN 0 WHEN string1 > string2 THEN 1 ELSE -1 END
CHAR_LENGTH(expression) CHAR_LENGTH(expression)
CHARACTER_LENGTH(expression) CHARACTER_LENGTH(expression)
CHARINDEX(substring, string) STRPOS(string, substring)
CHR(number) CODE_POINTS_TO_STRING([number])
25
CONCAT(string1, string2) CONCAT(string1, string2) Note: BigQuery’s CONCAT(...) supports concatenating any number of strings.
CRC32 Custom user-defined function
FUNC_SHA1(string) SHA1(string)
INITCAP Custom user-defined function
LEFT(string, integer) SUBSTR(string, 0, integer)
RIGHT(string, integer) SUBSTR(string, -integer)
LEN(expression) LENGTH(expression)
LENGTH(expression) LENGTH(expression)
LOWER(string) LOWER(string)
LPAD(string1, length[, string2]) LPAD(string1, length[, string2])
RPAD(string1, length[, string2]) RPAD(string1, length[, string2])
LTRIM(string, trim_chars) LTRIM(string1, trim_chars)
MD5(string) MD5(string)
OCTET_LENGTH(expression) BYTE_LENGTH(expression)
POSITION(substring IN string) STRPOS(string, substring)
QUOTE_IDENT(string) CONCAT('"',string,'"')
QUOTE_LITERAL(string) CONCAT("'",string,"'")
REGEXP_COUNT( source_string, pattern [,position] )
ARRAY_LENGTH( REGEXP_EXTRACT_ALL( source_string, pattern ) ) If position is specified: ARRAY_LENGTH( REGEXP_EXTRACT_ALL( SUBSTR(source_string, IF(position <= 0, 1, position)), pattern ) ) Note: BigQuery provides regular expression support using the re2 library; see that documentation for its regular expression syntax.
26
REGEXP_INSTR( source_string, pattern [, position [, occurrence]] )
IFNULL( STRPOS( source_string, REGEXP_EXTRACT( source_string, pattern) ), 0) If position is specified: IFNULL( STRPOS( SUBSTR(source_string, IF(position <= 0, 1, position)), REGEXP_EXTRACT( SUBSTR(source_string, IF(position <= 0, 1, position)), pattern) ) + IF(position <= 0, 1, position) - 1, 0) If occurrence is specified: IFNULL( STRPOS( SUBSTR(source_string, IF(position <= 0, 1, position)), REGEXP_EXTRACT_ALL( SUBSTR(source_string, IF(position <= 0, 1, position)), pattern )[SAFE_ORDINAL(occurrence)] ) + IF(position <= 0, 1, position) - 1, 0) Note: BigQuery provides regular expression support using the re2 library; see that documentation for its regular expression syntax.
REGEXP_REPLACE( source_string, pattern [, replace_string [, position]] )
REGEXP_REPLACE( source_string, pattern, “” ) If replace_string is specified: REGEXP_REPLACE( source_string,
27
pattern, replace_string ) If position is specified: CASE WHEN position > LENGTH(source_string) THEN source_string WHEN position <= 0 THEN REGEXP_REPLACE( source_string, pattern, “” ) ELSE CONCAT( SUBSTR( source_string, 1, position - 1), REGEXP_REPLACE( SUBSTR(source_string, position), pattern, replace_string ) ) END Note: BigQuery provides regular expression support using the re2 library; see that documentation for its regular expression syntax.
REGEXP_SUBSTR( source_string, pattern [, position [, occurrence]] )
REGEXP_EXTRACT( source_string, pattern ) If position is specified: REGEXP_EXTRACT( SUBSTR(source_string, IF(position <= 0, 1, position)), pattern ) If occurrence is specified: REGEXP_EXTRACT_ALL( SUBSTR(source_string, IF(position <= 0, 1, position)),
28
pattern )[SAFE_ORDINAL(occurrence)] Note: BigQuery provides regular expression support using the re2 library; see that documentation for its regular expression syntax.
REPEAT(string, integer) REPEAT(string, integer)
REPLACE(string1, old_chars, new_chars) REPLACE(string1, old_chars, new_chars)
REPLICATE(string, integer) REPEAT(string, integer)
REVERSE(expression) REVERSE(expression)
RTRIM(string, trim_chars) RTRIM(string, trim_chars)
SPLIT_PART(string, delimiter, part) SPLIT( string delimiter )SAFE_ORDINAL(part)
STRPOS(string, substring) STRPOS(string, substring)
STRTOL(string, base)
SUBSTRING( string, start_position, number_characters )
SUBSTR( string, start_position, number_characters )
TEXTLEN(expression) LENGTH(expression)
TRANSLATE( expression, characters_to_replace, characters_to_substitute )
Can be implemented using UDF: CREATE TEMP FUNCTION translate(expression STRING, characters_to_replace STRING, characters_to_substitute STRING) AS ( IF(LENGTH(characters_to_replace) < LENGTH(characters_to_substitute) OR LENGTH(expression) < LENGTH(characters_to_replace), expression, (SELECT STRING_AGG( IFNULL( (SELECT ARRAY_CONCAT([c], SPLIT(characters_to_substitute, ''))[SAFE_OFFSET(( SELECT IFNULL(MIN(o2) + 1, 0) FROM UNNEST(SPLIT(characters_to_replace,
29
'')) AS k WITH OFFSET o2 WHERE k = c))] ), ''), '' ORDER BY o1) FROM UNNEST(SPLIT(expression, '')) AS c WITH OFFSET o1 )) );
TRIM([BOTH] string) TRIM(string)
TRIM([BOTH] characters FROM string) TRIM(string, characters)
UPPER(string) UPPER(string)
Data type formatting functions
Redshift BigQuery
CAST(expression AS type) CAST(expression AS type)
expression :: type CAST(expression AS type)
CONVERT(type, expression) CAST(expression AS type)
TO_CHAR( timestamp_expression, format )
FORMAT_TIMESTAMP( format, timestamp_expression ) Note: BigQuery and Redshift differ in how to specify a format string for timestamp_expression.
TO_CHAR( numeric_expression, format )
FORMAT(format, numeric_expression) Note: BigQuery and Redshift differ in how to specify a numeric format string.
TO_DATE(date_string, format) PARSE_DATE(date_string, format) Note: BigQuery and Redshift differ in how to specify a format string for date_string.
TO_NUMBER(string, format) CAST( FORMAT( format, numeric_expression ) TO INT64
30
) Note: BigQuery and Redshift differ in how to specify a numeric format string.
BigQuery also supports SAFE_CAST(expression AS typename), which returns NULL if BigQuery is unable to perform a cast; for example, SAFE_CAST(“apple” AS INT64) returns NULL).
DML syntax This section addresses differences in data management language syntax between Redshift and BigQuery.
INSERT statement
Redshift offers a configurable DEFAULT keyword for columns. In BigQuery, the DEFAULT value for nullable columns is NULL, and DEFAULT is not supported for required columns. Most Redshift INSERT statements are compatible with BigQuery. The following table shows exceptions. Redshift BigQuery
INSERT INTO table (column1 [, ...]) DEFAULT VALUES
INSERT [INTO] table (column1 [, ...]) VALUES (DEFAULT [, ...])
INSERT INTO table (column1, [,...]) VALUES ( SELECT ... FROM ... )
INSERT [INTO] table (column1, [,...]) SELECT ... FROM ...
BigQuery also supports inserting values using a subquery (where one of the values is computed using a subquery), which is not supported in Redshift. For example:
INSERT INTO table (column1, column2) VALUES ('value_1', ( SELECT column2 FROM table2 ))
COPY statement
Redshift’s COPY command loads data into a table from data files or from an Amazon DynamoDB table. BigQuery does not use the SQL COPY command to load data, but you can
31
use any of several non-SQL tools and options to load data into BigQuery tables. You can also use data pipeline sinks provided in Apache Spark or Apache Beam to write data into BigQuery.
UPDATE statement
Most Redshift UPDATE statements are compatible with BigQuery. The following table shows exceptions. Redshift BigQuery
UPDATE table SET column = expression [,...] [FROM ...]
UPDATE table SET column = expression [,...] [FROM ...] WHERE TRUE Note: All UPDATE statements in BigQuery require a WHERE keyword, followed by a condition.
UPDATE table SET column = DEFAULT [,...] [FROM ...] [WHERE ...]
UPDATE table SET column = NULL [, ...] [FROM ...] WHERE ... Note: BigQuery’s UPDATE command does not support DEFAULT values. If the Redshift UPDATE statement does not include a WHERE clause, the BigQuery UPDATE statement should be conditioned WHERE TRUE.
DELETE, TRUNCATE statements
The DELETE and TRUNCATE statements are both ways to remove rows from a table without affecting the table schema or indexes. In Redshift, TRUNCATE is recommended over an unqualified DELETE because it is faster and does not require a VACUUM and ANALYZE afterward. However, you can use DELETE statements to achieve the same effect. In BigQuery, the DELETE statement must have a WHERE clause. For more information about DELETE in BigQuery, see the BigQuery DELETE examples in the DML documentation. Redshift BigQuery
32
DELETE [FROM] table_name TRUNCATE [TABLE] table_name
DELETE FROM table_name WHERE TRUE BigQuery DELETE statements require a WHERE clause.
DELETE FROM table_name USING other_table WHERE table_name.id=other_table.id
DELETE FROM table_name WHERE table_name.id IN ( SELECT id FROM other_table ) DELETE FROM table_name WHERE EXISTS ( SELECT id FROM other_table WHERE table_name.id = other_table.id ) In Redshift, USING allows additional tables to be referenced in the WHERE clause. This can be achieved in BigQuery by using a subquery in the WHERE clause.
MERGE statement
The MERGE statement can combine INSERT, UPDATE, and DELETE operations into a single upsert statement and perform the operations atomically. The MERGE operation must match at most one source row for each target row. Redshift does not support a single MERGE command. However, a merge operation can be performed in Redshift by performing INSERT, UPDATE, and DELETE operations in a transaction.
Merge operation by replacing existing rows In Redshift, an overwrite of all of the columns in the target table can be performed using a DELETE statement and then an INSERT statement. The DELETE statement removes rows that should be updated, and then the INSERT statement inserts the updated rows. BigQuery tables are limited to 1,000 DML statements per day, so you should consolidate INSERT, UPDATE, and DELETE statements into a single MERGE statement as shown in the following table.
33
Redshift BigQuery
See Performing a merge operation by replacing existing rows. CREATE TEMP TABLE temp_table; INSERT INTO temp_table SELECT * FROM source WHERE source.filter = 'filter_exp'; BEGIN TRANSACTION; DELETE FROM target USING temp_table WHERE target.key = temp_table.key; INSERT INTO target SELECT * FROM temp_table; END TRANSACTION; DROP TABLE temp_table;
MERGE target USING source ON target.key = source.key WHEN MATCHED AND source.filter = 'filter_exp' THEN UPDATE SET target.col1 = source.col1, target.col2 = source.col2, ... Note: All columns must be listed if updating all columns.
See Performing a merge operation by specifying a column list. CREATE TEMP TABLE temp_table; INSERT INTO temp_table SELECT * FROM source WHERE source.filter = 'filter_exp'; BEGIN TRANSACTION; UPDATE target SET col1 = temp_table.col1, col2 = temp_table.col2 FROM temp_table WHERE target.key=temp_table.key; INSERT INTO target SELECT * FROM
MERGE target USING source ON target.key = source.key WHEN MATCHED AND source.filter = 'filter_exp' THEN UPDATE SET target.col1 = source.col1, target.col2 = source.col2
34
DDL syntax This section addresses differences in data definition language syntax between Redshift and BigQuery.
SELECT INTO statement
In Redshift, the SELECT INTO statement can be used to insert the results of a query into a new table, combining table creation and insertion. Redshift BigQuery
SELECT expression, ... INTO table FROM ...
INSERT table SELECT expression, ... FROM ...
WITH subquery_table AS ( SELECT ... ) SELECT expression, ... INTO table FROM subquery_table ...
INSERT table WITH subquery_table AS ( SELECT ... ) SELECT expression, ... FROM subquery_table ...
SELECT expression INTO TEMP table FROM ... SELECT expression INTO TEMPORARY table FROM ...
BigQuery offers several ways to emulate temporary tables. See the Temporary tables section for more information.
CREATE TABLE statement
Most Redshift CREATE TABLE statements are compatible with BigQuery, except for the following syntax elements, which are not used in BigQuery: Redshift BigQuery
CREATE TABLE table_name ( col1 data_type1 NOT NULL, col2 data_type2 NULL, col3 data_type3 UNIQUE, col4 data_type4 PRIMARY KEY, col5 data_type5 )
CREATE TABLE table_name ( col1 data_type1 NOT NULL, col2 data_type2, col3 data_type3, col4 data_type4, col5 data_type5, )
35
Note: UNIQUE and PRIMARY KEY constraints are informational and are not enforced by the Redshift system.
CREATE TABLE table_name ( col1 data_type1[,...] table_constraints ) where table_constraints are: [UNIQUE(column_name [, ... ])] [PRIMARY KEY(column_name [, ...])] [FOREIGN KEY(column_name [, ...]) REFERENCES reftable [(refcolumn)] Note: UNIQUE and PRIMARY KEY constraints are informational and are not enforced by the Redshift system.
CREATE TABLE table_name ( col1 data_type1[,...] ) PARTITION BY column_name CLUSTER BY column_name [, ...] Note: BigQuery does not use UNIQUE, PRIMARY KEY, or FOREIGN KEY table constraints. To achieve similar optimization that these constraints provide during query execution, partition and cluster your BigQuery tables. CLUSTER BY supports up to 4 columns.
CREATE TABLE table_name LIKE original_table_name
Reference this example to learn how to use the INFORMATION_SCHEMA tables to copy column names, data types, and NOT NULL constraints to a new table.
CREATE TABLE table_name ( col1 data_type1 ) BACKUP NO Note: In Redshift, the BACKUP NO setting is specified to save processing time and reduce storage space.”
The BACKUP NO table option is not used or needed because BigQuery automatically keeps up to 7 days of historical versions of all of your tables with no effect on processing time or billed storage.
CREATE TABLE table_name ( col1 data_type1 ) table_attributes where table_attributes are: [DISTSTYLE {AUTO|EVEN|KEY|ALL}] [DISTKEY (column_name)] [[COMPOUND|INTERLEAVED] SORTKEY (column_name [, ...])]
BigQuery supports clustering, which allows storing keys in sorted order.
CREATE TABLE table_name AS SELECT ...
CREATE TABLE table_name AS SELECT ...
CREATE TABLE IF NOT EXISTS table_name ...
CREATE TABLE IF NOT EXISTS table_name ...
36
BigQuery also supports the DDL statement CREATE OR REPLACE TABLE statement, which overwrites a table if it already exists. BigQuery’s CREATE TABLE statement also supports the following clauses, which do not have a Redshift equivalent:
● PARTITION BY partition_statement ● CLUSTER BY clustering_column_list ● OPTIONS(table_options_list)
For more information about CREATE TABLE in BigQuery, see the BigQuery CREATE TABLE examples in the DML documentation.
Temporary tables Redshift supports temporary tables, which are only visible within the current session. There are several ways to emulate temporary tables in BigQuery:
● Dataset TTL: Create a dataset that has a short time to live (for example, one hour) so that any tables created in the dataset are effectively temporary because they won’t persist longer than the dataset’s time to live. You can prefix all of the table names in this dataset with temp to clearly denote that the tables are temporary.
● Table TTL: Create a table that has a table-specific short time to live using DDL statements similar to the following:
CREATE TABLE temp.name (col1, col2, ...) OPTIONS(expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR));
CREATE VIEW statement
The following table shows equivalents between Redshift and BigQuery for the CREATE VIEW statement.
Redshift BigQuery
CREATE VIEW view_name AS SELECT ... CREATE VIEW view_name AS SELECT ...
CREATE OR REPLACE VIEW view_name AS SELECT ...
CREATE OR REPLACE VIEW view_name AS SELECT ...
CREATE VIEW view_name (column_name, ...) AS SELECT ...
CREATE VIEW view_name AS SELECT ...
37
Not supported. CREATE VIEW IF NOT EXISTS view_name OPTIONS(view_option_list) AS SELECT … Creates a new view only if the view does not exist in the specified dataset.
CREATE VIEW view_name AS SELECT ... WITH NO SCHEMA BINDING In Redshift, a late binding view is required in order to reference an external table.
In BigQuery, to create a view, all referenced objects must already exist. BigQuery allows you to query external data sources.
User-defined functions (UDFs) A UDF lets you create functions for custom operations. These functions accept columns of input, perform actions, and return the result of those actions as a value. Both Redshift and BigQuery support UDF using SQL expressions. Additionally, in Redshift you can create a Python-based UDF, and in BigQuery you can create a JavaScript-based UDF. Refer to the Google Cloud BigQuery utilities GitHub repository for a library of common BigQuery UDFs.
CREATE FUNCTION syntax
The following table addresses differences in SQL UDF creation syntax between Redshift and BigQuery.
Redshift BigQuery
CREATE [OR REPLACE] FUNCTION function_name ([sql_arg_name sql_arg_data_type[,..]]) RETURNS data_type IMMUTABLE AS $$ sql_function_definition $$ LANGUAGE sql
CREATE [OR REPLACE] FUNCTION function_name ([sql_arg_name sql_arg_data_type[,..]]) AS sql_function_definition Note: In BigQuery SQL UDF, return data type is optional. BigQuery infers the result type of the function from the SQL function body when a query calls the function.
38
CREATE [OR REPLACE] FUNCTION function_name ([sql_arg_name sql_arg_data_type[,..]]) RETURNS data_type { VOLATILE | STABLE | IMMUTABLE } AS $$ sql_function_definition $$ LANGUAGE sql
CREATE [OR REPLACE] FUNCTION function_name ([sql_arg_name sql_arg_data_type[,..]]) RETURNS data_type AS sql_function_definition Note: Function volatility is not a configurable parameter in BigQuery. All BigQuery UDF volatility is equivalent to Redshift’s IMMUTABLE volatility (that is, it does not do database lookups or otherwise use information not directly present in its argument list).
CREATE [OR REPLACE] FUNCTION function_name ([sql_arg_name sql_arg_data_type[,..]]) RETURNS data_type IMMUTABLE AS $$ SELECT_clause $$ LANGUAGE sql Note: Redshift supports only a SQL SELECT clause as function definition. Also, the SELECT clause cannot include any of the FROM, INTO, WHERE, GROUP BY, ORDER BY, and LIMIT clauses.
CREATE [OR REPLACE] FUNCTION function_name ([sql_arg_name sql_arg_data_type[,..]]) RETURNS data_type AS sql_expression Note: BigQuery supports any SQL expressions as function definition. However, referencing tables, views, or models is not supported.
CREATE [OR REPLACE] FUNCTION function_name ([sql_arg_name sql_arg_data_type[,..]]) RETURNS data_type IMMUTABLE AS $$ sql_function_definition $$ LANGUAGE sql
CREATE [OR REPLACE] FUNCTION function_name ([sql_arg_name sql_arg_data_type[,..]]) RETURNS data_type AS sql_function_definition Note: Language literal need not be specified in BigQuery SQL UDF. BigQuery interprets the SQL expression by default. Also, the Redshift dollar quoting ($$) is not supported in BigQuery.
CREATE [OR REPLACE] FUNCTION function_name (integer, integer) RETURNS integer IMMUTABLE AS $$ SELECT $1 + $2 $$ LANGUAGE sql
CREATE [OR REPLACE] FUNCTION function_name (x INT64, y INT64) RETURNS INT64 AS SELECT x + y Note: BigQuery UDFs require all input arguments to be named. The Redshift argument variables ($1, $2, …) are not supported in BigQuery.
39
CREATE [OR REPLACE] FUNCTION function_name (integer, integer) RETURNS integer IMMUTABLE AS $$ SELECT $1 + $2 $$ LANGUAGE sql Note: Redshift does not support ANY TYPE for SQL UDFs. However, it supports using ANYELEMENT data type in python based UDFs.
CREATE [OR REPLACE] FUNCTION function_name (x ANY TYPE, y ANY TYPE) AS SELECT x + y Note: BigQuery supports using ANY TYPE as argument type. The function accepts an input of any type for this argument. For more information, see templated parameter in BigQuery.
BigQuery also supports the CREATE FUNCTION IF NOT EXISTS statement, which treats the query as successful and takes no action if a function with the same name already exists. BigQuery’s CREATE FUNCTION statement also supports creating TEMPORARY or TEMP functions, which do not have a Redshift equivalent. See calling UDFs for details on executing a BigQuery persistent UDF.
DROP FUNCTION syntax
The following table addresses differences in DROP FUNCTION syntax between Redshift and BigQuery.
Redshift BigQuery
DROP FUNCTION function_name ( [arg_name] arg_type [, ...] ) [ CASCADE | RESTRICT ]
DROP FUNCTION dataset_name.function_name Note: BigQuery does not require using the function’s signature for deleting the function. Also, removing function dependencies is not supported in BigQuery.
BigQuery also supports the DROP FUNCTION IF EXISTS statement, which deletes the function only if the function exists in the specified dataset. BigQuery requires that you specify the project_name if the function is not located in the current project.
40
UDF components
This section highlights the similarities and differences in UDF components between Redshift and BigQuery.
Component Redshift BigQuery
Name Redshift recommends using the prefix _f for function names to avoid conflicts with existing or future built-in SQL function names.
In BigQuery, you can use any custom function name.
Arguments Arguments are optional. You can use name and data types for Python UDF arguments and only data types for SQL UDF arguments. In SQL UDF, you must refer to arguments using $1, $2, and so on. Redshift also restricts the number of arguments to 32.
Arguments are optional, but if you specify arguments, they must use both name and data types for both JavaScript and SQL UDFs. The maximum number of arguments for a persistent UDF is 256.
Data type Redshift supports a different set of data types for SQL and Python UDFs. For a Python UDF, the data type might also be ANYELEMENT. You must specify a RETURN data type for both SQL and Python UDFs. See Data types in this document for equivalents between data types in Redshift and in BigQuery.
BigQuery supports a different set of data types for SQL and JavaScript UDFs. For a SQL UDF, the data type might also be ANY TYPE. For more information, see templated parameters in BigQuery. The RETURN data type is optional for SQL UDFs. See SQL type encodings in JavaScript for information on how BigQuery data types map to JavaScript data types.
Definition For both SQL and Python UDFs, you must enclose the function definition using dollar quoting, as in a pair of dollar signs ($$), to indicate the start and end of the function statements. For SQL UDF, Redshift supports only a SQL SELECT clause as the function definition. Also, the SELECT clause cannot include any
In BigQuery, you need to enclose the JavaScript code in quotes. See Quoting rules for more information. For SQL UDF, you can use any SQL expressions as the function definition. However, BigQuery doesn’t support referencing tables, views, or models.
41
of the FROM, INTO, WHERE, GROUP BY, ORDER BY, and LIMIT clauses. For Python UDF, you can write a Python program using the Python 2.7 Standard Library or import your custom modules by creating one using the CREATE LIBRARY command.
For JavaScript UDF, you can include external code libraries directly using the OPTIONS section. You can also use the BigQuery UDF test tool to test your functions.
Language You must use the LANGUAGE literal to specify the language as either sql for SQL UDF or plpythonu for Python UDF.
You need not specify LANGUAGE for SQL UDF but must specify the language as js for JavaScript UDF.
State Redshift does not support creating temporary UDFs. Redshift provides an option to define the volatility of a function using VOLATILE, STABLE, or IMMUTABLE literals. This is used for optimization by the query optimizer.
BigQuery supports both persistent and temporary UDFs. You can reuse persistent UDFs across multiple queries, whereas you can only use temporary UDFs in a single query. Function volatility is not a configurable parameter in BigQuery. All BigQuery UDF volatility is equivalent to Redshift’s IMMUTABLE volatility.
Security and privileges
To create a UDF, you must have permission for usage on language for SQL or plpythonu (Python). By default, USAGE ON LANGUAGE SQL is granted to PUBLIC, but you must explicitly grant USAGE ON LANGUAGE PLPYTHONU to specific users or groups. Also, you must be a superuser to replace a UDF.
Granting explicit permissions for creating or deleting any type of UDF is not necessary in BigQuery. Any user assigned a role of BigQuery Data Editor (having bigquery.routines.* as one of the permissions) can create or delete functions for the specified dataset. BigQuery also supports creating custom roles. This can be managed using Cloud IAM.
Limits See Python UDF limits. See UDF limits.
42
Metadata and transaction SQL statements
Redshift BigQuery
SELECT * FROM STL_ANALYZE WHERE name = 'T';
Not used in BigQuery. You don’t need to gather statistics in order to improve query performance. To get information about your data distribution, you can use approximate aggregate functions.
ANALYZE [[ table_name[(column_name [, ...])]]
Not used in BigQuery.
LOCK TABLE table_name; Not used in BigQuery.
BEGIN TRANSACTION; SELECT ... END TRANSACTION;
BigQuery uses snapshot isolation. For details, see Consistency guarantees elsewhere in this document.
EXPLAIN ... Not used in BigQuery. Similar features are the query plan explanation in the BigQuery Cloud Console and the slot allocation, and in audit logging in Cloud Monitoring.
SELECT * FROM SVV_TABLE_INFO WHERE table = 'T';
SELECT * EXCEPT(is_typed) FROM mydataset.INFORMATION_SCHEMA.TABLES; For more information see Introduction to BigQuery INFORMATION_SCHEMA.
VACUUM [table_name] Not used in BigQuery. BigQuery clustered tables are automatically sorted.
Multi-statement and multi-line SQL statements
Redshift supports transactions (sessions) and therefore supports statements separated by semicolons that are consistently executed together. BigQuery scripting enables you to send multiple statements to BigQuery in one request, to use variables, and to use control flow statements such as IF and WHILE. In BigQuery, a script is a SQL statement list to be executed in sequence. A SQL statement list is a list of any valid BigQuery statements that are separated by semicolons. The scripting feature in BigQuery runs these statements as individual queries without rollback.
43
Procedural SQL statements
CREATE PROCEDURE statement
Redshift BigQuery
CREATE or REPLACE PROCEDURE CREATE PROCEDURE if a name is required. Otherwise, use inline with BEGIN or in a single line with CREATE TEMP FUNCTION.
CALL CALL
Variable declaration and assignment
Redshift BigQuery
DECLARE DECLARE Declares a variable of the specified type.
SET SET Sets a variable to have the value of the provided expression, or sets multiple variables at the same time based on the result of multiple expressions.
Error condition handlers
In Redshift, an error encountered during the execution of a stored procedure ends the execution flow, ends the transaction, and rolls back the transaction. This occurs because subtransactions are not supported..In a Redshift-stored procedure, the only supported handler_statement is RAISE. In BigQuery, error handling is a core feature of the main control flow, similar to what other languages provide with TRY ... CATCH blocks.
Redshift BigQuery
BEGIN ... EXCEPTION WHEN OTHERS THEN
BEGIN ... EXCEPTION WHEN ERROR THEN
RAISE RAISE
[ <<label>> ] [ DECLARE declarations ] BEGIN statements EXCEPTION
BEGIN BEGIN ... EXCEPTION WHEN ERROR THEN SELECT 1/0; END;
44
WHEN OTHERS THEN handler_statements END;
EXCEPTION WHEN ERROR THEN -- The exception thrown from the inner exception handler lands here. END;
Cursor declarations and operations
Because BigQuery doesn’t support cursors or sessions, the following statements aren’t used in BigQuery:
● DECLARE cursor_name CURSOR [FOR] ... ● PREPARE plan_name [ (datatype [, ...] ) ] AS statement ● OPEN cursor_name FOR SELECT ... ● FETCH [ NEXT | ALL | {FORWARD [ count | ALL ] } ] FROM cursor_name ● CLOSE cursor_name;
If you’re using the cursor to return a result set, you can achieve similar behavior using temporary tables in BigQuery.
Dynamic SQL statements
The scripting feature in BigQuery supports dynamic SQL statements like those shown in the following table.
Redshift BigQuery
EXECUTE EXECUTE IMMEDIATE
Flow-of-control statements
Redshift BigQuery
IF..THEN..ELSIF..THEN..ELSE..END IF IF condition THEN stmts ELSE stmts END IF
name CURSOR [ ( arguments ) ] FOR query
Cursors or sessions are not used in BigQuery.
[<<label>>] LOOP statements END LOOP [ label ];
LOOP sql_statement_list END LOOP;
WHILE condition LOOP stmts END LOOP WHILE condition DO stmts END WHILE
EXIT BREAK
45
Consistency guarantees and transaction isolation Both Redshift and BigQuery are atomic—that is, ACID-compliant on a per-mutation level across many rows.
Transactions
Redshift supports serializable isolation by default for transactions. Redshift lets you specify any of the four SQL standard transaction isolation levels but processes all isolation levels as serializable. BigQuery helps ensure optimistic concurrency control (first to commit wins) with snapshot isolation, in which a query reads the last committed data before the query starts. This approach guarantees the same level of consistency on a per-row, per-mutation basis and across rows within the same DML statement, yet avoids deadlocks. In the case of multiple DML updates against the same table, BigQuery switches to pessimistic concurrency control. Load jobs can run completely independently and append to tables. However, BigQuery does not yet provide an explicit transaction boundary or session.
Rollback
If Redshift encounters any error while running a stored procedure, it rolls back all changes made in a transaction . Additionally, you can use the ROLLBACK transaction control statement in a stored procedure to discard all changes. There is also no concept of an explicit rollback in BigQuery because there is no explicit transaction boundary in BigQuery. The workarounds are table decorators or using FOR SYSTEM_TIME AS OF.
Database limits Check the BigQuery public documentation for the latest quotas and limits. Many quotas for large-volume users can be raised by contacting the Cloud support team. The following table shows a comparison of the Redshift and BigQuery database limits.
Limit Redshift BigQuery
Tables per database for large and xlarge cluster node types
9,900 Unrestricted
Tables per database for 8xlarge cluster node types
20,000 Unrestricted
User-defined databases you can create per cluster
60 Unrestricted
Maximum row size 4 MB 100 MB
46