solving common sql problems with the seq engine

58
Solving Common SQL Problems with the SeqEngine Beat Vontobel, CTO, MeteoNews AG [email protected] http://seqengine.org Copyright © 2009 Beat Vontobel This work is made available under the Creative Commons Attribution-Noncommercial-Share Alike license, see http://creativecommons.org/licenses/by-nc-sa/3.0/

Upload: mysqlconference

Post on 01-Dec-2014

1.284 views

Category:

Technology


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Solving Common Sql Problems With The Seq Engine

Solving CommonSQL Problemswith theSeqEngineBeat Vontobel, CTO, MeteoNews [email protected]

http://seqengine.org

Copyright © 2009 Beat VontobelThis work is made available under the Creative Commons Attribution-Noncommercial-Share Alike license, seehttp://creativecommons.org/licenses/by-nc-sa/3.0/

Page 2: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Solving SQL Problems with the SeqEngine• How to benefit from simple auxiliary tables holding

sequences

• Use of a pluggable storage engine to create such tables

• On the side:

‣ Some interesting benchmarks

‣ MySQL-Optimizer caveats

‣ Remember once more how to do things the „SQL-way“

Page 3: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Sequences: What are we talking about?CREATE TABLE integer_sequence ( i INT NOT NULL PRIMARY KEY);

INSERT INTO integer_sequence (i)VALUES (1), (2), (3), (4), (5), (6), (7), (8);

SELECT * FROM integer_sequence;+---+| i |+---+| 1 | | 2 | | 3 | | 4 | …

Page 4: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Names used by others…• Pivot Table

‣ Can be used to „pivot“ other tables („turn them around“)

• Integers Table

‣ They often hold integers as data type

• Auxiliary/Utility Table

‣ They help us solve problems, but contain no actual data

• Sequence Table

‣ Just what it is: The name I‘ll use

Page 5: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

What we‘re not talking about (1)-- Oracle Style Sequences-- -- (mostly used to generate primary keys, much-- like what MySQL‘s auto_increment feature is-- used for)

CREATE SEQUENCE customers_seq START WITH 1000 INCREMENT BY 1;

INSERT INTO customers (customer_id, name)VALUES (customers_seq.NEXTVAL, 'John Doe');

Page 6: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

What we‘re not talking about (2)• Sequence in Mathematics:

‣ „an ordered list of objects“

‣ n-tuple

• Sequence Table in SQL:

‣ a set, unordered by definition

F = {n | 1 ≤ n ≤ 20; n is integer}

‣ relation (set of 1-tuples)

Page 7: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

„Using such a utility table is a favorite old trick of experienced SQL developers“

(Stéphane Faroult: The Art of SQL)

Page 8: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Finding Holes… typically Swiss!

Page 9: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Finding Holes… in a Table!+---------+---------------------+------+---| stat_id | datetime | tt | …+---------+---------------------+------+------+| … … | … … | || ABO | 2004-11-03 22:40:00 | 8.3 | …| ABO | 2004-11-03 22:50:00 | 8.7 | | ABO | 2004-11-03 23:00:00 | 9.9 | | ABO | 2004-11-03 23:10:00 | 7.8 | | ABO | 2004-11-04 00:10:00 | 9.2 | | ABO | 2004-11-04 00:20:00 | 9.1 | | ABO | 2004-11-04 00:30:00 | 10.2 | | ABO | 2004-11-04 00:40:00 | 9.3 | | | … | … … …| | | |+---------+---------------------+------+----

Page 10: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Finding Holes… in a Table!+---------+---------------------+------+---| stat_id | datetime | tt | …+---------+---------------------+------+------+| … … | … … | || ABO | 2004-11-03 22:40:00 | 8.3 | …| ABO | 2004-11-03 22:50:00 | 8.7 | | ABO | 2004-11-03 23:00:00 | 9.9 | | ABO | 2004-11-03 23:10:00 | 7.8 | | ABO | 2004-11-04 00:10:00 | 9.2 | | ABO | 2004-11-04 00:20:00 | 9.1 | | ABO | 2004-11-04 00:30:00 | 10.2 | | ABO | 2004-11-04 00:40:00 | 9.3 | | | … | … … …| | | |+---------+---------------------+------+----

Page 11: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

The table‘s create statement used for demoCREATE TABLE temperatures (

stat_id CHAR(3) NOT NULL, datetime TIMESTAMP NOT NULL, tt decimal(3,1) DEFAULT NULL,

PRIMARY KEY (stat_id, datetime), UNIQUE KEY reverse_primary (datetime, stat_id)

);

Page 12: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

How to „SELECT“ a row that doesn‘t exist?• SELECT only returns rows that are there

• WHERE only filters rows

• We need something to generate rows!

Page 13: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Finding Holes… the naïve wayfor(„all timestamps to check“) {

/* Single SELECTs for every timestamp */db_query(„SELECT COUNT(*) FROM temperatures WHERE stat_id = ? AND datetime = ?“);

if(„no row found“) {warn_about_missing_row(„timestamp“);

}}

Page 14: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Finding Holes… the „standard“ way/* Working with an ordered set */db_query(„SELECT datetime FROM temperatures WHERE stat_id = ? ORDER BY datetime ASC“);

for(„all timestamps to check“) {

db_fetch_row();

while(„timestamps don‘t match“) {warn_about_missing_row();increment_timestamp();

}}

Page 15: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

These were just ordinary JOINs!• for-Loop just walks an „imaginary“ timestamps table with a

sequence of all the values to check for!

• Then we LEFT JOIN these timestamps against our temperatures

‣ or do a NOT EXIST subquery

• So, if we had a sequence table…

Page 16: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Sequence of timestampsCREATE TABLE timestamps (

datetime TIMESTAMP NOT NULL PRIMARY KEY);

INSERT INTO timestamps (datetime)VALUES ('2004-01-01 00:00:00'), ('2004-01-01 00:00:10'), …;

SELECT * FROM timestamps;+---------------------+| datetime |+---------------------+| 2004-01-01 00:00:00 | | 2004-01-01 00:00:10 | | … |

Page 17: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Queries using the sequenceSELECT *FROM timestamps -- our „for-Loop“LEFT JOIN temperaturesON timestamps.datetime = temperatures.datetimeWHERE temperatures.datetime IS NULL;

SELECT *FROM timestamps -- our „for-Loop“WHERE NOT EXISTS ( SELECT * FROM temperatures WHERE temperatures.datetime = timestamps.datetime );

Page 18: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Finding missing rows

datetime…

2004-11-03 23:00:00

2004-11-03 23:10:00

2004-11-03 23:20:00

2004-11-03 23:30:00

2004-11-03 23:40:00

2004-11-03 23:50:00

2004-11-04 00:00:00

2004-11-04 00:10:00

stat_id datetime tt… … …

ABO 2004-11-03 23:00:00 9.9

ABO 2004-11-03 23:10:00 7.8

NULL NULL NULL

NULL NULL NULL

NULL NULL NULL

NULL NULL NULL

NULL NULL NULL

ABO 2004-11-04 00:10:00 9.2

… … …

timestamps temperatures

WHERE temperatures.stat_id IS NULL

Page 19: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Filling sequence tables: „Manually“• INSERT from an external loop (or a stored procedure)

• „explode“ a few rows using CROSS JOINs

‣ INSERT INTO i VALUES (1), (2), …, (8), (9), (10);

‣ INSERT INTO j SELECT u.i * 10 + v.i FROM i AS u CROSS JOIN i AS v;

• „Pop quiz: generate 1 million records“ (Giuseppe Maxia)http://datacharmer.blogspot.com/2007/12/pop-quiz-generate-1-

million-records.html

Page 20: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

…or, just use the SeqEngine

-- http://seqengine.org-- README for build instructions

INSTALL PLUGIN SeqEngine SONAME 'ha_seqengine.so';

SHOW PLUGIN; SHOW ENGINES;

CREATE TABLE million (i TIMESTAMP NOT NULL)ENGINE=SeqEngine CONNECTION=‘1;1000000;1‘;

-- If you want to… now it‘s materialized (fast!)ALTER TABLE million ENGINE=MyISAM;

Page 21: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Syntax-- Variable parts are highlighted

CREATE TABLE table_name (column_name {INT|TIMESTAMP} NOT NULL [PRIMARY KEY]

) ENGINE=SeqEngine CONNECTION=‘start;end;increment‘;

Page 22: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

„Manually“ created: Disadvantages• Wastes storage

• Wastes RAM (for caches or if ENGINE=MEMORY)

• Wastes I/O

• Wastes CPU (unnecessary overhead in code)

• Cumbersome to fill (especially if large)

Page 23: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

SeqEngine: Disadvantages• None

• …other than:

‣ It‘s (yet) just a really quick hack for this presentation

‣ Contains ugly code and probably a lot of bugs

‣ Coded in C++ by somebody who‘s never done C++ before

‣ Is not part of the core server – go build it yourself!

Page 24: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Limitations of the SeqEngine (v0.1)• Not real limitations, but due to the concept:

‣ Read-only

‣ One column maximum

‣ UNIQUE keys only

• Current limitations:

‣ INT and TIMESTAMP only

‣ Only full key reads

‣ Error checking, clean-up, optimization, bugs…

Page 25: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

The tiny core of the SeqEngine: Initint ha_seqengine::rnd_init(bool scan){ DBUG_ENTER("ha_seqengine::rnd_init");

rnd_cursor_pos = share->seq_def.seq_start;

DBUG_RETURN(0);}

Page 26: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

The tiny core of the SeqEngine: Next rowint ha_seqengine::rnd_next(uchar *buf){ DBUG_ENTER("ha_seqengine::rnd_next");

if(rnd_cursor_pos <= share->seq_def.seq_end) { build_row(buf, rnd_cursor_pos); rnd_cursor_pos += share->seq_def.seq_inc; table->status= 0; DBUG_RETURN(0); }

table->status= STATUS_NOT_FOUND; DBUG_RETURN(HA_ERR_END_OF_FILE);}

Page 27: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

SeqEngine: The BOF• Using the Storage Engine API for small projects

• Additional questions/discussion

‣ Wednesday, April 22

‣ 20:30pm

‣ Ballroom E

Page 28: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Back to the missing rows example…

datetime…

2004-11-03 23:00:00

2004-11-03 23:10:00

2004-11-03 23:20:00

2004-11-03 23:30:00

2004-11-03 23:40:00

2004-11-03 23:50:00

2004-11-04 00:00:00

2004-11-04 00:10:00

stat_id datetime tt… … …

ABO 2004-11-03 23:00:00 9.9

ABO 2004-11-03 23:10:00 7.8

NULL NULL NULL

NULL NULL NULL

NULL NULL NULL

NULL NULL NULL

NULL NULL NULL

ABO 2004-11-04 00:10:00 9.2

… … …

timestamps temperatures

WHERE temperatures.stat_id IS NULL

Page 29: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

SeqEngine (LEFT JOIN)SELECT timestamps.datetime, stations.stat_id

FROM timestamps CROSS JOIN stations

LEFT JOIN temperatures AS tempsON (temps.datetime, temps.stat_id) = (timestamps.datetime, stations.stat_id)

WHERE stations.stat_id = 'ABO'AND temperatures.stat_id IS NULL;

Page 30: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

SeqEngine (NOT EXISTS)SELECT timestamps.datetime, stations.stat_id

FROM timestamps CROSS JOIN stations

WHERE stations.stat_id = 'ABO'

AND NOT EXISTS ( SELECT * FROM temperatures AS temps WHERE (temps.datetime, temps.stat_id) = (timestamps.datetime, stations.stat_id));

Page 31: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Finding Holes… the naïve wayfor(„all timestamps to check“) {

/* Single SELECTs for every timestamp */db_query(„SELECT COUNT(*) FROM temperatures WHERE stat_id = ? AND datetime = ?“);

if(„no row found“) {warn_about_missing_row(„timestamp“);

}}

Page 32: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

As a Procedure (Single SELECTs)CREATE PROCEDURE find_holes_naive(stat CHAR(3))BEGIN DECLARE dt DATETIME DEFAULT '2004-01-01 00:00:00'; DECLARE c INT;

WHILE dt < '2005-01-01 00:00:00' DO SELECT COUNT(*) INTO c FROM temperatures WHERE (stat_id, datetime) = (stat, dt);

IF c = 0 THEN -- missing row SELECT stat, dt; END IF;

SET dt = dt + INTERVAL 10 MINUTE; END WHILE;END //

Page 33: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Finding Holes… the „standard“ way/* Working with an ordered set */db_query(„SELECT datetime FROM temperatures WHERE stat_id = ? ORDER BY datetime ASC“);

for(„all timestamps to check“) {

db_fetch_row();

while(„timestamps don‘t match“) {warn_about_missing_row();increment_timestamp();

}}

Page 34: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

As a Procedure (Ordered Set)CREATE PROCEDURE find_holes_ordered(stat CHAR(3))BEGIN DECLARE no_more_rows BOOLEAN DEFAULT FALSE; DECLARE dt1 DATETIME DEFAULT '2004-01-01 00:00:00'; DECLARE dt2 DATETIME;

DECLARE temperatures_cursor CURSOR FOR SELECT datetime FROM temperatures WHERE stat_id = stat ORDER BY datetime ASC;

DECLARE CONTINUE HANDLER FOR NOT FOUND SET no_more_rows = TRUE;

OPEN temperatures_cursor;

temperatures_loop: LOOP FETCH temperatures_cursor INTO dt2;

WHILE dt1 != dt2 DO SELECT stat, dt1; SET dt1 = dt1 + INTERVAL 10 MINUTE; IF dt1 >= '2005-01-01 00:00:00' THEN LEAVE temperatures_loop; END IF; END WHILE;

SET dt1 = dt1 + INTERVAL 10 MINUTE; IF dt1 >= '2005-01-01 00:00:00' THEN LEAVE temperatures_loop; END IF;

END LOOP temperatures_loop;

CLOSE temperatures_cursor;END//

Page 35: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Self-Reference (LEFT self-JOIN)SELECT *FROM temperatures

LEFT JOIN temperatures AS missing

ON temperatures.stat_id = missing.stat_idAND temperatures.datetime + INTERVAL 10 MINUTE = missing.datetime

WHERE temperatures.stat_id = 'ABO'AND missing.datetime IS NULL;

Page 36: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Self-Reference (NOT EXISTS)SELECT *FROM temperatures

WHERE NOT EXISTS (

SELECT * FROM temperatures AS missing WHERE missing.datetime = temperatures.datetime + INTERVAL 10 MINUTE AND missing.stat_id = temperatures.stat_id

)

AND stat_id = 'ABO';

Page 37: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

What‘s the performance?• SeqEngine

‣ LEFT JOIN

‣ NOT EXISTS

• Self-Reference

‣ LEFT self-JOIN

‣ NOT EXISTS

• Stored Procedures

‣ Naïve (Single SELECTs)

‣ Standard (Ordered SET)

Page 38: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

The benchmark

Query Remarks Time [s]1

2

3

4

5

6

SeqEngine (NOT EXISTS) 0.28

SeqEngine (LEFT JOIN) 0.29

Procedure (Ordered SET) result set per missing row 0.59

Self (NOT EXISTS) only first missing row 0.93

Self (LEFT JOIN) only first missing row 1.10

Procedure (Single SELECTs) result set per missing row 2.80

All the usual disclaimers for benchmarks apply: Go ahead and measure it with your hardware, your version of MySQL, your storage engines, your data sets and your server configuration settings.

Page 39: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

The benchmark

1. SeqEngine (NOT EXISTS)

2. SeqEngine (LEFT JOIN)

3. Procedure (Ordered SET)

4. Self Reference (NOT EXISTS)

5. Self Reference (LEFT JOIN)

6. Procedure (Single SELECTs)

0s 0.5s 1.0s 1.5s 2.0s 2.5s 3.0s

2.80s

1.10s

0.93s

0.59s

0.29s

0.28s

All the usual disclaimers for benchmarks apply: Go ahead and measure it with your hardware, your version of MySQL, your storage engines, your data sets and your server configuration settings.

Page 40: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Lessons to be learned…• The Sequence trick (and SeqEngine) worked

‣ It may sometimes pay off to go the extra mile and write a custom storage engine!

• Stored PROCEDUREs with CURSORs sometimescan be damned fast!

• Subquery optimization really did progress in MySQL(at least in some parts, more to come with 6.0)

‣ Consider NOT EXISTS over LEFT JOIN

Page 41: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

2nd use case: Generate Test Datamysql> CREATE TABLE large (i INT NOT NULL) ENGINE=SeqEngine CONNECTION='1;10000000;1';

Query OK, 0 rows affected (0,12 sec)

mysql> ALTER TABLE large ENGINE=MyISAM;

Query OK, 10000000 rows affected (3,27 sec)Records: 10000000 Duplicates: 0 Warnings: 0

Page 42: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Generating other Sequences from IntegersCREATE VIEW letters AS

SELECT CHAR(i) FROM integer_sequence;

CREATE VIEW timestamps ASSELECT FROM_UNIXTIME(i) FROM integer_sequence;

CREATE VIEW squares ASSELECT i*i FROM integer_sequence;

Page 43: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Generate very large and complex data setsINSERT INTO customersSELECT i AS customer_id, MD5(i) AS customer_name, ROUND(RAND()*80+1920) AS customer_year FROM large;

SELECT * FROM customers;+-------------+---------------------+---------------+| customer_id | customer_name | customer_year |+-------------+---------------------+---------------+| 1 | c4ca4238a0b9f75849… | 1935 | | 2 | c81e728d9d4c2f636f… | 1967 | | || || … | … | … | +-------------+---------------------+---------------+10000000 rows in set

Page 44: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

„Salvage“ a bad designOne-to-Many gone wrong:

Table `users`+----------+--------+---------+---------+| username | sel1 | sel2 | sel3 |+----------+--------+---------+---------+| john | apple | orange | pear | | bill | NULL | NULL | NULL | | emma | banana | pear | NULL | +----------+--------+---------+---------+

Page 45: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

„Salvage“ a bad designCREATE TABLE salvage ( col INT NOT NULL) ENGINE=SeqEngine CONNECTION='1;3;1';

+-----+| col |+-----+| 1 | | 2 | | 3 | +-----+

Page 46: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

„Multiply“ the rows with a cartesian JOINmysql> SELECT * FROM users CROSS JOIN salvage;

+----------+--------+--------+------+-----+| username | sel1 | sel2 | sel3 | col |+----------+--------+--------+------+-----+| bill | NULL | NULL | NULL | 1 | | bill | NULL | NULL | NULL | 2 | | bill | NULL | NULL | NULL | 3 | | emma | banana | pear | NULL | 3 | | emma | banana | pear | NULL | 1 | | emma | banana | pear | NULL | 2 | | john | apple | orange | pear | 1 | | john | apple | orange | pear | 2 | | john | apple | orange | pear | 3 | +----------+--------+--------+------+-----+9 rows in set (0,00 sec)

Page 47: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

„Multiply“ the rows with a cartesian JOINmysql> SELECT * FROM users CROSS JOIN salvage;

+----------+--------+--------+------+-----+| username | sel1 | sel2 | sel3 | col |+----------+--------+--------+------+-----+| bill | NULL | NULL | NULL | 1 | | bill | NULL | NULL | NULL | 2 | | bill | NULL | NULL | NULL | 3 | | emma | banana | pear | NULL | 3 | | emma | banana | pear | NULL | 1 | | emma | banana | pear | NULL | 2 | | john | apple | orange | pear | 1 | | john | apple | orange | pear | 2 | | john | apple | orange | pear | 3 | +----------+--------+--------+------+-----+9 rows in set (0,00 sec)

Page 48: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Normalized on the flySELECT username, CASE col WHEN 1 THEN sel1 WHEN 2 THEN sel2 WHEN 3 THEN sel3 END AS selFROM users CROSS JOIN salvageHAVING sel IS NOT NULL;

+----------+--------+| username | sel |+----------+--------+| john | apple | | emma | banana | | john | orange | | emma | pear | | john | pear | +----------+--------+

Page 49: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Comma-Separated Attribute Listsmysql> DESCRIBE selections;+------------+--------------+------+-----+---------+| Field | Type | Null | Key | Default |+------------+--------------+------+-----+---------+| username | varchar(5) | NO | PRI | NULL || selections | varchar(255) | NO | | NULL |+------------+--------------+------+-----+---------+

mysql> SELECT * FROM selections;+----------+-------------------+| username | selections |+----------+-------------------+| john | apple,orange,pear || bill | || emma | banana,pear |+----------+-------------------+

Page 50: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Querying Comma-Separated Attribute ListsSELECT username, SUBSTRING_INDEX(

SUBSTRING_INDEX( selections, ',', i ),

',', -1 ) AS selectionFROM selectionsJOIN integersHAVING selection NOT LIKE '';

Page 51: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Querying Comma-Separated Attribute ListsSELECT username, -- Take last element SUBSTRING_INDEX(

-- Crop list after element i SUBSTRING_INDEX( -- Add empty sentinel element CONCAT(selections, ','), ',', i ),

',', -1 ) AS selectionFROM selectionsJOIN integersHAVING selection NOT LIKE '';

Page 52: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Querying Comma-Separated Attribute ListsSELECT username, -- Take last element SUBSTRING_INDEX(

-- Crop list after element i SUBSTRING_INDEX( -- Add empty sentinel element CONCAT(selections, ','), ',', i ),

',', -1 ) AS selectionFROM selectionsJOIN integersHAVING selection NOT LIKE '';

Page 53: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Querying Comma-Separated Attribute ListsSELECT username, -- Take last element SUBSTRING_INDEX(

-- Crop list after element i SUBSTRING_INDEX( -- Add empty sentinel element CONCAT(selections, ','), ',', i ),

',', -1 ) AS selectionFROM selectionsJOIN integersHAVING selection NOT LIKE '';

Page 54: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Counting members from attribute listsSELECT SUBSTRING_INDEX( SUBSTRING_INDEX( CONCAT(selections, ','), ',', i ), ',', -1 ) AS selection, COUNT(*)FROM selections JOIN integersGROUP BY selectionHAVING selection NOT LIKE '';+-----------+----------+| selection | COUNT(*) |+-----------+----------+| apple | 1 || banana | 1 || orange | 1 || pear | 2 |+-----------+----------+

Page 55: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Problem: Variable-sized IN-Predicates• Statements can‘t be prepared for variable-sized lists in the

in clause:

‣ SELECT * FROM x WHERE a IN (?)

• One needs:

‣ SELECT * FROM x WHERE a IN (?)

‣ SELECT * FROM x WHERE a IN (?, ?)

‣ SELECT * FROM x WHERE a IN (?, ?, ?, …)

• Example from Stéphane Faroult: „The Art of SQL“adapted for MySQL

Page 56: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Split arguments as before!SELECT …FROM rentalINNER JOIN customer ON rental.customer_id = …INNER JOIN address ON ……INNER JOIN ( SELECT SUBSTRING_INDEX( SUBSTRING_INDEX(CONCAT(?, ","), ",", i), ",", -1 ) AS customer_id FROM sequences.integers WHERE i <= ?) AS s ON rental.customer_id = s.customer_id…WHERE …;

Page 57: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

SQL-String-Parsing beats Query-Parsing!

0

4

8

12

16

20

1000

x

1000

0x

2000

0x

3000

0x

0.6

6.0

12.1

18.1

0.5

5.4

10.9

16.3

Execution Times in Seconds for a different number of runs (lower is better)

Prepared/Sequence Client-side IN-List

Page 58: Solving Common Sql Problems With The Seq Engine

Beat VontobelCTO, MeteoNews [email protected]://seqengine.org

Sequences and SeqEngine: Conclusion• Use Sequences (and SeqEngine) to e.g.:

‣ Find missing rows

‣ Generate test data

‣ Pivot tables

‣ Do clever-things with „for-Loops“ (String-Parsing etc.)

• http://seqengine.org

‣ Slides will be available shortly after the presentation (also on conference website)