denormalizing data with proc sql. demoralizing data with proc sql is that a real word?? spell...

19
Denormalizing Data with PROC SQL

Upload: leslie-small

Post on 18-Jan-2016

220 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

Denormalizing Data with PROC SQL

Page 2: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

Demoralizing Data withPROC SQL

Is that a real word?? Spell Check…

Page 3: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

Grocery Store

Page 4: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

What You HaveCUSTOMER_ID NAME

1 Ansel

2 Fiona

3 James

4 Kathy

5 Ying

6 Otto

7 Costas

8 Abdul

9 Enrico

10 Mitzu

ITEM_ID ITEM_NAME

1 eggs

2 milk

3 bread

4 chicken

5 beef

6 broccoli

7 carrots

8 apples

9 peaches

10 dog food

CUSTOMER_ID ITEM_ID

1 1

1 2

1 4

1 6

2 2

2 3

2 4

2 7

2 8

… …10 4

10 5

10 8

10 10

• Relational tables in normal form

• Purchase events in many-to-many relation

• Good for relational storage, bad for computing stats

• Data step: join tables using complex merge

Page 5: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

What you wantCUSTOMER_ID NAME EGGS MILK BREAD … DOG_FOOD

1 Ansel 1 1 0 … 0

2 Fiona 1 1 1 … 1

3 James 0 1 0 … 0

4 Kathy 1 0 1 … 0

5 Ying 1 1 1 … 1

6 Otto 0 0 1 … 1

7 Costas 0 0 1 … 1

8 Abdul 0 1 0 … 0

9 Enrico 1 0 1 … 1

10 Mitzu 0 1 0 … 1

• Matrix shows who bought what• One row per customer• One column per item• Easy to compute stats

Page 6: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

How do you get this?

A few SQL examplesto build up to a solution…

Page 7: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

SQL Examples

What items did customer #1 buy?

select item_id from purchaseswhere customer_id = 1;

ITEM_ID---------- 1 2 4 6

Page 8: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

What items did customer #1 buy?Join with grocery table to get item name

select P.item_id, G.item_name fromfrom purchases P, groceries Gwhere G.item_id = P.item_idand P.customer_id = 1;

ITEM_ID ITEM_NAME---------- --------- 1 eggs 2 milk 4 chicken 6 broccoli

Page 9: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

How many customers bought eggs?

Use SQL aggregate function count().

select count(*)

from purchases P, groceries G

where P.item_id = G.item_id

and G.item_name = 'eggs'

COUNT(*)

--------

5

Page 10: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

Did customer #1 buy eggs?

Restrict by customer, count() function returns 0 or 1, i.e., yes or no

select count(*)

from groceries G, purchases P

where P.item_id = G.item_id

and G.item_name = 'eggs'

and P.customer_id = 1;

COUNT(*)

--------

1

Page 11: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

Did customer #10 buy eggs?

select count(*)

from groceries G, purchases P

where P.item_id = G.item_id

and G.item_name = 'eggs'

and P.customer_id = 10;

COUNT(*)

--------

0

Page 12: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

Subqueries• In SQL, select clause can include a query that

returns a scalar value

select name, (select count(*) from purchases) num_purchasesfrom customers

NAME NUM_PURCHASES----------- -------------Ansel 58Fiona 58James 58Kathy 58Ying 58Otto 58Costas 58Abdul 58Enrico 58Mitzu 58

Page 13: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

Correlated Subqueries• Relate inner and outer queries via alias

select name, (select count(*)

from purchaseswhere customer_id = C.customer_id) num_purchases

from customers C;

NAME NUM_PURCHASES----------- -------------Ansel 4Fiona 9James 6Kathy 3Ying 8Otto 7Costas 7Abdul 2Enrico 7Mitzu 5

Page 14: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

Putting the pieces together

Joins to get data from multiple tables

Count() to get 0/1, yes/no

Correlated subqueries rotate rows to columns

Aliases to name columns

Page 15: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

Final queryselect customer_id, name, (select count(*) from purchases P, groceries G where G.item_id = P.item_id and G.item_name = 'eggs' and P.customer_id = C.customer_id) eggs,

(select count(*) from purchases P, groceries G where G.item_id = P.item_id and G.item_name = 'milk' and P.customer_id = C.customer_id) milk,

(select count(*) from purchases P, groceries G where G.item_id = P.item_id and G.item_name = 'bread' and P.customer_id = C.customer_id) breadfrom customers C;

Page 16: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

SAS Codeproc sql;

create table Work.Purchase_Matrix as select

customer_id,name,(select count(*) from purchases P, groceries G where G.item_id = P.item_id and G.item_name = 'eggs' and P.customer_id = C.customer_id) eggs,

(select count(*) from purchases P, groceries G where G.item_id = P.item_id and G.item_name = 'milk' and P.customer_id = C.customer_id) milk,

(select count(*) from purchases P, groceries G where G.item_id = P.item_id and G.item_name = 'bread' and P.customer_id = C.customer_id) bread

from customers C;

quit;

Page 17: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

Final Dataset

CUSTOMER_ID NAME EGGS MILK BREAD

----------- ----------- ---------- ---------- ----------

1 Ansel 1 1 0

2 Fiona 1 1 1

3 James 0 1 0

4 Kathy 1 0 1

5 Ying 1 1 1

6 Otto 0 0 1

7 Costas 0 0 1

8 Abdul 0 1 0

9 Enrico 1 0 1

10 Mitzu 0 1 0

Page 18: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

Resources

SQL may not always be the most appropriate choice for a given problem. This technique starts to get untenable as the number of columns needed in the output increases.

DATA Step vs. PROC SQL: What’s a neophyte to do?

http://www2.sas.com/proceedings/sugi29/269-29.pdf

Proc SQL versus The Data Step

http://www.nesug.org/proceedings/nesug06/hw/hw06.pdf

Page 19: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…

Questions?