![Page 1: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/1.jpg)
Build your Data Skills: Introduction to SQLKaeli Samson, MA, MPHDepartment of BiostatisticsCollege of Public HealthUniversity of Nebraska Medical Center
June 25th, 2019
![Page 2: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/2.jpg)
2
Top 3 Skills:1. SQL2. Github3. Marketing
Hadley Wickham, PhDChief Scientist, RStudio
![Page 3: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/3.jpg)
3
• Brief Intro to SQL• Terminology• General syntax/structure• Description of Dataset• Basic Queries• Creating New Variables• Joins• Helpful SQL Code
Overview
![Page 4: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/4.jpg)
4
SQL = Structured Query Language
Typically associated with use in database management, but also great for data management, generally!
Brief Introduction to SQL
![Page 5: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/5.jpg)
5
Brief Background in Database DesignStudent Student
ContactCourse Course
DescriptionInstructor Instructor
Contact
Josie 555-1234 Calculus II Integration Julie 555-8888
Ken 555-9845 Calculus II Integration Joe 555-2222
Brooke 555-7878 Calculus II Integration Julie 555-8888
Addison 555-1111 Calculus II Integration Julie 555-8888
Cole 555-6127 Calculus II Integration Julie 555-8888
Samantha 555-1534 Calculus II Integration Joe 555-2222
Josh 555-5463 Calculus II Integration Joe 555-2222
Josie 555-1234 GIS I Mapping Paul 555-3333
Ken 555-9845 GIS I Mapping Paul 555-3333
![Page 6: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/6.jpg)
6
Brief Background in Database Design
InstructorID
InstructorName
Instructor Contact
1 Julie 555-8888
2 Joe 555-2222
3 Paul 555-3333
Course ID Course Name
Course Description
A Calculus II Integration
B GIS I Mapping
StudentID
Student Name
StudentContact
1 Josie 555-1234
2 Ken 555-9845
3 Brooke 555-7878
4 Addison 555-1111
5 Cole 555-6127
6 Samantha 555-1534
7 Josh 555-5463
Enrollment ID Student ID Course ID Instructor ID
1 1 A 1
2 2 A 2
3 1 B 3
![Page 7: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/7.jpg)
7
SQL can be used to “query” data, but can do more, such as:• Create new variables• Join tables together• Insert observations• Edit observations • Delete observations
Brief Introduction to SQL
![Page 8: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/8.jpg)
Terminology
![Page 9: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/9.jpg)
9
S-Q-L vs. Sequel?
![Page 10: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/10.jpg)
10
Terminology
Dataset
Observation
Variable
Table
Row
Column
SAS SQL
Note: Since this presentation uses SAS to run SQL, SAS terms will be used interchangeably with SQL terms, although I acknowledge in some fields of study these terms are not considered synonymous.
![Page 11: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/11.jpg)
General Structure and Syntax of SQL
![Page 12: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/12.jpg)
12
SELECT Choose variables/columns for your table
FROM Indicate source(s) of data (i.e. datasets)
WHERE Subsetting criteria for rows
GROUP BY Grouping desired for summary variables
ORDER BY Sort order for rows
Common SQL Clauses
Clauses must be in this specific order!
![Page 13: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/13.jpg)
13
Common SQL ClausesCREATE TABLESELECTFROMWHEREGROUP BYORDER BY
SELECTFROMWHEREGROUP BYORDER BY
Prints query result (i.e. table) to output
Saves query result (i.e. table) as SAS dataset
![Page 14: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/14.jpg)
14
SQL in SASPROC SQL ;
SELECTFROMWHEREGROUP BYORDER BY ;
QUIT ;
Specific to SAS
![Page 15: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/15.jpg)
15
SQL in SASNote: While SQL is an ANSIstandard language, each software that runs it, including SAS, may have their own options that are specific to that software. As such, some of the code in this presentation may not work outside SAS, but the general principles will still apply.
![Page 16: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/16.jpg)
Structure of Example Dataset
![Page 17: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/17.jpg)
17
• State Identifiers (x3)• Name• Abbreviation• FIPS Code
• State Information• Division• Population• Number of representatives• Change in number of seats
Data: sashelp.us_data
![Page 18: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/18.jpg)
18
sashelp.us_data (abbr.)
![Page 19: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/19.jpg)
19
data us_pop;set sashelp.us_data
(rename=(population_2010=pop_2010));keep statename state division pop_2010;
run;
New Dataset
us_pop
![Page 20: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/20.jpg)
Basic Queries
![Page 21: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/21.jpg)
21
Printing all variables and observations in a dataset
proc sql;select var1, var2, var3, var4from dataset;
quit;
Basic Structure of SQL Code
SQL
![Page 22: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/22.jpg)
22
Basic Structure of SQL Code
dataset
variables
dataset
variables
Traditional SAS Code:
Printing all variables and observations in a dataset
proc print data=us_pop;var statename pop_2010 state division;
run;
proc sql;select statename, pop_2010, state, divisionfrom us_pop;
quit;
![Page 23: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/23.jpg)
23
Printing all variables and observations in a dataset
Basic Structure of SQL Code
Proc Print Proc SQL
![Page 24: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/24.jpg)
24
Printing all variables and observations in a dataset
proc sql;select *from dataset;
quit;
Basic Structure of SQL Code
select all variables
![Page 25: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/25.jpg)
25
Printing all observations in a dataset
proc print data=us_pop;run;
proc sql;select *from us_pop;
quit;
Basic Structure of SQL Code
Traditional SAS Code:
![Page 26: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/26.jpg)
26
Printing all observations in a dataset
Basic Structure of SQL Code
![Page 27: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/27.jpg)
27
Printing unique observations in a dataset
proc sql;select divisionfrom us_pop;
quit;
proc sql;select distinct divisionfrom us_pop;
quit;
Basic Structure of SQL Code
only select unique observations
![Page 28: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/28.jpg)
28
Printing unique observations in a dataset
Basic Structure of SQL Code
proc sql;select distinct divisionfrom us_pop;
quit;
proc sql;select divisionfrom us_pop;
quit;
![Page 29: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/29.jpg)
29
Printing a subset of observations
proc sql;select *from datasetwhere var in (“A”, “B”, “C”);
quit;
Basic Structure of SQL Code
restrict observations with where clause
![Page 30: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/30.jpg)
30
Printing a subset of observations
proc print data=us_pop;where division in ("West North Central", "Mountain");
run;
proc sql;select *from us_popwhere division in ("West North Central", "Mountain");
quit;
Basic Structure of SQL Code
Traditional SAS Code:
![Page 31: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/31.jpg)
31
Printing a subset of observations
Basic Structure of SQL Code
![Page 32: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/32.jpg)
32
Other examples of where clause in SQLproc sql;
select *from us_popwhere pop_2010 between 0 and 1000000;
quit;
proc sql;select *from us_popwhere 0 le pop_2010 le 1000000 and division = "Mountain";
quit;
Basic Structure of SQL Code
![Page 33: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/33.jpg)
33
Other examples of where clause in SQL
Basic Structure of SQL Code
where pop_2010 between 0 and 1000000;
where 0 le pop_2010 le 1000000 and division = "Mountain";
![Page 34: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/34.jpg)
34
Sort observations in a dataset
proc sql;select *from datasetorder by var;
quit;
proc sql;select *from datasetorder by var1, var2 desc;
quit;
Basic Structure of SQL Code
sort output using ‘order by’ clause
![Page 35: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/35.jpg)
35
Sort observations in a dataset
proc sort data=us_pop;by division descending population_2010;
run;
proc print data=us_pop;run;
proc sql;select *from us_poporder by division, population_2010 desc;
quit;
Basic Structure of SQL Code
Traditional SAS Code:
![Page 36: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/36.jpg)
36
Sort observations in a dataset
Basic Structure of SQL Code
![Page 37: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/37.jpg)
Creating New Variables
![Page 38: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/38.jpg)
38
Creating new variables
proc sql;select statename, pop_2010, pop_2010/1000000 as new_popfrom us_pop;
quit;
Creating New Variables
new variable
name
new variable
definition
(not optional)
![Page 39: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/39.jpg)
39
Creating new variables
proc sql;select statename, pop_2010, pop_2010/1000000 as new_popfrom us_pop;
quit;
Creating New Variables
![Page 40: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/40.jpg)
40
Cleaning up new variables
proc sql;select
statename, pop_2010, pop_2010/1000000 as new_pop format=8.1 label="Pop in Millions"
from us_pop;quit;
new variable
label
new variable format
Creating New Variables
![Page 41: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/41.jpg)
41
Cleaning up new variablesproc sql;
selectstatename, pop_2010, pop_2010/1000000 as new_pop format=8.1 label="Pop in Millions"
from us_pop;quit;
Creating New Variables
![Page 42: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/42.jpg)
42
Summary Variables
proc sql;select mean(pop_2010) as mean_popfrom us_pop;
quit;
Creating New Variables
new variable
name
new variable
definition
![Page 43: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/43.jpg)
43
Summary Variables
proc sql;select mean(pop_2010) as mean_popfrom us_pop;
quit;
Creating New Variables
The summary function is applied to the entire data set (when there is
no group by clause)
![Page 44: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/44.jpg)
44
Using the count functionproc sql;
select count(*)from us_pop;
quit;
proc sql;select count(division)from us_pop;
run;
proc sql;select count(distinct division)from us_pop;
run;
Creating New Variables
![Page 45: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/45.jpg)
45
Summary Variables by Group
proc sql;select division, mean(pop_2010) as div_mean_popfrom us_popgroup by division;
quit;
Creating New Variables
new variable
name
new variable
definition
Grouping variable: will calculate summary statistics for each unique value of this variable
![Page 46: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/46.jpg)
46
Summary Variables by Groupproc sql;
select division, mean(pop_2010) as div_mean_popfrom us_popgroup by division;
quit;
Creating New Variables
![Page 47: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/47.jpg)
47
Summary Variables by GroupNote: It’s important to have your grouping variable in both your select and your group by clauses!
proc sql;select division, mean(pop_2010) as div_mean_popfrom us_popgroup by division;
quit;
Creating New Variables
![Page 48: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/48.jpg)
48
Summary Variables by GroupThis is what happens if you leave the grouping variable out of the select clause…
proc sql;select mean(pop_2010) as div_mean_popfrom us_popgroup by division;
quit;
Creating New Variables
![Page 49: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/49.jpg)
49
Summary Variables by GroupThis is what happens if you forget to includes the group by clause:
proc sql;select division, mean(pop_2010) as div_mean_popfrom us_pop;
quit;
Creating New Variables
![Page 50: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/50.jpg)
50
proc sql;create table div_stats asselect division, mean(pop_2010) as div_mean_popfrom us_popgroup by division;
quit;
Creating New SAS Datasetsnew SAS
dataset namenot
optional
![Page 51: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/51.jpg)
Joins
![Page 52: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/52.jpg)
52
sashelp.us_data (abbr.)
![Page 53: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/53.jpg)
53
data us_pop;set sashelp.us_data
(rename=(population_2010=pop_2010));keep statename state division pop_2010;
run;
data us_rep;set sashelp.us_data (keep=
state statecodereps_2010);
run;
data us_seatch;set sashelp.us_data (keep=
statecodestatenameseat_change_2010);
run;
New Datasets
![Page 54: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/54.jpg)
54
New Datasets
us_rep
us_pop
us_seatch
![Page 55: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/55.jpg)
55
Basic set up of a joinproc sql;
select dataset1.*, dataset2.*
from dataset1, dataset2
where dataset1.common_var = dataset2.common_var;quit;
Joins
Common variable structure:
data_source.variable_nameWhen joining tables, it is important to let SQL know which variables are coming from each dataset, especially when variables in different datasets have the same name.
![Page 56: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/56.jpg)
56
Basic set up of a joinproc sql;
select dataset1.*, dataset2.*
from dataset1, dataset2
where dataset1.common_var = dataset2.common_var;quit;
Joins
Without the where statement, the output table would be a Cartesian product join, where every observation in the first dataset would be joined to the first observation in the second table, and that would be repeated for each subsequent observation in the second table, until you had a table with 52*52 = 2,704 observations!
![Page 57: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/57.jpg)
57
Use abbreviations for dataset namesproc sql;
select dataset1.*, dataset2.*
from dataset1, dataset2
where dataset1.common_var = dataset2.common_var;quit;
proc sql;select abbr1.*,
abbr2.*from dataset1 as abbr1,
dataset2 as abbr2where abbr1.common_var = abbr2.common_var;
quit;
Joins
the ‘as’ is optional in the from clause
equivalent
![Page 58: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/58.jpg)
58
Desired Join
us_rep
us_pop
![Page 59: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/59.jpg)
59
Example of a join – keep all variables
proc sql;select pop.*,
rep.*from us_pop as pop,
us_rep as repwhere pop.state = rep.state;
quit;
Joins
Note that unlike merges in the data step, there’s no need to sort the input datasets prior to a join!
![Page 60: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/60.jpg)
60
Example of a join – keep all variablesproc sql;
select pop.*, rep.*from us_pop as pop, us_rep as repwhere pop.state = rep.state;
quit;
Joins
![Page 61: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/61.jpg)
61
Save joined tables as SAS dataset
proc sql;create table pop_rep asselect pop.*,
rep.*from us_pop as pop,
us_rep as repwhere pop.state = rep.state;
select *from pop_rep;
quit;
Joins
prints the new dataset
![Page 62: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/62.jpg)
62
Desired Joinus_pop
us_seatch
But we only want the following
variables in the final joined table:• Statecode• Population• Seat Change
![Page 63: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/63.jpg)
63
Joins
us_pop us_seatch
We only want the following variables in the final joined table:
• Statecode• Population• Seat Change
proc sql;select statecode,
pop_2010, seat_change_2010
from us_pop as pop, us_seatch as seat
where pop.statename = seat.statename;quit;
![Page 64: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/64.jpg)
64
Joinsproc sql;
select statecode, pop_2010, seat_change_2010
from us_pop as pop, us_seatch as seat
where pop.statename = seat.statename;quit;
![Page 65: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/65.jpg)
65
Desired Join
us_rep
us_pop
us_seatch
Keep:• Statename• Population• Representatives• Seat Change
![Page 66: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/66.jpg)
66
Joinsproc sql;
select pop.statename,pop_2010, reps_2010,seat_change_2010
from us_pop as pop, us_rep as rep,us_seatch as seat
where pop.state = rep.state ANDrep.statecode = seat.statecode;
quit;
Keep:• Statename• Population• Representatives• Seat Change
us_repus_pop us_seatch
![Page 67: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/67.jpg)
67
Joinsproc sql;
select pop.statename,pop_2010, reps_2010,seat_change_2010
from us_pop as pop, us_rep as rep,us_seatch as seat
where pop.state = rep.state ANDrep.statecode = seat.statecode;
quit;
![Page 68: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/68.jpg)
Some Good Uses for SQL
![Page 69: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/69.jpg)
69
Joins related to dates
seizure blood
The goal is to match up blood data that was collected before the day the seizure occurred,
but no more than one week before
![Page 70: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/70.jpg)
70
Joins related to dates
seizure blood
proc sql;select s.*,
b.*,(bld_date - sz_date) as days_diff
from seizure as s,blood as b
where s.pt_ID = b.pt_ID AND-7 le CALCULATED days_diff le -1;
quit;
![Page 71: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/71.jpg)
71
Joins related to datesproc sql;
select s.*, b.*,(bld_date - sz_date) as days_diff
from seizure as s,blood as b
where s.pt_ID = b.pt_ID AND-7 le CALCULATED days_diff le -1;
quit;
![Page 72: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/72.jpg)
72
Joins related to datesproc sql;create table sz_bld asselect
s.*, b.*,(bld_date - sz_date) as days_diff
from seizure as s,blood as b
where s.pt_ID = b.pt_ID AND-7 le CALCULATED days_diff le -1
order by pt_id, days_diff;
select *from sz_bld;
quit;
*If you only want one blood draw per patient, and prefer the draw that was closest to the seizure date;data sz_bld_single;
set sz_bld;by pt_id days_diff;
if last.pt_id then output;run;
![Page 73: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/73.jpg)
73
Confirming User-Defined Formatsproc format;
value age_group low - 30 = "Age Group 1"35 - 45 = "Age Group 2"46 - 55 = "Age Group 3"56 - 65 = "Age Group 4"66 - high = "Age Group 5";
run;
proc sql;select distinct ageatstart,
ageatstart format=age_group.from sashelp.heart;
quit;
![Page 74: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/74.jpg)
74
Confirming User-Defined Formats
proc format;value age_grouplow - 30 = "Age Group 1"35 - 45 = "Age Group 2“46 - 55 = "Age Group 3“56 - 65 = "Age Group 4"66 - high = "Age Group 5";
run;
proc sql;select distinct ageatstart,
ageatstart format=age_group.from sashelp.heart;
quit;
![Page 75: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/75.jpg)
75
Confirming User-Defined Variables
The goal is to create a new variable ‘risk’, which takes the value ‘At risk’ if any of the following variables have values with a star:
![Page 76: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/76.jpg)
76
Confirming User-Defined Variables
data heart;set sashelp.heart;
length risk $8.;
*Create overall risk variable;if bp_status = "High" OR
weight_status ne "Normal" then risk = "At risk";else risk = "Ok";
run;
proc sql;select distinct bp_status,
weight_status, risk, count(*) as total
from heartgroup by bp_status, weight_status, risk;
quit;
![Page 77: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/77.jpg)
77
Confirming User-Defined Variables
proc sql;select distinctbp_status, weight_status, risk, count(*) as totalfrom heartgroup bybp_status, weight_status, risk;
quit;
![Page 80: Build your Data Skills: Introduction to SQL · S-Q-L vs. Sequel? 10 Terminology. Dataset. Observation; Variable. ... Note: Since this presentation uses SAS torun SQL, SAS terms will](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6880b3ffb33e5b59689743/html5/thumbnails/80.jpg)