haas mfe sas workshop lecture 3: peng liu haas school

24
Haas MFE SAS Workshop Lecture 3: Peng Liu http:// faculty.haas.berkeley.edu/peliu/computing Haas School of Business, Berkeley, MFE 2006

Upload: maleah-harwell

Post on 28-Mar-2015

222 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas MFE SAS WorkshopLecture 3:

Peng Liu http://faculty.haas.berkeley.edu/peliu/computing

Haas School of Business, Berkeley, MFE 2006

Page 2: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

SAS SQL

Peng Liu http://faculty.haas.berkeley.edu/peliu/computing

Haas School of Business, Berkeley, MFE 2006

Page 3: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 3

PROC SQL - What

What does SQL can do? SelectingOrdering/sortingSubsetting RestructuringCreating table/viewJoining/MergingTransforming variablesEditing

Page 4: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 4

PROC SQL - Why

The Advantage of using SQLCombined functionalityFaster for smaller tablesSQL code is more portable for non-SAS

applicationsNot require presorting Not require common variable names to

join on. (need same type , length)

Page 5: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 5

Selecting Data

PROC SQL;

SELECT DISTINCT rating FROM MFE.MOVIES;

QUIT;

The simplest SQL code, need 3 statements By default, it will print the resultant query, use

NOPRINT option to suppress this feature. Begin with PROC SQL, end with QUIT; not RUN; Need at least one SELECT… FROM statement DISTINCT is an option that removes duplicate rows

Page 6: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 6

Ordering/Sorting Data

PROC SQL ;

SELECT *

FROM MFE.MOVIES

ORDER BY category;

QUIT;

Remember the placement of the SAS statements has no effect; so we can put the middle statement into 3 lines

SELECT * means we select all variables from dataset MFE.MOVIES

Put ORDER BY after FROM. We sort the data by variable “category”

Page 7: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 7

Subsetting Data- Character searching in WHERE

PROC SQL;

SELECT title, category

FROM MFE.MOVIES

WHERE category CONTAINS 'Action';

QUIT;

Use comma (,) to separate selected variables CONTAINS in WHERE statement only for character

variables Also try WHERE UPCASE(category) LIKE '%ACTION%';

Use wildcard char. Percent sign (%) with LIKE operator.

Page 8: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 8

Subsetting Data- Phonetic Matching in WHERE

PROC SQL;

SELECT title, category, rating

FROM MFE.MOVIES

WHERE category =* 'Drana';

QUIT;

Always Put WHERE after FROM

Sounds like operator =* Search movie title for the phonetic variation of

“drama”, also help possible spelling variations

Page 9: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 9

Case Logic - reassigning/recategorize

PROC SQL;

SELECT title, rating,

CASE rating

WHEN 'G' THEN ‘General' ELSE 'Other'

END AS level

FROM MFE.MOVIES;

QUIT;

The order of each statement is important CASE …END AS should in between SELECT and FROM Note there is , after the variables you want to select Use WHEN … THEN ELSE… to redefine variables Rename variable from “rating” to “level”

Page 10: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 10

Creating New Data - Create Table

PROC SQL;

CREATE TABLE ACTION AS

SELECT title, category

FROM MFE.MOVIES

WHERE category CONTAINS 'Action';

QUIT;

CREATE TABLE … AS can always be in front of SELECT … FROM statement to build a sas file.

In SELECT, the results of a query are converted to an output object (printing). Query results can also be stored as data. The CREATE TABLE statement creates a table with the results of a query. The CREATE VIEW statement stores the query itself as a view. Either way, the data identified in the query can beused in later SQL statements or in other SAS steps.

Produce a new dataset (table) ACTION in work directory, no printing

Page 11: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 11

Creating New Data - Create View

PROC SQL;

CREATE VIEW G_MOVIES AS

SELECT title, category, rating

FROM MFE.MOVIES

WHERE rating = 'G'

ORDER BY title;

SELECT * FROM G_MOVIES;

QUIT;

First step-creating a view,no output is produced; then display the desired output results

Use ; to separate two block of code inside of proc sql When a table is created, the query is executed and the resulting

data is stored in a file. When a view is created, the query itself is stored in the file. The data is not accessed at all in the process of creating a view.

Page 12: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 12

Join Tables (Merge datasets) - Cartesian Join

PROC SQL;

SELECT *

FROM MFE.CUSTOMERS, MFE.MOVIES;

QUIT;

Terminology: Join (Merge) datasets (tables) No prior sorting required – one advantage over DATA MERGE Use comma (,) to separate two datasets in FROM Without WHERE, all possible combinations of rows from each

tables is produced, all columns are included Turn on the HTML result option for better display:Tool/Options/Preferences…/Results/ check Create HTML/OK

Page 13: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 13

Join Tables (Merge datasets) - Inner Join using WHERE

PROC SQL;

SELECT *

FROM MFE.MOVIES, MFE.ACTORS

WHERE MOVIES.title = ACTORS.title;

QUIT; Use WHERE to specify connecting columns (title) table1.matchvar = table2.matchvar Produce rows that have same movie title The matching variable can be of different name

different datasets

Page 14: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 14

Join Tables (Merge datasets) - Inner Join using WHERE (Cont.)

PROC SQL;

SELECT M.title, M.rating, A.actor_leading

FROM MFE.MOVIES M, MFE.ACTORS A

WHERE MOVIES.title = ACTORS.title;

QUIT;

Short-cut for table names Can be used in SELECT and WHERE statements Need to be declared in FROM statement

Page 15: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 15

Join Tables (Merge datasets) - Join three tables

PROC SQL;

SELECT C.cust_no,

M.title,M.rating, M.category,

A.actor_leading

FROM MFE.CUSTOMERS C,

MFE.MOVIES2 M,

MFE.ACTORS A

WHERE C.cust_no = M.cust_no AND M.title = A.title;

QUIT;

Use AND in WHERE statement to specify two matching conditions

Produce rows that satisfies all the conditions Note: We use MOVIES2 in this example Can join up to 32 tables in one SQL code

Page 16: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 16

Join Tables (Merge datasets) - Inner Joins using ON

PROC SQL;

SELECT M.title, rating,actor_leading

FROM MFE.MOVIES M

INNER JOIN MFE.ACTORS A

ON M.TITLE = A.TITLE;

QUIT;

Same result as using where WHERE is used to select rows from inner joins ON is used to select rows from outer or inner

Page 17: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 17

Join Tables (Merge datasets) - Left Outer Joins

PROC SQL;

SELECT MOVIES.title, actor_leading, rating

FROM MFE.MOVIES

LEFT JOIN

MFE.ACTORS

ON MOVIES.title = ACTORS.title;

QUIT;

Resulting output contains all rows for which the SQL expression, referenced in the ON clause, matches both tables and all rows from LEFT table (MOVIES) that did not match any row in the right (ACTORS) table.

Essentially the rows from LEFT table are preserved and captured exactly as they stored in the table itself, regardless if a match exists.

Need to specify a table name for the matching variable in SELECT

Page 18: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 18

Join Tables (Merge datasets) - Right Outer Joins

PROC SQL;

SELECT ACTORS.title, actor_leading, rating

FROM MFE.MOVIES

RIGHT JOIN

MFE.ACTORS

ON MOVIES.title = ACTORS.title;

QUIT;

Resulting output contains all rows for which the SQL expression, referenced in the ON clause, matches both tables and all rows from RIGHT table (ACTORS) that did not match any row in the right (MOVIES) table.

Essentially the rows from RIGHT table are preserved and captured exactly as they stored in the table itself, regardless if a match exists.

Page 19: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 19

Join Tables (Concatenating) - Outer Union

PROC SQL;

SELECT * FROM MFE.CUSTOMERS

OUTER UNION

SELECT * FROM MFE.MOVIES;

QUIT;

SQL performs OUTER UNION, similar to DATA steps with a SET statement to Concatenate datasets.

The result contains all the rows produced by the first table-expression followed by all the row produced by the second table-expression.

Page 20: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 20

Transforming Data- Creating new Variables

/*Creating new variables*/

PROC SQL;

SELECT title, length, category, year, rating,

2006-year AS age

FROM MFE.MOVIES;

QUIT;

You can create new variables within SELECT statement, the name of new variable follows after AS.

Note the order of the express is reversed

Page 21: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 21

Transforming Data- Summarizing Data using SQL functions

PROC SQL;

SELECT *,

COUNT(title) AS notitle,

MAX(year) AS most_recent,

MIN(year) AS earliest,

SUM(length) AS total_length,

NMISS(rating) AS nomissing

FROM MFE.MOVIES

GROUP BY rating;

QUIT;

Simple summarization functions available All function can be operated in GROUPs

Page 22: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 22

Editing Data – Insert observations.

PROC SQL NOPRINT;

INSERT INTO MFE.CUSTOMERS

VALUES(1 'Peng');

INSERT INTO MFE.CUSTOMERS

SET Cust_no=2,Name='Sasha';

QUIT;

There are two ways of inserting observations into a table. Data type should be the same.

VALUES( ) new values are separated by space. SET column name = newly assigned values, delimited

by commas.

Page 23: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 23

Editing Data – Deleting rows and Dropping columns

/*Deleting rows*/

PROC SQL;

DELETE

FROM MFE.MOVIES

WHERE length LE 100;

QUIT;

/*Droping variables*/

PROC SQL;

CREATE TABLE NEW (DROP=rating) AS

SELECT *

FROM MFE.MOVIES;

QUIT;

Deleting columns can be done in SELECT or in DROP on created table

Page 24: Haas MFE SAS Workshop Lecture 3: Peng Liu   Haas School

Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 24

Editing Data– Update observations

/*Updating Observation*/

PROC SQL NOPRINT;

UPDATE MFE.CUSTOMERS

SET Name='Liu'

WHERE Cust_no=1;

QUIT; UPDATE … SET… WHERE Find the observation and set new value If more than one observations satisfies the condition,

all are updated with the new data in SET statement