lec5 student

28
Concatenating SAS Data Sets

Upload: brosa3

Post on 08-Apr-2018

230 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 1/28

Concatenating SAS Data Sets

Page 2: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 2/28

2

Define concatenation.

Use the SET statement in a DATA step toconcatenate two or more SAS data sets.

Use the RENAME= data set option to change thenames of variables.

Use the SET and BY statements in a DATA step tointerleave two or more SAS data sets.

Objectives

Page 3: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 3/28

3

DATA SAS-data-set ;SET SAS-data-set1 SAS-data-set2 . . .;<other SAS statements >

RUN;

Concatenating SAS Data SetsUse the SET statement in a DATA step to concatenate

SAS data sets.

Page 4: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 4/28

4

You can read any number of SAS data sets with a single

SET statement.

SAS data sets

jan

feb

 mar

data work.qtr1;set work.jan work.feb

work.mar;

run;

janfeb

 mar

work.qtr1

...

Concatenating SAS Data Sets

Page 5: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 5/28

5

Two SAS data sets, na1 and na2, contain data for newly

hired navigators. Concatenate the data sets into a newdata set named newhires.

 Name

Gender

JobCodeTORRES M 

 NA1

LANG

F

 NA1

SMITH F

 NA1

 Name

Gender JobCodeLISTER M 

 NA2

TORRES F NA2

na1 na2

Business Task

Page 6: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 6/28

6

 Name

Gender JobCodeTORRES M 

 NA1

LANG

F NA1

SMITH F NA1

LISTER 

 M 

 NA2

TORRES F NA2

newhires

Concatenating SAS Data Sets: Execution

When SAS reaches end-of-file on the last data set,

DATA step execution ends.

Page 7: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 7/287

Two SAS data sets, fa1 and fa2, contain data for newly

hired flight attendants. Concatenate the data sets into anew data set named newfa.

 Name Gender JobCode

KENT F FA1PATEL M  FA1JONES F FA1

 Name JCode Gender

LOPEZ FA2 FGRANT FA2 F

fa1 fa2

Business Task

Page 8: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 8/288

 Name Gender JobCode JCode

KENT F FA1PATEL M  FA1JONES F FA1LOPEZ F FA2GRANT F FA2

data newfa;set fa1 fa2;

run;

Concatenating SAS Data Sets: Execution

 Name Gender JobCodeKENT F FA1PATEL M  FA1JONES F FA1

 Name JCode GenderLOPEZ FA2 FGRANT FA2 F

newfa

...

fa1 fa2

Page 9: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 9/289

You can use a RENAME= data set option to change the

name of a variable.

General form of the RENAME= data set option:

SAS-data-set (RENAME=(old-name-1=new-name-1

old-name-2=new-name-2 .

.

.

old-name-n=new-name-n ))

The RENAME= Data Set Option

Page 10: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 10/2810

 Name Gender JobCode Name Gender JobCode

KENT F FA1PATEL M  FA1

JONES F FA1

LOPEZ F FA2

GRANT F FA2

data newfa;set fa1 fa2(rename=(JCode=JobCode));run;

 Name Gender JobCodeKENT F FA1PATEL M  FA1JONES F FA1

 Name JCode GenderLOPEZ FA2 FGRANT FA2 F

newfa

The RENAME= Data Set Option

...

fa1 fa2

Page 11: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 11/2811

Use the SET statement with a BY statement in a DATA

step to interleave SAS data sets.

General form of a DATA step interleave:

DATA SAS-data-set ;

SET SAS-data-set1 SAS-data-set2 . . .;BY BY-variable ;<other SAS statements >

RUN;

Interleaving SAS Data Sets

Page 12: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 12/2812

Interleaving SAS data sets simply concatenates SAS data

sets so the observations in the resulting data set are inorder.

Interleaving SAS Data Sets

data work.allemp;

set ia.miamiemp

ia.parisemp

ia.romeemp; by ID;

run;

work.allemp

ID Salary

109 36000171 54000

ID Salary083 87000

217 42000

ID Salary059 60000

154 88000

ia.miamiemp

ia.parisemp

ia.romeemp

ID Salary059 60000083 87000109 36000

154 88000171 54000217 42000

c08s1d4 ...

Page 13: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 13/2813

 Name Gender JobCode

GRANT F FA2

JONES F FA1

KENT F FA1

LOPEZ F FA2

PATEL M  FA1

data newfa;

set fa1 fa2(rename=(JCode=JobCode)); by Name;run;

newfa

Interleaving SAS Data Sets

 Name Gender JobCodeJONES F FA1KENT F FA1PATEL F FA1

 Name JCode GenderGRANT FA2 FLOPEZ FA2 F

...

fa1 fa2

Page 14: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 14/28

Merging SAS Data Sets

Page 15: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 15/2815

Prepare data for merging using the SORT procedureand data set options.

Merge SAS data sets on a single common variable.

Objectives

Page 16: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 16/2816

Merging SAS Data SetsUse the MERGE statement in a DATA step to join

corresponding observations from two or more SAS datasets.

DATA SAS-data-set ;MERGE SAS-data-sets; 

BY BY-variable(s);<other SAS statements> 

RUN;

Page 17: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 17/28

17

SAS data sets

data compare; merge costs sales goals taxes; by Month;

run;

sales goalscosts taxes

compare

goals taxescosts sales

Merging SAS Data SetsYou can read any number of SAS data sets with a single

MERGE statement.

Page 18: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 18/28

18

International Airlines iscomparing monthly salesperformance to monthly salesgoals.

The sales and goals data arestored in separate SAS datasets.

Business Task

Page 19: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 19/28

19

To calculate the difference between revenues and goals, the

 performance and goals data sets must be merged.

 Month Sales

1 2118223

2 1960034

 Month Goal

1 2130000

2 1920000

ia.performance ia.goals

Match-merge the data sets by Month and compute the

difference between the variable

values for Sales and Goal.

 Month Sales Goal Difference

1 2118223 2130000 -11777

2 1960034 1920000 40034

ia.compare

Business Task

Page 20: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 20/28

20

Merge two data sets to acquire the names of the German

crew who are scheduled to fly next week.

ia.gercrew ia.gersched 

To match-merge the data sets by EmpID, the data sets

must be ordered by EmpID.

EmpID FlightNum 

E04064 5105

E00632 5250E01996 5501

EmpID LastName

E00632 STRAUSS

E01483 SCHELL-HAUNGSE01996  WELLHAEUSSER 

E04064  WASCHK 

Business Task

Page 21: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 21/28

21

data work.nextweek;

 merge ia.gercrew work.gersched; by EmpID;

run;

EmpID FlightNum 

E00632 5250

E01996 5501

E04064 5105

EmpID LastName

E00632 STRAUSS

E01483 SCHELL-HAUNGS

E01996  WELLHAEUSSER 

E04064  WASCHK 

work.

gersched 

ia.

gercrew

EmpID LastName FlightNum 

E00632 STRAUSS 5250

E01483 SCHELL-HAUNGS

E01996  WELLHAEUSSER  5501

E04064  WASCHK  5105

work.nextweek

...

Merging SAS Data: Execution

Page 22: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 22/28

22

Exclude from the data set crew members who are not

scheduled to fly next week.

ia.gercrew work.gersched EmpID LastName

E00632 STRAUSS

E01483 SCHELL-HAUNGS

E01996  WELLHAEUSSER 

E04064  WASCHK 

EmpID FlightNum 

E00632 5250

E01996 5501

E04064 5105

Eliminating Nonmatches

Page 23: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 23/28

23

General form of the IN= data set option:

Use the IN= data set option to determine which data

set(s) contributed to the current observation.

Variable is a temporary numeric variable that hastwo possible values:

SAS-data-set (IN=variable )

0 indicates that the data set did not contribute to thecurrent observation.

1 indicates that the data set did contribute to the currentobservation.

The IN= Data Set Option

Page 24: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 24/28

24

data work.combine;

 merge ia.gercrew(in=InCrew)

work.gersched(in=InSched); by EmpID;

run;

EmpID LastName

E00632 STRAUSS

E01483 SCHELL-HAUNGSE01996  WELLHAEUSSER 

E04064  WASCHK 

PDV

EmpID FlightNum 

E00632 5250

E01996 5501E04064 5105

work.gersched 

ia.gercrew

EmpID LastName FlightNum InCrew InSched 

E00632 STRAUSS 5250 1 1

DD

...

The IN= Data Set Option

Page 25: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 25/28

25

Other MergesIn addition to one-to-one merges, the DATA step merge

works with many other kinds of data combinations:one-to-many unique BY values are in one

data set and duplicatematching BY values are in the

other data set.

many-to-many duplicate matching BY valuesare in both data sets.

Page 26: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 26/28

26

work.two

X Z1 A1

1 A2

2 B1

3 C13 C2

work.one

X Y Z

1 A   A1

1 A   A2

2 B B1

3 C C1

3 C C2

work.threedata work.three;

 merge work.one work.two;

 by X;run;

...

X Y1 A 

2 B

3 C

One-to-Many Merging

Page 27: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 27/28

27

work.three

X Y

1 A1

1 A2

2 B1

2 B2

X Z

1 AA1

1 AA2

1 AA3

2 BB1

2 BB2

data work.three; merge work.one work.two;

 by X;run;

X Y Z

1 A1  AA11 A2  AA2

1 A2  AA3

2 B1 BB1

2 B2 BB2

...

Many-to-Many Mergingwork.twowork.one

Page 28: Lec5 Student

8/6/2019 Lec5 Student

http://slidepdf.com/reader/full/lec5-student 28/28

28

Exercises

1. Concatenating SAS Data Sets2. Merging SAS Data Sets

3. Identifying Data Set Contributors