14b. Accessing Data Files in SAS ®

Upload: fabian

Post on 23-Feb-2016



Prerequisites
• Recommended modules to complete before viewing this module
  • 1. Introduction to the NLTS2 Training Modules
  • 2. NLTS2 Study Overview
  • 3. NLTS2 Study Design and Sampling
  • NLTS2 Data Sources, either
    • 4. Parent and Youth Surveys or
    • 5. School Surveys, Student Assessments, and Transcripts
  • NLTS2 Documentation
    • 10. Overview
    • 11. Data Dictionaries
    • 12. Quick References


Overview
• Purpose
• Opening and viewing data files
• Limiting variables
• Subsetting cases
• Joining/combining data files
• Summary
• Closing
• Important information


NLTS2 restricted-use data
• NLTS2 data are restricted.
• Data used in these presentations are from a randomly selected subset of the restricted-use NLTS2 data.
• Results in these presentations cannot be replicated with the NLTS2 data licensed by NCES.


Purpose
• Learn to
  • Open a data file
  • See what is in a file (i.e., contents of the file)
  • “Size” a data file for a perfect fit
    • Reduce the number of variables
    • Reduce the number of cases (i.e., subset the data)
  • Combine information from multiple sources
    • Bring in data from another source or another wave
    • Join or combine files
  • Create a new file


Open and view data files
• SAS® and SPSS® data are in separate folders.
• SAS data files have a “.sas7bdat” extension.
• Associated value formats are stored in a SAS library, “Formats.sas7bcat.”
• SAS programming code is available for re-creating the user-defined formats in the SAS format library.
  • Hyperlinked from the table of contents on the database and documentation disk.


Open and view data files
• Files are either read from or written to.
• Files have a name and a location where they are stored.
  • SAS needs to know the name of the file and where to find it.
• SAS uses a LIBNAME statement to identify the path.
  • The path describes the nesting of folders.
  • C:\myprojects\NLTS2\Data is a path or location.
    • i.e., the file is located on Drive “C”, in the folder “Data,” which is nested inside the “myprojects” and “NLTS2” folders.
• An example LIBNAME statement for a path would be
    LIBNAME sasdb 'C:\myprojects\NLTS2\Data' ;

These results cannot be replicated with the full dataset; all output in these modules was generated with a random subset of the full data.


Open and view data files
• An easy way to view the contents of the file is PROC CONTENTS.
• Specify LIBNAME statements for the data files and format library.
    LIBNAME [ddname] '[path]' ;
    LIBNAME library '[path]' ;
• Syntax
    PROC CONTENTS DATA = [ddname].[file name] ;
    RUN ;


Open and View Data Files: Example
• Viewing files
  • Use the Wave 1 teacher file.
  • Run a PROC CONTENTS.
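The exercise above might be run with a sketch like the following. The path repeats the example path used earlier in this module, and the file name n2w1tchr is a hypothetical placeholder for the Wave 1 teacher file; substitute the folder and file name from your own data disk.

```sas
/* Point SAS at the data folder and at the format library.       */
/* Both paths are assumptions; use the folders on your machine.  */
LIBNAME sasdb   'C:\myprojects\NLTS2\Data' ;
LIBNAME library 'C:\myprojects\NLTS2\Data' ;

/* n2w1tchr is a hypothetical name for the Wave 1 teacher file. */
PROC CONTENTS DATA = sasdb.n2w1tchr ;
RUN ;
```

The output lists the variables in the file along with their types, lengths, formats, and labels.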


Open and view data files: Example


Viewing files


Limiting variables
• How to reduce the number of variables in the file
  • Large files with many cases and many variables are unwieldy; simplify.
    • Fewer variables to search through.
    • Fewer cases to process.
  • Create data files that are limited to just those variables needed for analysis.
  • You have the choice of drop or keep.
    • Which one is best? The one that requires less typing!
    • If you are dropping more variables than keeping, use “keep” and vice versa.


Limiting variables
• Note: When making changes to your data
  • Use work files for temporary changes.
    • Work files are files that are in existence only for the duration of the program or SAS interactive session.
    • Work files are temporary files unless they are saved.
    • Work files have a one-level name; no ddname is needed.
  • To save modified data permanently, create a new permanent data file.
    • Usually it is best to create a new file rather than to modify the source file.


Limiting variables
• Syntax
    DATA Par_w1_lmt_vars (DROP= np1E2a np1GroupMember) ;
      SET sasdb.n2w1parent (KEEP= ID w1_DisHdr2001 w1_GendHdr2001
          w1_IncomeHdr2001 w1_AgeHdr2001 np1Weight np1HealthProb
          np1GroupMember np1ProblemCount np1E2a np1B2a) ;
    run ;
• A “KEEP” or “DROP” option on a “SET” or “MERGE” statement controls which variables come into the data set.
• A “KEEP” or “DROP” option on a “DATA” statement controls which variables are saved on the output data set.
• Notice that SAS statements end in a semicolon.


Limiting Variables: Example
• Limiting variables
  • Create a file with fewer variables.
  • Create a new file called “PrScores” from n2w2dirassess.
  • Keep only the following variables:
    • ID
    • ndacalc_pr
    • ndaPC_PR
    • ndasyn_pr
    • NDaF1_friend
    • na_age4
    • w2_dis12
    • w2_gend2
    • na_grade4
    • w2_incm3
    • wt_na
  • Save the new file.
  • Review the new file with a PROC CONTENTS.
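One way to work this exercise is sketched below. It assumes the LIBNAME sasdb statement shown earlier has already been run, and it saves PrScores under the same ddname so that the file is permanent; that location is a choice made for illustration, not something the exercise specifies.

```sas
/* Read only the eleven variables listed above from the Wave 2   */
/* direct assessment file and save them as a permanent file.     */
DATA sasdb.PrScores ;
  SET sasdb.n2w2dirassess (KEEP= ID ndacalc_pr ndaPC_PR ndasyn_pr
                                 NDaF1_friend na_age4 w2_dis12
                                 w2_gend2 na_grade4 w2_incm3 wt_na) ;
RUN ;

/* Confirm that the new file holds only the kept variables. */
PROC CONTENTS DATA = sasdb.PrScores ;
RUN ;
```

KEEP is used here because the list of variables to keep is far shorter than the list to drop.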


Limiting variables: Example


Subsetting cases
• How to reduce the number of cases in a data step or in a procedure
• Often analysis is done on a subset; for example:
  • Select only youth with visual impairment.
  • Select only youth who are out of secondary school.
  • Exclude younger students.


Subsetting cases
• Example: Limit Wave 4 parent/youth interview data to those who are 21 or older, excluding youth who are 19 or 20 (W4_Age2007 = 19 or 20).
• Syntax to limit cases in a file
    DATA AgeGT20 ;
      SET sasdb.n2w4paryouth (WHERE=(W4_Age2007>20)) ;
    run ;
  or
    DATA AgeGT20 ;
      SET sasdb.n2w4paryouth ;
      IF W4_Age2007>20 ;
    run ;


Subsetting cases
• Syntax to limit cases in a procedure
  • No change to file with this syntax
    PROC FREQ data=sasdb.n2w4paryouth ;
      WHERE W4_Age2007>20 ;
      TABLES W4_Age2007 ;
    run ;


Subsetting cases: Example
• Subsetting cases
  • Create a small data set with a subset of cases.
  • Use “PrScores” created in the previous example.
  • Limit cases to those classified with hearing impairment only, i.e., those with a value of “5” for “w2_dis12”.
  • Look at notes in the log window.
    • Are there any clues that the file has changed?
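A possible solution is sketched here. It assumes that PrScores was saved as sasdb.PrScores and that w2_dis12 is stored as a numeric variable; the output name PrScoresHI is hypothetical.

```sas
/* One-level name, so PrScoresHI is a temporary work file. */
DATA PrScoresHI ;
  /* Read only cases coded 5 (hearing impairment) on w2_dis12. */
  SET sasdb.PrScores (WHERE=(w2_dis12 = 5)) ;
RUN ;
```

After the step runs, the log notes report how many observations were read from the source file and how many observations and variables the new data set contains, which is the clue that the file has changed.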


Joining/combining data files
• How to bring in data from another file
• Purpose
  • Learn to combine or join files
    • Bring in data from another source
    • Bring in data from another wave
  • Learn what to watch for
    • Number of cases in the combined file
    • How cases are joined
      – Key variable, i.e., which variable to match on
      – Keyed file, i.e., which cases to keep


Joining/combining data files
• Why do this?
  • Often it is necessary to combine information from different files to perform comparative analyses, create new variables, or measure differences over time.
• For example, you may want to
  • Create composite variables from multiple sources.
  • Look at similar items at different points in time.


Joining/combining data files
• Example of composite variables from multiple sources.
  • Create a variable for “if parent attended a parent/teacher conference” using Wave 2 teacher survey item nts2C8, and fill in with parent interview item np2E1a_d if teacher data are missing.
• Example of items at different points in time.
  • Create a variable to look at the pattern of employment between Waves 2 and 3: employed both waves, either wave, or neither wave.
    • Set to “employed both waves” if np2HasPdJob (W2) and np3HasJob (W3) are “yes.”
    • Else set to “employed in either wave” if np2HasPdJob or np3HasJob is “yes.”
    • Else set to “not employed” if np2HasPdJob and np3HasJob are “no.”
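The employment-pattern rules above could be coded roughly as follows. This is only a sketch: it assumes both files are already sorted by ID, that “yes” is stored as 1 and “no” as 0 (check the data dictionary for the actual codes), and the names EmpPattern and EmpW2W3 are hypothetical. It uses a MERGE statement, which is covered later in this module.

```sas
DATA EmpPattern ;                      /* hypothetical work file  */
  MERGE sasdb.n2w2paryouth (KEEP= ID np2HasPdJob)
        sasdb.n2w3paryouth (KEEP= ID np3HasJob) ;
  BY ID ;

  LENGTH EmpW2W3 $ 24 ;                /* hypothetical new variable */

  /* Assumed coding: 1 = yes, 0 = no. */
  IF np2HasPdJob = 1 AND np3HasJob = 1 THEN
     EmpW2W3 = 'Employed both waves' ;
  ELSE IF np2HasPdJob = 1 OR np3HasJob = 1 THEN
     EmpW2W3 = 'Employed in either wave' ;
  ELSE IF np2HasPdJob = 0 AND np3HasJob = 0 THEN
     EmpW2W3 = 'Not employed' ;
  /* Otherwise EmpW2W3 stays blank, e.g., when an item is missing. */
RUN ;
```

The parent/teacher conference variable would follow the same pattern: test the teacher item first and fall back to the parent item only when the teacher value is missing.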


Joining/combining data files
Hypothetical Example: Data Availability Across Instruments

Youth   Interview Data W1   Assessment Data W2   Interview Data W2   Program Data W2
1       Yes                 Yes                  Yes                 No
2       Yes                 No                   Yes                 No
3       No                  No                   No                  Yes
4       Yes                 No                   No                  Yes
5       Yes                 No                   Yes                 No
6       Yes                 Yes                  Yes                 Yes

• There will be missing records across files and missing items within files.
• If data look like this, ask which file is the main file being analyzed.


Joining/combining data files
• Data in all files must be sorted by the key variable.
  • The key variable matches files case by case.
  • Key variable is “ID.”
• Files on CD should be sorted by key variable, but as you work with files they may become unsorted.
• Syntax to sort data
    PROC SORT data=sasdb.PrScores ;
      by ID ;
    run ;
• Note for those who do not want to destroy data
  • If you use a KEEP or DROP statement, do a temporary sort, or otherwise change the data, save to a new file.
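For a sort that leaves the stored file as it was, PROC SORT accepts an OUT= option that writes the sorted cases to a different data set. A minimal sketch, in which the output name PrScores_sorted is hypothetical:

```sas
/* OUT= sends the sorted copy to a work file;     */
/* sasdb.PrScores itself is not rewritten.        */
PROC SORT DATA = sasdb.PrScores OUT = PrScores_sorted ;
  BY ID ;
RUN ;
```

Without OUT=, PROC SORT replaces the input data set with the sorted version.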


Joining/combining data files
• Syntax to join files
    DATA Emplmt ;
      MERGE sasdb.n2w1parent (KEEP=id np1i_3a_7)
            sasdb.n2w2paryouth (KEEP=id np2HasPdJob)
            sasdb.n2w3paryouth (KEEP=id np3HasJob)
            sasdb.n2w4paryouth (KEEP=id np4HasJob) ;
      BY ID ;
    run ;


Joining/combining data files

SAS log


Joining/combining data files
• Syntax to join files keeping only those who have Wave 4 data
    DATA Emplmt ;
      MERGE sasdb.n2w1parent (KEEP=id np1i_3a_7)
            sasdb.n2w2paryouth (KEEP=id np2HasPdJob)
            sasdb.n2w3paryouth (KEEP=id np3HasJob)
            sasdb.n2w4paryouth (KEEP=id np4HasJob in=inW4) ;
      BY ID ;
      IF inW4 ;
    run ;
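The same IN= option can help answer the earlier question of which cases appear in which files. The sketch below is an illustration rather than part of the module: the names Avail, hasW2, hasW3, and hasW4 are hypothetical, and the files are assumed to be sorted by ID.

```sas
DATA Avail ;                           /* hypothetical work file */
  MERGE sasdb.n2w2paryouth (KEEP= ID IN=inW2)
        sasdb.n2w3paryouth (KEEP= ID IN=inW3)
        sasdb.n2w4paryouth (KEEP= ID IN=inW4) ;
  BY ID ;
  /* IN= variables are temporary and are not written to the     */
  /* output file, so copy them to ordinary variables (1 or 0).  */
  hasW2 = inW2 ;
  hasW3 = inW3 ;
  hasW4 = inW4 ;
RUN ;

/* Count cases for each combination of files. */
PROC FREQ DATA = Avail ;
  TABLES hasW2 * hasW3 * hasW4 / LIST ;
RUN ;
```

Each row of the frequency listing corresponds to one pattern of data availability, much like the rows of the hypothetical availability table.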


Joining/combining data files

SAS log


Joining/combining data files
• Suggestion: Name files in a meaningful way, such as
  • By date: AnFile_29July
  • By type of analysis: PI_CrossWave
  • By source: PI_W123
  • By sequence: File_5
• Be sure to use a two-level data set name (using a ddname) to save the file; otherwise, it is a work or temporary file.
    DATA Emplmt ;        /* temporary work file */
    DATA sasdb.Emplmt ;  /* saved file */


Joining/combining data files: Example
• Joining/combining data
  • Combine data from another file with an existing file.
  • Sort PrScores by ID.
  • Bring in np2HasPdJob from n2w2paryouth.
  • Bring in np3HasJob from n2w3paryouth.
  • Save the file as PrScoresEmp.
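The steps in this exercise might look like the sketch below. It assumes PrScores was saved as sasdb.PrScores and that the two parent/youth files are already sorted by ID. The work file name PrScores_srt is hypothetical, and keeping only the cases found in PrScores (IF inPr) is a choice made for this illustration, since the exercise does not say which cases to keep.

```sas
/* Sort a copy of PrScores by the key variable. */
PROC SORT DATA = sasdb.PrScores OUT = PrScores_srt ;
  BY ID ;
RUN ;

/* Two-level name, so PrScoresEmp is saved permanently. */
DATA sasdb.PrScoresEmp ;
  MERGE PrScores_srt       (IN=inPr)
        sasdb.n2w2paryouth (KEEP= ID np2HasPdJob)
        sasdb.n2w3paryouth (KEEP= ID np3HasJob) ;
  BY ID ;
  IF inPr ;          /* keep only cases that are in PrScores */
RUN ;
```

Comparing the observation counts in the log with the count for PrScores is a quick check that the join kept the intended cases.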


Joining/combining data files: Example


Summary
• Congratulations! You have learned to
  • Open and view a file
  • Create a new file
  • Reduce the size of files by specifying the
    • Variables needed
    • Cases needed
  • Join files using a key variable
  • Save files with a new name


Closing
• Topics discussed in this module
  • Purpose
  • Opening and viewing data files
  • Limiting variables
  • Subsetting cases
  • Joining/combining data files
• Next module
  • 15b. Accessing Data: Frequencies in SAS


Important information
• NLTS2 website contains reports, data tables, and other project-related information
  http://nlts2.org/
• Information about obtaining the NLTS2 database and documentation can be found on the NCES website
  http://nces.ed.gov/statprog/rudman/
• General information about restricted data licenses can be found on the NCES website
  http://nces.ed.gov/statprog/instruct.asp

E-mail address: [email protected]