Lecture 4 • Ways to get data into SAS • Some practice programming • Review of statistical concepts

Upload: christopher-garrison

Post on 21-Jan-2016


Page 1: Lecture 4 Ways to get data into SAS Some practice programming Review of statistical concepts

Lecture 4

• Ways to get data into SAS

• Some practice programming

• Review of statistical concepts


Getting data into SAS

• DATALINES statement
  – Data is contained within a data step

• INFILE statement
  – Data contained in separate file

• PROC IMPORT
  – Data contained in separate file


* List Directed Input: Reading data values separated by spaces.;

DATA bp;
  INFILE DATALINES;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
  DATALINES;
C 84 138 93 143
D 89 150 91 140
A 78 116 100 162
A . . 86 155
C 81 145 86 140
;
RUN;

TITLE 'Data Separated by Spaces';
PROC PRINT DATA=bp;
RUN;

Obs  clinic  dbp6  sbp6  dbpbl  sbpbl

  1    C       84   138     93    143
  2    D       89   150     91    140
  3    A       78   116    100    162
  4    A        .     .     86    155
  5    C       81   145     86    140


* List Directed Input: Reading data values separated by commas;

DATA bp;
  INFILE DATALINES DLM = ',';
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
  DATALINES;
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,.,.,86,155
C,81,145,86,140
;
RUN;

TITLE 'Data separated by a comma';
PROC PRINT DATA=bp;
RUN;


* List Directed Input: Reading data values from a .csv type file;

DATA bp;
  INFILE DATALINES DLM = ',' DSD;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
  DATALINES;
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,,,86,155
C,81,145,86,140
;

TITLE 'Reading in Data using the DSD Option';
PROC PRINT DATA=bp;
RUN;


* List Directed Input: Reading data values separated by tabs (.txt files);
* Each data line is tab-delimited and the fourth line has consecutive tabs for its two missing values;

DATA bp;
  INFILE DATALINES DLM = '09'x DSD;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
  DATALINES;
C	84	138	93	143
D	89	150	91	140
A	78	116	100	162
A			86	155
C	81	145	86	140
;

TITLE 'Reading in Data separated by a tab';
PROC PRINT DATA=bp;
RUN;


* Reading data from an external file;

DATA bp;
  INFILE '/home/ph5415/data/bp.csv' DSD FIRSTOBS = 2;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
RUN;

TITLE 'Reading in Data from an External File';
PROC PRINT DATA=bp;
RUN;

Content of bp.csv

clinic,dbp6,sbp6,dbpbl,sbpbl
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,,,86,155
C,81,145,86,140


* Using PROC IMPORT to read in data;

PROC IMPORT DATAFILE='/home/ph5415/data/bp.csv'
  OUT = bp
  DBMS = csv REPLACE;
  GETNAMES = yes;
RUN;

TITLE 'Reading in Data Using PROC IMPORT';

PROC PRINT DATA=bp;
RUN;

PROC CONTENTS DATA=bp;
RUN;


The CONTENTS Procedure

Data Set Name: WORK.BP                           Observations:         5
Member Type:   DATA                              Variables:            5
Engine:        V8                                Indexes:              0
Created:       18:15 Tuesday, January 25, 2005   Observation Length:   40
Last Modified: 18:15 Tuesday, January 25, 2005   Deleted Observations: 0
Protection:                                      Compressed:           NO
Data Set Type:                                   Sorted:               NO
Label:

-----Alphabetic List of Variables and Attributes-----

#  Variable  Type  Len  Pos
---------------------------
1  clinic    Char    8   32
2  dbp6      Num     8    0
4  dbpbl     Num     8   16
3  sbp6      Num     8    8
5  sbpbl     Num     8   24
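The same PROC IMPORT pattern can be adapted to other delimiters. The following is a minimal sketch, assuming a tab-delimited copy of the blood pressure file exists; the file name bp.txt and the output data set name bp_tab are hypothetical and do not appear in the lecture.

```sas
* Sketch only: bp.txt is a hypothetical tab-delimited copy of bp.csv;
PROC IMPORT DATAFILE='/home/ph5415/data/bp.txt'
  OUT = bp_tab
  DBMS = tab REPLACE;
  GETNAMES = yes;
RUN;

* Check the variable names and types that PROC IMPORT chose;
PROC CONTENTS DATA=bp_tab;
RUN;
```

Running PROC CONTENTS after an import is a quick way to confirm that numeric columns were not read as character.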


Some Definitions

• Statistics: The art and science of collecting, analyzing, presenting, and interpreting numerical data.

• Data: facts and figures that are analyzed

• Dataset: All the data collected for a study

• Elements: Units in which data is collected
  – People, companies, schools, households

• Variables: Characteristics measured on elements
  – People (height, weight)
  – Company (number of employees)
  – Schools (percentage of students who graduate in 5 years)
  – Households (number of computers owned)


Informal Definition

• Statistics: Gaining information, in a scientific way, about something you do not know


Start With Research Question

• What is the proportion of persons without health insurance in Minnesota?

• Do newer blood pressure (BP) medications prevent heart disease better than older medications?

• What is the relationship between grade point average and SAT scores?

• Do persons who eat more fruits and vegetables (F&V) have a lower risk of developing colon cancer?

• Does the program DARE reduce the risk of young persons trying drugs?


Statistics

1) Start With Question

2) Design Study And Collect Data

3) Compute Summary Data to Assess Question

4) Make Conclusions (Inference)


Statistical Inference

• Estimation (Chapter 4)

• Hypothesis Testing (Chapter 5)
  – Comparing population proportions (Chap 6)
  – Comparing population means (Chap 7)


Common Parameters to Estimate

Parameter          Parameter Description

$\mu$              Mean of population
$p$                Proportion with a certain trait
$\rho$             Correlation between 2 variables
$\mu_1 - \mu_2$    Difference between 2 means
$p_1 - p_2$        Difference between 2 proportions
$\sigma$           Population standard deviation


Statistical Inference

1) Population with mean $\mu$ = ?

2) A simple random sample of n elements is selected from the population.

3) The sample data provide a value for the sample mean $\bar{x}$.

4) The value of $\bar{x}$ is used to make inferences about the value of $\mu$.


Sampling

• Sample: a subset of the target population (usually a simple random sample, in which every possible sample of the same size has an equal probability of being selected)

• Different samples yield different estimates

• Trying to understand the population parameter (the “true value”)
  – It’s usually not possible to measure the population value
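The point that different samples yield different estimates can be seen with a small simulation. This is only a sketch under stated assumptions: the population data set, its variable chol, and the normal distribution with mean 215 and standard deviation 20 are all made up for illustration, and PROC SURVEYSELECT requires SAS/STAT to be licensed.

```sas
* Hypothetical population of 10,000 simulated cholesterol values;
DATA population;
  CALL STREAMINIT(5415);
  DO id = 1 TO 10000;
    chol = RAND('NORMAL', 215, 20);
    OUTPUT;
  END;
RUN;

* Draw 3 independent simple random samples of 100 elements each;
PROC SURVEYSELECT DATA=population OUT=samples
  METHOD=SRS SAMPSIZE=100 REPS=3 SEED=2005;
RUN;

* One sample mean per replicate, with each estimating the same population mean;
PROC MEANS DATA=samples MEAN;
  CLASS Replicate;
  VAR chol;
RUN;
```

The three means printed by PROC MEANS should be close to, but not exactly equal to, one another and the simulated population mean.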


Point Estimate

Parameter          Point Estimate

$\mu$              Sample mean
$p$                Sample proportion
$\rho$             Sample correlation
$\mu_1 - \mu_2$    Difference between 2 sample means
$p_1 - p_2$        Difference between 2 sample proportions
$\sigma$           Sample standard deviation


Interval Estimation

In general, confidence intervals are of the form:

$$\text{estimate} \pm 1.96 \times SE$$

SE = standard error of your estimate

Estimate = mean, proportion, regression coefficient, odds ratio...

1.96 = multiplier for a 95% CI based on the normal distribution


Estimation

“What is the average total cholesterol level for MN residents?”

Random sample of cholesterol levels

sample mean = sum of values / number of observations

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Estimates the population mean: $\mu$


Estimation

“What is the average total cholesterol level for MN residents?”

sample standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

$s$ estimates the population standard deviation: $\sigma$
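In SAS, the sample mean and sample standard deviation can be requested from PROC MEANS. The sketch below assumes the bp data set created on the earlier slides is still in the WORK library; the choice of sbpbl as the analysis variable is only for illustration.

```sas
* N, sample mean, and sample standard deviation of baseline systolic BP;
* By default the STD statistic uses the n - 1 divisor;
PROC MEANS DATA=bp N MEAN STD;
  VAR sbpbl;
RUN;
```

For the five sbpbl values shown earlier (143, 140, 162, 155, 140), the mean reported should be 148.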


Confidence Interval Example

Suppose sample of 100

mean = 215 mg/dL, standard deviation = 20

95% CI = $\bar{X} \pm 1.96 \times s/\sqrt{n}$

= (215 - 1.96*20/10, 215 + 1.96*20/10) approximately = (211, 219)

$s/\sqrt{n}$ = standard error of mean
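The arithmetic in this example can be checked with a short DATA step. This is a sketch rather than part of the lecture; the data set name ci and the variable names are arbitrary.

```sas
* Reproduce the 95% CI from the summary statistics in the example;
DATA ci;
  n    = 100;                 * sample size;
  xbar = 215;                 * sample mean in mg/dL;
  s    = 20;                  * sample standard deviation;
  se    = s / SQRT(n);        * standard error of the mean;
  lower = xbar - 1.96 * se;
  upper = xbar + 1.96 * se;
RUN;

PROC PRINT DATA=ci;
RUN;
```

The standard error is 2, so the limits come out to 211.08 and 218.92, which round to the (211, 219) shown above.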


Properties of Confidence Intervals

• As sample size increases, CI gets narrower
  – If you could sample the whole population, $\bar{X}$ would equal $\mu$

• Can use different levels of confidence
  – 90, 95, 99% common
  – More confidence means wider interval; so a 90% CI is narrower than a 99% CI

• Changes with population standard deviation
  – More variable population means wider interval
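These properties can be tabulated directly. The sketch below computes the width of a normal-based confidence interval, $2 \times z \times s/\sqrt{n}$, for a few sample sizes and confidence levels; the value s = 20 is borrowed from the cholesterol example, and the sample sizes are arbitrary choices for illustration.

```sas
* CI width for several sample sizes and confidence levels;
DATA ci_width;
  s = 20;
  DO n = 25, 100, 400;
    DO level = 0.90, 0.95, 0.99;
      z     = PROBIT(1 - (1 - level) / 2);   * normal quantile, about 1.96 for 95%;
      width = 2 * z * s / SQRT(n);
      OUTPUT;
    END;
  END;
RUN;

PROC PRINT DATA=ci_width;
RUN;
```

Within each sample size the width grows with the confidence level, and quadrupling n halves the width.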


Caution with Confidence Intervals

– Data should be from random sample

– More complicated sampling requires different methods
  • Example - multistage or stratified sampling

– Outliers can cause problems

– Non-normal data can change confidence level
  • Skewed data a big problem

– Bias not accounted for
  • Non-responders
  • Target and sampled population different


95% Confidence Intervals with SAS

1) Construct from output

estimate +/- 1.96*SE

2) Provided automatically by some procedures

PROC MEANS DATA = STUDENTS LCLM UCLM;
  VAR AGE;
RUN;
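As a further sketch, the same request can be made on the bp data set from the earlier slides, assuming it is still in the WORK library. Note that PROC MEANS bases its confidence limits on the t distribution rather than the fixed 1.96 multiplier, so with small samples its limits will be wider than estimate +/- 1.96*SE.

```sas
* Mean, standard error, and 95% confidence limits for baseline BP;
PROC MEANS DATA=bp N MEAN STDERR CLM ALPHA=0.05;
  VAR sbpbl dbpbl;
RUN;
```

Changing ALPHA= to 0.10 or 0.01 would give 90% or 99% limits instead.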