analysis using sas

40
SAS INDIVIDUAL PROJECT AHMAD HILMI BIN AZMAN 2010704243 Meister [email protected]

Upload: meisterh

Post on 13-Apr-2015

38 views

Category:

Documents


7 download

DESCRIPTION

using SAS

TRANSCRIPT

Page 1: Analysis using SAS

SAS INDIVIDUAL PROJECT AHMAD HILMI BIN AZMAN

2010704243

Meister [email protected]

Page 2: Analysis using SAS

1

UNIVERSITI TEKNOLOGI MARA (UiTM)

INDIVIDUAL PROJECT

SAS PROGRAMMING

BY

AHMAD HILMI BIN AZMAN

PREPARED FOR

MADAM WAN FAIROS BT WAN YAACOB

26 DECEMBER 2012

Page 3: Analysis using SAS

2

Table of Contents 1.0 TASK 1 ............................................................................................................................................... 3

1.1 Introduction .................................................................................................................................. 3

1.2 SAS Macro Programming .............................................................................................................. 5

a) Substituting text with %LET .............................................................................................................. 5

b) Creating Modular Code with Macros ................................................................................................ 7

c) Adding Parameters to Macros .......................................................................................................... 8

2.0 TASK 2 ............................................................................................................................................... 9

2.1 Introduction .................................................................................................................................. 9

2.1.1 Source of Dataset ...................................................................................................................... 9

2.1.2 Description of Dataset .............................................................................................................. 9

2.2 Analysis of Variance (ANOVA) ..................................................................................................... 10

2.2.1 Descriptive Analysis ................................................................................................................ 10

2.2.2 Analysis of Variance (ANOVA) ................................................................................................. 12

2.2.3 Report on Analysis of Variance (ANOVA) ................................................................................ 13

2.3 Regression Analysis ..................................................................................................................... 14

2.3.1 Model Adequacy Checking ...................................................................................................... 14

2.3.2 Regression Model ................................................................................................................... 16

2.3.3 Correlation Analysis ................................................................................................................ 18

2.3.4 Multicollinearity Test .............................................................................................................. 19

2.3.5 Stepwise Analysis .................................................................................................................... 20

2.3.6 Lack of Fit Test......................................................................................................................... 24

2.3.7 Report on Regression Analysis ................................................................................................ 26

2.4 Correlation Analysis using Macro ............................................................................................... 27

2.4.1 Report on Correlation Analysis ............................................................................................... 28

3.0 TASK 3 ............................................................................................................................................. 29

3.1 Question 1 ................................................................................................................................... 29

3.2 Question 2 ................................................................................................................................... 30

4.0 References ...................................................................................................................................... 39

Page 4: Analysis using SAS

3

1.0 TASK 1

1.1 Introduction

The data is about Sales of Mini Supermarket. This data is about the study of sales of mini

supermarket of different branch. There are all 16 observation from this data and 6 variables.

The variables are Branch, State, Date, NumWorker, SaleYear and Manager. The variable

description are shown in the table below.

Variable Name Description

Branch List branch area in Malaysia

State Name of state of the branch

Date Date of the branch been open

NumWorker Number of worker for each branch

SaleYear Number of sales per year

Manager Name of manager in charge of the branch

The SAS Command:

libname indi 'C:\Users\Meister\Documents\SAS\individual project';

data indi.task;

infile 'C:\Users\Meister\Documents\SAS\individual

project\data\task1.txt';

input Branch $ 1-16 State $ 17-32 Date 33-48 NumWorker 49-56 SaleYear

57-65 ManagerName $ 66-75;

run;

proc print data=indi.task;

title 'Meister Supermarket Sales';

format SaleYear dollar9. Date date9.;

run;

Page 5: Analysis using SAS

4

The data:

Page 6: Analysis using SAS

5

1.2 SAS Macro Programming

a) Substituting text with %LET

Getting sales from Penang.

libname indi 'C:\Users\Meister\Documents\SAS\individual project';

data indi.task;

infile 'C:\Users\Meister\Documents\SAS\individual

project\data\task1.txt';

input Branch $ 1-16 State $ 17-32 Date 33-48 NumWorker 49-56 SaleYear

57-65 ManagerName $ 66-75;

run;

%let slecstate = Penang;

proc print data=indi.task noobs;

title "Meister Supermarket Sales in Penang";

where State = "&slecstate";

format SaleYear dollar9. Date date9.;

run;

Page 7: Analysis using SAS

6

Getting State and Branch in charge by Mahmood.

libname indi 'C:\Users\Meister\Documents\SAS\individual project';

data indi.task;

infile 'C:\Users\Meister\Documents\SAS\individual

project\data\task1.txt';

input Branch $ 1-16 State $ 17-32 Date 33-48 NumWorker 49-56 SaleYear

57-65 ManagerName $ 66-75;

run;

%let slecnum = Mahmood;

proc print data=indi.task noobs;

title "Meister Supermarket Sales in Penang";

where ManagerName = "&slecnum";

var managername branch state;

format SaleYear dollar9. Date date9.;

run;

Page 8: Analysis using SAS

7

b) Creating Modular Code with Macros

libname indi 'C:\Users\Meister\Documents\SAS\individual project';

data indi.task;

infile 'C:\Users\Meister\Documents\SAS\individual

project\data\task1.txt';

input Branch $ 1-16 State $ 17-32 Date 33-48 NumWorker 49-56 SaleYear

57-65 ManagerName $ 66-75;

run;

%MACRO meister ;

proc sort data=indi.task;

by descending date;

proc print data=indi.task;

var Branch State ManagerName Date;

title 'Meister Supermarket by Branch Date Open';

format SaleYear dollar9. Date year4.;

%MEND sample;

%meister

run;

Page 9: Analysis using SAS

8

c) Adding Parameters to Macros

libname indi 'C:\Users\Meister\Documents\SAS\individual project';

data indi.task;

infile 'C:\Users\Meister\Documents\SAS\individual

project\data\task1.txt';

input Branch $ 1-16 State $ 17-32 Date 33-48 NumWorker 49-56 SaleYear

57-65 ManagerName $ 66-75;

run;

%MACRO select(State=,sortvar=);

proc sort data = indi.task out = indi.task1;

by &sortvar;

where State="&state";

proc print data=indi.task1;

var Branch State Date;

format SaleYear dollar9. Date year4.;

TITLE1 "States by Date Order";

%MEND select;

%select (State=Penang, sortvar=date)

%select (State=Selangor, sortvar=date);

run;

Page 10: Analysis using SAS

9

2.0 TASK 2

2.1 Introduction

2.1.1 Source of Dataset

We are taking our data from database from New York University Stern (NYU Stern). The title

of the data is Movie Buzz Data

2.1.2 Description of Dataset

Our data is all about movie. We were using movie sales data. Where this data have 62

observations which are from different movies. This dataset contain 12 variables. The data

have 6 quantitative variables and 6 qualitative variables.

The 6 quantitative variables are Budget, StarPower, Addict, ComingSoon,

Fandango and CantWait. Variable Budget mean production budget in million dollar.

StarPower stand for star power measure by index of star poser. Variable Addict is the

variable that count people view watching trailer at traileraddict.com. ComingSoon is the

variable that count the comment on message board at comingsoon.net. The variable

Fandago is variable that count people attention on the movie in fandango.com. The last

variable CantWait stand for can’t wait to see vote in fandango.com. Here are summarize of

the variable description.

Variables Descriptions

Budget Production budget ($ million)

StarPower Index of star poser

Addict Trailer views at traileraddict.com

ComingSoon Number of people comment on message board at comingsoon.net

Fandago Number of people read review at fandango.com

CantWait Percentage of Fandago votes that can’t wait to see

The 6 qualitative variables are MPRating, Sequel, Action, Comedy, Animated and Horror.

Variable MPRating mean MPAA rating code. Sequel stand for sequel movie. Variable Action

is the variable for action movie. Comedy is the variable for comedy movie. The variable

Animated is the variable for animated movie. The last variable Horror is the variable for

horror movie.

Page 11: Analysis using SAS

10

Variables Descriptions

MPRating MPAA Rating where code 1=G (general) 2=PG (parental guide) 3=PG13 (parental guide and may not appropriate under 13 years old) 4=R (Restricted)

Sequel Sequel movie where code 1=sequel 2=not sequel

Action Action movie where code 1=action film 2=not action film

Comedy Comedy movie where code 1=comedy film 2=not comedy film

Animated Animated movie where code 1=animated film 2=not animated film

Horror Horror movie where code 1=horror film 2=not horror film

2.2 Analysis of Variance (ANOVA)

The analysis of variance (ANOVA) been done towards sales and types of movie. The type of

movie consist of variable comedy animated horror and action.

2.2.1 Descriptive Analysis

The coding:

libname sas 'C:\Users\Meister\Documents\SAS\project';

data sas.anova;

set work.project;

length type $ 10;

if action=1 then type='action';

else if comedy=1 then type='comedy';

else if animated=1 then type='animated';

else if horror=1 then type='horror';

else if type=1 then delete;

keep type sales;

run;

proc univariate data = sas.anova normal plot;

class type;

run;

1. Normality analysis for variable horror

Page 12: Analysis using SAS

11

Based on Kolmogorov-Smirnov analysis, the p-value is 0.15 where the value is greater

than alpha 0.05. We can conclude that the variable horror is normal.

2. Normality analysis for variable comedy

Based on Kolmogorov-Smirnov analysis, the p-value is 0.15 where the value is greater

than alpha 0.05. We can conclude that the variable comedy is normal.

3. Normality analysis for variable animated

Based on Kolmogorov-Smirnov analysis, the p-value is 0.15 where the value is greater

than alpha 0.05. We can conclude that the variable animated is normal.

4. Normality analysis for variable action

Based on Kolmogorov-Smirnov analysis, the p-value is 0.0851 where the value is greater

than alpha 0.05. We can conclude that the variable action is normal.

Page 13: Analysis using SAS

12

2.2.2 Analysis of Variance (ANOVA)

The coding:

libname sas 'C:\Users\Meister\Documents\SAS\project';

data sas.anova;

set work.project;

length type $ 10;

if action=1 then type='action';

else if comedy=1 then type='comedy';

else if animated=1 then type='animated';

else if horror=1 then type='horror';

else if type=1 then delete;

keep type sales;

run;

proc anova data=sas.anova;

class type;

model sales=type;

means type / hovtest=bf;

run;

Based on the analysis of variance, the p-value is 0.5102 where this value is greater than

alpha 0.05. we can conclude that the sales does not have relationship with the types of

movie.

Page 14: Analysis using SAS

13

2.2.3 Report on Analysis of Variance (ANOVA)

The analysis of variance (ANOVA) been done towards sales and types of movie.

The type of movie consist of 4 variables which the variable are comedy, animated, horror

and action. All the 4 variables must first be analyze for normality. Therefore descriptive

analysis been done. The analysis been done each of the variables.

Based on Kolmogorov-Smirnov analysis for variable Horror, the p-value is 0.15

where the value is greater than alpha 0.05. Based on this result, we can conclude that the

variable horror is normal. For variable Comedy, the Kolmogorov-Smirnov analysis show that

the p-value is 0.1. Where this value is greater than alpha 0.05. This mean that we can

conclude that the variable comedy is normal. Then for variable animated, based on

Kolmogorov-Smirnov analysis, the p-value is 0.15 where the value is greater than alpha 0.05.

By this we can conclude that the variable animated is normal. Next is analysis for variable

action. Based on Kolmogorov-Smirnov analysis, the p-value is 0.0851 where the value is

greater than alpha 0.05. Therefore we can conclude that the variable action is normal.

After all variable been confirm normal, we do the analysis of variance toward the

sales of the movie. Based on the analysis of variance, the p-value is 0.5102 where this value

is greater than alpha 0.05. By this we can conclude that the sales does not have relationship

with the types of movie.

Page 15: Analysis using SAS

14

2.3 Regression Analysis

2.3.1 Model Adequacy Checking

The coding for p-p plot original data:

libname indi 'C:\Users\Meister\Documents\SAS\individual project';

proc import out = indi.project

datafile = "C:\Users\Meister\Documents\SAS\individual

project\movie.xlsx"

dbms = xlsx replace;

run;

proc capability data = indi.project normal;

var sales;

qqplot sales/ normal;

ppplot sales/ normal;

histogram / normal;

inset mean std;

run;

The coding for p-p plot transform data:

data indi.trans;

set indi.project;

transform = ln(sales);

run;

proc capability data = indi.trans normal;

var transform;

qqplot transform / normal;

ppplot transform / normal;

histogram / normal;

inset mean std;

run;

Page 16: Analysis using SAS

15

P-value of Kolmogorov-Smirnov less than alpha. We can conclude that the distribution is not normal. Transformation needed

P-value of Kolmogorov-Smirnov more than alpha. We can conclude that the distribution is normal. Transformation succeed.

Page 17: Analysis using SAS

16

2.3.2 Regression Model

The coding for regression model of the transform data:

libname indi 'C:\Users\Meister\Documents\SAS\individual project';

proc import out = indi.project

datafile = "C:\Users\Meister\Documents\SAS\individual

project\movie.xlsx"

dbms = xlsx replace;

run;

data indi.trans;

set indi.project;

transform = ln(sales);

merge indi.trans,indi.project;

run;

proc reg data=indi.trans ;

model transform = MPRATING BUDGET STARPOWR SEQUEL ACTION COMEDY

ANIMATED

HORROR ADDICT CMNGSOON FANDANGO CNTWAIT;

plot residual.*cases.;

run;

The model is significant since the p-value = 0.0001 less than alpha 0.05

Page 18: Analysis using SAS

17

Regression model

Sales = 15.25374 – 0.21222MPRating + 0.00515Budget – 0.00471StarPowr + 0.39390Sequel –

0.74909Action – 0.00164Comedy – 0.82118Animated + 0.43770Horror + 0.00002216Addict –

0.00013618CmngSoon + 0.00020521Fandango + 3.29154CntWait

Page 19: Analysis using SAS

18

2.3.3 Correlation Analysis

The coding for correlation:

proc corr data=indi.trans;

var transform BUDGET STARPOWR ADDICT CMNGSOON FANDANGO CNTWAIT ;

run;

The correlation matrix above shows the coefficient of Pearson correlation between

quantitative variables in the data set. There is 5 predictor variables that are significant correlated

with the dependent variable (Sales). That is BUDGET, ADDICT, CMGSOON, FANDAGO and

CNTWAIT with correlation 0.45708, 0.43750, 0.0133, 0.37974 and 0.65501 respectively. Some of

the predictor variables also shows a correlation exist among them. BUDGET and STARPOWR have

a significant correlated and this indicates that a movie with high STARPOWR tend to increase the

budget to make the movie.

Page 20: Analysis using SAS

19

2.3.4 Multicollinearity Test

The coding for correlation:

proc reg data=indi.trans ;

model transform = MPRATING BUDGET STARPOWR SEQUEL ACTION COMEDY

ANIMATED

HORROR ADDICT CMNGSOON FANDANGO CNTWAIT / vif tol ;

run;

Variables Interpretation of Collinearity Statistics

Tolerance VIF

MPAA Rating No multicollinearity since tolerance value 0.602 more than 0.2

No multicollinearity since VIF value 1.660 less than 10

Budget No multicollinearity since tolerance value 0.401 more than 0.2

No multicollinearity since VIF value 2.495 less than 10

StarPower No multicollinearity since tolerance value 0.547 more than 0.2

No multicollinearity since VIF value 1.827 less than 10

Sequel No multicollinearity since tolerance value 0.544 more than 0.2

No multicollinearity since VIF value 1.837 less than 10

Action No multicollinearity since tolerance value 0.498 more than 0.2

No multicollinearity since VIF value 2.009 less than 10

Comedy No multicollinearity since tolerance value 0.515 more than 0.2

No multicollinearity since VIF value 1.940 less than 10

Animated No multicollinearity since tolerance value 0.523 more than 0.2

No multicollinearity since VIF value 1.912 less than 10

Page 21: Analysis using SAS

20

Horor No multicollinearity since tolerance value 0.617 more than 0.2

No multicollinearity since VIF value 1.621 less than 10

Addict No multicollinearity since tolerance value 0.465 more than 0.2

No multicollinearity since VIF value 2.150 less than 10

Coming Soon No multicollinearity since tolerance value 0.376 more than 0.2

No multicollinearity since VIF value 2.660 less than 10

Fandago No multicollinearity since tolerance value 0.570 more than 0.2

No multicollinearity since VIF value 1.754 less than 10

CantWait No multicollinearity since tolerance value 0.440 more than 0.2

No multicollinearity since VIF value 2.273 less than 10

2.3.5 Stepwise Analysis

The coding for stepwise:

libname indi 'C:\Users\Meister\Documents\SAS\individual project';

proc import out = indi.project

datafile = "C:\Users\Meister\Documents\SAS\individual

project\movie.xlsx"

dbms = xlsx replace;

run;

data indi.trans;

set indi.project;

transform = ln(sales);

merge indi.trans,indi.project;

run;

proc reg data=indi.trans ;

model transform = MPRATING BUDGET STARPOWR SEQUEL ACTION COMEDY

ANIMATED

HORROR ADDICT CMNGSOON FANDANGO CNTWAIT / selection=stepwise ;

run;

Page 22: Analysis using SAS

21

1. Steps of stepwise

Step 1 Step 2

1. Variable CNTWAIT have the higher correlation value. Which is 0.4290.

2. Thus the variable is selected as the first variable to be enter to the model.

1. Variable ACTION have the higher correlation value. Which is 0.4856.

2. Thus the variable is selected to be enter to the model.

3. The two variable tested and no variable are deleted.

Page 23: Analysis using SAS

22

Step 3 Step 4

1. Variable ADDICT have the higher correlation value. Which is 0.5217.

2. Thus the variable is selected to be enter to the model.

3. The three variable tested and no variable are deleted.

1. Variable SEQUEL have the higher correlation value. Which is 0.5420.

2. Thus the variable is selected to be enter to the model.

3. The four variable tested and no variable are deleted.

Page 24: Analysis using SAS

23

Summary of the stepwise selection:

Based on 12 variable 4 are selected for the final model. All the selected variables are

CNTWAIT, ADDICT, SEQUEL and ACTION. Therefore the final model is

y = 14.56468 + 0.41590SEQUEL – 0.69464ACTION + 0.00002895ADDICT – 3.81397CNTWAIT

Here the test of the model

Hypothesis

H0: 𝛽1=𝛽2=𝛽3=𝛽4 =0

H1: at least one 𝛽i is not equal zero

Significant value

α=0.05

Test statistic

P value=0.0001

Decision

Since P-value=0.0001 < α=0.05 to reject H0.

Conclusion

The model is significant.

Page 25: Analysis using SAS

24

2.3.6 Lack of Fit Test

The coding for lack of fit:

Full model:

libname indi 'C:\Users\Meister\Documents\SAS\individual project';

proc import out = indi.project

datafile = "C:\Users\Meister\Documents\SAS\individual

project\movie.xlsx"

dbms = xlsx replace;

run;

data indi.trans;

set indi.project;

transform = ln(sales);

merge indi.trans,indi.project;

run;

proc reg data=indi.trans ;

model transform = MPRATING BUDGET STARPOWR SEQUEL ACTION COMEDY

ANIMATED

HORROR ADDICT CMNGSOON FANDANGO CNTWAIT / lackfit ;

run;

Full reduced model:

proc reg data=indi.trans ;

model transform = SEQUEL ACTION ADDICT CNTWAIT / lackfit ;

run;

Page 26: Analysis using SAS

25

Full model Reduced model (after stepwise)

Hypothesis H0: There is no lack of fit H1: There is lack of fit Significant value α=0.05 Test statistic P value=0.0001 Decision Since P-value=0.0001 < α=0.05 to reject H0.

Conclusion The model have lack of fit.

Hypothesis H0: There is no lack of fit H1: There is lack of fit Significant value α=0.05 Test statistic P value=0.0001 Decision Since P-value=0.0001 < α=0.05 to reject H0.

Conclusion The model have lack of fit.

Page 27: Analysis using SAS

26

2.3.7 Report on Regression Analysis

To do regression model, first we plot the p-p plot to see either the data is normal

or not. Based on descriptive analysis the p-value of Kolmogorov-Smirnov less than alpha.

We can conclude that the distribution is not normal. Then transformation needed to be

done. After transformation been done using natural log transformation. The Kolmogorov-

Smirnov value is 0.15 which is more than alpha. We can conclude that the distribution is

normal. Therefore the transformation is succeed.

Then we run the data to get the regression model. The regression model is

Sales = 15.25374 – 0.21222MPRating + 0.00515Budget – 0.00471StarPowr +

0.39390Sequel – 0.74909Action – 0.00164Comedy – 0.82118Animated + 0.43770Horror

+ 0.00002216Addict – 0.00013618CmngSoon + 0.00020521Fandango + 3.29154CntWait.

For correlation matrix shows the coefficient of Pearson correlation between

quantitative variables in the data set. There is 5 predictor variables that are significant

correlated with the dependent variable (Sales). That is BUDGET, ADDICT, CMGSOON,

FANDAGO and CNTWAIT with correlation 0.45708, 0.43750, 0.0133, 0.37974 and 0.65501

respectively. Some of the predictor variables also shows a correlation exist among them.

BUDGET and STARPOWR have a significant correlated and this indicates that a movie with

high STARPOWR tend to increase the budget to make the movie.

Multicollinearity analysis been done and based on the result we can conclude that

there is multicollinearity exist. This is because since all the tolerance value more than 0.2

and VIF value is less than 10.

Then stepwise analysis been done. Based on 12 variable 4 are selected for the final

model. All the selected variables are CNTWAIT, ADDICT, SEQUEL and ACTION. Therefore

the final model is

y = 14.56468 + 0.41590SEQUEL – 0.69464ACTION + 0.00002895ADDICT –

3.81397CNTWAIT

For lack of fit analysis, we can conclude that the full model and reduced model is

not fit. This is because the p-value 0.0001 less than alpha.

Page 28: Analysis using SAS

27

2.4 Correlation Analysis using Macro

The coding for correlation:

libname indi 'C:\Users\Meister\Documents\SAS\individual project';

proc import out = indi.project

datafile = "C:\Users\Meister\Documents\SAS\individual

project\movie.xlsx"

dbms = xlsx replace;

run;

data indi.trans;

set indi.project;

transform = ln(sales);

merge indi.trans,indi.project;

run;

%MACRO correlation ;

proc corr data=indi.trans;

var transform BUDGET STARPOWR ADDICT CMNGSOON FANDANGO CNTWAIT;

%MEND sample;

%correlation

run;

Page 29: Analysis using SAS

28

2.4.1 Report on Correlation Analysis

The correlation matrix above shows the coefficient of Pearson correlation

between quantitative variables in the data set. There is 5 predictor variables that are

significant correlated with the dependent variable (Sales). That is BUDGET, ADDICT,

CMGSOON, FANDAGO and CNTWAIT with correlation 0.45708, 0.43750, 0.0133, 0.37974

and 0.65501 respectively. Some of the predictor variables also shows a correlation exist

among them. BUDGET and STARPOWR have a significant correlated and this indicates that

a movie with high STARPOWR tend to increase the budget to make the movie.

Page 30: Analysis using SAS

29

3.0 TASK 3

3.1 Question 1

Coding

libname indi 'C:\Users\Meister\Documents\SAS\individual project';

proc import out = indi.project

datafile = "C:\Users\Meister\Documents\SAS\individual

project\data\country.txt"

dbms = tab replace;

run;

proc print data = indi.project;

run;

Partial output

Page 31: Analysis using SAS

30

3.2 Question 2

a) Total population of the world

SAS Command:

proc sql;

select sum(population)

as TotalPopulation

from indi.project;

quit;

Result:

b) List all region

SAS Command:

proc sql;

title 'List of All Region';

select distinct(region)

from indi.project

group by region;

quit;

Result:

Page 32: Analysis using SAS

31

c) Total GDP of Cambodia

SAS Command:

proc sql;

title 'Total GDP of Cambodia';

select name , sum(gdp) as total

from indi.project

where name='Cambodia';

quit;

Result:

d) Number of countries with area greater than 1000000

SAS Command:

proc sql;

title 'Number of Country with Area more than 1000000';

select count(name) as NumberOfCountry

from indi.project

where area ge 1000000;

quit;

Result:

e) Total population of France, Greece and Spain ????

SAS Command:

proc sql;

title Total Population France Greece and Spain;

select sum(population) as total label='Total Population'

from indi.project

where name in ('France', 'Greece', 'Spain');

quit;

Result:

Page 33: Analysis using SAS

32

f) Number of countries by region

SAS Command:

proc sql;

title 'Number of Countries by Region';

select region, count(name) as no label='Number of Countries'

from indi.project

group by region;

quit;

Result:

g) Number of countries by region with population at least 15 million

SAS Command:

proc sql;

title 'Number of Countries by Region with Population more than 15

million';

select region, count(name) as no label='Number of Country'

from indi.project

where population ge 15000000

group by region;

quit;

Result:

Page 34: Analysis using SAS

33

h) List of region with population at least 150 million

SAS Command:

proc sql;

title 'List of Region with Population more than 150 million';

select distinct(region)

from indi.project

where population ge 150000000

group by region;

quit;

Result:

i) Total population and GDP of South Asia

SAS Command:

proc sql;

title 'Total Population and GDP of South Asia';

select distinct(region),sum(population) as Pop label='Total

Population', sum(gdp) as ttl label='Total GDP'

from indi.project

where region='South Asia';

quit;

Result:

Page 35: Analysis using SAS

34

j) List of countries per capita with population more than 200 million

SAS Command:

proc sql;

title 'List of Countries per Capita';

title2 'Population more than 200 million';

select name, (gdp/population) as capita

from indi.project

where population ge 200000000;

quit;

Result:

k) List of Country for region middle east in million

SAS Command:

proc sql;

title 'List of Country for Middle East in million';

select name, (population/1000000) as pop label='Population'

from indi.project

where region='Middle East';

quit;

Result:

Page 36: Analysis using SAS

35

l) List of Country for region middle east in million

SAS Command:

proc sql;

title Total Population France Greece and Italy;

select name, population

from indi.project

where name in ('France', 'Greece', 'Italy');

quit;

Result:

Page 37: Analysis using SAS

36

m) List of country name with United

SAS Command:

proc sql;

title 'Countries Name with "UNITED"';

select name

from indi.project

having name ? 'United';

quit;

Result:

n) List of countries that population larger than Russia

SAS Command:

proc sql;

title 'List of Countries that Population larger than Russia';

select name

from indi.project

where population gt (select population from indi.project where

name='Russia') ;

quit;

Result:

Page 38: Analysis using SAS

37

o) List of country name with United

SAS Command:

proc sql;

title 'List of Countries that region of India and Iran';

select name, region

from indi.project

where region in (select region from indi.project where name in

('India','Iran')) ;

quit;

Result:

Page 39: Analysis using SAS

38

p) List of country name with United

SAS Command:

proc sql;

title 'List of Countries';

title2 'region more than Canada less than Algeria';

select name

from indi.project

where population gt (select population from indi.project where

name in ('Canada'))

and population lt (select population from indi.project where name

in ('Algeria'));

quit;

Result:

Page 40: Analysis using SAS

39

4.0 References

Books:

Douglas C. Montgomery, Elizabeth A. Peck and G. Geoffrey Vining [2006], “Introduction to

Linear Regression Analysis”

Michael H. Kutner, Christopher J. Nachtsheim, John Neter and William Li, “Applied Linear

Statistical Models”

Website:

http://sites.stat.psu.edu/~xzhan/stat597c/sp04/Chapter7.htm

http://sastipsbyhal.blogspot.com/2012/01/sas-date-calculator-now-available.html