Principal Component Analysis Using R


Upload: karthi-keyan

Post on 05-Dec-2014


Page 1: Principal Component Analysis Using R
Page 2: Principal Component Analysis Using R

R is a language and environment for statistical computing and graphics.

R provides a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others.

R can be considered a different implementation of the S language.

It compiles and runs on a wide variety of platforms such as UNIX, Windows and Mac OS.

Page 3: Principal Component Analysis Using R

R includes:

• An effective data handling and storage facility
• A suite of operators for calculations on arrays and matrices
• A large, coherent, integrated collection of tools for data analysis
• Graphical facilities for data analysis and display, either on-screen or on hardcopy
• A well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions, and input and output facilities

Page 4: Principal Component Analysis Using R

R provides a comprehensive set of statistical analysis techniques:

• Classical statistical tests
• Linear and nonlinear modeling
• Time-series analysis
• Classification and cluster analysis
• Spatial statistics
• Basically any statistical technique you can think of is available in a contributed package for R

Page 5: Principal Component Analysis Using R

Why is Principal Component Analysis used?

Principal Component Analysis (PCA) is a data dimension-reduction technique. It is a powerful analysis tool when the data have n variables.

PCA finds combinations of all the variables that keep as much of the information in the original data as possible.

These principal components are formed as linear combinations of the original variables.

By extracting hidden predictive information from large databases, organizations can use PCA to identify valuable customers, predict future behaviors, and make proactive, knowledge-driven decisions.

Page 6: Principal Component Analysis Using R

There are four student applications.

The Graduate Admission Office wants to select two graduate students. Who should be selected?

STUDENT   GPA   GRE    PROFESSOR RATING
1         3.2   1270   38
2         3.9   1600   42
3         2.9   1500   22
4         3.0   1400   32

Page 7: Principal Component Analysis Using R

PCA in R selects the two best graduate students from the table in five steps:

1. Enter the data in R.
2. Calculate the correlation matrix.
3. Calculate the eigenvectors and eigenvalues of the correlation matrix.
4. Choose the number of principal components to be retained.
5. Derive the new data set.

Page 8: Principal Component Analysis Using R

R CODE

> Gpa <- c(3.2, 3.9, 2.9, 3.0)
> Gre <- c(1270, 1600, 1500, 1400)
> Professorrating <- c(38, 42, 22, 32)
> Student <- data.frame(Gpa, Gre, Professorrating)
> Student
  Gpa  Gre Professorrating
1 3.2 1270              38
2 3.9 1600              42
3 2.9 1500              22
4 3.0 1400              32

Page 9: Principal Component Analysis Using R

> stud <- cor(Student)
> stud
               Gpa          Gre     Prof.rat
Gpa      1.0000000  0.531991767  0.824316301
Gre      0.5319918  1.000000000 -0.009509527
Prof.rat 0.8243163 -0.009509527  1.000000000

Each entry of the correlation matrix measures the strength of the linear relationship between two random variables.
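As a small illustration of what one entry of the correlation matrix contains, the sketch below computes the Pearson correlation between Gpa and Gre from its definition and compares it with R's built-in `cor()`. The helper name `pearson` is hypothetical and is not part of the slides.

```r
# Data from the slides.
Gpa <- c(3.2, 3.9, 2.9, 3.0)
Gre <- c(1270, 1600, 1500, 1400)

# Hypothetical helper: Pearson correlation from its definition.
# Sum of cross-products of deviations from the mean, divided by the
# square root of the product of the sums of squared deviations.
pearson <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual  <- pearson(Gpa, Gre)
r_builtin <- cor(Gpa, Gre)

# Both values should agree with the Gpa/Gre entry of the matrix.
print(c(manual = r_manual, builtin = r_builtin))
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one.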

Page 10: Principal Component Analysis Using R

> eigen(stud)
$values
[1] 1.97676210 1.00866512 0.01457279

$vectors
          [,1]         [,2]       [,3]
[1,] 0.7086607 -0.003993348  0.7055382
[2,] 0.3801843 -0.840227900 -0.3866225
[3,] 0.5943568  0.542218710 -0.5939183

> barplot(eigen(stud)$vectors)
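The eigenvalues can guide the choice of how many components to retain. The following is a minimal sketch, assuming a cumulative-variance rule with a 95% cutoff. The cutoff and the variable names (`threshold`, `k`) are assumptions for illustration; the slides do not state which rule was used.

```r
# Data from the slides.
Student <- data.frame(
  Gpa = c(3.2, 3.9, 2.9, 3.0),
  Gre = c(1270, 1600, 1500, 1400),
  Professorrating = c(38, 42, 22, 32)
)

# Eigenvalues of the correlation matrix, largest first.
ev <- eigen(cor(Student))$values

# Share of the total variance carried by each component.
# For a correlation matrix the eigenvalues sum to the number of variables.
prop     <- ev / sum(ev)
cum_prop <- cumsum(prop)
print(round(rbind(eigenvalue = ev, proportion = prop, cumulative = cum_prop), 4))

# Assumed rule: keep the fewest components whose cumulative share
# reaches the threshold.
threshold <- 0.95
k <- which(cum_prop >= threshold)[1]
print(k)
```

With the eigenvalues shown above, the first two components together account for roughly 99.5% of the total, so this rule would retain two components.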

Page 11: Principal Component Analysis Using R

[Figure: bar plot of the eigenvectors; labels pc1, pc2, pc3]

Page 12: Principal Component Analysis Using R

> pc1 = 0.7086607*Gpa + 0.3801843*Gre + 0.5943568*Professorrating
> pc2 = -0.003993348*Gpa - 0.840227900*Gre + 0.542218710*Professorrating
> pc3 = 0.7055382*Gpa - 0.3866225*Gre - 0.5939183*Professorrating
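The three formulas above can also be written as a single matrix product, which avoids retyping the coefficients. This is only a sketch: like the slides, it multiplies the raw (unstandardized) values by the eigenvectors. The object names `e` and `scores` are illustrative.

```r
# Data from the slides.
Student <- data.frame(
  Gpa = c(3.2, 3.9, 2.9, 3.0),
  Gre = c(1270, 1600, 1500, 1400),
  Professorrating = c(38, 42, 22, 32)
)

e <- eigen(cor(Student))

# Each column of e$vectors holds the coefficients of one component,
# so one matrix product yields pc1, pc2 and pc3 for every student.
scores <- as.matrix(Student) %*% e$vectors
colnames(scores) <- c("pc1", "pc2", "pc3")
print(scores)

# Note: the sign of an eigenvector is arbitrary, so a whole column may
# come out with the opposite sign to the hand-typed formulas.
```

Apart from a possible sign flip per column, the pc1 column should reproduce the scores obtained from the hand-typed formula.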

Page 13: Principal Component Analysis Using R

Students 2 and 3 will be selected if the first component (pc1) is used to calculate the score.

STUDENT   GPA   GRE    PROFESSOR RATING   SCORE
1         3.2   1270   38                 507.6873
2         3.9   1600   42                 636.0216
3         2.9   1500   22                 585.4074
4         3.0   1400   32                 553.4034
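The scores in the table come from applying the eigenvectors to the raw values, so GRE, which has by far the largest numeric scale, contributes most of each score. Because the eigenvectors were derived from the correlation matrix, a common alternative is to standardize the variables first. The sketch below does this with `prcomp()` from base R. It is an alternative to the slides' calculation, not a reproduction of it, and its ranking need not match the table.

```r
# Data from the slides.
Student <- data.frame(
  Gpa = c(3.2, 3.9, 2.9, 3.0),
  Gre = c(1270, 1600, 1500, 1400),
  Professorrating = c(38, 42, 22, 32)
)

# scale. = TRUE centers each column and divides it by its standard
# deviation, which corresponds to PCA on the correlation matrix.
pca <- prcomp(Student, scale. = TRUE)

# The sign of a component is arbitrary. Orient pc1 so that its loadings
# sum to a positive number, as in the eigenvectors shown in the slides.
if (sum(pca$rotation[, 1]) < 0) {
  pca$rotation[, 1] <- -pca$rotation[, 1]
  pca$x[, 1]        <- -pca$x[, 1]
}

pc1_std <- pca$x[, 1]
print(round(pc1_std, 4))

# Students ordered from highest to lowest standardized pc1 score.
ranking <- order(pc1_std, decreasing = TRUE)
print(ranking)
```

On this data the standardized pc1 score places students 2 and 1 highest, which differs from the raw-score table. Which version is appropriate depends on whether the variables should be weighted by their original units.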

Page 14: Principal Component Analysis Using R

• PCA is limited to re-expressing the data as a linear combination of its basis vectors.
• PCA is a non-parametric method: it is independent of the user and cannot be configured for specific inputs.
• Principal components are orthogonal.
• The mean and variance are assumed to be sufficient to describe the data.
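The orthogonality of the principal components can be checked numerically. A minimal sketch using the student data from the earlier slides (the names `V` and `gram` are illustrative):

```r
# Data from the slides.
Student <- data.frame(
  Gpa = c(3.2, 3.9, 2.9, 3.0),
  Gre = c(1270, 1600, 1500, 1400),
  Professorrating = c(38, 42, 22, 32)
)

# Columns of V are the eigenvectors of the correlation matrix.
V <- eigen(cor(Student))$vectors

# crossprod(V) is t(V) %*% V. Off-diagonal entries are the dot products
# between different eigenvectors; diagonal entries are squared lengths.
gram <- crossprod(V)
print(round(gram, 6))

# Orthonormal eigenvectors give the identity matrix, up to rounding.
is_orthonormal <- isTRUE(all.equal(gram, diag(3)))
print(is_orthonormal)
```

Off-diagonal values close to zero confirm that the components are mutually orthogonal.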

Page 15: Principal Component Analysis Using R