Multivariate analysis for 26 rice grain varieties

Post on 05-Jul-2015

Category:

Data & Analytics

DESCRIPTION

Case study: 26 rice grain varieties, each with 4-5 replicates, with their 74 chemical constituents. Objectives: 1. Classification of rice varieties. 2. Searching for the responsible variables that explain the total variability among the measurements. 3. Detection of superior varieties.

TRANSCRIPT

UNIVERSITY OF KALYANI

M.Sc. in Statistics

Multivariate Analysis of Chemical Compositions Related to Some Rice Grain Varieties

Team members: DIPIKA PATRA, ARNAB JANA

Mentor: SADHAN SAMAR MAITY

DATA DESCRIPTION:

The data contain 26 varieties of rice grains, each with 4-5 replicates, with their 74 chemical constituents placed in 7 different groups (groups marked with different colours; see ..\Documents\RICE_RAW.xls), named as:

1> AMINO ACID
2> ORGANIC ACID chelated with salt
3> PHENOL
4> LIPIDS
5> CARBOHYDRATE
6> STEROL
7> NITROGEN containing RIBOSE

The measurements were generated by analyzing the rice extracts in a GC-MS instrument. The measurements are unit free.

OBJECTIVE OF THE PROJECT:

• Classification of the rice varieties.

• Searching for the responsible variables that explain the total variability among the measurements.

• Detection of superior varieties.

ANALYTICAL SOFTWARE:

SAS, SPSS, MINITAB, MICROSOFT OFFICE

STEPS OF ANALYSIS:

1> Preparing data

2> Multivariate techniques:
One-way MANOVA
Cluster Analysis (CA)
Principal Component Analysis (PCA)
Canonical Correlation Analysis (CCA)
Multi-dimensional Scaling (MDS)
Profile Analysis (PA)

3> Interpretation of the results

4> Conclusion

ONE-WAY MANOVA

The MANOVA procedure tests whether there is any significant difference among the 26 varieties across the 7 biochemical groups. We assume that the datasets come from 26 homoscedastic multivariate normal populations with possibly different mean vectors.

Outputs are shown here: ..\Documents\MANOVA_OUTPUT.xlsx

CONCLUSION:

The one-way MANOVA results show that all the p-values are very small, so we reject the hypothesis that all the varieties are the same. There are therefore significant differences among the mean vectors of the varieties, not only overall but also within each biochemical group.
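As a rough illustration of the statistic behind a one-way MANOVA, the sketch below computes Wilks' lambda, det(W)/det(T), from the within-group scatter matrix W and the total scatter matrix T. It is not the SAS/SPSS procedure used in the project: the function names and the toy data (3 hypothetical varieties, 4 replicates each, 2 constituents) are assumptions made only for this example. Values of lambda close to 0 suggest that the group mean vectors differ.

```python
from typing import Dict, List

Matrix = List[List[float]]


def determinant(m: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def mean_vector(rows: List[List[float]]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows: List[List[float]], centre: List[float]) -> Matrix:
    """Sum of outer products of (row - centre)."""
    p = len(centre)
    s = [[0.0] * p for _ in range(p)]
    for x in rows:
        d = [x[j] - centre[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def wilks_lambda(groups: Dict[str, List[List[float]]]) -> float:
    """Wilks' lambda = det(W) / det(T)."""
    all_rows = [x for rows in groups.values() for x in rows]
    total = scatter(all_rows, mean_vector(all_rows))
    p = len(total)
    within = [[0.0] * p for _ in range(p)]
    for rows in groups.values():
        s = scatter(rows, mean_vector(rows))
        for i in range(p):
            for j in range(p):
                within[i][j] += s[i][j]
    return determinant(within) / determinant(total)


# Hypothetical toy data, NOT taken from RICE_RAW.xls.
toy = {
    "V1": [[1.0, 2.0], [1.2, 2.1], [0.9, 1.8], [1.1, 2.2]],
    "V2": [[3.0, 1.0], [3.1, 0.9], [2.9, 1.2], [3.2, 1.1]],
    "V3": [[5.0, 4.0], [5.2, 4.1], [4.8, 3.9], [5.1, 4.2]],
}
print(wilks_lambda(toy))
```

In practice the lambda value is converted to an approximate F statistic to obtain the p-value; that step is left to the statistical package.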

CLUSTER ANALYSIS

FINAL PARTITION

Using single linkage & Euclidean distance measure

Cluster 1: WA WB WC WD WE WG WH WI WJ WK WL WM RB RD RF

Cluster 2: WF

Cluster 3: WN WO WP WQ

Cluster 4: WR WT RA RC RE

Cluster 5: WS
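The partition above comes from the statistical package; the following is only a minimal sketch of how agglomerative clustering with single linkage and Euclidean distance works. The function names and the toy points are hypothetical, and it is an assumption here that each variety would be represented by one vector (for instance the mean of its replicates).

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points: Dict[str, Sequence[float]], n_clusters: int) -> List[List[str]]:
    """Merge the two closest clusters until n_clusters remain.

    Single linkage: the distance between two clusters is the smallest
    distance between any member of one and any member of the other.
    """
    clusters = [[name] for name in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    euclidean(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy points, not the rice measurements.
toy_points = {
    "A": (0.0, 0.0),
    "B": (0.0, 1.0),
    "C": (5.0, 5.0),
    "D": (5.0, 6.0),
    "E": (20.0, 20.0),
}
print(single_linkage(toy_points, 3))
```

Single linkage tends to chain nearby points into one large cluster and leave outliers as singletons, which is consistent with a partition that has one big cluster and two one-member clusters.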

DENDROGRAM OF 26 VARIETIES

PRINCIPAL COMPONENT ANALYSIS

Eigenvalues of the Covariance Matrix

     Eigenvalue    Difference    Proportion   Cumulative
1    744960.685    666389.993    0.8387       0.8387
2     78570.691     32024.09     0.0885       0.9271
3     46546.602     36036.23     0.0524       0.9795
4     10510.372      7521.53     0.0118       0.9914
5      2988.842       424.482    0.0034       0.9947

Principal component vectors are shown here: ..\..\..\dipika\KU_PRINCIPAL_1.xls

We select the responsible variables whose contribution to the principal components is significant (with loading beyond ± 0.5).

Responsible variables (with their loadings):
P1 Guanine (0.89)
P2 Sucrose (0.91)
P3 Linoleic Acid (0.66)
P4 Phosphate (0.67)
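The two steps used here, reading off the share of variance from the eigenvalues and keeping variables whose loading exceeds 0.5 in absolute value, can be sketched as follows. The function names and toy numbers are hypothetical. Note that the proportions need the full set of eigenvalues; the table above lists only the first five, so its Proportion column cannot be reproduced from those rows alone.

```python
from typing import Dict, List


def variance_table(eigenvalues: List[float]) -> List[Dict[str, float]]:
    """Difference, Proportion and Cumulative columns from ALL eigenvalues.

    Assumes the eigenvalues are sorted in decreasing order.
    """
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    for i, value in enumerate(eigenvalues):
        cumulative += value / total
        row = {
            "eigenvalue": value,
            "proportion": value / total,
            "cumulative": cumulative,
        }
        if i + 1 < len(eigenvalues):
            # Gap to the next eigenvalue; undefined for the last one.
            row["difference"] = value - eigenvalues[i + 1]
        rows.append(row)
    return rows


def responsible_variables(loadings: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Names of variables whose loading lies beyond +/- threshold."""
    return [name for name, value in loadings.items() if abs(value) > threshold]


# Hypothetical toy inputs, not the project's eigenvalues or loadings.
toy_eigenvalues = [6.0, 3.0, 1.0]
toy_loadings = {"var_a": 0.89, "var_b": -0.62, "var_c": 0.31}

for row in variance_table(toy_eigenvalues):
    print(row)
print(responsible_variables(toy_loadings))
```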

CANONICAL CORRELATION ANALYSIS:

Canonical correlation is a measure of association between two groups of random variables.

The following table gives the largest canonical correlation between each pair of groups.

GROUP   1       2       3       4       5       6       7
1       -       0.975   0.966   0.976   0.898   0.827   0.956
2       0.975   -       0.988   0.993   0.955   0.858   0.981
3       0.966   0.988   -       0.996   0.832   0.731   0.988
4       0.976   0.993   0.996   -       0.943   0.897   0.989
5       0.898   0.955   0.832   0.943   -       0.8     0.881
6       0.827   0.858   0.731   0.897   0.8     -       0.816
7       0.956   0.981   0.988   0.989   0.881   0.816   -
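To make the idea of a largest canonical correlation concrete, here is a small sketch restricted, purely for simplicity, to two variables per group (the project's groups contain more). It takes the square root of the largest eigenvalue of Sxx^-1 Sxy Syy^-1 Syx, where the S matrices are sample covariance blocks. The function names and toy columns are hypothetical and this is not the routine used in the project.

```python
import math
from typing import List

Matrix = List[List[float]]


def cov(u: List[float], v: List[float]) -> float:
    """Sample covariance of two equally long columns."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def cov_block(xs: List[List[float]], ys: List[List[float]]) -> Matrix:
    return [[cov(x, y) for y in ys] for x in xs]


def inv2(m: Matrix) -> Matrix:
    """Inverse of a 2x2 matrix (assumed non-singular)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def first_canonical_correlation(x_cols: List[List[float]], y_cols: List[List[float]]) -> float:
    """Largest canonical correlation between two groups of two variables."""
    sxx = cov_block(x_cols, x_cols)
    syy = cov_block(y_cols, y_cols)
    sxy = cov_block(x_cols, y_cols)
    syx = [list(r) for r in zip(*sxy)]
    m = matmul(matmul(inv2(sxx), sxy), matmul(inv2(syy), syx))
    # Largest eigenvalue of a 2x2 matrix from its trace and determinant.
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    lam = tr / 2 + math.sqrt(max(tr * tr / 4 - det, 0.0))
    # Clip to [0, 1] to absorb floating-point error.
    return math.sqrt(min(max(lam, 0.0), 1.0))


# Hypothetical toy columns; y1 = x1 + x2, so the groups share a common factor.
x_group = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 6.0]]
y_group = [[3.0, 3.0, 7.0, 7.0, 11.0], [5.0, 1.0, 0.0, 2.0, 4.0]]
print(first_canonical_correlation(x_group, y_group))
```

Because one variable of the second toy group is an exact linear combination of the first group's variables, the result is essentially 1, the same pattern of near-unity values seen between most pairs of biochemical groups.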

MULTIDIMENSIONAL SCALING:

The outputs obtained using SPSS are:

Main points from the SPSS output:

CONCLUSION

A high canonical correlation indicates that more common (latent) factors interplay among the groups. In the CCA table of first canonical correlations, the entries are close to unity, indicating that a good number of common factors interplay among the 7 groups.

However, in the MDS analysis the 7 groups are found to be quite scattered, indicating their uniqueness rather than their similarity. This contradicts what CCA established. It may be that the latent factors among the 7 groups are not uncorrelated (i.e., non-orthogonal). We dispense with factor analysis.

SPECIAL ATTENTION TO AMINO ACID GROUP

Considering varieties from cluster 1

Variety “WF”

Cluster 3

Cluster 4

Cluster 5

From the above 5 diagrams, no clear-cut view on the varieties can be reached. Indeed, superiority among the varieties is hard to judge without a suitable criterion framed beforehand.

FINAL CONCLUSION:

• The varieties are significantly different.

• Four variables, namely Guanine, Sucrose, Linoleic Acid & Phosphate, are detected as the responsible variables capturing the maximum share of system variance.

• We can group the 26 varieties into 5 groups (or clusters).

• Canonical correlations among the 7 biochemical groups are found to be very high. So they are governed by internal common factor(s), which calls for factor analysis (FA). Due to shortage of time, FA was not conducted.

• CCA & MDS lead to contradictory results regarding the biochemical groups.

• From the profile diagrams, no clear-cut conclusion on the best variety is obtained.

• Correspondence Analysis on variety vs. constituents could give better insight into their level-wise correspondence. Due to shortage of time, such analysis could not be done.

Thanking You

DIPIKA PATRA, ROLL NO: 96/STS/115002, dipika.patra1988@gmail.com

ARNAB JANA, ROLL NO: 96/STS/115013, arnabjana.jana@gmail.com
