agnes algorithm
JAMES COOK AUSTRALIA INSTITUTE OF HIGHER LEARNING
IN SINGAPORE
DATA MINING PROJECT
AGGLOMERATIVE NESTING – AGNES for COMPANY CLUSTERING
Instructor : Dr. Insu Song
Students :
Ho Thi Hoang Yen – jc13139122
Bryan Anselme – jc13145761
Content
1. Introduction
2. Agnes – Agglomerative Nesting
3. Application Demo
Introduction
Data mining: a history
1962 : The Future of Data Analysis (John W. Tukey)
1989 : The first Knowledge Discovery in Databases (KDD) workshop (Gregory Piatetsky-Shapiro)
=> New era of data analysis
Introduction
WHY?
A large share of today's money supply is invested in the stock market
Clustering can benefit portfolio management (more choices when building a portfolio)
Better prediction by using information from multiple stocks rather than only one
Introduction
- Automatically collecting data
- Preprocessing
- Clustering
- Building model
- Predicting
Clustering
Categories:
• Partitioning Method: K-means, K-medoids, CLARANS
• Hierarchical Method: Agglomerative, Divisive, BIRCH, ROCK, Chameleon
• Density-Based Method: DBSCAN, OPTICS, DENCLUE
• Grid-Based Method: STING, WaveCluster
• Model-Based Method: Expectation-Maximization, Conceptual Clustering, Neural Network Approach
• Clustering High-Dimensional Data Method: CLIQUE, PROCLUS, Frequent Pattern-Based
• Constraint-Based Method: Obstacle Objects, User-Constrained, Semi-Supervised
Agglomerative Hierarchical Clustering – WHY AGNES ?
- Not sensitive to noise
- Does not need the number of clusters to be specified in advance
- Needs to be run only once
Agglomerative Hierarchical Clustering
Step 1 : Calculate the distance matrix.
Step 2 : Find the minimum distance in the matrix.
Step 3 : Merge the two nearest clusters.
Step 4 : Calculate the center of the new cluster.
Step 5 : Repeat steps 2 to 4 until only one cluster remains.
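The five steps above can be sketched in a few lines of code. This is a minimal illustration in Python using only the standard library, not the R code used in the project. It assumes Euclidean distance and center-based merging, as the steps suggest; the linkage rule actually used in the demo may differ. The function name `agnes_centroid` and the sample points are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def agnes_centroid(points):
    """Agglomerative clustering that merges the two clusters with the nearest centers.

    Returns the merge history as a list of (members_a, members_b, distance) tuples.
    """
    # Each cluster is stored as (list of member indices, center coordinates).
    clusters = {i: ([i], tuple(p)) for i, p in enumerate(points)}
    next_id = len(points)
    merges = []

    while len(clusters) > 1:
        # Steps 1-2: compare all pairs of cluster centers and pick the closest pair.
        # (For simplicity the distances are recomputed on every pass
        # instead of updating a stored matrix.)
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: dist(clusters[pair[0]][1], clusters[pair[1]][1]),
        )
        members_a, center_a = clusters.pop(a)
        members_b, center_b = clusters.pop(b)
        d = dist(center_a, center_b)

        # Step 3: merge the two nearest clusters.
        members = members_a + members_b

        # Step 4: the new center is the mean of all member points.
        center = tuple(
            sum(points[i][k] for i in members) / len(members)
            for k in range(len(center_a))
        )
        clusters[next_id] = (members, center)
        next_id += 1
        merges.append((sorted(members_a), sorted(members_b), d))

    # Step 5: the loop stops once a single cluster remains.
    return merges


# Hypothetical 2-D points, for illustration only.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
for members_a, members_b, d in agnes_centroid(points):
    print(members_a, members_b, round(d, 3))
```

The merge history is what a dendrogram draws: each entry records which two clusters were joined and at what distance, so any number of clusters can be read off afterwards by cutting the history at a chosen distance.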
WHY R?
- Free software environment for statistical computing and graphics.
- Highly optimized package functions and data structure handling.
Data & Preprocessing?
Step 1 : Collect raw data from the NASDAQ website
Step 2 : Download the data from Yahoo Finance
Step 3 : Clean the data
Step 4 : Compute the return rate
Step 5 : Normalize the data
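Steps 4 and 5 can be illustrated with a short sketch. The slides do not give the exact formulas, so two assumptions are made here: the return rate is taken as the simple period-over-period return, (p_t - p_{t-1}) / p_{t-1}, and normalization is taken as z-scoring (zero mean, unit standard deviation). The function names and the price series are hypothetical, and the sketch is in Python rather than the R used in the project.

```python
from statistics import mean, pstdev


def return_rates(prices):
    """Simple period-over-period returns: (p_t - p_{t-1}) / p_{t-1}."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


def z_normalize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant series: nothing to scale
    return [(v - mu) / sigma for v in values]


# Hypothetical closing prices for one company (not data from the project).
closes = [100.0, 110.0, 99.0, 108.9]
returns = return_rates(closes)
normalized = z_normalize(returns)
print([round(r, 4) for r in returns])
print([round(z, 4) for z in normalized])
```

Working with normalized returns instead of raw prices puts expensive and cheap stocks on the same scale, so the distance matrix reflects how similarly the companies move, not how much their shares cost.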
Real data
Result
COMPARISON
References
1. Han, J., Kamber, M., & Pei, J. (2006, April 6). Data mining, Southeast Asia edition: Concepts and techniques. Morgan Kaufmann.
2. Kumar, D., & Bhardwaj, D. (2011). Rise of data mining: Current and future application areas. IJCSI International Journal of Computer Science Issues, 8(5).
3. https://stat.ethz.ch/R-manual/R-devel/library/cluster/html/agnes.html
4. http://www.tutorialspoint.com/data_mining/dm_cluster_analysis.htm
THANK YOU !!