agnes algorithm

JAMES COOK AUSTRALIA INSTITUTE OF HIGHER LEARNING

In SINGAPORE

DATA MINING PROJECT

AGGLOMERATIVE NESTING – AGNES for COMPANY CLUSTERING

Instructor : Dr. Insu Song

Students :

Ho Thi Hoang Yen – jc13139122

Bryan Anselme – jc13145761

Content

1. Introduction

2. AGNES – Agglomerative Nesting

3. Application Demo

Introduction

Data mining – A history :

1962 : The Future of Data Analysis (John W. Tukey)

1989 : The first Knowledge Discovery in Databases (KDD) workshop (Gregory Piatetsky-Shapiro)

⇒ New era of data analysis

http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aoms/1177704711



http://www.kdnuggets.com/meetings/kdd89/index.html



Introduction

WHY?

A large share of the money in circulation is invested in the stock market

Clustering companies can help portfolio management by giving more choices when building a portfolio

Predictions can improve when they use information from multiple stocks rather than only one

Introduction

- Automatically collecting data

- Preprocessing

- Clustering

- Building model

- Predicting

Clustering

Categories and example methods:

• Partitioning Method: K-means, K-medoids, CLARANS

• Hierarchical Method: Agglomerative, Divisive, BIRCH, ROCK, Chameleon

• Density-Based Method: DBSCAN, OPTICS, DENCLUE

• Grid-Based Method: STING, WaveCluster

• Model-Based Method: Expectation–Maximization, Conceptual Clustering, Neural Network Approach

• Clustering High-Dimensional Data Method: CLIQUE, PROCLUS, Frequent Pattern-Based

• Constraint-Based Method: Obstacle Objects, User-Constrained, Semi-Supervised

Agglomerative Hierarchical Clustering – WHY AGNES ?

- Not sensitive to noise

- Does not need the number of clusters to be specified in advance

- Only needs to be run once: the resulting hierarchy can be cut at any level afterwards

Agglomerative Hierarchical Clustering

Agglomerative Hierarchical Clustering

Step 1 : Calculate the distance matrix.

Step 2 : Find the minimum distance in the matrix.

Step 3 : Merge the two nearest clusters.

Step 4 : Calculate the center of the new cluster.

Step 5 : Repeat steps 2 to 4 until only one cluster remains.
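The five steps above can be sketched in a few lines of code. This is a minimal illustration, not the project's implementation: it is written in Python rather than R, it assumes Euclidean distance and a center computed as the mean of the member points (as Step 4 suggests), and the function name `agnes_centroid` and the sample points are made up for the example. R's `agnes` function uses average linkage by default, so its merge order may differ.

```python
from itertools import combinations
from math import dist


def agnes_centroid(points):
    """Agglomerative clustering that merges the two closest cluster centers.

    Returns the merge history as a list of
    (members_a, members_b, distance) tuples, one per merge.
    """
    # Each cluster keeps its member indices and its current center.
    clusters = {i: ([i], tuple(p)) for i, p in enumerate(points)}
    next_id = len(points)
    merges = []

    while len(clusters) > 1:
        # Steps 1-2: compare all pairs of cluster centers, keep the closest.
        # (Distances are recomputed on every pass for clarity, not speed.)
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: dist(clusters[pair[0]][1], clusters[pair[1]][1]),
        )
        d = dist(clusters[a][1], clusters[b][1])

        # Step 3: merge the two nearest clusters.
        members_a, _ = clusters.pop(a)
        members_b, _ = clusters.pop(b)
        members = members_a + members_b

        # Step 4: the new center is the mean of all member points.
        center = tuple(
            sum(points[i][k] for i in members) / len(members)
            for k in range(len(points[0]))
        )
        clusters[next_id] = (members, center)
        next_id += 1
        merges.append((sorted(members_a), sorted(members_b), d))

    # Step 5 is the loop itself: it stops when one cluster remains.
    return merges


# Hypothetical 2-D points, for illustration only.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
history = agnes_centroid(points)
for left, right, d in history:
    print(left, "+", right, f"at distance {d:.3f}")
```

The merge history plays the role of the dendrogram: stopping after n - k merges yields k clusters, which is why the number of clusters does not have to be chosen before the run.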

WHY R?

- Free software environment for statistical computing and graphics.

- Well-optimized package functions and data structure handling.

Data & Preprocessing?

Step 1 : Collect raw data from the NASDAQ website

Step 2 : Download the data from Yahoo Finance

Step 3 : Clean the data

Step 4 : Compute the return rate

Step 5 : Normalize the data
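Steps 4 and 5 can be illustrated with a short sketch. The slides do not state which formulas were used, so this assumes the simple period-over-period return, (p_t - p_{t-1}) / p_{t-1}, and z-score normalization. The function names and the price series are hypothetical, and the code is in Python rather than the R used in the project.

```python
from statistics import mean, pstdev


def return_rates(prices):
    """Simple period-over-period return: (p_t - p_{t-1}) / p_{t-1}."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


def z_normalize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series has no spread to rescale; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical closing prices, for illustration only.
closes = [100.0, 110.0, 99.0, 108.9]
rates = return_rates(closes)
normalized = z_normalize(rates)
print(rates)
print(normalized)
```

Working with normalized returns instead of raw prices puts companies with very different share prices on a comparable scale before distances are computed.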

Real data

Result

COMPARISON

References

1. Han, J., Kamber, M., & Pei, J. (2006, April 6). Data mining, Southeast Asia edition: Concepts and techniques. Morgan Kaufmann.

2. Kumar, D., & Bhardwaj, D. (2011). Rise of data mining: Current and future application areas. IJCSI International Journal of Computer Science Issues, 8(5).


https://stat.ethz.ch/R-manual/R-devel/library/cluster/html/agnes.html




http://www.tutorialspoint.com/data_mining/dm_cluster_analysis.htm



THANK YOU !!
