data mining vs statistics

STATISTICS VS DATA MINING @andrybrew

Upload: andry-alamsyah

Post on 18-Jul-2015



TRANSCRIPT

Page 1: Data Mining vs Statistics

STATISTICS VS DATA MINING @andrybrew

Page 2: Data Mining vs Statistics

OBJECTIVE (FOR BOTH)

Both are used for data analysis, but they are different tools

The role of statistics is to describe a dataset more or less efficiently, while data mining (DM) is used to build models that predict, simulate and optimize
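
The contrast between describing and predicting can be illustrated with a minimal sketch. The numbers below are hypothetical toy data (advertising spend vs. sales), not taken from the slides, and the least-squares line is only a stand-in for "a model used to predict"; it is not a claim about which technique the author has in mind.

```python
from statistics import mean, stdev

# Hypothetical toy data, invented for illustration only.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]      # e.g. advertising spend
sales = [2.1, 3.9, 6.2, 7.8, 10.1]     # e.g. observed sales


def describe(values):
    """Descriptive role: summarize the data we already have."""
    return {"mean": mean(values), "stdev": stdev(values)}


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def predict(model, x):
    """Predictive role: estimate y for an x that was never observed."""
    slope, intercept = model
    return intercept + slope * x


summary = describe(sales)          # what the data looks like
model = fit_line(spend, sales)     # a model learned from the data
forecast = predict(model, 6.0)     # a prediction for unseen input
```

The summary only characterizes the observed sales, while the fitted model is used to produce a value for a spend level that does not appear in the data.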

Page 3: Data Mining vs Statistics

STATISTICS FACTS

Well-established, centuries-old scientific methodology

Leaves no room for heuristic thinking

Uses a sample to generalize conclusions about the population (hypothesis testing, p-values, etc.). The generalization needs a stated confidence level

States the theory first, then tests it using statistical tools

Deals with structured data in order to solve structured problems; results are software/researcher independent; inference reflects statistical hypothesis testing

Knowledge is not hidden; we are able to observe it directly. Statistics confirms our observation (hypothesis) scientifically, so that the community will accept the hypothesis

Concerned about data collection

Has problems when too little data is available, and is unable to uncover knowledge from complex data (interactions)
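
The "theory first, then test on a sample" workflow described above can be sketched as follows. This is a minimal illustration under stated assumptions: the sample values and the hypothesized mean `mu0` are invented, and the test uses a normal approximation (a z-test) because it needs only the standard library; for a sample this small a t-test would normally be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample drawn from some population (invented values).
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.5, 5.9, 5.3]


def z_test(values, mu0, alpha=0.05):
    """Two-sided test of H0: population mean == mu0.

    Uses a normal approximation with the sample standard deviation.
    Returns the z statistic, the p-value, and whether H0 is rejected
    at significance level alpha (confidence level 1 - alpha).
    """
    n = len(values)
    standard_error = stdev(values) / sqrt(n)
    z = (mean(values) - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Theory first: "the population mean is 5.0". Then test it on the sample.
z, p_value, reject = z_test(sample, mu0=5.0)
```

Here `alpha=0.05` corresponds to the 95% confidence level that the generalization from sample to population relies on.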

Page 4: Data Mining vs Statistics

DATA MINING FACTS

Emerged only recently, with the availability of large-volume and complex data

Makes generous use of heuristic thinking

Used on the population (or very large data) to find patterns in the data

Digs into the data to find patterns, and then forms theories

Deals with structured data in order to solve unstructured problems; results are software/researcher dependent; inference reflects the computational properties of the data mining algorithm at hand. Accurate prediction is more desirable than explanation

Exploratory tool: we have no prior idea about the knowledge hidden in the data, and it lets us discover that invisible knowledge

Less concerned about data collection

No problem with data size; able to uncover knowledge from complex data (relations) that is difficult to observe directly
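
The "find patterns first, then form theories" idea can be sketched with a tiny market-basket example. The transactions below are hypothetical, and the function is a brute-force count of item pairs rather than a real frequent-itemset algorithm such as Apriori; it only illustrates that no hypothesis is stated before looking at the data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs whose support (share of baskets containing
    both items) is at least min_support."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# No theory up front: let the data show which items co-occur often.
patterns = frequent_pairs(transactions, min_support=0.6)
```

Any pair that surfaces this way is a candidate pattern; explaining why it occurs is the theory-building step that comes afterwards.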

Page 5: Data Mining vs Statistics

COMPARISON

STATISTICS                    | DATA MINING
Confirmative                  | Explorative
Small data set                | Large data set
Small number of variables     | Large number of variables
Deductive (no predictions)    | Inductive
Numeric data                  | Numeric and non-numeric data
Clean data                    | Data cleaning

Source: slideshare.net

Page 6: Data Mining vs Statistics

DATA SCIENCE

Data science is the study of the generalizable extraction of knowledge from data,[1] yet the key word is science.[2] It incorporates varying elements and builds on techniques and theories from many fields, including signal processing, mathematics, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition and learning, visualization, uncertainty modeling, data warehousing, and high performance computing with the goal of extracting meaning from data and creating data products.

Page 7: Data Mining vs Statistics

CONCLUSION

The availability of large-volume data sets should lead business and social science (as well as other sciences) to use DM tools

Business and social science (as well as other sciences) need to use more DM tools, because DM is useful for modeling, predicting and optimizing phenomena

DM/KDD/Data Science are increasingly used as a standard for decision making in modern business