data mining vs statistics
STATISTICS VS DATA MINING (@andrybrew)
OBJECTIVE (FOR BOTH)
Both are used for data analysis, but they are different tools
The role of statistics is to describe a dataset more or less efficiently, while the role of data mining (DM) is to build models that predict, simulate and optimize
STATISTICS FACTS
Well-established, centuries-old scientific methodology
No room for heuristic thinking
Uses a sample to generalize conclusions about the population (hypothesis testing, p-values, etc.); every generalization must be stated with a confidence level
States a theory first, then tests it using statistical tools
Deals with structured data to solve structured problems; results are independent of the software or researcher, and inference reflects statistical hypothesis testing
Knowledge is not hidden: we can observe it directly. Statistics validates our observation (hypothesis) scientifically, so the community will accept the hypothesis
Concerned with data collection
Struggles when too little data is available, and cannot uncover knowledge from complex data (with many interactions)
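To make the sample-to-population idea concrete, here is a minimal Python sketch that is not taken from the original slides. It estimates a 95% confidence interval for a population mean from a sample and computes a two-sided p-value for a hypothesized mean. It assumes a normal approximation (a t-distribution would be more accurate for a sample as small as this one), and the sample values and the function names `confidence_interval` and `z_test_p_value` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    m = mean(sample)
    se = stdev(sample) / sqrt(len(sample))  # standard error of the sample mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se

def z_test_p_value(sample, mu0):
    """Two-sided p-value for H0: population mean == mu0 (normal approximation)."""
    se = stdev(sample) / sqrt(len(sample))
    z = (mean(sample) - mu0) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical sample of 10 measurements drawn from a larger population
sample = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0, 5.2, 4.9]
low, high = confidence_interval(sample)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")  # (4.952, 5.148)
print(f"p-value for H0 mean = 5.0: {z_test_p_value(sample, 5.0):.3f}")  # 0.317
```

The conclusion about the population is reported together with a confidence level (95% here). With p ≈ 0.317, this sample gives no evidence against a population mean of 5.0 at the usual 5% level.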
DATA MINING FACTS
Emerged recently with the availability of large volumes of complex data
Makes generous use of heuristic thinking
Used on the whole population (or very large datasets) to find patterns in the data
Digs into the data, finds patterns, and then builds theories
Deals with structured data to solve unstructured problems; results depend on the software and the researcher, and inference reflects the computational properties of the data mining algorithm at hand. Accurate prediction is more desirable than explanation
An exploratory tool: we have no prior idea of the knowledge hidden in the data, and DM lets us discover it
Less concerned with data collection
Has no problem with data size, and can uncover knowledge from complex data (with many relations) that is difficult to observe directly
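As an illustration of the "dig into the data, then find patterns" style, here is a minimal Python sketch, not from the original slides, that counts which pairs of items frequently occur together across a set of transactions (the support-counting step used in market-basket analysis such as Apriori). The transactions, the `frequent_pairs` function and the 0.4 support threshold are hypothetical; a full frequent-itemset miner would also handle larger itemsets and prune candidates.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions containing both) >= min_support."""
    counts = Counter()
    for basket in transactions:
        # Count each unordered pair at most once per transaction
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical transactions; no hypothesis is stated before mining them
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "butter"],
    ["beer", "chips", "milk"],
]
print(frequent_pairs(transactions, min_support=0.4))
# {('bread', 'milk'): 0.4, ('bread', 'butter'): 0.4, ('beer', 'chips'): 0.4}
```

The pairs that pass the threshold are the discovered "pattern"; any theory (for example, that bread buyers also tend to buy milk) is formed only afterwards.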
COMPARISON
STATISTICS                    | DATA MINING
Confirmatory                  | Exploratory
Small data sets               | Large data sets
Small number of variables     | Large number of variables
Deductive (no predictions)    | Inductive
Numeric data                  | Numeric and non-numeric data
Clean data                    | Data cleaning
Source: slideshare.net
DATA SCIENCE
Data science is the study of the generalizable extraction of knowledge from data,[1] yet the key word is science.[2] It incorporates varying elements and builds on techniques and theories from many fields, including signal processing, mathematics, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition and learning, visualization, uncertainty modeling, data warehousing, and high performance computing with the goal of extracting meaning from data and creating data products.
CONCLUSION
The availability of large datasets should lead business and social science (as well as other sciences) to use DM tools
Business and social science (as well as other sciences) need to use more DM tools, because DM can model, predict and optimize phenomena
DM, KDD (knowledge discovery in databases) and data science are increasingly used as a standard for decision making in modern business