Chapter 01. Introduction to Data Mining
![Page 1: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/1.jpg)
Data Mining
IKO42351
Course Plan Materials (Bahan Rancangan Pengajaran)
Mohamad Ivan Fanany, Dr. Eng.
![Page 2: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/2.jpg)
Lectures Introduction
● Goals and Objectives
● Textbooks
● Syllabus
● Evaluation
● Lecture Plans
● Rules
![Page 3: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/3.jpg)
Goals and Objectives
● After finishing this course, students are expected to
understand the concepts, tools, and techniques of
machine learning for data mining.
● Besides acquiring a general picture of the most recent
developments in data mining, students are also
expected to understand the techniques used in depth
and appreciate their strengths and applicability by
actively running their own experiments, both
individually and as members of a team.
![Page 4: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/4.jpg)
Textbooks
Major textbook before UTS (midterm exam)
Programming Book
![Page 5: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/5.jpg)
Textbooks
1. Introduction to Data Mining, Pang-Ning
Tan, Michael Steinbach, Vipin Kumar,
Addison-Wesley, 2006
2. R and Data Mining: Examples and Case
Studies, Yanchang Zhao, 2013
![Page 6: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/6.jpg)
Syllabus (Weekly)
1) Introduction
2) Data
3) Exploring Data
4) Classification: Basic Concepts, Decision
Tree, and Model Evaluation
5) Classification: Alternative Techniques
6) Association: Basic Concept and Algorithms
7) Association Analysis: Advanced Concepts
8) Cluster Analysis
9) Anomaly Detection
UTS (midterm exam): Witten, Ch. 1-7
UAS (final exam): Kumar, Ch. 6-8; Witten, Ch. 8+
![Page 7: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/7.jpg)
Evaluation
1. Individual assignments (PR, homework): 8 times = 16%
2. Group assignment (TK): 1 time = 14%
3. Midterm exam (UTS) = 35%
4. Final exam (UAS) = 35%
5. Bonus (class participation, pop quizzes, etc.) = ++
6. Total: 100% ++
![Page 8: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/8.jpg)
Rules
● Lateness tolerance: 15 minutes
● Mobile phones must be switched off
● Regarding homework (PR):
◆ All homework and assignments must
use Python(x,y)
◆ For homework, write your teaching assistant's code on
each homework file, and submit it
according to that assistant code.
◆ Late penalty → see the BRP
![Page 9: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/9.jpg)
R and R Studio
http://www.rstudio.com/
http://www.r-project.org/
![Page 10: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/10.jpg)
Why Mine Data? Commercial Viewpoint
● Lots of data is being collected and warehoused
◆ Web data, e-commerce
◆ Purchases at department/grocery stores
◆ Bank/credit card transactions
● Computers have become cheaper and more powerful
● Competitive pressure is strong
◆ Provide better, customized services for an edge (e.g. in
Customer Relationship Management)
![Page 11: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/11.jpg)
Why Mine Data? Scientific Viewpoint
● Data collected and stored at
enormous speeds (GB/hour)
◆ remote sensors on a satellite
◆ telescopes scanning the skies
◆ microarrays generating gene
expression data
◆ scientific simulations
generating terabytes of data
● Traditional techniques infeasible for raw data
● Data mining may help scientists
◆ in classifying and segmenting data
◆ in Hypothesis Formation
![Page 12: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/12.jpg)
Mining Large Data Sets - Motivation
● There is often information “hidden” in the data that is not readily evident
● Human analysts may take weeks to discover useful information
● Much of the data is never analyzed at all
[Figure: chart; surviving label: number of analysts]
![Page 13: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/13.jpg)
What is Data Mining?
● Many Definitions
◆ Non-trivial extraction of implicit, previously unknown and potentially useful information from data
◆ Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
![Page 14: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/14.jpg)
What is (not) Data Mining?
What is Data Mining?
– Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
– Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)
What is not Data Mining?
– Look up a phone number in a phone directory
– Query a Web search engine for information about “Amazon”
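The contrast above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the directory records, the `lookup_phone` helper, and the `prefix_share_by_city` helper are all hypothetical, and a real mining task would not be told in advance which surname prefix to check.

```python
from collections import Counter

# Hypothetical toy directory: (surname, city, phone number).
DIRECTORY = [
    ("O'Brien", "Boston", "555-0101"),
    ("O'Reilly", "Boston", "555-0102"),
    ("O'Rourke", "Boston", "555-0103"),
    ("O'Brien", "Boston", "555-0104"),
    ("Smith", "Boston", "555-0105"),
    ("Smith", "Denver", "555-0106"),
    ("Garcia", "Denver", "555-0107"),
    ("Garcia", "Denver", "555-0108"),
    ("O'Brien", "Denver", "555-0109"),
    ("Nguyen", "Denver", "555-0110"),
]


def lookup_phone(surname, city):
    """Not data mining: retrieve the records that match a known key."""
    return [phone for s, c, phone in DIRECTORY if s == surname and c == city]


def prefix_share_by_city(prefix):
    """Closer to data mining: summarize all records to expose a regional pattern."""
    total = Counter()
    matching = Counter()
    for surname, city, _ in DIRECTORY:
        total[city] += 1
        if surname.startswith(prefix):
            matching[city] += 1
    # Share of surnames in each city that start with the given prefix.
    return {city: matching[city] / total[city] for city in total}


print(lookup_phone("Smith", "Denver"))
print(prefix_share_by_city("O'"))
```

The lookup returns a stored fact, while the second function reports something that no single record states: in this made-up data, surnames starting with "O'" make up a much larger share of the Boston entries than of the Denver entries.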
![Page 15: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/15.jpg)
Origins of Data Mining
● Draws ideas from machine learning/AI, pattern
recognition, statistics, and database systems
● Traditional techniques may be unsuitable due to
◆ Enormity of data
◆ High dimensionality of data
◆ Heterogeneous, distributed nature of data
[Figure: Data Mining at the overlap of Machine Learning/Pattern Recognition, Statistics/AI, and Database systems]
![Page 16: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/16.jpg)
Search Vs Discovery
● Structured data: Data Retrieval (search, goal-oriented) vs. Data Mining (discover, opportunistic)
● Unstructured data (text): Information Retrieval (search, goal-oriented) vs. Text Mining (discover, opportunistic)
Data Mining = KDD: Knowledge ‘Discovery’ from DB
© 2002, AvaQuest Inc.
![Page 17: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/17.jpg)
Data Mining Tasks
● Prediction Methods
◆Use some variables to predict unknown or
future values of other variables.
● Description Methods
◆Find human-interpretable patterns that
describe the data.
From [Fayyad et al.], Advances in Knowledge Discovery and Data Mining, 1996
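The difference between the two kinds of methods can be illustrated with a minimal Python sketch. The records, the `fit_line` helper, and the `describe_by_region` helper are hypothetical; ordinary least squares stands in for "a prediction method" and a per-group average stands in for "a description method", and neither is claimed to be the course's own choice of technique.

```python
from statistics import mean

# Hypothetical toy records: (store_region, ad_spend, sales).
RECORDS = [
    ("north", 1.0, 3.0),
    ("north", 2.0, 5.0),
    ("south", 3.0, 7.0),
    ("south", 4.0, 9.0),
]


def fit_line(xs, ys):
    """Prediction: ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def describe_by_region(records):
    """Description: a human-readable summary (average sales per region)."""
    groups = {}
    for region, _, sales in records:
        groups.setdefault(region, []).append(sales)
    return {region: mean(values) for region, values in groups.items()}


ad_spend = [r[1] for r in RECORDS]
sales = [r[2] for r in RECORDS]
intercept, slope = fit_line(ad_spend, sales)

# Use one variable (ad spend) to predict an unknown value of another (sales).
predicted_sales = intercept + slope * 5.0
print(predicted_sales)

# Summarize the existing data without predicting anything.
print(describe_by_region(RECORDS))
```

The fitted line answers "what would sales be for an ad spend we have not seen?", while the summary only describes the records already collected.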
![Page 18: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/18.jpg)
Data Mining Tasks...
● Classification [Predictive]
● Clustering [Descriptive]
● Association Rule Discovery [Descriptive]
● Sequential Pattern Discovery [Descriptive]
● Regression [Predictive]
● Deviation Detection [Predictive]
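As one concrete instance of a descriptive task from the list above, the sketch below computes the two standard measures behind association rule discovery: support and confidence. The market-basket transactions and the function names are made up for illustration, and a full rule-mining algorithm would also need to search over candidate itemsets, which this sketch does not do.

```python
# Hypothetical market-basket transactions, one set of items per purchase.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    antecedent_support = support(antecedent, transactions)
    if antecedent_support == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / antecedent_support


# Evaluate the candidate rule {diapers} -> {beer}.
rule_support = support({"diapers", "beer"}, TRANSACTIONS)
rule_confidence = confidence({"diapers"}, {"beer"}, TRANSACTIONS)
print(rule_support, rule_confidence)
```

In this toy data, three of the five baskets contain both items, and three of the four baskets with diapers also contain beer, so the rule has support 3/5 and confidence 3/4 (up to floating-point rounding).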
![Page 19: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/19.jpg)
Do you want to be a Miner?
[Figure: pyramid rising from Data through Information and Knowledge to Wisdom; label: Pattern]
![Page 20: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/20.jpg)
Why Do We Need Data Mining?
[Figure labels: The Internet; Storage (increased capacity, lower cost, faster and faster); DATA EXPLOSION; DATA MINING; Data, Information, Knowledge, Wisdom; Competitive Advantages]
![Page 21: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/21.jpg)
Data Mining and Machine Learning
MACHINE LEARNING
DATA MINING
MULTI-SOURCE
MULTI-TYPE
ENSEMBLE LEARNING
MULTI-DIMENSION
SPATIO-TEMPORAL
BIG DATA
DEEP LEARNING
![Page 22: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/22.jpg)
Data Mining and Database
DATABASE
DATA MINING
DATA WAREHOUSE
DATA CLEANING
CLUSTER ANALYSIS
DATA CUBE OLAP
ASSOCIATION ANALYSIS
BIG DATA
![Page 23: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/23.jpg)
Evolution of Database Technology
![Page 24: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/24.jpg)
Financial Reporting
![Page 25: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/25.jpg)
Another Dashboard
![Page 26: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/26.jpg)
Another Dashboard
![Page 27: Chapter 01.Introduction to Data Mining](https://reader035.vdocuments.us/reader035/viewer/2022081513/55cf922e550346f57b945cea/html5/thumbnails/27.jpg)
Another Dashboard