![Page 1: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/1.jpg)
University of Toronto, 04/22/23
Data Mining
The Art and Science of Obtaining Knowledge from Data
Dr. Saed Sayad
![Page 2: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/2.jpg)
Agenda
• Explosion of data
• Introduction to data mining
• Examples of data mining in science and engineering
• Challenges and opportunities
![Page 3: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/3.jpg)
Explosion of Data
• Data in the world doubles every 20 months!
• NASA’s Earth Observing System: 46 megabytes of data per second, or about 4,000,000,000,000 bytes a day
• FBI fingerprint image library: 200,000,000,000,000 bytes
• In-line image analysis for particle detection: 1 megabyte per second
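The per-second and per-day figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes a constant data rate and a decimal megabyte (10^6 bytes), neither of which is stated on the slide, and the function name is illustrative.

```python
# Convert a sustained data rate into a daily volume.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def bytes_per_day(megabytes_per_second, bytes_per_megabyte=10**6):
    """Daily volume in bytes for a constant rate given in megabytes per second."""
    return megabytes_per_second * bytes_per_megabyte * SECONDS_PER_DAY


nasa_daily = bytes_per_day(46)    # 3,974,400,000,000 bytes, roughly 4 * 10**12
inline_daily = bytes_per_day(1)   # 86,400,000,000 bytes

print(f"{nasa_daily:,} bytes/day")
print(f"{inline_daily:,} bytes/day")
```

Under these assumptions, 46 megabytes per second works out to just under four trillion bytes a day, which agrees with the rounded figure quoted for NASA.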
![Page 4: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/4.jpg)
Explosion of Data (cont.)
![Page 5: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/5.jpg)
Explosion of Data (cont.)
![Page 6: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/6.jpg)
Explosion of Data (cont.)
![Page 7: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/7.jpg)
Explosion of Data (cont.)
![Page 8: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/8.jpg)
What do we need?
Fast, accurate, and scalable data analysis techniques to extract useful knowledge.
The answer is Data Mining.
![Page 9: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/9.jpg)
What is Data Mining?
“Data Mining is the exploration and analysis of large or small quantities of data in order to discover meaningful patterns, trends and rules.”
Data → Data Mining → Knowledge
![Page 10: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/10.jpg)
[Diagram: Data Mining shown together with Data Analysis (AI, Machine Learning, Statistics) and Database (Data Warehouse, OLAP)]
![Page 11: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/11.jpg)
Data Mining
• Data Analysis: Statistics, Machine Learning
• Database: Data Warehouse, OLAP
![Page 12: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/12.jpg)
Database

| | Text Files | Relational Database | Multi-dimensional Database |
|---|---|---|---|
| Entities | File | Table | Cube |
| Attributes | Row and Col | Record, Field, Index | Dimension, Level, Measurement |
| Methods | Read, Write | Select, Insert, Update, Delete | Drill down, Drill up, Drill through |
| Language | - | SQL | MDX |
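The four relational methods named above (Select, Insert, Update, Delete) can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch only: the table name `records`, its columns, and the values are hypothetical and do not come from the slides.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT, value REAL)")

# Insert: add three rows.
conn.executemany(
    "INSERT INTO records (name, value) VALUES (?, ?)",
    [("a", 1.0), ("b", 2.0), ("c", 3.0)],
)

# Update: change one field of one record.
conn.execute("UPDATE records SET value = 3.5 WHERE name = 'c'")

# Delete: remove one record.
conn.execute("DELETE FROM records WHERE name = 'b'")

# Select: read back what is left.
rows = conn.execute("SELECT id, name, value FROM records ORDER BY id").fetchall()
print(rows)

conn.close()
```

A multi-dimensional database would expose the same data through cubes and drill operations instead, typically queried with MDX rather than SQL.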
![Page 13: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/13.jpg)
Data Analysis
• Classification
• Regression
• Clustering
• Association
• Sequence Analysis
![Page 14: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/14.jpg)
Data Analysis
[Diagram: input variables feed a model that produces output variables]
• Input variables or attributes (X1, X2): numeric (age, income, …) or categorical (gender, occupation, …)
• Model (W1, W2): linear models or decision trees
• Output variables or targets (Y1, Y2): numeric for regression (0,1) or categorical for classification (good, bad)
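To make the input-model-output picture concrete, here is a minimal sketch of a linear model with two inputs. It assumes that W1 and W2 in the diagram denote the model's weights, which the slide does not state; the weights, inputs, and threshold are hypothetical values chosen for illustration.

```python
def linear_model(x1, x2, w1, w2, bias=0.0):
    """Weighted sum of two numeric inputs: a numeric, regression-style output."""
    return w1 * x1 + w2 * x2 + bias


def classify(score, threshold=0.5):
    """Turn the numeric score into a categorical target (good or bad)."""
    return "good" if score >= threshold else "bad"


# Hypothetical weights and inputs (for example, scaled age and income).
w1, w2 = 0.5, 0.25
score = linear_model(x1=0.5, x2=1.0, w1=w1, w2=w2)  # 0.5 * 0.5 + 0.25 * 1.0
label = classify(score)

print(score, label)
```

The same numeric score serves as a regression output when used directly and as a classification output once a threshold is applied.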
![Page 15: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/15.jpg)
Data Analysis (cont.)
• Clustering: [scatter plot of Income versus Age]
• Association: transactions 1: chips, coke, chocolate; 2: gum, chips; 3: chips, coke; 4: … Probability (chips, coke)?
• Sequence Analysis: …ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA… [diagram labels: Xt-1, Xt, T]
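The association question above, the probability of chips and coke appearing together, can be estimated from the three listed transactions. The sketch below computes support and confidence, two standard association measures; the fourth transaction is elided on the slide and is left out here, and the function names are illustrative.

```python
# The three transactions listed on the slide (the fourth is elided there).
transactions = [
    {"chips", "coke", "chocolate"},
    {"gum", "chips"},
    {"chips", "coke"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated probability of consequent given antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


pair_support = support({"chips", "coke"}, transactions)       # 2 of 3 baskets
chips_to_coke = confidence({"chips"}, {"coke"}, transactions)  # 2 of the 3 chips baskets
coke_to_chips = confidence({"coke"}, {"chips"}, transactions)  # both coke baskets

print(pair_support, chips_to_coke, coke_to_chips)
```

On these three baskets, chips and coke occur together in two out of three transactions, and every basket containing coke also contains chips.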
![Page 16: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/16.jpg)
Data Mining in Research Life Cycle
[Diagram labels: Questions, Needs, Search, Research, Experiment, Modeling, Report, Library, Data, Database, Data Analysis]
![Page 17: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/17.jpg)
Data Mining – Modeling Steps
1. Problem Definition
2. Data Preparation
3. Exploration
4. Modeling
5. Evaluation
6. Deployment
![Page 18: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/18.jpg)
Agenda
• Explosion of data
• Introduction to data mining
• Examples of data mining in science and engineering
• Challenges and opportunities
![Page 19: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/19.jpg)
Examples of data mining in science & engineering
1. Data mining in Biomedical Engineering: “Robotic Arm Control Using Data Mining Techniques”
2. Data mining in Chemical Engineering: “Data Mining for In-line Image Monitoring of Extrusion Processing”
![Page 20: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/20.jpg)
1. Problem Definition
“Control a robotic arm by means of EMG signals from biceps and triceps muscles.”
Movements: Supination, Pronation, Flexion, Extension

Muscle Contraction

| Movement | Biceps | Triceps |
|---|---|---|
| Supination | H | H |
| Pronation | L | L |
| Flexion | H | L |
| Extension | L | H |
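The biceps/triceps pattern listed above amounts to a four-entry lookup. The sketch below encodes it directly, assuming that H and L stand for high and low contraction and that a single threshold separates them; the threshold value and signal scale are hypothetical, and this is not the model that was actually deployed.

```python
# (biceps level, triceps level) -> movement, as listed on the slide.
MOVEMENTS = {
    ("H", "H"): "Supination",
    ("L", "L"): "Pronation",
    ("H", "L"): "Flexion",
    ("L", "H"): "Extension",
}


def level(signal, threshold):
    """Discretize a signal into H (high) or L (low). The threshold is an assumption."""
    return "H" if signal >= threshold else "L"


def movement(biceps, triceps, threshold=0.5):
    """Look up the movement for a pair of EMG signal amplitudes."""
    return MOVEMENTS[(level(biceps, threshold), level(triceps, threshold))]


print(movement(0.9, 0.1))  # high biceps, low triceps
print(movement(0.2, 0.8))  # low biceps, high triceps
```

Real EMG signals are noisy, which is presumably why a learned classifier is used in place of a fixed threshold; the lookup only shows what the classifier has to reproduce.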
![Page 21: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/21.jpg)
2. Data Preparation
The dataset includes 80 records.
There are two input variables: biceps signal and triceps signal.
There is one output variable with four possible values: Supination, Pronation, Flexion, and Extension.
![Page 22: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/22.jpg)
3. Exploration
[Scatter plot: Triceps signal versus Record#; legend: Flexion, Extension, Supination, Pronation]
![Page 23: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/23.jpg)
3. Exploration (cont.)
[Scatter plot: Biceps signal versus Record#; legend: Flexion, Extension, Supination, Pronation]
![Page 24: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/24.jpg)
4. Modeling
Classification:
• OneR
• Decision Tree
• Naïve Bayesian
• K-Nearest Neighbors
• Neural Networks
• Linear Discriminant Analysis
• Support Vector Machines
• …
![Page 25: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/25.jpg)
6. Model Deployment
A neural network model was successfully implemented inside the robotic arm.
![Page 26: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/26.jpg)
Examples of data mining in science & engineering
1. Data mining in Biomedical Engineering: “Robotic Arm Control Using Data Mining Techniques”
2. Data mining in Chemical Engineering: “Data Mining for In-line Image Monitoring of Extrusion Processing”
![Page 27: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/27.jpg)
Plastics Extrusion
Plastic pellets
Plastic melt
![Page 28: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/28.jpg)
Film Extrusion
Extruder
Plastic Film
Defect due to particle contaminant
![Page 29: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/29.jpg)
In-Line Monitoring
Transition Piece
Window Ports
![Page 30: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/30.jpg)
In-Line Monitoring
[Diagram labels: Light Source, Extruder and Interface, Optical Assembly, Imaging Computer, Light]
![Page 31: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/31.jpg)
Melt Without Contaminant Particles (WO)
![Page 32: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/32.jpg)
Melt With Contaminant Particles (WP)
![Page 33: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/33.jpg)
1. Problem Definition
Classify images into those with particles (WP) and those without particles (WO).
WO WP
![Page 34: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/34.jpg)
2. Data Preparation
• 2000 images
• 54 input variables, all numeric
• One output variable with two possible values: With Particle (WP) and Without Particle (WO)
![Page 35: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/35.jpg)
2. Data Preparation (cont.)
• Pre-processed images to remove noise
• Dataset 1 with sharp images: 1350 images, including 1257 without particles and 91 with particles
• Dataset 2 with sharp and blurry images: 2000 images, including 1909 without particles or with blurry particles, and 91 with particles
• 54 input variables, all numeric
• One output variable with two possible values (WP and WO)
![Page 36: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/36.jpg)
3. Exploration
Demo!
![Page 37: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/37.jpg)
4. Modeling
Classification:
• OneR
• Decision Tree
• 3-Nearest Neighbors
• Naïve Bayesian
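Of the classifiers listed above, 3-Nearest Neighbors is the simplest to sketch: a new case takes the majority label of its three closest training cases. The example below assumes Euclidean distance and uses a handful of made-up two-feature points in place of the 54 image variables; none of the data comes from the study.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to query.

    train is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training points, not data from the study.
train = [
    ((0.10, 0.20), "WO"), ((0.20, 0.10), "WO"), ((0.15, 0.25), "WO"),
    ((0.90, 0.80), "WP"), ((0.80, 0.90), "WP"), ((0.85, 0.75), "WP"),
]

print(knn_predict(train, (0.82, 0.80)))
print(knn_predict(train, (0.12, 0.18)))
```

With two classes and k = 3 a vote can never tie, which is one practical reason to pick an odd k.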
![Page 38: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/38.jpg)
5. Evaluation

| Dataset | Attrib. | Class | One-R | C4.5 | 3-N.N. | Bayes |
|---|---|---|---|---|---|---|
| Sharp Images | 54 | 2 | 99.9 | 99.8 | 99.8 | 95.8 |
| Sharp + Blurry Images | 54 | 2 | 98.5 | 97.8 | 97.8 | 93.3 |
| Sharp + Blurry Images | 54 | 3 | 87 | 87 | 84 | 79 |

10-fold cross-validation
If pixel_density_max < 142 then WP
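The rule above has the shape of a one-attribute (OneR-style) classifier, and 10-fold cross-validation estimates how well such a rule generalizes. The sketch below learns a single cut point on each training split and scores it on the held-out fold. It is a simplified stand-in and not the One-R implementation used in the study; the data are synthetic, and only the threshold 142 and the WP/WO labels are taken from the slides.

```python
import random


def predict(value, threshold):
    """One-attribute rule of the form 'if value < threshold then WP'."""
    return "WP" if value < threshold else "WO"


def fit_threshold(values, labels):
    """Pick the cut point with the highest training accuracy."""
    distinct = sorted(set(values))
    # Candidate cuts: below everything, between neighbors, above everything.
    midpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    cuts = [distinct[0]] + midpoints + [distinct[-1] + 1]

    def n_correct(t):
        return sum(predict(v, t) == y for v, y in zip(values, labels))

    return max(cuts, key=n_correct)


def cross_validate(values, labels, k=10, seed=0):
    """Mean held-out accuracy over k folds."""
    indices = list(range(len(values)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    accuracies = []
    for i, test in enumerate(folds):
        train = [n for j, fold in enumerate(folds) if j != i for n in fold]
        t = fit_threshold([values[n] for n in train], [labels[n] for n in train])
        correct = sum(predict(values[n], t) == labels[n] for n in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k


# Synthetic, cleanly separable data (hypothetical, not from the study).
values = [100 + i for i in range(10)] + [160 + i for i in range(10)]
labels = ["WP"] * 10 + ["WO"] * 10
accuracy = cross_validate(values, labels)

print(accuracy)
print(predict(120, 142), predict(150, 142))  # the slide's rule with its threshold
```

Because every fold is scored on cases the rule never saw during fitting, the averaged accuracy is a fairer estimate than accuracy on the training data itself.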
![Page 39: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/39.jpg)
6. Deploy model
A Visual Basic program will be developed to implement the model.
![Page 40: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/40.jpg)
Agenda
• Explosion of data
• Introduction to data mining
• Examples of data mining in science & engineering
• Challenges and opportunities
![Page 41: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/41.jpg)
Challenges and Opportunities
• Data mining is a ‘top ten’ emerging technology.
• High-paying jobs in the financial, medical, and engineering sectors.
• Faster, more accurate, and more scalable techniques.
• Incremental, on-line, and real-time learning algorithms.
• Parallel and distributed data processing techniques.
![Page 42: Data Mining The Art and Science of Obtaining Knowledge from Data](https://reader036.vdocuments.us/reader036/viewer/2022062222/56816032550346895dcf520d/html5/thumbnails/42.jpg)
Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems.
You can be part of the solution!