practical data mining - phm society · using rapidminer download rapidminer from rapid-i.com and...
TRANSCRIPT
1 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.
© 2012 by Honeywell International Inc. All rights reserved
Tutorial T4: Practical Data Mining
PHM Conference, Minneapolis, MN
9/23/2012 1:30-3:00 PM
Ravi Patankar
Principal Engineer
Honeywell Aerospace
Phoenix, AZ [email protected]
2 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.
Objective
• Getting one started with data mining
using RapidMiner
Download RapidMiner from Rapid-i.com and install
3 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.
RapidMiner
• Leading open source data mining tool
• 100% java
• Uses CRISP-DM
From crisp-dm.org
4 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.
Agenda
• RapidMiner Overview - Operators
- Repositories Data
Process
• (remote) Execution
• Saving
- Data visualization
- Documentation?
• ETL - Loading from data files and databases
- Data preparation
- Transformations
- Missing values
- Filtering
- Outliers
- Attribute reduction
- Attribute selection
5 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.
Agenda (continued)
• Supervised learning - Modeling methods
- Applying models
- Comparing models
- Performance metrics and evaluation
- Validation
- Automatic supervised learning (PaREn)
• Unsupervised Learning - Association rules
- Frequent item set mining
- FP-Growth
• Text Mining Introduction
• Delivering Analytics via the Web - RapidAnalytics
Collaboration
Webservices
Reports
7 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.
Data Visualization and ETL
• Titanic data set from R
• Introduction to process construction
- Decision Tree
• ETL operators
8 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.
Examples
• Incorrect settings of automotive DTCs
• Engine X : Fuel System Failure
• Engine Y : Turbine blade and Fuel control
9 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.
Control Module Vs Powertrain Voltages
$42<$148D likely powering down
$42>$148D likely powering up
US patent application 12/638,592
DTC P1682
10 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.
Fuel System Related Failure
• Analyze normalized engine X data
11 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.
Machine Learning
• Supervised
A method for creating a function from training data
Data consist of pairs of input objects and outputs
Ex: Data mining for reasons for known fault that has occurred
Algorithms: Trees - Decision Tree, Bayes, Neural Networks
• Unsupervised
A method where a model is fit to observations.
No a priori output is known.
Ex: Data mining for patterns (faults unknown)
Algorithms: Apriori, Predictive Apriori, Tertius, FPGrowth
12 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.
Using Supervised Learning
• Normalized engine Y data
• Detect Failure Onset
- Turbine blade
- Fuel control
• Using different Techniques
• Comparing Performance of Techniques (Methods)
• Validation Methods
• Automation/Optimization
• Using PaREn Plugin
13 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.
Fault Detection Performance
Truth
Table
Detected
Faulty No Fault
Actu
al Fau
lty
Good
True +
Consumer
Risk
False -
No
Fa
ult
Manufacturer
Risk
False +
Good
True -
recall
pre
cis
ion
14 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.
Optimization to Tune CA-CBM
PHM Algorithms
CA-CBM
performance
Evaluator
Historic
Data
Player
actual conclusions for the data streams
CA-CBM input
data streams
CA-CBM
conclusions
tunable parameters
Optimization automatically tunes parameters to obtain the maximum performance
15 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.
RapidAnalytics
16 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.
Delivering Analytics via the Web
• Deployment of Analyses with RapidAnalytics
- Sharing data, models, and processes
- Managing processes and services
- User Management
- Configuration
• Process Execution
- Remote execution of processes
- Scheduling processes
- Exporting processes as Web services
• Reports
- Using RapidMiner Processes as Web services in reports