practical data mining - phm society · using rapidminer download rapidminer from rapid-i.com and...

15
1 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page. © 2012 by Honeywell International Inc. All rights reserved Tutorial T4: Practical Data Mining PHM Conference, Minneapolis, MN 9/23/2012 1:30-3:00 PM Ravi Patankar Principal Engineer Honeywell Aerospace Phoenix, AZ [email protected]

Upload: vuongtuyen

Post on 01-Apr-2018

231 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: Practical Data Mining - PHM Society · using RapidMiner Download RapidMiner from Rapid-i.com and install . Honeywell Confidential. ... -Association rules -Frequent item set mining

1 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.

© 2012 by Honeywell International Inc. All rights reserved

Tutorial T4: Practical Data Mining

PHM Conference, Minneapolis, MN

9/23/2012 1:30-3:00 PM

Ravi Patankar

Principal Engineer

Honeywell Aerospace

Phoenix, AZ [email protected]

Page 2: Practical Data Mining - PHM Society · using RapidMiner Download RapidMiner from Rapid-i.com and install . Honeywell Confidential. ... -Association rules -Frequent item set mining

2 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.

Objective

• Getting one started with data mining

using RapidMiner

Download RapidMiner from Rapid-i.com and install

Page 3: Practical Data Mining - PHM Society · using RapidMiner Download RapidMiner from Rapid-i.com and install . Honeywell Confidential. ... -Association rules -Frequent item set mining

3 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.

RapidMiner

• Leading open source data mining tool

• 100% java

• Uses CRISP-DM

From crisp-dm.org

Page 4: Practical Data Mining - PHM Society · using RapidMiner Download RapidMiner from Rapid-i.com and install . Honeywell Confidential. ... -Association rules -Frequent item set mining

4 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.

Agenda

• RapidMiner Overview - Operators

- Repositories Data

Process

• (remote) Execution

• Saving

- Data visualization

- Documentation?

• ETL - Loading from data files and databases

- Data preparation

- Transformations

- Missing values

- Filtering

- Outliers

- Attribute reduction

- Attribute selection

Page 5: Practical Data Mining - PHM Society · using RapidMiner Download RapidMiner from Rapid-i.com and install . Honeywell Confidential. ... -Association rules -Frequent item set mining

5 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.

Agenda (continued)

• Supervised learning - Modeling methods

- Applying models

- Comparing models

- Performance metrics and evaluation

- Validation

- Automatic supervised learning (PaREn)

• Unsupervised Learning - Association rules

- Frequent item set mining

- FP-Growth

• Text Mining Introduction

• Delivering Analytics via the Web - RapidAnalytics

Collaboration

Webservices

Reports

Page 6: Practical Data Mining - PHM Society · using RapidMiner Download RapidMiner from Rapid-i.com and install . Honeywell Confidential. ... -Association rules -Frequent item set mining

7 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.

Data Visualization and ETL

• Titanic data set from R

• Introduction to process construction

- Decision Tree

• ETL operators

Page 7: Practical Data Mining - PHM Society · using RapidMiner Download RapidMiner from Rapid-i.com and install . Honeywell Confidential. ... -Association rules -Frequent item set mining

8 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.

Examples

• Incorrect settings of automotive DTCs

• Engine X : Fuel System Failure

• Engine Y : Turbine blade and Fuel control

Page 8: Practical Data Mining - PHM Society · using RapidMiner Download RapidMiner from Rapid-i.com and install . Honeywell Confidential. ... -Association rules -Frequent item set mining

9 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.

Control Module Vs Powertrain Voltages

$42<$148D likely powering down

$42>$148D likely powering up

US patent application 12/638,592

DTC P1682

Page 9: Practical Data Mining - PHM Society · using RapidMiner Download RapidMiner from Rapid-i.com and install . Honeywell Confidential. ... -Association rules -Frequent item set mining

10 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.

Fuel System Related Failure

• Analyze normalized engine X data

Page 10: Practical Data Mining - PHM Society · using RapidMiner Download RapidMiner from Rapid-i.com and install . Honeywell Confidential. ... -Association rules -Frequent item set mining

11 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.

Machine Learning

• Supervised

A method for creating a function from training data

Data consist of pairs of input objects and outputs

Ex: Data mining for reasons for known fault that has occurred

Algorithms: Trees - Decision Tree, Bayes, Neural Networks

• Unsupervised

A method where a model is fit to observations.

No a priori output is known.

Ex: Data mining for patterns (faults unknown)

Algorithms: Apriori, Predictive Apriori, Tertius, FPGrowth

Page 11: Practical Data Mining - PHM Society · using RapidMiner Download RapidMiner from Rapid-i.com and install . Honeywell Confidential. ... -Association rules -Frequent item set mining

12 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.

Using Supervised Learning

• Normalized engine Y data

• Detect Failure Onset

- Turbine blade

- Fuel control

• Using different Techniques

• Comparing Performance of Techniques (Methods)

• Validation Methods

• Automation/Optimization

• Using PaREn Plugin

Page 12: Practical Data Mining - PHM Society · using RapidMiner Download RapidMiner from Rapid-i.com and install . Honeywell Confidential. ... -Association rules -Frequent item set mining

13 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.

Fault Detection Performance

Truth

Table

Detected

Faulty No Fault

Actu

al Fau

lty

Good

True +

Consumer

Risk

False -

No

Fa

ult

Manufacturer

Risk

False +

Good

True -

recall

pre

cis

ion

Page 13: Practical Data Mining - PHM Society · using RapidMiner Download RapidMiner from Rapid-i.com and install . Honeywell Confidential. ... -Association rules -Frequent item set mining

14 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.

Optimization to Tune CA-CBM

PHM Algorithms

CA-CBM

performance

Evaluator

Historic

Data

Player

actual conclusions for the data streams

CA-CBM input

data streams

CA-CBM

conclusions

tunable parameters

Optimization automatically tunes parameters to obtain the maximum performance

Page 14: Practical Data Mining - PHM Society · using RapidMiner Download RapidMiner from Rapid-i.com and install . Honeywell Confidential. ... -Association rules -Frequent item set mining

15 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.

RapidAnalytics

Page 15: Practical Data Mining - PHM Society · using RapidMiner Download RapidMiner from Rapid-i.com and install . Honeywell Confidential. ... -Association rules -Frequent item set mining

16 Honeywell Confidential. Use or disclosure of information on this page is subject to the restrictions on the title page.

Delivering Analytics via the Web

• Deployment of Analyses with RapidAnalytics

- Sharing data, models, and processes

- Managing processes and services

- User Management

- Configuration

• Process Execution

- Remote execution of processes

- Scheduling processes

- Exporting processes as Web services

• Reports

- Using RapidMiner Processes as Web services in reports