exploratory data analysis.ppt - nc state universitync state university introduction l intelligent...

13
NC STATE UNIVERSITY Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids Yixin Cai, Mo-Yuen Chow Electrical and Computer Engineering, North Carolina State University July 2009

Upload: others

Post on 28-Feb-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

NC STATE UNIVERSITY

Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids

Yixin Cai, Mo-Yuen ChowElectrical and Computer Engineering,

North Carolina State UniversityJuly 2009

NC STATE UNIVERSITY Outline

l Introduction

l Integrate data

l Evaluate a single feature

l Build a fault cause classifier

l Summary

Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 2

NC STATE UNIVERSITY Introduction

l Intelligent fault management– detection, recording, location, diagnosis, restoration, …

l Problem of interest– Diagnosis: predict what the root cause is based on the available

information before the engineers go on-site– Help (not replace) the engineers to identify the root cause faster

l Challenges– Stochastic nature of faults– Noisy data with errors– More and more incoming data in Smart Grids

Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 3

NC STATE UNIVERSITY Data Integration

l Data sources– Utility OMS database

Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 4

NC STATE UNIVERSITY Data Integration

l Data sources– Utility OMS database– Public database on

weather, environment, geographic features, etc.

Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 5

http://www.nc-climate.ncsu.edu/products

http://landcover.usgs.gov/usgslandcover.php

NC STATE UNIVERSITY Data Integration

l Data sources– Utility OMS database– Public database on

weather, environment, geographic features, etc.

– Private venders

Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 6

http://www.mapmart.com/Default.aspx

NC STATE UNIVERSITY Data Integration

l Data integration under GIS framework– Spatial relation

Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 7

NC STATE UNIVERSITY Data Integration

l Data integration under GIS framework– Spatial relation– Spatial-temporal relation

Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 8

NC STATE UNIVERSITY Evaluate a Single Feature

Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 9

l Data preprocess– Define the root cause of interest– Clean errors and noises– Extract features

l Categorical features– Likelihood measure– (Mosaic) plot

,, ( | ) i j

i j i jj

NL P o X x

N= = =

Weather vs. Tree Faults

1:Clear Weather, 2: Extreme Temperature, 4:Raining, 6:Thunderstorm, 8:WindyWeather Conditinos

Tre

e Fa

ults

1 2 4 6 8

01

Season vs. Tree Faults

1:Spring, 2:Summer, 3:Fall, 4:WinterSeason

Tre

e Fa

ults

1 2 3 4

01

Land Use vs. Tree Faults

21~24:Developed, 41~43:Forest, 71:Grassland, 81~82:Cultivated, 90:WetlandsLand Use

Tre

e Fa

ults

21 22 23 2431 41 42 43 52 71 81 82 90

01

NC STATE UNIVERSITY Evaluate a Single Feature

Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 10

l Continuous features– Likelihood measure– plot

0 50 100 150 200 250 300

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Distance to Trees vs. Tree Faults

Distance to Trees (meters)

Tre

e F

ault

Like

lihoo

d

0 200 400 600 800

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Distance to Roads vs. Tree Faults

Distance to Roads (meters)

Tre

e F

ault

Like

lihoo

d

0 10 20 30 40

0.3

0.4

0.5

0.6

0.7

0.8

Wind Speed vs. Tree Faults

Hourly Max Wind Speed (miles/hour)

Tre

e F

ault

Like

lihoo

d

,, ( | ) i j

i j i jj

NL P o X x

N= ≥ =

NC STATE UNIVERSITY Build a Fault Cause Classifier

l Linear discriminant analysis (LDA)

l Logistic regression (LR)

l Comparison

Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 11

1

NT

i ii

D w f=

= = ∑w f 11 0[ ( )]TD −= ∑ −μ μ f

( 1)logit( 1) ln( 0)

TP ccP c

α=

= = = +=

β f 1( 1)1

TP ce α− −

= =+ β f

LDA LRModel linear classifier non-linear classifierData assumption normal distributed

with equal variancenone

Computation matrix manipulation maximum likelihood

NC STATE UNIVERSITY Case Study

l Data sources– Progress Energy Carolinas outage database– NC Climate Office– NC State Univ. GIS data service

l Fault causes of interest– Tree-caused– Animal-caused– Other

l Features– 7 categorical – 5 continuous

l Classifiers– LDA– LR

Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 12

6 Features 12 Featurestraining testing training testing

Tree faultACC 0.75(0.01) 0.76(0.01) 0.77(0.02) 0.76(0.02)POD 0.32(0.03) 0.34(0.03) 0.41(0.03) 0.39(0.03)FAR 0.34(0.03) 0.32(0.03) 0.32(0.04) 0.33(0.04)

Animal fault

ACC 0.84(0.02) 0.83(0.01) 0.84(0.02) 0.84(0.01)POD 0.31(0.04) 0.29(0.04) 0.35(0.03) 0.35(0.03)FAR 0.42(0.05) 0.43(0.05) 0.39(0.05) 0.41(0.05)

6 Features 12 Featurestraining testing training testing

Tree faultACC 0.76(0.02) 0.76 (0.02) 0.77 (0.01) 0.77(0.02)POD 0.32(0.03) 0.32(0.03) 0.44 (0.03) 0.44(0.03)FAR 0.30(0.04) 0.30(0.04) 0.32 (0.03) 0.34(0.03)

Animal fault

ACC 0.83(0.02) 0.83 (0.02) 0.84(0.01) 0.84(0.01)POD 0.30(0.03) 0.31(0.03) 0.37 (0.04) 0.35(0.03)FAR 0.42(0.04) 0.41(0.06) 0.41 (0.06) 0.41(0.06)

Classification Performance Using LDA on Sample Dataset

Classification Performance Using LR on Sample Dataset

NC STATE UNIVERSITY Summary

l Methods for exploratory data analysis– Integrate data from multiple sources under GIS framework– Use likelihood measure to evaluate both categorical and continuous

features– Apply LDA and LR as fault cause classifiers

l Findings– LDA and LR performs similar– Adding new features helps fault diagnosis

l Future work– Systematic feature selection methods– Advanced fault diagnosis algorithms– Novel sampling strategy

Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 13