
Statistical Mining in Data Streams

Ankur Jain
Dissertation Defense

Computer Science, UC Santa Barbara

Committee:
Edward Y. Chang (chair)
Divyakant Agrawal
Yuan-Fang Wang

04/13/23 Statistical Mining in Data Streams 2

Roadmap
The Data Stream Model
  Introduction and research issues
  Related work
Data Stream Mining
  Stream data clustering
  Bayesian reasoning for sensor stream processing
Contribution Summary
Future work


Data Streams

“A data stream is an unbounded and continuous sequence of tuples.”

Tuples arrive online and can be multi-dimensional
A tuple seen once cannot be easily retrieved later
No control over the tuple arrival order


Applications – Sensor Networks
[Figure: a lagoon sensor network answering the query “Find the mean temperature of the lagoon in the last 3 hours.”]

Applications – Network Monitoring
[Figure: Internet traffic inspected online for anomalies and intrusions: DoS, PROBE, U2R?]

Applications – Text Processing
[Figure: clustering of blogs, email, and click streams.]

Applications
• Video surveillance
• Stock ticker monitoring
• Process control & manufacturing
• Traffic monitoring & analysis
• Transaction log processing

Traditional DBMS does not work!
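The sensor-network example above (“mean temperature of the lagoon in the last 3 hours”) illustrates why a one-pass, bounded-memory operator is needed instead of a traditional DBMS query. A minimal generic sketch of such a sliding-window aggregate (the `(timestamp, value)` tuple format and class name are illustrative assumptions, not part of any DSMS API):

```python
from collections import deque

class WindowedMean:
    """Mean over the last `width` time units of an online stream."""
    def __init__(self, width):
        self.width, self.window, self.total = width, deque(), 0.0

    def add(self, t, value):
        self.window.append((t, value))
        self.total += value
        # Evict tuples that have fallen out of the window: one pass, no replay.
        while self.window and self.window[0][0] <= t - self.width:
            self.total -= self.window.popleft()[1]

    def mean(self):
        return self.total / len(self.window) if self.window else None

w = WindowedMean(width=3 * 3600)            # 3 hours, in seconds
for t, temp in [(0, 20.0), (3600, 22.0), (7200, 24.0), (12600, 26.0)]:
    w.add(t, temp)
print(w.mean())  # only readings within (t - 3h, t] remain: (22+24+26)/3 = 24.0
```

Memory stays proportional to the window contents, never to the full unbounded stream.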


Data Stream Projects
STREAM (Stanford)
  A general-purpose Data Stream Management System (DSMS)
Telegraph (Berkeley)
  Adaptive query processing
  TinyDB: general-purpose sensor database
Aurora Project (Brown/MIT)
  Distributed stream processing
  Introduces new operators (map, drop, etc.)
The Cougar Project (Cornell)
  Sensors form a distributed database system
  Cross-layer optimizations (data management layer and the routing layer)
MAIDS (UIUC)
  Mining Alarming Incidents in Data Streams
  Streaminer: data stream mining


Data Stream Processing – Key Ingredients

Adaptivity
  Incorporate evolutionary changes in the stream
Approximation
  Exact results are hard to compute fast with limited memory


A Data Stream Management System (DSMS)

The Central Stream Processing System

[Figure: the DSMS performs query processing, resource management, and adaptive stream mining. User queries arrive with a query-precision constraint; streaming data sources/sensors feed the system through data acquisition, sensor calibration, data filtering, and data sampling; the system maintains stream synopses, tunes the sampling rate and the sliding-window size, and emits streaming query results.]


Thesis Outline
“Develop fast, online, statistical methods for mining data streams.”
  Adaptive non-linear clustering in multi-dimensional streams
  Bayesian reasoning for sensor stream processing
  Filtering methods for resource conservation
  Change detection in data streams
  Video sensor data stream processing


Roadmap
The Data Stream Model
  Introduction and research issues
  Related work
Data Stream Mining
  Stream data clustering
  Bayesian reasoning for sensor stream processing
Contribution Summary
Future work


Clustering in High-Dimensional Streams

“Given a continuous sequence of points, group them into some number of clusters, such that the members of a cluster are geometrically close to each other.”


Example Application – Network Monitoring

[Figure: high-dimensional connection tuples stream in from the Internet and must be classified online: DoS, Probe, or Normal?]


Stream Clustering – New Challenges
One-pass restriction and limited memory constraint
  Fading cluster technique proposed by Aggarwal et al.
Non-linear separation boundaries
  We propose using the kernel trick to deal with the non-linearity issue
Data dimensionality
  We propose an effective incremental dimension-reduction technique


Adaptive Non-linear Clustering – The 2-Tier Framework

[Figure: the latest point x received from the stream enters the 2-tier clustering module, which uses the kernel trick. Tier 1 performs stream segmentation on the d-dimensional input space; Tier 2 projects x̃ into a q-dimensional low-dimensional space (LDS), q < d, and updates the fading clusters (C1…C9) that reside there.]

The Fading Cluster Methodology
Each cluster Ci has a recency value Ri s.t. Ri = f(t − tlast), where
  t: current time
  tlast: last time Ci was updated
  f(t) = e^(−λt), λ: fading factor
A cluster is erased from memory (faded) when Ri ≤ h,
  where h is a user parameter that controls the influence of historical data
The total number of clusters is bounded
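The recency bookkeeping above can be sketched in a few lines. This is a minimal sketch: the cluster statistics are reduced to a center, and `lam` and `h` stand for the fading factor λ and the pruning threshold from the slide:

```python
import math

class FadingCluster:
    def __init__(self, center, t):
        self.center = center
        self.t_last = t  # last time this cluster was updated

    def recency(self, t, lam):
        # R_i = f(t - t_last) = e^{-lambda * (t - t_last)}
        return math.exp(-lam * (t - self.t_last))

def prune_faded(clusters, t, lam, h):
    # Erase (fade) clusters whose recency has dropped to h or below.
    return [c for c in clusters if c.recency(t, lam) > h]
```

Because recency decays exponentially and faded clusters are pruned, the number of live clusters stays bounded.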


Non-linearity in Data

Traditional clustering techniques (k-means) do not perform well
Spectral clustering methods are likely to perform better
Feature-space mapping φ: Input Space → Feature Space

[Figure: data that is non-linearly separable in the input space becomes separable in the feature space.]


Non-linearity in Network Intrusion Data

[Figure: “ipsweep” attack data. In the input space the trend is non-linear; in the feature space it is geometrically well-behaved.]

Use the kernel trick?


The Kernel Trick
Actual projection into the higher-dimensional space is computationally expensive
The kernel trick does the non-linear projection implicitly!

Given two input-space vectors x, y, the kernel function is
  k(x,y) = <φ(x), φ(y)>

Gaussian kernel function: k(x,y) = exp(−γ‖x−y‖²), used in the previous example!


Kernel Trick – Working Example
φ: x = (x1, x2) → φ(x) = (x1², x2², √2·x1x2)    (not required explicitly!)

<φ(x), φ(z)> = <(x1², x2², √2·x1x2), (z1², z2², √2·z1z2)>
             = x1²z1² + x2²z2² + 2·x1x2z1z2
             = (x1z1 + x2z2)²
             = <x, z>².

Hence k(x,z) = <x,z>².

“The kernel trick allows us to perform operations in the high-dimensional feature space using a kernel function, but without explicitly representing φ.”
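The identity in the worked example is easy to check numerically; a small self-contained sketch of the explicit map versus the implicit kernel:

```python
import math

def phi(x):
    # Explicit feature map: (x1, x2) -> (x1^2, x2^2, sqrt(2)*x1*x2)
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def k(x, z):
    # Polynomial kernel: k(x, z) = <x, z>^2, no explicit phi needed
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)
# <phi(x), phi(z)> and k(x, z) agree: both equal (1*3 + 2*4)^2 = 121
assert abs(dot(phi(x), phi(z)) - k(x, z)) < 1e-9
```

The implicit side never materializes the three-dimensional feature vectors, which is exactly what makes kernel methods attractive for high-dimensional streams.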


Stream Clustering – New Challenges
One-pass restriction and limited memory constraint
  We use the fading cluster technique proposed by Aggarwal et al.
Non-linear separation boundaries
  We propose using kernel methods to deal with the non-linearity issue
Data dimensionality
  We propose an effective incremental dimension-reduction technique


Dimensionality Reduction
A PCA-like kernel method is desirable
  Explicit representation – EVD preferred
  KPCA is computationally prohibitive – O(n³)
  The principal components evolve with time – frequent EVD updates may be necessary
We propose to perform EVD on grouped data instead of point data
  Requires a novel kernel method



The 2-Tier Framework …
Tier 1 captures the temporal locality in a segment
  A segment is a group of contiguous points in the stream, geometrically packed closely in the feature space
Tier 2 adaptively selects segments to project data into the LDS
  Selected segments are called representative segments
  Implicit data in the feature space is projected explicitly into the LDS such that the feature-space distances are preserved


The 2-Tier Framework …

TIER 1:
1. Obtain a point x from the stream and add x to the current segment S.
2. If φ(x) is novel w.r.t. S and |S| > smin, or if |S| = smax, pass S to Tier 2; otherwise return to step 1.

TIER 2:
3. If S is a representative segment, add S to memory and update the LDS.
4. Obtain x̃, the projection of x in the LDS.
5. If x̃ is close to an active cluster, assign x̃ to its nearest cluster; otherwise create a new cluster with x̃.
6. Update cluster centers and recency values; delete faded clusters.
7. Clear the contents of S and return to step 1.
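The Tier 1 / Tier 2 flow above can be sketched as a single control loop. This is a schematic only: the novelty test, the LDS projection, and the cluster handling are plain-Euclidean stand-ins for the kernel-based machinery of the framework, and `SMIN`, `SMAX`, and the radii are illustrative values:

```python
import math

SMIN, SMAX = 3, 8          # segment-size bounds (illustrative values)
NOVELTY_RADIUS = 1.0       # stand-in for the feature-space novelty test
CLUSTER_RADIUS = 2.0       # stand-in for "close to an active cluster"

def is_novel(x, segment):
    # Stand-in novelty test on phi(x): Euclidean distance to the segment mean.
    mean = [sum(c) / len(segment) for c in zip(*segment)]
    return math.dist(x, mean) > NOVELTY_RADIUS

def process(stream):
    segment, clusters = [], []            # clusters: list of centers
    for x in stream:
        segment.append(x)                                       # Tier 1
        if (len(segment) > SMIN and is_novel(x, segment)) or len(segment) == SMAX:
            x_tilde = x                   # Tier 2: identity stand-in for the LDS projection
            near = min(clusters, key=lambda c: math.dist(c, x_tilde), default=None)
            if near is not None and math.dist(near, x_tilde) <= CLUSTER_RADIUS:
                pass                      # assign to nearest cluster (center update omitted)
            else:
                clusters.append(x_tilde)  # create a new cluster with x_tilde
            segment = []                  # clear S
    return clusters
```

Fading and recency maintenance (step 6) are omitted here; the point is only the segment-then-project control structure.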


Network Intrusion Stream

• Simulated data from MIT Lincoln Labs
• 34 continuous attributes (features)
• 10.5K records
• 22 types of intrusion attacks + 1 normal class


Network Intrusion Stream

Clustering accuracy at LDS dimensionality u=10


Efficiency - EVD Computations

Newswire data: 3.8K records, 16.5K features, 10 news topics
Image data: 5K records, 576 features, 10 digits


In Retrospect…
We proposed an effective stream clustering framework
We use the kernel trick to delineate non-linear boundaries efficiently
We use a stream-segmentation approach to continuously project data into a low-dimensional space


Roadmap
The Data Stream Model
  Introduction and research issues
  Related work
Contributions Towards Stream Mining
  Stream data clustering
  Bayesian reasoning for sensor stream processing
Contribution Summary
Future work


Bayesian Reasoning for Sensor Data Processing
Users submit queries with precision constraints, e.g. “Find the temperature with 80% confidence.”
Resource conservation is of prime concern to prolong system life
  Data acquisition
  Data communication
Use probabilistic models at the central site for approximate predictions, preventing actual acquisitions


Dependencies in Sensor Attributes

Attribute       Acquisition Cost
Temperature     50 J
Voltage         5 J

[Figure: a Bayesian-network dependency model links Temperature and Voltage. For the query “Get Temperature,” the system instead issues “Get Voltage” (5 J rather than 50 J) and reports Temperature inferred through the model.]


Using Correlation Models [Deshpande et al., VLDB’04]

Correlation models ignore conditional dependency
Intel Lab (real sensor-network data); attributes: Voltage (V), Temperature (T), Humidity (H)
  “voltage” is correlated with “temperature”
  yet “voltage” is conditionally independent of “temperature,” given “humidity”!

[Figure: scatter plots for the humidity bin [35–40).]


BN vs. Correlations

Bayesian Network
• Maintains vital dependencies only
• Lower search complexity O(n)
• Storage O(nd), d: avg. node degree
• Intuitive dependency structure

Correlation model [Deshpande et al.]
• Maintains all dependencies
• The search space for finding the best alternative sensor attribute is large
• Joint probability is represented in O(n²) cells

[Datasets: NDBC buoy and Intel Lab.]


Bayesian Networks (BN)
Qualitative part – Directed Acyclic Graph (DAG)
• Nodes – sensor attributes
• Edges – attribute influence relationships

Quantitative part – Conditional Probability Tables (CPTs)
• Each node X has its own CPT, P(X | parents(X))

Together, the BN represents the joint probability in factored form:
  P(T,H,V,L) = P(T) P(H|T) P(V|H) P(L|T)

The “influence relationship” is measured by the entropy function H:
  H(Xi) = −Σ_{l=1..k} P(Xi = xil) log P(Xi = xil)

We learn the BN by minimizing H(Xi | Parents(Xi)).
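The entropy quantities used for structure learning can be computed directly from empirical counts. A small sketch over toy samples (the attribute values are invented for illustration; natural-log entropy):

```python
import math
from collections import Counter

def entropy(samples):
    # H(X) = -sum_l P(X = x_l) log P(X = x_l), estimated from counts
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())

def conditional_entropy(xs, parents):
    # Chain rule: H(X | Parents(X)) = H(X, Parents(X)) - H(Parents(X))
    joint = list(zip(xs, parents))
    return entropy(joint) - entropy(parents)

# Toy check: X fully determined by its parent -> H(X | parent) = 0
x      = ['hot', 'hot', 'cold', 'cold']
parent = ['high', 'high', 'low', 'low']
assert abs(conditional_entropy(x, parent)) < 1e-12
# An independent candidate parent removes no uncertainty: H(X | Y) = H(X)
y = ['a', 'b', 'a', 'b']
assert abs(conditional_entropy(x, y) - entropy(x)) < 1e-12
```

Minimizing H(Xi | Parents(Xi)) thus favors parents that actually determine Xi over irrelevant ones.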


System Architecture

[Figure: group queries Q with precision constraints, e.g. {(Temperature, 80%)}, {(Wind Speed, 75%)}, {(Temperature, 95%), (Wind Speed, 85%)}, {(Air Pressure, 90%), (Wind Speed, 90%)}, enter the query processor. Group-query plan generation and the Bayesian inference engine consult storage (CPTs, acquisition costs, and the BN over nodes X1…X6) to produce an acquisition plan; the sensor network returns the acquired values.]


Finding the Candidate Attributes
For any attribute in the group-query Q, analyze candidate attributes in the Markov blanket recursively
Selection criterion – select candidates in a greedy fashion, trading off:
  Acquisition cost
  Information gain (conditional entropy)
Goals:
  Meet precision constraints
  Maximize resource conservation
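The greedy selection step can be sketched as follows. The scoring (information gain per unit acquisition cost) and the attribute names and numbers are illustrative assumptions, not the dissertation's exact criterion:

```python
def select_candidates(query_attr, candidates, cost, info_gain, budget_entropy):
    """Greedily pick cheap, informative attributes until the remaining
    uncertainty about query_attr is driven down to the target."""
    remaining = budget_entropy            # entropy still to be removed
    chosen = []
    # Rank by information gain per unit acquisition cost (assumed criterion).
    for a in sorted(candidates, key=lambda a: info_gain[a] / cost[a], reverse=True):
        if remaining <= 0:
            break
        chosen.append(a)
        remaining -= info_gain[a]
    return chosen

cost = {'voltage': 5.0, 'humidity': 20.0, 'temperature': 50.0}
gain = {'voltage': 0.6, 'humidity': 0.9, 'temperature': 1.5}
# Ratios: voltage 0.12/J, humidity 0.045/J -> voltage first, then humidity
print(select_candidates('temperature', ['voltage', 'humidity'], cost, gain, 1.0))
```

Cheap proxies (here, voltage) are preferred whenever they carry enough information about the queried attribute, which is exactly how acquisitions are avoided.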


Experiments – Resource Conservation

NDBC dataset, 7 attributes

Effect of using group-queries (|Q|: group-query size)
Effect of using the MB property with δmin = 0.90


Results – Selectivity

[Figure: learned BN over the NDBC buoy attributes Wave Period (WP), Wind Speed (SP), Water Temperature (WT), Wind Direction (DR), Wave Height (WH), Air Temperature (AT), and Air Pressure (AP).]


In Retrospect…
Bayesian networks can encode the sensor dependencies effectively
Our method provides significant resource conservation for group-queries


Contribution Summary

“Adaptive Stream resource management using Kalman Filters.” [SIGMOD’04]

“Adaptive sampling for sensor networks.” [DMSN’04]

“Adaptive non-linear clustering for Data Streams.” [CIKM’06]

“Using stationary-dynamic camera assemblies for wide-area video surveillance and selective attention.” [CVPR’06]

“Filtering the data streams.” [in submission]

“Efficient diagnostic and aggregate queries on sensor networks.” [in submission]

“OCODDS: An On-line Change-Over Detection framework for tracking evolutionary changes in Data Streams.” [in submission]


Future Work
Develop non-linear techniques for capturing temporal correlations in data streams
The Bayesian framework can be extended to address “what-if” queries with counterfactual evidence
The clustering framework can be extended for developing stream visualization systems
Incremental EVD techniques can improve the performance further


Thank You !


BACKUP SLIDES!


Back to Stream Clustering
We propose a 2-tier stream clustering framework
  Tier 1: kernel method that continuously divides the stream into segments
  Tier 2: kernel method that uses the segments to project data into a low-dimensional space (LDS)
The fading clusters reside in the LDS


Clustering – LDS Projection


Clustering – LDS Update


Network Intrusion Stream

Clustering accuracy at LDS dimensionality u=10

Cluster strengths at LDS dimensionality u=10


Effect of dimensionality


Query Plan Generation
Given a group query, the query plan computes the “candidate attributes” that will actually be acquired to successfully address the query.
We exploit the “Markov Blanket (MB)” property to select candidate attributes.
Given a BN G, the Markov blanket MB(Xi) of a node Xi comprises its parents, its children, and its children's other parents.

P(Xi | G) = P(Xi | MB(Xi)) = P(Xi, MB(Xi)) / P(MB(Xi))
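For a DAG given as a parent map, the Markov blanket follows directly from its definition; a small sketch using the T/H/V/L attributes of the factored form on the earlier Bayesian-network slide (the dict encoding is an illustrative choice):

```python
def markov_blanket(node, parents):
    """parents: dict mapping each node to the set of its parents."""
    children = {c for c, ps in parents.items() if node in ps}
    # Co-parents: other parents of the node's children.
    co_parents = {p for c in children for p in parents[c]} - {node}
    return set(parents[node]) | children | co_parents

# DAG for P(T,H,V,L) = P(T) P(H|T) P(V|H) P(L|T):  T -> H -> V,  T -> L
dag = {'T': set(), 'H': {'T'}, 'V': {'H'}, 'L': {'T'}}
assert markov_blanket('H', dag) == {'T', 'V'}
assert markov_blanket('T', dag) == {'H', 'L'}
```

Restricting candidate attributes to this set is what keeps the plan-generation search space at O(n) rather than all-pairs.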


Exploiting the MB Property
“Given a node Xi and a set of arbitrary nodes Y in a BN s.t. MB(Xi) ⊆ Y ∪ {Xi}, the conditional entropy of Xi given Y is at least as high as that given its Markov blanket, i.e., H(Xi|Y) ≥ H(Xi|MB(Xi)).”

Proof: Separate MB(Xi) into two parts, MB1 = MB(Xi) ∩ Y and MB2 = MB(Xi) − MB1, and denote Z = Y − MB(Xi):

H(Xi|Y) = H(Xi|Z, MB1)         [Y = Z ∪ MB1]
        ≥ H(Xi|Z, MB1, MB2)    [additional information cannot increase entropy]
        = H(Xi|Z, MB(Xi))      [MB(Xi) = MB1 ∪ MB2]
        = H(Xi|MB(Xi))         [Markov-blanket definition]


Bayesian Reasoning – More Results…

Effect of using the MB property with δmin = 0.90
[Figure: query-answer quality loss; 50-node synthetic-data BN.]


Bayesian Reasoning for Group Queries

More accurate in addressing group queries:
  Q = { (Xi, δi) | Xi ∈ X ∧ (0 < δi ≤ 1) ∧ (1 ≤ i ≤ n) } s.t. δi < max_l P(Xi = xil)
  X = {X1, X2, X3, …, Xn}: sensor attributes
  δi: confidence parameters
  P(Xi = xil): probability with which Xi assumes the value xil

Bayesian reasoning is helpful in detecting abnormalities


Bayesian Reasoning – Candidate attribute selection algorithm
