defense_aj.ppt
TRANSCRIPT
Statistical Mining in Data Streams
Ankur Jain
Dissertation Defense
Computer Science, UC Santa Barbara
Committee: Edward Y. Chang (chair), Divyakant Agrawal, Yuan-Fang Wang
Roadmap
• The Data Stream Model: introduction and research issues; related work
• Data Stream Mining: stream data clustering; Bayesian reasoning for sensor stream processing
• Contribution Summary
• Future Work
Data Streams
“A data stream is an unbounded and continuous sequence of tuples.”
• Tuples arrive online and can be multi-dimensional
• A tuple seen once cannot be easily retrieved later
• There is no control over the tuple arrival order
Applications – Sensor Networks
Example query: “Find the mean temperature of the lagoon in the last 3 hours.”

Applications – Network Monitoring
[Figure: connection streams from the Internet are monitored for anomalies and intrusions (DoS, PROBE, U2R).]

Applications – Text Processing
[Figure: blogs and click-stream clustering.]

Applications
• Video surveillance
• Stock ticker monitoring
• Process control and manufacturing
• Traffic monitoring and analysis
• Transaction log processing

A traditional DBMS does not work for these workloads!
Data Stream Projects
• STREAM (Stanford): a general-purpose Data Stream Management System (DSMS)
• Telegraph (Berkeley): adaptive query processing; TinyDB, a general-purpose sensor database
• Aurora (Brown/MIT): distributed stream processing; introduces new operators (map, drop, etc.)
• The Cougar Project (Cornell): sensors form a distributed database system; cross-layer optimizations across the data management and routing layers
• MAIDS (UIUC): Mining Alarming Incidents in Data Streams; Streaminer for data stream mining
Data Stream Processing – Key Ingredients
• Adaptivity: incorporate evolutionary changes in the stream
• Approximation: exact results are hard to compute fast with limited memory (see the sketch below)
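A minimal sketch (not from the dissertation) of these two ingredients in Python: a fixed-size reservoir sample approximates the stream within bounded memory, and an exponentially weighted mean adapts to drift. All names and parameters are illustrative.

```python
import random

class StreamSummary:
    def __init__(self, reservoir_size=100, alpha=0.05):
        self.reservoir = []          # bounded-memory sample (approximation)
        self.size = reservoir_size
        self.alpha = alpha           # weight on new data (adaptivity)
        self.ewma = None             # evolving estimate of the mean
        self.n = 0

    def update(self, x):
        self.n += 1
        # Reservoir sampling: each tuple is seen once, kept with prob size/n.
        if len(self.reservoir) < self.size:
            self.reservoir.append(x)
        elif random.random() < self.size / self.n:
            self.reservoir[random.randrange(self.size)] = x
        # EWMA: recent tuples dominate, so the estimate adapts to change.
        self.ewma = x if self.ewma is None else (
            self.alpha * x + (1 - self.alpha) * self.ewma)

s = StreamSummary()
for t in range(10_000):
    s.update(random.gauss(0.0 if t < 5_000 else 3.0, 1.0))
print(round(s.ewma, 2))   # tracks the shifted mean (~3.0)
```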
A Data Stream Management System (DSMS)
[Architecture figure: streaming data sources/sensors feed the central stream processing system, which performs query processing, resource management, and adaptive stream mining over a stream synopsis. Users submit queries with precision constraints and receive streaming query results; the system tunes the sampling rate, sliding-window size, data filtering, data sampling, sensor calibration, and data acquisition.]
Thesis Outline
“Develop fast, online, statistical methods for mining data streams.”
• Adaptive non-linear clustering in multi-dimensional streams
• Bayesian reasoning for sensor stream processing
• Filtering methods for resource conservation
• Change detection in data streams
• Video sensor data stream processing
Roadmap
• The Data Stream Model: introduction and research issues; related work
• Data Stream Mining: stream data clustering; Bayesian reasoning for sensor stream processing
• Contribution Summary
• Future Work
Clustering in High-Dimensional Streams
“Given a continuous sequence of points, group them into some number of clusters, such that the members of a cluster are geometrically close to each other.”
Example Application – Network Monitoring
[Figure: high-dimensional connection tuples streaming in from the Internet must be labeled online: DoS, Probe, or Normal?]
Stream Clustering – New Challenges
• One-pass restriction and limited memory: we use the fading cluster technique proposed by Aggarwal et al.
• Non-linear separation boundaries: we propose using the kernel trick to deal with the non-linearity issue
• Data dimensionality: we propose an effective incremental dimension-reduction technique
The 2-Tier Framework
[Figure: the latest point x received from the stream enters the 2-tier clustering module, which uses the kernel trick. Tier 1 performs stream segmentation in the d-dimensional input space; Tier 2 projects x to x~ and updates the q-dimensional low-dimensional space (LDS), q < d, where the fading clusters C1 through C9 reside.]
The Fading Cluster Methodology
Each cluster Ci has a recency value Ri such that Ri = f(t - tlast), where t is the current time and tlast is the last time Ci was updated, and f(t) = e^(-λt), with λ the fading factor. A cluster is erased from memory (faded) when Ri ≤ h, where h is a user parameter that controls the influence of historical data. Consequently, the total number of clusters is bounded.
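A small sketch of this recency bookkeeping, assuming a cluster stores only its last-update time; lam and h correspond to λ and h above, and the cluster contents themselves are omitted.

```python
import math

def recency(t, t_last, lam):
    """R_i = f(t - t_last) with f(t) = exp(-lam * t)."""
    return math.exp(-lam * (t - t_last))

def prune_faded(clusters, t, lam, h):
    """Erase every cluster whose recency has dropped to h or below."""
    return {cid: t_last for cid, t_last in clusters.items()
            if recency(t, t_last, lam) > h}

clusters = {"C1": 0.0, "C2": 9.0}     # cluster id -> last update time
print(prune_faded(clusters, t=10.0, lam=0.5, h=0.05))  # C1 fades, C2 stays
```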
Non-linearity in Data
• Traditional clustering techniques (e.g., k-means) do not perform well on non-linearly separable data
• Spectral clustering methods are likely to perform better
• Feature-space mapping Φ: input space → feature space
Non-linearity in Network Intrusion Data
[Figure: “ipsweep” attack data shown in the input space and in the feature space; the feature-space view exhibits a geometrically well-behaved trend. Use the kernel trick?]
The Kernel Trick
• Actual projection into the higher-dimensional space is computationally expensive
• The kernel trick does the non-linear projection implicitly!
• Given two input-space vectors x, y: k(x,y) = <Φ(x), Φ(y)>, where k is the kernel function
• The Gaussian kernel function k(x,y) = exp(-||x-y||²) was used in the previous example!
Kernel Trick – Working Example
Φ: x = (x1, x2) → Φ(x) = (x1², x2², √2·x1x2)

<Φ(x), Φ(z)> = <(x1², x2², √2·x1x2), (z1², z2², √2·z1z2)>
             = x1²z1² + x2²z2² + 2·x1x2z1z2
             = (x1z1 + x2z2)²
             = <x, z>²

So k(x,z) = <x,z>², and Φ is not required explicitly!

“The kernel trick allows us to perform operations in the high-dimensional feature space using a kernel function, without explicitly representing Φ.”
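A quick numerical check of the identity above; the vectors are arbitrary illustrative values.

```python
import math

def phi(x):
    """The explicit feature map from the worked example."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(z))   # work in the 3-d feature space
implicit = dot(x, z) ** 2        # kernel trick: stay in the input space
print(explicit, implicit)        # both print 1.0
```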
Stream Clustering – New Challenges (recap)
• One-pass restriction and limited memory: we use the fading cluster technique proposed by Aggarwal et al.
• Non-linear separation boundaries: we propose using kernel methods to deal with the non-linearity issue
• Data dimensionality: we propose an effective incremental dimension-reduction technique
Dimensionality Reduction
• A PCA-like kernel method is desirable; an explicit representation via eigenvalue decomposition (EVD) is preferred
• Kernel PCA (KPCA) is computationally prohibitive: O(n³)
• The principal components evolve with time, so frequent EVD updates may be necessary
• We propose performing EVD on grouped data instead of point data, which requires a novel kernel method (see the sketch below)
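A back-of-the-envelope sketch of why grouping pays off: eigendecomposition of a Gram matrix costs O(n³), so an EVD over g segment representatives (g << n) is far cheaper than one over n raw points. The RBF kernel, random data, and segment-mean grouping here are stand-ins for the dissertation's segment kernel, not its actual method.

```python
import numpy as np

def rbf_gram(X, gamma=0.5):
    """Gaussian-kernel Gram matrix K[i,j] = exp(-gamma * ||xi - xj||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
points = rng.normal(size=(2000, 10))          # n = 2000 stream points
groups = points.reshape(100, 20, 10).mean(1)  # g = 100 segment means (toy grouping)

evals, evecs = np.linalg.eigh(rbf_gram(groups))  # ~100^3 work, not ~2000^3
top_q = evecs[:, -10:]                           # basis for a q = 10 LDS
print(top_q.shape)                               # (100, 10)
```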
The 2-Tier Framework (revisited)
[Figure repeated from the earlier 2-tier framework slide.]
The 2-Tier Framework (continued)
• Tier 1 captures the temporal locality in a segment; a segment is a group of contiguous points in the stream that are geometrically packed closely in the feature space
• Tier 2 adaptively selects segments, called representative segments, to project data into the LDS
• Implicit data in the feature space is projected explicitly into the LDS such that feature-space distances are preserved
The 2-Tier Framework (flowchart)
TIER 1:
1. Obtain a point x from the stream and add x to the current segment S.
2. If (Φ(x) is novel w.r.t. S and s > smin) or s = smax, hand S to Tier 2; otherwise return to step 1.
TIER 2:
3. If S is a representative segment, add S to memory and update the LDS; clear the contents of S either way.
4. Obtain x~, the projection of x in the LDS.
5. If x~ is close to an active cluster, assign x~ to its nearest cluster; otherwise create a new cluster with x~.
6. Update cluster centers and recency values, and delete faded clusters.
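A control-flow skeleton of the loop above, as I read the flowchart. The helper callables (is_novel, is_representative, project, nearest) stand in for the kernel-based tests and the LDS machinery, none of which are shown here, and clusters is assumed to expose the fading-cluster operations.

```python
S_MIN, S_MAX = 8, 64   # illustrative segment-size bounds

def two_tier(stream, is_novel, is_representative, project, nearest, clusters):
    segment = []                                   # current segment S
    for x in stream:
        segment.append(x)                          # Tier 1: add x to S
        boundary = (is_novel(x, segment) and len(segment) > S_MIN) \
                   or len(segment) == S_MAX
        if not boundary:
            continue                               # keep collecting points
        if is_representative(segment):             # Tier 2
            clusters.add_segment(segment)          # keep S, update the LDS
        segment = []                               # clear contents of S
        x_lds = project(x)                         # obtain x~ in the LDS
        c = nearest(x_lds, clusters)
        if c is not None:
            clusters.assign(x_lds, c)              # nearest active cluster
        else:
            clusters.create(x_lds)                 # new cluster with x~
        clusters.prune_faded()                     # delete faded clusters
```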
Network Intrusion Stream
• Simulated data from MIT Lincoln Labs
• 34 continuous attributes (features)
• 10.5K records
• 22 types of intrusion attacks + 1 normal class
Network Intrusion Stream
[Figure: clustering accuracy at LDS dimensionality u = 10.]
Efficiency – EVD Computations
[Figures comparing EVD computation costs on two datasets.]
• Newswire data: 3.8K records, 16.5K features, 10 news topics
• Image data: 5K records, 576 features, 10 digits
In Retrospect…
• We proposed an effective stream clustering framework
• We use the kernel trick to delineate non-linear boundaries efficiently
• We use a stream-segmentation approach to continuously project data into a low-dimensional space
Roadmap
• The Data Stream Model: introduction and research issues; related work
• Contributions Towards Stream Mining: stream data clustering; Bayesian reasoning for sensor stream processing
• Contribution Summary
• Future Work
Bayesian Reasoning for Sensor Data Processing
• Users submit queries with precision constraints, e.g., “Find the temperature with 80% confidence.”
• Resource conservation is of prime concern to prolong system life: both data acquisition and data communication cost energy
• Use probabilistic models at the central site for approximate predictions, preventing actual acquisitions (see the sketch below)
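A toy sketch of this answer-from-the-model-when-confident loop; the posterior table and the acquisition fallback are invented stand-ins for the BN inference and the real sensor read.

```python
def answer(query_conf, posterior, acquire):
    """posterior: dict value -> probability; acquire: fallback sensor read."""
    best_value = max(posterior, key=posterior.get)
    if posterior[best_value] >= query_conf:
        return best_value, "predicted (no acquisition)"
    return acquire(), "acquired from sensor"

posterior = {"70-75F": 0.85, "75-80F": 0.10, "80-85F": 0.05}
print(answer(0.80, posterior, acquire=lambda: "72F"))  # model suffices
print(answer(0.95, posterior, acquire=lambda: "72F"))  # must acquire
```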
Dependencies in Sensor Attributes
[Figure: for a “Get Temperature” query, the dependency model (a Bayesian network linking Temperature and Voltage) lets the system acquire the cheap attribute instead: acquire voltage, then report temperature.]

Attribute    | Acquisition Cost
Temperature  | 50 J
Voltage      | 5 J
Using Correlation Models [Deshpande et al., VLDB’04]
• Correlation models ignore conditional dependency
• Intel Lab (real sensor-network data); attributes: Voltage (V), Temperature (T), Humidity (H)
• “Voltage” is correlated with “temperature”, yet “voltage” is conditionally independent of “temperature” given “humidity”!
[Figure: scatter plots for humidity in [35, 40).]
BN vs. Correlations
Bayesian network:
• Maintains vital dependencies only
• Lower search complexity: O(n)
• Storage: O(nd), where d is the average node degree
• Intuitive dependency structure
Correlation model [Deshpande et al.]:
• Maintains all dependencies
• The search space for finding the best alternative sensor attribute is large
• The joint probability is represented in O(n²) cells
[Figures: NDBC Buoy dataset and Intel Lab dataset.]
Bayesian Networks (BN)
Qualitative part – a Directed Acyclic Graph (DAG):
• Nodes – sensor attributes
• Edges – attribute-influence relationships
Quantitative part – Conditional Probability Tables (CPTs):
• Each node X has its own CPT, P(X | parents(X))
Together, the BN represents the joint probability in factored form:
P(T,H,V,L) = P(T) P(H|T) P(V|H) P(L|T)
The “influence relationship” is quantified by the entropy function H:
H(Xi) = -Σ_{l=1..k} P(Xi = xil) log P(Xi = xil)
We learn the BN by minimizing H(Xi | parents(Xi)), sketched below.
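A small sketch of this criterion with made-up CPT-style numbers: the conditional entropy of a node given a well-chosen parent is lower than its marginal entropy, which is what the structure learning minimizes.

```python
import math

def entropy(p):
    """H = -sum p log2 p over a distribution given as a list."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def conditional_entropy(p_parent, p_child_given_parent):
    """H(X | Pa) = sum_pa P(pa) * H(X | Pa = pa)."""
    return sum(p_pa * entropy(row)
               for p_pa, row in zip(p_parent, p_child_given_parent))

p_h = [0.3, 0.7]                         # P(Humidity)      (illustrative)
p_v_given_h = [[0.9, 0.1], [0.2, 0.8]]   # P(Voltage | Humidity)
p_v = [0.3 * 0.9 + 0.7 * 0.2, 0.3 * 0.1 + 0.7 * 0.8]  # marginal P(Voltage)
print(entropy(p_v), conditional_entropy(p_h, p_v_given_h))  # ~0.98 vs ~0.65
```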
System Architecture
[Figure: users submit group queries, e.g., {(Temperature, 80%)}, {(Wind Speed, 75%)}, {(Temperature, 95%), (Wind Speed, 85%)}, {(Air Pressure, 90%), (Wind Speed, 90%)}, to the query processor. Group-query plan generation consults the Bayesian inference engine and the stored BN, CPTs, and acquisition costs to produce an acquisition plan; acquired values return from the sensor network.]
Finding the Candidate Attributes
• For any attribute in the group query Q, analyze candidate attributes in its Markov blanket recursively
• Selection criterion: choose candidates in a greedy fashion, trading off acquisition cost against information gain (conditional entropy)
• Goals: meet the precision constraints while maximizing resource conservation (a greedy sketch follows)
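A greedy sketch of this selection loop. The candidate names, gains, and costs are invented, and the stopping rule (accumulate information gain until a target is met) is a simplification of the actual precision-constraint test.

```python
def select_candidates(candidates, target_gain):
    """candidates: list of (name, info_gain_bits, cost_joules)."""
    chosen, gained = [], 0.0
    # Rank candidates by information gain per joule, best first.
    pool = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    for name, gain, cost in pool:
        if gained >= target_gain:
            break                     # precision target met; stop acquiring
        chosen.append(name)
        gained += gain
    return chosen

cands = [("Voltage", 0.30, 5.0), ("Humidity", 0.50, 20.0),
         ("Temperature", 0.90, 50.0)]
print(select_candidates(cands, target_gain=0.7))  # ['Voltage', 'Humidity']
```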
Experiments – Resource Conservation
NDBC dataset, 7 attributes
[Figures: effect of using group queries, where |Q| is the group-query size; effect of using the MB property with δmin = 0.90.]
Results – Selectivity
[Figure: learned BN over the NDBC buoy attributes Wave Period (WP), Wind Speed (SP), Water Temperature (WT), Wind Direction (DR), Wave Height (WH), Air Temperature (AT), and Air Pressure (AP).]
In Retrospect…
• Bayesian networks can encode sensor dependencies effectively
• Our method provides significant resource conservation for group queries
Contribution Summary
• “Adaptive stream resource management using Kalman Filters.” [SIGMOD’04]
• “Adaptive sampling for sensor networks.” [DMSN’04]
• “Adaptive non-linear clustering for data streams.” [CIKM’06]
• “Using stationary-dynamic camera assemblies for wide-area video surveillance and selective attention.” [CVPR’06]
• “Filtering the data streams.” [in submission]
• “Efficient diagnostic and aggregate queries on sensor networks.” [in submission]
• “OCODDS: An on-line change-over detection framework for tracking evolutionary changes in data streams.” [in submission]
Future Work
• Develop non-linear techniques for capturing temporal correlations in data streams
• Extend the Bayesian framework to address “what-if” queries with counterfactual evidence
• Extend the clustering framework to build stream visualization systems
• Use incremental EVD techniques to further improve performance
Thank You !
BACKUP SLIDES!
Back to Stream Clustering
We propose a 2-tier stream clustering framework:
• Tier 1: a kernel method that continuously divides the stream into segments
• Tier 2: a kernel method that uses the segments to project data into a low-dimensional space (LDS)
The fading clusters reside in the LDS.
Clustering – LDS Projection
Clustering – LDS Update
Network Intrusion Stream
[Figures: clustering accuracy and cluster strengths at LDS dimensionality u = 10.]
Effect of dimensionality
Query Plan Generation
Given a group query, the query plan computes the “candidate attributes” that will actually be acquired to successfully address the query. We exploit the Markov Blanket (MB) property to select candidate attributes. Given a BN G, the Markov blanket of a node Xi comprises its parents, its children, and its children’s other parents, and

P(Xi | G) = P(Xi | MB(Xi)) = P(Xi, MB(Xi)) / P(MB(Xi))
Exploiting the MB Property
“Given a node Xi and a set of arbitrary nodes Y in a BN such that MB(Xi) ⊆ Y ∪ {Xi}, the conditional entropy of Xi given Y is at least as high as that given its Markov blanket: H(Xi|Y) ≥ H(Xi|MB(Xi)).”

Proof: Separate MB(Xi) into two parts, MB1 = MB(Xi) ∩ Y and MB2 = MB(Xi) - MB1, and let Z = Y - MB(Xi). Then:
H(Xi|Y) = H(Xi|Z, MB1)          [Y = Z ∪ MB1]
        ≥ H(Xi|Z, MB1, MB2)     [additional information cannot increase entropy]
        = H(Xi|Z, MB(Xi))       [MB(Xi) = MB1 ∪ MB2]
        = H(Xi|MB(Xi))          [Markov-blanket definition]
Bayesian Reasoning – More Results…
[Figures: effect of using the MB property with δmin = 0.90; query-answer quality loss on a 50-node synthetic-data BN.]
Bayesian Reasoning for Group Queries
More accurate in addressing group queries:
Q = { (Xi, δi) | Xi ∈ X ∧ 0 < δi ≤ 1, 1 ≤ i ≤ n } such that δi < max_l P(Xi = xil)
where
• X = {X1, X2, X3, …, Xn} are the sensor attributes
• δi are the confidence parameters
• P(Xi = xil) is the probability with which Xi assumes the value xil
Bayesian reasoning is also helpful in detecting abnormalities.
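A sketch of the answerability test implied by the condition δi < max_l P(Xi = xil): a pair (Xi, δi) can be served by the model only if some value of Xi has posterior probability exceeding δi. Attribute names and posteriors are illustrative.

```python
def answerable(group_query, posteriors):
    """group_query: {attr: delta}; posteriors: {attr: {value: prob}}."""
    return {attr: max(posteriors[attr].values()) > delta
            for attr, delta in group_query.items()}

q = {"Temperature": 0.80, "WindSpeed": 0.75}
post = {"Temperature": {"hot": 0.85, "mild": 0.15},
        "WindSpeed": {"low": 0.6, "high": 0.4}}
print(answerable(q, post))  # {'Temperature': True, 'WindSpeed': False}
```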
Bayesian Reasoning – Candidate attribute selection algorithm