Sensor Network Databases
Chapter 6
Feng Zhao and Leonidas J. Guibas
Wireless Sensor Networks
Outline
Sensor Database Challenges
Querying the Physical Environment
Query Interfaces
High-Level Database Organization
In-Network Aggregation
Data-Centric Storage
Data Indices and Range Queries
Distributed Hierarchical Aggregation
Temporal Data
Summary
Sensor Network Abstraction
Characteristics: distributed, resource-constrained, failure-prone
From a data storage point of view: think of a sensor net as a distributed database
Sensor Network Database Challenges
The sensor network is highly volatile.
Nodes may be depleted, and links may go down.
Relational tables are not static.
New data is continuously being sensed.
High energy cost of communication.
This motivates in-network processing during query execution.
The rates at which input data arrive at a database operator can be highly variable.
Sensor Network Database Challenges
Limited storage on sensor nodes.
Older data has to be discarded.
Sensor tasking interacts in numerous ways with the sensor database system.
Classical metrics of database system performance may have to be adjusted.
Differences in Sensor Network Databases
Sensor network data inherently include errors (interference from other signals, device noise).
Range and probabilistic or approximate queries are more appropriate than exact queries.
Additional operators are needed in the query language:
to specify durations and sampling rates for the data
Continuous, long-running queries
Ex: monitoring the average temperature in a room
Correlating and comparing operators
Querying the Physical Environment
An aggregate query
The query result is computed by integrating data from a set of sensors.
Delivery of data from distributed sensor nodes to a central node for computation.
Ex: average, join of sensor readings from different groups.
Correlation Queries
“Sound an alarm whenever two sensors within 10 meters of each other simultaneously detect an abnormal temperature.”
Querying the Physical Environment
Snapshot queries
“Retrieve the current rainfall level for all sensors in Southern California.”
Historical queries
“Display the average rainfall level at all sensors for the last three months of the previous year.”
TinyDB Query interfaces
SQL-style querying
long-running monitoring query
“For the next three hours, retrieve every 10 minutes the maximum rainfall level in each county in Southern California, if it is greater than 3.0 inches.”
SELECT MAX(Rainfall_Level), county
FROM sensors
WHERE state = 'California'
GROUP BY county
HAVING MAX(Rainfall_Level) > 3.0 in
DURATION [now, now + 180 min]
SAMPLING PERIOD 10 min
Cougar Sensor Database
Object-relational database
SQL-type query interface
Each type of sensor is associated with an abstract data type (ADT).
Device ADT methods represent device functions, e.g., getTemperature(); detectTempGreaterThan(90)
Examples of Long-running queries
CREATE LR_QUERY q1 AS
SELECT R.dev, R.dev.getTemperature()
FROM TempSensors R, NamedPlaces N
WHERE $every(30)
AND R.dev.location().inside(N.bbox)
AND N.name = “California”;

CREATE LR_QUERY q2 AS
SELECT R1.dev.location()
FROM TempSensors R1, TempSensors R2
WHERE $every(10)
AND R1.dev.detectAbnormalTemperature()
AND R2.dev.detectAbnormalTemperature()
AND R1.dev > R2.dev;
Probabilistic Queries
Sensor data is subject to random errors.
Sensor data is modeled as normally distributed and characterized by a Gaussian p.d.f.
GADT (Gaussian ADT)
An instance of the ADT corresponds to a Gaussian p.d.f., represented by its mean μ and standard deviation σ.
Prob is used to pose queries.
Probabilistic Queries
“Retrieve from sensors all tuples whose temperature is within 0.5 degrees of 68 degrees, with at least 60 percent probability.”
Ex: SELECT *
FROM sensors
WHERE Sensor.Temp.Prob([67.5, 68.5]) >= 0.6
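As a rough illustration of how such a Prob predicate can be evaluated, here is a minimal Python sketch of a Gaussian ADT; the class and method names are hypothetical stand-ins, not the actual GADT interface:

import math

class GaussianADT:
    # A reading is stored as a Gaussian p.d.f. (mu, sigma), not a point value.
    def __init__(self, mu, sigma):
        self.mu = mu
        self.sigma = sigma

    def prob(self, lo, hi):
        # Probability mass of the Gaussian inside [lo, hi], computed
        # from the Gaussian cumulative distribution function.
        def cdf(x):
            return 0.5 * (1.0 + math.erf((x - self.mu) / (self.sigma * math.sqrt(2.0))))
        return cdf(hi) - cdf(lo)

# A tuple qualifies for the query above if this predicate holds:
temp = GaussianADT(mu=68.1, sigma=0.3)
print(temp.prob(67.5, 68.5) >= 0.6)   # True for this reading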
Centralized approach
Each sensor forwards its data to a central server.
Disadvantages
The nodes near the access point become traffic hot spots.
The sampling rate has to be set to the highest needed rate, burdening the network with unnecessary traffic.
In-network storage approach
Choose rendezvous points to store data in the network.
Advantages
The overhead to store and access the data is minimized.
The overall load is balanced across the network.
Server-based approach
Requires a total of 16 message transmissions
In-Network Aggregation
Each sensor may compute a partial state record based on its data and that of its children.
Requires a total of 6 message transmissions
Aggregation Framework
• As in extensible databases, TinyDB supports any aggregation function conforming to: Agg_n = {f_init, f_merge, f_evaluate}
f_init: {a0} → <a0>
f_merge: {<a1>, <a2>} → <a12>   (a partial state record)
f_evaluate: {<a1>} → aggregate value
Example: Average
AVG_init: {v} → <v, 1>
AVG_merge: {<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
AVG_evaluate: {<S, C>} → S/C
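A minimal Python sketch of this triple for AVG (illustrative code, not TinyDB's implementation); the partial state record is the pair <sum, count>:

def avg_init(v):
    # f_init: turn one sensor reading into a partial state record.
    return (v, 1)

def avg_merge(a, b):
    # f_merge: combine two partial state records from different subtrees.
    return (a[0] + b[0], a[1] + b[1])

def avg_evaluate(a):
    # f_evaluate: turn the final partial state record into the aggregate.
    return a[0] / a[1]

# A parent merges its own reading with its children's records:
left = avg_merge(avg_init(20.0), avg_init(22.0))   # (42.0, 2)
root = avg_merge(left, avg_init(24.0))             # (66.0, 3)
print(avg_evaluate(root))                          # 22.0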
Aggregates and their efficiency in TAG
[Figure: bytes transmitted per epoch, all sensors, for the COUNT, MIN, HISTOGRAM, AVERAGE, and MEDIAN aggregates.]
Performance Metrics
Network usage
Total usage and Hot spot usage
Preprocessing time
time taken to construct an index
Storage space requirement
Query time
time to process a query, assemble an answer, and return this answer.
Throughput
Update and maintenance cost
Properties of Sensor Database
Persistence
Data stored in the system must remain available to queries.
Consistency
A query must be routed correctly to a node where the data are currently stored.
Controlled access to data
Scalability in network size
As the number of nodes increases, the communication cost should not grow unduly.
Load balancing
Topological generality
The database architecture should work well on a broad range of network topologies.
Query Processing Scheduling
TinyDB uses an epoch-based mechanism.
The epoch should be sufficiently long for data to travel from the leaves to the root.
Each epoch is divided into time intervals.
The number of intervals reflects the depth of the routing tree.
Each node only needs to power up during its scheduled interval.
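A sketch of the wake-up rule this implies (illustrative Python, not TinyDB code); the interval numbering matches the COUNT(*) example in the figure below:

def transmit_interval(depth):
    # Intervals within an epoch are numbered from the tree depth down to 1.
    # A node at depth d transmits its partial state record in interval d;
    # its parent listens during interval d and transmits the merged record
    # in interval d - 1. At all other times the node can power down.
    return depth

# In a routing tree of depth 4: leaves transmit in interval 4, their
# parents in interval 3, and the root assembles the answer in interval 1.
for depth in (4, 3, 2, 1):
    print("depth", depth, "-> interval", transmit_interval(depth))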
Schedule of In-Network Aggregation
SELECT COUNT(*) FROM sensors
[Figure: five snapshots of the epoch schedule on a small routing tree, taken at intervals 4, 3, 2, 1, and then interval 4 of the next epoch; in each interval the nodes at the corresponding depth transmit their partial COUNT records one level up the tree, until the root produces the total, and the schedule repeats every epoch.]
Data-Centric Storage (DCS)
DCS is a method proposed to support queries from any node in the network by providing a rendezvous mechanism for data and queries.
Avoids flooding the entire network.
At the center of a DCS system are rendezvous points.
DCS distributes the storage load across the entire network.
Data-Centric Storage (DCS)
For example:
Geographic hash table (GHT) attempts to distribute data evenly across the network.
GHT assumes each node knows its geographic location (by GPS or …).
A data object is associated with a key.
Each node is responsible for storing a certain range of keys.
Geographic Hash Table (GHT)
Rendezvous
Events are named with keys.
Storage and retrieval are performed using these keys.
A key is hashed to a geographic position.
Geographic routing (GPSR) is used to locate the node closest to this geographic position.
This node serves as a rendezvous for storage and search.
Costs
No flooding of queries.
Aggregate storage cost is the same as external storage.
Structured Replication
Rendezvous points are replicated.
Decreases storage communication cost.
Increases query dissemination cost.
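A minimal Python sketch of the basic GHT rendezvous idea; hashing a key to a coordinate and picking the closest node stands in for GPSR routing, and all names here are illustrative:

import hashlib, math
from dataclasses import dataclass, field

@dataclass
class Node:
    pos: tuple
    store: dict = field(default_factory=dict)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def hash_to_location(key, region=(100.0, 100.0)):
    # Hash an event key to an (x, y) rendezvous point inside the region.
    h = hashlib.sha1(key.encode()).digest()
    x = int.from_bytes(h[:4], "big") / 2**32 * region[0]
    y = int.from_bytes(h[4:8], "big") / 2**32 * region[1]
    return (x, y)

def rendezvous_node(nodes, key):
    # GPSR would route to the node closest to the hashed location; here
    # we simply select it. This node stores and serves the key.
    return min(nodes, key=lambda n: dist(n.pos, hash_to_location(key)))

# Any node can store or look up "elephant-sighting" without flooding:
nodes = [Node((10.0, 20.0)), Node((70.0, 80.0)), Node((40.0, 55.0))]
rendezvous_node(nodes, "elephant-sighting").store["elephant-sighting"] = [(68, 61)]
print(rendezvous_node(nodes, "elephant-sighting").store)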
Structured replication in GHT
[Figure: the region (0,0)-(100,100), showing the root point together with its level-1 and level-2 mirror points.]
Data-Centric Storage (DCS)
Reduce unnecessary network traffic:
Hashing to locations should respect geographic proximity.
Hash to regions rather than to locations to avoid hot spots and increase robustness.
Trade-off
If the frequency of event generation is high, then pushing data to arbitrary rendezvous points may be too expensive.
Data indices and range queries
It is difficult to serve a range query well.
The TinyDB aggregation tree requires flooding the entire network for each query.
Indices
Auxiliary data structures that facilitate and speed up query execution.
Useful when the query rate is higher than the update rate.
Indices
Key idea
Pre-store the answers to certain special queries, then assemble from them the answer to an arbitrary range query.
Index structure
Hash table, k-d tree, quad-tree, R-tree, …
Trade-off
between the number of pre-stored answers and the speed of query execution.
One-Dimensional Indices
[Figure: canonical subsets of sensors s0 through s7 along a road, organized as a balanced binary tree of logical nodes u1 through u7, with u4 at the root.]
One-Dimensional Indices
We map logical node ui to physical node si−1.
Canonical subsets
The nodes with the pre-stored data (s0 through s6).
Complexity: store O(n); query O(log n)
u1 = s0⊕s1
u2 = s0⊕s1⊕s2⊕s3
u3 = s2⊕s3
u4 = s0⊕s1⊕s2⊕s3⊕s4⊕s5⊕s6⊕s7
u5 = s4⊕s5
u6 = s4⊕s5⊕s6⊕s7
u7 = s6⊕s7
(⊕ denotes the aggregation operator)
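The canonical subsets above form a balanced binary tree, so any contiguous range of sensors decomposes into O(log n) of them. A Python sketch of that decomposition (standard segment-tree style; the code is illustrative):

def canonical_cover(lo, hi, node_lo=0, node_hi=7):
    # Return the disjoint canonical subsets (as sensor index ranges)
    # whose union is exactly sensors s_lo .. s_hi.
    if hi < node_lo or node_hi < lo:
        return []                                  # no overlap: skip
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]                # fully covered: use it
    mid = (node_lo + node_hi) // 2
    return (canonical_cover(lo, hi, node_lo, mid) +
            canonical_cover(lo, hi, mid + 1, node_hi))

# Aggregating over s1..s6 touches only four pre-stored answers
# (singleton ranges are read directly from the sensor itself):
print(canonical_cover(1, 6))   # [(1, 1), (2, 3), (4, 5), (6, 6)]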
Multidimensional Indices for Orthogonal Range Searching
Orthogonal range query:
SELECT * FROM Nestion_Events
WHERE Temperature >= 50 AND Temperature <= 60
AND Light >= 5 AND Light <= 10
[Figure: events plotted in the Temperature-Light plane, with the query rectangle 50 ≤ Temperature ≤ 60, 5 ≤ Light ≤ 10 highlighted.]
A k-d tree partitions a plane into rectangles
Drill down the k-d tree with query rectangle Q:
When reaching a node whose corresponding rectangle is disjoint from Q, stop propagation.
When reaching a node whose corresponding rectangle is fully contained in Q, incorporate its count into the events of interest.
Otherwise, expand the node and continue drilling into its children, as sketched below.
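A Python sketch of this drill-down over a count-annotated k-d tree (the node layout here is hypothetical):

class KDNode:
    def __init__(self, rect, count, children=(), points=()):
        self.rect = rect            # (tmin, tmax, lmin, lmax)
        self.count = count          # number of events inside rect
        self.children = children    # subrectangles; () at a leaf
        self.points = points        # raw (temperature, light) events at a leaf

def disjoint(a, b):
    return a[1] < b[0] or b[1] < a[0] or a[3] < b[2] or b[3] < a[2]

def contains(outer, inner):
    return (outer[0] <= inner[0] and inner[1] <= outer[1] and
            outer[2] <= inner[2] and inner[3] <= outer[3])

def range_count(node, q):
    if disjoint(node.rect, q):
        return 0                    # prune: stop propagation here
    if contains(q, node.rect):
        return node.count           # use the pre-stored count
    if node.children:
        return sum(range_count(c, q) for c in node.children)
    # Leaf whose rectangle straddles the query: inspect raw events.
    return sum(1 for (t, l) in node.points
               if q[0] <= t <= q[1] and q[2] <= l <= q[3])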
A k-d tree partitions a plane into rectangles
[Figure: the Temperature-Light plane recursively partitioned into rectangles by a k-d tree.]
Non-orthogonal Range Searching
[Figure: a non-orthogonal query range overlaid on the rectangle partition; the query propagates only into the rectangles that the range intersects.]
Distributed Hierarchical Aggregation
Designing a distributed index
Load-balancing the communication, processing, and storage across the nodes
Robustness consideration
Frequent failures of nodes and links
This is important to WSN databases and should receive the attention it deserves.
Multiresolution Summarization
Wavelet transforms
One way to compress and summarize information for both temporal and spatial signals
Data structure: quad-tree
Routing: GPSR + GHT
Avoiding hot spots: replication
Partitioning the Summaries
Queries start at the root of the summarization tree.
Partition the aggregation data in a meaningful way to lessen the load on nodes near the hierarchy root.
Use a multi-rooted quad-tree to partition the spatial domain.
System: DIFS
Quad Tree Approach
Quaternary tree: each node has 4 children.
Each node has 4 histograms summarizing the data distribution in each child subtree.
Queries only propagate into relevant parts of the tree (pruning), as sketched below.
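A Python sketch of that pruning rule (the structure is illustrative: each internal node keeps one value histogram per child, stored as (bucket_lo, bucket_hi, count) triples):

class QuadNode:
    def __init__(self, children=(), hists=(), values=()):
        self.children = children    # 4 child subtrees; () at a leaf
        self.hists = hists          # one histogram per child subtree
        self.values = values        # raw readings at a leaf

def relevant(hist, lo, hi):
    # A child subtree is relevant only if some non-empty histogram
    # bucket overlaps the queried value range [lo, hi].
    return any(b_lo <= hi and lo <= b_hi and n > 0
               for (b_lo, b_hi, n) in hist)

def quad_query(node, lo, hi):
    if not node.children:           # leaf: scan the raw readings
        return [v for v in node.values if lo <= v <= hi]
    out = []
    for child, hist in zip(node.children, node.hists):
        if relevant(hist, lo, hi):  # propagate only into relevant parts
            out.extend(quad_query(child, lo, hi))
    return out                      # non-relevant subtrees were pruned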
Quad Tree: Issues
Explicit child pointers required
On storage of new data, updates must be propagated up the tree.
Every query must originate at the tree root.
The root bears a greater burden!
DIFS
DIFS stands for distributed index for features in sensor networks.
Goals
Provide an efficient query mechanism for range searches of event attributes.
Extend network lifetime by amortizing the costs of communication and storage over as many nodes as possible, even at the expense of modest overall increases.
GHT-based Quad Tree
We add an index structure to Structured Replication.
Hierarchy of histograms summarizes the range of data within children
Problem: Root is the bottleneck
Every query goes through it.
Information from every event that’s generated propagates to it.
[Figure: a GHT-based quad tree over a field of 16 numbered cells, showing the root point and its level-1 and level-2 children.]
The DIFS Tree
Every node (except the root) has parents.
The wider the spatial extent an index node knows about, the more constrained the value range it covers.
[Figure: the DIFS hierarchy over a 100 × 100 field; index nodes with the widest spatial extent cover narrow value ranges (1-4, 5-8, 9-12, 13-16), while nodes with smaller spatial extent cover the full value range 1-16.]
Storage
Example: an event with “temperature” equal to 9 is generated at location (68,61).
Compute geographically bounded hashes:
“temperature:1:16” in (50,50)->(75,75)
“temperature:9:12” in (50,50)->(100,100)
“temperature:9:9” in (0,0)->(100,100)
Periodically propagate up the tree
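A Python sketch of such a geographically bounded hash (the hash function itself is a stand-in; DIFS defines its own bounded hash):

import hashlib

def bounded_hash(key, bbox):
    # Map an "attribute:lo:hi" index key to a point inside bbox =
    # ((x0, y0), (x1, y1)), so the node indexing that value range is
    # guaranteed to lie within the stated spatial extent.
    (x0, y0), (x1, y1) = bbox
    h = hashlib.sha1(key.encode()).digest()
    x = x0 + int.from_bytes(h[:4], "big") / 2**32 * (x1 - x0)
    y = y0 + int.from_bytes(h[4:8], "big") / 2**32 * (y1 - y0)
    return (x, y)

# The three index nodes for the temperature-9 event above:
print(bounded_hash("temperature:1:16", ((50, 50), (75, 75))))
print(bounded_hash("temperature:9:12", ((50, 50), (100, 100))))
print(bounded_hash("temperature:9:9", ((0, 0), (100, 100))))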
DIFS Hierarchy
Fractional Cascading
[Figure: a sensor p’s view of the world, shown over the leaves of the quad-tree.]
Locality-Preserving Hashing
Goal:
Map the attribute space to the plane so that nearby locations in attribute space correspond to nearby locations in the plane.
DIM (distributed index for multidimensional data)
Data with values close to one another are hashed to nearby locations.
Zone code: a unique identifier for each zone, as sketched below.
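A Python sketch of deriving a zone code by recursive halving (assumed conventions: even-numbered splits halve the x-range, odd-numbered splits the y-range, with 0 meaning the lower half):

def zone_code(x, y, depth, region=(0.0, 1.0, 0.0, 1.0)):
    x0, x1, y0, y1 = region
    code = ""
    for level in range(depth):
        if level % 2 == 0:                 # split the x-range
            mid = (x0 + x1) / 2
            if x < mid: code += "0"; x1 = mid
            else:       code += "1"; x0 = mid
        else:                              # split the y-range
            mid = (y0 + y1) / 2
            if y < mid: code += "0"; y1 = mid
            else:       code += "1"; y0 = mid
    return code

# Nearby points share long zone-code prefixes (locality preservation):
print(zone_code(0.30, 0.80, 4))   # "0111"
print(zone_code(0.32, 0.82, 4))   # "0111"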
DIM - zone tree & zone code
[Figure: a field divided into zones a through g, together with the corresponding zone tree; left/right splits contribute the bits 0 and 1, yielding zone codes 00, 010, 011, 100, 101, 110, 1110, and 1111.]
Temporal Data
Overall node storage is very limited.
We might query about the past, the present, or the future.
Data Aging
Application-dependent
Schedule for discarding data and data summaries
Indexing Motion Data
A fixed index structure will soon become obsolete because of heavy update and communication costs.
Both index construction and updates can be quite expensive.
Modify the index only when new objects are inserted or deleted, or when the trajectory of an object changes.
KDS (Kinetic Data Structure)
Update only when certain critical events occur (see the sketch below).
Drawback
It may waste processing during periods of inactivity, when no queries are present in the network, because the index still requires updating as time goes on.
These updates need not be so frequent if the motion predictions are accurate.
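As a minimal illustration of a kinetic certificate, consider a point in linear motion x(t) = x0 + v·t, indexed by the cell that contains it; the index needs attention only at the certificate's failure time (illustrative Python):

def certificate_failure_time(x0, v, cell):
    # Certificate: "the point lies inside cell [a, b]". Under linear
    # motion x(t) = x0 + v * t it fails, and the index must be updated,
    # exactly when the point crosses a cell boundary.
    a, b = cell
    if v > 0:
        return (b - x0) / v
    if v < 0:
        return (a - x0) / v
    return float("inf")            # stationary point: never fails

# A point at x0 = 2.0 with v = 0.5 leaves cell [0, 5] at t = 6.0:
print(certificate_failure_time(2.0, 0.5, (0.0, 5.0)))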
Summary
This area is still in its infancy; much more needs to be done.
As we remarked, integration of query processing with the networking layer, the mapping of index structures to the spatial topology of the network, and distributed index construction for motion data all remain important topics for further investigation.