
RACE: Time Series Compression with Rate Adaptivity and Error Bound for Sensor Networks

Huamin Chen, Jian Li, and Prasant Mohapatra

Presenter: Jian Li

Networks Lab @ UC Davis

Agenda

Motivation

Background

RACE Algorithm

Numerical Evaluation

Conclusion


Motivation

Sensor networks: limited energy source

Limited link bandwidth, which may be time-varying

Monitoring process: continuous data generation and dissemination

The data rate may be large and time-varying

How to disseminate efficiently? Compression and aggregation


Data Quality: Impact factors

Sampling frequency

Number of sampling nodes

Data dissemination: compression and aggregation


Why Compress?

How to get a “properly small” data rate?

Lower sampling frequency

Reduce the number of sensors

Lossy/lossless compression

A lower sampling frequency is not equivalent to (lossy) compression of higher-precision raw data: the two differ, for example, in whether detailed features along the timeline can be retained.

Lossy compression can adapt to various link constraints.


But what about the error bound?

Volatile physical process: the data rate of a time series can vary over a wide range

Different compressibility at different time instances

Given a target output data rate, lossy compression cannot guarantee an error bound

Consistency of data quality? Across multihop network transmission

Across compression of multiple time series


So, our goal is:

Adaptive compression: compress time series into a CBR/LBR flow

Trade-off: network capacity vs. data quality

Improve data quality: exploit the varying compressibility along the timeline to achieve a given error bound

Consistent data quality across the compression of multiple time series


Error norm of time series

Data Quality: Error Norm

Normalized data element

Normalized data error

ei =


Haar Wavelet Transformation

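Compute the average and the (half-)difference of each pair of neighboring elements, then repeat on the averages

Average: trend of the time series

Difference: details of the time series

An example: the time series [2, 6, 5, 11] gives pair averages [4, 8] and differences [-2, -3]; repeating on [4, 8] gives average 6 and difference -2, so the transformation output is [6, -2, -2, -3].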
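A minimal Python sketch of the transform above, assuming the series length is a power of two and that "difference" means the half-difference (a − b)/2, the convention that reproduces both examples in these slides. The name `haar_transform` is ours. Output layout: overall average first, then details from the coarsest level to the finest.

```python
def haar_transform(series):
    """Unnormalized Haar transform: [overall average, coarse details ..., fine details]."""
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    averages = [float(v) for v in series]
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Assumption: "difference" is the half-difference (a - b) / 2.
        # Coarser levels are prepended, so the finest level ends up last.
        details = [(a - b) / 2 for a, b in pairs] + details
        averages = [(a + b) / 2 for a, b in pairs]
    return averages + details


print(haar_transform([2, 6, 5, 11]))  # [6.0, -2.0, -2.0, -3.0]
```

Running it on the 16-element series of the next slide yields the coefficient list shown there.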


Wavelet coefficient tree

Time series: [3, 4, 3, 2, 6, 8, 9, 7, 2, 3, 1, 2, 10, 8, 7, 9]

Output coefficients: [5.25, 0, -2.25, -3.25, 0.5, -0.5, 0.5, 0.5, -0.5, 0.5, -1, 1, -0.5, -0.5, 1, -1]


Data Element Reconstruction

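Each data element is the signed sum of the coefficients on its path from the root: $x_i = \sum_{C_j \in P(i)} s_{ij} C_j$, where $P(i)$ is the path from the overall average down to element $i$, $s_{ij} = -1$ if element $i$ lies in the right half of the subtree under $C_j$ and $s_{ij} = +1$ otherwise, and $C_j$ is an individual coefficient.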


Reconstruction: example

Calculation for the fifth element (6): +(5.25) +(0) -(-2.25) +(-0.5) +(-1) = 6
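The path walk can be sketched as follows. This assumes the array layout of the coefficient lists above (index 0 is the overall average; detail node j has children 2j and 2j + 1); `reconstruct_element` is a hypothetical helper name.

```python
def reconstruct_element(coeffs, i):
    """Rebuild data element i from a Haar coefficient array of power-of-two length."""
    n = len(coeffs)
    value = coeffs[0]         # overall average always enters with a + sign
    j, start, size = 1, 0, n  # assumed layout: node j covers elements [start, start + size)
    while j < n:
        half = size // 2
        if i < start + half:
            value += coeffs[j]  # element lies in the left half of node j
            j = 2 * j
        else:
            value -= coeffs[j]  # element lies in the right half of node j
            j = 2 * j + 1
            start += half
        size = half
    return value


COEFFS = [5.25, 0, -2.25, -3.25, 0.5, -0.5, 0.5, 0.5,
          -0.5, 0.5, -1, 1, -0.5, -0.5, 1, -1]
print(reconstruct_element(COEFFS, 4))  # 6.0
```

The signs applied to element 4 are exactly those of the calculation on this slide.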


Magnitude-based zeroing

Given a threshold a, if a coefficient's magnitude |Cj| < a, then this coefficient leaf is cut off and does not participate in the reconstruction process.
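To see why magnitude-based zeroing alone gives no error guarantee, the sketch below zeroes every detail coefficient with |Cj| < a on the 16-element example and measures the maximum reconstruction error. Keeping the overall average untouched, the threshold a = 0.6, and the helper names are our assumptions.

```python
def inverse_haar(coeffs):
    """Invert the unnormalized Haar layout [average, coarse details ..., fine details]."""
    averages, pos = [coeffs[0]], 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(averages)]
        pos += len(averages)
        # Each (average, half-difference) pair expands back into two values.
        averages = [v for a, d in zip(averages, diffs) for v in (a + d, a - d)]
    return averages


def magnitude_zeroing(coeffs, a):
    """Zero each detail coefficient whose magnitude is below threshold a."""
    # Assumption: the overall average coeffs[0] is never zeroed.
    return [coeffs[0]] + [0.0 if abs(c) < a else c for c in coeffs[1:]]


def max_error(original, approx):
    return max(abs(x - y) for x, y in zip(original, approx))


SERIES = [3, 4, 3, 2, 6, 8, 9, 7, 2, 3, 1, 2, 10, 8, 7, 9]
COEFFS = [5.25, 0, -2.25, -3.25, 0.5, -0.5, 0.5, 0.5,
          -0.5, 0.5, -1, 1, -0.5, -0.5, 1, -1]

approx = inverse_haar(magnitude_zeroing(COEFFS, 0.6))
print(max_error(SERIES, approx))  # 1.0
```

The error (1.0) exceeds the threshold (0.6) because several dropped coefficients on the same path add up, which is the gap the gradient error tree addresses.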


RACE Algorithm

Generating gradient error tree

Error-based zeroing (i.e., compression process)

Smoothing error bound via patching process


Gradient Error Tree

Gradient error G(V): V is a coefficient in the wavelet coefficient tree

G(V) is defined as the maximum error incurred when the subtree rooted at node V is cut off

Gradient error tree: computed from the corresponding wavelet coefficient tree


Gradient Error Tree: an example

Time series: [3, 4, 3, 2, 6, 8, 9, 7, 2, 3, 1, 2, 10, 8, 7, 9]

Coefficients: [5.25, 0, -2.25, -3.25, 0.5, -0.5, 0.5, 0.5, -0.5, 0.5, -1, 1, -0.5, -0.5, 1, -1]
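The gradient errors of this example can be computed bottom-up from the coefficient tree. The sketch assumes the same array layout as above and tracks, per node, the largest and smallest signed sum its subtree adds to any element beneath it; cutting the subtree off shifts those elements by exactly that sum, so G(V) is the larger magnitude of the two. This recursion is our reading of the definition, not a formula taken from the paper.

```python
def gradient_errors(coeffs):
    """Return {j: G(C_j)} for every detail node j >= 1 of a power-of-two Haar array."""
    n = len(coeffs)
    hi = [0.0] * n  # largest signed subtree sum over the elements under node j
    lo = [0.0] * n  # smallest signed subtree sum over the elements under node j
    for j in range(n - 1, 0, -1):  # children (2j, 2j+1) are visited before j
        c = coeffs[j]
        if 2 * j >= n:  # finest level: its two elements get +c and -c
            hi[j], lo[j] = abs(c), -abs(c)
        else:
            left, right = 2 * j, 2 * j + 1
            # Assumed recursion: left half adds +c, right half adds -c.
            hi[j] = max(c + hi[left], -c + hi[right])
            lo[j] = min(c + lo[left], -c + lo[right])
    return {j: max(hi[j], -lo[j]) for j in range(1, n)}


COEFFS = [5.25, 0, -2.25, -3.25, 0.5, -0.5, 0.5, 0.5,
          -0.5, 0.5, -1, 1, -0.5, -0.5, 1, -1]
G = gradient_errors(COEFFS)
print([G[j] for j in range(1, 8)])  # [4.75, 3.75, 4.75, 1.0, 1.5, 1.0, 1.5]
```

Each value equals the largest deviation of the series from the average of the block under that node (e.g. 3.75 for the first eight elements, whose average is 5.25).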


Error-based zeroing

Using the error bound as the threshold value, apply magnitude-based zeroing to the wavelet coefficient tree according to the gradient error tree, i.e., cut off the subtree rooted at V when G(V) is below the threshold

Use the symbol “t” to represent a zeroed subtree


Error-based zeroing

Example: threshold = 2 results in 8 symbols to encode


Error-based zeroing

Example: threshold = 4 results in 6 symbols to encode
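A sketch of error-based zeroing that reproduces both symbol counts. It assumes the same array layout, a breadth-first symbol order, the overall average always being sent, and a strict comparison G(V) < threshold; these choices and the function names are ours. `gradient_errors` repeats the bottom-up computation of G so the block runs on its own.

```python
from collections import deque


def gradient_errors(coeffs):
    """G(C_j) for each detail node j: max error if the subtree rooted at j is cut off."""
    n = len(coeffs)
    hi, lo = [0.0] * n, [0.0] * n
    for j in range(n - 1, 0, -1):
        c = coeffs[j]
        if 2 * j >= n:
            hi[j], lo[j] = abs(c), -abs(c)
        else:
            hi[j] = max(c + hi[2 * j], -c + hi[2 * j + 1])
            lo[j] = min(c + lo[2 * j], -c + lo[2 * j + 1])
    return {j: max(hi[j], -lo[j]) for j in range(1, n)}


def error_based_zeroing(coeffs, threshold):
    """Encode the coefficient tree, replacing each subtree with G < threshold by 't'."""
    n = len(coeffs)
    g = gradient_errors(coeffs)
    symbols = [coeffs[0]]  # assumption: overall average is always sent
    queue = deque([1])     # assumption: breadth-first walk from the root detail node
    while queue:
        j = queue.popleft()
        if g[j] < threshold:
            symbols.append("t")  # the whole subtree rooted at j is zeroed
            continue
        symbols.append(coeffs[j])
        if 2 * j < n:
            queue.extend([2 * j, 2 * j + 1])
    return symbols


COEFFS = [5.25, 0, -2.25, -3.25, 0.5, -0.5, 0.5, 0.5,
          -0.5, 0.5, -1, 1, -0.5, -0.5, 1, -1]
print(error_based_zeroing(COEFFS, 2))  # [5.25, 0, -2.25, -3.25, 't', 't', 't', 't']
print(error_based_zeroing(COEFFS, 4))  # [5.25, 0, 't', -3.25, 't', 't']
```

Elements under a "t" are rebuilt as the average of their block, so the maximum error equals the largest G among the cut subtrees: 1.5 for threshold 2 and 3.75 for threshold 4, both below the bound.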


Important Properties

Error-bound additivity: multihop network transmission

Multiple time series aggregation

Patchability: exploiting the varying compressibility of the input stream along the timeline

Smoothing the error range of the output stream
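One way to read error-bound additivity for multihop transmission: if each hop recompresses the stream it receives under its own bound e_k, the triangle inequality caps the end-to-end maximum error at the sum of the bounds. The sketch below checks this on the example series with two hops. The two-hop setup, the bounds 1.2 and 2.0, and all names are hypothetical, not the paper's experiment.

```python
def haar(xs):
    avg, det = [float(v) for v in xs], []
    while len(avg) > 1:
        pairs = list(zip(avg[0::2], avg[1::2]))
        det = [(a - b) / 2 for a, b in pairs] + det
        avg = [(a + b) / 2 for a, b in pairs]
    return avg + det


def inverse_haar(c):
    avg, pos = [c[0]], 1
    while pos < len(c):
        d = c[pos:pos + len(avg)]
        pos += len(avg)
        avg = [v for a, x in zip(avg, d) for v in (a + x, a - x)]
    return avg


def gradient_errors(c):
    n = len(c)
    hi, lo = [0.0] * n, [0.0] * n
    for j in range(n - 1, 0, -1):
        if 2 * j >= n:
            hi[j], lo[j] = abs(c[j]), -abs(c[j])
        else:
            hi[j] = max(c[j] + hi[2 * j], -c[j] + hi[2 * j + 1])
            lo[j] = min(c[j] + lo[2 * j], -c[j] + lo[2 * j + 1])
    return [max(h, -l) for h, l in zip(hi, lo)]  # index 0 unused


def compress(xs, bound):
    """One hypothetical hop: error-based zeroing under `bound`, then reconstruction."""
    c = haar(xs)
    g = gradient_errors(c)
    kept = [0.0] * len(c)
    kept[0] = c[0]
    stack = [1]
    while stack:
        j = stack.pop()
        if g[j] < bound:
            continue  # subtree rooted at j stays zeroed
        kept[j] = c[j]
        if 2 * j < len(c):
            stack += [2 * j, 2 * j + 1]
    return inverse_haar(kept)


def max_error(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


SERIES = [3, 4, 3, 2, 6, 8, 9, 7, 2, 3, 1, 2, 10, 8, 7, 9]
hop1 = compress(SERIES, 1.2)  # illustrative bound for hop 1
hop2 = compress(hop1, 2.0)    # illustrative bound for hop 2
print(max_error(SERIES, hop1), max_error(hop1, hop2), max_error(SERIES, hop2))
```

Here the per-hop errors are 1.0 and 0.5 and the end-to-end error is 1.5, within 1.2 + 2.0. The same inequality applies to summing two separately compressed series.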


Numerical evaluation

Data set: real-world data from the TAO project (http://www.pmel.noaa.gov/tao), including air temperature and subsurface temperature at different depths

Air temperature characteristics


Adaptive compression: max normalized error


Adaptive compression: smoothed max normalized error


Preservation of statistical interpretation: how well is the multivariate correlation preserved?

Cross-correlation between variables x and y is defined as:

where d is the temporal delay between x and y.
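The cross-correlation formula itself is missing from this copy of the slide. As an assumption, the sketch below uses the common normalized form r(d) = Σ_i (x_i − x̄)(y_{i+d} − ȳ) / sqrt(Σ_i (x_i − x̄)² · Σ_i (y_i − ȳ)²), summing only over indices where both samples exist; the paper's exact normalization may differ.

```python
import math


def cross_correlation(x, y, d):
    """Normalized cross-correlation of x and y at delay d (y compared d samples later)."""
    # Assumed normalized form; the slide's exact formula is not recoverable.
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((x[i] - mx) * (y[i + d] - my)
              for i in range(len(x)) if 0 <= i + d < len(y))
    den = math.sqrt(sum((v - mx) ** 2 for v in x) * sum((v - my) ** 2 for v in y))
    if den == 0:
        raise ValueError("cross-correlation is undefined for a constant series")
    return num / den


# y is x delayed by two samples, so the correlation peaks at d = 2.
x = [0, 1, 0, 0, 0, 0]
y = [0, 0, 0, 1, 0, 0]
best = max(range(-5, 6), key=lambda d: cross_correlation(x, y, d))
print(best)  # 2
```

Under this assumption, comparing r(d) for the original series with r(d) for their compressed versions is one way to quantify how well the correlation survives compression.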


Data sets

Subsurface temperatures at depths of 25 m and 50 m


Cross-correlation under different compression ratios


Conclusion

Rate-adaptive compression scheme

Improved error bound, achieving a soft guarantee

Preservation of multivariate correlation
