
1

Wavelet Synopses with Error Guarantees

Minos Garofalakis and Phillip B. Gibbons, Information Sciences Research Center, Bell Labs, Lucent Technologies, Murray Hill, NJ 07974

ACM SIGMOD 2002

2

Outline

Introduction
Wavelet basics
Probabilistic wavelet synopses
Experimental study
Conclusions

3

Introduction

The wavelet decomposition has demonstrated its effectiveness in reducing large amounts of data to compact sets of wavelet coefficients (termed "wavelet synopses") that can be used to provide fast and reasonably accurate approximate answers to queries.

Due to the exploratory nature of many Decision Support System applications, there are a number of scenarios in which the user may prefer a fast approximate answer.

4

Introduction

A major criticism of wavelet-based techniques is that conventional wavelet synopses cannot provide guarantees on the error of individual approximate query answers.

5

Introduction

The problem with approximate query processing over wavelet synopses stems from their deterministic approach to selecting coefficients and from their lack of error guarantees.

We propose an approach to building wavelet synopses that enables unbiased approximate query answers, with error guarantees on the accuracy of individual answers.

6

Introduction

The technique is based on a probabilistic thresholding scheme that assigns each coefficient a probability of being retained, based on its importance to the reconstruction of individual data values, and then flips coins to select the synopsis.

7

Wavelet basics

Given the data vector A, the wavelet transform of A can be computed as follows: repeatedly average and difference neighboring values, keeping the pairwise semi-differences as detail coefficients at each resolution level.

In order to equalize the importance of all wavelet coefficients, we normalize: each coefficient is divided by sqrt(2^l), where l is the resolution level at which the coefficient appears (l = 0 being the coarsest level).
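As a concrete illustration of this computation, here is a minimal Python sketch (my own, not the authors' code) of the pairwise averaging-and-differencing recursion; the normalization step described above can then be applied by dividing each detail coefficient by sqrt(2^l).

def haar_transform(data):
    # Compute the (unnormalized) Haar wavelet decomposition of `data`,
    # whose length must be a power of two. Returns the coefficients in
    # error-tree order: c_0 is the overall average, followed by the
    # detail coefficients from the coarsest to the finest level.
    assert len(data) > 0 and len(data) & (len(data) - 1) == 0
    averages = list(data)
    levels = []  # detail coefficients, finest level first
    while len(averages) > 1:
        details = [(averages[2*i] - averages[2*i + 1]) / 2.0
                   for i in range(len(averages) // 2)]
        averages = [(averages[2*i] + averages[2*i + 1]) / 2.0
                    for i in range(len(averages) // 2)]
        levels.append(details)
    coeffs = averages  # [overall average c_0]
    for details in reversed(levels):
        coeffs.extend(details)
    return coeffs

# For A = [2, 2, 0, 2, 3, 5, 4, 4]:
# haar_transform(A) -> [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]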

8

Wavelet basics

A helpful tool for exploring and understanding the key properties of the wavelet decomposition is the error tree structure: its internal nodes hold the wavelet coefficients and its leaves hold the original data values.

9

Wavelet basics

The important reconstruction properties:

(P1) The reconstruction of any data value d_i depends only on the values of the nodes in path(d_i), the set of nodes on the path from the root of the error tree to the leaf d_i.

(P2) The range sum d(l:h) = sum over c_j in path(d_l) union path(d_h) of x_j * c_j, where x_j = h - l + 1 if c_j is the root, and otherwise x_j is the number of leaves in [l, h] in c_j's left subtree minus the number of leaves in [l, h] in its right subtree.

10

Wavelet basics

d_5 = c_0 - c_2 + c_5 - c_10 = 65 - 14 + (-20) - 28 = 3
d(3:5) = 3c_0 + (1 - 2)c_2 - c_4 + 2c_5 - c_9 + (1 - 1)c_10 = 93
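To make property (P1) concrete, here is a small sketch (helper names are my own) that rebuilds a single data value by walking its root-to-leaf path: the detail coefficient is added when the path descends left and subtracted when it descends right.

def reconstruct(coeffs, i):
    # Rebuild data value d_i from coefficients in error-tree order:
    # coeffs[0] = overall average c_0; internal node j holds c_j and
    # has children 2j and 2j+1. Only nodes on path(d_i) are touched (P1).
    n = len(coeffs)
    value = coeffs[0]
    j, lo, hi = 1, 0, n - 1      # node j covers leaves lo..hi
    while j < n:
        mid = (lo + hi) // 2
        if i <= mid:
            value += coeffs[j]   # descend into the left subtree
            j, hi = 2 * j, mid
        else:
            value -= coeffs[j]   # descend into the right subtree
            j, lo = 2 * j + 1, mid + 1
    return value

# Property (P2) can be checked naively against the path-based formula:
# d(l:h) == sum(reconstruct(coeffs, i) for i in range(l, h + 1))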

11

Probabilistic wavelet synopses: A. The problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retains the B wavelet coefficients with the largest absolute value after normalization; this deterministic process minimizes the overall L2 error.

12

Probabilistic wavelet synopses: A. The problem with conventional wavelets

With only the largest coefficients retained, the same reconstructions become:
d_5 = 65 - 0 + 0 - 0 = 65
d(3:5) = 3·65 - 0 - 0 + 0 - 0 = 195

13

Probabilistic wavelet synopses: A. The problem with conventional wavelets

Root causes: (1) strict deterministic thresholding; (2) independent thresholding; (3) the bias resulting from dropping coefficients without compensating for their loss.

14

Probabilistic wavelet synopses: B. General approach

Our scheme deterministically retains the most important coefficients while randomly rounding each of the other coefficients either up to a larger value (its "rounding value") or down to zero.

By carefully selecting the rounding values we ensure that: (1) we expect a total of B coefficients to be retained; and (2) we minimize a desired error metric in the reconstruction of the data.

15

Probabilistic wavelet synopses: B. General approach

The key idea in the thresholding scheme is to associate with each coefficient c_i a random variable C_i such that (1) C_i = 0 with some probability, and (2) E[C_i] = c_i, where we select a rounding value λ_i for each non-zero c_i such that 0 < c_i/λ_i ≤ 1 (i.e., λ_i has the same sign as c_i and |λ_i| ≥ |c_i|).

16

Probabilistic wavelet synopses: B. General approach

Our thresholding scheme essentially "rounds" each non-zero wavelet coefficient c_i independently to either λ_i or zero by flipping a biased coin with success probability y_i = c_i/λ_i.

Its variance is simply Var(C_i) = λ_i·c_i - c_i² = c_i·(λ_i - c_i).
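The coin-flipping step itself is tiny; the sketch below (function and variable names are my own) retains each non-zero c_i as λ_i with probability y_i = c_i/λ_i, which yields E[C_i] = c_i and the variance stated above.

import random

def round_coefficients(coeffs, lambdas, rng=None):
    # C_i = lambda_i with probability y_i = c_i / lambda_i, else 0.
    # Assumes 0 < c_i / lambda_i <= 1 for every non-zero c_i.
    rng = rng or random.Random(0)
    synopsis = {}
    for i, (c, lam) in enumerate(zip(coeffs, lambdas)):
        if c == 0:
            continue              # zero coefficients are never stored
        y = c / lam               # retention probability in (0, 1]
        if rng.random() < y:
            synopsis[i] = lam     # rounded up and retained
        # else: rounded down to zero (unbiased: E[C_i] = y * lam = c)
    return synopsis               # expected size is the sum of the y_i's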

17

Probabilistic wavelet synopses: B. General approach

For example, λ_0 = c_0 (retained outright), λ_10 = 2c_10, and λ_i = 3c_i/2 for the remaining coefficients.

18

Probabilistic wavelet synopses: B. General approach

The impact of the λ_i's:
Choosing λ_i closer to c_i reduces the variance (but raises the retention probability y_i).
Choosing λ_i further from c_i reduces the expected number of retained coefficients (but raises the variance).

19

Probabilistic wavelet synopses: C. Rounding to minimize the expected mean-square error

A reasonable approach is to select the λ_i values in a way that minimizes some overall error metric (e.g., the expected L2 error).

20

Probabilistic wavelet synopses: C. Rounding to minimize the expected mean-square error

Letting y_i = c_i/λ_i and noting that Var(C_i) = c_i²·(1/y_i - 1), the expected L2-error minimization problem is equivalent to:

minimize Σ_i c_i²/y_i subject to Σ_i y_i ≤ B and 0 < y_i ≤ 1.

Based on the Cauchy-Schwarz inequality, the minimum value of the objective is reached when the y_i are proportional to |c_i|, i.e., y_i = min{1, |c_i|·B / Σ_j |c_j|}.
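In code, the closed form above can be computed directly; this is a sketch I wrote around it (the cap-and-redistribute loop handles the y_i ≤ 1 constraint):

def min_l2_probabilities(coeffs, budget):
    # Cauchy-Schwarz optimum: y_i proportional to |c_i|, capped at 1.
    # Any coefficient whose proportional share reaches 1 is retained
    # outright, and the leftover budget is redistributed among the rest.
    y = [0.0] * len(coeffs)
    active = [i for i, c in enumerate(coeffs) if c != 0]
    b = float(budget)
    while active and b > 0:
        total = sum(abs(coeffs[i]) for i in active)
        capped = [i for i in active if abs(coeffs[i]) * b / total >= 1.0]
        if not capped:
            for i in active:
                y[i] = abs(coeffs[i]) * b / total
            break
        for i in capped:
            y[i] = 1.0            # deterministically retained
        active = [i for i in active if y[i] == 0.0]
        b -= len(capped)
    return y                      # rounding values: lambda_i = c_i / y_i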

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual data values (relative error).

The goal is to produce an estimate d̂_i for each data value d_i such that the relative error |d̂_i - d_i| / max{|d_i|, S} is small, where S is a sanity bound that keeps very small data values from dominating the metric.

23

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

The expected value of the estimate is E[d̂_i] = d_i (it is unbiased), so we would like to minimize the variance of the reconstruction.

More precisely, we seek to minimize the normalized standard error for a reconstructed data value: NSE(d̂_i) = sqrt(Var(d̂_i)) / max{|d_i|, S}.

24

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

Note that by applying Chebyshev's inequality we obtain, for all α > 1:

Pr( |d̂_i - d_i| / max{|d_i|, S} ≥ α·NSE(d̂_i) ) ≤ 1/α²

so that minimizing the NSE will indeed minimize the probabilistic bounds on the relative error metric.
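As a quick empirical sanity check of this bound (a simulation of my own, not an experiment from the paper), one can round a single coefficient many times and compare the tail frequency with 1/α²:

import random

def chebyshev_check(c=10.0, lam=25.0, s=1.0, alpha=2.0, trials=100_000):
    # Empirically verify Pr(|C - c| / max(|c|, s) >= alpha * NSE) <= 1/alpha^2.
    rng = random.Random(1)
    y = c / lam                                      # retention probability
    nse = (c * (lam - c)) ** 0.5 / max(abs(c), s)    # sqrt(Var) / max(|c|, S)
    exceed = sum(
        abs((lam if rng.random() < y else 0.0) - c) / max(abs(c), s)
        >= alpha * nse
        for _ in range(trials)
    )
    return exceed / trials, 1.0 / alpha ** 2         # empirical tail vs. bound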

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

We would like to formulate a dynamic-programming recurrence for this problem.

Let PATHS_j denote the set of all root-to-leaf paths in T_j (the error subtree rooted at node j), and let M[j, B] denote the optimal value of the maximum NSE among all data values d_k in T_j, assuming a space budget of B.

27

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

M[j, B] is computed by the recurrence depicted in equation (11): it minimizes, over the retention probability y_j and over the share b_L of the remaining budget given to j's left subtree, the maximum cost across j's two child subtrees.

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

The problem in (11) is that the y_i and b_L each range over a continuous interval, which makes the recurrence infeasible to use directly.

The key technical idea is to quantize the solution space.

We modify the constraint so that each retention probability is an integer multiple of 1/q, i.e., y_i ∈ {1/q, 2/q, ..., 1}, where q is an input integer.
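To show how quantization makes the recurrence computable, here is a deliberately simplified dynamic program of my own: it minimizes the maximum root-to-leaf sum of rounding variances Var(C_j) = c_j²·(1/y_j - 1) rather than the paper's NSE objective, but it exercises the same quantized state space, with y values restricted to multiples of 1/q and integer budget splits b_L.

from functools import lru_cache

def min_max_path_cost(coeffs, budget, q):
    # Simplified illustration of the quantized DP. Budgets are integer
    # units of 1/q; node j's retention probability is y_j = k/q, k in 1..q.
    # coeffs[0] (the overall average) is assumed retained separately.
    n = len(coeffs)

    @lru_cache(maxsize=None)
    def m(j, b):                           # b = remaining budget in 1/q units
        if j >= n:                         # below the leaves: no cost
            return 0.0
        if coeffs[j] == 0:                 # zero coefficient: spend nothing
            return min(max(m(2 * j, bl), m(2 * j + 1, b - bl))
                       for bl in range(b + 1))
        best = float("inf")                # y_j > 0 required, so b = 0 fails
        for k in range(1, min(b, q) + 1):  # y_j = k/q uses k budget units
            var = coeffs[j] ** 2 * (q / k - 1.0)
            rest = b - k
            for bl in range(rest + 1):     # split the rest between children
                best = min(best, var + max(m(2 * j, bl),
                                           m(2 * j + 1, rest - bl)))
        return best

    return m(1, budget * q)                # root detail coefficient is node 1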

30

Probabilistic wavelet synopses: E. Low-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities y_i, where, as before, the y_i's are selected to minimize a desired error metric.

31

Probabilistic wavelet synopses: F. Summary of the approach

32

Experimental study

A Zipfian data generator was used to produce Zipfian frequencies for various levels of skew (z parameter between 0.3 and 2.0).

We also use a real-world data set downloaded from the National Forest Service.

We set q = 10, the sanity bound S to the 10th percentile of the data values, and the perturbation Δ = min{0.01, S/100}.
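The generator can be sketched as follows (an assumed reimplementation, with frequencies proportional to 1/rank^z and scaled to a desired total; the paper's exact generator parameters are not reproduced here):

def zipfian_frequencies(n, z, total):
    # Frequency of the i-th most frequent value is proportional to 1/(i+1)^z.
    weights = [1.0 / (i + 1) ** z for i in range(n)]
    scale = total / sum(weights)
    return [w * scale for w in weights]

# e.g. zipfian_frequencies(1024, z=0.3, total=100_000)  # mild skew
#      zipfian_frequencies(1024, z=2.0, total=100_000)  # heavy skew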

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions

We have introduced probabilistic wavelet synopses, the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers.

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics.

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach.

2

Outline

Introduction Wavelet basics Probabilistic wavelet synopses Experimental study Conclusions

3

Introduction The wavelet decomposition has demonstrated

the effectiveness in reducing large amounts of data to compact sets of wavelet coefficients (termed ldquowavelet synopsesrdquo) that can be used to provide fast and reasonably accurate approximate answers to queries

Due to exploratory nature of many Decision Support Systems applications there are a number of scenarios in which the user may prefer a fast approximate answer

4

Introduction A major criticism of wavelet-based

techniques is the fact that conventional wavelet synopses can not provide guarantees on the error of individual approximate query answers

5

Introduction The problem for approximate query

processing with wavelet synopses due to their deterministic approach to selecting coefficients and their lack of error guarantees

We propose a approach to building wavelet synopses that enables unbiased approximate query answers with error guarantees on the accuracy of individual answers

6

Introduction The technique is based on probabilistic thre

sholding scheme that assigns each coefficient a probability of being retained based on its importance to the reconstruction of individual data values and then flips coins to select the synopsis

7

Wavelet basics Given the data vector A the wavelet

transform of A can be computed as follow

In order equalize the importance of all wavelet coefficients we normalize the coefficient is

8

Wavelet basics A helpful tool for exploring and

understanding the key properties of the wavelet decomposition is error tree structure

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

3

Introduction The wavelet decomposition has demonstrated

the effectiveness in reducing large amounts of data to compact sets of wavelet coefficients (termed ldquowavelet synopsesrdquo) that can be used to provide fast and reasonably accurate approximate answers to queries

Due to exploratory nature of many Decision Support Systems applications there are a number of scenarios in which the user may prefer a fast approximate answer

4

Introduction A major criticism of wavelet-based

techniques is the fact that conventional wavelet synopses can not provide guarantees on the error of individual approximate query answers

5

Introduction The problem for approximate query

processing with wavelet synopses due to their deterministic approach to selecting coefficients and their lack of error guarantees

We propose a approach to building wavelet synopses that enables unbiased approximate query answers with error guarantees on the accuracy of individual answers

6

Introduction The technique is based on probabilistic thre

sholding scheme that assigns each coefficient a probability of being retained based on its importance to the reconstruction of individual data values and then flips coins to select the synopsis

7

Wavelet basics Given the data vector A the wavelet

transform of A can be computed as follow

In order equalize the importance of all wavelet coefficients we normalize the coefficient is

8

Wavelet basics A helpful tool for exploring and

understanding the key properties of the wavelet decomposition is error tree structure

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

4

Introduction A major criticism of wavelet-based

techniques is the fact that conventional wavelet synopses can not provide guarantees on the error of individual approximate query answers

5

Introduction The problem for approximate query

processing with wavelet synopses due to their deterministic approach to selecting coefficients and their lack of error guarantees

We propose a approach to building wavelet synopses that enables unbiased approximate query answers with error guarantees on the accuracy of individual answers

6

Introduction The technique is based on probabilistic thre

sholding scheme that assigns each coefficient a probability of being retained based on its importance to the reconstruction of individual data values and then flips coins to select the synopsis

7

Wavelet basics Given the data vector A the wavelet

transform of A can be computed as follow

In order equalize the importance of all wavelet coefficients we normalize the coefficient is

8

Wavelet basics A helpful tool for exploring and

understanding the key properties of the wavelet decomposition is error tree structure

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

5

Introduction The problem for approximate query

processing with wavelet synopses due to their deterministic approach to selecting coefficients and their lack of error guarantees

We propose a approach to building wavelet synopses that enables unbiased approximate query answers with error guarantees on the accuracy of individual answers

6

Introduction The technique is based on probabilistic thre

sholding scheme that assigns each coefficient a probability of being retained based on its importance to the reconstruction of individual data values and then flips coins to select the synopsis

7

Wavelet basics Given the data vector A the wavelet

transform of A can be computed as follow

In order equalize the importance of all wavelet coefficients we normalize the coefficient is

8

Wavelet basics A helpful tool for exploring and

understanding the key properties of the wavelet decomposition is error tree structure

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

6

Introduction The technique is based on probabilistic thre

sholding scheme that assigns each coefficient a probability of being retained based on its importance to the reconstruction of individual data values and then flips coins to select the synopsis

7

Wavelet basics Given the data vector A the wavelet

transform of A can be computed as follow

In order equalize the importance of all wavelet coefficients we normalize the coefficient is

8

Wavelet basics A helpful tool for exploring and

understanding the key properties of the wavelet decomposition is error tree structure

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

7

Wavelet basics Given the data vector A the wavelet

transform of A can be computed as follow

In order equalize the importance of all wavelet coefficients we normalize the coefficient is

8

Wavelet basics A helpful tool for exploring and

understanding the key properties of the wavelet decomposition is error tree structure

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

8

Wavelet basics A helpful tool for exploring and

understanding the key properties of the wavelet decomposition is error tree structure

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses: E. Low-bias probabilistic wavelet synopses

Each coefficient is either retained (with its exact value ci) or discarded according to the probability yi, where, as before, the yi's are selected to minimize a desired error metric; this trades a small bias for reduced variance.
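A hedged sketch of this variant (our reading of the slide: a retained coefficient keeps its exact value, so the estimator carries a small bias of (1 - yi)·|ci| but has lower variance than the unbiased rounding scheme):

```python
import numpy as np

def low_bias_round(coeffs, y, rng):
    """Keep c_i unchanged with probability y_i, else drop it.

    E[C_i] = y_i * c_i   (bias (1 - y_i)|c_i|, small when y_i is near 1);
    Var(C_i) = c_i^2 * y_i * (1 - y_i), which is lower than the unbiased
    scheme's c_i^2 * (1 - y_i) / y_i for any y_i < 1.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    keep = rng.random(coeffs.shape) < np.asarray(y)
    return np.where(keep, coeffs, 0.0)

rng = np.random.default_rng(2)
c = np.array([65.0, -14.0, -20.0, 28.0])
y = np.array([1.0, 0.5, 0.6, 0.8])
samples = np.array([low_bias_round(c, y, rng) for _ in range(100_000)])
print(samples.mean(axis=0))   # approx y * c, not c: slightly biased
print(samples.var(axis=0))    # approx c**2 * y * (1 - y)
```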

31

Probabilistic wavelet synopses: F. Summary of the approach

32

Experimental study

A Zipfian data generator was used to produce Zipfian frequencies for various levels of skew (z parameter between 0.3 and 2.0).

We also use a real-world data set downloaded from the National Forest Service.

We set q = 10, take the sanity bound S to be the 10th percentile of the data values, and use perturbation Δ = min{0.01, S/100}.
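Such skewed frequency vectors can be produced along the following lines (a sketch under our own assumptions about domain size and total count; the paper does not specify its generator):

```python
import numpy as np

def zipf_frequencies(n_values, total_count, z, rng):
    """Frequency vector over n_values domain points with Zipfian skew z.

    Rank r gets probability proportional to 1 / r**z; counts are drawn
    by placing total_count items according to these probabilities.
    """
    ranks = np.arange(1, n_values + 1, dtype=float)
    p = ranks ** -z
    p /= p.sum()
    counts = rng.multinomial(total_count, p)
    rng.shuffle(counts)            # scatter the skewed ranks over the domain
    return counts

rng = np.random.default_rng(3)
for z in (0.3, 1.0, 2.0):          # skew levels in the studied range
    f = zipf_frequencies(1024, 100_000, z, rng)
    print(z, f.max(), np.count_nonzero(f))
```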

33-35

Experimental study

(Result charts comparing the probabilistic and deterministic synopses; not reproduced here.)

36

Conclusions

We have introduced probabilistic wavelet synopses, the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers.

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics.

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach.


Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

top related