finding near-optimal configurations in product lines by ... · finding near-optimal configurations...

26
Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 Don Batory Margaret Myers Norbert Siegmund

Upload: others

Post on 25-Aug-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

Finding Near-Optimal Configurations in Product Lines by Random Sampling

Jeho Oh

1

Don Batory Margaret Myers Norbert Siegmund

Page 2: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

Quickly find SPL configurations with

near-optimal performance for a given workload

• Configuration space is often huge: 𝑛 features ≤ 2𝑛 configurations

( 273 optional features: 1082 products, one for every atom in universe )

• Searching for the optimal configuration is daunting,

as benchmarking all configurations is infeasible

• Find a way to get good enough configurations with practical effort

2

I want a hybrid car with laser headlight, but cheap and light as possible.

Now 273/275 options left to decide…

Page 3: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

Big Picture

3

randomly sample on configuration space for

near-optimal configurations

featuremodel

user-imposedfeature constraints

near-optimalperforming

configuration

learnperformance

model

useoptimizer

sampleconfigurations

Performance Model Approach

Our Approach

Page 4: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

Contributions

• Allow true random sampling of configurations

• Provide statistical bounds on searching by sampling

• Directly search the space for any given workload

4

Search by Random Samplingwhich we describe next…

Page 5: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

• To randomly sample configurations

from uniform distribution:

– Identify valid configuration space

– Select a random number in [ 1, total # of configs ]

– Return the configuration with matching number

• Binary Decision Diagram (BDD):

– Compile prop. formula into graph structure

– Derive all possible solutions (configs)

– Allows efficient sampling from traversing BDD

Random Sampling with BDD

5

(𝑟𝑜𝑜𝑡 ↔ 𝑡𝑟𝑢𝑒)∧ 𝑟𝑜𝑜𝑡 ↔ 𝐴∧ (𝐵 → 𝑟𝑜𝑜𝑡)

∧ 𝐶 ↔ 𝐴 ∧ ¬𝐷

∧ 𝐷 ↔ 𝐴 ∧ ¬𝐶

∧ (𝐷 → 𝐵)

𝑟𝑜𝑜𝑡

𝐴 𝐵

𝐶 𝐷

𝐷 implies 𝐵

Feature Model Prop. Formula

BDD

𝑟𝑜𝑜𝑡

𝐴

0 1

𝐵

𝐶

𝐷 𝐷

0 1

2

2

0 1

1

0

03

0 3

Randomly sample from space of all valid

configurations, not space of all features

Page 6: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

• 𝑛 random numbers over unit range [0,1], 𝑥 as the number closest to 1

Analyze the distance between 𝑥 and 1, (1 − 𝑥)

• C.D.F. of 𝑥 over 𝑛: 𝑝𝑛 𝑋 ≤ 𝑥 = 0𝑥𝑛 ⋅ 𝑥𝑛−1 ⋅ 𝑑𝑥 = 𝑥𝑛

• Average distance from 𝑥 to 1: 𝐸𝑛 = 011 − 𝑥 ⋅ 𝑛 ⋅ 𝑥𝑛−1 ⋅ 𝑑𝑥 =

𝟏

𝒏+𝟏

(Equations for sampling over discrete space on Section 3.3 of the paper)

Statistics of Random Sampling (1)

6

0 10 1

𝑥 = 0.5

𝐸1 = 0.5

0 1

𝑥 = 0.67

𝐸2 = 0.33

0 1

𝑥 = 0.75

𝐸3 = 0.25

0 1

𝐸4 = 0.20

𝑥 = 0.80

Statistical regularity: 𝐸𝑛 =1

𝑛+1with bounds 𝜎𝑛 ≈

1

𝑛+1

Page 7: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

• Correspondences:

– Selection of numbers in [ 0,1 ] ► Selection in [ 1, total # of configs ]

– Closeness to 1 ► Closeness to the best configuration, Ω

Statistics of Random Sampling (2)

7

On average, sample with the best performance has

top 𝟏𝟎𝟎

𝒏+𝟏% performance among all configurations

200

210

220

230

240

250

260

270

0 300 600 900

LLVM (Compiler infrastructure, 11 features, 1024 configs)

𝛀𝟎

𝐸1 =Ω − 𝑥

1024= 0.50𝐸3 =

Ω − 𝑥

1024= 0.25𝐸7 =

Ω − 𝑥

1024= 0.125

Per

form

an

ce

Sorted configurations

Sorted config space

Best performing config

Monotonic graph:

Closer to Ω (x-axis),

better perf. (y-axis)

Page 8: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

Statistical Recursive Searching (SRS)

• Can search better than 99 samples for 1% by random sampling

• Feature selection:

– Makes some configs perform better

– Constrict config space

• Search within smaller and better

config space by feature selection

• Find most influential features by:

– Performance difference

– Welch’s t-Test

8

Use samples to recursively constrict the search space

200

210

220

230

240

250

260

270

0 200 400 600 800 1000

LLVM

Per

form

an

ce

Sorted configurations

Page 9: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

Statistical Recursive Searching (SRS)

• Can search better than 99 samples for 1% by random sampling

• Feature selection:

– Makes some configs perform better

– Constrict config space

• Search within smaller and better

config space by feature selection

• Find most influential features by:

– Performance difference

– Welch’s t-Test

9

Use samples to recursively constrict the search space

200

210

220

230

240

250

260

270

0 200 400 600 800 1000

LLVM

Per

form

an

ce

Sorted configurations

Page 10: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

Statistical Recursive Searching (SRS)

• Can search better than 99 samples for 1% by random sampling

• Feature selection:

– Makes some configs perform better

– Constrict config space

• Search within smaller and better

config space by feature selection

• Find most influential features by:

– Performance difference

– Welch’s t-Test

10

Use samples to recursively constrict the search space

200

210

220

230

240

250

260

270

0 200 400 600 800 1000

LLVM

Per

form

an

ce

Sorted configurations

Page 11: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

Statistical Recursive Searching (SRS)

• Can search better than 99 samples for 1% by random sampling

• Feature selection:

– Makes some configs perform better

– Constrict config space

• Search within smaller and better

config space by feature selection

• Find most influential features by:

– Performance difference

– Welch’s t-Test

11

Use samples to recursively constrict the search space

200

210

220

230

240

250

260

270

0 200 400 600 800 1000

LLVM

Per

form

an

ce

Sorted configurations

Page 12: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

Statistical Recursive Searching (SRS)

• Can search better than 99 samples for 1% by random sampling

• Feature selection:

– Makes some configs perform better

– Constrict config space

• Search within smaller and better

config space by feature selection

• Find most influential features by:

– Performance difference

– Welch’s t-Test

12

Use samples to recursively constrict the search space

200

210

220

230

240

250

260

270

0 200 400 600 800 1000

LLVM

Per

form

an

ce

Sorted configurations

4 features excluded:

- 1/16 of entire space,

- Better overall performance

Page 13: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

EVALUATION

13

ICSE 2012

Page 14: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

Evaluation Method

• Use ground truth data from Siegmund et al. (http://fosd.de/SPLConqueror)

• Measured accuracy of search:

– 𝜹𝒙 : % of configurations better than the best config found so far

– 𝜹𝒚 : % performance difference to Ω

• Averaged from 100 searches per different conditions

• Full result is available in Section 5 of the paper

14

SPL Type # features # configs Performance

LLVM Compiler infrastructure 11 1024 Test suite compilation time

BerkeleyDBC Database system 18 2560 Benchmark response time

X264 Video encoder 16 1152 Video encoding time

Apache Web server 9 192 Maximum server load

Page 15: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

𝜹𝒙: Theory vs. Actual• 𝛿𝑥 from randomly sampling different # of samples

vs. theoretical value derived using same # of samples

15

Theory matches observations

0%

1%

2%

3%

4%

5%

6%

7%

8%

9%

10%

10 20 30 40 50 60 70 80 90 100

LLVM

X264

BerkeleyDBC

Apache

Theoretical

Apache_Theoretical

Dis

tan

ce t

o 𝛀

(𝜹𝒙)

# of random samples (𝒏)

𝑬𝒏 (Eq. (9))

𝑬𝟏𝟗𝟐,𝒏 (Eq. (7), Apache)

Page 16: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

𝜹𝒙 : SRS vs. Random Sampling (non-recursive)

• 𝛿𝑥 of SRS over different # of samples per recursion

vs. theoretical 𝛿𝑥 of random sampling with same total # of samples (𝑁)

16

0%

2%

4%

6%

8%

10%

12%

5 15 25 35

LLVM

0%

2%

4%

6%

8%

10%

12%

14%

16%

5 15 25 35

BerkeleyDBC

0%

2%

4%

6%

8%

10%

12%

14%

5 15 25 35

X264

𝛿𝑥 (Measured)

Dis

tan

ce t

o 𝛀

(𝜹𝒙)

# of samples per recursion (𝒏)

Dis

tan

ce t

o 𝛀

(𝜹𝒙)

# of samples per recursion (𝒏)

Dis

tan

ce t

o 𝛀

(𝜹𝒙)

# of samples per recursion (𝒏)

SRS is more efficient than random sampling alone

𝐸𝑁 (Theoretical)

Page 17: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

0 20 40 60 80 100 120 140 160 180

BerkeleyDBC

0%

2%

4%

6%

8%

10%

12%

14%

16%

18%

20%

0 20 40 60 80 100

LLVM

𝜹𝒚: SRS vs. Performance Models (1)

• 𝛿𝑦 over total # of samples in SRS vs.

𝛿𝑦 of perf. models assuming an ideal optimizer (always finds best predicted config)

17

SRSSarkar2015 Siegmund2012

SRS needs many fewer samples for same accuracy, and

yields much better accuracy for same number of samples

Per

form

an

ce d

iffe

ren

ce t

o 𝛀

(𝜹𝒚)

Total # of samples (𝑵)P

erfo

rman

ce d

iffe

ren

ce t

o 𝛀

(𝜹𝒚)

Total # of samples (𝑵)

Page 18: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

0%

2%

4%

6%

8%

10%

12%

14%

16%

18%

20%

0 20 40 60 80 100

160%

162%

164%

X264

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

0 20 40 60 80 100 120 140 160 180

BerkeleyDBC

0%

2%

4%

6%

8%

10%

12%

14%

16%

18%

20%

0 20 40 60 80 100

LLVM

𝜹𝒚: SRS vs. Performance Models (2)

• 𝛿𝑦 over total # of samples in SRS vs.

𝛿𝑦 of performance models assuming an ideal optimizer

18

SRSSarkar2015 Siegmund2012

In SRS, more samples yields better accuracy

Per

form

an

ce d

iffe

ren

ce t

o 𝛀

(𝜹𝒚)

Total # of samples (𝑵)

Per

form

an

ce d

iffe

ren

ce t

o 𝛀

(𝜹𝒚)

Total # of samples (𝑵)

Per

form

an

ce d

iffe

ren

ce t

o 𝛀

(𝜹𝒚)

Total # of samples (𝑵)

Page 19: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

𝜹𝒙: Scalability of Searching• Combine SPLs to simulate

larger configuration spaces

• Measure 𝛿𝑥 for SRS and

non-recursive searching

19

𝐸𝑁

0%

2%

4%

6%

8%

10%

12%

5 15 25 35

Apache×X264×LLVM×BerkeleyDBC

Dis

tan

ce t

o 𝛀

(𝜹𝒙)

# of samples per recursion (𝒏)

𝛿𝑥

Accuracy is independent of the size of the configuration space

Combined Systems # of Features # of Confgs.

Apache × LLVM × BerkeleyDBC 38 503,316,480

Apache × X264 × BerkeleyDBC 51 566,231,040

LLVM × Apache × X264 × BerkeleyDBC 62 579,820,584,960

# of random samples

Dis

tan

ce t

o 𝛀

(𝜹𝒙)

0.0%

0.5%

1.0%

1.5%

2.0%

2.5%

3.0%

3.5%

30 50 70 90 110

Theory vs. Actual

ApacheH264

BerkeleyDBC

ApacheLLVM

BerkeleyDBC

E_n (Eqn. (9))𝑬𝒏 (Eqn. (9))

0%

2%

4%

6%

8%

10%

12%

5 15 25 35

Apache × LLVM × BerkeleyDBC

0%

2%

4%

6%

8%

10%

12%

5 15 25 35

Apache × X264 × BerkeleyDBC

Dis

tan

ce t

o 𝛀

(𝜹𝒙)

# of samples per recursion (𝒏)

Dis

tan

ce t

o 𝛀

(𝜹𝒙)

# of samples per recursion (𝒏)

Page 20: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

Conclusions and Contributions

1. True random sampling of configuration spaces

2. Guaranteed tight statistical bounds on finding good configurations

3. Can recursively search through configuration space for more efficient searching

4. Scalable search method - accuracy independent of the configuration space size

20

Page 21: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

Thank You !Finding Near-Optimal Configurations in Product Lines by Random Sampling

Jeho Oh, Don Batory, Margaret Myers, Norbert Siegmund

21

Page 22: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

Supplemental Slides from Now On

22

Page 23: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

Statistics of Random Sampling

• 𝑛 random numbers over unit range [0,1], 𝑥 is the number closest to 1Analyze the distance between 𝑥 and 1, (1 − 𝑥)

• C.D.F. of 𝑥 over 𝑛:

𝑝𝑛 𝑋 ≤ 𝑥 = 0𝑥𝑛 ⋅ 𝑥𝑛−1 ⋅ 𝑑𝑥 = 𝑥𝑛

• Average distance from 𝑥 to 1:

𝐸𝑛 = 011 − 𝑥 ⋅ 𝑛 ⋅ 𝑥𝑛−1 ⋅ 𝑑𝑥 =

𝟏

𝒏+𝟏

• Standard deviation:ത𝐸𝑛 = 0

11 − 𝑥 2 ⋅ 𝑛 ⋅ 𝑥𝑛−1 ⋅ 𝑑𝑥 =

1

(𝑛+1)(𝑛+2)

𝜎𝑛 = ത𝐸𝑛 − 𝐸𝑛2 =

𝟏

(𝒏+𝟏)(𝒏+𝟐)−

𝟏

𝒏+𝟏 𝟐

23

0%

2%

4%

6%

8%

10%

10 20 30 40 50 60 70 80 90 100

En

sn

𝐸𝑛

𝜎𝑛

Dis

tan

ce t

o 𝟏

# of random numbers

Page 24: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

PCS Graphs of Real Systems

• All configuration measurements sorted by performance (descending order)

24

0.8K

1.3K

1.8K

2.3K

0 50 100 150

Apache (9, 192)

0

10

20

30

40

50

0 500 1000 1500 2000 2500

BerkeleyDBC (18, 2560)

200

400

600

800

0 200 400 600 800 1000

X264 (16, 1152)

200

220

240

260

0 200 400 600 800 1000

LLVM (11, 1024)

Sorted configurations

Per

form

ance

Sorted configurations

Per

form

ance

Sorted configurations

Per

form

ance

Sorted configurations

Per

form

ance

* System Name (# of features, # of configs)

Page 25: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

Statistical Recursive Searching(SRS)

25

Use samples to recursively reduce the config space to focus

0

5

10

15

20

25

30

35

40

45

0 500 1000 1500 2000 2500

BerkeleyDBC

Configurations sorted by performance

Per

form

ance

PS32K, HAVE_HASHPS32K, ¬HAVE_HASH

PS16K, HAVE_HASHPS16K, ¬HAVE_HASH

PS8K, ¬ HAVE_HASHPS8K, HAVE_HASH

PS4K, ¬ HAVE_HASHPS4K, HAVE_HASH

PS1K, ¬ HAVE_HASHPS1K, HAVE_HASH

¬ HAVE_CRYPTOHAVE_CRYPTO

Page 26: Finding Near-Optimal Configurations in Product Lines by ... · Finding Near-Optimal Configurations in Product Lines by Random Sampling Jeho Oh 1 ... but cheap and light as possible

1. Random sample from config space

2. From best 2 configs, get common decisions 𝑑

3. For each common decision 𝑑, measure:

𝛿𝑑 = Avg. perf. of samples with 𝑑

𝛿¬𝑑 = Avg. perf. of samples without 𝑑

Δ𝑑 = 𝛿𝑑 − 𝛿¬𝑑

4. If a Δ𝑑 indicates performance improvement,

perform Welch’s T-test to evaluate its certainty

5. Use features certain to improve performance

to constrain the configuration space to recurse

Finding Features to Recurse

26

Decisions to recurse: 𝑓1

𝜙 = 𝐹𝑀

𝑐2 𝑐1Ω0

𝑐2 = {𝑓1, 𝑓2, ¬𝑓3, 𝑓4, ¬𝑓5, 𝑓7, ¬𝑓8}

𝑐1 = {𝑓1, 𝑓2, 𝑓3, ¬𝑓4, 𝑓5, 𝑓7, ¬𝑓8}

𝑑 = {𝑓1, 𝑓2, 𝑓7, ¬𝑓8}

Common Decisions 𝑓1 𝑓2 𝑓7 ¬𝑓8

Δ𝑑 improves perf. Yes No No Yes

𝜙 = 𝐹𝑀 ∧ 𝑓1Ω0

Welch’s T-test Pass - - Fail