
1

Adaptive Parallelism for Web Search

Myeongjae Jeon, Rice University

In collaboration with Yuxiong He (MSR), Sameh Elnikety (MSR), Alan L. Cox (Rice), and Scott Rixner (Rice)

2

Performance of Web Search

1) Query response time
– Answer users quickly
– Reduce both mean and high-percentile latency

2) Response quality (relevance)
– Provide highly relevant web pages
– Quality improves with the resources and time spent

3

How Microsoft Bing Works

– All web pages are partitioned across index servers
– Distributed query processing (embarrassingly parallel)
– Aggregate top K relevant pages

[Diagram: a query goes to an aggregator, which fans it out to index servers, each holding one partition of all web pages; every index server returns its top K pages and the aggregator merges them into the final top K.]
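The partition/aggregate flow above can be sketched in Python; the scoring function and partition contents here are hypothetical stand-ins, not Bing's:

```python
import heapq

def score(doc, query):
    # Hypothetical relevance score: occurrences of query terms in the document.
    return sum(doc.count(term) for term in query.split())

def search_partition(partition, query, k):
    """Each index server scores only its own partition and returns its top k."""
    scored = ((score(doc, query), doc) for doc in partition)
    return heapq.nlargest(k, scored)

def aggregate(partitions, query, k):
    """The aggregator merges the per-partition top-k lists into a global top k.
    (In the real system the per-partition searches run in parallel.)"""
    candidates = []
    for partition in partitions:
        candidates.extend(search_partition(partition, query, k))
    return heapq.nlargest(k, candidates)
```

Because each server returns only k candidates, the merge step at the aggregator stays cheap regardless of how many pages each partition holds.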

4

Our Work

• Multicore index server
– Multiple queries executed concurrently
– Each query executed sequentially

• Optimization opportunity
– Low CPU utilization
– Many idle cores

• Contributions
– Parallelize a single query
– Reduce response times

[Diagram: an index server with queries 1–3 executing concurrently.]

5

Outline

1. Query parallelism
– Run a query with multiple threads

2. Adaptive parallelism
– Select the degree of parallelism per query

3. Evaluation

6

Query Processing and Early Termination

• Processing the "Not evaluated" part is useless
– Unlikely to contribute to the top K relevant results

[Diagram: an inverted index for the term "EuroSys" over web documents Doc 1 … Doc N, sorted by static rank from highest to lowest; a prefix of the list is processed and the low-rank tail is not evaluated.]
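The early-termination idea can be sketched as follows; the per-document bound `max_remaining_score` is a hypothetical stand-in for whatever termination test the real engine uses:

```python
import heapq

def early_terminated_scan(postings, k, score):
    """Scan documents in static-rank order (highest first), keeping a top-k
    min-heap. Stop once even the best possible score of the remaining
    documents cannot displace the current k-th best result."""
    top_k = []  # min-heap of (score, doc_id)
    for doc_id, max_remaining_score in postings:
        if len(top_k) == k and max_remaining_score <= top_k[0][0]:
            break  # nothing later can enter the top k
        s = score(doc_id)
        if len(top_k) < k:
            heapq.heappush(top_k, (s, doc_id))
        elif s > top_k[0][0]:
            heapq.heapreplace(top_k, (s, doc_id))
    return sorted(top_k, reverse=True)
```

Because documents are sorted by static rank, the bound shrinks as the scan proceeds, so the tail of the list is skipped entirely.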

7

Assigning Data to Threads

• Purpose: keep data processing similar to sequential execution

[Diagram: web documents sorted from highest to lowest rank; sequential execution scans them in order.]

• Key approach: dynamically alternate threads over small chunks

[Diagram: the sorted documents split into eight small chunks (1–8), claimed by threads T1 and T2 in an interleaved order.]
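A minimal sketch of dynamic alternation over small chunks, assuming a shared chunk counter guarded by a lock (the slides do not show the actual implementation):

```python
import threading

def parallel_scan(docs, num_threads, chunk_size, process):
    """Split the rank-sorted document list into small chunks and let threads
    claim chunks dynamically from a shared counter. High-rank chunks are
    claimed first, so processing order stays close to sequential execution.
    `process` is a hypothetical per-document callback."""
    n_chunks = (len(docs) + chunk_size - 1) // chunk_size
    counter = {"next": 0}
    lock = threading.Lock()

    def worker():
        while True:
            with lock:  # atomically claim the next chunk
                chunk_id = counter["next"]
                counter["next"] += 1
            if chunk_id >= n_chunks:
                return
            start = chunk_id * chunk_size
            for doc in docs[start:start + chunk_size]:
                process(doc)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Small chunks keep the threads balanced even when per-document cost varies, at the price of more frequent counter updates.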

8

Share Thread Info Using Global Heap

• Share information to reduce wasted execution
• Use a global heap, updated asynchronously
• Small synchronization overhead via batched updates

[Diagram: threads 1 and 2 each maintain a local top k and periodically sync it with a global heap holding the global top K.]
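The batched global-heap update might look like this sketch; the class and method names are invented for illustration:

```python
import heapq
import threading

class GlobalTopK:
    """Global top-K shared across threads. Each thread accumulates a local
    top-k and merges it into the global heap only periodically, so the lock
    is taken once per batch rather than once per document."""
    def __init__(self, k):
        self.k = k
        self.heap = []  # min-heap of scores
        self.lock = threading.Lock()

    def sync(self, local_heap):
        """Merge a thread's local results under one lock; return the current
        global threshold (k-th best score), or None if fewer than k so far."""
        with self.lock:
            for item in local_heap:
                if len(self.heap) < self.k:
                    heapq.heappush(self.heap, item)
                elif item > self.heap[0]:
                    heapq.heapreplace(self.heap, item)
            return self.heap[0] if len(self.heap) == self.k else None
```

The returned threshold lets each thread skip documents that cannot enter the global top K, which is how sharing reduces wasted execution.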

9

Outline

1. Query parallelism
– Run a query with multiple threads

2. Adaptive parallelism
– Select the degree of parallelism per query

3. Evaluation

10

Key Ideas

• System load
– Parallelize a query at light load
– Execute a query sequentially at heavy load

• Speedup
– Parallelize a query more aggressively if parallelism shows a better speedup

11

No Load vs. Heavy Load

[Diagram: at no load, a single query can spread across the idle cores; at heavy load, queries 1–6 contend for the server and each should run sequentially.]

12

No Speedup vs. Linear Speedup

[Charts: execution time and speedup for 1–6 threads in two idealized cases. With no speedup, execution time stays at T1 and speedup stays at 1; with linear speedup, execution time falls from T1 to T1/6 and speedup rises linearly from 1 to 6.]

13

Speedup in Reality

• Real queries mostly show neither zero speedup nor linear speedup

[Charts: measured execution time (ms) and speedup for 1–6 threads; Bing's speedup curve lies between the no-speedup and linear-speedup lines.]

14

Adaptive Algorithm

• Decide the parallelism degree at runtime
– Pick the degree p that minimizes response time:

min_p ( T_p + K · T_p · p / N )

– T_p: my execution time with parallelism degree p
– K · T_p · p / N: latency impact on waiting queries, where K is the system load (queue length) and N is assumed to be the number of cores, so each of the K queued queries is delayed by the p/N share of the machine this query occupies for T_p
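The selection rule can be written directly; `t_est`, a table of estimated execution times T_p per candidate degree, is assumed to come from a predictor in the real system:

```python
def pick_degree(t_est, k_waiting, n_cores):
    """Pick the parallelism degree p minimizing T_p + K * T_p * p / N:
    this query's own execution time plus the delay it imposes on the K
    queued queries by occupying p of the N cores."""
    return min(t_est,
               key=lambda p: t_est[p] + k_waiting * t_est[p] * p / n_cores)

# With sublinear speedup, an idle server favors high parallelism,
# while a long queue pushes the choice back toward sequential execution:
t_est = {1: 60, 2: 33, 3: 24, 6: 15}  # hypothetical T_p estimates (ms)
```

At K = 0 the cost reduces to T_p alone, so the most parallel degree wins; as K grows, the p/N term dominates and lower degrees are chosen, matching the key ideas on slide 10.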

15

Experimental Setup

• Machine setup
– Two 6-core Xeon processors (2.27 GHz)
– 32 GB memory (22 GB dedicated to caching)
– 90 GB web index on SSD

• Workload
– 100K Bing user queries

• Experimental system
– Index server
– Client: replays the obtained queries with Poisson-distributed arrivals at a varying arrival rate (queries per second)
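Replaying queries as a Poisson process amounts to drawing exponential inter-arrival gaps; a small sketch of how the client's schedule could be generated:

```python
import random

def poisson_arrival_times(qps, num_queries, seed=0):
    """Generate arrival timestamps (seconds) for a Poisson process:
    inter-arrival gaps are exponential with mean 1/qps."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(num_queries):
        t += rng.expovariate(qps)
        times.append(t)
    return times
```

Sweeping `qps` reproduces the varying system load on the x-axis of the result charts.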

16

Mean Response Time - Fixed Parallelism -

• No fixed degree of parallelism performs well for all loads

[Chart: mean response time (ms, 30–100) vs. system load (10–100 QPS) for fixed parallelism degrees 1, 2, 3, and 6.]

17

Mean Response Time - Adaptive Parallelism -

[Chart: mean response time (ms) vs. system load (QPS) for fixed degrees 1, 2, 3, 6 and the adaptive policy.]

• The adaptive policy's mean response time is lower than every fixed degree's

18

Mean Response Time - Adaptive Parallelism -

[Chart: mean response time vs. system load, as on the previous slide, alongside a histogram of how often the adaptive policy selects each parallelism degree:]

Parallelism degree     1     2     3      4      5     6
Fraction selected (%)  0.53  7.09  40.88  51.04  0.33  0.14

– The policy may select any degree among all possible options
– Parallelism degrees are utilized unevenly to produce the best performance

19

Mean Response Time - Adaptive Parallelism -

• Much lower than sequential execution

[Chart: mean response time vs. system load for Degree = 1 and Adaptive; over the interesting load range, adaptive parallelism is about 47% lower.]

20

95th-Percentile Response Time

• Similar improvements in the 99th-percentile response time

[Chart: 95th-percentile response time (ms, 100–400) vs. system load (10–100 QPS) for degrees 1, 2, 3, 6 and Adaptive; adaptive is up to 52% lower.]

21

More Experimental Results

• Quality of responses
– Equivalent to sequential execution

• Importance of speedup
– Compared to using system load alone, using both system load and speedup yields up to 17% better mean response time

22

Conclusion

1. Effectively parallelize the search server
– Parallelize a single query with multiple threads
– Adaptively select the parallelism degree per query

2. We improve response time by:
– 47% on average, 52% at the 95th percentile
– With the same quality of responses

3. In Bing next week!