web search using mobile cores

38
Web Search Using Mobile Cores Quantifying and Mitigating the Price of Efficiency Vijay Janapa Reddi Engineering & Applied Science Harvard University Benjamin Lee Electrical Engineering Stanford University Trishul Chilimbi Runtime Analysis & Design Microsoft Research Kushagra Vaid Global Foundation Services Microsoft Corporation B.C. Lee 1

Upload: others

Post on 03-Feb-2022

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Web Search Using Mobile Cores

Web Search Using Mobile CoresQuantifying and Mitigating the Price of Efficiency

Vijay Janapa ReddiEngineering & Applied Science

Harvard University

Benjamin LeeElectrical Engineering

Stanford University

Trishul ChilimbiRuntime Analysis & Design

Microsoft Research

Kushagra VaidGlobal Foundation Services

Microsoft Corporation

B.C. Lee 1

Page 2: Web Search Using Mobile Cores

Conventional Wisdom

◦ Moore’s Law provides transistors◦ Simple cores improve energy efficiency

◦ Parallelism recovers lost performance

B.C. Lee 2

Page 3: Web Search Using Mobile Cores

Simple Cores

◦ Pursue aggregate throughput, energy efficiency◦ Assume task parallelism

◦ Assume latency tolerance

B.C. Lee 3

Page 4: Web Search Using Mobile Cores

Data Deluge

◦ Data centers are not keeping up

◦ Data is not information, value

“Data, data everywhere.” The Economist, 2010.

B.C. Lee 4

Page 5: Web Search Using Mobile Cores

Value from Data

• Statistical Inference◦ Draw useful information from free data

◦ Useful information applies statistical inference

◦ Free data drawn from Internet’s webpages

• Ex: Language Translation◦ Statistical inference trumps linguistic structure

◦ Early 90’s, IBM French/English with O(1E+6) documents

◦ Presently, Google Translate with O(1E+9) documents

“Clicking for gold.” The Economist, 2010.

B.C. Lee 5

Page 6: Web Search Using Mobile Cores

Computational Intensity◦ Microsoft Bing ranks pages with neural network

◦ RMS foreshadows future analytic workloads

B.C. Lee 6

Page 7: Web Search Using Mobile Cores

Applications in Transition

• Conventional Enterprise◦ Process independent requests

◦ Exhibit high memory, I/O intensity

◦ Ex: web, database, Java, mail, file servers

• Emerging Cloud◦ Extract information, value from data

◦ Exhibit high compute intensity

◦ Ex: analytics, machine learning

B.C. Lee 7

Page 8: Web Search Using Mobile Cores

Cloud Efficiency

• Challenges◦ Migrate computation, data to cloud

◦ Choose efficient components

◦ Understand application, component interaction

• Case Study◦ Mobile cores for efficiency, parallelism for performance?

◦ Achieve efficiency with mobile cores (Intel Atom)

◦ Quantify price of efficiency (Microsoft Bing)

B.C. Lee 8

Page 9: Web Search Using Mobile Cores

EfficiencyAtom is more energy, cost efficient than Xeon

Price of EfficiencyAtom limitations impact robustness, flexibility, reliability

Mitigating Price of EfficiencyAtom strategy spans architecture, system design

B.C. Lee 9

Page 10: Web Search Using Mobile Cores

EfficiencyAtom is more energy, cost efficient than Xeon

Price of EfficiencyAtom limitations impact robustness, flexibility, reliability

Mitigating Price of EfficiencyAtom strategy spans architecture, system design

B.C. Lee 9

Page 11: Web Search Using Mobile Cores

Search Architecture

◦ Rank pages using neural network

◦ Deploy on server (Xeon), mobile (Atom) processors

B.C. Lee 10

Page 12: Web Search Using Mobile Cores

Processor Architecture

• Server-class◦ Xeon-Harpertown

◦ Out-of-order, 4-issue

◦ Power efficiency from process (45nm, L5420)

• Mobile-class◦ Atom-Diamondville

◦ In-order, 2-issue

◦ Power efficiency from microarchitecture

B.C. Lee 11

Page 13: Web Search Using Mobile Cores

Processor Activity

◦ Measure Xeon activity with hardware counters

◦ 44 percent of cycles retire instructions

B.C. Lee 12

Page 14: Web Search Using Mobile Cores

Processor Power

◦ Measure processor power at voltage regulator

◦ Compare Xeon (15W per core) and Atom (1.5W per core)

B.C. Lee 13

Page 15: Web Search Using Mobile Cores

Processor Efficiency◦ Maximize QPS subject to QoS target

◦ Demonstrate Atom energy, cost efficiency

B.C. Lee 14

Page 16: Web Search Using Mobile Cores

EfficiencyAtom is more energy, cost efficient than Xeon

Price of EfficiencyAtom limitations impact robustness, flexibility, reliability

Mitigating Price of EfficiencyAtom strategy spans architecture, system design

B.C. Lee 15

Page 17: Web Search Using Mobile Cores

Challenge of Simple Cores

◦ Measure Atom activity with hardware counters

◦ 3× CPI increase from µarch bottlenecks

B.C. Lee 16

Page 18: Web Search Using Mobile Cores

Price of Efficiency

• Robustness◦ Quality-of-service in throughput, latency

◦ Relevance in query responses

• Flexibility◦ Increase in query complexity

◦ Increase in absolute load

• Reliability◦ Tolerance to fail-over nodes

◦ Increase in relative load

B.C. Lee 17

Page 19: Web Search Using Mobile Cores

Robustness :: Sustainable Throughput

◦ Maximize QPS without violating QoS

◦ Xeon sustains higher throughput (56% per core)

B.C. Lee 18

Page 20: Web Search Using Mobile Cores

Robustness :: Cut-off Latency

◦ Refine results subject to cut-off latency

◦ Atom per-query latency 3× that of Xeon

B.C. Lee 19

Page 21: Web Search Using Mobile Cores

Robustness :: Relevance

◦ Consider choice, ordering of results

◦ Atom results differ 1-3% from those of Xeon

B.C. Lee 20

Page 22: Web Search Using Mobile Cores

Flexibility :: Query Complexity

◦ Split complex queries at service node

◦ Complexity increases absolute load

B.C. Lee 21

Page 23: Web Search Using Mobile Cores

Flexibility :: Latency Variance

◦ Atom increases latency average (µ)

◦ Atom increases latency variance (σ2)

B.C. Lee 22

Page 24: Web Search Using Mobile Cores

Reliability :: Fail-Over

◦ Fail-over increases relative load

◦ Atom fail-overs marginally increase load

B.C. Lee 23

Page 25: Web Search Using Mobile Cores

Reliability :: Overheads

◦ Atom platforms unbalanced

◦ Over-provisioning inefficient

B.C. Lee 24

Page 26: Web Search Using Mobile Cores

EfficiencyAtom is more energy, cost efficient than Xeon

Price of EfficiencyAtom limitations impact robustness, flexibility, reliability

Mitigating Price of EfficiencyAtom strategy spans architecture, system design

B.C. Lee 25

Page 27: Web Search Using Mobile Cores

Mitigating Price of Efficiency

• Addressing Latency & Relevance◦ Address µarchitectural limitations

◦ Deploy heterogeneous servers

◦ Reduce latency at service nodes

• Addressing Flexibility◦ Integrate more cores per chip

◦ Re-engineer platform

◦ Provision more servers per data center

B.C. Lee 26

Page 28: Web Search Using Mobile Cores

Microarchitecture :: Enhancements

• Branch Predictor◦ Apply different strategies based on query, cut-off latency

◦ Predictor stalls when max branches in-flight

• Divider◦ Exercise divider with neural network

◦ Divider performs scalar ops with SIMD unit

• Cache◦ L2 cache misses increase by 14× w.r.t. Xeon

◦ L2 provisions 0.5MB per core

George et al. “Penryn...,” ASSCC, 2007; Gerosa et al. “A sub-1W to 2W...,” ISSCC, 2008.

B.C. Lee 27

Page 29: Web Search Using Mobile Cores

Architecture :: Heterogeneity

• Heterogeneous Servers◦ Deploy both Xeon and Atom servers

◦ Identify tasks requiring Xeon resources

◦ Provide both QoS and power efficiency

• Heterogeneous CMP/SoC◦ Deploy Atom servers

◦ Identify computation phases for custom hardware

◦ Mitigate µarch limitations

B.C. Lee 28

Page 30: Web Search Using Mobile Cores

Platform :: Integration

◦ Xeon: 4-core, 2-socket

◦ Atom: 2-core, 1-socket⇒ Hyp-Atom: 8-core, 2-socket

B.C. Lee 29

Page 31: Web Search Using Mobile Cores

Total Cost of Ownership (TCO)

◦ Pie slice shows breakdown of $

◦ Pie size shows throughput per $

Hamilton. “Cost of power in large-scale data centers,” perspectives.mdvirona.com, 2008.

B.C. Lee 30

Page 32: Web Search Using Mobile Cores

Case for Integration

◦ Platform overheads constrain server density

◦ 2-core Atom servers sustain fewer QPS

Hamilton. “Cost of power in large-scale data centers,” perspectives.mdvirona.com, 2008.

B.C. Lee 31

Page 33: Web Search Using Mobile Cores

Case for Integration

◦ More $’s to servers, fewer $’s to power

◦ 16-core Atom servers sustain more QPS

Hamilton. “Cost of power in large-scale data centers,” perspectives.mdvirona.com, 2008.

B.C. Lee 32

Page 34: Web Search Using Mobile Cores

Platform :: Virtualization

◦ SeaMicro: 8 Atom cores per chipset

◦ Virtualize to eliminate 90% of board components

Takahasi. “SeaMicro drops an atom bomb on the server industry.” venturebeat.com, 2010.

B.C. Lee 33

Page 35: Web Search Using Mobile Cores

EfficiencyAtom is more energy, cost efficient than Xeon

Price of EfficiencyAtom limitations impact robustness, flexibility, reliability

Mitigating Price of EfficiencyAtom strategy spans architecture, system design

B.C. Lee 34

Page 36: Web Search Using Mobile Cores

Also in the paper ...

• µarchitecture◦ Processor activity from hardware counters

◦ µarchitectural bottlenecks

• Search◦ Application phases in computation

◦ Execution time breakdown

• Reference◦ V.J. Reddi, B.C. Lee, T. Chilimbi, K. Vaid. “Web search

using mobile cores: Quantifying and mitigating theprice of efficiency.” ISCA-37: International Symposium onComputer Architecture 2010.

B.C. Lee 35

Page 37: Web Search Using Mobile Cores

Conclusion

• Emerging Cloud Applications◦ Extract value from data

◦ Increase compute intensity

• Energy Efficiency◦ Improve efficiency by 5× with mobile processors

◦ Exact price in robustness, flexibility, reliability

• Future Challenges◦ Pursue efficiency for balanced systems

◦ Consider specialization, heterogeneity, acceleration

B.C. Lee 36

Page 38: Web Search Using Mobile Cores

Web Search Using Mobile CoresQuantifying and Mitigating the Price of Efficiency

Vijay Janapa ReddiEngineering & Applied Science

Harvard University

Benjamin LeeElectrical Engineering

Stanford University

Trishul ChilimbiRuntime Analysis & Design

Microsoft Research

Kushagra VaidGlobal Foundation Services

Microsoft Corporation

B.C. Lee 37