web search using mobile cores
TRANSCRIPT
![Page 1: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/1.jpg)
Web Search Using Mobile CoresQuantifying and Mitigating the Price of Efficiency
Vijay Janapa ReddiEngineering & Applied Science
Harvard University
Benjamin LeeElectrical Engineering
Stanford University
Trishul ChilimbiRuntime Analysis & Design
Microsoft Research
Kushagra VaidGlobal Foundation Services
Microsoft Corporation
B.C. Lee 1
![Page 2: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/2.jpg)
Conventional Wisdom
◦ Moore’s Law provides transistors◦ Simple cores improve energy efficiency
◦ Parallelism recovers lost performance
B.C. Lee 2
![Page 3: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/3.jpg)
Simple Cores
◦ Pursue aggregate throughput, energy efficiency◦ Assume task parallelism
◦ Assume latency tolerance
B.C. Lee 3
![Page 4: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/4.jpg)
Data Deluge
◦ Data centers are not keeping up
◦ Data is not information, value
“Data, data everywhere.” The Economist, 2010.
B.C. Lee 4
![Page 5: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/5.jpg)
Value from Data
• Statistical Inference◦ Draw useful information from free data
◦ Useful information applies statistical inference
◦ Free data drawn from Internet’s webpages
• Ex: Language Translation◦ Statistical inference trumps linguistic structure
◦ Early 90’s, IBM French/English with O(1E+6) documents
◦ Presently, Google Translate with O(1E+9) documents
“Clicking for gold.” The Economist, 2010.
B.C. Lee 5
![Page 6: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/6.jpg)
Computational Intensity◦ Microsoft Bing ranks pages with neural network
◦ RMS foreshadows future analytic workloads
B.C. Lee 6
![Page 7: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/7.jpg)
Applications in Transition
• Conventional Enterprise◦ Process independent requests
◦ Exhibit high memory, I/O intensity
◦ Ex: web, database, Java, mail, file servers
• Emerging Cloud◦ Extract information, value from data
◦ Exhibit high compute intensity
◦ Ex: analytics, machine learning
B.C. Lee 7
![Page 8: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/8.jpg)
Cloud Efficiency
• Challenges◦ Migrate computation, data to cloud
◦ Choose efficient components
◦ Understand application, component interaction
• Case Study◦ Mobile cores for efficiency, parallelism for performance?
◦ Achieve efficiency with mobile cores (Intel Atom)
◦ Quantify price of efficiency (Microsoft Bing)
B.C. Lee 8
![Page 9: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/9.jpg)
EfficiencyAtom is more energy, cost efficient than Xeon
Price of EfficiencyAtom limitations impact robustness, flexibility, reliability
Mitigating Price of EfficiencyAtom strategy spans architecture, system design
B.C. Lee 9
![Page 10: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/10.jpg)
EfficiencyAtom is more energy, cost efficient than Xeon
Price of EfficiencyAtom limitations impact robustness, flexibility, reliability
Mitigating Price of EfficiencyAtom strategy spans architecture, system design
B.C. Lee 9
![Page 11: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/11.jpg)
Search Architecture
◦ Rank pages using neural network
◦ Deploy on server (Xeon), mobile (Atom) processors
B.C. Lee 10
![Page 12: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/12.jpg)
Processor Architecture
• Server-class◦ Xeon-Harpertown
◦ Out-of-order, 4-issue
◦ Power efficiency from process (45nm, L5420)
• Mobile-class◦ Atom-Diamondville
◦ In-order, 2-issue
◦ Power efficiency from microarchitecture
B.C. Lee 11
![Page 13: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/13.jpg)
Processor Activity
◦ Measure Xeon activity with hardware counters
◦ 44 percent of cycles retire instructions
B.C. Lee 12
![Page 14: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/14.jpg)
Processor Power
◦ Measure processor power at voltage regulator
◦ Compare Xeon (15W per core) and Atom (1.5W per core)
B.C. Lee 13
![Page 15: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/15.jpg)
Processor Efficiency◦ Maximize QPS subject to QoS target
◦ Demonstrate Atom energy, cost efficiency
B.C. Lee 14
![Page 16: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/16.jpg)
EfficiencyAtom is more energy, cost efficient than Xeon
Price of EfficiencyAtom limitations impact robustness, flexibility, reliability
Mitigating Price of EfficiencyAtom strategy spans architecture, system design
B.C. Lee 15
![Page 17: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/17.jpg)
Challenge of Simple Cores
◦ Measure Atom activity with hardware counters
◦ 3× CPI increase from µarch bottlenecks
B.C. Lee 16
![Page 18: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/18.jpg)
Price of Efficiency
• Robustness◦ Quality-of-service in throughput, latency
◦ Relevance in query responses
• Flexibility◦ Increase in query complexity
◦ Increase in absolute load
• Reliability◦ Tolerance to fail-over nodes
◦ Increase in relative load
B.C. Lee 17
![Page 19: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/19.jpg)
Robustness :: Sustainable Throughput
◦ Maximize QPS without violating QoS
◦ Xeon sustains higher throughput (56% per core)
B.C. Lee 18
![Page 20: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/20.jpg)
Robustness :: Cut-off Latency
◦ Refine results subject to cut-off latency
◦ Atom per-query latency 3× that of Xeon
B.C. Lee 19
![Page 21: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/21.jpg)
Robustness :: Relevance
◦ Consider choice, ordering of results
◦ Atom results differ 1-3% from those of Xeon
B.C. Lee 20
![Page 22: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/22.jpg)
Flexibility :: Query Complexity
◦ Split complex queries at service node
◦ Complexity increases absolute load
B.C. Lee 21
![Page 23: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/23.jpg)
Flexibility :: Latency Variance
◦ Atom increases latency average (µ)
◦ Atom increases latency variance (σ2)
B.C. Lee 22
![Page 24: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/24.jpg)
Reliability :: Fail-Over
◦ Fail-over increases relative load
◦ Atom fail-overs marginally increase load
B.C. Lee 23
![Page 25: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/25.jpg)
Reliability :: Overheads
◦ Atom platforms unbalanced
◦ Over-provisioning inefficient
B.C. Lee 24
![Page 26: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/26.jpg)
EfficiencyAtom is more energy, cost efficient than Xeon
Price of EfficiencyAtom limitations impact robustness, flexibility, reliability
Mitigating Price of EfficiencyAtom strategy spans architecture, system design
B.C. Lee 25
![Page 27: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/27.jpg)
Mitigating Price of Efficiency
• Addressing Latency & Relevance◦ Address µarchitectural limitations
◦ Deploy heterogeneous servers
◦ Reduce latency at service nodes
• Addressing Flexibility◦ Integrate more cores per chip
◦ Re-engineer platform
◦ Provision more servers per data center
B.C. Lee 26
![Page 28: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/28.jpg)
Microarchitecture :: Enhancements
• Branch Predictor◦ Apply different strategies based on query, cut-off latency
◦ Predictor stalls when max branches in-flight
• Divider◦ Exercise divider with neural network
◦ Divider performs scalar ops with SIMD unit
• Cache◦ L2 cache misses increase by 14× w.r.t. Xeon
◦ L2 provisions 0.5MB per core
George et al. “Penryn...,” ASSCC, 2007; Gerosa et al. “A sub-1W to 2W...,” ISSCC, 2008.
B.C. Lee 27
![Page 29: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/29.jpg)
Architecture :: Heterogeneity
• Heterogeneous Servers◦ Deploy both Xeon and Atom servers
◦ Identify tasks requiring Xeon resources
◦ Provide both QoS and power efficiency
• Heterogeneous CMP/SoC◦ Deploy Atom servers
◦ Identify computation phases for custom hardware
◦ Mitigate µarch limitations
B.C. Lee 28
![Page 30: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/30.jpg)
Platform :: Integration
◦ Xeon: 4-core, 2-socket
◦ Atom: 2-core, 1-socket⇒ Hyp-Atom: 8-core, 2-socket
B.C. Lee 29
![Page 31: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/31.jpg)
Total Cost of Ownership (TCO)
◦ Pie slice shows breakdown of $
◦ Pie size shows throughput per $
Hamilton. “Cost of power in large-scale data centers,” perspectives.mdvirona.com, 2008.
B.C. Lee 30
![Page 32: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/32.jpg)
Case for Integration
◦ Platform overheads constrain server density
◦ 2-core Atom servers sustain fewer QPS
Hamilton. “Cost of power in large-scale data centers,” perspectives.mdvirona.com, 2008.
B.C. Lee 31
![Page 33: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/33.jpg)
Case for Integration
◦ More $’s to servers, fewer $’s to power
◦ 16-core Atom servers sustain more QPS
Hamilton. “Cost of power in large-scale data centers,” perspectives.mdvirona.com, 2008.
B.C. Lee 32
![Page 34: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/34.jpg)
Platform :: Virtualization
◦ SeaMicro: 8 Atom cores per chipset
◦ Virtualize to eliminate 90% of board components
Takahasi. “SeaMicro drops an atom bomb on the server industry.” venturebeat.com, 2010.
B.C. Lee 33
![Page 35: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/35.jpg)
EfficiencyAtom is more energy, cost efficient than Xeon
Price of EfficiencyAtom limitations impact robustness, flexibility, reliability
Mitigating Price of EfficiencyAtom strategy spans architecture, system design
B.C. Lee 34
![Page 36: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/36.jpg)
Also in the paper ...
• µarchitecture◦ Processor activity from hardware counters
◦ µarchitectural bottlenecks
• Search◦ Application phases in computation
◦ Execution time breakdown
• Reference◦ V.J. Reddi, B.C. Lee, T. Chilimbi, K. Vaid. “Web search
using mobile cores: Quantifying and mitigating theprice of efficiency.” ISCA-37: International Symposium onComputer Architecture 2010.
B.C. Lee 35
![Page 37: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/37.jpg)
Conclusion
• Emerging Cloud Applications◦ Extract value from data
◦ Increase compute intensity
• Energy Efficiency◦ Improve efficiency by 5× with mobile processors
◦ Exact price in robustness, flexibility, reliability
• Future Challenges◦ Pursue efficiency for balanced systems
◦ Consider specialization, heterogeneity, acceleration
B.C. Lee 36
![Page 38: Web Search Using Mobile Cores](https://reader031.vdocuments.us/reader031/viewer/2022020704/61fb65c72e268c58cd5db0b5/html5/thumbnails/38.jpg)
Web Search Using Mobile CoresQuantifying and Mitigating the Price of Efficiency
Vijay Janapa ReddiEngineering & Applied Science
Harvard University
Benjamin LeeElectrical Engineering
Stanford University
Trishul ChilimbiRuntime Analysis & Design
Microsoft Research
Kushagra VaidGlobal Foundation Services
Microsoft Corporation
B.C. Lee 37