
CS 152 Computer Architecture and Engineering, Spring 2016
Lecture 20: Datacenters (4/20/2016)

Dr. George Michelogiannakis
EECS, University of California at Berkeley
CRD, Lawrence Berkeley National Laboratory
http://inst.eecs.berkeley.edu/~cs152

Thanks to Christina Delimitrou, Christos Kozyrakis

Administrivia

§ PS5 due NOW

§ Quiz 5 on Wednesday next week

§ Please show up on Monday April 21st (last lecture)
  – Neuromorphic, quantum
  – Parting thoughts that have nothing to do with architecture
  – Class evaluation


What Is A Datacenter?

§ The compute infrastructure for internet-scale services & cloud computing
  – 10,000s of servers, 100,000s of hard disks
  – Examples: Google, Facebook, Microsoft, Amazon (+ Amazon Web Services), Twitter, Yahoo, …

§ Both consumer and enterprise services
  – Windows Live, Gmail, Hotmail, Dropbox, bing, Google, Adcenter, Google Apps, Web apps, Exchange online, salesforce.com, Azure, …


Other Definitions

§ A centralized repository for the storage, management, and dissemination of data and information pertaining to a particular business or service

§ Datacenters involve large quantities of data and their processing

§ Largely made up of commodity components


Components

§ Apart from computers & network switches, you need:
  – Power infrastructure: voltage converters and regulators, generators and UPSs, …
  – Cooling infrastructure: A/C, cooling towers, heat exchangers, air impellers, …

§ Everything is co-designed!


Example: MS Quincy Datacenter

§ 470k sq feet (10 football fields)
§ Next to a hydro-electric generation plant
  – At up to 40 MegaWatts, $0.02/kWh is better than $0.15/kWh :)
  – That's equal to the power consumption of 30,000 homes


Example: MS Chicago Datacenter

[Photo: Microsoft's Chicago Data Center; K. Vaid, Microsoft Global Foundation Services, HotPower '10, Oct 3, 2010]

Google's Datacenter Locations

Applications That Use Datacenters

§ Storage
  – Large and small files (e.g., phone contacts)

§ Search engines

§ Compute time rental & web hosting
  – Amazon EC2 – virtual server hosting

§ Cloud gaming
  – File or video streaming

Why Is Cloud Gaming Possible?

The Inside

Example: FB Datacenter Racks

Storage Hierarchy

Commodity Hardware

[Figure: 2-socket server, low-latency 10GbE switch, 10GbE NIC, flash storage, JBOD disk array]


Basic Unit: 2-socket Server

§ 1-2 multi-core chips
§ 8-16 DRAM DIMMs
§ 1-2 ethernet ports
  – 10Gbps or higher
§ Storage
  – Internal SATA/SAS disks (2-6)
  – External storage expansion
§ Configuration/size vary
  – Depending on tier role
  – 1U-2U (1U = 1.75 inches)

[The slide reproduces an excerpt from the Open Compute Project motherboard specification:]

5 Efficient Performance Motherboard Features

5.1 Block Diagram. Figure 5 illustrates the functional block diagram of the efficient performance motherboard.

5.2 Placement and Form Factor. The motherboard's form factor is 6.5x20 inches. Figure 6 illustrates board placement. The placement shows the relative positions of key components, while exact dimension and position information is available in the mechanical DXF file. Form factor, PCIe slot position, front IO port


Example: FB 2-socket Server

§ Characteristics
  – Upgradable CPUs & memory, boot on LAN, external PCIe link, feature reduced
  – Similar design for AMD servers (why?)

[Further Open Compute Project specification excerpts shown on the slide:]

Figure 6 Efficient Performance Board Placement
Figure 25 Open Compute Project Server Chassis for Intel Motherboards

11.1 Fixed Locations. Refer to the mechanical DXF file for fixed locations of the mounting hole, PCIe x16 slot, and power connector.

11.2 PCB Thickness. To ensure proper alignment of the FCI power connector and mounting within the mechanical enclosure, the boards should follow the PCB stackups described in sections 4.5 and 5.5 respectively and have 85mil (2.16mm) thickness. The mid-plane PCB thickness is also 85mil (2.16mm). The mezzanine card PCB thickness is 62mil (≈1.6mm).

Application Mapping (FB Example)

[Figure: Facebook cluster architecture. A front-end cluster (Web: 250 racks, Ads: 30 racks, Multifeed: 9 racks, Cache: ~144TB, other small services), a service cluster (Search, Photos, Msg, others), and a back-end cluster (UDB, ADS-DB, Tao Leader)]

Servers Used for a FB Request

[Figure: "Life of a hit". The request starts at the front end (Web server, with lookups to memcached (MC) and Ads), fans out to the back end (database, leaf servers (L), feed aggregator), and completes once all responses have returned]

What Server Should We Use in a Datacenter?

§ Many options
  – 1-socket server
  – 2-socket server
  – 4-socket server
  – …
  – 64-socket server
  – …

§ What are the issues to consider?

2 vs 4 vs 8 Sockets per Server

§ What is great about 2 vs 1 socket?
§ Why not 4 or 8 sockets then?

Performance Scaling of Internet-Scale Applications

§ Scaling analysis for Search & MapReduce at Microsoft
§ Any observations?

[IEEE Micro’11]

Performance Metrics

§ Completion time (e.g., how fast)
  – Of a certain operation

§ Availability

§ Power/energy

§ Total cost of ownership (TCO)

Power Usage Effectiveness

§ PUE = Total datacenter power / IT equipment power
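A minimal sketch of this definition (the kW readings below are made-up example numbers, not figures from the lecture):

    # PUE = total facility power / IT equipment power; 1.0 would be ideal.
    def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
        return total_facility_kw / it_equipment_kw

    # A facility drawing 1500 kW overall to deliver 1000 kW to IT equipment:
    print(pue(1500.0, 1000.0))  # 1.5 -> 0.5 W of overhead per watt of IT power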


Total Cost of Ownership (TCO)

§ Capital expenses
  – Land, building, generators, air conditioning, computing equipment

§ Operating expenses
  – Electricity, repairs

§ Cost of unavailability


TCO Breakdown

§ Observations
  – >50% of cost in buying the hardware
  – ~30% of costs related to power
  – Networking ~10% of overall costs (including cost for servers)

[Pie chart: Servers 61%, Energy 16%, Cooling 14%, Networking 6%, Other 3%]

TCO Breakdown (2)

Cost Analysis

§ Cost model: a powerful tool for design tradeoffs
  – Evaluate "what-if" scenarios

§ E.g., can we reduce power cost with a different disk?

§ A 1TB disk uses 10W of power and costs $90. An alternate disk consumes only 5W, but costs $150. If you were the datacenter architect, what would you do?

Answer

§ A 1TB disk uses 10W of power and costs $90. An alternate disk consumes only 5W, but costs $150. If you were the datacenter architect, what would you do?

§ @ $2/Watt – even if we saved the entire 10W of power for the disk, we would save $20 per year. We are paying $60 more for the disk – probably not worth it.
  – What is this analysis missing?
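A minimal sketch of the arithmetic above, assuming (as the slide implies) that $2/Watt is an annualized cost of provisioned power:

    COST_PER_WATT_YEAR = 2.0                        # $/W per year (slide's figure)

    extra_capex = 150 - 90                          # low-power disk costs $60 more
    upper_bound_savings = COST_PER_WATT_YEAR * 10   # $20/yr if all 10 W vanished
    actual_savings = COST_PER_WATT_YEAR * (10 - 5)  # $10/yr for the 5 W actually saved

    payback_years = extra_capex / actual_savings    # 6 years, longer than most disks live
    print(upper_bound_savings, actual_savings, payback_years)   # 20.0 10.0 6.0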

Reliability & Availability

§ Common goal for services: 99.99% availability
  – ~1 hour of down-time per year

§ But with thousands of nodes, things will crash
  – Example: with 10K servers rated at 30 years of MTBF, you should expect to have ~1 failure per day
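Both numbers follow directly; a minimal sketch:

    HOURS_PER_YEAR = 24 * 365

    # 99.99% availability -> allowed downtime per year:
    downtime_hours = (1 - 0.9999) * HOURS_PER_YEAR   # ~0.88 hours (~53 minutes)

    # 10,000 servers, each rated at a 30-year MTBF -> expected failures per day:
    failures_per_day = 10_000 / (30 * 365)           # ~0.91, i.e., ~1 per day
    print(downtime_hours, failures_per_day)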

Reliability Challenges

Downtime Density

Sources of Outages

Robustness to Failures

§ Failover to other replicas/datacenters
§ Bad backend detection
  – Stop using it for live requests until behavior gets better
§ More aggressive load balancing when imbalance is more severe
§ Make your apps do something reasonable even if not all is right
  – Better to give users limited functionality than an error page

Consistency

§ Multiple datacenters implies dealing with consistency issues
  – Disconnected/partitioned operation relatively common, e.g., datacenter down for maintenance
  – Insisting on strong consistency likely undesirable
    • "We have your data but can't show it to you because one of the replicas is unavailable"
  – Most products with mutable state gravitating towards "eventual consistency" model
  – A bit harder to think about, but better from an availability standpoint

Performance/Availability Techniques in DCs

Technique                    Performance   Availability
Replication                       ✔             ✔
Partitioning (sharding)           ✔             ✔
Load-balancing                    ✔
Watchdog timers                                 ✔
Integrity checks                                ✔
App-specific compression          ✔
Eventual consistency              ✔             ✔

Characteristics of Internet-scale Services

§ Huge data sets, user sets, …

§ High request-level parallelism
  – Without much read-write sharing

§ High workload churn
  – New releases of code on a weekly basis

§ Require fault-free operation

Performance Metrics

§ Throughput
  – User requests per second (RPS)
  – Scale-out addresses this (more servers)

§ Quality of Service (QoS)
  – Latency of individual requests (90th, 95th, or 99th percentile)
  – Scale-out does not necessarily help

§ Interesting notes
  – The distribution matters, not just the averages
  – Optimizing throughput often hurts latency
    • And optimizing latency often hurts power consumption
  – In the end, it is RPS/$ within some QoS constraints
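A minimal sketch (with made-up latencies) of why the distribution matters more than the average: a rare slow path barely moves the mean but dominates the 99th percentile.

    # 99% of requests take 10 ms; 1% hit a 500 ms hiccup (GC, queueing, ...)
    latencies = sorted([10.0] * 99_000 + [500.0] * 1_000)

    mean = sum(latencies) / len(latencies)        # 14.9 ms -- looks healthy
    p50 = latencies[len(latencies) // 2]          # 10.0 ms
    p99 = latencies[int(0.99 * len(latencies))]   # 500.0 ms -- what tail users see
    print(mean, p50, p99)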

Tail At Scale

§ Larger clusters → more prone to high tail latency

The Tail at Scale. Jeffrey Dean, Luiz André Barroso. CACM, Vol. 56, No. 2, pages 74-80, 2013.
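Why larger clusters are more tail-prone: a request that fans out to n servers is only as fast as the slowest response. A minimal sketch:

    # P(at least one of n servers is slow) = 1 - (1 - p)^n
    def prob_request_hits_tail(p_server_slow: float, fanout: int) -> float:
        return 1 - (1 - p_server_slow) ** fanout

    # If just 1% of individual server responses are slow:
    for n in (1, 10, 100):
        print(n, prob_request_hits_tail(0.01, n))   # 0.01, ~0.10, ~0.63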

Resource Assignment Options

§ How do we assign resources to apps?
§ Two major options: private vs. shared assignment

Private Resource Assignment

§ Each app receives a private, static set of resources
§ Also known as static partitioning

[Figure: separate dedicated clusters for mysql, cassandra, Rails/nginx, hadoop, and memcached]

Shared Resource Assignment

§ Shared resources: flexibility → high utilization
  – Common case: user-facing services + analytics on the same servers
  – Also helps with failures, maintenance, and provisioning

[Figure: Rails/nginx, hadoop, and memcached sharing the same servers, raising utilization compared to static partitioning]

Shared Cluster Management

§ The manager schedules apps on shared resources
  – Apps request resource reservations (cores, DRAM, …)
  – Manager allocates and assigns specific resources
    • Considering performance, utilization, fault tolerance, priorities, …
    • Potentially, multiple apps on each server
  – Multiple manager architectures (see the Borg paper for an example)

[Figure: cluster-manager status quo. An application (or human) submits a specification to the cluster manager, which schedules work onto a master and slaves; the specification includes as much information as possible to assist the cluster manager in scheduling and execution. Mesos implements weighted DRF: masters can be configured with weights per role, and resource allocation decisions incorporate the weights to determine dominant fair shares]
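A minimal sketch of the weighted dominant-resource-fairness policy the figure mentions (capacities, roles, demands, and weights below are all illustrative, not from Mesos itself):

    # A role's dominant share is the largest fraction it holds of any single
    # resource, divided by its weight; offer resources to the smallest share.
    CAPACITY = {"cpus": 100.0, "mem_gb": 400.0}

    def dominant_share(alloc: dict, weight: float) -> float:
        return max(alloc[r] / CAPACITY[r] for r in CAPACITY) / weight

    roles = {
        "web":       {"alloc": {"cpus": 0.0, "mem_gb": 0.0},
                      "task":  {"cpus": 1.0, "mem_gb": 4.0}, "weight": 2.0},
        "analytics": {"alloc": {"cpus": 0.0, "mem_gb": 0.0},
                      "task":  {"cpus": 4.0, "mem_gb": 8.0}, "weight": 1.0},
    }

    while True:
        name = min(roles, key=lambda n: dominant_share(roles[n]["alloc"],
                                                       roles[n]["weight"]))
        role = roles[name]
        if any(role["alloc"][r] + role["task"][r] > CAPACITY[r] for r in CAPACITY):
            break   # simplification: stop once the neediest role no longer fits
        for r in CAPACITY:
            role["alloc"][r] += role["task"][r]

    print({n: r["alloc"] for n, r in roles.items()})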

Autoscaling

§ Monitor app performance or server load
  – [Chase'01, AWS AutoScale, Lim'10, Shen'11, Gandhi'12, …]

§ Adjust resources given to the app
  – Add or remove to meet the performance goal
  – Feedback-based control loop (a sketch follows below)
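A minimal sketch of such a control loop (thresholds, period, and both placeholder functions are hypothetical, not from any of the systems cited above):

    import time

    TARGET_P95_MS = 100.0                  # the performance goal

    def measure_p95_latency_ms() -> float:
        # Placeholder: in practice, read this from the monitoring system.
        return 120.0                       # dummy constant so the sketch runs

    def set_worker_count(n: int) -> None:
        # Placeholder: in practice, a resize request to the cluster manager.
        print(f"scaling to {n} workers")

    workers = 4
    for _ in range(3):                     # a real controller loops forever
        p95 = measure_p95_latency_ms()
        if p95 > 1.2 * TARGET_P95_MS:      # too slow -> scale out
            workers += 1
        elif p95 < 0.5 * TARGET_P95_MS and workers > 1:
            workers -= 1                   # overprovisioned -> scale in
        set_worker_count(workers)
        time.sleep(1)                      # dampen the loop to avoid oscillation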

[Figure: distributed-system anatomy. A load balancer in front of a coordinator that dispatches to workers (overlooking peer-to-peer distributed systems)]

Map+Reduce

§ Map:
  – Accepts an input key/value pair
  – Emits an intermediate key/value pair

§ Reduce:
  – Accepts an intermediate key/value* pair
  – Emits an output key/value pair

[Figure: very big data flows through Map tasks, a partitioning function, and Reduce tasks to produce the result]
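A minimal single-process word-count sketch of this contract (a real framework shards the input, shuffles by key, and runs the map and reduce tasks in parallel across workers):

    from collections import defaultdict

    def map_fn(doc_id: str, text: str):
        # Input key/value (doc id, text) -> intermediate (word, 1) pairs
        for word in text.split():
            yield word, 1

    def reduce_fn(word: str, counts: list):
        # Intermediate key/value* (word, [1, 1, ...]) -> output (word, total)
        yield word, sum(counts)

    docs = {"d1": "the quick fox", "d2": "the lazy dog"}

    intermediate = defaultdict(list)       # the "shuffle": group by key
    for doc_id, text in docs.items():
        for word, count in map_fn(doc_id, text):
            intermediate[word].append(count)

    result = dict(kv for word, counts in intermediate.items()
                     for kv in reduce_fn(word, counts))
    print(result)   # {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}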

Analytics Example: MapReduce

§ Single-tier architecture
  – Distributed FS, worker servers, coordinator
  – Disk-based or in-memory

§ Metric: throughput

[Figure credit: Paul Krzyzanowski]

Example 3-tier App: WebMail

§ May include thousands of machines, Petabytes of data, and billions of users

§ 1st tier: protocol processing
  – Typically stateless
  – Use a load balancer

§ 2nd tier: application logic
  – Often caches state from the 3rd tier

§ 3rd tier: data storage
  – Heavily stateful
  – Often includes the bulk of machines

[Figure: Load balancer → Front-end tier (HTTP, POP, IMAP, …) → Middle tier (mail delivery, user info, stats, …) → Back-end tier (mail metadata & files)]

Example: Social Networking

§ 3-tier system
  – Web server, fast user data storage, persistent storage
  – 2nd tier: latency-critical, large number of servers

§ User data storage
  – Using memcached for distributed caching
  – 10s of TBytes in memory (Facebook: 150TB)
  – Sharded and replicated across many servers (a sketch follows below)
  – Read/write (unlike search), bulk is read-dominated

§ From in-memory caching to in-memory FS
  – RAMCloud@Stanford, Sinfonia@HP, …

[Figure: Front-end tier (HTTP, presentation, …) → Middle tier (memcached or in-memory FS) → Back-end tier (persistent DB)]
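A minimal sketch of hash-based sharding with replication in the spirit of such a memcached tier (hostnames and replica count are hypothetical; production systems typically use consistent hashing so that adding servers moves few keys):

    import hashlib

    SERVERS = [f"mc{i:02d}.example.com" for i in range(8)]   # hypothetical hosts
    REPLICAS = 2

    def shards_for(key: str) -> list:
        # Hash the key to pick a primary shard, plus replicas on its neighbors
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        primary = h % len(SERVERS)
        return [SERVERS[(primary + i) % len(SERVERS)] for i in range(REPLICAS)]

    print(shards_for("user:31459:profile"))   # a primary and its replica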

Acknowledgements

§ Christos Kozyrakis, Christina Delimitrou
  – Based on EE282 @ Stanford

§ "Designs, Lessons, and Advice from Building Large Distributed Systems" by Jeff Dean, Google Fellow

§ "A thousand chickens or two oxen? Part 1: TCO" by Isabel Martin

§ "UPS as a service could be an economic breakthrough" by Mark Monroe

§ "Take a close look at MapReduce" by Xuanhua Shi

Reducing Tail Latency

§ Reduce queuing (reduce head-of-line blocking)
§ Separate different types of requests
§ Coordinate background activities
§ Hedged requests to replicas (a sketch follows after this list)
§ Tied requests to replicas
§ Micro-sharding & selective replication
§ Latency-induced probation, canary requests

§ See The Tail at Scale paper for details
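A minimal asyncio sketch of a hedged request: send to one replica, and if it has not answered within a small delay (e.g., the observed 95th-percentile latency), send a second copy and take whichever finishes first. fetch() and the replica names are hypothetical stand-ins for a real RPC:

    import asyncio

    HEDGE_DELAY_S = 0.010                  # wait ~p95 latency before hedging

    async def fetch(replica: str, key: str):
        # Placeholder for a real RPC to one replica
        ...

    async def hedged_get(key: str, replicas: list):
        first = asyncio.create_task(fetch(replicas[0], key))
        done, _ = await asyncio.wait({first}, timeout=HEDGE_DELAY_S)
        if done:
            return first.result()          # common case: no hedge needed
        second = asyncio.create_task(fetch(replicas[1], key))
        done, pending = await asyncio.wait({first, second},
                                           return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()                  # give up on the slower copy
        return done.pop().result()

    # usage: asyncio.run(hedged_get("user:42", ["replica-a", "replica-b"]))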