Introduction
Readings
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, L. Barroso and U. Hölzle
Introduction
• Increasingly, our applications are moving from the PC to the Internet, e.g.:
  • Email: Gmail, Yahoo
  • Photo management: Picasa, Kodak, Sutterbug
  • Word processing: Google Apps
• Why?
  • Less work on the user’s part
  • Potentially lower cost for the user
Introduction
• Supporting this move from the PC to the “Internet” requires a large number of servers, storage, network support, etc.
  • Companies like Amazon, Google, and eBay run data centers with tens of thousands of machines
• For users to trust these systems, a number of issues must be addressed, e.g., failure handling
Architecture
• Common elements include:
  • Low-end servers, typically in a blade enclosure within a rack
  • Servers within a rack are interconnected by a local Ethernet switch (rack switch)
  • The rack switch has a number of uplink connections to one or more cluster-level (data-center-level) Ethernet switches
Storage
• Disks can be connected directly to each server and managed by a global distributed file system (e.g., Google’s GFS); or
• Disks can be part of Network Attached Storage (NAS) devices that are directly connected to the cluster-level switch
Storage
• NAS
  • Reliability is provided by the device through replication and error-correcting codes
• Server node (disks attached directly to each server)
  • Needs a fault-tolerant file system at the cluster level, which is not trivial to implement
    • Writes are slower
  • Potentially lower cost than using NAS
    • Disks can be the same as what is on your PC
Storage Hierarchy
Networking Fabric
• Tradeoffs between speed, scale, and cost
• Intra-rack connectivity is relatively inexpensive to achieve
• Network switches with high port counts have a different price structure than switches used for rack connectivity
  • Much more expensive
• Using network switches with fewer ports leaves inter-rack bandwidth scarce, and programmers must be aware of that scarcity
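The scarcity of bandwidth out of a rack can be made concrete with a back-of-the-envelope calculation. The following is a minimal sketch; the rack size, link speeds, and uplink count are illustrative assumptions, not figures from these slides.

```python
def oversubscription_ratio(servers_per_rack, server_link_gbps, uplinks, uplink_gbps):
    """Ratio of the bandwidth servers can offer to the bandwidth leaving the rack."""
    intra_rack_gbps = servers_per_rack * server_link_gbps
    inter_rack_gbps = uplinks * uplink_gbps
    return intra_rack_gbps / inter_rack_gbps


def per_server_off_rack_gbps(servers_per_rack, uplinks, uplink_gbps):
    """Fair share of uplink bandwidth if every server talks off-rack at once."""
    return uplinks * uplink_gbps / servers_per_rack


# Hypothetical rack: 40 servers on 1 Gbps links, 4 uplinks of 1 Gbps each.
ratio = oversubscription_ratio(40, 1.0, 4, 1.0)
share = per_server_off_rack_gbps(40, 4, 1.0)
print(f"oversubscription {ratio:.0f}:1, off-rack share {share:.2f} Gbps per server")
```

With these assumed numbers, a server that enjoys its full link speed inside the rack gets only a tenth of it when all servers communicate across racks, which is why data placement matters to the programmer.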
Latency, Bandwidth, Capacity
• It is much faster for an application to retrieve data from local disks than from off-rack disks, but
• Applications often need more storage than is found on a local disk (e.g., Google search)
• How is this dealt with efficiently?
Power Usage
• Peak power usage measured at one of Google’s data centers:
  • Networking: 5%
  • CPUs: 33%
  • Disks: 10%
  • DRAM: 30%
  • Other: 22%
Handling Failures
• The high number of components almost guarantees failures
  • Disk drives can exhibit annualized failure rates higher than 4%
  • Lots of restarts needed
• This issue has received a good deal of attention
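To see why failures are close to guaranteed at this scale, the 4% annualized failure rate can be turned into expected counts. This is a rough sketch: the disk count is a hypothetical value, and the model assumes disks fail independently at a constant rate, which real fleets do not strictly obey.

```python
def expected_failures_per_year(num_disks, afr):
    """Expected number of disk failures in one year."""
    return num_disks * afr


def prob_at_least_one_failure(num_disks, afr, days):
    """Probability that at least one disk fails within `days`.

    Assumes independent failures and a constant failure rate.
    """
    p_one_disk_survives = (1.0 - afr) ** (days / 365.0)
    return 1.0 - p_one_disk_survives ** num_disks


NUM_DISKS = 10_000  # hypothetical fleet size, not a figure from the slides
AFR = 0.04          # annualized failure rate quoted above

per_year = expected_failures_per_year(NUM_DISKS, AFR)
p_day = prob_at_least_one_failure(NUM_DISKS, AFR, days=1)
print(f"expected failures per year: {per_year:.0f}")
print(f"chance of at least one failure on a given day: {p_day:.0%}")
```

Under these assumptions the fleet loses about 400 disks a year, more than one a day on average, so software has to treat a failed disk as a routine event rather than an exception.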
Request Handling
• With lots of disks, how is data placed so that it can be found?
• Let’s look at Amazon
  • Partition the data so that groups of servers handle just a part of the inventory (or any other data)
  • The router needs to be able to extract keys from the request
    • Hashing is one strategy for doing this
    • Based on the key, you then determine the server to handle the request
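The key-extraction and hashing steps can be sketched as follows. This is an illustration only, not Amazon's actual routing scheme: the request shape, the `item_id` field, and the server group names are all hypothetical.

```python
import hashlib


def extract_key(request):
    """Pull the partitioning key out of a request (hypothetical request shape)."""
    return request["item_id"]


def pick_group(key, num_groups):
    """Map a key to a server-group index using a stable hash.

    hashlib is used instead of the built-in hash(), whose value for
    strings changes between Python processes.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_groups


def route(request, server_groups):
    """Return the group of servers responsible for this request's key."""
    key = extract_key(request)
    return server_groups[pick_group(key, len(server_groups))]


# Hypothetical cluster: three groups, each holding one slice of the inventory.
SERVER_GROUPS = [["s0a", "s0b"], ["s1a", "s1b"], ["s2a", "s2b"]]

request = {"item_id": "book-12345", "action": "lookup"}
print(route(request, SERVER_GROUPS))
```

Every request for the same key lands on the same group, so the data for that key needs to live only there. One known drawback of plain modulo hashing is that changing the number of groups remaps most keys; consistent hashing is a common way to limit that.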
Online Evolution
• Internet-time implies constant change
• Need acceptable quality
• Three approaches to managing upgrades
  • Fast reboot: cluster at a time
    • Minimize yield impact
  • Rolling upgrade: node at a time
    • Versions must be compatible
  • Big flip: half the cluster at a time
    • Reserved for complex changes
• Either way: use a staging area, be prepared to revert
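As an illustration of the rolling upgrade and the "be prepared to revert" advice, here is a minimal sketch. The cluster representation and the `is_healthy` probe are hypothetical stand-ins; a real system would also drain traffic from a node before touching it.

```python
def rolling_upgrade(nodes, new_version, is_healthy):
    """Upgrade one node at a time; revert every node if a health check fails.

    nodes: dict mapping node name -> current version (mutated in place).
    is_healthy: callable(name, version) -> bool, a hypothetical health probe.
    Returns True if all nodes end on new_version, False if reverted.
    """
    previous = dict(nodes)              # snapshot so we can revert
    for name in list(nodes):
        nodes[name] = new_version       # only this node changes in this step
        if not is_healthy(name, new_version):
            nodes.update(previous)      # restore the old version everywhere
            return False
    return True


# Hypothetical three-node cluster running version "v1".
cluster = {"n1": "v1", "n2": "v1", "n3": "v1"}
ok = rolling_upgrade(cluster, "v2", lambda name, version: True)
print(ok, cluster)
```

Because old and new nodes serve traffic side by side during the loop, the two versions must be compatible, which is the constraint the slide attaches to this approach.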
Summary
• We have briefly discussed a high-level view of data centers
• In this course we will discuss how Google, Amazon, etc. deal with some of the implications of these architectures