reduced and alternative energy for cloud and telephony ... · reduced and alternative energy for...
TRANSCRIPT
HUAWEI TECHNOLOGIES CO., LTD.
www.huawei.com
Reduced and Alternative Energy for Cloud and Telephony ApplicationsJames Hughes, FellowCloud Computing
HUAWEI TECHNOLOGIES CO., LTD.
Agenda
Energy usage in a Telco Trends and direction
Energy usage in data centers Total Energy Server Energy Consolidation Saving energy by selecting data center location
Latency issues Promising research areas
FAWN Ceph
What about data center recycling
HUAWEI TECHNOLOGIES CO., LTD.
Green Radio ScenariosTwo Market Profiles:
Developed World Developed Infrastructure Saturated Markets Quality of Service Key Issue Drive is to Reduce Costs
Emerging Markets Less Established Infrastructure Rapidly Expanding Markets Large Geographical Areas Often no mains power supply
power consumption a major issuePeter Grant, Green Radio – The Case for More Efficient Cellular Base Stations, May 2009, University of Edinburgh
www.mobilevce.com © 2009 Mobile VCE
Green Radio Scenarios
Two Market Profiles:
1.! Developed World
!! Developed Infrastructure
!! Saturated Markets
!! Quality of Service Key Issue
!! Drive is to Reduce Costs
2.! Emerging Markets
!! Less Established Infrastructure
!! Rapidly Expanding Markets
!! Large Geographical Areas
!! Often no mains power supply
– power consumption a major issue
www.mobilevce.com © 2009 Mobile VCE
Green Radio Scenarios
Two Market Profiles:
1.! Developed World
!! Developed Infrastructure
!! Saturated Markets
!! Quality of Service Key Issue
!! Drive is to Reduce Costs
2.! Emerging Markets
!! Less Established Infrastructure
!! Rapidly Expanding Markets
!! Large Geographical Areas
!! Often no mains power supply
– power consumption a major issue
HUAWEI TECHNOLOGIES CO., LTD.
Energy Cost per Subscriber
Operation vs Embodied Data Center a small part Room for improvement
www.mobilevce.com © 2009 Mobile VCE
Where is the Energy Used?
!! For the operator, 57% of
electricity use is in radio access
!! Operating electricity is the
dominant energy requirement at
base stations
!! For user devices, most of the energy used is due to
manufacturing
9kg
CO2
4.3kg
CO2
2.6kg
CO2
8.1kg
CO2
Mobile
CO2 emissions per subscriber
per year3
Operation
Embodied
energy
Base station
3. Tomas Edler, Green Base
Stations – How to Minimize CO2
Emission in Operator Networks, Ericsson, Bath Base Station
Conference 2008
www.mobilevce.com © 2009 Mobile VCE
Where is the Energy Used?
!! For the operator, 57% of
electricity use is in radio access
!! Operating electricity is the
dominant energy requirement at
base stations
!! For user devices, most of the energy used is due to
manufacturing
9kg
CO2
4.3kg
CO2
2.6kg
CO2
8.1kg
CO2
Mobile
CO2 emissions per subscriber
per year3
Operation
Embodied
energy
Base station
3. Tomas Edler, Green Base
Stations – How to Minimize CO2
Emission in Operator Networks, Ericsson, Bath Base Station
Conference 2008
Tomas Edler, Green Base Stations – How to Minimize CO2 Emission in Operator Networks, Ericsson, Bath Base Station Conference 2008
HUAWEI TECHNOLOGIES CO., LTD.
Base Station Power useCentral Equipment
Combining/Demultiplexing
Transceiver Power conversion
Cooling Fans
Power Supply
Transceiver Idling
Power Amplifier
Tomas Edler, Green Base Stations – How to Minimize CO2 Emission in Operator Networks, Ericsson, Bath Base Station Conference 2008
HUAWEI TECHNOLOGIES CO., LTD.
Where does the data center power go?
Energy losses in the U.S. T&D system are ~7.2%
1w server savings is a 3.3w overall savings
Ambient cooling UPS
http://climatetechnology.gov/library/2003/tech-options/tech-options-1-3-2.pdfhttp://doe.thegreengrid.org/files/temp/E12A2B5D-B0E1-CA1A-97C1553AF4A01249/Green_Grid_Guidelines_WP.pdf
of ways before it even reaches the IT loads. Practical
requirements such as keeping IT equipment properly
housed, powered, cooled, and protected is one
example of how energy consumption is sidetracked,
or rendered less efficient (see Figure 1).
Note that all energy consumed by the datacenter in
Figure 1 ends up as waste heat, which is rejected
outdoors into the atmosphere. This diagram is based
on a typical datacenter with 2N power and N+1
cooling equipment, operating at approximately 30%
of rated capacity.1
System design issues that commonly reduce the
efficiency of datacenters include:
• Power distribution units and/or transformers
operating well below their full load capacities.
• Air conditioners forced to consume extra power
to drive air at high pressures over long distances.
• Cooling pumps which have their flow rate
automatically adjusted by valves (which
dramatically reduces the pump efficiency).
• N+1 or 2N redundant designs, which result
in underutilization of components.
• The tradition of oversizing a UPS to avoid
operating near its capacity limit.
• The decreased efficiency of UPS equipment
when run at low loads.
• Under-floor blockages that contribute to
inefficiency by forcing cooling devices to
work harder to accommodate existing load
heat removal requirements. (This can lead
to temperature differences and high-heat
load areas might receive inadequate cooling).
BEST PRACTICES
Right-sizing the physical infrastructure system to the
load, using efficient physical infrastructure devices,
and designing an energy-efficient system are all
techniques to help reduce energy costs. A successful
strategy for addressing the datacenter energy
management challenge requires a multi-pronged
approach that should be enforced throughout the
lifecycle of the datacenter. The following categories
of practices serve as cornerstones for implementing
an energy-efficient strategy: engineering, deployment,
operations, and organization.
white paper 3
Chiller
HumidifierCRAC
IT Equipment IndoorDataCenterHeat
ElectricalPowerIN
WasteHeatOUT
PDU
UPS
Switchgear/generatorLighting
33%
3%9%
30%
5%
18%
1%1%
Figure 1: Where Does It Go?
Data Center 3 Year TCO
HUAWEI TECHNOLOGIES CO., LTD.
Economic arguments
Servers and Power are 70% of data center TCO
Servers
Power
Other IT
Facilities
Networking
Labor
http://scap.nist.gov/events/2009/itsac/presentations/day2/Day2_Cloud_Blakley.pdf
HUAWEI TECHNOLOGIES CO., LTD.
Breakdown of Server Power Consumption
!"#$%&'( !"#$"#%&'("#%)'*+,-./0'*%1!',#2"3%4*/"5%678+9%:;;<=
!"#$"#%&'("#%>#"7?@'(*%8A%)'-.'*"*/+
B0C,#" % D % E0CE50CE/+ % E'( %.'("# % 0+ % 2'*+,-"@ % '* % 7$"#7C" %(0/E0* % 7* % 0*@0$0@,75%+"#$"#F%&#'2"++'#+%7*@%-"-'#A%2'*+,-"%/E"%-'+/%7-',*/%'G%.'("#9%G'55'("@%8A%/E"%.'("#%+,..5A%"GG020"*2A% 5'++F%H0+?%@#0$"%.'("#%'*5A%8"2'-"+%+0C*0G027*/% 0*%+"#$"#+%(0/E%+"$"#75%@0+?%@#0$"+F
&#'2"++'#%.'("#%2'*+,-./0'*%$7#0"+%C#"7/5A%8A%/E"%/A."%'G%+"#$"#%.#'2"++'#%,+"@F%&'("#%2'*+,-./0'*%27*%$7#A%G#'-%IJK%/'%:;;K%."#%-,5/0L2'#"%)&MF%N"("#%4*/"5%.#'2"++'#+%0*25,@"%.'("#%+7$0*C%/"2E*'5'C0"+9%+,2E%7+%H"-7*@%>7+"@%!(0/2E0*C%7*@ % O*E7*2"@ % !.""@ % !/". % /"2E*'5'CAF % PE"+" % *"("# % .#'2"++'#+ % 75+' % +,..'#/%.'("#%+7$0*C%+/7/"+%+,2E%7+%)QO%7*@%))DF%R,5/0L2'#"%.#'2"++'#+%7#"%-,2E%-'#"%.'("# %"GG020"*/ % /E7*%.#"$0',+ %C"*"#7/0'*+F %!"#$"#+ %,+0*C% /E"%#"2"*/ %S,7@2'#"%4*/"5T%U"'*V%.#'2"++'#+%27*%@"50$"#%QF<%/"#7G5'.+%7/%."7?%."#G'#-7*2"%,+0*C%5"++%/E7*%Q;9;;;%(7//+%'G%.'("#F%&"*/0,-T%.#'2"++'#+%0*%QWW<%(',5@%E7$"%2'*+,-"@%78',/%<;;9;;;%(7//+%/'%72E0"$"%/E"%+7-"%."#G'#-7*2"F%!"#$"#%.#'2"++'#%.'("#%2'*+,-./0'*%(055%75+'%$7#A%@"."*@0*C%'*%/E"%+"#$"#%('#?5'7@F
)'.A#0CE/%X%:;;W%4*/"5%)'#.'#7/0'* Y
Server Power Consumption (Source: Intel Labs, 2008)
HUAWEI TECHNOLOGIES CO., LTD.
The Argument for Server Consolidation
Average server is 10% utilized
Consolidation increases utilization
CPU Utilization and Power Consumption (Source: Blackburn 2008)
0
100
200
300
400
0 20 40 60 80 100
Power usage vs Processor Utilization
Ser
ver
Pow
er C
onsu
mp
tion
Server Utilization
HUAWEI TECHNOLOGIES CO., LTD.
RAM Power
Energy based mostly on access 25% static power utilization RAM Dedup
Reduce static power utilization Embodied Energy
PDF: 09005aef829559ff/Source: 09005aef828dcdbf Micron Technology, Inc., reserves the right to change products or specifications without notice.TN41_01DDR3 Power.fm - Rev. B 8/07 EN 23 ©2007 Micron Technology, Inc. All rights reserved.
TN-41-01: Calculating Memory System Power for DDR3DDR3 Power Spreadsheet Usage Example
Figure 16: Power Consumption Summary
Figure 17: Power Consumption per Device
Psys(PRE_PDN) 0.0 mW Psys(PRE_STBY) 18.6 mW Psys(ACT_PDN) 0.0 mW Psys(ACT_STBY) 91.4 mW Psys(REF) 3.5 mW
Total Background Power 113.5 mW
Psys(ACT) 123.2 mW Total Activate Power 123.2 mW
Psys(WR) 57.8 mW Psys(RD) 71.4 mW Psys(DQ) 26.5 mW Psys(TERM) 43.6 mW
Total RD/WR/Term Power 199.3 mW Total DDR3 SDRAM Power 435.9 mW
0
50
100
150
200
250
300
350
400
450
500
Devi
ce P
ower
(mW
)
Total RD/WR/Term Pow er Total Activate Pow er
Total Background Pow er
PDF: 09005aef829559ff/Source: 09005aef828dcdbf Micron Technology, Inc., reserves the right to change products or specifications without notice.TN41_01DDR3 Power.fm - Rev. B 8/07 EN 23 ©2007 Micron Technology, Inc. All rights reserved.
TN-41-01: Calculating Memory System Power for DDR3DDR3 Power Spreadsheet Usage Example
Figure 16: Power Consumption Summary
Figure 17: Power Consumption per Device
Psys(PRE_PDN) 0.0 mW Psys(PRE_STBY) 18.6 mW Psys(ACT_PDN) 0.0 mW Psys(ACT_STBY) 91.4 mW Psys(REF) 3.5 mW
Total Background Power 113.5 mW
Psys(ACT) 123.2 mW Total Activate Power 123.2 mW
Psys(WR) 57.8 mW Psys(RD) 71.4 mW Psys(DQ) 26.5 mW Psys(TERM) 43.6 mW
Total RD/WR/Term Power 199.3 mW Total DDR3 SDRAM Power 435.9 mW
0
50
100
150
200
250
300
350
400
450
500
Devi
ce P
ower
(mW
)
Total RD/WR/Term Pow er Total Activate Pow er
Total Background Pow er
PDF: 09005aef829559ff/Source: 09005aef828dcdbf Micron Technology, Inc., reserves the right to change products or specifications without notice.TN41_01DDR3 Power.fm - Rev. B 8/07 EN 9 ©2007 Micron Technology, Inc. All rights reserved.
TN-41-01: Calculating Memory System Power for DDR3Introduction
Figure 6: Current Profile – WRITEs
When several WRITEs are added between ACT commands, the consumption of current associated with the WRITE is IDD4W. To identify the power associated with only the WRITEs and not the standby current, IDD3N must be subtracted. The calculation for the data sheet write component of power, Pds(WR), is shown in Equation 13.
(Eq. 13)
To scale the data sheet power to actual power based on command scheduling, it must be calculated as a ratio of the write bandwidth. This is noted as WRsch%, which is the total number of clock cycles that write data is on the bus (not WRITE commands) versus the total number of clock cycles. The WRsch% calculation for the example show in Figure 6 is shown in Equation 14.
(Eq. 14)
nACT= 36
ACT WR WR
Data In Data In
PRE ACT WR WR
Data In
WRITEs
Pds(WR) = (IDD4W - IDD3N) ! VDD
Pds(WR) = (240mA - 75mA) ! 1.575V
Pds(WR) = 260mW
num_ of_WR_cyclesWRsch% =
WRsch% =
WRsch% = 22%
8 cycles36tCK
nACT
PDF: 09005aef829559ff/Source: 09005aef828dcdbf Micron Technology, Inc., reserves the right to change products or specifications without notice.TN41_01DDR3 Power.fm - Rev. B 8/07 EN 9 ©2007 Micron Technology, Inc. All rights reserved.
TN-41-01: Calculating Memory System Power for DDR3Introduction
Figure 6: Current Profile – WRITEs
When several WRITEs are added between ACT commands, the consumption of current associated with the WRITE is IDD4W. To identify the power associated with only the WRITEs and not the standby current, IDD3N must be subtracted. The calculation for the data sheet write component of power, Pds(WR), is shown in Equation 13.
(Eq. 13)
To scale the data sheet power to actual power based on command scheduling, it must be calculated as a ratio of the write bandwidth. This is noted as WRsch%, which is the total number of clock cycles that write data is on the bus (not WRITE commands) versus the total number of clock cycles. The WRsch% calculation for the example show in Figure 6 is shown in Equation 14.
(Eq. 14)
nACT= 36
ACT WR WR
Data In Data In
PRE ACT WR WR
Data In
WRITEs
Pds(WR) = (IDD4W - IDD3N) ! VDD
Pds(WR) = (240mA - 75mA) ! 1.575V
Pds(WR) = 260mW
num_ of_WR_cyclesWRsch% =
WRsch% =
WRsch% = 22%
8 cycles36tCK
nACT
http://download.micron.com/pdf/technotes/ddr3/TN41_01DDR3%20Power.pdf
HUAWEI TECHNOLOGIES CO., LTD.
Data center location matters
On how many days is cooling necessary in each city? Assume air cooling
Cost of electricity? Latency? Network Capacity?
Shenzhen
UrumqiHarbin
Yichang
Lanzhou
HUAWEI TECHNOLOGIES CO., LTD.
Ambient Cooling, Harbin, China
-25
0
25
50
75
100
Jan Mar May Jul Sep Nov
Average Temperature
LowHighAir cooling Limit
http://www.travelchinaguide.com/climate/harbin.htm
HUAWEI TECHNOLOGIES CO., LTD.
Eliminating Transmission loss
Colocate data centers with electricity generation
Dams have water to cool data centers
http://en.wikipedia.org/wiki/Three_Gorges_Dam
HUAWEI TECHNOLOGIES CO., LTD.
Hydroelectric power
Not Constant
http://en.wikipedia.org/wiki/Three_Gorges_Dam
HUAWEI TECHNOLOGIES CO., LTD.
Improvements in Networking
2,250 miles 24ms optical latency 130ms measured (San Francisco to Chicago)
5x improvement with AON
Scales linearly Using small processors High performance Energy efficiency
75x of a “Server” Can Fawn nodes scale to 1000s? Can the cost of Flash be
competitive to disk? When?
HUAWEI TECHNOLOGIES CO., LTD.
Fast Array of Wimpy Nodes
System QPSWattsQueries/JouleAlix 704 6 117Soekris (1) 334 3.75 89Soekris (8) 2431 30 81Desktop+SSD 2728 80 34Gumstix 50 2 25Macbook Pro 53 29 1.8Desktop 160 87 1.8Server 600 400∗ 1.2
Table 2. Query rates and power costs using different machine configurations. (∗ Server poweris estimated–our power measuring device does not accept the 220V feed that the server uses.)
• Soekris: A Soekris net4801-66. 266MHz SC1100 CPU, 128MB DRAM, 100 Mbit/sec Eth-ernet.
• Gumstix: A Gumstix Verdex XL6P. 600MHz Marvell PXA270 XScale CPU, 128MB RAM.16-bit PIO Compact Flash interface. 100Mbit/s Ethernet.
We first evaluate the power and query rate that these systems can achieve using the FAWNdatabase on a single node. The tests requested the data over the network to measure the rateat which these nodes could serve data if they were acting in a FAWN cluster. Table 2 shows theresults. The embedded systems are decisively more power-efficient than even the low-power desktopnode with a modern SSD. The best-performing node, the Alix.1c, achieved 117 queries per joule.Laptops and desktops serving queries from a single hard drive achieved only 1.8 queries per joule(and were considerably slower). A high-end server with a 15-disk RAID5 array achieved 1.2 queriesper joule.
Scaling with more FAWN nodes To determine whether the FAWN architecture would scale,we performed a small -scale scaling test using the 8 available Soekris nodes in our cluster. Theresults, in Figure 6, show that at this scale, the FAWN database scales linearly with an increasingnumber of back-end nodes. This cluster is quite small, but there is little in the architecture thatimpairs scaling to the small thousands of nodes: the front-end nodes track only a small amountof data per back-end node, and there is no inter-node communication. The FAWN design is verysimilar to other massively-scalable designs such as Amazon’s Dynamo [8]. While these scalingresults are as expected as they are encouraging, we plan to conduct larger-scale tests in the future.
5.2 Power and Cost of a FAWN Cluster
A FAWN cluster is not simply an independent set of nodes; it requires front-end nodes and anetwork infrastructure to link them together. In this section, we first extrapolate the queries/joulethat could be provided by a complete FAWN cluster. For this analysis, we use as a baseline theAlix.1c system (Table 2) instead of the relatively old (2003) Soekris nodes from our test cluster.The analysis includes the power overhead of the FAWN nodes, the front-end nodes, and the networkinfrastructure. We conclude by briefly speculating about the cost and performance of a customFAWN cluster.
Front-end nodes: The Desktop node (Table 2) with a small amount of flash consumes 80Wand can forward 81,000 queries/second to back-end FAWN nodes. As a result, using Alix-based
12
0
500
1000
1500
2000
2500
0 2 4 6 8 10
qp
s (v
alu
e =
10
0B
)
# of fawn nodes
qps
Figure 6. The FAWN cluster scales linearly with an increasing number of nodes to the limitof the cluster size.
back-end nodes, a FAWN cluster requires one front-end node per 115 back-ends, for a total of 7front-end nodes for 720 back-end nodes.
FE power draw 560WFE cost $4,200
Network infrastructure: With 100 byte requests, each FAWN node sends approximately720Kbit/s of traffic under full load (128bytes ∗ 704queries
sec ∗ 8 bitsbyte). This low query rate means
that a single gigabit uplink can support 1000 back-end nodes, and so FAWN can use a relativelyflat network hierarchy of 16-port switches ($65, 1.5 watts)5 connected to a single 48-port 100/1000switch ($750, 66 watts).6 We note that the network infrastructure adds nearly 0.2 watts per node;while this is not a large increase with the Alix nodes, it would represent a 20% increase with a0.8W node.
Network power draw 134WNetwork cost $3,675
Back-end nodes: 720 Alix.1c-based FAWN back-end nodes with 16GB of compact flash pernode.
node power draw 4320Wnode cost $191,520
Alix node FAWN cluster: A cluster of 720 Alix.1c-based FAWN nodes, with 7 front-endnodes and associated network infrastructure.
FAWN total power draw 5014WFAWN total cost $199,395
5Listed DC power input for Linksys EZXS16W is 3.3V, 300mA. 1.5W value multipled by 1.5 for power supply
inefficiency.6Maximum listed power draw for HP ProCurve 2610.
13
http://www.cs.cmu.edu/~dga/papers/fawn-pdl-tr-08-108.pdf
HUAWEI TECHNOLOGIES CO., LTD.
Ceph: A Scalable, High-Performance Distributed File System Problem?
12 x 1TB disk server from Dell ~$8000
12 x 1TB disk sells for ~$1000 750W
Can this be reduced? Ceph uses a $100 ARM server per 2 x 1TB disks Does it scale? Can the centralized metadata server be eliminated? Reliability at scale? 180W?
……
… … …
…
CRUSH(pgid) (osd1, osd2)
OSDs(grouped byfailure domain)
File
Objectshash(oid) & mask pgid
PGs
(ino,ono) oid
Figure 3: Files are striped across many objects, groupedinto placement groups (PGs), and distributed to OSDsvia CRUSH, a specialized replica placement function.
capacity and aggregate performance by delegating man-agement of object replication, cluster expansion, failuredetection and recovery to OSDs in a distributed fashion.
5.1 Data Distribution with CRUSHCeph must distribute petabytes of data among an evolv-ing cluster of thousands of storage devices such that de-vice storage and bandwidth resources are effectively uti-lized. In order to avoid imbalance (e. g., recently de-ployed devices mostly idle or empty) or load asymme-tries (e. g., new, hot data on new devices only), we adopta strategy that distributes new data randomly, migrates arandom subsample of existing data to new devices, anduniformly redistributes data from removed devices. Thisstochastic approach is robust in that it performs equallywell under any potential workload.
Ceph first maps objects into placement groups (PGs)using a simple hash function, with an adjustable bit maskto control the number of PGs. We choose a value thatgives each OSD on the order of 100 PGs to balance vari-ance in OSD utilizations with the amount of replication-related metadata maintained by each OSD. Placementgroups are then assigned to OSDs using CRUSH (Con-trolled Replication Under Scalable Hashing) [29], apseudo-random data distribution function that efficientlymaps each PG to an ordered list of OSDs upon which tostore object replicas. This differs from conventional ap-proaches (including other object-based file systems) inthat data placement does not rely on any block or ob-ject list metadata. To locate any object, CRUSH requiresonly the placement group and an OSD cluster map: acompact, hierarchical description of the devices compris-ing the storage cluster. This approach has two key ad-vantages: first, it is completely distributed such that anyparty (client, OSD, or MDS) can independently calcu-late the location of any object; and second, the map isinfrequently updated, virtually eliminating any exchangeof distribution-related metadata. In doing so, CRUSH si-multaneously solves both the data distribution problem(“where should I store data”) and the data location prob-lem (“where did I store data”). By design, small changes
to the storage cluster have little impact on existing PGmappings, minimizing data migration due to device fail-ures or cluster expansion.
The cluster map hierarchy is structured to align withthe clusters physical or logical composition and potentialsources of failure. For instance, one might form a four-level hierarchy for an installation consisting of shelvesfull of OSDs, rack cabinets full of shelves, and rows ofcabinets. Each OSD also has a weight value to controlthe relative amount of data it is assigned. CRUSH mapsPGs onto OSDs based on placement rules, which de-fine the level of replication and any constraints on place-ment. For example, one might replicate each PG on threeOSDs, all situated in the same row (to limit inter-rowreplication traffic) but separated into different cabinets(to minimize exposure to a power circuit or edge switchfailure). The cluster map also includes a list of downor inactive devices and an epoch number, which is incre-mented each time the map changes. All OSD requests aretagged with the client’s map epoch, such that all partiescan agree on the current distribution of data. Incrementalmap updates are shared between cooperating OSDs, andpiggyback on OSD replies if the client’s map is out ofdate.
5.2 ReplicationIn contrast to systems like Lustre [4], which assume onecan construct sufficiently reliable OSDs using mecha-nisms like RAID or fail-over on a SAN, we assume thatin a petabyte or exabyte system failure will be the normrather than the exception, and at any point in time severalOSDs are likely to be inoperable. To maintain systemavailability and ensure data safety in a scalable fashion,RADOS manages its own replication of data using a vari-ant of primary-copy replication [2], while taking steps tominimize the impact on performance.
Data is replicated in terms of placement groups, eachof which is mapped to an ordered list of n OSDs (forn-way replication). Clients send all writes to the firstnon-failed OSD in an object’s PG (the primary), whichassigns a new version number for the object and PG andforwards the write to any additional replica OSDs. Aftereach replica has applied the update and responded to theprimary, the primary applies the update locally and thewrite is acknowledged to the client. Reads are directedat the primary. This approach spares the client of any ofthe complexity surrounding synchronization or serializa-tion between replicas, which can be onerous in the pres-ence of other writers or failure recovery. It also shifts thebandwidth consumed by replication from the client to theOSD cluster’s internal network, where we expect greaterresources to be available. Intervening replica OSD fail-ures are ignored, as any subsequent recovery (see Sec-tion 5.5) will reliably restore replica consistency.
OSDI ’06: 7th USENIX Symposium on Operating Systems Design and Implementation USENIX Association312
http://www.ssrc.ucsc.edu/Papers/weil-osdi06.pdf
HUAWEI TECHNOLOGIES CO., LTD.
Recycling
Average server / storage system has a 3 year life Companies like Amazon replace racks at the 3 year mark
How do we reduce the embodied CO2? Can we recover value from the waste?
Does “Fail in place” make a difference? “DESIGN FOR DISASSEMBLY TO RECOVER EMBODIED ENERGY”
Buildings
http://eprints.qut.edu.au/2846/1/Crowther-PLEA1999.PDF
HUAWEI TECHNOLOGIES CO., LTD.
Innovation over time
Moore’s law is not cause but effect Doubling every 2 years is 41%/yr Feature size improvement of 19%/yr
Continuous process improvement Green IT will be similar
Many improvements can be made No single fundamental roadblock
Energy use (by itself) has no value Increases in efficiency lowers cost and increases competitiveness
In a commodity market, can lower cost indicate more green? If government force manufacturers to pay recycling fee?
HUAWEI TECHNOLOGIES CO., LTD.
Conclusion
Energy usage in a Wireless Telco focuses on base stations Energy usage in data centers has much room for improvement
Location Cooling, low transmission loss, UPS
Latency Server Energy
Consolidation Necessary but not sufficient
Can more/smaller processors be an answer? Now that we know how to horizontally scale?
The future for energy savings is bright
Thank Youwww.huawei.com