rio info 2009 - optimizing it costs using virtualization, green and cloud computing - david royer
Optimizing IT Costs using Virtualization, Green and
Cloud Computing
David Royer
SNIA Brasil, Chairman
Rio Info 2009
Rio de Janeiro, Brazil
SNIA At A Glance
Voice of the storage industry representing approximately $50-60B in worldwide revenue for hardware and software
Founded in 1997 as a non-profit trade association
Worldwide headquarters in San Francisco USA
Global presence in A/NZ, Canada, China, EMEA, India, Japan and South-Asia
Technology Center activities in Colorado, Beijing, Tokyo, and Bangalore
Focus on education, conferences, specifications / standards, software, industry alliances, best practices, plugfests, and conformance testing for SNIA specifications
Co-owner of Storage Networking World (SNW) conference with Computerworld/IDG Enterprise
a collaborative environment and serve as global contributors toward the advancement of standards, education, and innovation in the storage and information management industry
Storage Outlook and Growth
[Chart: Worldwide Disk Storage Systems and Branded Tape Storage Segment Factory Revenue Growth. YoY growth by segment, 2008 Q1 through 2008 Q4, on a scale from -70% to 30%. Segments: Tape - Entry Level, Tape - Midrange, Tape - High End, Int Disk - Entry, Int Disk - Midrange, Ext Disk - Entry, Ext Disk - Midrange, Ext Disk - High End]
• Entry level and midrange external DSS are the only segments showing flat/positive YoY growth in 4Q
2008. This can be attributed to customers deferring purchases of larger, more expensive storage systems
in favor of lower cost, more modular systems, and to the emergence of technologies, such as iSCSI, that
offer enterprise level features at a lower price point than traditional FC SAN systems
Source IDC Doc # 218274
Storage Hardware 2009 Outlook
Tape will continue to decline as disk-based archival and back-up technologies
emerge
Internal storage is closely tied to the server market, which is expected to be
weaker in the coming quarters than the external disk market
The external disk storage systems market will feel the impact of the
economic crisis further. Weakness is seen in higher end systems, specifically
mainframes and FC SAN.
Healthier segments include:
iSCSI SAN – specifically in the upper entry level and midrange market
Verticals such as Healthcare, Video Surveillance, and Government
Midrange product offerings: customers are fulfilling their enterprise
storage needs with midrange products
Enterprise VTL: Will augment midrange and enterprise tape drives,
especially in tape libraries and automation
Source IDC Doc # 218274
Storage Software Growth – Average 7%
Data Protection, growth rate through 2013, 6.2%
Archiving Software, growth rate through 2013, 10.4%
Storage Device Management Software, growth rate through 2013, 2.8%
Storage Management Software, growth rate through 2013, 5.6%
Storage Infrastructure, growth rate through 2013, 5.9%
Storage Replication, growth rate through 2013, 7.6%
File System, growth rate through 2013, 7.1%
Source IDC Doc # 217529
E-Discovery Growth
Combination of software: storage infrastructure, e-discovery, collaboration, ECM, data management, and security
Hardware
Storage spending growth was underpinned by data volume
and requirements to store, manage, index, archive, and preserve data
Servers
Source IDC Doc # 218259
Focus on a Few
Industry Storage Trends
Green IT
Cloud Computing
Virtualization
Abstract
Best Practices in Managing Virtualized Environments
Today, data center environments are increasingly complex with
virtualization at all layers of the IT stack, including network, server,
SAN and storage. IT professionals are often challenged in diagnosing
application performance issues, optimizing infrastructure resource
utilization, and planning for future changes. The best practices for
managing complex data center environments include cross domain
management orientation, watching the infrastructure response time
for cross-domain performance, looking for application contention and
contention-based latency in the storage layer, best fit analysis of
workloads to storage resources, and working toward infrastructure
performance SLAs. Key requirements for this new breed of
management software include agent-less discovery and SMI-S support.
Virtualization is
Everywhere
[Diagram: virtualization at every layer: Client Network; App Servers, Web Servers, Security; Server Virtualization; Storage Network (SAN); Array Virtualization]
Tremendous Benefits
Pooling of resources
Rapidly deploy new applications
Increase resource utilization
Over-subscribe resources
Lower acquisition cost and TCO
Traditional system management practices may no longer work
What’s “Real” about
Virtualization?
Like the Emperor's new (virtualized) clothes –
A logical interface presenting a
normalized “resource” that isn't “all there”
Built over physical and other virtual layers that do not look at all like
the presented logical resource
We will discuss two major IT virtualization initiatives
Storage Virtualization
Server Virtualization
(and the combination of the two!)
Check out SNIA Tutorial:
Virtualization 1- What, Why,
Where, and How
Virtualization Pools Resources
[Diagram: Physical Infrastructure Model versus Virtual Infrastructure Model. In the virtual model the client network connects to a Server Pool, and the storage network (SAN) connects to a Storage Pool with Tier 1, Tier 2, and Archive]
Managing Virtualized
Environments
Managing through Virtualization is Challenging
Diagnosing Performance Problems
Optimizing Resource Utilization
Planning for Future Changes
Virtualization Feature | “New” Admin Challenge
Clients Reserve and Share Resource Capacity | Resource Performance still Degrades Non-linearly with Load
Dynamic Infrastructure | Finding Transitional bottlenecks
Increased Resource Utilization | Optimal Resource Deployment
Easy to provision new VMs | Predicting if the next VM fits
The Bottom Line…
Applications share resources
Poor performance is caused by:
Hard-to-find I/O bottlenecks and resource contention
Mis-alignment between layers of virtualization
Under-provisioning shared resources
Over-provisioning of shared resources as insurance negates ROI
Inhibitors to success
Virtualized data center complexity
Lack of cross-domain management
Lack of cross-domain communication
Best Practices in Managing Virtualized
Environments
Solving Old Problems in a New Environment
Recommended Best Practices -
1. Cross Domain Analysis and Shared Resource Contention
2. Adopt an Application View of Performance
3. Use Automation Wisely
4. “Effective Capacity” Management
5. Model-based Optimization and Planning
1. Cross Domain Analysis
Virtualization Management is “Cross-Domain” -
Create a Cross-Domain Baseline (discover and collect)
Mapping from multiple layers (app, server, storage, physical & virtual)
Aim for agent-less and “on-line”
Standards like SMI-S are essential for heterogeneous environments
Check Configuration First
Don't optimize or “plan a baseline” from a poorly configured system
Check against vendor configuration best practices
Newer technologies (Thin-wide arrays, 10 GbE networks,
SSDs) move performance bottlenecks elsewhere.
Check out SNIA Tutorial:
Solving Business-Oriented Goals
with SMI-S
I/O Paths Through Virtualization
Applications and Servers
Virtual Server Hosts
Virtual Storage
Storage Arrays
Find Shared Resource
Contention
Stepping Through a Virtual Looking Glass -
Need to Map through Virtualization Layers
Map relationships at every level
Exponential problem of server virtualization over storage virtualization
Sum up the loads from every client that shares each resource
Quantify Application Contention due to Sharing
Calculate performance impact back to each application
Root cause is mostly figuring out What’s Changed when Capacity runs out
If Load changed, was it aberrant behavior or growth?
If Configuration changed, does it violate policy or show thrashing?
If Contention arose, who is new to the pool?
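A minimal sketch of "sum up the loads from every client that shares each resource", assuming the I/O path of each application is already mapped. The application names, resource names, and load figures are hypothetical and only illustrate the bookkeeping.

```python
from collections import defaultdict

# Hypothetical map: each application lists the shared resources on its I/O path.
io_paths = {
    "billing": ["host-a", "vol-1", "array-x"],
    "email": ["host-a", "vol-2", "array-x"],
    "reports": ["host-b", "vol-2", "array-x"],
}
# Hypothetical measured load per application, in I/Os per second.
app_load_iops = {"billing": 400, "email": 250, "reports": 150}


def load_per_resource(paths, loads):
    """Sum the load of every client that shares each resource."""
    totals = defaultdict(int)
    for app, resources in paths.items():
        for resource in resources:
            totals[resource] += loads[app]
    return dict(totals)


def sharers(paths, resource):
    """List the applications whose I/O path crosses the given resource."""
    return sorted(app for app, resources in paths.items() if resource in resources)


totals = load_per_resource(io_paths, app_load_iops)
print(totals)
print(sharers(io_paths, "vol-2"))
```

With a map like this, "who is new to the pool?" becomes a comparison of the sharers list before and after a change.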
Application Contention
Cross Domain visibility is naturally “foggy”
Domain specific management has limited view
Virtualization makes it harder
Management requires end-to-end picture
A common map
helps different domain
admins communicate
Need a map through
all the indirection
Sharing can be
dynamic – maps
must be too
Long data path from application to array…
Cross-Domain: Navigating the Virtualized
Environment
2. Adopt Application View of Performance
The Customer is Always Right –
Application Infrastructure Performance
How long does it take an I/O to complete from the application point of view (Response Time)
Some applications ($$$) are more loved than others
Manage to this “Service” Performance
Element utilizations are interesting, but service performance is the goal
Look for Abnormal “Service” Behavior
Not just default rule-of-thumb thresholds on utilizations
Service Layer Metrics
[Chart: Customer Resource Throughput @ Response Time. Response Time (sec), 0 to 40, plotted against Throughput (transactions / sec), 0 to 1400, with the Service Level Agreement line and the Optimal Throughput and Maximum Throughput points marked]
Look for Abnormal Behavior
Check for Abnormal Behavior
Calculate baseline
A statistical analysis of variance of performance over time
Compare data to baseline
Shared Resources tend to average out peaks that will show in dedicated resources
Helps Justify Virtualization
Acceptable Variance
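One way to read "calculate baseline, compare data to baseline" is sketched here. It assumes the acceptable variance is a band of k standard deviations around the baseline mean; the threshold k=3 and the sample values are illustrative assumptions, not figures from the talk.

```python
from statistics import mean, stdev


def find_abnormal(baseline, recent, k=3.0):
    """Return recent samples more than k standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [x for x in recent if abs(x - mu) > k * sigma]


# Hypothetical response times in milliseconds.
baseline_ms = [10.0, 11.0, 9.0, 10.0, 12.0, 8.0, 10.0, 10.0]
recent_ms = [10.5, 9.5, 25.0, 11.0]

print(find_abnormal(baseline_ms, recent_ms))
```

Only the 25 ms sample falls outside the band here; the other readings stay within normal variation even though they differ from the mean.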
4. “Effective Capacity”
Management
Capacity Management Isn’t Just “Enough GBs”
Storage has both space and time constraints
(server folk have it easy!)
Manage to the total “Effective Capacity”
Maximum utilization that gives good performance
Not to total actual utilization (aka “saturation”)
Build in Automation for Scalability
Virtualized environments tend to sprawl
And they can change dynamically
Check out SNIA Tutorial:
Storage Virtualization II –
Effective Use of Virtualization
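To make "effective capacity" concrete, the sketch below picks the highest measured throughput that still meets a response time SLA and compares it with the saturation point. The measured points and the 5 second SLA are hypothetical values chosen for illustration.

```python
def effective_throughput(curve, sla_response_s):
    """Return the highest measured throughput whose response time meets the SLA."""
    within = [tput for tput, resp in curve if resp <= sla_response_s]
    return max(within) if within else None


# Hypothetical measurements: (transactions per second, response time in seconds).
curve = [
    (200, 1.0), (400, 1.5), (600, 2.5), (800, 4.0),
    (1000, 8.0), (1200, 20.0), (1300, 38.0),
]
sla_s = 5.0

effective = effective_throughput(curve, sla_s)
saturation = max(tput for tput, _ in curve)
print(f"effective capacity: {effective} tx/s, saturation: {saturation} tx/s")
```

The gap between the two numbers is capacity that exists on paper but cannot be used without breaking the service level.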
Effective Capacity = Optimal Usage
[Chart: Response Time (sec), 0 to 40, plotted against Throughput (transactions / sec), 0 to 1400, with the Service Level Agreement line and the Optimal Throughput and Maximum Throughput points marked]
3. Use Automation Wisely
Build in Automation for Scalability
Virtualized environments tend to sprawl
And they can change dynamically
Almost everything can be automated
Event Monitoring
Performance collection and reporting
Analysis of Performance and Configuration
correlation of events with performance, first and second order analysis
Provisioning, Reconfiguration and Migration
Don't forget to leave an audit trail
Feedback loop
What were the effects of the change?
Check out SNIA Tutorial:
Storage Virtualization II –
Effective Use of Virtualization
5. Model based Optimization and Planning
Moving Towards a Real-Time Datacenter -
Constantly Increase Operational Efficiency
Most working infrastructure is sub-optimized
Dedicated resources
“If it ain't broke, don't fix it” attitudes (or capabilities)
However, when everything is shared, everyone goes down together…
Real-er Time Capacity Planning
Utilizations are related to Response Time through Queuing Theory
Need to predict performance degradation under future application load changes
Need to predict performance improvements from possible architectural/technology changes
Planning and tuning will go from large cyclical events to smaller, more dynamic perturbations
Queuing Theory to The Rescue…
Queuing Models create Response Time curves
Based on established mathematics (Buzen et al.; see www.cmg.org)
Useful analytically (historically) as well as predictively
For a simple example think of a check-out line at the grocery store
Complex Queuing Network Models can represent
nested and virtualized IT domains
Advanced cross-domain solutions model IT virtualization
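The check-out line can be sketched with the simplest textbook model, a single M/M/1 queue, where mean time in the system is service time divided by (1 - utilization). The choice of M/M/1 and the 2 minute service time are assumptions for illustration; the talk does not name a specific model.

```python
def mm1_response_time(service_time, utilization):
    """Mean time in an M/M/1 system (waiting plus service)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


# Hypothetical checkout lane: 2 minutes to serve one shopper.
SERVICE_MINUTES = 2.0
for rho in (0.5, 0.8, 0.9, 0.95):
    minutes = mm1_response_time(SERVICE_MINUTES, rho)
    print(f"lane {rho:.0%} busy -> about {minutes:.0f} minutes per shopper")
```

Going from 50% to 90% busy raises the time per shopper about fivefold in this model, which is the non-linear degradation that utilization thresholds alone tend to miss.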
Best Practices in
Managing Virtualized Environments
In Summary -
1. Cross Domain Analysis and Shared Resource Contention
Virtualization is about sharing across IT domains, and that's often the problem
2. Adopt an Application View of Performance
Manage to customer service levels
3. Use Automation Wisely
Doing more with less time and fewer errors
4. “Effective Capacity” Management
Shared resources still obey the laws of physics
5. Model-based Optimization and Planning
Leverage Prediction to Improve your Future
Green IT and
Storage, Energy and the Industry
Storage is a notable contributor to Data
Center energy consumption
Data storage is projected to increase
6-fold between 2007 and 2011 (1)
“Building the Green Data Center”
© 2008 SNIA All Rights Reserved
Industry Concerns today
Fear of “Green Washing” – lack of industry-wide comparison tools
Inappropriate comparisons of technologies – Apples to Oranges
New technologies being introduced – how will they affect energy usage?
Benefit of product features vs. bigger picture of data management
(1) IDC White Paper, “The Diverse and Exploding Digital Universe,” March 2008.
Energy Cost of Data Storage
[Chart: Installed # of Petabytes (capacity in PBs, 0 to 50,000; 57% 2006-2011 CAGR) and Cost to Power and Cool ($M, 0 to 3,000; 19% 2006-2011 CAGR), by year from 1999 to 2011]
IDC #212714, “The Real Costs to Power and Cool All the World's External Storage” – June 2008 Dave Reinsel
Chart used by permission of IDC
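A quick worked calculation of what the two CAGR labels imply over 2006-2011. It assumes the 57% figure belongs to installed petabytes and the 19% figure to power and cooling cost, following the order in which the labels appear in the chart legend.

```python
def growth_factor(cagr, years):
    """Total multiplier after compounding an annual growth rate for some years."""
    return (1.0 + cagr) ** years


YEARS = 2011 - 2006  # five compounding periods

capacity_factor = growth_factor(0.57, YEARS)  # assumed: installed petabytes
cost_factor = growth_factor(0.19, YEARS)      # assumed: cost to power and cool

print(f"capacity grows about {capacity_factor:.1f}x")
print(f"power and cooling cost grows about {cost_factor:.1f}x")
```

Under that reading, capacity grows roughly 9.5x while the cost to power and cool it grows roughly 2.4x, so cost per petabyte falls even as the total bill rises.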
What Impacts Energy
Consumption for Data Storage
Storage capacity / usage efficiency
increasing data -> larger capacity -> more disks
redundant copies magnify capacity needs
variability in usage and utilization -> inefficient allocation of space
What is valuable data? What is the retention policy?
Data transfer rate / access speed
high I/O bandwidth -> higher rotational speed; striping across many drives
low access times -> faster actuators; higher rotational speeds; caches
How fast and immediate must data be available? (time-to-data)
Data integrity
25% of “digital universe” is unique, but 75% are replicas / duplicates
partly to ensure data integrity and survivability; partly wasteful
Data availability / system reliability
RAID uses extra drives, plus redundant power supplies, fans, controllers,
How valuable is data? How likely are failures? How fast must data be available?
Potential Paths to “Green” Storage
Improve usage efficiency
De-duplication
Thin provisioning
Minimize energy consumption
Improved component designs – high-efficiency power
supplies, advanced & flexible drives
Variants of MAID – idle and spin-down
New technologies
Solid state storage
Alternative + hybrid system designs (opportunity to rethink)
must be driven by
metrics / standards
/ guidelines
Anatomy of a Storage System
Disk Arrays
UPSs
PDUs
Fans
Switches
Hard drives
Controllers
Power Distribution Unit
Uninterruptible Power Supply
System design, complexity and redundancy vary depending on applications & usage
Component designs, software features, and workload affect power consumption and efficiency
Appliances
Power Supplies
Apps Software
Storage –
Power Supply Efficiency
Fans
Hard drives
Controllers
Power Supplies
1 - Redundant power supplies are
standard, except in the smallest systems
2 - Significant
mechanical
components, require
dual-output power
supplies (12V, 5V)
3 - Power supplies often custom-
designed for reliability
(for
servers)*
*presented by EPA at ENERGY STAR Computer
Server Stakeholder Meetings; July 2008
Idle Power versus Active Power
Idle Mode for a Storage Array
storage system is protecting data, ready to process IOs
background maintenance & optimization tasks on-going
factors: time-to-data, overhead electronics, fan, maintenance
systems are idle large fractions of the time
Active Mode for a Storage Array
storage system is carrying out IOs
background tasks continue in parallel
factors: workload (seq/random), response time, throughput
evaluate a variety of workloads, plus sustained peak power
HDD Capacity versus
High Performance
Capacity
focused on GB/watt at rest
1 TB SATA: 15W
4 x 250 GB FC: 64W
also tend to have better $/GB
NOTE: power use is quadratic with respect to rotational speed
Use the slowest drives that will fit your needs
Performance
focused on seek time
1 TB SATA: 12 – 15 ms
300 GB FC: 3 – 4 ms
also designed for higher RAS * environments
* RAS = Reliability, Availability, Serviceability
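The GB/watt comparison on this slide works out as follows. The sketch assumes the 64W figure is the total for all four FC drives, and the 7,200 and 15,000 RPM speeds in the quadratic example are illustrative assumptions rather than values from the slide.

```python
def gb_per_watt(capacity_gb, watts):
    """Capacity delivered per watt of drive power."""
    return capacity_gb / watts


def relative_spindle_power(rpm, reference_rpm):
    """Power relative to a reference drive, using the slide's quadratic rule."""
    return (rpm / reference_rpm) ** 2


sata = gb_per_watt(1000, 15)      # 1 TB SATA at 15 W
fc = gb_per_watt(4 * 250, 64)     # 4 x 250 GB FC, assumed 64 W in total

print(f"SATA: {sata:.1f} GB/W, FC: {fc:.1f} GB/W, ratio {sata / fc:.1f}x")
print(f"15k vs 7.2k RPM power: about {relative_spindle_power(15000, 7200):.1f}x")
```

For the same terabyte, the capacity drive delivers a bit over four times the GB per watt, which is the case for using the slowest drives that fit the need.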
SSD vs HDD
Power Value - Significant Power Savings
Idle Power: 15k RPM HDD 6.8W, Enterprise SSD 0.5W
Load Power: 15k RPM HDD 10.1W, Enterprise SSD 0.9W
Idle Temp: 85°F
Load Temp: 94°F
~38% Less Heat, ~90% Less Power
SSDs reduce energy cost to operate and cool the data center
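The "~90% less power" claim can be checked against the wattages on the slide. This sketch assumes the larger figure in each pair (6.8W, 10.1W) is the 15k RPM HDD and the smaller (0.5W, 0.9W) is the enterprise SSD.

```python
def percent_less(hdd_watts, ssd_watts):
    """Percentage reduction in power when moving from the HDD to the SSD."""
    return (1.0 - ssd_watts / hdd_watts) * 100.0


idle_saving = percent_less(6.8, 0.5)
load_saving = percent_less(10.1, 0.9)

print(f"idle: {idle_saving:.0f}% less power")
print(f"load: {load_saving:.0f}% less power")
```

Both figures come out slightly above 90%, consistent with the rounded claim.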
Storage Taxonomy
for Energy Measurement
Need a taxonomy (product classification) to enable fair
comparisons among similar storage products
e.g. for motor vehicles – motorcycles, cars, trucks
Similar green metrics may apply to all product categories, but
different values establish best-in-class
Unique considerations apply to special categories
e.g. amphibious cars, skid steer loaders, tanks
Clear taxonomy will simplify comparisons and aid regulatory
efforts
SNIA Measurement Standard - Draft
Storage taxonomy
Measurement conditions
Idle metric
Active metric(s)
Reporting results
1) Storage Taxonomy (1 of 2)
Storage Taxonomy Summary
Online Storage: Prime storage, able to serve random as well as sequential workloads with minimal delay
Near Online Storage: Intended as second tier storage behind Online Storage. Able to service random and sequential workloads, but perhaps with noticeable delay in time to 1st data access.
Maximum Capacity Guidance Note: Maximum Capacity Guidance reflects the maximum capacity a given offering can be purchased with and/or field upgraded to. It is intended to be used as a guideline as opposed to an absolute value. There will be cases where a device may have greater or smaller capabilities, but otherwise is an appropriate match for a given classification due to other criteria, e.g. redundancy capabilities
Max Storage Devices: Online Storage | Near Online Storage
Group 1) SoHo & Consumer
Up to 4 Devices
Storage which is designed primarily for home (consumer) or home / small office usage.
– Often Direct Connected (USB, IP, etc.)
– No option for redundancy (will contain SPOFs)
Group 2) Entry, DAS, or JBOD
More than 4 Devices | Up to 4 Devices
Storage which is dedicated to one or at most a very limited number of servers. Often will not include any integrated controller, but rely on server host for that functionality.
– Often Direct Connected (SATA, IP, etc.)
– May optionally offer limited number of redundancy features
Group 3) Entry / Midrange
More than 20 Devices | More than 4 Devices
SAN or NAS connected storage which places a higher emphasis on value than scalability and performance. This is often referred to as “Entry Level” storage.
– Network connected (IP, SAN, etc.)
– Has options for redundancy features
Group 4) Midrange / Enterprise
More than 100 Devices | More than 100 Devices
SAN or NAS connected storage which delivers a balance of performance and features. Offers higher level of management as well as scalability and reliability capabilities.
– Network connected (IP, SAN, etc.)
– Has options for and often delivered with full redundancy (no SPOF)
Group 5) Enterprise / Mainframe
More than 1000 Devices
Storage which exhibits large scalability and extreme robustness associated with Mainframe deployments, though not restricted to Mainframe-only deployments.
– Mainframe connectivity with optional network connection (IP, SAN, etc.)
– Always delivered with full redundancy (no SPOF)
– Often capable of non-disruptive serviceability
See: Green Storage Power Measurement Specification for complete details
1) Storage Taxonomy (Continued: 2 of 2)
Storage Taxonomy Summary (Continued)
Removable Media Libraries: Archival storage used in a sequential access mode. A typical example would be tape based archival, both stand alone and robotically assisted libraries.
Virtual Media Libraries: Storage which simulates Removable Media Libraries. Will typically use non-tape based storage and as such are able to respond to data requests more quickly
Infrastructure Appliances: Devices placed in the storage SAN or network adding value through one or more dedicated storage enhancements. Examples include: SAN Virtualization, Compression, De-duplication, etc.
Infrastructure Interconnect: Devices which enable a SAN or other Storage Network data switching or routing.
Maximum Capacity Guidance Note: Maximum Capacity Guidance reflects the maximum capacity a given offering can be purchased with and/or field upgraded to. It is intended to be used as a guideline as opposed to an absolute value. There will be cases where a device may have greater or smaller capabilities, but otherwise is an appropriate match for a given classification due to other criteria, e.g. redundancy capabilities
Note: * Infrastructure Appliances by definition have no intrinsic storage, other than what is used for local processing and/or local caching of data. Storage Devices Support in this case refers to the number of storage devices controllable downstream of the Appliance
Max Tape Drives | Max Storage Devices | Supported* | Max Port Count
Group 1) SoHo & Consumer
Max Tape Drives: Stand Alone Drive (No Robotics)
Storage which is designed primarily for home (consumer) or home / small office usage.
– Often Direct Connected (USB, IP, etc.)
– No option for redundancy (will contain SPOFs)
Group 2) Entry, DAS, or JBOD
Max Tape Drives: Up to 4 Drives | Max Port Count: Up to 32
Storage which is dedicated to one or at most a very limited number of servers. Often will not include any integrated controller, but rely on server host for that functionality.
– Often Direct Connected (SATA, IP, etc.)
– May optionally offer limited number of redundancy features
Group 3) Entry / Midrange
Max Tape Drives: More than 4 Drives | Max Storage Devices: Up to 100 Devices | Supported*: Support for up to 20 Devices | Max Port Count: Up to 128
SAN or NAS connected storage which places a higher emphasis on value than scalability and performance. This is often referred to as “Entry Level” storage.
– Network connected (IP, SAN, etc.)
– Has options for redundancy features
Group 4) Midrange / Enterprise
Max Tape Drives: More than 24 Drives | Max Storage Devices: More than 100 Devices | Supported*: Support for more than 20 Devices | Max Port Count: More than 128
SAN or NAS connected storage which delivers a balance of performance and features. Offers higher level of management as well as scalability and reliability capabilities.
– Network connected (IP, SAN, etc.)
– Has options for and often delivered with full redundancy (no SPOF)
Group 5) Enterprise / Mainframe
Max Tape Drives: More than 11 Drives | Max Storage Devices: More than 100 Devices | Supported*: Support for more than 100 Devices
Storage which exhibits large scalability and extreme robustness associated with Mainframe deployments, though not restricted to Mainframe-only deployments.
– Mainframe connectivity with optional network connection (IP, SAN, etc.)
– Always delivered with full redundancy (no SPOF)
– Often capable of non-disruptive serviceability
© SNIA 2009
See: Green Storage Power Measurement Specification for complete details
Desired Storage Metric –
“Productivity”
• “typical workload”, with levels
• “four corners”, maximum
performance, maximum power
Standard Performance Evaluation Corporation
• The Green Grid Productivity Proxy Proposals
example – Proxy #4 – bits/kilowatt-hour
• detailed performance benchmark – results/W
Random,
read
Random,
write
Sequential
write
Sequential,
read
Many possible definitions – must balance simplicity against applicability
Complications
Single disk drive power profile
IBM Haifa Research Labs
SPECweb 2005 (banking) + storage
Storage power, Server power
• Significant
whole-system
considerations
• Max power =/= Max performance
“Storage Modeling for Power
Estimation”, Miriam Allalouf , Yuriy
Arbitman, Michael Factor, Ronen I.
Kat, Kalman Meth, and Dalit Naor;
IBM Haifa Research Labs;
manuscript; March 2009
“The Next Frontier for Power/Performance Benchmarking:
Energy Efficiency of Storage Subsystems” Klaus-Dieter Lange;
SPEC Benchmark Workshop 2009; January 2009
Need for Data Redundancy
RAID 10 – protect against multiple disk failures
DR Mirror – protect against whole-site disasters
Backups – protect against failures and unintentional deletions/changes
Compliance archive – protect against heavy fines
Test/dev copies – protect live data from mutilation by unbaked code
Overprovisioning – protect against volume out of space application crashes
Snapshots – quicker and more efficient backups
[Diagram: 1 TB of application data growing toward 10 TB of raw capacity (scale marks at 1 TB, 5 TB, 10 TB) as RAID10, overprovisioning for “growth”, snapshots, DR mirror, disk backup, backup archive, compliance archive, and test/dev copies are layered on]
~10x + RAID 10 Overhead
- Power consumption is roughly linear in
the number of naïve (full) copies
Result of Redundancy
Positive Effect of
Green Storage Technologies
[Diagram: the same data set (scale marks at 1 TB, 5 TB, 10 TB) shrinking step by step as RAID 5/6 (shown as RAID DP in place of RAID10), thin provisioning, virtual clones, multi-use backups, and dedupe & compression are applied]
- Green storage technologies use less raw
capacity to store and use the same data set
- Power consumption falls accordingly
Green Storage Technologies
Enabling technologies
Storage virtualization
Storage capacity planning
Green software
Compression
Snapshots
Virtual (writeable) clones
Thin provisioning
Non-mirrored RAID
Deduplication and SIS
Resizeable volumes
Typical Savings
Thin provisioning
40 - 60%
Average 30% utilization over 80% utilization
RAID 6
35%
For 14-disk RAID 6 set, compared to RAID 1/10
Deduplication
40 – 95%, depending on dataset and time interval
~ 40 – 50% average over time
Resizeable volumes
20 – 50%
Green Storage Technologies
(cont.)
Other storage technologies and power saving techniques
Capacity vs. high performance drives
ILM / HSM
MAID
SSDs
Power supply and fan efficiencies
Facilities-side technologies
Hot aisle/cold aisle
Water & natural cooling
Flywheel UPSs
Savings Matrix
[Matrix: rows and columns are C, SS, VC, TP, R, DD, RV; legend below]
Compression (C)
Snapshots (SS)
Virtual Clones (VC)
Thin Provisioning (TP)
RAID (R)
Deduplication (DD)
Resizeable Vols (RV)
Savings can multiply for the combinations marked with checkboxes
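One way to read "savings can multiply" is that each technology shrinks whatever capacity remains after the others have been applied. The sketch below makes that assumption explicit (independent, stackable savings) and uses mid-range values from the Typical Savings slide as sample inputs; the function name is illustrative.

```python
import math
from typing import Iterable


def combined_saving(savings: Iterable[float]) -> float:
    """Overall fraction saved when each saving applies to the capacity
    left over by the previous ones (independence is an assumption)."""
    remaining = math.prod(1.0 - s for s in savings)
    return 1.0 - remaining


if __name__ == "__main__":
    # Sample inputs: thin provisioning 50%, RAID 6 35%, deduplication 45%.
    print(f"{combined_saving([0.50, 0.35, 0.45]):.1%}")
```

With those three inputs, about 18% of the original raw capacity remains, for a combined saving of roughly 82%. Real combinations may interact (for example, compression and deduplication can overlap), so the product is an upper-level estimate rather than a guarantee.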
SNIA Green Efforts
SNIA Green Storage Initiative (GSI) and SNIA Green Storage Technical Work Group (TWG)
Ongoing efforts to develop data-driven green standards & metrics
Power measurements at multi-vendor “unplugged” fests
Alliances with other active green organizations (The Green Grid, 80PLUS/Climate Savers, DMTF, SPEC, SPC)
Collaboration with EPA on the ENERGY STAR for Storage program
Whitepapers / workshops
Four tutorials at SNW; online tutorials available (www.snia.org/education/tutorials)
White papers from GSI
Cloud Computing and Storage
IDC: Worldwide IT Cloud Services Spending by Product/Service Type, 2008 & 2012*
Segment                      2008            2012
Business Applications        57%             52%
Infrastructure Software      18%             18%
App Dev & Deployment         11%             9%
Server                       9%              8%
Storage                      5%              13%
Total                        $16.2 billion   $42.3 billion
Callout: $5.5 billion (matches the 13% storage share of the 2012 total)
* Includes enterprise IT spending on Business Applications, Systems Infrastructure Software, Application Development & Deployment Software, Servers and Storage
Source: IDC - IT Cloud Services Forecast - 2008, 2012: A Key Driver of New Growth
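The IDC figures can be cross-checked with a few lines of arithmetic. The sketch assumes the 5% storage share belongs to the 2008 total and the 13% share to the 2012 total, which is consistent with the $5.5 billion callout. The growth-rate helper is a standard compound annual growth formula added for illustration; it is not a number quoted by IDC.

```python
TOTAL_BILLION_USD = {2008: 16.2, 2012: 42.3}
STORAGE_SHARE = {2008: 0.05, 2012: 0.13}  # assumed year assignment


def storage_spend(year: int) -> float:
    """Cloud storage spending in billions of USD for the given year."""
    return TOTAL_BILLION_USD[year] * STORAGE_SHARE[year]


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    print(f"storage 2008: ${storage_spend(2008):.2f}B")
    print(f"storage 2012: ${storage_spend(2012):.2f}B")
    print(f"total CAGR:   {cagr(16.2, 42.3, 4):.1%}")
    print(f"storage CAGR: {cagr(storage_spend(2008), storage_spend(2012), 4):.1%}")
```

Under these assumptions storage grows from about $0.8 billion to about $5.5 billion, a much faster rate than the overall market's roughly 27% per year.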
Some basic cloud storage attributes
Pay as you go
Self-service provisioning
Scalable, elastic
Rich application interfaces
No need for consumers to directly manage their own storage resources
By offloading storage management, data owners can focus more on managing data requirements ...
Cloud Computing Perceived Benefits and Demand Drivers
Cloud computing's “nirvana-like” promise drives higher service level expectations among business entities and individual users
Which in turn puts pressure on the enterprise data center to deliver higher service quality (at lower cost)
IT Providers
Key Benefit: Competitiveness
Lower TCO
Faster Time to Market
Higher Customer Retention
Service quality
Resource optimization
Resiliency
Flexibility
Efficiency
“Green”
Enhanced chargeback
Business Entities
Key Benefit: Innovation
Faster, easier innovation
New business models
New products and services
Faster time to market
Lower IT cost
Lower IT risk (brand protection)
Improved IT user productivity
Improved Client Satisfaction
Improved Disaster Recovery
IT Users
Key Benefit: Quality of Experience
Speed of access
Ease of access (anywhere, anytime)
Ease of use
Minimal software requirements on access device
No long-term commitments
What is Cloud Storage?
Cloud Storage can be contrasted with SAN/NAS storage
Both are “Storage Networking”
Provisioning may be different (some interfaces do not require this)
How you pay for it may be different
One primary difference is that essential management tasks for storage resources are performed by the Cloud operator and not the storage user
Public Storage Clouds
Latency may be an issue for most enterprise applications
Primarily aimed at web-facing applications that already serve data over the web
Importance of SLA Management
Private Storage Clouds
Can be either web-facing or used for enterprise applications
Can be operated by internal IT departments, driving costs down and achieving better utilization
Importance of SLA Management
Hybrid use of public and private clouds (including existing data centers)
This is not only about capacity provisioning
Data Assurance, Security, Delivery, Migration…
Leverage Virtualized and Self*/Automated Management Environments
Also part of Virtual Data Centers
Some Examples of Cloud Interfaces
De facto and proprietary interfaces
Amazon S3 (http://aws.amazon.com/s3) “As simple as possible, but no simpler”
GoGrid (http://wiki.gogrid.com/wiki/index.php/Cloud_Storage)
Some offer standard data path APIs, but allocation and provisioning are behind “storefronts” or proprietary APIs
SAMBA, RSYNC, SCP – “standard” open source
Microsoft Azure Interface
De jure APIs
WebDAV (http://www.ietf.org/rfc/rfc2518.txt)
iSCSI (http://www.ietf.org/rfc/rfc3720.txt)
NFS (http://www.ietf.org/rfc/rfc3530.txt)
FTP (http://www.ietf.org/rfc/rfc959.txt)
But very few of these interfaces support the use of metadata on individual data elements
Cloud Storage: Use Cases and Requirements
Store my file and give me back a URL (e.g. Amazon S3)
Best Effort Quality of Service?
Provision a filesystem and mount it (e.g. WebDAV)
Quality of Service specification via provisioning interface
Give me Filesystems/LUNs for my Cloud Computing
NAS box in the cloud…
Store my backup files until I need them back
Maybe offer me a local cache as well
Archive my files in the Cloud for Preservation/Compliance
Maybe offer me eDiscovery services, “tape in the mail” retrieval
Store all my files, allowing me to set the Data Requirements, let me cache and distribute geographically
Policy driven Data Services based on Data System Metadata markings
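The first use case ("store my file and give me back a URL") and the last one (metadata markings that drive data services) can be sketched with a toy in-memory store. Everything here is hypothetical: the class, method names, endpoint, and metadata keys are invented for illustration and do not correspond to Amazon S3 or any other real service API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ToyObjectStore:
    """In-memory stand-in for a cloud object store (illustration only)."""
    base_url: str = "https://storage.example.invalid"  # hypothetical endpoint
    _objects: Dict[str, Tuple[bytes, Dict[str, str]]] = field(default_factory=dict)

    def put(self, bucket: str, name: str, data: bytes,
            metadata: Optional[Dict[str, str]] = None) -> str:
        """Store the bytes with optional per-object metadata; return a URL."""
        key = f"{bucket}/{name}"
        self._objects[key] = (data, dict(metadata or {}))
        return f"{self.base_url}/{key}"

    def _key(self, url: str) -> str:
        prefix = self.base_url + "/"
        if not url.startswith(prefix):
            raise ValueError("URL does not belong to this store")
        return url[len(prefix):]

    def get(self, url: str) -> bytes:
        return self._objects[self._key(url)][0]

    def metadata(self, url: str) -> Dict[str, str]:
        return dict(self._objects[self._key(url)][1])


if __name__ == "__main__":
    store = ToyObjectStore()
    url = store.put("backups", "report.txt", b"hello",
                    {"retention_days": "30"})  # hypothetical metadata marking
    print(url, store.get(url), store.metadata(url))
```

Keeping the metadata next to each object is what would let a provider apply policy (retention, placement, caching) per data element rather than per volume.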
Types of APIs
Besides the “Data Path” APIs (previous slide), there are other interfaces that Cloud Storage may require
E.g. Storage Provisioning
For certain types of data storage interfaces (block, file) from the cloud you will need to provision/allocate storage before you can use it
This provisioning can be done via a UI or an API
Existing standards can be leveraged (e.g. SNIA SMI-S)
E.g. Storage Metering
Since the cloud storage paradigm is “pay as you go”, you need to know what your bill will be at the end of the billing cycle
What operations affect my bill?
UI typical, but an API standard would enable interoperability and better automation
Telecom Industry Practice: every transaction has a “Call Detail Record” that is aggregated for billing
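The telecom-style approach to metering can be sketched as one usage record per operation, aggregated per account at the end of the billing cycle. The record fields, operation names, and prices below are all hypothetical; no real provider's price list or metering API is implied.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class UsageRecord:
    """One metered operation, analogous to a telecom call detail record."""
    account: str
    operation: str   # e.g. "put", "get", "store_gb_month"
    quantity: float


# Hypothetical unit prices in USD; not taken from any real provider.
PRICES = {"put": 0.01, "get": 0.001, "store_gb_month": 0.15}


def bill(records: Iterable[UsageRecord]) -> Dict[str, float]:
    """Aggregate usage records into a per-account total for the cycle."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.account] += PRICES[record.operation] * record.quantity
    return dict(totals)


if __name__ == "__main__":
    records = [
        UsageRecord("acme", "put", 100),
        UsageRecord("acme", "store_gb_month", 10),
        UsageRecord("beta", "get", 1000),
    ]
    print(bill(records))
```

Because every operation that affects the bill produces a record, the same data answers the slide's question "What operations affect my bill?" and could be exposed through an API as well as a UI.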
Some Example Data Storage Interfaces
Block Interfaces: SCSI, ATA, IDE
Local File Interfaces: POSIX, NTFS
Network File Interfaces: NFS, CIFS, SMB2, AppleTalk, Novell, AFS
Object Based: OSD, XAM
Database: JDBC, ODBC
Not all of these make sense for the Cloud
Cloud API to the Resource Domain Model
Cloud interfaces with all 3 domains (Information, Data, Storage)
Integration of services with different types of Clouds (Compute, Applications...)
Federation of Clouds
Cloud Exchange, Cloudbursting…
Data Movement
Migration, Delivery, Regulations
XAM API: an example Data Storage Interface
XAM is the first interface to standardize system metadata for retention of data
Given this we can see that XAM is a data storage interface that is used by both Storage and Data Services (functions)
XAM implements the basic capability to Read and Write Data (through XStreams)
XAM has the ability to locate any XSet with a query or by supplying the XUID
XAM allows Metadata to be added to the data and keeps both in an XSet object
XAM uses and produces system metadata for each XSet
For example Access and Commit times (Storage System Metadata)
But it also uniquely specifies Data System Metadata for Retention Data Services
XAM User metadata is not interpreted by the system, but is stored with the other data and is available for use in queries
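The concepts on this slide (an XSet that bundles data streams with metadata, lookup by XUID or by query, system-produced metadata such as commit time and retention) can be modeled with a toy in-memory store. This is only a conceptual sketch: the class and method names are invented, the identifier is a random UUID rather than a real XUID, and none of it reflects the actual XAM library API.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class ToyXSet:
    """Data plus metadata kept together, loosely modeled on an XSet."""
    xuid: str
    streams: Dict[str, bytes]        # stands in for XStreams
    user_metadata: Dict[str, str]    # opaque to the system, but queryable
    system_metadata: Dict[str, object]  # produced by the system


class ToyXamStore:
    def __init__(self) -> None:
        self._xsets: Dict[str, ToyXSet] = {}

    def commit(self, streams: Dict[str, bytes],
               user_metadata: Optional[Dict[str, str]] = None,
               retention_days: Optional[int] = None) -> str:
        """Store a new XSet and return its identifier."""
        xuid = uuid.uuid4().hex  # placeholder identifier, not a real XUID
        system: Dict[str, object] = {"commit_time": datetime.now(timezone.utc)}
        if retention_days is not None:
            system["retention_days"] = retention_days
        self._xsets[xuid] = ToyXSet(xuid, dict(streams),
                                    dict(user_metadata or {}), system)
        return xuid

    def open(self, xuid: str) -> ToyXSet:
        """Locate an XSet by supplying its identifier."""
        return self._xsets[xuid]

    def query(self, **criteria: str) -> List[str]:
        """Locate XSets whose user metadata matches every criterion."""
        return [x.xuid for x in self._xsets.values()
                if all(x.user_metadata.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    store = ToyXamStore()
    xuid = store.commit({"body": b"scan"}, {"patient": "42"}, retention_days=365)
    print(store.query(patient="42") == [xuid], store.open(xuid).system_metadata)
```

The split between `user_metadata` and `system_metadata` mirrors the slide's distinction: the store never interprets the former, while the latter is what a retention data service would act on.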
Standards for Cloud Storage
Service access interfaces
Storage service interfaces
Provisioning
QoS
Performance management
Chargeback accounting
Data protection
Storage Security
Storage infrastructure management interfaces (SMI-S)
[Diagram labels: Service Management; SOA; Application; Middleware; Virtualized Infrastructure; Server / Storage / Network; Compute; Virtual Image Management; Cloud Service User]
SNIA Cloud Technical Work Group
www.snia.org/cloud
Engaging the industry
http://groups.google.com/group/snia-cloud
Alliances
Education & Whitepapers
Use Cases & Taxonomy
Interface Specification
And coming soon to Brazil! Cloud Storage Brasil
http://groups.google.com/group/snia-cloud-br?hl=pt-br
Thank You
Muito Obrigado!
www.snia.org
www.snia.com.br