Sponsored workshop by Amplidata from Structure:Data 2012:
TRANSCRIPT
Storage facts and trends
Recent studies estimate that data storage capacities will likely increase by over 30X in the coming decade to over 35 Zettabytes
[Chart: storage consumption vs. time through 2020 (30X growth, 35ZB), annotated: high-capacity drives, less staff per TB, unstructured data]
Friday, July 27, 2012
Storage facts and trends
But… the number of qualified people to manage this huge volume of data will stay roughly flat (~1.5X)
Administrators will therefore be expected to manage 20X more data each
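The 20X-per-administrator figure follows directly from the two growth rates on the slide; a quick sanity check (figures as stated in the deck):

```python
# Back-of-the-envelope check of the slide's numbers.
capacity_growth = 30.0   # data under management grows ~30X over the decade
staff_growth = 1.5       # qualified admin headcount grows only ~1.5X

data_per_admin = capacity_growth / staff_growth
print(data_per_admin)    # -> 20.0, i.e. each admin manages ~20X more data
```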
[Chart: capacity/budget vs. time — storage requirements outpace the storage budget; the gap is closed through efficiency: automate & reduce overhead]
• Much of that growth (80%) is driven by unstructured data
• Billions of large objects and files
Media Archives Online Images Large Files
Medical Images Online Storage Online Movies
Storage facts and trends
M&E drives huge capacity requirements, in both file sizes and the number of files and storage capacities in use, propelled by HD and 3D video formats:
“Petabytes are peanuts”
3TB per hour for 4K video
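The deck's 3TB-per-hour figure implies a substantial sustained throughput; a quick conversion (taking 1 TB as 10^12 bytes, an assumption about the deck's units):

```python
# Sustained throughput implied by "3TB per hour for 4K video".
tb_per_hour = 3
bytes_per_sec = tb_per_hour * 1e12 / 3600   # ~8.3e8 bytes/s
gbit_per_sec = bytes_per_sec * 8 / 1e9      # convert to gigabits per second

print(round(gbit_per_sec, 1))               # -> 6.7 Gbit/s sustained
```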
Storage facts and trends: Media & Entertainment Industry Example
Big Data for Analytics
• In the 90’s, we experienced an explosion of data captured for analytics purposes:
  • Academic research
  • Chemical R&D facilities
  • Travel industry
  • Geo-industry, oil & gas
  • Financial / trading
  • Agriculture
• In the 2000’s, online applications & social media triggered a flood of trend data
Big Data for Analytics
• Data is captured as many small log files & concatenated as “Big Data”
• Relational databases were not optimal:
  • Too much data, too big
  • Insufficient performance for analytics
• This stimulated innovations:
  • Hadoop, MapReduce, GFS
  • XML databases
• => This is Big Data for Analytics
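The MapReduce model named above can be sketched in-process; this is a minimal illustration of the map/reduce idea applied to small log files, not Hadoop's actual API:

```python
from collections import defaultdict

def map_phase(line):
    # map step: emit a (word, 1) pair for every word in a log line
    return [(word, 1) for word in line.split()]

def reduce_phase(pairs):
    # reduce step: sum the counts for each distinct key
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

# Many small log files, concatenated and counted as "Big Data"
logs = ["error disk full", "error timeout", "ok"]
pairs = [kv for line in logs for kv in map_phase(line)]
print(reduce_phase(pairs))
# -> {'error': 2, 'disk': 1, 'full': 1, 'timeout': 1, 'ok': 1}
```

Hadoop distributes exactly these two steps across many machines, which is what made analytics over log volumes too big for a relational database practical.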
Big Data Evolution
• Today, Big Data trend refers to Big Data for Analytics & Big Unstructured Data:
  • Media
  • Streaming
  • Business
  • Scientific
• Fundamentally different data, but with lots of similarities:
  • Immense capacities
  • Number of transactions or objects
• Unstructured data is traditionally stored on host file systems, but:
  • Host file systems impose fixed limits and do not scale to the sizes we need
  • File systems do not meet performance requirements because access is funneled through the host
Big Unstructured Data
• Most unstructured data is archived, often to tape (for cost reasons), and is then difficult to access
• Volumes are increasing exponentially
• Data archives are an organization & management burden (Grandma’s Attic)
Big Unstructured Data
• Companies are starting to see the value of the data in their archives:
  • Documents of individuals can be valuable for others
  • Some companies have legal reasons to keep data available
  • Unexplored analytics opportunities
  • This data can be mined and monetized
Big Unstructured Data
But how do we store all this data in a cost-efficient way?
“Building cost-efficient Live Archives”
Big Unstructured Data
What are the requirements?
• Tape is a difficult option: access latency is key
• Data has to be always available online
• Direct interface to the applications
• Petabyte scalability
• Extreme reliability, integrity
• Cost-efficient
• Security
Disk Storage (online, low-latency access)
+ Open application API’s (App & Cloud-enabled)
+ Ultra-high data durability (Erasure Coding)
= Optimized Object Storage
Disk vs. Tape
Tape has several obvious advantages over disk & there will always be use cases for tape
But disks enable live archives with instant data accessibility
More arguments for disk-based archives:
• Disks can be powered down
• Tape requires replication to protect against media errors
• Data integrity checking
• Massive migration projects
• …
Object Storage Simplifies this Problem
• File System organization of data becomes a burden
• File systems impose limitations on numbers of files & directories
• Very time-consuming to organize data
• Object Storage simplifies this problem
• Flat “namespaces” (not file systems), without storage limits
• Lets the applications talk directly to the storage
• Use “object” application APIs to let applications directly manage objects & metadata
• File Gateways can be used as a transition bridge
• Bring legacy data and apps into Object Storage
[Diagram: multiple applications talk directly to object storage through an Object API]
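The object-API pattern described above can be sketched as a minimal in-memory store; every name here is an illustrative assumption, not any vendor's real interface:

```python
class ObjectStore:
    """Minimal sketch of a flat-namespace object store
    (illustrative only; not Amplidata's or any vendor's actual API)."""

    def __init__(self):
        self._objects = {}  # flat namespace: key -> (data, metadata)

    def put(self, key, data, metadata=None):
        # No directories, no file-system limits: just key -> object,
        # with application-managed metadata stored alongside the data.
        self._objects[key] = (data, dict(metadata or {}))

    def get(self, key):
        data, _ = self._objects[key]
        return data

    def head(self, key):
        # Metadata travels with the object, managed by the application.
        _, metadata = self._objects[key]
        return metadata

store = ObjectStore()
store.put("scans/patient-001.dcm", b"<dicom bytes>", {"modality": "MRI"})
print(store.head("scans/patient-001.dcm"))   # -> {'modality': 'MRI'}
```

A file gateway in this picture would simply translate POSIX calls into these put/get/head operations for legacy applications.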
Petabyte Scalability and Beyond
Systems should scale BIG:
• Beyond petabytes of data – no built-in limits
• Beyond billions of data objects
Systems should scale uniformly:
• Add resources incrementally and grow as a single system view
• Manage from a “single pane of glass”
• Scale performance and capacity separately
• Migration and seamless growth across newer generations of component technologies (processors, disk densities)
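The deck does not name a mechanism for adding nodes incrementally while keeping a single system view; consistent hashing is one common technique for this, sketched here under that assumption:

```python
import hashlib
from bisect import bisect_right

def ring_position(name: str) -> int:
    # Map any name to a fixed position on the hash ring.
    return int(hashlib.md5(name.encode()).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring: adding one node moves only a small
    fraction of keys, so capacity can grow incrementally."""

    def __init__(self, nodes):
        self.ring = sorted((ring_position(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first node at or after the key's position.
        positions = [p for p, _ in self.ring]
        i = bisect_right(positions, ring_position(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("object-42"))  # deterministically maps the object to one node
```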
Ultra-High Levels of Data Integrity
• Data needs to be archived for lifetimes:
  • Expect “bit perfect” integrity to store the gold copy of critical assets
  • Consolidate multiple copies of data into a single highly-durable tier
• Ensuring the integrity of a long-term unstructured data archive requires new data protection algorithms, to:
  • Address the increasing capacity of disk drives
  • Solve issues related to long RAID rebuild windows
“Object storage systems based on erasure-coding can not only protect data from higher numbers of drive failures, but also against the failure of entire storage modules.”
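The deck does not show how erasure coding works; here is a minimal single-parity sketch of the idea. Production systems use codes such as Reed-Solomon that survive multiple simultaneous drive or module failures, whereas this toy survives exactly one:

```python
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data_chunks):
    # Store the data chunks plus one XOR parity chunk; any single
    # lost chunk can then be rebuilt from the survivors.
    return data_chunks + [reduce(xor, data_chunks)]

def rebuild(chunks, lost_index):
    # XOR of all surviving chunks reconstructs the missing one.
    survivors = [c for i, c in enumerate(chunks) if i != lost_index]
    return reduce(xor, survivors)

chunks = [b"AAAA", b"BBBB", b"CCCC"]   # equal-size chunks of one object
stored = encode(chunks)                 # 3 data chunks + 1 parity chunk
print(rebuild(stored, 1))               # -> b'BBBB', the lost chunk recovered
```

Unlike a RAID rebuild, which rereads entire drives, an object store rebuilds only the affected objects, which is how erasure-coded systems avoid the long rebuild windows mentioned above.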