High Availability
Mission Statement
1. high availability business-level cloud data store
2. federated clouds = diversification
3. many DCs and/or cloud providers
4. we care mostly about performance = high availability
5. practical solutions are needed
Marat Zhanikeev -- [email protected] High Availability Cloud Storage: Social, Throughput, Smart -- http://tinyurl.com/marat140417 2/21...
haStore: The Short Story
haStore: One DC is Not Enough
• remember June 2013?
• most services today use vertical integration -- no diversity
• Hitachi does not share DCs with NEC
• regional diversity of one provider is bad
  ◦ how many Amazon DCs in Japan?
(The only possible) solution is to sign contracts with multiple DCs and manage them on the client side.
haStore: One DC is Not Enough
[Figure: locations (Kansai, Kyushu, Okinawa) and their data centers (DC1, DC2); Osaka Office, Naha Office and a business trip reach them over a storage network at varying network distance. Proposed software stack for Employee A: Content / Social Metadata, High Availability Data Store, store APIs (DC1, DC2, ….)]
haStore: Store Diversification
• store = sum of multiple substores
• in software: not a priority list -- optimization engine!
• realtime performance monitoring, read/write optimization, etc.
• sub-file data unit -- chunks
[Figure: substores (SSD, HDD, DC1, DC2, …) at growing network distance from the user.]
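One way to read "not a priority list -- optimization engine" is that the substore for each chunk is chosen from measured performance rather than from a fixed order. The Python sketch below illustrates that reading only. `SubstoreStats`, `record`, and `pick_substore` are hypothetical names, and the exponential moving average is a stand-in for whatever estimator haStore actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubstoreStats:
    """Running throughput estimate for one substore (hypothetical helper)."""
    name: str
    estimate: Optional[float] = None  # bytes per second, None until first sample
    alpha: float = 0.5                # weight of the newest sample (assumed value)

    def record(self, nbytes: int, seconds: float) -> None:
        """Fold one observed transfer into the moving average."""
        sample = nbytes / seconds
        if self.estimate is None:
            self.estimate = sample
        else:
            self.estimate = self.alpha * sample + (1 - self.alpha) * self.estimate


def pick_substore(stats):
    """Return the measured substore with the highest current estimate."""
    measured = [s for s in stats if s.estimate is not None]
    if not measured:
        raise ValueError("no throughput measurements yet")
    return max(measured, key=lambda s: s.estimate)


dc1 = SubstoreStats("DC1")
dc2 = SubstoreStats("DC2")
dc1.record(1_000_000, 1.0)   # 1 MB in 1 s
dc2.record(1_000_000, 4.0)   # 1 MB in 4 s
first_choice = pick_substore([dc1, dc2]).name

for _ in range(5):           # DC1 degrades; the choice follows the measurements
    dc1.record(1_000_000, 100.0)
second_choice = pick_substore([dc1, dc2]).name
```

With a fixed priority list the client would keep using DC1 after it slows down; here the choice moves to DC2 once the measurements say so.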
haStore: Socially Aware Store
• content relevance based on social graph
• relevance is a distribution
• individual redundancy based on distribution
• other link types: same time, location, filetype, ...
• link strength != 1
[Figure labels: Descending order, Relevance, Distribution, Redundancy (user setting), Physical limit of redundancy, End of content.]
[Table labels: There is a link; Between; When a file is …; Created, Viewed, Edited, Deleted.]
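The idea of "individual redundancy based on distribution" can be sketched as follows: files are ranked by relevance, and each file gets a number of copies that shrinks with its relevance, bounded by the user setting and by the physical limit. The sketch is an illustration under assumptions, not haStore's actual rule: `replica_counts` is a hypothetical function, the linear scaling is assumed, and the physical limit is taken to be one copy per substore.

```python
import math


def replica_counts(relevance, user_redundancy, n_substores):
    """Map each file to a number of copies, scaled by its relevance."""
    # Physical limit (assumption): at most one copy per substore.
    cap = max(1, min(user_redundancy, n_substores))
    top = max(relevance.values(), default=0.0)
    counts = {}
    # Walk files in descending order of relevance.
    for name, score in sorted(relevance.items(), key=lambda kv: kv[1], reverse=True):
        share = score / top if top > 0 else 0.0
        # Every file keeps at least one copy; none exceeds the cap.
        counts[name] = max(1, min(cap, math.ceil(share * cap)))
    return counts


relevance = {"report.doc": 1.0, "notes.txt": 0.5, "old.log": 0.1}
counts = replica_counts(relevance, user_redundancy=4, n_substores=3)
```

In this example the user asks for up to 4 copies, but only 3 substores exist, so the most relevant file gets 3 copies and the least relevant gets 1.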
haStore: Software Design
Design: Specs
• many substores, heterogeneous e2e performance and capacity
• each substore has its own API (Dropbox, GDrive, SSD, etc.), but haStore exports a generic API
• data unit: sub-file blobs, for now fixed 100 kB size
• social graph is used to define priority lists of files
  ◦ different for each user
• optimization is the key element of the software engines
  1. sync logic
  2. redundancy logic
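The specs above combine two ideas: a generic API in front of heterogeneous substores, and fixed-size sub-file blobs as the data unit. A minimal Python sketch of how they could fit together is given below. All names (`SubstoreAPI`, `MemorySubstore`, `store_file`, `load_file`) are hypothetical, the blob size assumes "100kb" means 100,000 bytes, and the round-robin placement is only a placeholder for the optimization engines.

```python
from abc import ABC, abstractmethod

BLOB_SIZE = 100 * 1000  # assumed reading of the fixed 100 kB blob size


class SubstoreAPI(ABC):
    """Generic interface that every substore adapter would implement."""

    @abstractmethod
    def put(self, key: str, blob: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class MemorySubstore(SubstoreAPI):
    """In-memory stand-in for a real adapter (Dropbox, GDrive, SSD, ...)."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, blob):
        self._blobs[key] = blob

    def get(self, key):
        return self._blobs[key]


def split_blobs(data: bytes, size: int = BLOB_SIZE):
    """Cut a file into fixed-size blobs; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store_file(name, data, substores):
    """Spread blobs over substores round-robin and return their keys."""
    keys = []
    for i, blob in enumerate(split_blobs(data)):
        key = f"{name}#{i}"
        substores[i % len(substores)].put(key, blob)
        keys.append(key)
    return keys


def load_file(keys, substores):
    """Reassemble a file from blobs stored by store_file."""
    return b"".join(substores[i % len(substores)].get(k) for i, k in enumerate(keys))


stores = [MemorySubstore(), MemorySubstore()]
payload = bytes(range(256)) * 1000        # 256,000 bytes -> 3 blobs
keys = store_file("report.doc", payload, stores)
restored = load_file(keys, stores)
```

Because callers only see `put` and `get`, a new provider can be added by writing one more adapter class without touching the sync or redundancy logic.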
Design: API Stack
• Generic API starts from Level 2, similar to drivers
• the stack is implemented by each client = each user
[Figure: proposed software stack for Employee A: Content / Social Metadata, High Availability Data Store, Store (DC1, DC2, ….)]
Design: Sync Engine
• optimization for throughput minimization
• same logic for SSD, HDD and over-the-network
[Figure: haStore block diagram with Storage, SyncEngine, Optimization, and LocalCache, used by GUI/Clients; arrows labeled Check, Use, 1, 2.]
Design: Sync Engine Logic
[Figure labels: Bulk, Throughput, History Data, Increase timeout, Performance Tradeoff.]
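The figure on this slide survives only as labels, so the following Python sketch is a guess at the kind of logic it describes: the timeout for a transfer is derived from throughput history, and it is increased when a transfer times out. Everything here is an assumption made for illustration, including the function name `sync_chunk`, the `send(chunk, timeout)` callback, the slack factor, and the doubling rule.

```python
def sync_chunk(chunk, send, history, slack=2.0, max_retries=3):
    """Send one chunk with a timeout derived from throughput history.

    history: non-empty list of observed throughputs in bytes per second.
    send:    callable(chunk, timeout) returning elapsed seconds,
             or raising TimeoutError (hypothetical transport callback).
    Returns (number_of_retries, timeout_that_worked).
    """
    average = sum(history) / len(history)
    timeout = slack * len(chunk) / average
    for attempt in range(max_retries + 1):
        try:
            elapsed = send(chunk, timeout)
        except TimeoutError:
            timeout *= 2          # "increase timeout" and try again
            continue
        history.append(len(chunk) / elapsed)  # feed the result back into history
        return attempt, timeout
    raise TimeoutError("substore did not respond within the retry budget")


def slow_send(chunk, timeout):
    """Fake transport that always needs 3 seconds."""
    if timeout < 3.0:
        raise TimeoutError
    return 3.0


history = [100.0]                 # earlier transfers ran at 100 bytes/s
attempt, timeout = sync_chunk(b"x" * 100, slow_send, history)
```

The tradeoff is the usual one: a short timeout reacts quickly to a dead substore but wastes retries on a slow one, while a long timeout does the opposite.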
Design: Redundancy Logic (1)
[Figure labels: Descending order, Relevance, Distribution, Redundancy (user setting), Physical limit of redundancy, End of content.]
Design: Redundancy Logic (2)
haStore: Social Graph
Social Graph: Basics
• current version: only simple types of links
• no link strength
[Table labels: There is a link; Between; When a file is …; Created, Viewed, Edited, Deleted.]
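A graph with "only simple types of links" and "no link strength" can be sketched as an unweighted graph built from file events. The Python example below assumes that a link is created between a user and a file whenever the user creates, views, edits, or deletes it; the original table that defines the link endpoints did not survive extraction, so this choice and the names `build_links` and `linked_users` are hypothetical.

```python
from collections import defaultdict

EVENTS = {"created", "viewed", "edited", "deleted"}


def build_links(log):
    """Build an unweighted user-file graph from (user, event, file) records."""
    graph = defaultdict(set)
    for user, event, filename in log:
        if event not in EVENTS:
            continue                      # ignore event types outside the table
        u, f = ("user", user), ("file", filename)
        graph[u].add(f)
        graph[f].add(u)
    return graph


def linked_users(graph, user):
    """Users who touched at least one file that `user` also touched."""
    me = ("user", user)
    return {u[1] for f in graph.get(me, ()) for u in graph[f] if u != me}


log = [
    ("A-san", "created", "plan.txt"),
    ("B-san", "viewed", "plan.txt"),
    ("B-san", "renamed", "plan.txt"),     # not a link-creating event
    ("C-san", "edited", "budget.xls"),
]
graph = build_links(log)
peers = linked_users(graph, "A-san")
```

Every link counts the same here, which matches the "no link strength" bullet: a single view creates the same link as a hundred edits.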
Social Graph: Advanced
• community detection
• files that could be linked:
  1. touched at roughly the same time
  2. touched by the same user
  3. same location, filetype, size, etc.
• link strength, different for each kind of relation, variable e2e cost on paths
• discovery based on e2e cost, not hop count
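The difference between discovery by e2e cost and discovery by hop count can be shown with a standard shortest-path search over weighted links. The Python sketch below uses Dijkstra's algorithm as a stand-in, since the slides do not name the algorithm haStore uses; the function name `cheapest_costs` and the example costs are made up for illustration.

```python
import heapq


def cheapest_costs(graph, source):
    """Dijkstra: lowest end-to-end cost from source to every reachable node.

    graph: dict mapping node -> dict of neighbor -> non-negative link cost.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                      # stale heap entry
        for neighbor, cost in graph.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# fileA links to fileC directly (1 hop, weak link = high cost)
# and through fileB (2 hops, strong links = low cost).
links = {
    "fileA": {"fileC": 10, "fileB": 1},
    "fileB": {"fileC": 2},
    "fileC": {},
}
costs = cheapest_costs(links, "fileA")
```

By hop count fileC is one step from fileA, but by e2e cost the two-hop path through fileB is cheaper (3 versus 10), so a cost-based discovery ranks related files differently.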
Implementation, Tests
Performance: Demo
[Figure: block transfers for A-san and B-san over DBX and GDR; snapshot at 2014-01-22 12:13:30; legend: Block DONE, Block UPLOAD, Block DOWNLOAD.]
• also demo
Wrapup
• haStore: high availability cloud store
• main features
  ◦ throughput-aware sync/redundancy optimization
  ◦ sub-file blocks, smart distribution
  ◦ social graph
• current status: v1.0 in operation, v2.0 on the way
That’s all, thank you ...