Page 1

2015 Storage Developer Conference. © Huawei Technologies Co. All Rights Reserved.

Implement Object Storage with SMR based Key-Value Store

[email protected] [email protected]

Huawei Technologies Co.

Page 2

Agenda

Object Storage Market Overview
Object Storage Design with SMR based Key-Value Store
Summary and Future Works

Page 3

Massive Data Storage Trend

Big Data, Cloud, BYOD, Media & Entertainment, Virtualization, SDS

Total capacity in 2020: 40 ZB (Gartner)

"A Revolution That Will Transform How We Live, Work, and Think" (Kenneth Cukier)

Page 4

Components and Characteristics of Massive Data

Data components: Video, Music, Picture, Data file, Email

Characteristics: Seldom updated; Undefined value; Large capacity and high growth rate; Long retention time

Chart: 75% unstructured data, 25% other data

Object storage technology matches these requirements.

Page 5

SMR Matches the Object Storage Market

Object storage requirements:
Huge volume needs large-capacity drives.
Competitive TCO needs cheap storage media.
Write-once, rarely modified data matches the SMR write-out-of-place feature.

SMR technology background:
Type: Drive Managed, Host Aware, Host Managed
Standards: ZBC, ZAC
Industry: all HDD vendors will release SMR drives in 2015~2016

Huawei cooperation with HDD vendors on SMR:
http://events.linuxfoundation.org/sites/events/files/slides/SMR%20in%20Linux%20Systems%20-%20Vault.pdf
http://www.hgst.com/company/media-room/press-releases/HGST-Delivers-Worlds-First-10TB-Enterprise-HDD-for-Active-Archive-Applications

Page 6

Agenda

Object Storage Market Overview
Object Storage Design with SMR based Key-Value Store

Summary and Future Works

Page 7

Huawei Object Storage Architecture

Services (S3/Swift): provide S3/Swift-like access
Distributed Object Pools: provide redundant KV access, such as replication and erasure coding
Standalone Key-Value Store: provides simple KV access on HDD/SSD
Infrastructure

Side components: Protocols, Cluster Control, Cloud Management
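The Distributed Object Pools layer is described as adding redundancy (replication or erasure coding) on top of standalone KV stores. The following is a minimal sketch of the erasure-coding case, assuming a single XOR parity chunk (a RAID-5-style code chosen for brevity; the slides do not say which code Huawei uses). StandaloneKVS and ParityPool are hypothetical names, and a Python dict stands in for each drive's key-value store.

```python
from functools import reduce


class StandaloneKVS:
    """Hypothetical stand-in for one drive's standalone KV store."""

    def __init__(self):
        self.data = {}
        self.online = True

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        if not self.online:
            raise OSError("store offline")
        return self.data[key]


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


class ParityPool:
    """Erasure-coded pool sketch: k data chunks plus one XOR parity chunk."""

    def __init__(self, stores):
        if len(stores) < 2:
            raise ValueError("need at least one data store and one parity store")
        self.stores = stores
        self.k = len(stores) - 1

    def put(self, key, value):
        chunk = -(-len(value) // self.k)              # ceiling division
        padded = value.ljust(chunk * self.k, b"\0")
        parts = [padded[i * chunk:(i + 1) * chunk] for i in range(self.k)]
        parts.append(reduce(xor_bytes, parts))        # parity = c0 ^ c1 ^ ... ^ c(k-1)
        for store, part in zip(self.stores, parts):
            store.put(key, (len(value), part))        # length kept to strip padding

    def get(self, key):
        parts, missing, length = [], [], 0
        for i, store in enumerate(self.stores):
            try:
                length, part = store.get(key)
            except (OSError, KeyError):
                part = None
                missing.append(i)
            parts.append(part)
        if len(missing) > 1:
            raise OSError("more than one chunk lost; single parity cannot recover")
        if missing:
            # Any single chunk equals the XOR of all the other chunks.
            parts[missing[0]] = reduce(xor_bytes, [p for p in parts if p is not None])
        return b"".join(parts[:self.k])[:length]
```

With four stores, a value becomes three data chunks plus parity, so any one store can be lost; replication would instead keep full copies at a higher capacity cost.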

Page 8

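Why Key-Value (KV) Based, Not Logical Block Address (LBA) Based

LBA = Logical Block Address; KV = Key-Value

LBA-based drives: the distribution layer is complicated (huge metadata), while the standalone drive layer is simple (LBA).

KV-based drives: the distribution layer only needs a simple name policy, while the standalone drive layer is complicated (KV) and is accessed through get() / put().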
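With KV drives, the distribution layer only needs a simple name policy to locate an object. A minimal sketch, assuming hash-based placement (one common choice, not necessarily Huawei's): any node can recompute an object's drives from its name alone, so no per-object location metadata has to be stored. The place function and its replicas parameter are hypothetical.

```python
import hashlib


def place(object_name, num_drives, replicas=3):
    """Pick distinct drive indexes for an object from its name alone."""
    if num_drives < 1:
        raise ValueError("need at least one drive")
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % num_drives
    # The hashed drive plus the following ones (wrapping around) hold the replicas.
    return [(start + i) % num_drives for i in range(min(replicas, num_drives))]
```

Plain modulo placement remaps most objects when num_drives changes; consistent hashing is the usual refinement.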

Page 9

Huawei Key-Value Store (KVS) Data Model

Diagram: an SSD/HDD KVS holds multiple containers, and each container holds objects.

One KVS has multiple containers/pools.
Every container has its own policy, such as key size, value size, shared or reserved allocation, delete policy, etc.
Objects are accessed through the KV API, and an object can store metadata.
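A sketch of per-container policies, assuming the policy fields below (they are illustrative, not LDB's actual configuration keys). It shows key and value size limits, the difference between reserved allocation (a private quota) and shared allocation (the KVS-wide free space), and objects carrying their own metadata.

```python
from dataclasses import dataclass, field


class NoSpaceError(Exception):
    pass


@dataclass
class ContainerPolicy:
    # Hypothetical policy fields modeled on the slide's examples.
    max_key_size: int = 256
    max_value_size: int = 4 * 1024 * 1024
    reserved_bytes: int = 0          # 0 means shared allocation
    delete_policy: str = "tombstone" # recorded but not enforced in this sketch


@dataclass
class Container:
    name: str
    policy: ContainerPolicy
    objects: dict = field(default_factory=dict)   # key -> (value, metadata)
    used_bytes: int = 0


class KVS:
    """One KVS with several containers, each enforcing its own policy."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.containers = {}

    def create_container(self, name, policy):
        self.containers[name] = Container(name, policy)

    def used(self):
        return sum(c.used_bytes for c in self.containers.values())

    def put(self, container, key, value, metadata=None):
        c = self.containers[container]
        p = c.policy
        if len(key) > p.max_key_size or len(value) > p.max_value_size:
            raise ValueError("key or value size exceeds container policy")
        old = c.objects.get(key)
        delta = len(value) - (len(old[0]) if old else 0)
        if p.reserved_bytes:
            # Reserved allocation: the container is limited to its own quota.
            if c.used_bytes + delta > p.reserved_bytes:
                raise NoSpaceError(f"container {container!r} quota exhausted")
        elif self.used() + delta > self.capacity:
            # Shared allocation: draws on the whole KVS (reservations ignored for brevity).
            raise NoSpaceError("KVS is full")
        c.objects[key] = (value, metadata or {})
        c.used_bytes += delta

    def get(self, container, key):
        """Return (value, metadata) for a key."""
        return self.containers[container].objects[key]
```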

Page 10

Key-Value Store (KVS) API

Pool (container) operations:
Create(name, config_file, pl_id)
Destroy(name)
Open(name, mode, pl_id)
Close(pl_id)
Set_prop(pl_id, prop, value)
Get_prop(pl_id, prop, value)
Get_stats(pl_id, stats)
Xcopy(src, dest, flag, regex, regex_len)
…

Object operations:
put(pl_id, key, value, kv_props, put_opts)
get(pl_id, key, value, kv_props, get_opts)
del(pl_id, key, value, kv_props, del_opts)
Iter_open(pl_id, flag, regex, regex_len, limit, *iter_id)
Iter_next(pl_id, iter_id, kvarray)
Iter_close(pl_id, iter_id)
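To show how the pool and object calls fit together, here is a hypothetical in-memory mock of a subset of the API. It is not LDB: C-style output parameters (pl_id, iter_id, kvarray) become return values, del is renamed delete because del is a Python keyword, and limit is assumed to be the batch size returned by each iter_next call.

```python
import itertools
import re


class LDBMock:
    """In-memory mock of part of the KVS API (pools, objects, iterators)."""

    def __init__(self):
        self._pools = {}                    # pool name -> {key: value}
        self._handles = {}                  # pl_id -> pool name
        self._iters = {}                    # iter_id -> (iterator, batch size)
        self._ids = itertools.count(1)

    # Pool (container) operations
    def create(self, name):
        self._pools.setdefault(name, {})

    def destroy(self, name):
        del self._pools[name]

    def open(self, name):
        if name not in self._pools:
            raise KeyError(name)
        pl_id = next(self._ids)
        self._handles[pl_id] = name
        return pl_id

    def close(self, pl_id):
        del self._handles[pl_id]

    def _pool(self, pl_id):
        return self._pools[self._handles[pl_id]]

    # Object operations
    def put(self, pl_id, key, value):
        self._pool(pl_id)[key] = value

    def get(self, pl_id, key):
        return self._pool(pl_id)[key]

    def delete(self, pl_id, key):           # "del" on the slide
        self._pool(pl_id).pop(key, None)

    def iter_open(self, pl_id, regex=".*", limit=None):
        pattern = re.compile(regex)
        # Matching records are snapshotted, in key order, when the iterator opens.
        matches = sorted(((k, v) for k, v in self._pool(pl_id).items()
                          if pattern.fullmatch(k)), key=lambda kv: kv[0])
        iter_id = next(self._ids)
        self._iters[iter_id] = (iter(matches), limit or max(len(matches), 1))
        return iter_id

    def iter_next(self, pl_id, iter_id):
        it, batch = self._iters[iter_id]
        return list(itertools.islice(it, batch))   # [] once exhausted

    def iter_close(self, pl_id, iter_id):
        del self._iters[iter_id]
```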

Page 11

KVS Core: LDB (Log-Structured DB) Modules

Object Semantic Layer
KV Record Manager
Key Index & Cache
KV Background Task
OM
KV Space Management (Layout)
IO Scheduler

OM: Operation & Maintenance. Layout and IO Scheduler provide SMR write-out-of-place allocation and the IO stack.

Page 12

SMR IO Stack

User space:
Application (LDB): smr_read(), smr_write() through the SMR lib; aio_read(), aio_write() ... through libAIO
SMR lib: ioctl(*, SG_IO, *) and a sense code parser
libAIO: async IO

Kernel:
SD
SCSI middle level driver (including libata)
SCSI low level driver

AIO is used for parallel access.
The SMR lib issues the new SMR commands from user space.
AIO only returns EIO on error, whereas the SMR lib can get the sense code and parse the detailed error.
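The point above is that libAIO only reports EIO, while commands sent through SG_IO give the SMR lib the raw sense data. As a hedged illustration of what a sense code parser might do, the sketch below decodes fixed-format and descriptor-format SCSI sense data as laid out in SPC, and names a few ZBC-related ASC/ASCQ pairs. parse_sense and the lookup tables are hypothetical, the tables are deliberately partial, and the code does not issue any ioctl.

```python
# Sense keys defined by SPC (subset).
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND",
}

# A few ASC/ASCQ pairs reported by zoned (ZBC) drives; deliberately partial.
ZONED_ASC_ASCQ = {
    (0x21, 0x04): "UNALIGNED WRITE COMMAND",
    (0x21, 0x05): "WRITE BOUNDARY VIOLATION",
    (0x21, 0x06): "ATTEMPT TO READ INVALID DATA",
    (0x21, 0x07): "READ BOUNDARY VIOLATION",
}


def parse_sense(buf):
    """Decode (sense_key, asc, ascq, text) from fixed or descriptor sense data."""
    if not buf:
        raise ValueError("empty sense buffer")
    response_code = buf[0] & 0x7F
    if response_code in (0x70, 0x71):            # fixed format
        if len(buf) < 14:
            raise ValueError("fixed-format sense data too short")
        key, asc, ascq = buf[2] & 0x0F, buf[12], buf[13]
    elif response_code in (0x72, 0x73):          # descriptor format
        if len(buf) < 4:
            raise ValueError("descriptor-format sense data too short")
        key, asc, ascq = buf[1] & 0x0F, buf[2], buf[3]
    else:
        raise ValueError(f"unsupported response code 0x{response_code:02x}")
    detail = ZONED_ASC_ASCQ.get((asc, ascq), f"ASC 0x{asc:02x} ASCQ 0x{ascq:02x}")
    return key, asc, ascq, f"{SENSE_KEYS.get(key, hex(key))}: {detail}"
```

A write that misses a zone's write pointer would, for example, surface as ILLEGAL REQUEST with UNALIGNED WRITE COMMAND here, instead of a bare EIO.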

Page 13

LDB KV Access Overview

1. The application calls get(pl_id, *key, *value, *props, *getopts) on the KV Record Manager.
2. The Key Index & Cache finds the kvr_offset for the key.
3. The KVR is read from the drive at kvr_offset.

Key index: key1 → kvr_offset1, key2 → kvr_offset2, key3 → kvr_offset3

The drive is divided into zones, and the zone size is aligned with the media characteristics (256 MB for SMR). KVRs (Key Value Records) are stored in zones.

Zone | Zone | Zone | … | Zone
KVR | KVR | KVR | KVR | KVR | KVR | KVR | KVR | … | KVR
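The three-step get path can be sketched as follows, assuming a hypothetical KVR format (key length, value length, key, value) and an in-memory BytesIO object standing in for the drive. The memory index lookup and the single drive read correspond to steps 2 and 3; put appends at a write pointer and never lets a KVR straddle a zone boundary.

```python
import io

ZONE_SIZE = 256 * 1024 * 1024   # 256 MB SMR zone, as on the slide


class TinyLDB:
    """Sketch of the LDB access path over a zoned, append-only 'drive'.

    Hypothetical KVR format: 4-byte key length, 4-byte value length, key, value.
    """

    def __init__(self, drive=None, zone_size=ZONE_SIZE):
        self.drive = drive if drive is not None else io.BytesIO()
        self.zone_size = zone_size
        self.write_pointer = 0
        self.index = {}                     # memory index: key -> (kvr_offset, length)

    def put(self, key, value):
        kvr = len(key).to_bytes(4, "little") + len(value).to_bytes(4, "little") + key + value
        if len(kvr) > self.zone_size:
            raise ValueError("KVR larger than a zone")
        room = self.zone_size - self.write_pointer % self.zone_size
        if len(kvr) > room:
            self.write_pointer += room      # do not straddle zones: move to the next zone
        self.drive.seek(self.write_pointer)
        self.drive.write(kvr)               # sequential append at the write pointer
        self.index[key] = (self.write_pointer, len(kvr))   # old record becomes garbage
        self.write_pointer += len(kvr)

    def get(self, key):
        kvr_offset, length = self.index[key]    # 1 memory access
        self.drive.seek(kvr_offset)
        kvr = self.drive.read(length)           # 1 disk access
        key_len = int.from_bytes(kvr[:4], "little")
        return kvr[8 + key_len:]

    def zone_of(self, key):
        return self.index[key][0] // self.zone_size
```

An overwrite appends a new KVR and repoints the index, leaving the old record as garbage for later collection.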

Page 14

Zone Based Layout

Zone type / sub-zone type: function
Super Zone / Super Zone: stores metadata information
Data Zone / Index Zone: stores memory index checkpoints to make boot faster
Data Zone / KVR Zone: stores generic object KVRs
Data Zone / Tombstone Zone: stores KVRs of deleted objects
Reserved Zone / Reserved Zone: reserved for adding new functions in the future

Drive layout: Super Zone | Data Zone | Data Zone | … | Data Zone | Reserved Zone
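As a rough illustration of the zone-based layout, the sketch below carves a drive into fixed-size zones: super zones first, a reserved zone last, and data zones in between that receive a sub-type (index, KVR, or tombstone) when first allocated. The numbers of super and reserved zones are assumptions; the slide does not give them.

```python
from enum import Enum


class ZoneType(Enum):
    SUPER = "super"
    INDEX = "index"            # data zone sub-type: memory index checkpoints
    KVR = "kvr"                # data zone sub-type: object KVRs
    TOMBSTONE = "tombstone"    # data zone sub-type: deleted-object KVRs
    FREE = "free"              # data zone not yet given a sub-type
    RESERVED = "reserved"


DATA_SUBTYPES = (ZoneType.INDEX, ZoneType.KVR, ZoneType.TOMBSTONE)


def initial_layout(capacity_bytes, zone_size=256 * 2**20, super_zones=2, reserved_zones=1):
    """Super zones first, reserved zones last, free data zones in between."""
    zone_nr = capacity_bytes // zone_size
    data_zones = zone_nr - super_zones - reserved_zones
    if data_zones < 1:
        raise ValueError("drive too small for this layout")
    return ([ZoneType.SUPER] * super_zones
            + [ZoneType.FREE] * data_zones
            + [ZoneType.RESERVED] * reserved_zones)


def allocate_data_zone(layout, sub_type):
    """Give the first free data zone a sub-type and return its zone number."""
    if sub_type not in DATA_SUBTYPES:
        raise ValueError("only index, KVR, and tombstone zones are allocated here")
    for zone_no, zone_type in enumerate(layout):
        if zone_type is ZoneType.FREE:
            layout[zone_no] = sub_type
            return zone_no
    raise RuntimeError("no free data zone left")
```

For example, a 10 TB drive (10^13 bytes) with 256 MB zones has 37,252 zones, leaving 37,249 data zones under these assumptions.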

Page 15

Super Zone

One super zone holds many super blocks (SBs). Each SB stores the LDB metadata of one specific point in time, and seq records the order of the SBs.

There is more than one super zone. Super blocks are written sequentially in log style and never overwritten; after a new super zone has been written, the old super zone can be reset and reused.

Super zone layout: Super Block 1 | Super Block 2 | … | Super Block n

Super block fields: seq, …, zone_size, zone_nr, …, space_info, zone_info, …, pool_info, tail_seq
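The super block update rule can be sketched as a log that alternates between two super zones, assuming a JSON-plus-CRC32 super block encoding and a fixed number of slots per zone (both hypothetical). Recovery takes the newest super block whose checksum verifies. Here the old super zone is reset only after the first super block lands in the new one, which is one reading of the reset rule described for this slide.

```python
import json
import zlib


class SuperZoneLog:
    """Super blocks appended log-style across two super zones (sketch)."""

    def __init__(self, sb_per_zone=4):
        self.zones = [[], []]          # each entry: (encoded SB, CRC32)
        self.sb_per_zone = sb_per_zone
        self.active = 0
        self.seq = 0

    def write(self, metadata):
        self.seq += 1
        body = json.dumps({"seq": self.seq, **metadata}, sort_keys=True).encode()
        record = (body, zlib.crc32(body))
        if len(self.zones[self.active]) < self.sb_per_zone:
            self.zones[self.active].append(record)   # append, never overwrite
            return
        old, self.active = self.active, 1 - self.active
        self.zones[self.active].append(record)       # newest SB lands in the other zone first,
        self.zones[old].clear()                      # then the full zone is reset for reuse

    def recover(self):
        """Return the newest super block whose checksum verifies, or None."""
        newest = None
        for zone in self.zones:
            for body, crc in zone:
                if zlib.crc32(body) != crc:
                    continue                          # torn or corrupt SB: skip it
                sb = json.loads(body)
                if newest is None or sb["seq"] > newest["seq"]:
                    newest = sb
        return newest
```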

Page 16

Detailed Information of SB

Space info: allocation statistics, KVR statistics, garbage information
Zone info: index zone info, data zone info, tombstone info
Pool info: number of pools; pool name, id, capacity; pool key hints, value hints, policies

Page 17

KVR in Zone

Data zone layout: KVR1 | KVR2 | KVR3 | … | KVRn | Disk Index Table

KVR fields: generation, …, pool_id, key, value, meta, pre_kvr, checksum

Each object is stored as a KVR, and KVRs are allocated in log style because SMR requires write-out-of-place. Because every KVR has a pool_id field, KVRs from multiple pools can share one zone. Each KVR can store metadata for upper-layer applications to use. A KVR has a pre_kvr field that is used when an existing key is deleted or overwritten; at that time a tombstone KVR is also generated in the tombstone zone. At the end of each data zone, the index entries of all its KVRs are packed together as a disk index table, supporting the recovery-oriented design.
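A sketch of the KVR record and disk index table ideas, under assumed field widths (the slide's "…" fields are left out): each KVR carries generation, pool_id, pre_kvr, key, value, meta, and a CRC32 checksum, and a sealed zone ends with an (offset, length) table so recovery can locate every record from the table alone. Here the zone is just the packed bytes; a real zone would be 256 MB with the table at its tail.

```python
import struct
import zlib

# Hypothetical KVR header: generation, pool_id, pre_kvr, key/value/meta lengths.
KVR_HEADER = struct.Struct("<QIqIII")
NO_PREV = -1                       # pre_kvr value when there is no earlier record


def encode_kvr(generation, pool_id, key, value, meta=b"", pre_kvr=NO_PREV):
    body = KVR_HEADER.pack(generation, pool_id, pre_kvr, len(key), len(value), len(meta))
    body += key + value + meta
    return body + struct.pack("<I", zlib.crc32(body))   # trailing checksum


def decode_kvr(buf):
    body, (crc,) = buf[:-4], struct.unpack("<I", buf[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("KVR checksum mismatch")
    gen, pool_id, pre_kvr, klen, vlen, mlen = KVR_HEADER.unpack_from(body)
    pos = KVR_HEADER.size
    key, value = body[pos:pos + klen], body[pos + klen:pos + klen + vlen]
    meta = body[pos + klen + vlen:pos + klen + vlen + mlen]
    return {"generation": gen, "pool_id": pool_id, "pre_kvr": pre_kvr,
            "key": key, "value": value, "meta": meta}


def seal_zone(kvrs):
    """Pack KVRs log-style, then append the disk index table and its entry count."""
    data, table = b"", []
    for kvr in kvrs:
        table.append(struct.pack("<II", len(data), len(kvr)))   # (offset, length)
        data += kvr
    return data + b"".join(table) + struct.pack("<I", len(table))


def rebuild_index(zone):
    """Recovery: read the table at the zone's end, then decode each listed KVR."""
    (count,) = struct.unpack("<I", zone[-4:])
    table_start = len(zone) - 4 - 8 * count
    index = {}
    for i in range(count):
        offset, length = struct.unpack_from("<II", zone, table_start + 8 * i)
        kvr = decode_kvr(zone[offset:offset + length])
        index[(kvr["pool_id"], kvr["key"])] = offset   # later records win
    return index
```

Keying the rebuilt index by (pool_id, key) is what lets KVRs from different pools share one zone without clashing.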

Page 18

Index Zone Definition

Index zone layout: IZ Head | Cell1 | … | Celln | … | IZ Tail | Pad

IZ: Index Zone.
Seq: identifies a full index zone entry.
magic: the index zone entry magic number.
Bucket: the memory index is organized as buckets of about 1 MB each.
To make boot faster, the memory index is checkpointed and stored into cells in log style.

Index zone entry: seq | magic | … | Index Table Bucket
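The index checkpoint can be sketched as follows, assuming a fixed bucket count chosen by hashing keys (real buckets are sized at about 1 MB), a hypothetical magic value, and JSON payloads. Each checkpoint appends one cell per bucket in log style; at boot, the newest cell for each bucket wins, and an invalid magic marks the end of written data.

```python
import json
import struct
import zlib

IZ_MAGIC = 0x1D2B3C4D                  # hypothetical index zone entry magic number
CELL_HEADER = struct.Struct("<IQII")   # magic, seq, bucket_id, payload length


def checkpoint(index, num_buckets, seq):
    """Split the memory index into buckets and emit one cell per bucket."""
    buckets = [{} for _ in range(num_buckets)]
    for key, kvr_offset in index.items():
        buckets[zlib.crc32(key.encode()) % num_buckets][key] = kvr_offset
    cells = []
    for bucket_id, bucket in enumerate(buckets):
        payload = json.dumps(bucket, sort_keys=True).encode()
        cells.append(CELL_HEADER.pack(IZ_MAGIC, seq, bucket_id, len(payload)) + payload)
    return b"".join(cells)


def load_index(index_zone):
    """Boot: replay cells in log order; the newest cell per bucket wins."""
    newest, pos = {}, 0
    while pos + CELL_HEADER.size <= len(index_zone):
        magic, seq, bucket_id, length = CELL_HEADER.unpack_from(index_zone, pos)
        if magic != IZ_MAGIC:
            break                          # reached padding or unwritten space
        start = pos + CELL_HEADER.size
        if bucket_id not in newest or seq >= newest[bucket_id][0]:
            newest[bucket_id] = (seq, json.loads(index_zone[start:start + length]))
        pos = start + length
    index = {}
    for _, bucket in newest.values():
        index.update(bucket)
    return index
```

Because every checkpoint rewrites every bucket, keys deleted since the previous checkpoint disappear when the newer cells replace the older ones.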

Page 19

SMR Drive Zone Layout

Figure: two zone layouts, each running from 0 TB at OD to x TB at ID.

SMR drive throughput depends on how zones are laid out over CHS (Cylinder, Head, Sector). There are two kinds of zone layout, and performance differs at OD (Outer Diameter), MD (Middle Diameter), and ID (Inner Diameter).

Test method: access zones at OD, MD, and ID and measure performance with random 4K accesses at the head of each zone; also measure the latency of SMR-related commands.

SMR command latency (Vendor1 HA / Vendor2 HM / Vendor3 HM):
Report zones: 20 ms / 24 ms / 388 ms
Open zone: X / X / X
Close zone: X / X / X
Reset write pointer: 428 ms / 1353 ms / 456 ms
Finish zone: X / X / X

Page 20

SMR drive IOPS

Chart: IOPS at OD, MD, and ID (y-axis 0 to 800). IOPS differs at OD, MD, and ID.

Page 21

SMR drive Throughput

Chart: throughput in MB/s at OD, MD, and ID (y-axis 0 to 250). Throughput differs at OD, MD, and ID.

Page 22

LDB KV Throughput, 5 Threads (HA SMR vs. CMR)

Test environment:
• ARM 7 + 1 GB
• Linux 2.6.34
• LDB
• KV test tool

Charts: 5-thread write throughput (MB/s, y-axis 0 to 120) and 5-thread random read throughput (MB/s, y-axis 0 to 90) for HA SMR and CMR at value sizes of 1MB, 512KB, 256KB, and 64KB.

HA SMR throughput is currently half that of CMR.

Page 23

LDB KV Latency, 5 Threads (HA SMR vs. CMR)

Chart: average latency (ms, y-axis 0 to 140) of PUT, sequential GET, and random GET for HA SMR and CMR at value sizes of 1MB, 512KB, 256KB, 64KB, and 4KB.

Currently, HA SMR latency is higher than CMR (at the xx ms level), with a lot of jitter.

Page 24

LDB KV Throughput & Latency over HM SMR

Test environment:
• X86 + 4 GB (HM needs HBA firmware & driver modifications)
• Linux 3.0.76
• LDB
• KV test tool

Charts: HM PUT, sequential GET, and random GET throughput (MB/s, y-axis 0 to 120) and average latency (ms, y-axis 0 to 25) at value sizes of 1MB, 512KB, 256KB, 64KB, and 4KB.

HM SMR throughput is close to CMR, and sometimes even better.

HM SMR latency is similar to CMR, with little jitter.

Page 25

Agenda

Object Storage Market Overview
Object Storage Design with SMR based Key-Value Store

Summary and Future Works

Page 26

Summary

LDB uses a log-structured design with an in-memory index: most operations take “1 memory access + 1 disk access”, and all writes (including random writes) are sequential.

The memory index table can be swapped to disk, depending on the memory size configuration.

Recovery-oriented design:
Atomic write-out-of-place, not write-in-place
Super block tracks zone allocation and pool configuration
Memory index table checkpoints
KVRs in one zone are packed together

Page 27

Future Work 1: SMR Drive Related

SMR drives introduce new sense codes; how should SMR errors be handled?

Does libata translate sense codes correctly for ZAC?

Standards for these sense codes and their translation…

IOCTL is a synchronous IO model; what is its performance impact?

Does IOCTL work well with NCQ?

Page 28

Future Work 2: Integrate with Applications

Application deletes and updates cause garbage collection (GC), and GC on SMR uses reset write pointer. How can an efficient GC be designed? Reset write pointer may cause FTI (Far Track Interference), and drives return new sense codes; how should SMR errors be handled?

Applications may put KVRs with metadata; how can application-aware metadata processing be implemented?
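One common answer to the GC question is greedy victim selection, sketched below under simplifying assumptions: pick the zone with the fewest live bytes, rewrite its live KVRs out of place into an open zone, then reset its write pointer. Zone, live_bytes, and gc_once are hypothetical names, record sizes stand in for real KVR data, and FTI and error handling are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    zone_id: int
    capacity: int
    records: dict = field(default_factory=dict)   # key -> size of KVR written here
    write_pointer: int = 0

    def append(self, key, size):
        if self.write_pointer + size > self.capacity:
            raise ValueError("zone full")
        self.records[key] = size
        self.write_pointer += size

    def reset_write_pointer(self):
        self.records.clear()
        self.write_pointer = 0


def live_bytes(zone, index):
    """Bytes in `zone` still referenced by the index (key -> zone_id)."""
    return sum(size for key, size in zone.records.items() if index.get(key) == zone.zone_id)


def gc_once(zones, index, target):
    """Greedy GC: empty the zone with the fewest live bytes into `target`."""
    candidates = [z for z in zones if z is not target and z.write_pointer > 0]
    if not candidates:
        return None
    victim = min(candidates, key=lambda z: live_bytes(z, index))
    for key, size in list(victim.records.items()):
        if index.get(key) == victim.zone_id:
            target.append(key, size)        # rewrite the live record out of place
            index[key] = target.zone_id
    victim.reset_write_pointer()            # the whole zone becomes writable again
    return victim.zone_id
```

Cost-benefit policies that also weigh data age are a common alternative when hot and cold data are mixed.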

Page 29

Thank You