Performance Measurements of a User-Space DAFS Server with a Database Workload
Samuel A. Fineberg, Don Wilson
NonStop Labs
© 2003 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

August 27, 2003, Fineberg and Wilson, NICELI Presentation

Outline
• Background on DAFS and ODM
• Prototype client and server
• I/O tests performed
• Raw benchmark results
• Oracle TPC-H results
• Summary and conclusions

What is the Direct Access File System (DAFS)?
• Created by the DAFS Collaborative
  – Group consisting of over 80 members from industry, government, and academic institutions
  – DAFS 1.0 spec was approved in September 2001
• DAFS is a distributed file access protocol
  – Data requested from files, not blocks
  – Based loosely on NFSv4
• Optimized for local file sharing environments
  – Systems are in relatively close proximity
  – Connected by a high-speed, low-latency network
• Built on top of direct-access transport networks
  – Initially targeted at Virtual Interface Architecture (VIA) networks
  – Direct Access Transport (DAT) API was later generalized and ported to other networks (e.g., InfiniBand, iWARP)

Characteristics of a Direct Access Transport
• Connected model, i.e., VIs must be connected before communication can occur
• Two forms of data transport
  – Send/receive: two-sided
  – RDMA read and write: one-sided
• Both transports support direct data placement
  – Receives must be pre-posted
• Memory regions must be "registered" before they can be transferred through a DAT
  – Pins data in physical memory
  – Establishes VM translation tables for the NIC

DAFS Details
• Session based
  – DAFS clients initiate sessions with a server
  – DAT/VIA connection is associated with a session
• RPC-like command format
  – Implemented with send/receive
  – Server "receives" requests "sent" from clients
  – Server "sends" responses to be "received" by client
• Open/Close
  – Unlike NFSv2, files must be opened and closed (not stateless)
• Read/Write I/O "modes"
  – Inline: limited amount of data included in request/response
  – Direct: server initiates RDMA read or write to move data
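
The two I/O modes on this slide can be illustrated with a small sketch. It is not the DAFS SDK API: `choose_mode`, `message_sequence`, and the 16 KB default threshold are hypothetical names and values picked for illustration (the threshold mirrors the one quoted for the latency experiments). It only shows which steps a request would go through in each mode.

```python
from enum import Enum


class Mode(Enum):
    INLINE = "inline"
    DIRECT = "direct"


# Hypothetical default: transfers below this size go inline, the rest go direct.
DEFAULT_THRESHOLD = 16 * 1024


def choose_mode(nbytes, threshold=DEFAULT_THRESHOLD):
    """Pick the I/O mode for a transfer of `nbytes` bytes."""
    return Mode.INLINE if nbytes < threshold else Mode.DIRECT


def message_sequence(op, nbytes, threshold=DEFAULT_THRESHOLD):
    """Return the ordered steps for one read or write in the chosen mode."""
    if op not in ("read", "write"):
        raise ValueError("op must be 'read' or 'write'")
    if choose_mode(nbytes, threshold) is Mode.INLINE:
        # Data travels inside the request (write) or the response (read).
        return ["request", f"disk {op}", "response"]
    if op == "write":
        # Server pulls the data from client memory before writing it to disk.
        return ["request", "RDMA read", "disk write", "response"]
    # Server reads from disk, then pushes the data into client memory.
    return ["request", "disk read", "RDMA write", "response"]
```

For example, `message_sequence("read", 64 * 1024)` yields the direct-read sequence, while `message_sequence("read", 4096)` yields the three-step inline sequence.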

Inline vs. Direct I/O
[Figure: client/server timelines for the two I/O modes.
  Inline read or write: Request, disk read or write, Response.
  Direct write: Request, RDMA read, disk write, Response.
  Direct read: Request, disk read, RDMA write, Response.]

Oracle Disk Manager (ODM)
• File access interface spec for the Oracle Database
  – Supported as a standard feature in Oracle 9i
  – Implemented as a vendor-supplied DLL
  – Files that cannot be opened using ODM use standard APIs
• Basic commands
  – Files are created and pre-allocated, then committed
  – Files are then "identified" (open) and "unidentified" (closed)
  – All r/w I/O uses an asynchronous "odm_io" command
    • I/Os specified as descriptors, multiple per odm_io call
  – Multiple waiting mechanisms: wait for specific, wait for any
  – Other commands are synchronous, e.g., resizing
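
The batching and the two waiting styles can be sketched with Python's standard `concurrent.futures` module. This is only an analogy, not the ODM interface: `fake_io`, `submit_batch`, and the `(op, offset, length)` descriptor layout are hypothetical stand-ins, and a thread pool plays the role of the asynchronous I/O engine.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def fake_io(descriptor):
    """Pretend to perform one I/O; a real library would touch the disk or network."""
    op, offset, length = descriptor
    return (op, offset, length)


def submit_batch(pool, descriptors):
    """Submit several I/O descriptors in one call and return one future per I/O."""
    return [pool.submit(fake_io, d) for d in descriptors]


with ThreadPoolExecutor(max_workers=4) as pool:
    futures = submit_batch(pool, [("read", 0, 16384), ("write", 16384, 16384)])

    # "Wait for specific": block on one chosen I/O.
    specific = futures[1].result()

    # "Wait for any": return as soon as at least one I/O in the batch is done.
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
```

Submitting descriptors in batches keeps several I/Os outstanding per call, and the choice of wait style decides whether the caller blocks on one particular I/O or resumes on the first completion.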

Prototype Client/Server
• DAFS Server
  – Implemented for Windows 2000 and Linux (all testing was on Windows)
  – Built on VIPL 1.0 using DAFS 1.0 SDK protocol stubs
  – All data buffers are pre-allocated and pre-registered
  – Data-driven multithreaded design
• ODM Client
  – Implemented as a Windows 2000 DLL for Oracle 9i
  – Multithreaded to enable decoupling of asynchronous I/O from Oracle threads
  – Inline buffers are copied; direct buffers are registered/deregistered as part of the I/O
  – Inline/direct threshold (set when library is initialized)

Test System Configuration
• Goal was to compare local I/O with DAFS
• Local I/O configuration
  – Single system running Oracle on locally attached disks
• DAFS/ODM I/O configuration
  – One system running DAFS server software with locally attached disks
  – Second system running Oracle and ODM client; files on DAFS server accessed using ODM over a network
• 4-processor Windows 2000 Server based systems
  – 500 MHz Xeon, 3 GB RAM, dual-bus PCI 64/33
  – ServerNet II (VIA 1.0 based) System Area Network
  – Disks were 15K RPM, attached by two PCI RAID controllers, configured for RAID 1/0 (mirrored-striped)

Experiments
• Raw I/O tests
  – Odmblast: streaming I/O test
  – Odmlat: I/O latency test
  – DAFS tests used ODM DLL to access files on DAFS server
  – Local tests used special local ODM library built on Windows unbuffered I/O
• Oracle database test
  – Standard TPC-H benchmark
  – SQL-based decision support code
  – DAFS tests used ODM DLL to access files on DAFS server
  – Local tests ran without ODM (Oracle uses Windows unbuffered I/O directly)

Odmblast
• ODM-based I/O stress test
  – Intended to present a continuous load to the I/O system
  – Issues many simultaneous I/Os (to allow for pipelining)
• In our experiments, Odmblast streams 32 I/Os to server
  – 16 I/Os per odm_io call
  – Wait for I/Os from the previous odm_io call
• I/Os can be reads, writes, or a random mix
• I/Os can be at sequential or random offsets
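
One plausible reading of the streaming pattern above is: issue a batch of 16 I/Os, then wait for the previous batch, so that up to 32 I/Os are outstanding at once. The sketch below models that reading with a counter only; `FakeIoQueue` and `stream` are hypothetical names, completions are instantaneous, and nothing here calls the real odm_io interface.

```python
class FakeIoQueue:
    """Counter-only stand-in for an asynchronous I/O interface."""

    def __init__(self):
        self.in_flight = 0
        self.peak_in_flight = 0
        self.completed = 0

    def issue(self, count):
        """Submit `count` I/Os in one call; the returned handle is just the count."""
        self.in_flight += count
        self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
        return count

    def wait(self, batch):
        """Mark every I/O in `batch` as completed (instantaneous in this model)."""
        self.in_flight -= batch
        self.completed += batch


def stream(queue, total_ios, batch_size=16):
    """Issue I/Os in batches, waiting on the previous batch after each new issue."""
    previous = None
    remaining = total_ios
    while remaining > 0:
        n = min(batch_size, remaining)
        current = queue.issue(n)      # new batch goes out first...
        remaining -= n
        if previous is not None:
            queue.wait(previous)      # ...then the older batch is reaped
        previous = current
    if previous is not None:
        queue.wait(previous)          # drain the final batch
    return queue.peak_in_flight


queue = FakeIoQueue()
peak = stream(queue, total_ios=160)
```

Issuing before waiting is what keeps the pipeline full: the server always has one batch queued while the client reaps the other.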

Odmblast read comparison
[Figure: bandwidth (MB/sec, axis 0 to 250) versus I/O size (bytes, axis 0 to 1,000,000). Series: Local Seq Rd, Local Rand Rd, DAFS Seq Rd, DAFS Rand Rd.]

Odmblast write comparison
[Figure: bandwidth (MB/sec, axis 0 to 100) versus I/O size (bytes, axis 0 to 1,000,000). Series: Local Seq Wr, Local Rand Wr, DAFS Seq Wr, DAFS Rand Wr.]

Odmlat
• I/O latency test
  – How long does a single I/O take
    • (not necessarily related to aggregate I/O rate)
  – For these experiments, <16K = inline, ≥16K = direct
  – Derived the components that make up I/O time using linear regression
  – More details in paper
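
A minimal sketch of the regression idea, assuming the simplest possible model in which time per I/O is a fixed overhead plus a per-byte cost. The slide does not say which components the authors fitted, so the model, the function name `fit_line`, and the data below are all illustrative; the sample points are synthetic, not measurements from the talk.

```python
def fit_line(sizes, times):
    """Ordinary least-squares fit of times = intercept + slope * sizes."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic data generated from made-up constants (NOT results from the paper).
FIXED_US = 100.0      # assumed fixed cost per I/O, in microseconds
PER_BYTE_US = 0.02    # assumed cost per byte, in microseconds

sizes = [16384, 32768, 49152, 65536]
times = [FIXED_US + PER_BYTE_US * s for s in sizes]

intercept, slope = fit_line(sizes, times)
```

Under this model the intercept estimates the size-independent cost of an I/O and the slope estimates the per-byte transfer cost; fitting inline and direct sizes separately would give one such pair per mode.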

Odmlat performance
[Figure: time per operation (microseconds, axis 0 to 3000) versus bytes per I/O operation (axis ticks at 0, 16384, 32768, 49152, 65536). Series: Read Time, Write Time.]

Oracle-based results
• Standard database benchmark: TPC-H
  – Written in SQL
  – Decision support benchmark
  – Multiple ad-hoc query streams with an "update thread"
  – 30 GB database size
• Oracle configuration
  – All I/Os 512-byte aligned (required for unbuffered I/O)
  – 16K database block size
  – Database files distributed across two NTFS file systems
• Measurements
  – Compared average runtime for local vs. DAFS-based I/O
  – Skipped official "TPC-H power" metric
  – Varied inline/direct threshold for DAFS-based I/O

Oracle TPC-H Performance
[Figure: bar chart of run time, Time (Hrs:Min). Bars: local, DAFS 16k-direct, DAFS 16k-inline. Data labels, in the order they appear: 13:23, 17:13, 14:39.]
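
The relative slowdowns implied by the three run times on this slide (13:23, 17:13, 14:39) can be worked out as below. The sketch assumes the times pair with the bar labels in the order shown, i.e. local = 13:23, DAFS 16k-direct = 17:13, DAFS 16k-inline = 14:39; that pairing is an inference from the slide layout, and the helper names are illustrative.

```python
def to_minutes(hhmm):
    """Convert an 'H:MM' or 'HH:MM' string to whole minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def slowdown_percent(baseline, other):
    """Percentage by which `other` is longer than `baseline`."""
    base = to_minutes(baseline)
    return 100.0 * (to_minutes(other) - base) / base


# Assumed pairing of labels and values (see the note above).
local, dafs_direct, dafs_inline = "13:23", "17:13", "14:39"

print(round(slowdown_percent(local, dafs_inline), 1))  # 9.5
print(round(slowdown_percent(local, dafs_direct), 1))  # 28.6
```

Under that pairing, the faster DAFS configuration comes out roughly 9.5% slower than local I/O, which is consistent with the "within 10%" figure quoted in the conclusions.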

Oracle TPC-H Operation Distribution
[Figure: chart of I/O operations by type]
• 16 KByte Read: 79.4%
• >16 KByte Read: 19.1%
• >16 KByte Write: 1.1%
• 16 KByte Write: 0.3%

Oracle TPC-H CPU Utilization
[Figure: % CPU used (axis 0 to 100) versus elapsed time (mins, axis 0 to 1200). Series: DAFS Client (inline I/O), DAFS Server (inline I/O), DAFS Server (direct I/O), Local I/O, DAFS Client (direct I/O).]

TPC-H Summary
• Local I/O still faster
  – Limited ServerNet II bandwidth
  – Memory registration or copying overhead
  – Windows unbuffered I/O is already very efficient
• DAFS still has more capabilities than local I/O
  – Capable of cluster I/O (RAC)
• Memory registration is still a problem with DATs
  – Registration caching can be problematic
    • Cannot guarantee address mappings will not change
    • ODM has no means for notifying NIC of mapping changes
  – Need either better integration of I/O library with Oracle or better integration of OS with DAT
    • Transparency is expensive!
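
The registration-caching hazard can be shown with a toy model. Everything here is hypothetical (`FakeNic`, `RegistrationCache`, `is_stale`, and the dictionary standing in for a page table); it is not how any real DAT provider works. It only shows why a cache keyed on virtual address can hand back a handle that no longer matches the current mapping when nothing tells the cache about the change.

```python
class FakeNic:
    """Toy NIC: registering a buffer snapshots its virtual-to-physical mapping."""

    def __init__(self, page_table):
        self.page_table = page_table
        self.registrations = 0

    def register(self, vaddr):
        self.registrations += 1
        return {"vaddr": vaddr, "phys": self.page_table[vaddr]}


class RegistrationCache:
    """Reuse a registration whenever the same virtual address is seen again."""

    def __init__(self, nic):
        self.nic = nic
        self.cache = {}

    def get(self, vaddr):
        if vaddr not in self.cache:
            self.cache[vaddr] = self.nic.register(vaddr)
        return self.cache[vaddr]


def is_stale(handle, page_table):
    """True if the handle's snapshot no longer matches the current mapping."""
    return handle["phys"] != page_table[handle["vaddr"]]


page_table = {0x1000: 7}          # virtual address -> physical page (toy values)
nic = FakeNic(page_table)
cache = RegistrationCache(nic)

first = cache.get(0x1000)         # miss: registers and caches the handle
page_table[0x1000] = 42           # application remaps the same virtual address
second = cache.get(0x1000)        # hit: the cache was never told about the remap
```

The second lookup saves a registration but returns a handle describing the old mapping, which is the trade-off the slide points at: caching is only safe if the I/O library is notified when mappings change.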

Conclusions
• DAFS Server/ODM Client achieved performance close to the limits of our network
  – Local SCSI I/O was still faster
• Running a database benchmark, DAFS TPC-H performance was within 10% of local I/O
  – Also provides advantages of a network file system (i.e., clustering support)
• Limitations of our tests
  – ServerNet II bandwidth was inadequate: no support for multiple NICs
  – Needed to do client-side registration for all direct I/Os
• TPC-H benchmark was not optimally tuned
  – Needed to bring client CPU closer to 100%
    • More disks, fewer CPUs, other tuning
  – CPU offload is not a benefit if I/O is the bottleneck
