
The Case for Open Infrastructure Services in Java

David Culler

Computer Science Division

U.C. Berkeley

www.cs.berkeley.edu/~culler

Java Grande Dinner Keynote, June 2000

6/4/2000 Java Grande 2

Appetizer

• ‘Grande’-scale computing is dominated by internet services

• Delivered to millions per day on well-engineered clusters over service interfaces

[Diagram: clients and servers]

Opportunity: infrastructure services

• Prehistoric: DNS, IP route tables, …

• Historic: crawl, index, search

• Emerging: compose and manipulate data and services

And client diversity has just begun!

[Diagram: clients, servers, and infrastructure services]

Danger: loss of distributed innovation

• PC generation of individual authoring & distribution

• vs. ATT-, IBM-, AOL-scale service engineering

• …

[Diagram: clients, servers, and open infrastructure services]

UCB Ninja Vision

• Open platform architecture for world-scale internet services

• receptive execution environment

– push services into the platform

• scalability and availability “built-in”

• service composition as a first-class programming concept

=> make it easy to author and publish high-quality services into a well-engineered infrastructure

... for example

Example: Ninja Jukebox 98

1. CD “ripper” service + CDDB service (Ninja iSpace): fetches track/title & artist information from an online DB.

2. Music Directory service + HTTPd service (Ninja iSpace): pushes an index of locally available songs to the master directory.

3. WWW browser: web page with song playlists.

4. .au/.mp3 player: music stream (.au or .mp3).

Collaborative Community: anyone can add content

=> mp3.com, RealJukebox, Napster

Authentication and authorization were built in

Jukebox 99: Music similarity query engine

=> mongomusic.com, ...

Sanctio: universal instant messaging (S. Gribble)

[Diagram labels: AOL client, ICQ client, AOL protocol, ICQ protocol, AOL worker, ICQ worker, English to Spanish, profile, DDS, Sanctio service (cluster)]

Composable, Secure Proxy Architecture for Post-PC devices

S. Ross, J. Hill

[Diagram labels: Diverse Clients, Trusted Client, Embedded Untrusted Client, Internet Services, Security Adapters (SA), Format Transcoders (FT), Filter and Control Modifier, Identity Service, Transient Store, Personal Appl, DATEK (Trust Contract), https]

Reduce value of the information

DATEK

Example: eScience Services

[Diagram labels: ‘Sugar’ MEMS simulation Service, LAPACK, Services, Netsolver, Nodal Modeling]


Outline

• Call for distributed innovation of scalable, composable services

• Wandering Down the Java Garden Path

• Returning to robust building blocks and design patterns

• Postprandial thoughts

A ‘Structured Architecture’ Approach

• Bases (1M’s)

– scalable, highly available

– persistent state

– databases, agents

– “home” base per user

– service programming environment

• Active Proxies (100M’s)

– not packet routers

– bootstrap thin devices into infrastructure

– soft-state and well-connected

• Units (1B’s)

– sensors / actuators

– PDAs / smartphones / PCs

– heterogeneous

– Minimal functionality: “Smart Clients”

[Diagram label: Wide-Area Path]

Guided by the CAP lemma

• Consider

– Consistency

– Availability

– Operation in the presence of network Partitions

You may have any two of the three, but not all three

• Example: replicate for availability

– lose consistency upon update during partition

– or can defer the updates till healed

– or can engineer the system so no partition between replicas
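The replication example above can be sketched as a toy two-replica register. This is only an illustration under stated assumptions: the class, the policy names, and the "replica A wins on heal" reconciliation are hypothetical and not part of the talk. FAVOR_AVAILABILITY and DEFER_UPDATES correspond to the first two options on the slide; FAVOR_CONSISTENCY (refuse the write) is added for contrast.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical toy model: two replicas of one integer, joined by a link that may partition.
class ReplicatedRegister {
    enum Policy { FAVOR_CONSISTENCY, FAVOR_AVAILABILITY, DEFER_UPDATES }

    private final Policy policy;
    private int replicaA = 0;            // replica the client can still reach
    private int replicaB = 0;            // replica on the far side of the link
    private boolean partitioned = false;
    private final Queue<Integer> deferred = new ArrayDeque<>();

    ReplicatedRegister(Policy policy) { this.policy = policy; }

    void setPartitioned(boolean p) {
        partitioned = p;
        if (!p) heal();
    }

    /** Returns true if the update took effect immediately. */
    boolean write(int value) {
        if (!partitioned) { replicaA = value; replicaB = value; return true; }
        switch (policy) {
            case FAVOR_CONSISTENCY:
                return false;                 // refuse: stay consistent, lose availability
            case FAVOR_AVAILABILITY:
                replicaA = value;             // accept: replicas now disagree
                return true;
            default:
                deferred.add(value);          // hold the update until the partition heals
                return false;
        }
    }

    boolean consistent() { return replicaA == replicaB; }
    int readA() { return replicaA; }
    int readB() { return replicaB; }

    private void heal() {
        // Apply deferred updates in order, then copy A to B (a deliberately simplistic merge).
        while (!deferred.isEmpty()) replicaA = deferred.remove();
        replicaB = replicaA;
    }
}
```

With FAVOR_AVAILABILITY a write during a partition succeeds but `consistent()` turns false until the link heals; with DEFER_UPDATES the replicas stay consistent and the write becomes visible only after healing.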

The Java “Apple”

• strong typing

• automatic memory management

• Concurrency built-in: Threads and Synchronized Methods

– finally!

• Elegant remote access built-in: RMI

– service lookup yields service object stub

– transparent access

• Code mobility

– traditionally for pulling down applets on demand

Ninja iSpace + RMI

• iSpace Execution Environment: JVM + persistent store APIs

– JVM provides service upload capability, plus strong typing of service interfaces

– Distributed hash table APIs provide scalable, available hard state

• RMI + authentication, encryption, multicast, user-level SAN speed

• Loader: name service, RMI stub registry, and service control API:

– LoadService(URL)

– interf.[ ] = ListServices

– stub = GetService(name)

– KillService(name)

• Security Mgr: sandbox that contains untrusted, uploaded services

• Trusted Services and Untrusted Services: a service is an interface, plus objects that implement that interface
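The four service control operations listed on this slide could be rendered in Java roughly as follows. This is a hedged sketch, not the iSpace API: the interface name, method signatures, and the in-memory implementation are assumptions. In particular, the slide's LoadService takes a URL and uploads code, whereas this sketch registers an already constructed object so that it stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical Java shape for LoadService / ListServices / GetService / KillService.
interface ServiceControl {
    void loadService(String name, Object impl);  // assumption: stands in for LoadService(URL)
    List<String> listServices();                 // names of the services currently loaded
    Object getService(String name);              // the "stub"; null if no such service
    boolean killService(String name);            // true if a service was removed
}

// In-memory stand-in; a real loader would fetch code and run it inside a sandbox.
class InMemoryServiceControl implements ServiceControl {
    private final Map<String, Object> services = new ConcurrentSkipListMap<>();

    public void loadService(String name, Object impl) { services.put(name, impl); }

    public List<String> listServices() { return new ArrayList<>(services.keySet()); }

    public Object getService(String name) { return services.get(name); }

    public boolean killService(String name) { return services.remove(name) != null; }
}
```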

Multispace Cluster Platform

• RMI “Redirector Stubs”: run-time compiled RMI superstub

– stub selection policy

– fail-over, broadcast, multicast, fork, etc.

[Diagram labels: client, m-RMI stub, iSpace, MultiSpace, Loader, DDS, SAN]
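One way to picture the fail-over policy of a redirector stub is a dynamic proxy that tries each replica in turn. This is a sketch under assumptions, not the MultiSpace implementation: it uses the standard `java.lang.reflect.Proxy` instead of run-time compiled stubs, calls local objects rather than RMI remotes, and `Greeter` and `FailoverStub` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.List;

// Hypothetical service interface used only for this illustration.
interface Greeter {
    String greet(String name);
}

// One stub object in front of several replicas: each call fails over down the list.
class FailoverStub implements InvocationHandler {
    private final List<?> replicas;

    private FailoverStub(List<?> replicas) { this.replicas = replicas; }

    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> iface, List<? extends T> replicas) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, new FailoverStub(replicas));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Throwable last = new IllegalStateException("no replicas configured");
        for (Object replica : replicas) {
            try {
                return method.invoke(replica, args);   // first replica that answers wins
            } catch (InvocationTargetException e) {
                last = e.getCause();                   // replica failed: try the next one
            }
        }
        throw last;                                    // every replica failed
    }
}
```

A caller holds a single `Greeter` and never sees which replica served the call; broadcast or fork policies would replace the loop body.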

After the garden: Post-Prototype Reality

• Powerful, attractive, tantalizing possibilities…

– see examples ...

• Didn’t scale

– service concurrency

– client population

– service diversity

• Wasn’t robust

• Lessons

– Thread-per-task considered harmful

– Woes of blocking interfaces

– The Transparency trap

– Versions really matter

Java RMI Thread-per-task services

• Server thread per client thread

– familiar per-task programming model, including RMI and I/O

• Socket per client JVM (or per thread, per stub!)

[Diagram labels: Client, Service, RMI, Blocking]

The transparency trap

• Server commits thread regardless of client load

• Client places demand regardless of server concurrency

• || resource || to blocking composition depth

• ease leads to fine grain use of remote objects

• RMI “call backs” make client a server

• lifetime and scope of remote object unlimited

• inexpressive error model (wait or RemoteException)

• serialization is costly

Blocking + Thread = Non-blocking ???

• Java I/O and comm APIs are all blocking!

• need JNI for select!

Keep going to the “thread well”

Study a Service “test problem”

• A: popularity

• L: I/O, network, or service composition depth

Threaded server

• task arrivals rate: A tasks / sec

• latency: L sec

• # concurrent tasks in server: T = A x L

• task completions rate: S tasks / sec

• closed loop implies S = A

[Diagram label: dispatch( ) or create( )]
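The relation T = A x L is Little's law applied to the threaded server. A small worked example may help; the specific numbers below are assumptions chosen for illustration, not measurements from the talk.

```latex
% Concurrency needed to sustain an arrival rate A at per-task latency L:
T = A \times L
% Assumed example: A = 1000 tasks/s, L = 50 ms = 0.05 s
T = 1000 \times 0.05 = 50 \ \text{concurrent tasks (threads)}
% In the closed loop S = A, so a cap T_{\max} on threads bounds throughput:
S \le \frac{T_{\max}}{L}, \qquad
T_{\max} = 100,\ L = 0.2\ \text{s} \ \Rightarrow\ S \le 500 \ \text{tasks/s}
```

In a thread-per-task server each concurrent task holds a thread, so raising either popularity (A) or composition depth (L) raises the thread count directly.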

Response time vs S (= T/L)

[Plot: end-to-end task latency (ms) vs. server throughput (S tasks/sec); curves for L = 10ms, L = 50ms, L = 200ms]

Threads are a limited Resource

• Fix L = 10 ms, for each T measure max A = S

• Cluster parallelism just raises the threshold

* CPU-bound tasks saturate early

* focus on threads, footprint follows

[Plot: max server throughput (S tasks/sec) vs. # threads executing in server (T); series: 1-way Java, 4-way Java; ultra 170 and E450, Solaris 7.2, jdk 1.2.2]

Alternative: queues, events, typed msgs

• server provides bounded resources at request interface

– chooses when to assign resources to request event

– imposes load-conditioning or admission control

• client retains control of its thread

– chooses when to block

– permits negotiation protocol

– key to service composition

• queues absorb load and decouple operations

• provide non-blocking interface

• RMI as syntax sugar

[Diagram label: explicit request queue]
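A bounded request queue with a non-blocking enqueue captures the two sides of this slide: the server bounds its resources and the client keeps its thread. The following is a minimal sketch assuming the standard `java.util.concurrent.ArrayBlockingQueue`; the class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical explicit request queue sitting at the service's request interface.
class RequestQueue<R> {
    private final BlockingQueue<R> queue;

    RequestQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);   // bounded resource
    }

    /** Never blocks the caller: returns false at once when the queue is full (admission control). */
    boolean tryEnqueue(R request) {
        return queue.offer(request);
    }

    /** Server side: fetch the next request when the server chooses to; null if none is pending. */
    R poll() {
        return queue.poll();
    }

    /** Queue depth is visible, so load can be observed rather than guessed. */
    int pending() {
        return queue.size();
    }
}
```

A client that gets `false` back still owns its thread and can retry, back off, or go to another replica, which is the opening for a negotiation protocol.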

Java Event-based Server

• Fixed # threads, independent of # concurrent tasks in server (A x L)

• task arrivals rate: A tasks / sec

• timer queue with latency: L seconds

• task completions rate: S tasks / sec

• closed loop implies S = A
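The event-based test server can be pictured as a single thread of control that records a timer for each arrival instead of blocking for L. The sketch below is a simulation in virtual time so that it stays deterministic; it is an assumption-laden illustration with hypothetical names, not the benchmark code behind the slides.

```java
import java.util.PriorityQueue;

// Hypothetical single-threaded event server simulated in virtual time (milliseconds).
class EventServerSim {
    private static final class Completion implements Comparable<Completion> {
        final long dueMs;
        final int taskId;
        Completion(long dueMs, int taskId) { this.dueMs = dueMs; this.taskId = taskId; }
        public int compareTo(Completion o) { return Long.compare(dueMs, o.dueMs); }
    }

    private final PriorityQueue<Completion> timers = new PriorityQueue<>();  // the timer queue
    private final long latencyMs;                                            // L
    private int completed = 0;
    private int maxInFlight = 0;

    EventServerSim(long latencyMs) { this.latencyMs = latencyMs; }

    /** Arrival event: fire any due timers, then park the task on the timer queue for L ms. */
    void arrive(int taskId, long atMs) {
        advanceTo(atMs);
        timers.add(new Completion(atMs + latencyMs, taskId));
        maxInFlight = Math.max(maxInFlight, timers.size());
    }

    /** Fire every timer due at or before tMs; each firing is one completion event. */
    void advanceTo(long tMs) {
        while (!timers.isEmpty() && timers.peek().dueMs <= tMs) {
            timers.poll();
            completed++;
        }
    }

    int completed() { return completed; }
    int maxInFlight() { return maxInFlight; }
}
```

With one arrival per millisecond and L = 10 ms, the number of tasks in flight settles at A x L = 10 while the number of threads stays at one.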

Event-per-task saturates gracefully

• Better and more robust performance

– Use cluster parallelism to match demand

• Decompose task into multiple events

– circulate or pipeline

• but ...

[Plot: max server throughput (S tasks/sec) vs. # tasks in client-server pipeline; series: 1-way Java, 4-way Java]

Down side of event approach

• Lose the familiar sequential programming (plus synchronization)

– need a handler per stage of the task

• Does not naturally exploit SMP parallelism

– must pipeline multiple event handler blocks

• Blocking interfaces (or faults) cause throughput to follow 1/L in an event block!

Hybrid, Robust building block

• Compose service as graph of task handlers

– Decouple stages of task within a node

– Replicate across cluster nodes for scale and availability

• Thread parallelism and latency tolerance within task handler block (i.e., A x L < T per node)

• Explicit event queue absorbs bursts of tasks, allows introspection

• Bounded thread pool of T < T’ threads

• Load conditioning point: # concurrent tasks decoupled from # concurrent threads in server
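The hybrid block (explicit queue in front of a bounded thread pool) maps fairly directly onto `java.util.concurrent.ThreadPoolExecutor`, which postdates the talk. The sketch below is one possible rendering under that assumption; `TaskHandler` and its methods are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical task handler: a bounded event queue feeding a bounded pool of threads.
class TaskHandler<I, O> {
    private final ThreadPoolExecutor pool;
    private final Function<I, O> body;   // arbitrary, possibly blocking, code

    TaskHandler(int threads, int queueCapacity, Function<I, O> body) {
        this.body = body;
        // core == max gives a fixed pool; the default handler rejects when the queue is full.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
    }

    /** Non-blocking submit; false means the load-conditioning point shed the task. */
    boolean submit(I input, Consumer<O> onComplete) {
        try {
            pool.execute(() -> onComplete.accept(body.apply(input)));
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    /** Introspection: how many tasks are waiting in the explicit queue. */
    int queued() {
        return pool.getQueue().size();
    }

    /** Stop accepting tasks and wait (up to 10 s) for queued work to finish. */
    void shutdown() {
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The number of tasks the handler has accepted (running plus queued) is decoupled from the number of threads, and overload shows up as a `false` return rather than as an ever-growing thread count.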

Hybrid Performance

• Competitive with pure event block

– small overhead due to extra threads

• Upon blocking op, throughput tracks T/L

[Plot: server throughput (S tasks/sec) vs. # tasks in client-server pipeline; label: Ultra 1]


Four key task handler design patterns

• Wrap

• Pipeline

• Replicate

• Combine


Wrap

• Take arbitrary piece of code:

• place queue in front

• encapsulate with bounded thread pool T < T’

=> get ‘robust’ service with non-blocking interface


Wrap (thread-per-task server)

• Get robust hybrid task handler with T/L tolerance

• Preserve conventional task sequencing

• Building block for composed services


Pipeline

• Decouple stages within task handler across multiple task handlers

• Wrapped Blocking call is natural boundary

Why Pipeline?

• Functional parallelism across stages

– when thread blocks in one...

• Functional parallelism across processors

• Functional parallelism across nodes

• Increase locality (cache, VM, TLB, …) within node

– tend to perform operation (stage) on “convoy” of tasks

• Limit number of threads devoted to “low concurrency” operation

– ex: file system can only handle 40-50 concurrent write requests, so this limits useful T

– additional threads can be applied to remainder of stage
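The Pipeline pattern can be sketched as stages that each own a queue and a small thread pool and hand their output to the next stage. This is an illustrative sketch with hypothetical names; for brevity it uses `Executors.newFixedThreadPool`, whose internal queue is unbounded, whereas a wrapped stage as described in the talk would bound it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical pipeline stage: its own queue and threads, decoupled from its neighbours.
class Stage<I, O> {
    private final ExecutorService pool;   // holds this stage's queue and worker threads
    private final Function<I, O> work;    // the operation this stage performs
    private final Consumer<O> next;       // where finished items go (often the next stage's enqueue)

    Stage(int threads, Function<I, O> work, Consumer<O> next) {
        this.pool = Executors.newFixedThreadPool(threads);   // assumption: unbounded queue for brevity
        this.work = work;
        this.next = next;
    }

    /** Non-blocking hand-off into this stage. */
    void enqueue(I item) {
        pool.execute(() -> next.accept(work.apply(item)));
    }

    /** Finish queued work, then stop; drain upstream stages before downstream ones. */
    void drain() {
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Giving each stage its own thread count is what lets a "low concurrency" operation, such as the file system writes mentioned above, be capped independently of the rest of the task.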


Replicate

• Scale throughput across nodes

• Provide fault isolation boundary

• Mediate thread-pool bottleneck within node
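Replication for throughput and fault isolation can be illustrated with a dispatcher that spreads tasks round-robin over replicas and skips one that fails. This is a sketch under assumptions: replicas are modelled as in-process functions rather than cluster nodes, and `ReplicaSet` is a hypothetical name.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical dispatcher over replicated task handlers.
class ReplicaSet<I, O> {
    private final List<Function<I, O>> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReplicaSet(List<Function<I, O>> replicas) {
        if (replicas.isEmpty()) throw new IllegalArgumentException("need at least one replica");
        this.replicas = replicas;
    }

    /** Round-robin for scale; on failure, fall through to the following replica. */
    O dispatch(I input) {
        int start = Math.floorMod(next.getAndIncrement(), replicas.size());
        RuntimeException last = null;
        for (int i = 0; i < replicas.size(); i++) {
            Function<I, O> replica = replicas.get((start + i) % replicas.size());
            try {
                return replica.apply(input);
            } catch (RuntimeException e) {
                last = e;   // fault isolation: one failed replica does not fail the task
            }
        }
        throw last;         // non-null here: every replica threw
    }
}
```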


Combine

• Two task handlers share pool and queue

• Common use is before/after wrapped call

• Avoid wasting threads


A Prescription

Well-conditioned node

• Wrap to introduce load conditioning

• Pipeline to avoid wasting threads at bottlenecks

• Pipeline to enhance locality

Available Service

• Replicate for Fault Tolerance

Scaling

• Replicate to meet concurrency demand

Tuning

• Combine to limit threads per node

• Pipeline for functional specialization


Ninja vSPACE design

• Each blocking interface is wrapped

• Service described by collection of task handler modules

• Each module implements a set of task types

– includes completion events

– module clones are replicated on demand

• Most task handlers are state free

• Persistent state provided by DDS

• Explicit queues are the fundamental means of introspection

Example: Hash Table Distr. Data Struct.

• Clustered Service (Service + DDS lib)

• DDS lib: Distr Hash table API

• Storage “brick”: single-node durable hash table

• System Area Network: redundant, low-latency, high-xput network
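The layering above (a hash table API in the library, single-node tables in the bricks) can be illustrated with a toy that partitions keys over bricks by hash and writes each key to several of them. This is a deliberately simplified sketch with hypothetical names: bricks are in-memory maps, placement is "home brick plus successors", and there is no durability, networking, or atomic commit, all of which a real DDS would need.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy: keys are hash-partitioned over "bricks" and replicated on successors.
class ToyDistributedHashTable {
    private final List<Map<String, String>> bricks = new ArrayList<>();
    private final int replicas;

    ToyDistributedHashTable(int brickCount, int replicas) {
        if (replicas < 1 || replicas > brickCount) {
            throw new IllegalArgumentException("need 1 <= replicas <= brickCount");
        }
        for (int i = 0; i < brickCount; i++) bricks.add(new HashMap<>());
        this.replicas = replicas;
    }

    /** Index of the brick that owns the key's partition. */
    int homeBrick(String key) {
        return Math.floorMod(key.hashCode(), bricks.size());
    }

    /** Write to the home brick and its successors. */
    void put(String key, String value) {
        for (int i = 0; i < replicas; i++) {
            bricks.get((homeBrick(key) + i) % bricks.size()).put(key, value);
        }
    }

    /** Read from the home brick. */
    String get(String key) {
        return bricks.get(homeBrick(key)).get(key);
    }

    /** Read that avoids one failed brick by using another replica, if any. */
    String getAvoiding(String key, int failedBrick) {
        for (int i = 0; i < replicas; i++) {
            int b = (homeBrick(key) + i) % bricks.size();
            if (b != failedBrick) return bricks.get(b).get(key);
        }
        return null;
    }

    int brickSize(int brick) {
        return bricks.get(brick).size();
    }
}
```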

DDS Hash Table Brick Design

[Diagram labels, DDS Brick: distributed hash table “RPC” skeletons, single-node HT, buffer cache, I/O core (disk), I/O core (network), operating system]

[Diagram labels, Ideal I/O Core and Pragmatic I/O Core: I/O core (disk) over file system / raw disk, I/O core (network) over network stack]

Scalable Throughput

[Plot: max throughput (ops/s) vs. # of DDS bricks; series: reads, writes; labeled points: (128, 13582) and (128, 61432)]

Robust under load

[Plot: hash table throughput (reads/s) vs. # client processes (100 parallel requests issued per client process); series: 2 bricks, 8 bricks, 16 bricks, 32 bricks]

Fault and Recovery

[Plot: throughput (reads/s) vs. time (ms); annotated events: three nodes, one dies, recover start, recover done, recovered node cold, garbage collection]

Dessert thoughts

• Performance and efficiency in Java is a critical first step, but we cannot stay in MPP mode

• Huge opportunity

– distributed innovation of widely used services (with I/O)

– service composition as new level of programming

• Need to deal with resource containment, load, errors, versions, and coupling from the beginning

– events, queues, typed msgs => managed RMI

• Event-driven execution (encapsulating threads) is exciting & opens a rich set of questions

– expressiveness, synthesis

– introspection, scheduling, concurrency control

– debugging


Where to go for more

• http://ninja.cs.berkeley.edu

• A Design Framework for Highly Concurrent Systems, Matt Welsh, Steven Gribble, Eric Brewer, and David Culler.

• Scalable, Distributed Data Structures for Internet Service Construction, Steven Gribble, Eric Brewer, Joseph Hellerstein, and David Culler.

• A Security Architecture for the Post-PC World, S. Ross, J. Hill, M. Chen, D. Culler, A. Joseph, E. Brewer

• The MultiSpace: an Evolutionary Platform for Infrastructural Services, Steven Gribble, Matt Welsh, Eric Brewer, and David Culler.


Backup: Mobility not enough

• RMI names classes / interfaces in the registry

– which class do you get?

• Class path management nightmare

• Must maintain source web server

• distinct services may need distinct instances

• service name != class name

• versioning is essential

• use renaming to allow multiple versions within VM

• service publication expresses entire dependence set
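The points that a service name is not a class name and that versions must coexist can be illustrated with a registry keyed by service name and version. This is a sketch under assumptions with hypothetical names; it does not show class renaming or class loading, only the lookup discipline that versioned publication implies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical registry: lookups name a service and a version, never a class.
class VersionedRegistry<S> {
    private final Map<String, TreeMap<Integer, S>> byName = new HashMap<>();

    /** Publish one version of a service; older versions stay available side by side. */
    synchronized void publish(String serviceName, int version, S impl) {
        byName.computeIfAbsent(serviceName, k -> new TreeMap<>()).put(version, impl);
    }

    /** Exact-version lookup; null if that version was never published. */
    synchronized S lookup(String serviceName, int version) {
        TreeMap<Integer, S> versions = byName.get(serviceName);
        return versions == null ? null : versions.get(version);
    }

    /** Highest published version; null if the service is unknown. */
    synchronized S latest(String serviceName) {
        TreeMap<Integer, S> versions = byName.get(serviceName);
        return (versions == null || versions.isEmpty()) ? null : versions.lastEntry().getValue();
    }
}
```

Clients bound to version 1 keep working after version 2 is published, which is the behaviour a single class name in an RMI registry cannot express.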