replication and caching in multi-tier...
TRANSCRIPT
Slide 1 (UPM, May 2010)
Replication and caching in multi-tier architectures
Bettina Kemme, McGill University, Montreal, Canada

Slide 2
Information Systems today
Slide 3
What is a client/server system?
• From the client's perspective, there is one server
• The server provides a service that can be called
[Figure: Client → Server]
Slide 4
Multi-tier Architecture
[Figure: clients → application server (database cache: x=0; Items, Preferences) → database (X:0→X:1, Y:2, Z:3); client A buys X]
Slide 5
Replication
• What is replication?
  – Process replication:
    • Several instances of the AS
    • Several instances of the DBMS
  – Data replication:
    • Several copies of a given data item
• Replication is good for adaptability
  – Adjusting to dynamic environments
[Figure: three copies of data item X, each with value X:1]
Slide 6
Without Replication
• Bottleneck
• Single point of failure
[Figure: four clients accessing a single server]
Slide 7
With Replication
• Load sharing
• In case of a crash, redirect clients to another replica
[Figure: four clients spread over two replicas]
Slide 8
Challenge: Replica Control
• Replica control
  – Many physical copies appear as one logical copy
    • The client sees the logical copy
    • Execution is on the physical copies
  – In case of updates, keep the copies consistent
  – What does consistency mean?
  – Read-one-write-all-(available):
    • An update has to be (eventually) executed at all replicas to keep them consistent
    • A read can be performed at one replica
Slide 9
Peer-to-peer
[Figure: client/distributed-server architecture (replica managers RM1, RM2, RM3; coordination among the servers) vs. P2P (each node is both client and server)]
Slide 10
• Definitions:
  – Each node/peer is a client and a server at the same time
  – Each peer provides content and/or resources
  – Direct exchange between peers
  – Autonomy of peers (they can join and leave at will)
Slide 11
Challenges
• Finding the content you are looking for
• Multicasting a message to all nodes or a subset
Slide 12
Games
Peer-to-Peer Architectures for Games, February 9th, 2010
Single player: Assassin's Creed
Multiplayer: Unreal Tournament
Massively multiplayer: World of Warcraft
Slide 13
• Why is this interesting research?
• Why is this interesting course material?
Slide 14
Practical Importance
• Multi-tier
  – Multi-billion $ market
    • Much hype in the AS market: Java EE 5, .NET, CORBA
    • Well-established database systems gain new customers daily
  – Stringent adaptability requirements
    • High throughput
    • 24/7 availability
    • Storage of critical data
• Peer-to-peer
  – Big market
  – Fun and cool
Slide 15
Interesting Challenges require…
• Knowledge in
  – data management
  – distributed architectures
  – communication mechanisms
  – distributed coordination algorithms
• Both theoretical and systems-oriented work
Slide 16
Challenge: Failover
• Transparent failover
• Correctness
  – "Behave as a failure-free, non-replicated system"
    • state
    • response to the client
    • coordination with other tiers
Slide 17
Challenge: Recovery
• Add replicas
  – State transfer
  – Synchronization with ongoing processing
Slide 18
Challenge: Load Balancing
Slide 19
Challenge: Provisioning
Slide 20
Object Replication: Fault-Tolerance
• Crash failure: a server/process works correctly until it stops completely
• Correctness:
  – The replicated system should behave as a non-replicated system that has no failures
  – Each request has exactly one "successful" execution
  – The client receives exactly one response (failure transparency)
  – Strong data consistency: the data copies are consistent at the end of request execution
• Most well-known approaches
  – Passive (primary-backup) replication
  – Active replication
Slide 21
Understanding failure behavior
[Figure: client A buys X; the AS loses its volatile state; the client receives a failure exception]
Slide 22
Passive Replication
[Figure: client stub → primary → backup]
Slide 23
Execution Diagram
[Figure: client request goes via the client stub to the primary; the primary executes, propagates state changes and the response to the backup, waits for its ok, and returns the response to the client stub]
Slide 24
Failure Cases
[Figure: the same execution diagram with five possible failure points (1–5) marked along the primary's timeline]
Slide 25
Case 2
[Figure: the primary fails during execution; the client stub receives a failure exception and resubmits to the new primary (the former backup), which performs a new execution]
Slide 26
Case 2
[Figure: as an execution diagram: client request, exception at the client stub, resubmission to the backup (now the new primary), new execution, result returned]
Slide 27
Case 3 (and 4)
[Figure: the primary fails after propagating state to the backup; the client stub resubmits to the new primary, which immediately returns the stored response]
Slide 28
Algorithms: normal processing

Client stub:
  Upon client request: add a unique request-id; forward to the primary
  Upon response: return it to the client

Primary:
  Upon client request:
    Execute the request
    Propagate the update (request-id, state changes and response) to the backups
    Wait to receive ok from all
    Return the response to the client

Backup:
  Upon reception of an update from the primary:
    Apply the changes
    Store the request-id/response
    Return ok
Slide 29
Algorithms: failover

Client stub:
  Upon failure exception for a request: resubmit it to the new primary

New primary:
  Upon client request with request-id:
    If the request-id/response is stored (state already received): return the response immediately
    Else:
      Execute the request
      (Propagate the request-id, state changes and response to the remaining backups)
      (Wait to receive ok from all)
      Return the response to the client
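The normal-processing and failover rules above can be sketched as a toy single-process simulation; the class and method names are illustrative, not from the lecture, and real systems would add messaging, failure detection and leader election:

```python
# Toy sketch of passive (primary-backup) replication with
# exactly-once semantics via request-ids. Illustrative names only.

class Backup:
    def __init__(self):
        self.state = {}
        self.completed = {}          # request-id -> stored response

    def on_update(self, req_id, changes, response):
        self.state.update(changes)   # apply propagated state changes
        self.completed[req_id] = response
        return "ok"                  # ack back to the primary

class Primary:
    def __init__(self, backups):
        self.state = {}
        self.completed = {}
        self.backups = backups

    def handle(self, req_id, op):
        # Failover path: if this request-id was already processed
        # (state received while a backup), return the stored response.
        if req_id in self.completed:
            return self.completed[req_id]
        changes, response = op(self.state)       # execute the request
        self.state.update(changes)
        acks = [b.on_update(req_id, changes, response) for b in self.backups]
        assert all(a == "ok" for a in acks)      # wait for ok from all
        self.completed[req_id] = response
        return response

# Normal processing: the client stub tags the request with a unique id.
backup = Backup()
primary = Primary([backup])
buy_x = lambda s: ({"x": s.get("x", 0) + 1}, "bought X")
assert primary.handle("req-1", buy_x) == "bought X"

# Failover: the backup becomes the new primary; the stub resubmits the
# same request-id and gets the stored response, not a second execution.
new_primary = Primary([])
new_primary.state, new_primary.completed = backup.state, backup.completed
assert new_primary.handle("req-1", buy_x) == "bought X"
assert new_primary.state["x"] == 1   # executed exactly once
```

The request-id is what turns a resubmission into a duplicate-detection lookup instead of a second execution.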
Slide 30
Not discussed
• Failover
  – Client stub
    • How does it know who the new primary is?
  – Backups
    • How do they detect the failure of the primary?
    • How do they decide on a new primary?
    • How do they guarantee that the last state changes are received by all or none of the backups?
    • How do they decide when to stop checking whether a request is a resubmission?
• Recovery
  – How do the old primary or new backups join?
Slide 31
Active Replication: normal processing
[Figure: the client stub sends the request to all replicas; each replica executes it]
Slide 32
Active Replication: failure
[Figure: one replica fails; the client stub still obtains a response from the surviving replica]
Slide 33
Algorithm: failover
• No extra failover algorithm
Slide 34
Concurrent Clients
[Figure: two client stubs send requests concurrently to both replicas]
Slide 35
Concurrent Clients
• Total order of requests at all replicas
  – All replicas must receive all requests in the same order
  – Use of group communication systems
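One simple way to obtain such a total order, a sequencer (one of several techniques group communication systems use), can be sketched as follows; this is a single-process toy with illustrative names, not a real group communication system:

```python
# Toy sequencer-based total order: every request passes through one
# sequencer, which assigns a global sequence number. Replicas execute
# requests strictly in sequence-number order, so (with deterministic
# execution) all replicas end up in the same state.
import itertools

class Sequencer:
    def __init__(self):
        self._next = itertools.count()

    def order(self, request):
        return (next(self._next), request)   # stamp with a global number

class Replica:
    def __init__(self):
        self.log = []

    def deliver(self, numbered_requests):
        # deliver in total order, regardless of network arrival order
        for _, req in sorted(numbered_requests):
            self.log.append(req)

seq = Sequencer()
msgs = [seq.order(r) for r in ["A: buy x", "B: buy y"]]
r1, r2 = Replica(), Replica()
r1.deliver(msgs)
r2.deliver(list(reversed(msgs)))   # messages arrive in a different order...
assert r1.log == r2.log            # ...but are executed in the same order
```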
Slide 36
Total Order Execution
[Figure: both replicas deliver the requests of the two client stubs in the same total order]
Slide 37
Active vs. Passive Replication
• Determinism
• Execution during normal processing
  – Communication overhead
  – CPU overhead
  – Complexity
• Write / read
• Termination protocol
Slide 38
Considering cross-tier execution
[Figure: the AS loses its volatile state and the client receives a failure exception, while the database already holds X:1 (previously X:0), Y:2, Z:3]
Slide 39
Transactions
• DB transaction
  – Sequence of read and write operations
  – Final commit or abort operation
  – Example: r_i(x) r_i(y) w_i(x) c_i
• All-or-nothing property (atomicity)
  – After a successful commit: all changes are guaranteed
  – If something goes wrong before commit: the changes so far are rolled back
• Transactions across tiers
  – In this lecture: 1 client request = 1 transaction
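The all-or-nothing property can be illustrated with a minimal transaction sketch (my own toy, not a real DBMS): writes go to a scratch copy, commit installs them all at once, abort discards them all:

```python
# Toy illustration of atomicity: buffered writes, then commit or abort.

class MiniTxn:
    def __init__(self, store):
        self.store, self.writes = store, {}

    def read(self, k):
        return self.writes.get(k, self.store.get(k))  # read your own writes

    def write(self, k, v):
        self.writes[k] = v                            # buffered, not visible

    def commit(self):
        self.store.update(self.writes)                # all changes take effect

    def abort(self):
        self.writes.clear()                           # nothing takes effect

db = {"x": 0, "y": 2}

t = MiniTxn(db)                # the slide's example: r(x) r(y) w(x) c
_ = t.read("x"); _ = t.read("y")
t.write("x", 1)
t.commit()
assert db["x"] == 1            # after commit: all changes guaranteed

t2 = MiniTxn(db)
t2.write("x", 99)
t2.abort()                     # something went wrong before commit
assert db["x"] == 1            # the change was rolled back
```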
Slide 40
Failure before commit
[Figure: the AS writes x, then fails before commit; the database rolls the write back (X:0 / X:1)]
Slide 41
Failure after commit
[Figure: the AS writes x and commits; the failure happens afterwards, so the update to X is durable]
Slide 42
Execution
[Figure: the passive-replication execution diagram extended with the database tier; the primary starts a transaction, executes, commits, and propagates state changes and the response to the backup]
Slide 43
Propagation after commit
[Figure: the primary commits the DB transaction first and only then propagates the state changes and response to the backup]
Slide 44
Failure after commit, before propagation
[Figure: the primary fails after the DB commit but before propagating to the backup; the client sees an exception, and the resubmission executes the request a second time: inconsistency]
Slide 45
Propagation before commit
[Figure: the primary propagates the state changes and response to the backup before committing the DB transaction]
Slide 46
Failure after propagation, before commit
[Figure: the backup already has the state changes (X:1), but the DB transaction was not committed (X:0); a plain resubmission leads to inconsistency]
Slide 47
Solution Outline
• Propagate before the DB commit
• At failure the following is possible:
  – The new primary has the state changes
  – The DB transaction is not committed
  – In this case:
    • The new primary must detect this
    • Then discard the state changes
    • And perform a complete re-execution
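One way the new primary can detect this case, which I sketch here as an assumption and which is not necessarily the exact mechanism of the lecture's system, is to record the request-id in the database within the same transaction: the marker is then present if and only if the transaction committed.

```python
# Sketch: the primary writes a "processed" marker for the request-id
# inside the DB transaction itself. After failover, the new primary
# checks the DB: marker present  -> txn committed, keep propagated state;
#                marker absent   -> txn rolled back, discard and re-execute.

db = {"x": 0, "processed": set()}

def run_txn(req_id, commit):
    staged = {"x": db["x"] + 1}           # the transaction's write
    if commit:                            # data and marker commit atomically
        db.update(staged)
        db["processed"].add(req_id)
    return staged                         # state changes propagated to backup

propagated = run_txn("req-1", commit=False)   # crash before commit

# New primary during failover:
if "req-1" in db["processed"]:
    state = propagated                    # txn committed: reuse the state
else:
    state = run_txn("req-1", commit=True) # discard; complete re-execution

assert db["x"] == 1 and "req-1" in db["processed"]
```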
Slide 48
Failure after propagation, before commit
[Figure: upon resubmission the new primary detects the uncommitted transaction, discards the propagated state, and performs a new execution]
Slide 49
Performance Evaluation
• Comparison:
  – Non-replicated JBoss
  – JBoss + our algorithm (Replicated JBoss)
  – JBoss clustering (does not provide transactional exactly-once semantics)
• Sun ECperf benchmark
  – Ordering / manufacturing / supply-chain application
Slide 50
Response Time
[Figure: response time (ms, 0–600) vs. injection rate (1–29) for the 1-1 algorithm, JBoss Clustering, and Non-Replicated JBoss]
Slide 51
Throughput
[Figure: throughput in business operations per minute (0–2000) vs. injection rate (0–20) for Replicated JBoss, Clustered JBoss, and Non-Replicated JBoss]
Slide 52
Further Issues
• Reaction of the AS to failures of other tiers
  – Client / DBS
• Recovery
  – Failed or new replicas rejoin as backups
  – They receive the necessary backup information
Slide 53
Combining Load-Balancing and Fault-Tolerance
[Figure: each replica acts as primary (P) for some clients and as backup (B) for others; the clients (C) are distributed across the primaries]
Slide 54
Scalability
Slide 55
So far
[Figure: clients → replicated application server → single database holding X, Y, Z]
Slide 56
Database Replication
[Figure: clients → replicated database; each replica holds copies of X, Y, Z]
Slide 57
Why?
• Fault-tolerance: take-over
• Performance: fast local access (e.g., replicas in Madrid, Montreal, Rome)
• Application driven: mobile users, data warehouses (OLAP vs. OLTP), ...
• Scale-up: a cluster instead of a bigger mainframe
Slide 58
Transactions repeated
• DB transaction
  – Sequence of read and write operations
  – Final commit or abort operation
  – Example: r_i(x) r_i(y) w_i(x) c_i
• All-or-nothing property (atomicity)
  – After a successful commit: all changes are guaranteed
  – If something goes wrong before commit: the changes so far are rolled back
• Isolation
  – The execution of concurrent transactions is controlled such that the result is the same as if they had executed serially
  – Enforced by a concurrency control protocol:
    • It decides when submitted operations may be executed on the data
Slide 59
Replica Control
• Keep copies consistent: replica control
• Difference to AS replication:
  – Several clients access the same data items
• Combine replica control with concurrency control
[Figure: w(x) applied to both copies of x; w(y) applied to the copies of y]
Slide 60
Global Serializability
• Global serializability
  – The execution in the replicated system is equivalent to a serial execution in a single-copy database
Slide 61
How to replicate data?
• Depending on when the updates are propagated:
  – eager
  – lazy
• Depending on where the updates can take place:
  – primary copy (master)
  – update anywhere
[Table: 2×2 design space: {eager, lazy} × {primary copy, update anywhere}]
Slide 62
Where can updates be submitted?
• Update anywhere:
  – Update transactions can be submitted to any site
  – The site forwards the updates to the other sites
  [Figure: T1 w1(x), T2 w2(y), T3 w3(x) submitted at different sites]
• Primary copy:
  – Update transactions can only execute at the primary copy (master)
  – The primary forwards the updates to the secondaries
  [Figure: T1, T2, T3 execute at the primary; read-only T4 r(y) runs at a secondary]
Slide 63
When to propagate
• Eager:
  – within the boundaries of the transaction
  – transactions usually terminate with a 2-phase-commit protocol
    • 2PC: an agreement protocol to guarantee atomicity
[Figure: eager; BOT, r(x), w(x) sent as a request to all copies and acknowledged, r(y), then 2PC at the end]
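The 2PC termination step can be sketched in a few lines; this is a toy in-process version with illustrative names, ignoring logging, timeouts and coordinator failure:

```python
# Toy two-phase commit: phase 1 collects votes from all participants,
# phase 2 sends the decision. Commit only if everyone voted yes, so
# either all sites apply the transaction or none does (atomicity).

class Participant:
    def __init__(self, vote_yes=True):
        self.vote_yes, self.decision = vote_yes, None

    def prepare(self):                   # phase 1: vote request
        return "yes" if self.vote_yes else "no"

    def decide(self, outcome):           # phase 2: decision
        self.decision = outcome

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]
    outcome = "commit" if all(v == "yes" for v in votes) else "abort"
    for p in participants:
        p.decide(outcome)
    return outcome

ok = [Participant(), Participant()]
assert two_phase_commit(ok) == "commit"

mixed = [Participant(), Participant(vote_yes=False)]
assert two_phase_commit(mixed) == "abort"          # one "no" aborts all
assert {p.decision for p in mixed} == {"abort"}    # all or none
```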
Slide 64
When to propagate
• Lazy:
  – after the commit of the transaction
[Figure: lazy; BOT, r(x), w(x), r(y), commit; the writes are propagated to the other copies afterwards]
Slide 65
Eager Primary Copy
[Figure: the update transaction runs at the primary; r(x), then w(x) propagated to the secondaries with request/ack, r(y), and 2PC at the end; reads r(x), r(y) may run at any copy]
Slide 66
Lazy primary copy
[Figure: the transaction runs and commits at the primary; r(x), w(x), r(y), commit; w(x) is then propagated to the secondaries; reads may run at any copy]
Slide 67
Eager vs. Lazy Primary Copy
• Replication transparency
  – Do users need to be aware of replication or not?
• Consistency level
• Complexity
• Response time for users
• Fault-tolerance
• Load-balancing
Slide 68
Eager vs. Lazy Primary Copy
• Transparency
  – Both: no
  – Update transactions must be submitted to a specific primary
• Consistency
  – Eager: high consistency
  – Lazy: stale reads
• Concurrency control:
  – Both: relatively easy
• Message overhead:
  – Eager: one round per write operation + 2PC
    • Can be reduced by sending all write operations (the writeset) within the vote-request message of 2PC
  – Lazy: low
Slide 69
Eager vs. Lazy Primary Copy
• Fault-tolerance:
  – Eager: yes; widely used for fault-tolerance
  – Lazy: no
• Load-balancing:
  – Both: ok, but restricted by the update ratio
Slide 70
Eager update anywhere
[Figure: w(x) is submitted at one site and w(y) at another; each write is propagated to all copies within its transaction, and each transaction terminates with 2PC]
Slide 71
Lazy update anywhere
[Figure: transactions commit locally (r(x), w(x), r(y), commit at one site; w(y) at another) and propagate their writes only after commit; concurrent updates at different sites can conflict]
Slide 72
Lazy / Update Anywhere
• Resolve conflicts
  – for numeric types (or types with comparison):
    • average
    • minimum/maximum
    • additive
  – discard the new value / overwrite the old value
  – site priority
  – value priority
  – earliest/latest timestamp
• Eventual consistency
  – Eventually all copies have the same value
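The numeric resolution rules above can be sketched directly; the point is that each rule is a deterministic function of the conflicting values, so every replica applying it converges to the same result (my own toy encoding, not from the lecture):

```python
# Sketch of conflict-resolution rules for numeric values. Given the
# old value and the values written by conflicting concurrent updates,
# each rule deterministically picks (or combines) one result.

def resolve(rule, old, conflicting):
    if rule == "average":
        return sum(conflicting) / len(conflicting)
    if rule == "maximum":
        return max(conflicting)
    if rule == "minimum":
        return min(conflicting)
    if rule == "additive":
        # interpret each write as a delta and apply all deltas
        return old + sum(v - old for v in conflicting)
    if rule == "discard-new":
        return old                       # keep the old value
    raise ValueError("unknown rule: %s" % rule)

# Two sites concurrently updated x (old value 10) to 12 and to 16:
assert resolve("average", 10, [12, 16]) == 14
assert resolve("maximum", 10, [12, 16]) == 16
assert resolve("additive", 10, [12, 16]) == 18   # +2 and +6 both applied
assert resolve("discard-new", 10, [12, 16]) == 10
```

The additive rule suits counter-like data (e.g., account balances), where losing either delta would be wrong; the others trade precision for simplicity.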
Slide 73
Eager vs. lazy update anywhere
• Transparency
• Consistency
• Concurrency control
• Response time
• Load-balancing
• Fault-tolerance
Slide 74
Eager vs. Lazy Update Anywhere
• Transparency:
  – Both: yes
• Consistency:
  – Eager: strong
  – Lazy: bad; stale reads and inconsistencies
• Concurrency control and coordination complexity
  – Eager: complex
  – Lazy: conflict resolution is complex
Slide 75
Eager vs. Lazy Update Anywhere
• Response time:
  – Eager: bad
  – Lazy: good
• Load-balancing:
  – Both: good
• Basically no database system supports eager update anywhere
  – but many middleware-based solutions do!
Slide 76
Primary Copy vs. Update Anywhere
– Simpler concurrency control
– Less coordination necessary / optimizations are easier
– Inflexible model:
  • Clients must know the primary to submit update transactions
  • They have to distinguish update from read-only transactions
– The primary is a single point of failure and a potential bottleneck
Slide 77
Lazy vs. Eager
– Lazy primary copy: stale reads
– Lazy update anywhere: inconsistencies and reconciliation
– No communication within the transaction → better response time
– Possible transaction loss in case of a crash
Slide 78
Recent Replica Control Approaches
• So far: kernel-based approach
• New: middleware-based approach
  – Advantages
    • Modular
    • No need for access to the DB code
    • Reusability
  – Disadvantages
    • No access to the concurrency-control information in the kernel
Slide 79
Middleware Primary Copy (e.g., Ganymed)
[Figure: update transactions: 1. submit to the scheduler, 2. forward to the primary, 3. execute, 4. propagate, 5. apply the changes at the secondary]
[Figure: read-only transactions: 1. submit to the scheduler, 2. forward to a secondary, 3. execute]
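The scheduler's routing rule can be sketched as follows; this is a toy in the style of the slide (illustrative names, not Ganymed's actual code): update transactions go to the primary and their changes are then applied at the secondaries, while read-only transactions are load-balanced over the secondaries:

```python
# Toy middleware scheduler for lazy primary copy: updates at the
# primary, propagated to the secondaries; reads at a secondary.

class Scheduler:
    def __init__(self, primary, secondaries):
        self.primary, self.secondaries = primary, secondaries
        self.rr = 0                      # round-robin index for reads

    def submit(self, txn, update):
        if update:
            changes = txn(self.primary)  # 2. forward, 3. execute at primary
            for s in self.secondaries:   # 4./5. propagate and apply changes
                s.update(changes)
            return changes
        s = self.secondaries[self.rr % len(self.secondaries)]
        self.rr += 1
        return txn(s)                    # 2. forward, 3. execute at secondary

def inc_x(db):                           # an update transaction
    db["x"] += 1
    return {"x": db["x"]}                # the writeset to propagate

primary, s1, s2 = {"x": 0}, {"x": 0}, {"x": 0}
sched = Scheduler(primary, [s1, s2])
sched.submit(inc_x, update=True)
assert primary == s1 == s2 == {"x": 1}   # copies converge

read = sched.submit(lambda db: db["x"], update=False)
assert read == 1                          # served by a secondary
```

Because only writesets cross the middleware, the database kernels need no modification, which is exactly the modularity advantage named on the previous slide.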
Slide 80
So far
[Figure: clients → replicated application server (Items, Preferences) → replicated database holding X, Y, Z]
Slide 81
Middle-tier caching
[Figure: clients → replicated application server, each replica with its own database cache (x=0; Items, Preferences) → single database holding X, Y, Z]
Slide 82
Advantages of Caching
• Avoid the DB becoming a bottleneck
• Reduced response time even if the DB is not the bottleneck
• Simpler than DB replication?
• Data is cached in the format of the application
Slide 83
An example with No Caching
[Figure: application server → DB; three requests (Display Customer ID=1, Display Customer ID=2, Display Customer ID=1 again); every request issues "select * from customer where id=…" against the DB: no caching!]
Slide 84
Caching
[Figure: application server with a cache in front of the DB]
Slide 85
An example with Caching
[Figure: the first "Display Customer ID=1" issues "select * from customer where id=1" and fills the cache; "Display Customer ID=2" also goes to the DB; the repeated "Display Customer ID=1" is answered from the cache]
Slide 86
Two-layered Caching
[Figure: application server with a first-level cache and a second-level cache in front of the DB]
Slide 87
Hibernate Caching
• First-level cache (session cache)
  – Caches objects within the current session
• Second-level cache
  – Responsible for caching objects across sessions
• Query cache
  – Responsible for caching queries and their results
Slide 88
Query Cache
• A query cache caches query results instead of objects
• The cache key is based on the query name and its parameters
• Always used in conjunction with the second-level cache
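The interplay between the two caches can be sketched as follows. This is my own toy model of the idea, not Hibernate's API: the query cache stores only result keys under (query name, parameters), while the objects themselves live in the object (second-level) cache:

```python
# Sketch of a query cache layered on an object cache. The query cache
# maps (query name, params) -> list of object ids; the objects are
# looked up in the object cache. Names are illustrative.

object_cache = {}    # id -> customer object (plays the second-level cache)
query_cache = {}     # (query name, params) -> list of ids

def run_query(name, params, execute_sql):
    key = (name, tuple(sorted(params.items())))   # name + parameters
    if key not in query_cache:
        rows = execute_sql(params)                # miss: hit the DB
        query_cache[key] = [r["id"] for r in rows]
        for r in rows:
            object_cache[r["id"]] = r             # fill the object cache
    return [object_cache[i] for i in query_cache[key]]

def fake_db(params):                              # stand-in for the real DB
    return [{"id": params["id"], "name": "Customer %d" % params["id"]}]

first = run_query("customer.byId", {"id": 1}, fake_db)
again = run_query("customer.byId", {"id": 1}, fake_db)  # no DB access
assert first == again
assert ("customer.byId", (("id", 1),)) in query_cache
```

Storing ids rather than objects is why the query cache only makes sense together with the second-level cache: a query-cache hit would otherwise still need the DB to materialize the objects.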
Slide 89
Hibernate Caching
[Figure: the Session acts as the first-level cache; the second-level cache consists of a CacheConcurrencyStrategy and a QueryCache on top of the cache implementation (physical cache regions), supplied by a CacheProvider]
Slide 90
An example with Caching
[Figure: "Display Customer ID=1" loads Customer #1 through the first- and second-level caches via "select * from customer where id=1"; "Display Customer ID=2" loads Customer #2; the repeated "Display Customer ID=1" is served from the cache]
Slide 91
An example with Distributed ASs
[Figure: three application servers, each with its own dedicated cache; Customer ID=1 is cached at one AS and Customer ID=2 at another, so a request arriving at a different AS misses its cache]
Slide 92
Approach #1
• A separate "shared" cache instead of dedicated caches for each AS
[Figure: three ASs sharing one cache in front of the DB]
Slide 93
Approach #2
• Two-level cache: dedicated + shared caches
[Figure: each AS has a dedicated cache, plus one shared cache in front of the DB]
Slide 94
Approach #3
• Distributed cooperative cache
[Figure: each AS has its own cache; the caches cooperate with each other in front of the DB]
Slide 95
Flowchart: Example Cache Get
• Get(key)
• isRemotelyCached()?
  – Yes: getFromRemoteCache()
• Return the object or null
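The flowchart can be sketched as a toy cooperative cache with a directory store mapping object keys to owning caches (a single-process illustration; the method names follow the flowchart, the rest is my own):

```python
# Sketch of a cooperative cache get: local lookup first; on a local
# miss, consult the directory (key -> owning cache) and fetch from the
# remote peer; otherwise return None (a real miss would go to the DB).

class CoopCache:
    def __init__(self, name, directory, peers):
        self.name, self.local = name, {}
        self.directory, self.peers = directory, peers  # shared structures

    def put(self, key, obj):
        self.local[key] = obj
        self.directory[key] = self.name    # register ownership

    def is_remotely_cached(self, key):
        return key in self.directory and self.directory[key] != self.name

    def get_from_remote_cache(self, key):
        owner = self.peers[self.directory[key]]
        return owner.local[key]

    def get(self, key):
        if key in self.local:              # local hit
            return self.local[key]
        if self.is_remotely_cached(key):   # remote hit via the directory
            return self.get_from_remote_cache(key)
        return None                        # miss everywhere

peers, directory = {}, {}
a = CoopCache("A", directory, peers); peers["A"] = a
c = CoopCache("C", directory, peers); peers["C"] = c
a.put("Customer#1", {"id": 1})
assert c.get("Customer#1") == {"id": 1}    # fetched from location A
assert c.get("Customer#9") is None         # cached nowhere
```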
Slide 96
An Example
[Figure: three ASs with cooperative caches A, B, and C; each keeps a directory store mapping Customer#1 → A and Customer#2 → B; cache A holds the Customer#1 object and cache B the Customer#2 object; a "Display Customer ID=1" arriving at cache C consults the directory, fetches the remote object Customer#1 from location A, and returns the object with customer key 1]
Slide 97
WAN
[Figure: client → AppSrv → DBMS]
Slide 98
Edge Computing
[Figure: clients talk to edge application servers; the edge AppSrv reaches the backend server via EJB, and the backend server reaches the DBMS via JDBC]
Slide 99
Advantages
• Load reduction on the server side
• Network bandwidth
• Latency
Slide 100
Details
[Figure: edge AppSrv: application logic on top of a replicated EJB home (transient home (TH), SLI home) with cache-miss/OCC logic; backend server: persistent home (PH), JDBC, transaction]
Slide 104
Last note
• Issues
  – Fault-tolerance
  – Scalability
  – Load-balancing
  – Performance
    • Throughput --> scalability
    • Response time
  – Correctness