communication between processes · 2001. 10. 26. · communication between processes (2) most...

85
Communication between processes

Upload: others

Post on 01-Jan-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Communicationbetween processes

Page 2: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Communication between processes

What problems emerge when communicating

� between separate address spaces

� between separate machines?

How do those environments differ from previous examples?

Recall that

� within a process, or with a shared virtual address space,threads can communicate naturally through ordinary datastructures – object references created by one thread canbe used by another

� failures are rare and usually occur at the granularity ofwhole processes

� OS-level protection is also performed at the granularity ofprocesses

Concurrent Systems and Applications 2001 – 2 Tim Harris

Page 3: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Communication between processes (2)

Most directly, introducing separate address spaces meansthat data is not directly shared between the threads involved

� At a low-level the representation of different kinds of datamay vary between machines – e.g. big endian v littleendian

� Names used may require translation – e.g. objectlocations in memory (at a low-level) or file names on alocal disk (at a somewhat higher level)

More generally, we’ll see four recurring problems indistributed systems:

� Components execute concurrently

� Components (and/or their communication channels) mayfail independently

� Access to a ‘global clock’ cannot be assumed

� Inconsistent states can occur during operations (e.g.related changes to objects on different machines)

Concurrent Systems and Applications 2001 – 3 Tim Harris

Page 4: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Communication between processes (3)

We’ll look primarily at two different mechanisms forcommunication between processes

� Low-level communication using network sockets

✔ A ‘lowest-common-denominator’: protocols like TCPare available on almost all platforms

✘ Much more for the application programmer to thinkabout; many wheels to re-invent

� Remote method invocation

✔ Remote invocations look substantially like local calls:many low-level details are abstracted

✘ Remote invocations look substantially like local calls:the programmer must remember the limits of thistransparency and still consider problems such asindependent failures

✘ Not well suited to streaming or multi-casting data

Concurrent Systems and Applications 2001 – 4 Tim Harris

Page 5: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Naming

How should processes identify which resources they wish toaccess?

Within a single address space in a Java program we could useobject references to identify shared data structures and either

� pass them as parameters to a thread’s constructor

� access them from static fields

When communicating between address spaces we needother mechanisms to establish

� unambiguously which item is going to be accessed

� where that item is located and how communication withit can be achieved

Late binding of names (e.g. elite.cl.cam.ac.uk ) toaddresses (128.232.8.50 ) is considered good practice –i.e. using a name service at run-time to resolve names, ratherthan embedding addresses directly in a program

Concurrent Systems and Applications 2001 – 5 Tim Harris

Page 6: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Name services

1. Register

2. Resolve

4. Access

3. Address

Server

Client Name service

How does the client now how to contact the name service?

� A namespace is a collection of names recognised by aname service – e.g. process IDs on one UNIX system, thefilenames that are valid on a particular system or theInternet DNS names that are defined

� A naming domain is a section of a namespace operatedunder a single administrative authority – e.g.management of the cl.cam.ac.uk portion of the DNSnamespace is delegated to the Computer Lab

� Binding or name resolution is the process of making alookup on the name service

Concurrent Systems and Applications 2001 – 6 Tim Harris

Page 7: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Name services (2)

Although we’ve shown the name service here as a singleentity, in reality it may

� be replicated for availability (lookups can be made if anyof the replicas are accessible) and read performance(lookups can be made to the nearest replica)

� be distributed, e.g. separate systems may managedifferent naming domains within the same namespace(updates to different naming domains require lessco-ordination)

� allow caching of addresses by clients, or caching ofpartially resolved names in a hierarchical namespace

(See Part-II, Distributed Systems)

Concurrent Systems and Applications 2001 – 7 Tim Harris

Page 8: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Names

Names are used to identify things and so they should beunique within the context that they are used. (A directoryservice may be used to select an appropriate name to lookup – e.g. “find the nearest system providing service xyz”)

When a namespace contains a single naming domain thensimple unique IDs (UIDs) may be used – e.g. process IDs inUNIX

� UIDs are simply numbers in the range 0:::2N � 1 for anN -bit namespace. (Beware: UID 6= user ID in thiscontext!)

✔ Allocation is easy if N is large – just allocate successiveintegers

✘ Allocation is centralized (designs for allocating processIDs on highly parallel UNIX systems are still the subjectof research)

✘ What can be done if N is small? When can/should UIDsbe re-used?

Concurrent Systems and Applications 2001 – 8 Tim Harris

Page 9: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Names (2)

More usually a hierarchical namespace is formed – e.g.filenames or DNS names

✔ The hierarchy allows local allocation if differentallocators agree to use non-overlapping prefixes

✔ The hierarchy can often follow administrative delegationof control

✔ Locality of access within the structure may helpimplementation efficiency (if I lookup one name in/usr/bin/ then perhaps I’m likely to lookup othernames in that same directory)

✘ Lookups may be more complex. Can names be arbitrarilylong?

Concurrent Systems and Applications 2001 – 9 Tim Harris

Page 10: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Names (3)

We can also distinguish between pure and impure names

A pure name yields no information about the identifiedobject – where it may be located or where its details may beheld in a distributed name service. e.g. process IDs in UNIX

An impure name contains information about the object – e.g.e-mail to [email protected] will always be sent to a mailserver in the University

� Are DNS names, e.g. elite.cl.cam.ac.uk pure orimpure?

� Are IPv4 addresses, e.g. 128.232.8.50 pure or impure?

Names may have structure while still being pure – e.g.Ethernet MAC addresses are structured 48-bit UIDs andinclude manufacturer codes, and broadcast/multicast flags.This structure avoids centralized allocation

In other schemes, pure names may contain location hints.Crucially, impure names prevent the identified object fromchanging in some way (usually moving) without renaming

Concurrent Systems and Applications 2001 – 10 Tim Harris

Page 11: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Protection

Require protection against unauthorised:

� release of information– reading or leaking data– violating privacy legislation– using proprietary software– covert channels

� modification of information– changing access rights– can do sabotage without reading information

� denial of service– causing a crash or intolerable load

How should access to resources be controlled?

� When a system is built from multiple processes

� ...when these may be executing on different systems

� ...when some may be operating as servers on behalf ofmany clients

Concurrent Systems and Applications 2001 – 11 Tim Harris

Page 12: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Protection (2)

� Some other protection mechanisms:– lock the computer room (prevent people from

tampering with the hardware)

– restrict access to system software

– de-skill systems operating staff

– keep designers away from final system!

– use passwords (in general challenge/response)

– use encryption

– legislate

� ref: Saltzer + Schroeder Proc. IEEE, Sept 75– design should be public

– default should be no access

– check for current authority

– give each process minimum possible authority

– mechanisms should be simple, uniform and built in tolowest layers

– should be psychologically acceptable

– cost of circumvention should be high

– minimize shared access

Concurrent Systems and Applications 2001 – 12 Tim Harris

Page 13: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Access matrix

Access matrix is a matrix of subjects against objects.

Subject (or principal) might be:

� users e.g. by system user ID� executing process in a protection domain� sets of users or processes

Objects are things like:

� files� devices� domains / processes� message ports (in microkernels)

Matrix is large and sparse) don’t want to store it all.

Two common representations:

1. by object: store list of subjects and rights with eachobject) access control list

2. by subject: store list of objects and rights with eachsubject) capabilities

Concurrent Systems and Applications 2001 – 13 Tim Harris

Page 14: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Access control lists

Often used in storage systems:

� system naming scheme provides for ACLs to be insertedat each level of a hierarchical name, e.g. files

� if ACLs stored on disk, check is made in software) mustonly use on low duty cycle

� for higher duty cycle must cache results of check

� e.g. Multics: open file = memory segment.On first reference to segment:1. interrupt (segment fault)2. check ACL3. set up segment descriptor in segment table

� most systems check ACL– when file opened for read or write– when code file is to be executed

� access control by program, e.g. Unix– exam prog, RWX by examiner, X by student– data file, A by exam program, RW by examiner

Concurrent Systems and Applications 2001 – 14 Tim Harris

Page 15: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Capabilities

Capabilities associated with active subjects, so:

� store in address space of subject

� must make sure subject can’t forge capabilities

� easily accessible to hardware

� can be used with high duty cyclee.g. as part of addressing hardware– Plessey PP250– CAP I, II, III– IBM system/38– Intel iAPX432

� have special machine instructions to modify (restrict)capabilities

� support passing of capabilities on procedure call

Can also use software capabilities. Checked by encryption.Nice for distributed systems

Concurrent Systems and Applications 2001 – 15 Tim Harris

Page 16: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Capabilities (2)

Tagged Architectures (e.g. IBM system/38):

� all words in memory and the processor registers aretagged as containing either data or a capability

� tag stays with contents on all copy operations� system checks ALU operations for validity

Capability segments (e.g. CAP):

� capabilities for code segment held in special capabilitysegment

� only a restricted set of operations are allowed oncapability segments

� provide a cache of entries in capability segments inspecial capability registers

� use associative store, per domain capability list, centralcapability list

� add enter capability

Software schemes (e.g. EROS)

� require capabilities for all system services� fake out enter via IPC.

Concurrent Systems and Applications 2001 – 16 Tim Harris

Page 17: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Capabilities (3)

� Capabilities nice for distributed systems but:– messy for application, and– revocation is tricky.

� Could use timeouts (e.g. Amoeba).

� Alternatively: combine passwords and capabilities.

� Store ACL with object, but key it on capability (notimplicit concept of “principal” from OS).

� Advantages:– revocation possible– multiple “roles” available.

� Disadvantages:– still messy (use ‘implicit’ cache?).

Concurrent Systems and Applications 2001 – 17 Tim Harris

Page 18: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Facilities in Java

We’ll now look at how these techniques apply to Javaapplications

Within Java applications object references can be used asunforgeable capabilities, e.g. when running multiple appletswithin a single JVM

� Access modifiers on constructors prevent arbitraryinstantiation of classes

� Access control checks can be performed at instantiationtime and – if these fail – instantiation can be aborted bythrowing an exception

For many kinds of access the security manager provides amechanism for enforcing simple controls

� A security manager is implemented byjava.lang.SecurityManager (or a sub-class)

� An instance of this is installed usingSystem.setSecurityManager(...) (itself anoperation under the control of the current securitymanager)

Concurrent Systems and Applications 2001 – 18 Tim Harris

Page 19: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Facilities in Java (2)

Most checks are made by delegating to acheckPermission method, e.g. for dynamically loading anative library

checkPermission(

new RuntimePermission(

"loadLibrary."+lib));

Decisions made by checkPermission are relative to aparticular security context. The current context can beobtained by invoking getSecurityContext and checksthen made on behalf of another context

Permissions can be granted in a policy definition file, passedto the JVM on the command line with-Djava.security.policy= filename

grant {

permission java.net.SocketPermission

"*:1024-65535", "connect,accept";

};

http://java.sun.com/products/jdk/1.2/docs/guide/security/index.html

Concurrent Systems and Applications 2001 – 19 Tim Harris

Page 20: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Low-level communication

Two basic network protocols are available in Java:datagram-based UDP and stream-based TCP (see DigitalCommunication I)

Communication occurs between UDP sockets which areaddressed by giving an appropriate IP address and a UDPport number (0..65535, although 0 not accessible throughcommon APIs, 1..1023 reserved for privileged use)

UDP sockets provide unreliable datagram-basedcommunication that is subject to:

� Loss: datagrams that are sent may never be received, and

� Re-ordering: datagrams are forwarded separately withinthe network and may arrive out of order

A checksum is used to guard against corruption (corrupt datais discarded by the protocol implementation and theapplication perceives it as loss)

The framing within datagrams is preserved – e.g. iffragmentation occurs within the network

Concurrent Systems and Applications 2001 – 20 Tim Harris

Page 21: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Low-level communication (2)

Naming is handled by

� Using the DNS to map textual names into IP addresses,InetAddress.getByName("elite.cl.cam.ac.uk")

� Using ‘well-known’ port numbers for particular UDPservices which wish to be accessible to clients (See the/etc/services file on a UNIX system)

UDP sockets are represented by instances ofjava.net.DatagramSocket . The 0-argumentconstructor creates a new socket that is bound to anavailable port on the local host machine. This identifies thelocal endpoint for the communication

Datagrams are represented in Java as instances ofjava.net.DatagramPacket . The most elaborateconstructor

DatagramPacket(byte buf[], int length,

InetAddress address, int port)

specifies the data to send (length bytes from within buf )and the destination address and port

Concurrent Systems and Applications 2001 – 21 Tim Harris

Page 22: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

UDP example

import java.net.*;

public class Send {

public static void main (String args[]) {

try {

DatagramSocket s = new DatagramSocket ();

byte[] b = new byte[1024];

int i;

for (i = 0; i < args.length - 2; i ++)

b[i] = Byte.parseByte (args[2 + i]);

DatagramPacket p = new DatagramPacket (

b, i,

InetAddress.getByName (args[0]),

Integer.parseInt (args[1]));

s.send(p);

} catch (Exception e) {

System.out.println("Caught " + e);

}

}

}

Concurrent Systems and Applications 2001 – 22 Tim Harris

Page 23: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

UDP example (2)

import java.net.*;

public class Recv {

public static void main (String args[]) {

try {

DatagramSocket s = new DatagramSocket ();

byte[] b = new byte[1024];

DatagramPacket p =

new DatagramPacket (b, 1024);

System.out.println("Port: " +

s.getLocalPort());

s.receive(p);

for (int i = 0; i < p.getLength (); i ++)

System.out.print ("" + b[i] + " ");

System.out.println ("\nFrom: " +

p.getAddress () + ":" + p.getPort ());

} catch (Exception e) {

System.out.println("Caught " + e);

}

}

}

Concurrent Systems and Applications 2001 – 23 Tim Harris

Page 24: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Problems using UDP

Many facilities must be implemented manually by theapplication programmer:

✘ Detection and recovery from loss

✘ Flow control (preventing the receiver from beingswamped with too much data)

✘ Congestion control (preventing the network from beingoverwhelmed)

✘ Conversion between application data structures andarrays of bytes (marshaling )

Of course, there are situations where UDP is directly useful

✔ Communication with existing UDP services (e.g. someDNS name servers)

✔ Broadcast and multicast are possible (e.g. address255.255.255.255) all machines on the local network

Concurrent Systems and Applications 2001 – 24 Tim Harris

Page 25: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

TCP sockets

The second basic form of inter-process communication isprovided by TCP sockets

Naming is again handled using the DNS and well-knownport numbers as before. There is no relationship betweenUDP and TCP ports having the same number

TCP provides a reliable bi-directional connection-basedbyte-stream with flow control and congestion control

What doesn’t it do?

� Unlike UDP the interface exposed to the programmer isnot datagram based: framing must be provided explicitly

� Marshaling must still be done explicitly – but serializationmay help here

� Communication is one-to-one

In practice TCP forms the basis for many internet protocols –e.g. FTP and HTTP are both currently deployed over it

Concurrent Systems and Applications 2001 – 25 Tim Harris

Page 26: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

TCP sockets (2)

Two principal classes are involved in exposing TCP socketsin Java:

� java.net.Socket represents a connection over whichdata can be sent and received. Instantiating it directlyinitiates a connection from the current process to aspecified address and port. The constructor blocks untilthe connection is established (or fails with an exception)

� java.net.ServerSocket represents a socketawaiting incoming connections. Instantiating it starts thelocal machine listening for connections on a particularport. ServerSocket provides an accept operationthat blocks the caller until an incoming connection isreceived. It then returns an instance of Socketrepresenting that connection

The system will usually buffer only a small (5) number ofincoming connections if accept is not called

Typically programs that expect multiple clients will have onethread making calls to accept and starting further threadsfor each connection

Concurrent Systems and Applications 2001 – 26 Tim Harris

Page 27: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

TCP example

import java.net.*;

import java.io.*;

public class TCPSend {

public static void main (String args[]) {

try {

Socket s = new Socket (

InetAddress.getByName (args[0]),

Integer.parseInt (args[1]));

OutputStream os = s.getOutputStream ();

while (true) {

int i = System.in.read();

os.write(i);

}

} catch (Exception e) {

System.out.println("Caught " + e);

}

}

}

Concurrent Systems and Applications 2001 – 27 Tim Harris

Page 28: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

TCP example (2)

import java.net.*;

import java.io.*;

public class TCPRecv {

public static void main (String args[]) {

try {

ServerSocket serv = new ServerSocket (0);

System.out.println ("Port: " +

serv.getLocalPort ());

Socket s = serv.accept ();

System.out.println ("Remote addr: " +

s.getInetAddress());

System.out.println ("Remote port: " +

s.getPort());

InputStream is = s.getInputStream ();

while (true) {

int i = is.read ();

if (i == -1) break;

System.out.write (i);

}

} catch (Exception e) {

System.out.println("Caught " + e);

}

}

}

Concurrent Systems and Applications 2001 – 28 Tim Harris

Page 29: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Remote method invocation

Using UDP or TCP it was necessary to

� Decide how to represent data being sent over thenetwork – either packing it into arrays of bytes (in aDatagramPacket ) or writing it into an OutputStream(using a Socket )

� Use a rather inflexible naming system to identify servers –updates to the DNS may be difficult, access to a specificport number may not always be possible

� Distribute the code to all of the systems involved andensure that it remains consistent

� Deal with failures (e.g. the remote machine crashing –something a ‘reliable’ protocol like TCP cannot mask)

Java RMI presents a higher level interface that addressessome of these concerns. Although it is remote methodinvocation, the principles are the same as for remoteprocedure call (RPC) systems

Concurrent Systems and Applications 2001 – 29 Tim Harris

Page 30: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Remote method invocation (2)

124

31

client server

web server

registry

1. A server registers a reference to a remote object with theregistry (a basic name service) and deposits associated.class files with a web server

2. A client queries the registry to obtain a reference to aremote object

3. The client obtains the .class files needed to access theremote object from a web server (if they are not alreadyavailable locally)

4. The client makes an RMI call to the remote object

The registry acts here as a name service, holding names ofthe form //thor.cam.ac.uk/tlh20-example-1.2

Concurrent Systems and Applications 2001 – 30 Tim Harris

Page 31: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Remote method invocation (3)

Parameters and results are generally passed by making deepcopies when passed or returned over RMI

� i.e. copying proceeds recursively on the object passed,objects reachable from that etc (! take care to reduceparameter sizes)

� The structure of object graphs is preserved – e.g. datastructures may be cyclic

� Remote objects are passed by reference and so bothcaller and callee will interact with the same remoteobject if a reference to it is passed or returned

Note that Java only supports remote method invocation –changes to fields must be made using get /set methods

Other implementation choices:

� Perform a shallow copy and treat other objects reachablefrom that as remote data (as above, would be hard toimplement in Java) or copy them incrementally

� Emulate ‘pass by reference’ by passing back any changeswith the method results (what about concurrent updates?)

Concurrent Systems and Applications 2001 – 31 Tim Harris

Page 32: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

RMI - Interfaces

Suppose that we wish to define a simple remote object onwhich a single method spell is defined:

package tlh20.rmi;1

2

import java.rmi.*;3

4

public interface Phonetic extends Remote {5

6

public final static String URL =7

"//thor.cam.ac.uk/tlh20-example-1.2";8

9

public String [] spell (String s)10

throws RemoteException;11

}12

� All RMI invocations are made across remote interfacesextending java.rmi.Remote

� The field URL in Lines 7–8 will be used to name aparticular remote object implementing this interface. It’sincluded here for easy access by both client and server

� All remote methods must throw RemoteException

Concurrent Systems and Applications 2001 – 32 Tim Harris

Page 33: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

RMI - Client

package tlh20.rmi;1

2

import java.rmi.*;3

4

public class PhoneticClient {5

6

public static void main (String [] args) {7

try {8

System.setSecurityManager (9

new RMISecurityManager ());10

11

Phonetic p = (Phonetic)12

Naming.lookup (Phonetic.URL);13

14

String [] results = p.spell ("Example");15

16

for (int r = 0; r < results.length; r++)17

System.out.println (results [r]);18

}19

catch (Exception e) {20

System.out.println ("Exception: " + e);21

}22

}23

}24

Concurrent Systems and Applications 2001 – 33 Tim Harris

Page 34: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

RMI - Client (2)

Note how few differences there are in the client comparedwith local invocations on an instance of a classimplementing Phonetic :

� The security manager installed in lines 9–10 is anexample one for use by RMI applications that usedownloaded code

� Lines 12–13 obtain an instance of a class implementingthe Phonetic interface. Invocations on this instancewill be made on a remote object registered under thename Phonetic.URL

� The exception handler in lines 20–22 may see

– NotBoundException – no remote object has beenassociated with the name Phonetic.URL

– RemoteException – if the RMI registry could not becontacted (12–13) or if there was a problem with thecall (15)

– AccessException – if the operation has not beenpermitted

Concurrent Systems and Applications 2001 – 34 Tim Harris

Page 35: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

RMI - Server

package tlh20.rmi;1

2

import java.net.*;3

import java.rmi.*;4

import java.rmi.server.*;5

6

public class PhoneticServer7

extends UnicastRemoteObject8

implements Phonetic9

{10

public static void main (String [] args) {11

try {12

System.setSecurityManager (13

new RMISecurityManager ());14

15

PhoneticServer s = new PhoneticServer ();16

17

Naming.rebind (Phonetic.URL, s);18

System.out.println (Phonetic.URL +19

" server running");20

}21

catch (Exception e) {22

System.out.println ("Exception: " + e);23

};24

}25

Concurrent Systems and Applications 2001 – 35 Tim Harris

Page 36: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

RMI - Server (2)

public PhoneticServer () throws RemoteException {26

super ();27

}28

29

private final static String [] WORDS = { "alfa",30

"bravo", "charlie", "delta", "echo", "foxtrot",31

"golf", "hotel", "India", "Juliet", "kilo",32

"Lima", "Mike", "November", "Oscar", "papa",33

"Quebec", "Romeo", "sierra", "tango", "uniform",34

"victor", "whiskey", "x-ray", "yankee", "zulu" };35

36

public String [] spell (String s)37

throws RemoteException38

{39

String source = s.toUpperCase ();40

String [] reply = new String [s.length ()];41

for (int i = 0; i < s.length (); i++) {42

try {43

int w = (int) source.charAt (i) - (int) ’A’;44

reply [i] = WORDS [w];45

}46

catch (Exception e) {reply [i] = "?";}47

}48

return reply;49

}50

Concurrent Systems and Applications 2001 – 36 Tim Harris

Page 37: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Putting it all together

� Compile the remote interface class, client and server:

$ javac tlh20/rmi/Phonetic.java

$ javac tlh20/rmi/PhoneticClient.java

$ javac tlh20/rmi/PhoneticServer.java

� Generate stub classes from the server:

$ export PUBCLASSES=/home/tlh20/\

public_html/java/classes/

$ rmic -v1.2 -d $PUBCLASSES \

tlh20.rmi.PhoneticServer

� Generate a security policy file:

grant {

permission java.net.SocketPermission

"*:1024-65535", "connect,accept";

permission java.net.SocketPermission

"*:80", "connect";

permission java.util.PropertyPermission

"java.rmi.server.codebase", "read";

permission java.util.PropertyPermission

"user.name", "read,write";

};

Concurrent Systems and Applications 2001 – 37 Tim Harris

Page 38: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Putting it all together (2)

� Start the server running:

$ export CODEBASE=http://hammer.thor.cam.ac.uk\

/~tlh20/java/classes/

$ java -Djava.rmi.server.codebase=$CODEBASE \

-Djava.security.policy=security.policy \

tlh20.rmi.PhoneticServer

//thor.cam.ac.uk/tlh20-example-1.2 server running

� Start the client running:

$ java -Djava.security.policy=security.policy \

tlh20.rmi.PhoneticClient

echo

x-ray

alfa

Mike

papa

Lima

echo

Concurrent Systems and Applications 2001 – 38 Tim Harris

Page 39: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

RMI implementation

PhoneticServer_Stub

PhoneticClient PhoneticServer

UnicastRef

TCPConnection

UnicastServerRef

TCPTransport

Method

� The Stub class is the one created by the rmic tool – ittransforms invocations on the Phonetic interface intogeneric invocations of an invoke method onUnicastRef

� UnicastRef is responsible for selecting a suitablenetwork transport for accessing the remote object – inthis case TCP

� UnicastServerRef uses the ordinary reflectioninterface to dispatch calls to remote objects

Concurrent Systems and Applications 2001 – 39 Tim Harris

Page 40: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

RMI implementation (2)

With the TCP transport RMI creates a new thread on theserver for each incoming connection that is received

� A remote objects should be prepared to acceptconcurrent invocations of its methods

� Remember: the synchronized modifier applies to amethod’s implementation. It must be applied to thedefinition in the server class, not the interface

✔ This avoids deadlock if remote object A invokes anoperation on remote object B which in turn invokes anoperation on A

✘ The application programmer must be aware of how manythreads might be created and the impact that they mayhave on the system

Concurrent Systems and Applications 2001 – 40 Tim Harris

Page 41: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

RMI implementation (3)

1. Marshal

2. Generate ID

3. Set timer

5. Record ID

8. Unmarshal

6. Marshal

4. Unmarshal

7. Set timer

9. Acknowledge

Caller Calledmethod

RMI ServiceRMI Service

Client Server

What could be done without TCP?

We need to manually implement:

� Reliable delivery of messages subject to loss in thenetwork

� Association between invocations and responses – shownhere using per-call RPC identifier with which allmessages are tagged

Concurrent Systems and Applications 2001 – 41 Tim Harris

Page 42: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

RMI implementation (4)

Even this simple protocol requires multiple threads: e.g. tore-send lost acknowledgements after the client-side RMIservice has returned to the caller

What happens if a timeout occurs at 3? Either the messagesent to the server was lost, or the server failed before replying

� At-most-once semantics) return failure indication tothe application

� ‘Exactly’-once semantics) retry a few times with thesame RPC id (so server can detect retries)

What happens if a timeout occurs at 7? Either the messagesent to the client was lost, or the client failed

No matter what is done, the client cannot distinguish, onthe basis of these messages, server failures before / aftermaking some change to persistent storage

Concurrent Systems and Applications 2001 – 42 Tim Harris

Page 43: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Defining remote interfaces

Recall that with Java RMI the interface to a remote object isdefined as an ordinary interface that extendsjava.rmi.Remote

✔ Easy to use in Java-based systems

✘ What about interoperability with other languages?

Java RMI is rather unusual in using ordinary languagefacilities to define remote interfaces. Usually a specificInterface Definition Language (IDL) is used

� This acts as a ‘lowest common denominator’ presentingfeatures common to many languages

� The IDL has language bindings that define how itsfeatures are realized in a particular language

� An IDL compiler generates per-language stubs (contrastwith the rmic tool that only generates stubs for the JVM)

Concurrent Systems and Applications 2001 – 43 Tim Harris

Page 44: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

OMG IDL

We’ll take OMG IDL (used in CORBA) as a typical example

//POS Object IDL example1

module POS {2

typedef string Barcode;3

4

interface InputMedia {5

typedef string OperatorCmd;6

void barcode_input(in Barcode item);7

void keypad_input(in OperatorCmd cmd);8

};9

};10

� A module defines a namespace within which a group ofrelated type definitions and interface definitions occur

� Interfaces can be derived using multiple inheritance

� Built-in types include basic integers (e.g. long holding�231 : : : 231 � 1 and unsigned long holding0 : : : 232 � 1), floating point types, 8-bit characters,boolean s and octet s

� Parameter modifiers in , out and inout define thedirection in which parameters are copied

Concurrent Systems and Applications 2001 – 44 Tim Harris

Page 45: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

OMG IDL (2)

Type constructors allow structures, discriminated unions,enumerations and sequences to be defined:

struct Person {

string name;

short age;

};

union Result switch(long) {

case 1 : ResultDataType r;

default : ErrorDataType e;

};

enum Color { red, green, blue };

typedef sequence<Person> People;

Interfaces can define attributes (unlike Java interfaces), butthese are just shorthand for pairs of method definitions:

attribute long value;

!

long _get_value();

void _set_value(in long v);

Concurrent Systems and Applications 2001 – 45 Tim Harris

Page 46: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

OMG IDL (3)

IDL construct Java constructmodule package

interface interface + classesconstant public static final

boolean booleanchar , wchar char

octet bytestring , wstring java.lang.String

short shortunsigned short short

long longunsigned long long

float floatdouble double

eunm, struct , union classsequence , array array

exception classreadonly attribute Read-accessor method

attribute Read,write-accessor methodsoperation Method

� ‘Holder classes’ are used for out and inout parameters– these contain a field appropriate to the type of theparameter

Concurrent Systems and Applications 2001 – 46 Tim Harris

Page 47: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Microsoft .NET

Instead of defining a separate IDL and per-languagebindings, the Microsoft .NET platform defines a commonlanguage subset and programming conventions for makingdefinitions that conform to it

Many familiar features: static typing, objects (classes, fields,methods, properties), overloading, single inheritance ofimplementations, multiple implementation of interfaces, . . .

Metadata describing thse definitions is available at run-time,e.g. to control marshaling

� Interfaces can be defined in an ordinary programminglanguage and do not need an explicit IDL compiler

� Languages vary according to whether they can be used towrite clients or servers in this system – e.g. JScript andCOBOL vs VB, C#, SML

Concurrent Systems and Applications 2001 – 47 Tim Harris

Page 48: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Transactions

Page 49: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Transactions

We’ve now seen mechanisms for

� Controlling concurrent access to objects

� Providing access to remote objects

Using these facilities correctly, and particularly incombination, is extremely difficult. What improvedabstractions could be provided?

Ideally the programmer may wish to write something like

transactionally {

if (source.balance() >= amount) {

source.withdraw (amount);

destination.deposit (amount);

return true;

} else {

return false;

}

}

Concurrent Systems and Applications 2001 – 49 Tim Harris

Page 50: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Transactions (2)

The intent is that code within a transactionally blockwill execute without interference from other activities, inparticular

� other operations on the same objects

� system crashes

We’ll say that a transaction either commits (i.e. succeeds) oraborts (i.e. fails).

Of course, we can’t provide complete resilience to systemcrashes, but we can say that

� if enough of the system keeps working

� then the results of committed transactions are not lost

� and the effects of non-committed transactions are notseen

Concurrent Systems and Applications 2001 – 50 Tim Harris

Page 51: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Transactions (3)

In more detail we’d like committed transactions to satisfyfour ‘ACID’ properties:

A tomicity – either all or none of the transaction’s operationsare performed

— programmers do not have to worry about ‘cleaning up’after a transaction aborts; the system ensures that it has no

visible effects

Consistency – a transaction transforms the system from oneconsistent state to another

— essentially the transaction must be implemented topreserve desired invariants, e.g. totals across accounts

I solation – the effects of a transaction are not visible to othertransactions until it is committed

— in the strictest case, another transaction shouldn’t read thesource and destination amounts mid-transfer

Durability – the effects of committed transactions enduresubsequent system failures

— when the system confirms the transaction has committedit must ensure any changes will survive faults

Concurrent Systems and Applications 2001 – 51 Tim Harris

Page 52: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Transactions (4)

These requirements can be grouped into two categories:

� Atomicity and durability refer to the persistence oftransactions across system failures.

We want to ensure that no ‘partial’ transactions areperformed (atomicity) and we want to ensure that systemstate does not regress by apparently-committedtransactions being lost (durability)

� Consistency and isolation concern ensuring correctbehaviour in the presence of concurrent transactions

As we’ll see there are trade-offs between the ease ofprogramming within a particular transactionalframework, the extent that concurrent execution oftransactions is possible and the isolation that is enforced

Concurrent Systems and Applications 2001 – 52 Tim Harris

Page 53: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Persistent storage

Assume a fail-stop model of crashes in which

� the contents of main memory (and above in the memoryhierarchy) is lost

� non-volatile storage is preserved (e.g. data written to disk)

) if we want the state of an object to be preserved acrosssystem failures then we must either

� ensure that sufficient replicas exist on different machinesthat the risk of losing all is tolerable (Part-II DistributedSystems)

� ensure that the enough information is written tonon-volatile storage in order to recover the state after arestart

Can we just write object state to disk before every commit?(e.g. invoking flush() on any kind of JavaOutputStream )

✘ Not directly: the failure may occur part-way through thedisk write (particularly for large amounts of data)

Concurrent Systems and Applications 2001 – 53 Tim Harris

Page 54: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Persistent storage – logging

We could split the update into stages:

1. Write details of the proposed update to an write-aheadlog – e.g. in a simple case giving the old and new valuesof the data, or giving a list of smaller updates as a set of(address; old ; new) tuples

0 1 2 3 4 5 6

48 65 6C 6C 6F 21 00

1: 65 -> 452: 6C -> 4C3: 6C -> 4C4: 6F -> 4F

Log

2. Proceed through the log making the updates

0 1 2 3 4 5 6

6C 6F 21 00

1: 65 -> 452: 6C -> 4C3: 6C -> 4C4: 6F -> 4F

Log

48 4C45

Crash during 1) no updates performed

Crash during 2) re-check log, either undo (so no changes)or redo (so all changes made)

Concurrent Systems and Applications 2001 – 54 Tim Harris

Page 55: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Persistent storage – logging (2)

More generally we can record details of multipletransactions in the log by associating each with a transactionid. Complete records, held in an append-only log, may be ofthe form:

� (transaction; operation; old ; new)

� or (transaction; start=abort=commit)

T1, x, add(1), 2, 3

T2, y, add(10), 17, 27

T2, ABORT

Log entriesObject values

y = 17

x = 3

Cac

he

Dis

k

Object values

y = 17

x = 2

z = 42

Previous entries

T2, START

Checkpoint:T2 active

Restart file

Concurrent Systems and Applications 2001 – 55 Tim Harris

Page 56: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Persistent storage – logging (3)

We can cache values in memory and use the log for recovery

� A portion of the log may also be held in volatile storage,but records for a transaction must be written tonon-volatile storage before that transaction commits

� Values can be written out lazily: the system state can berecovered using the log

A naïve implementation would be inefficient, e.g. whenaborting a transaction. A checkpoint mechanism can beused, e.g. every x seconds or every y log records. For eachcheckpoint:

� Force log records out to non-volatile storage

� Write a special checkpoint record that identifies thethen-active transactions

� Force cached updates out to non-volatile storage

Then write the location of the checkpoint record into arestart file

Concurrent Systems and Applications 2001 – 56 Tim Harris

Page 57: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Persistent storage – logging (4)T

ran

sact

ion

s

Checkpoint Failure

time

TS

RQ

P

P already committed before the checkpoint – any itemscached in volatile storage must have been flushed

Q active at the checkpoint but subsequently committed –log entries must have been flushed at commit, REDO

R active but not yet committed – UNDO

S not active but has committed – REDO

T not active, not yet committed – UNDO

Concurrent Systems and Applications 2001 – 57 Tim Harris

Page 58: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Persistent storage – logging (5)

A general algorithm for recovery:

� The recovery manager keeps UNDO and REDO lists

� Initialize UNDO with the set of transactions active at thelast checkpoint

� REDO is initially empty

� Search forward from the checkpoint record:– Add transactions that start to the UNDO list– Move transactions that commit from the UNDO list to

the REDO list

� Then work backwards through the log from the end to thecheckpoint record:– UNDOing the effect of transactions on the UNDO list

� Then work forwards from the log from the checkpointrecord:– REDOing the effect of transactions in the REDO list

Storing old and new values in the log enables generalidempotent UNDO and REDO

Concurrent Systems and Applications 2001 – 58 Tim Harris

Page 59: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Persistent storage – shadowing

An alternative to logging: create separate old and newversions of the data structures being changed

48 65 6C 6C 6F 21 00

0 1 2 3 4 5 6Old meta-data

An update starts by constructing a new ‘shadow’ version ofthe data, possibly sharing unchanged components:

New meta-data

Old meta-data

48 65 6C 6C 6F 21 00

0 1 2 3 4 5 6

45 4C 4C 4F

7 8 9 A

The change is committed by a single in-place update to alocation containing a pointer to the current version. This lastchange must be guaranteed atomic by the system.

How can this be extended for persistent updates to multipleobjects?

Concurrent Systems and Applications 2001 – 59 Tim Harris

Page 60: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation

Recall our original example:

transactionally {

if (source.balance() >= amount) {

source.withdraw (amount);

destination.deposit (amount);

return true;

} else {

return false;

}

}

What can the system do in order to enforce isolationbetween transactions specified in this manner?

A simple approach: execute transactions serially, allowingonly one to operate at a time

✔ Simple, ‘clearly correct’, independent of the operationsperformed within the transaction

✘ Does not enable concurrent execution, e.g. two of theseoperations on separate sets of accounts

✘ What happens if operations can fail?

Concurrent Systems and Applications 2001 – 60 Tim Harris

Page 61: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – serialisability

This idea of executing transactions serially provides a usefulcorrectness criteria for executing transactions in parallel:

� A concurrent execution is serialisable if there is someserial execution of the same transactions that gives thesame result

Suppose we have two transactions:

T1: transactionally {

int s = A.read ();

int t = B.read ();

return s + t;

}

T2: transactionally {

A.credit (100);

B.debit (100);

}

If we assume that the individual read , credit and debitoperations are implemented atomically (e.g. bysynchronized methods) then an execution without furtherconcurrency control can proceed in 6 ways

Concurrent Systems and Applications 2001 – 61 Tim Harris

Page 62: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – serialisability (2)

Both of these concurrent executions are OK:

T1:

T2:

A.read B.read

A.credit B.debit

T1:

T2: B.debit

A.read

A.credit

B.read

Neither of these concurrent executions is valid:

T1:

T2:

A.read

A.credit

B.read

B.debit

T1:

T2: A.credit

A.read B.read

B.debit

In each case some – but not all – of the effects of T2 havebeen seen by T1, meaning that we have not achievedisolation between the transactions

Concurrent Systems and Applications 2001 – 62 Tim Harris

Page 63: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – serialisability (3)

We can depict a particular execution of a set of concurrenttransactions by a history graph

� Nodes in the graph represent the operations comprisingeach transaction, e.g. T1: A.read

� An directed edge from node a to node b means that ahappened before b

– Operations within a transaction are totally ordered bythe program order in which they occur

– Conflicting operations on the same object are orderedby the object’s implementation

For clarity we usually omit edges that can be inferred by thetransitivity of happens before

Suppose again that we have two objects A and B associatedwith integer values and run transaction T1 that reads valuesfrom both and transaction T2 that adds to A and subtractsfrom B

Concurrent Systems and Applications 2001 – 63 Tim Harris

Page 64: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – serialisability (4)

These histories are OK. Either both the read operations seethe old values of A and B:

T1:

T2:

start commit

start commit

A.read B.read

B.debitA.credit

or both read operations see the new values:

T1:

T2:

start

start commit

commitA.read B.read

B.debitA.credit

Concurrent Systems and Applications 2001 – 64 Tim Harris

Page 65: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – serialisability (5)

These histories show non-serialisable executions in whichone read sees an old value and the other sees a new value:

T1:

T2:

A.read B.read

B.debitA.credit

start

start commit

commit

T1:

T2:

A.read B.read

B.debitA.credit

start

start commit

commit

Concurrent Systems and Applications 2001 – 65 Tim Harris

Page 66: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – serialisability (6)

We can derive a simpler serialisation graph in which nodesrepresent transactions and a directed edge from node Ta toTb means that some node in Tb’s history graph is reachablefrom some node in Ta’s

A history is serialisable iff its serialisation graph is acyclic

T1 T2 T2T1

T2T1 T2T1

These graphs show whether a particular execution of thetransactions corresponds to a serialisable execution

As we’ve seen in this example, one piece of code can leadto both serialisable and non-serialisable histories

The transaction management system is responsible forensuring that a serialisable execution is chosen at run-time

Concurrent Systems and Applications 2001 – 66 Tim Harris

Page 67: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – two-phase locking

We’ll now look at some mechanisms for ensuring thattransactions are executed in a serialisable manner whileallowing more concurrency than an actual serial executionwould achieve

In two-phase locking (2PL) each transaction is divided into

� a phase of acquiring locks

� a phase of releasing locks

Locks must exclude other operations that may conflict withthose to be performed by the lock holder. Simple mutualexclusion locks may suffice, but could limit concurrency. Inthe example we could use a MRSW lock, held in read modefor read and write mode for credit and debit

� If Ta performs an operation that comes before aconflicting one by Tb then Ta must have released a lockon the object and Tb acquired one

� At that point Ta must have entered its releasing phase – itcan’t acquire locks on further objects that Tb may havepreviously updated

Concurrent Systems and Applications 2001 – 67 Tim Harris

Page 68: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – two-phase locking (2)

How does the system know when (and how) to acquire andrelease locks if transactions are defined in the form:

transactionally {1

if (source.balance() >= amount) {2

source.withdraw (amount);3

destination.deposit (amount);4

return true;5

} else {6

return false;7

}8

}9

� Could require explicit invocations by the programmer,e.g. additional operations to

– acquire a read lock on source before 2, release if theelse clause is taken,

– upgrade to a write lock on source before 3,

– acquire a write lock on destination before 4,

– release the lock on source any time after acquiringboth locks,

– release the lock on destination after 4

Concurrent Systems and Applications 2001 – 68 Tim Harris

Page 69: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – two-phase locking (3)

How well would this form of two-phase locking work?

✔ Ensures serialisable execution if implemented correctly

✔ Allows arbitrary application-specific knowledge to beexploited, e.g. using MRSW for increased concurrencyover mutual exclusion locks

✔ Allowing other transactions to access objects as soon asthey have been unlocked increases concurrency

✘ Complexity of programming (e.g. 2PL) MRSW needsan upgrade operation here)

✘ Risk of deadlock

✘ If Ta ! Tb then isolation requires that– Tb cannot commit until Ta has– Tb must abort if Ta does (‘cascading aborts’)

Some of these problems can be addressed by strict isolationin which all locks are held until release: transactions neversee partial updates made by others

With Strict 2PL locks are only released when a transactioncommits or aborts – no cascading aborts but consider theeffect of long transactions...

Concurrent Systems and Applications 2001 – 69 Tim Harris

Page 70: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – timestamp ordering

Timestamp ordering (TSO) is another mechanism to enforceisolation:

� Each transaction has a timestamp – e.g. of its start time.These must be totally ordered, using a suitable tie-break ifnecessary

� Each object requires fields to hold

– The timestamp of the most recent transaction

– The operation invoked upon it

� Each time an operation is invoked that conflicts with theprevious one on the object:

✔ It is allowed to proceed if it is from a transaction witha later timestamp

✘ It is rejected as too late if it is from an earliertransaction

Concurrent Systems and Applications 2001 – 70 Tim Harris

Page 71: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – timestamp ordering (2)

One serialisable order is achieved: that of the timestamps ofthe transactions, e.g.

T1,1: start T2,1: start

T1,2: A.read() T2,2: A.credit()

T1,3: B.read() T2,3: B.debit()

✔ T1,1 executes,! timestamp 17

✔ T1,2 executes, A: 17,read

✔ T2,1 executes,! timestamp 42

✔ T2,2 executes, OK (later) A: 42,credit

✔ T2,3 executes, B: 42,debit

✘ T1,3 attempted: too late 17 earlier than 42 and readconflicts with credit

In this case both transactions could have committed if T1,3had been executed before T2,3

Concurrent Systems and Applications 2001 – 71 Tim Harris

Page 72: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – timestamp ordering (3)

✔ The decision of whether to admit a particular operation isbased on information local to the object

✔ Simple to implement – e.g. by interposing the checks oneach invocation (contrast with 2PL)

✔ Avoiding locking may increase concurrency (but seebelow: the work performed may not be useful)

✔ Deadlock is not possible

✘ Cascading aborts are possible – e.g. if T1,2 had updatedA then it would need to be undone and T2 would have toabort because it may have been influenced by T1

— could delay T2,2 until T1 either commits or aborts(still avoiding deadlock)

✘ Serialisable executions can be rejected if they do notagree with the transactions timestamps (e.g. executing T2in its entirety, then T1)

Generally: the low overheads and simplicity make TSO goodwhen conflicts are rare

Concurrent Systems and Applications 2001 – 72 Tim Harris

Page 73: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – OCC

Optimistic Concurrency Control (OCC) is anothermechanism for enforcing isolation

A transaction operates on shadow copies of objects: changesremain local. Copies may be taken at transaction start orperhaps each time it accesses a new object

Upon commit:

� Validate that the the shadows were consistent...

� ...and no other transaction has committed an operationon an object which conflicts with one intended by thistransaction

✔ If OK then commit the updates to the persistent objects,in the same transaction-order at every object

✘ If not OK then abort: discard shadows and retry

Note that abort is easy: just discard the shadows

No cascading aborts or deadlock

But conflicts force transactions to retry

Concurrent Systems and Applications 2001 – 73 Tim Harris

Page 74: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – OCC (2)

Validation is the complex part of OCC. As usual there aretrade-offs between the implementation complexity,generality and likelihood that a transaction must abort

We’ll consider a validation scheme using

� a single-threaded validator

� the usual distinction between conflicting andcommutative operations

Transactions are assigned timestamps when they passvalidation, defining the order in which the transactions havebeen serialised. We’ll assign timestamps when validationstarts and then either

� confirm during validation that this gives a serialisableorder, or

� discover that it does not and abort the transaction

Elaborate schemes are probably unnecessary: OCC assumestransactions do not usually conflict

Concurrent Systems and Applications 2001 – 74 Tim Harris

Page 75: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – OCC (3)

The validator maintains a preceding transactions list:

Validated Validation Objects Committedtransaction timestamp updated

P 10 A, B, C Yes

Q 11 D Yes

R 12 A, E

Transactions P and Q have been validated and committedto persistent storage. R has been accepted by the validatorbut its updates to objects A and E not yet committed

A current timestamp is maintained by each object, holdingthe validation timestamp of the most recent transactioncommitted to it:

Object TimestampA 12B 10C 10D 11E 10

The update to E remains to take place

Concurrent Systems and Applications 2001 – 75 Tim Harris

Page 76: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – OCC (4)

Before execution:

� Record the validation timestamp of the most recentlyvalidated but not committed transaction – in this case 12.This will be the base timestamp

Validation phase 1:

� Compare each shadow’s timestamp against the basetimestamp

✔ Shadow earlier (B,C,D,E): part of a consistent snapshotbefore 12

✘ Otherwise (A): it may have seen a subsequent update

Validation phase 2:

� Compare the transaction T against each entry (Told) inthe list

✔ Told before the base timestamp

✔ Told has no conflicting updates

✘ Otherwise abort T

Concurrent Systems and Applications 2001 – 76 Tim Harris

Page 77: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Isolation – recap

We’ve seen three schemes:

1. 2PL uses explicit locking to prevent concurrenttransactions performing conflicting operations. Strict 2PLenforces strict isolation and avoids cascading aborts.Both may allow deadlock

✔ Use when contention is likely and deadlockavoidable. Use strict 2PL if transactions are short orcascading aborts problematic

2. TSO assigns transactions to a serial order at the time theystart. Can be modified to enforce strict isolation. Doesnot deadlock but serialisable executions may be rejected

✔ Simple and effective when conflicts are rare.Decisions are made local to each object: suitable fordistributed systems

3. OCC allows transactions to proceed in parallel onshadow objects, deferring checks until they try to commit

✔ Good when contention is rare. Validator may allowmore flexibility than TSO

Concurrent Systems and Applications 2001 – 77 Tim Harris

Page 78: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Example

Finally, we’ll look at an example implementing TSO in Java

� This is, of course, only looking at enforcing isolationbetween transaction – there is no persistent storage

� The syntax is more cumbersome than the

transactionally {

...

}

notation that may be desired in that the programmer mustuse a try ...catch block to deal with abortingtransactions and must pass an additional transactionobject to each method

� In overview, an instance of TSOTransaction is createdfor each transaction performed. This is used to keep trackof the objects that transaction accesses (instantiated fromsub-classes of TSOTransactorObject ). It records theold state of each object to allow transactions to abort

Concurrent Systems and Applications 2001 – 78 Tim Harris

Page 79: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Example – main program

import java.util.Random;

class Example {

static Account a = new Account (100);

static Account b = new Account (100);

static volatile int u, a1, a2;

public static void main (String args[]) {

Thread t1 = new Thread () {

public void run () {

Random r = new Random ();

while (true) {

Transaction tx = null;

try {

tx = new TSOTransaction ();

int n = r.nextInt ();

b.delta (tx, -n);

a.delta (tx, n);

tx.commit (); u++;

} catch (Failure f) {

tx.abort (); a1++;

}

}

}

};

Concurrent Systems and Applications 2001 – 79 Tim Harris

Page 80: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Example – main program (2)

Thread t2 = new Thread () {

public void run () {

while (true) {

Transaction tx = null;

int total;

try {

tx = new TSOTransaction ();

total = a.read (tx) + b.read (tx);

tx.commit ();

System.out.println (

"Total=" + total +

" (" + u + "," + a1 +

"," + a2 + ")");

} catch (Failure f) {

tx.abort (); a2++;

}

}

}

};

t1.start ();

t2.start ();

}

}

Concurrent Systems and Applications 2001 – 80 Tim Harris

Page 81: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Example – Account

class Account extends TSOTransactorObject

{

int value;

Account (int value) { this.value = value; }

Object getState () {

return new Integer (value);

}

void setState (Object o) {

value = ((Integer)o).intValue();

}

void delta (Transaction tx, int change)

throws Failure

{

enter (tx); value += change;

}

int read (Transaction tx) throws Failure

{

enter (tx); return value;

}

}

Concurrent Systems and Applications 2001 – 81 Tim Harris

Page 82: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Example – TSOTransactorObject

abstract class TSOTransactorObject {

TSOTransaction mostRecent;

synchronized void enter (Transaction t)

throws Failure

{

TSOTransaction tx = (TSOTransaction) t;

if (mostRecent != null &&

mostRecent != tx) {

if (tx.earlierThan (mostRecent)) {

throw new Failure ("Too late");

} else {

mostRecent.waitFor ();

}

}

mostRecent = tx;

tx.enter (this, getState ());

}

abstract Object getState ();

abstract void setState (Object o);

}

Concurrent Systems and Applications 2001 – 82 Tim Harris

Page 83: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Example – Transaction

abstract class Transaction {

public static int STATUS_ACTIVE = 0;

public static int STATUS_COMMITTED = 1;

public static int STATUS_ABORTED = 2;

int status = STATUS_ACTIVE;

abstract void abort ();

abstract void commit () throws Failure;

synchronized void waitFor () throws Failure {

try {

if (status == STATUS_ACTIVE) wait ();

} catch (InterruptedException ie) {

throw new Failure("Interrupted");

}

}

synchronized void setStatus (int status) {

this.status = status;

notifyAll ();

}

}

Concurrent Systems and Applications 2001 – 83 Tim Harris

Page 84: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Example – TSOTransaction

import java.util.Vector;

class TSOTransaction extends Transaction {

Vector os = new Vector ();

Vector states = new Vector ();

long id = getNextId ();

static long nextId;

static synchronized long getNextId () {

return nextId ++;

}

boolean earlierThan (TSOTransaction other) {

return (id < other.id);

}

void enter (TSOTransactorObject o,

Object old) {

os.addElement (o);

states.addElement (old);

}

Concurrent Systems and Applications 2001 – 84 Tim Harris

Page 85: Communication between processes · 2001. 10. 26. · Communication between processes (2) Most directly, introducing separate address spaces means that data is not directly shared

Example – TSOTransaction (2)

void abort () {

for (int i = os.size() - 1; i >= 0; i --)

{

TSOTransactorObject tso;

tso = (TSOTransactorObject)

os.elementAt (i);

tso.setState (states.elementAt (i));

}

setStatus (STATUS_ABORTED);

}

void commit () throws Failure {

if (status != STATUS_ACTIVE)

throw new Failure ("Cannot commit");

setStatus (STATUS_COMMITTED);

}

}

$ javac *.java && java Example

Total=200 (2,1,0)

Total=200 (1003,2,2)

Total=200 (1095,2,2)

Total=200 (1222,3,4)

...

Concurrent Systems and Applications 2001 – 85 Tim Harris