Military Technical Academy Bucharest, 2004

GRID APPLICATIONS & DATA CONSIDERATIONS

ADINA RIPOSAN
Applied Information Technology
Department of Computer Engineering

Application considerations

Data considerations

Application considerations

The considerations that need to be made when evaluating, designing, or converting applications for use in a Grid computing environment

Not all Applications can be transformed to run in parallel on a Grid and achieve scalability.

Grid Applications can be categorized in one of the following 3 categories:

• Applications that are not enabled for using multiple processors but can be executed on different machines.

• Applications that are already designed to use the multiple processors of a Grid setting.

• Applications that need to be modified or rewritten to better exploit a Grid.

There are many factors to consider in grid-enabling an Application

• New computation-intensive applications written today are being designed for parallel execution => these will be easily grid-enabled, if they do not already follow emerging grid protocols and standards.

• There are some practical tools that skilled application designers can use to write a parallel grid application.

• There are NO practical tools for transforming arbitrary applications to exploit the parallel capabilities of a grid. => Automatic transformation of applications is a science in its infancy.

• Applications specifically designed to use multiple processors or other federated resources of a Grid will benefit most.

For grid computing, we should examine any applications that consume large amounts of CPU time.

• Applications that can be run in a batch mode are the easiest to handle.

• Applications that need interaction through graphical user interfaces are more difficult to run on a grid, but not impossible.

They can use remote graphical terminal support, such as the X Window System or other means.

The most important step in Grid-enabling an Application:

=> to determine whether the calculations can be done in parallel or not

• HPC (High Performance Computing) clusters are sometimes used to handle the execution of applications that can utilize parallel processing.

• Grids provide the ability to run these applications across a heterogeneous, geographically dispersed set of clusters.

Rather than run the application on a single homogeneous cluster, the application can take advantage of the larger set of resources in the Grid.

If the algorithm is such that each computation depends on the prior calculation, then a new algorithm would need to be found.

• Not all problems can be converted into parallel calculations.

Some computations cannot be rewritten to execute in parallel.

For example, in physics, there are no simple formulas that show where three or more moving bodies in space will be after a specified time when they gravitationally affect each other. Instead, their motion is computed in small time steps:

• Each computation depends on the prior one.

• This is repeated a great number of times until the desired time is reached.
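A minimal sketch of why such a simulation is inherently serial, assuming a one-dimensional system with unit masses and a simple explicit Euler step. The names `step` and `simulate` and the constant `G` are illustrative assumptions, not taken from the slides. Every call to `step` needs the state produced by the previous call, so the loop over time cannot be split into independent jobs.

```python
G = 1.0  # gravitational constant in arbitrary units (assumption)

def step(positions, velocities, dt):
    """Advance all bodies by one time step (explicit Euler, unit masses, 1-D)."""
    n = len(positions)
    accelerations = []
    for i in range(n):
        a = 0.0
        for j in range(n):
            if i != j:
                d = positions[j] - positions[i]
                # Inverse-square attraction; the small constant avoids division by zero.
                a += G * d / (abs(d) ** 3 + 1e-9)
        accelerations.append(a)
    new_velocities = [v + a * dt for v, a in zip(velocities, accelerations)]
    new_positions = [p + v * dt for p, v in zip(positions, new_velocities)]
    return new_positions, new_velocities

def simulate(positions, velocities, dt, steps):
    # Step k cannot start before step k-1 has finished.
    for _ in range(steps):
        positions, velocities = step(positions, velocities, dt)
    return positions, velocities
```

The work inside one step (the loop over bodies) could still be divided, but the sequence of steps itself stays serial.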

Often, an Application may be a mix of independent computations as well as dependent computations.

One needs to analyze the application to see if there is a way TO SPLIT some subset of the work.

Drawing a program flow graph and a data dependency graph can help in analyzing whether and how an application could be separated into independently running parallel parts.
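A data dependency graph can also be analyzed mechanically. The sketch below assumes Python 3.9 or later (for the standard `graphlib` module) and a hypothetical four-task application. `parallel_stages` and the task names are illustrative only. It groups tasks into stages so that all tasks in one stage are independent of each other and could run as parallel subjobs.

```python
from graphlib import TopologicalSorter

def parallel_stages(dependencies):
    """Group tasks into stages; tasks within one stage do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are complete
        stages.append(ready)
        sorter.done(*ready)
    return stages

# Each task maps to the set of tasks whose results it needs (hypothetical example).
deps = {
    "load": set(),
    "filter_a": {"load"},
    "filter_b": {"load"},
    "merge": {"filter_a", "filter_b"},
}
stages = parallel_stages(deps)
```

Here `filter_a` and `filter_b` land in the same stage, which is the part of the work that can be split.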

Rearranging SERIAL computations to execute in PARALLEL

Simulation that cannot be made PARALLEL but needs to run many times

Another approach to reducing data dependency on prior computations is to look for ways to use REDUNDANT computations.

• If the dependency is on a subset of the prior computations:

have each successive computation that needs the results of the prior computation recompute those results instead of waiting for them to arrive from another job.

• If the dependency is on a computation that has a YES/NO answer:

compute the next calculations for both of the “yes” and “no” cases, and throw away the wrong choice when the dependency is finally known.
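The YES/NO case can be sketched as follows. This is only an illustration: it uses local threads from the standard library in place of Grid jobs, and `speculate`, `decide`, `if_yes`, and `if_no` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def speculate(decide, if_yes, if_no):
    """Start both continuations while the yes/no decision is still being computed."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        decision = pool.submit(decide)
        yes_result = pool.submit(if_yes)
        no_result = pool.submit(if_no)
        # Keep the branch that matches the decision; the other result is discarded.
        chosen = yes_result if decision.result() else no_result
        return chosen.result()

answer = speculate(lambda: 7 % 2 == 1, lambda: "odd path", lambda: "even path")
```

In this sketch the discarded branch still runs to completion before the pool shuts down; a real scheduler would more likely cancel the losing job.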

This technique can be taken to extremes in various ways.

• For example, for 2 bits of data dependency, we could make 4 copies of the next computation with all four possible input values.

=> This can proceed to 2^N copies of the next calculation for N bits of data dependency.

• As N gets large, it quickly becomes too costly to compute all possible computations.
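The growth in the number of copies is easy to see by enumerating the possible inputs. The helper name `speculative_inputs` is an assumption made for this sketch.

```python
from itertools import product

def speculative_inputs(n_bits):
    """All input combinations that must be precomputed for n_bits unknown bits."""
    return list(product((0, 1), repeat=n_bits))

copies_for_2_bits = len(speculative_inputs(2))    # 4 copies
copies_for_10_bits = len(speculative_inputs(10))  # 1024 copies
```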

We may speculate and only perform the copies for the values we guess might be more likely to be correct.

If we did not guess the correct one, then we simply end up computing it in series, but if we guessed correctly it saves us overall real time.

Here HEURISTICS (rules of thumb) could be developed to make the best possible guesses.

• The same kind of speculative computing (speculative approach) is used to improve the efficiency inside CPUs by executing both branches of a condition until the correct one is determined.

• In many cases, an Application is used to test an array of “what if” input values.

Each of the alternatives can be a separate job running the same simulation application, but with different input values.

=> This is called a PARAMETER SPACE PROBLEM

Redundant speculative computation to reduce latency

A Computation Grid is ideally suited for this kind of problem

• The parallelism comes from running many separate jobs that cover the parameter space.

• Some grid products provide tools for simplifying the submission of the many sub-jobs in a parameter space exploration type of application.

Applications that consist of a large number of independent subjobs are very suitable for exploiting Grid CPU resources.

=> These are sometimes called PARAMETER SPACE SEARCHES
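A parameter space search can be sketched as one independent job per combination of input values. The example below uses a local thread pool as a stand-in for Grid job submission, and `run_model` is a hypothetical placeholder for the real simulation. None of these names come from a particular grid product.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def run_model(rate, years):
    """Hypothetical stand-in for the simulation: compound growth of 100 units."""
    return 100.0 * (1.0 + rate) ** years

def sweep(rates, years_options, max_workers=4):
    """Run one independent job per point of the parameter space."""
    points = list(product(rates, years_options))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(pool.map(lambda point: run_model(*point), points))
    return dict(zip(points, outcomes))

results = sweep([0.01, 0.05], [1, 10])
```

Because no job reads another job's output, the jobs can be placed on any available node in any order.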

Parameter space problems are:

• finite in nature, or
• infinite, or
• so large that all possible parameter inputs cannot be examined.

=> For these kinds of parameter space problems, it is useful to use additional heuristics to select which parts of the parameter space to try

This may not lead to the absolute best solution, but it may be close enough.

• It may be acceptable to explore only a small part of the parameter space:

try a reasonable number of randomly scattered points in the problem’s parameter space first,

then try small changes in the parameters around the best points that might lead to a better solution.

=> This technique is useful when the result changes relatively smoothly with changes in the parameters.
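A minimal sketch of this two-phase heuristic for a single parameter, assuming the goal is to minimize some `objective` function. The function name `search` and its defaults are illustrative. On a Grid, each evaluation of `objective` would be a separate job.

```python
import random

def search(objective, low, high, n_random=50, n_refine=50, step=0.1, seed=0):
    """Scatter random points, then try small changes around the best one found."""
    rng = random.Random(seed)
    # Phase 1: randomly scattered points across the whole parameter range.
    best_x = min((rng.uniform(low, high) for _ in range(n_random)), key=objective)
    best_y = objective(best_x)
    # Phase 2: small perturbations around the current best point.
    for _ in range(n_refine):
        candidate = min(high, max(low, best_x + rng.uniform(-step, step)))
        y = objective(candidate)
        if y < best_y:
            best_x, best_y = candidate, y
    return best_x, best_y

best_x, best_y = search(lambda x: (x - 2.0) ** 2, -10.0, 10.0)
```

The refinement phase only helps when nearby parameters give similar results, which is the smoothness condition stated above.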

Many times, an application that was written for a single processor may not be organized in a way, or use algorithms or approaches, suitable for splitting into parallel subcomputations.

• An application may have been written in a way that makes it most efficient on a single-processor machine.

• However, there may be other methods or algorithms that are not as efficient, yet may be much more amenable to being split into independently running subcomputations.

A different algorithm may “scale” better because it can more efficiently use larger and larger numbers of processors.

=> Thus, another approach for Grid-enabling an Application is to revisit the choices made when the Application was originally written.

Some of the discarded approaches may be better for Grid use.

SOME ADDITIONAL THINGS TO THINK ABOUT

Is there any part of the computation that would be performed more than once using the same data?

If so, and if that computation is a significant portion of the overall work, it may be useful to save the results of such computations.

• How much output data would need to be saved to avoid the computation the next time?

• If there is a very large amount of output data, it may be prohibitive to save it.

• Even if any one computation’s results do not represent a large amount of data, the aggregate for all of them might.

We need to consider this TIME-SPACE TRADE-OFF for the application.

• We could presumably save space and time by only saving the results for the most frequently occurring situations.
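One way to sketch this trade-off is a bounded cache of results. The example uses `functools.lru_cache` from the Python standard library. Note that it evicts the least recently used entries, which only approximates keeping the most frequently occurring ones. The function `expensive` is a hypothetical placeholder.

```python
from functools import lru_cache

@lru_cache(maxsize=1024)  # upper bound on stored results: the space side of the trade-off
def expensive(n):
    """Hypothetical costly computation; here a simple sum of squares."""
    return sum(i * i for i in range(n))

expensive(10_000)
expensive(10_000)              # served from the cache, not recomputed
info = expensive.cache_info()  # reports hits, misses, and current size
```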

In a distributed Application, partial results or data dependencies may be met by communicating among subjobs.

One job may compute some intermediate result and then transmit it to another job in the Grid.

• If possible, we should consider whether it might be more efficient to simply recompute the intermediate result at the point where it is needed rather than waiting for it from another job.

• We should also consider the transfer time from another job versus retrieving it from a database of prior computations.
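The comparison can be made explicit with rough time estimates. The sketch below is a back-of-the-envelope model only. All parameter names are assumptions, and real transfer times also depend on latency and contention.

```python
def cheapest_source(recompute_s, size_bytes, job_bandwidth_bps, job_wait_s,
                    db_bandwidth_bps, db_lookup_s):
    """Return the option with the smallest estimated time in seconds."""
    estimates = {
        "recompute": recompute_s,
        "from_job": job_wait_s + size_bytes * 8 / job_bandwidth_bps,
        "from_database": db_lookup_s + size_bytes * 8 / db_bandwidth_bps,
    }
    return min(estimates, key=estimates.get), estimates

# 1 GB result: 2 s to recompute, 100 Mbit/s link to the other job, 1 Gbit/s to the database.
choice, estimates = cheapest_source(2.0, 10**9, 10**8, 0.0, 10**9, 0.5)
```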

Data considerations

When splitting Applications for use on a Grid, it is important to consider

• the amount of data that needs to be sent to the node performing a calculation and

• the time required to send it.

* Most ideal: the Application can be split into small work units requiring little input data and producing small amounts of output data.

The data is said to be “staged” to the node doing the work.

=> Sending this data along with the executable file to the Grid node doing the work is part of the function of most Grid systems.

* When the Grid Application is split into subjobs, often the input data is a large fixed set of data.

• This offers the opportunity to share this data rather than staging the entire set with each subjob.

• However, even with a shared mountable file system, the data is being sent over the network.

=> The GOAL is to locate the shared data closer to the jobs that need the data.

If the data is going to be used more than once, it could be REPLICATED to the degree that space permits.

• If more than one copy of the data is stored in the Grid, it is important to arrange for the subjobs to access the nearest copy per the configuration of the network.

=> This highlights the need for an information service within the Grid to track this form of data awareness.

• The network should not become the bottleneck for such a Grid Application.

=> If each subjob processes the data very quickly and is always waiting for more data to arrive, then sharing may not be the best model if the network data transfer speed to each subjob does not at least match disk speeds.
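Choosing the nearest copy can be sketched as a lookup against whatever the information service knows about the network. The latency table and site names below are invented for illustration, and `nearest_replica` is not an API of any grid middleware.

```python
def nearest_replica(replicas, latency_ms, client_site):
    """Pick the replica site with the lowest known latency from client_site."""
    return min(replicas, key=lambda site: latency_ms[(client_site, site)])

# Hypothetical measurements an information service might hold.
latency_ms = {
    ("bucharest", "bucharest"): 1,
    ("bucharest", "geneva"): 35,
    ("bucharest", "chicago"): 120,
}
site = nearest_replica(["geneva", "chicago"], latency_ms, "bucharest")
```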

SHARED DATA MAY BE FIXED OR CHANGING

A. It is easier and more efficient to share a database where:

Latest data is not added to the database the instant that it is available

B. In some shared-data situations updates must not be delayed

If there are copies of this database elsewhere, they must all be updated with each new item SIMULTANEOUSLY.

A. It is easier and more efficient to share a database where:

Latest data is not added to the database the instant that it is available

the updates to it can be batched and processed at off-peak usage times,

rather than contending with concurrent access by applications.

It improves performance if:

More than one copy of this data exists, and the copies do not all need to be updated simultaneously

because not all applications using the data would need to be stopped while updating the data;

only those accessing a particular copy would need to be stopped or temporarily paused.

When a file or a database is updated:

• Jobs cannot simultaneously read the portion of the file concurrently being updated by another job.

• Locking or synchronizing primitives are typically built into the file system or database to automatically prevent this.

Otherwise, the application might read partially updated data, perhaps receiving a combination of old and new data.
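The effect of such a primitive can be sketched inside a single process with a lock from the Python standard library. The `Record` class is a hypothetical stand-in for a database entry whose two fields must always change together.

```python
import threading

class Record:
    """Two fields that must always be read and written together."""

    def __init__(self):
        self._lock = threading.Lock()
        self._low, self._high = 0, 0

    def update(self, value):
        with self._lock:   # writers exclude readers and other writers
            self._low = value
            self._high = value

    def read(self):
        with self._lock:   # never observes a half-finished update
            return self._low, self._high
```

Without the lock in `read`, a reader could see the new `_low` next to the old `_high`, which is the mix of old and new data described above.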

B. In some shared-data situations updates must not be delayed

If there are copies of this database elsewhere, they must all be updated with each new item SIMULTANEOUSLY.

Scaling issues:

There can be a large amount of data synchronization communications among jobs and databases.

The synchronization primitives can become bottlenecks in overall Grid performance.

=> The database activity should be partitioned:
• so that there is less interference among the parts,
• and thus less potential synchronization contention among those parts.

* Applications that access the data they need SERIALLY

More predictable => various techniques can be used to improve their performance on the Grid.

=> Shared copies might be desirable if each subjob needs to access all of the data

=> Multiple copies of the data should be considered if bringing the data closer to the nodes running the subjobs would help

=> Copies may not be desirable if each part of the data is examined only once

* However, if the access is SERIAL, some of the retrieval time can be overlapped with processing time

There could be a thread retrieving the data that will be needed next while the data already retrieved is being processed.

=> This can even apply to randomly accessed data, if there is the ability to do some prediction of which portions of data will be needed next.
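A minimal sketch of such a prefetching thread, assuming the data arrives in numbered chunks. The callables `fetch` and `process` and the `depth` parameter are illustrative names. A bounded queue limits how far retrieval runs ahead of processing.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the data

def process_with_prefetch(fetch, process, n_chunks, depth=2):
    """Overlap retrieval and processing using a background prefetch thread."""
    buffer = queue.Queue(maxsize=depth)  # bounds how far ahead we fetch

    def prefetcher():
        for i in range(n_chunks):
            buffer.put(fetch(i))         # blocks while the buffer is full
        buffer.put(_DONE)

    threading.Thread(target=prefetcher, daemon=True).start()
    results = []
    while True:
        chunk = buffer.get()             # waits only if retrieval falls behind
        if chunk is _DONE:
            return results
        results.append(process(chunk))

totals = process_with_prefetch(lambda i: list(range(i)), sum, 4)
```

For predictable random access, the loop in `prefetcher` would iterate over the predicted chunk order instead of `range(n_chunks)`.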

* One of the most difficult problems with DUPLICATING rapidly changing databases is keeping them in SYNCHRONIZATION.

The first step is to see if rapid synchronization is really needed.

• If the rapidly changing data is only a subset of the database, in-memory versions of the database might be considered.

• Network communication bandwidth into the central database repository could also be increased.

Is it possible to rewrite the Application so that it uses a data flow approach rather than the central state of a database?

• Perhaps it can use self-contained transactions that are transmitted to where they are needed.

• The subjobs could use direct communications between them as the primary flow for data dependency rather than passing this data through a database first.

* In some applications, various database records may need to be updated ATOMICALLY or IN CONCERT WITH OTHERS.

• Locking or synchronization primitives are used:

to lock all of the related database entries (in the same database or not);

then the database entries are updated while the synchronization primitives keep other subjobs waiting until the update is finished.

• There is a need for ways to minimize the number of records being updated simultaneously,

to reduce the contention created by the synchronization mechanism.

• Care is needed not to create situations which might cause a synchronization deadlock,

with 2 subjobs waiting for each other to unlock a resource the other needs.

There are 3 ways that are usually used to prevent this problem

1. To have all waits for resources include time-outs:

If the time-out is reached, then the operation must be undone and started over in an attempt to have better luck at completing the transaction

(easiest, but can be most wasteful)

2. To lock all of the resources in a predefined order ahead of the operation:

If all of the locks cannot be obtained, then any locks acquired should be released and then, after an optional time period, another attempt should be made.

3. To use deadlock detection software:

A transitive closure of all of the waiters is computed before placing the requesting task into a wait for the resource.

If it would cause a deadlock, the task is not put into a wait. The task should release its locks and try again later.

If it would not cause a deadlock, the task is set to automatically wait for the desired resource.
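The three approaches can be sketched with in-process locks, under stated assumptions. `acquire_in_order` combines ordered locking with time-outs and orders locks by `id`, which is only meaningful inside one process (a Grid would need an agreed ordering such as resource names). `would_deadlock` checks a hypothetical wait-for graph, given as a dict from each task to the tasks it is waiting on, for the cycle that a new wait would close. Both function names are illustrative.

```python
import threading

def acquire_in_order(locks, timeout=1.0):
    """Take all locks in a fixed order, or release everything and return None."""
    ordered = sorted(locks, key=id)  # any globally agreed order works
    held = []
    for lock in ordered:
        if lock.acquire(timeout=timeout):
            held.append(lock)
        else:
            for h in reversed(held):  # back out so other tasks can proceed
                h.release()
            return None               # caller may retry after a pause
    return held

def would_deadlock(waits_for, task, holder):
    """Would making `task` wait on `holder` close a cycle of waiters?"""
    seen, stack = set(), [holder]
    while stack:                      # walk everything reachable from holder
        current = stack.pop()
        if current == task:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(waits_for.get(current, ()))
    return False
```

For example, if B already waits on C and C waits on A, then `would_deadlock({"B": ["C"], "C": ["A"]}, "A", "B")` reports that letting A wait on B would deadlock.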

* It may be necessary to run an Application REDUNDANTLY

(e.g., for reliability reasons)

• The Application may be run simultaneously on geographically distinct parts of the Grid

to reduce the chances that a failure would prevent the Application from completing its work or prevent it from providing a reliable service.

• If the Application updates databases or has other data communications,

it needs to be designed to tolerate redundant data activity caused by running multiple copies of the application; otherwise, computed results may be in error.
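One common way to tolerate redundant activity is to make updates idempotent by tagging each with an identifier and ignoring repeats. This is a sketch of that general idea, not a method stated in the slides. `ResultStore` and the update identifiers are hypothetical.

```python
class ResultStore:
    """Hypothetical store that tolerates the same update arriving more than once."""

    def __init__(self):
        self.total = 0
        self._applied = set()

    def apply(self, update_id, amount):
        if update_id in self._applied:  # duplicate from a redundant copy
            return False
        self._applied.add(update_id)
        self.total += amount
        return True

store = ResultStore()
store.apply("job-7/step-1", 5)   # from the first copy of the application
store.apply("job-7/step-1", 5)   # same update from the redundant copy: ignored
```

Without the identifier check, the total would be 10 instead of 5, which is the kind of erroneous result the slide warns about.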