![Page 1: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/1.jpg)
Condor Project
Computer Sciences Department
University of Wisconsin-Madison
[email protected]
http://www.cs.wisc.edu/condor
Case Studies of Using Condor for Scientists
Barcelona, 2006
![Page 2: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/2.jpg)
Agenda
• Extended user’s tutorial
• Advanced Uses of Condor
  • Java programs
  • DAGMan
  • Stork
  • MW
  • Grid Computing
• Case studies, and a discussion of your application’s needs
![Page 3: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/3.jpg)
BLAST
![Page 4: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/4.jpg)
Background
• Each species has a genetic encoding within its cells
• Humans are made of approximately 10^14 cells
![Page 5: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/5.jpg)
Background
• The nucleus of each human cell contains 46 chromosomes
• Each chromosome contains between 231 and 2958 genes
• Each chromosome is made of somewhere between 25 million and 237 million (approximately) base pairs
![Page 6: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/6.jpg)
![Page 7: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/7.jpg)
Base Pairs (Simplified)
• Each base pair is one of 4 nucleotides
• Each nucleotide is represented by one letter:
A C G T
![Page 8: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/8.jpg)
The Science Issue
Scientists ask many questions and pose computationally difficult issues:
• map a species’ genome: build a huge database of information
• understand evolution at a genetic level: answer homology and related questions
• identify mutations and genes: to develop diagnoses and medical treatments
![Page 9: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/9.jpg)
BLAST
• Basic Local Alignment Search Tool
• A really good pattern matching program
• An answer to the science questions often requires queries such as
  • Does the following nucleotide sequence (~1000 pairs), or something close, appear in the database (several billions of pairs)?
  • To what certainty is there a match?
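To make the "or something close" part of such a query concrete, here is a minimal sketch of local alignment scoring using the Smith-Waterman dynamic program. BLAST itself relies on faster heuristics rather than this exhaustive method; the function name and the scoring values (match, mismatch, gap) are assumptions chosen only for illustration.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score of two sequences.

    The scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(subject) + 1)  # previous row of the scoring matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # 0 lets an alignment restart anywhere
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # An exact occurrence of the query scores len(query) * match.
    print(local_alignment_score("ACGT", "TTACGTTT"))
```

A higher score means a closer match; turning a score into a statement of certainty requires a statistical model that this sketch does not attempt.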
![Page 10: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/10.jpg)
The Biological Magnetic Resonance Data Bank
Department of Biochemistry at University of Wisconsin-Madison
Part of the Center for Eukaryotic Structural Genomics (CESG)
Working on three-dimensional protein structure
![Page 11: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/11.jpg)
The BMRB and BLAST
The BMRB (with the help of the Condor Team) has a weekly set of automated BLAST runs
These BLAST runs compare progress on the BMRB set of working proteins to the Protein Data Bank
![Page 12: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/12.jpg)
Serial versus Parallel
Too slow: The BMRB working set could be input as a single BLAST program execution
• Load the Protein Data Bank database
• Serially query the database with each protein in the working set
Faster: Divide the working set into pieces that allow parallel executions of BLAST
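As a rough illustration of dividing the working set, the sketch below splits a list of proteins into a chosen number of nearly equal pieces, one per parallel BLAST job. The function name is hypothetical, and equal-sized pieces are an assumption: if query costs vary a lot between proteins, a split that balances estimated run time would serve better.

```python
def split_working_set(proteins, n_pieces):
    """Split a list into n_pieces contiguous chunks whose sizes differ by at most one."""
    if n_pieces < 1:
        raise ValueError("n_pieces must be at least 1")
    base, extra = divmod(len(proteins), n_pieces)
    chunks = []
    start = 0
    for i in range(n_pieces):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(proteins[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    for piece in split_working_set([f"protein{i}" for i in range(10)], 3):
        print(piece)
```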
![Page 13: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/13.jpg)
Weekly BMRB Runs
1. Obtain and install the BLAST executable and Protein Data Bank database
2. Decide on the best way to split the BMRB working set of proteins to minimize the parallel execution time
3. Make a custom DAG for this split
4. Produce a report on the BMRB run
![Page 14: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/14.jpg)
The Custom DAG
[Figure: a DAG with a row of B nodes (B B B . . .), a row of E nodes (E E . . .), and a single node C.]
B is BLAST
E is Extract results
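A custom DAG of this kind could be generated by a short script. The sketch below assumes a particular shape that the slide does not spell out: each BLAST node B feeds its own Extract node E, and a single final node C runs after every E has finished. The submit file names (`blast.submit`, `extract.submit`, `c.submit`) and the `piece` variable are hypothetical.

```python
def blast_dag(n_pieces):
    """Return DAG input text for n_pieces parallel BLAST runs.

    Assumed shape: B<i> -> E<i> for every piece, then all E nodes -> C.
    """
    lines = []
    for i in range(n_pieces):
        lines.append(f"Job B{i} blast.submit")      # hypothetical submit file
        lines.append(f'Vars B{i} piece="{i}"')
        lines.append(f"Job E{i} extract.submit")    # hypothetical submit file
        lines.append(f'Vars E{i} piece="{i}"')
        lines.append(f"Parent B{i} Child E{i}")
    lines.append("Job C c.submit")                  # hypothetical final node
    extract_nodes = " ".join(f"E{i}" for i in range(n_pieces))
    lines.append(f"Parent {extract_nodes} Child C")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(blast_dag(3), end="")
```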
![Page 15: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/15.jpg)
An Economics Application
• Computations are done at points on a coordinate plane
• Initial values are known along the axes
• Computation of one point at a time is too slow (serial execution)
• Each point is dependent on 2 neighboring points
  • (x,y) can be computed knowing (x-1,y) and (x,y-1)
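Because (x,y) needs only (x-1,y) and (x,y-1), all points with the same value of x+y are independent of one another and can be computed at the same time. The sketch below groups an n by n grid into such "waves"; the function name is an assumption made for illustration.

```python
def wavefronts(n):
    """Group the points (x, y), 1 <= x, y <= n, into waves of independent points.

    Every point in a wave has the same x + y, so its two inputs lie in the
    previous wave (or on an axis, where values are already known).
    """
    waves = []
    for total in range(2, 2 * n + 1):
        wave = [(x, total - x) for x in range(1, n + 1) if 1 <= total - x <= n]
        waves.append(wave)
    return waves


if __name__ == "__main__":
    for step, wave in enumerate(wavefronts(6), start=1):
        print(step, wave)
```

For a 6 by 6 grid this gives 11 waves instead of 36 serial steps, provided enough machines are available to run each wave in parallel.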
![Page 16: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/16.jpg)
The Coordinate Plane
[Figure, animated over slides 16-21: a grid of points with both axes labeled 1 through 6. Legend: known, result, inputs ready.]
![Page 22: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/22.jpg)
The DAG
[Figure: DAG nodes, listed level by level]
1-1
1-2 2-1
1-3 2-2 3-1
1-4 2-3 3-2 4-1
etc.
![Page 23: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/23.jpg)
Use DAGMan
Write a program to generate the DAG input file
The submit description file (and the executable) is the same for each node in the DAG
![Page 24: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/24.jpg)
DAG Input File
Job 1-1 gonkulate.submit
Job 1-2 gonkulate.submit
Parent 1-1 Child 1-2
Job 2-1 gonkulate.submit
Parent 1-1 Child 2-1
Job 1-3 gonkulate.submit
Parent 1-2 Child 1-3
Job 2-2 gonkulate.submit
Parent 1-2 2-1 Child 2-2
Vars 2-2 left="file1-2"
Vars 2-2 below="file2-1"
Vars 2-2 result="file2-2"
. . .
DAG input file, continued
Job 3-4 gonkulate.submit
Parent 2-4 3-3 Child 3-4
Vars 3-4 left="file2-4"
Vars 3-4 below="file3-3"
Vars 3-4 result="file3-4"
. . .
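A program that generates a DAG input file in this style might look like the sketch below. It follows the naming visible in the excerpt (node `x-y`, inputs `file(x-1)-y` and `filex-(y-1)`, output `filex-y`). How the known axis values are named is not shown in the excerpt, so the `file0-y` and `filex-0` names used for the first row and column are assumptions.

```python
def grid_dag(n, submit_file="gonkulate.submit"):
    """Return DAG input text for an n x n grid of dependent computations."""
    lines = []
    for x in range(1, n + 1):
        for y in range(1, n + 1):
            node = f"{x}-{y}"
            lines.append(f"Job {node} {submit_file}")
            # Parents are the neighbors at (x-1, y) and (x, y-1), when they
            # are grid points rather than known axis values.
            parents = []
            if x > 1:
                parents.append(f"{x - 1}-{y}")
            if y > 1:
                parents.append(f"{x}-{y - 1}")
            if parents:
                lines.append(f"Parent {' '.join(parents)} Child {node}")
            # file0-y and filex-0 are assumed names for the known axis values.
            lines.append(f'Vars {node} left="file{x - 1}-{y}"')
            lines.append(f'Vars {node} below="file{x}-{y - 1}"')
            lines.append(f'Vars {node} result="file{x}-{y}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(grid_dag(2), end="")
```

Each node is declared with `Job` before any `Parent` line mentions it, since the loop visits (x-1,y) and (x,y-1) before (x,y).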
![Page 25: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/25.jpg)
Submit Description File
In gonkulate.submit:
universe = vanilla
executable = gonkulate
output = $(result)
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = $(left), $(below)
log = gonkulate.log
notification = Never
queue
![Page 26: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/26.jpg)
Nug30
![Page 27: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/27.jpg)
Description of Nug30
• nug30 (a Quadratic Assignment Problem instance of size 30) had been the “holy grail” of computational QAP research since 1968
• In 2000, Anstreicher, Brixius, Goux, & Linderoth set out to solve this problem
• Using a mathematically sophisticated and well-engineered algorithm, they still estimated that it would require 11 CPU years to solve the problem.
![Page 28: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/28.jpg)
Nugent’s Problem
There is a set of N locations and a set of N facilities, and each facility must be assigned a location. To measure the cost of each possible assignment, the flow between each pair of facilities is multiplied by the distance between the pair's assigned locations, and then a sum is taken over all of the pairs.
For Nug30, N = 30
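The cost described above is easy to state in code. The sketch below computes it for one assignment and finds the best assignment by trying every permutation, which is only feasible for very small N. The function names and the representation (square lists of lists, with `assignment[a]` giving the location of facility `a`) are assumptions for illustration.

```python
from itertools import permutations


def qap_cost(flow, dist, assignment):
    """Sum of flow[a][b] * dist[loc(a)][loc(b)] over all ordered pairs (a, b)."""
    n = len(assignment)
    return sum(
        flow[a][b] * dist[assignment[a]][assignment[b]]
        for a in range(n)
        for b in range(n)
    )


def brute_force_qap(flow, dist):
    """Try all N! assignments; practical only for tiny N."""
    n = len(flow)
    best = min(permutations(range(n)), key=lambda p: qap_cost(flow, dist, p))
    return best, qap_cost(flow, dist, best)


if __name__ == "__main__":
    # Three facilities; three locations on a line at positions 0, 1, 2.
    flow = [[0, 3, 0], [3, 0, 1], [0, 1, 0]]
    dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
    print(brute_force_qap(flow, dist))
```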
![Page 29: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/29.jpg)
QAP Definition*
The formal definition of the quadratic assignment problem is:
Given two sets, P ("facilities") and L ("locations"), of equal size, together with a weight function $w : P \times P \to \mathbb{R}$ and a distance function $d : L \times L \to \mathbb{R}$, find the bijection $f : P \to L$ (assignment) such that the cost function
$$\sum_{a,b \in P} w(a,b) \cdot d(f(a), f(b))$$
is minimized.
Usually the weight and distance functions are viewed as square real-valued matrices.
* Wikipedia
![Page 30: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/30.jpg)
Scope of the Problem
This QAP problem is difficult due to the excessively large number of possible facility assignments.
The number of possible assignments is factorial in the number of facilities.
N! = N x (N-1) x (N-2) x . . . x 2
30! is approximately 2.65 x 10^32
![Page 31: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/31.jpg)
The Simplified Approach
• Method of choice is branch and bound
• The complete tree has 30! leaves
• Branching grows the tree
• Bounding results in pruning the tree
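The sketch below shows branching and bounding on a tiny quadratic assignment instance. Branching assigns the next facility to each free location; bounding prunes any partial assignment whose cost so far already matches or exceeds the best complete assignment found. This simple bound is valid only when all flows and distances are nonnegative, and it is far weaker than the bound used for Nug30; the function name and data layout are assumptions.

```python
def branch_and_bound_qap(flow, dist):
    """Minimize the sum of flow[a][b] * dist[f(a)][f(b)] over assignments f.

    Assumes nonnegative entries, so a partial cost is a valid lower bound.
    Returns (assignment, cost, number_of_tree_nodes_visited).
    """
    n = len(flow)
    best = {"cost": float("inf"), "assignment": None, "nodes": 0}

    def extend(assignment, used, cost):
        best["nodes"] += 1
        if cost >= best["cost"]:
            return  # bound: this subtree cannot improve on the incumbent
        k = len(assignment)  # index of the next facility to place
        if k == n:
            best["cost"], best["assignment"] = cost, tuple(assignment)
            return
        for loc in range(n):  # branch: try every free location
            if loc in used:
                continue
            added = flow[k][k] * dist[loc][loc]
            for a, loc_a in enumerate(assignment):
                added += flow[a][k] * dist[loc_a][loc]
                added += flow[k][a] * dist[loc][loc_a]
            assignment.append(loc)
            used.add(loc)
            extend(assignment, used, cost + added)
            assignment.pop()
            used.remove(loc)

    extend([], set(), 0)
    return best["assignment"], best["cost"], best["nodes"]


if __name__ == "__main__":
    flow = [[0, 3, 0], [3, 0, 1], [0, 1, 0]]
    dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
    print(branch_and_bound_qap(flow, dist))
```

The tighter the bound, the more of the tree is pruned, which is why the quality of the bounding method matters so much at N = 30.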
![Page 32: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/32.jpg)
The Nug30 Solution
• Used a new algorithm called the quadratic programming bound, developed by Anstreicher and Brixius
• Sequential execution would have taken 7 years, so parallelization of the algorithm was important
• Used MW
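MW follows the master-worker pattern: a master hands out independent tasks, workers compute them, and the master combines the results. The sketch below imitates that pattern on a single machine with Python threads. It is not the MW API, and the task (finding the smallest value in a chunk of numbers) is a hypothetical stand-in for a real subproblem.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def worker(task):
    """Hypothetical unit of work: return the smallest value in one chunk."""
    return min(task)


def master(tasks, n_workers=4):
    """Hand tasks to workers and keep the best result seen so far."""
    best = None
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(worker, task) for task in tasks]
        # Results arrive in completion order, not submission order.
        for future in as_completed(futures):
            result = future.result()
            if best is None or result < best:
                best = result
    return best


if __name__ == "__main__":
    print(master([[5, 3], [9, 1], [4]]))
```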
![Page 33: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/33.jpg)
Nug30 Computational Grid

| Number | Arch/OS | Location |
| --- | --- | --- |
| 414 | Intel/Linux | Argonne |
| 96 | SGI/Irix | Argonne |
| 1024 | SGI/Irix | NCSA |
| 16 | Intel/Linux | NCSA |
| 45 | SGI/Irix | NCSA |
| 246 | Intel/Linux | Wisconsin |
| 146 | Intel/Solaris | Wisconsin |
| 133 | Sun/Solaris | Wisconsin |
| 190 | Intel/Linux | Georgia Tech |
| 94 | Intel/Solaris | Georgia Tech |
| 54 | Intel/Linux | Italy (INFN) |
| 25 | Intel/Linux | New Mexico |
| 12 | Sun/Solaris | Northwestern |
| 5 | Intel/Linux | Columbia U. |
| 10 | Sun/Solaris | Columbia U. |

2510 CPUs total
Used tricks to make it look like one Condor pool
• Flocking
• Glidein
![Page 34: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/34.jpg)
Workers Over Time
![Page 35: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/35.jpg)
Nug30 solved
Wall Clock Time: 6 days 22:04:31 hours
Avg # Machines: 653
CPU Time: 11 years
Parallel Efficiency: 93%
![Page 36: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/36.jpg)
The Football Pool Problem
![Page 37: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/37.jpg)
Win By Gambling
Each week, 6 games are played
The outcome of each game is
1. win
2. lose
3. tie
![Page 38: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/38.jpg)
Bet, and win $$$
• Get 5 of the 6 games correctly predicted, and you win
• What is the minimum number of predictions you must make to guarantee winning?
![Page 39: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/39.jpg)
Known Values

| number of games | minimum predictions |
| --- | --- |
| 3 | 5 |
| 4 | 9 |
| 5 | 27 |
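For very small numbers of games, the minimum can be found by exhaustive search. The sketch below treats a prediction as a tuple of outcomes (0, 1, or 2 per game) and looks for the smallest set of predictions such that every possible result is within one wrong game of some prediction. The function name is an assumption, and the search grows far too quickly to reach 6 games.

```python
from itertools import combinations, product


def min_predictions(n_games):
    """Smallest number of predictions that guarantees at most one wrong game."""
    outcomes = list(product(range(3), repeat=n_games))
    index = {outcome: i for i, outcome in enumerate(outcomes)}
    full = (1 << len(outcomes)) - 1  # bitmask with every outcome covered

    # covers[i] is a bitmask of the outcomes that ticket i wins on:
    # the ticket itself plus every outcome differing in exactly one game.
    covers = []
    for ticket in outcomes:
        mask = 1 << index[ticket]
        for game in range(n_games):
            for result in range(3):
                if result != ticket[game]:
                    neighbor = ticket[:game] + (result,) + ticket[game + 1:]
                    mask |= 1 << index[neighbor]
        covers.append(mask)

    # Try every set of tickets, smallest sets first.
    for size in range(1, len(outcomes) + 1):
        for tickets in combinations(covers, size):
            covered = 0
            for mask in tickets:
                covered |= mask
            if covered == full:
                return size


if __name__ == "__main__":
    print(min_predictions(3))
```

For 3 games the search examines roughly a hundred thousand candidate sets; the count explodes from there, which is why larger cases call for integer programming and large-scale computing.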
![Page 40: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/40.jpg)
Problem Description
• A covering code
• An NP Hard problem
• Many years of research and effort for 6 games leads to 65 ≤ minimum number of predictions ≤ 73
• An integer programming problem
• Best solver is the commercial application CPLEX
![Page 41: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/41.jpg)
Why the Problem is Difficult
• Number of tickets possible: 6! x 3^6
• The tree that represents the problem (and solutions) has many isomorphic branches. This makes it difficult to prune the tree.
• New techniques have been developed, which reduce the interval that contains the solution
• The latest and greatest approach solves many smaller problems using MW
![Page 42: Case Studies of Using Condor for Scientists Barcelona, 2006](https://reader036.vdocuments.us/reader036/viewer/2022062315/568151b6550346895dbfe355/html5/thumbnails/42.jpg)
Solution! Not yet. . .
• The first effort (many CPU years worth of time) had a very small error in input
• Second effort is still in progress.
• All this to improve the lower bound from 65 to 70, thereby reducing the range for the solution