Solving Big Problems with Condor - II HPC Sysadmins Meeting

Post on 08-May-2015

DESCRIPTION

This is a long talk about the main features of Condor, and what tweaks we have added at I3A.

TRANSCRIPT

> Solving Big problems with OS: Condor > Antonio Sanz (ansanz@unizar.es) > 15 / Oct / 2012

2

> Antonio Sanz > I3A System Manager

> HERMES HPC cluster sysadmin > ansanz@unizar.es > @antoniosanzalc

3

4 I’m no SGE guy …

5

Condor – Main features

6

7

Healthy project

8 Condor Basics

Heterogeneous computing

9

Job Surveillance

10

Requirements

11

Fair use of resources

3. Queue management systems: Condor

12

Checkpoints

13

Nested jobs (DAG)
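
Nested jobs are described to DAGMan in a small text file that names each job and its dependencies. A minimal sketch of a diamond-shaped workflow follows; the job names A to D and the .sub file names are hypothetical placeholders and do not come from the talk.

```shell
#!/bin/sh
# Write a DAGMan input file: B and C wait for A, D waits for B and C.
# Job names and submit-file names are illustrative placeholders.
cat > diamond.dag <<'EOF'
JOB A prepare.sub
JOB B left.sub
JOB C right.sub
JOB D merge.sub
PARENT A CHILD B C
PARENT B C CHILD D
EOF

# Submit the whole graph only if Condor is installed and the first
# submit file exists (this guard is an assumption, not part of the talk).
if command -v condor_submit_dag >/dev/null 2>&1 && [ -f prepare.sub ]; then
    condor_submit_dag diamond.dag
else
    echo "wrote diamond.dag"
fi
```

DAGMan submits each node as an ordinary Condor job once all of its parents have finished, so the per-node .sub files are plain submit descriptions.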

14

Easy Licensing

15

… with Hadoop, MPI, OpenMP, GPU

16

Condor Flocking

17

Grid & Cloud Computing

18

VM Universe

19

Hooks & APIs

20

Flexibility

21

How Condor works

22

Management

[Hello, Dave]

23

Compute

Hey! I’m a 64K one!

24

Job list ClassAd

25

Resource list ClassAd

26

Matchmaking
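
Matchmaking pairs a job ClassAd with a resource ClassAd whose attributes satisfy the job's requirements. A minimal sketch of the job side follows, written as a submit description file; the thresholds and the file name match.sub are made-up values, not settings from the talk.

```shell
#!/bin/sh
# Job-side ClassAd expressions, written as a submit description file.
# The numbers below are illustrative, not taken from the talk.
cat > match.sub <<'EOF'
Universe     = vanilla
Executable   = hola.sh
# Only match 64-bit Linux machines that advertise at least 2048 MB
Requirements = (OpSys == "LINUX") && (Arch == "X86_64") && (Memory >= 2048)
# Among the machines that match, prefer the highest KFlops benchmark
Rank         = KFlops
Log          = match.log
Queue
EOF

# List the resource ClassAds that satisfy the same expression, if the
# Condor tools are installed (this guard is an assumption).
if command -v condor_status >/dev/null 2>&1; then
    condor_status -constraint '(OpSys == "LINUX") && (Arch == "X86_64") && (Memory >= 2048)'
else
    echo "wrote match.sub"
fi
```

Requirements filters the candidate machines, while Rank only orders the ones that already match.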

27

Priority Management

28

Data Transfer

29

Job running

30

Job Monitoring

31

Job End

32

Example

33

Hello, World !!

#!/bin/sh
# I’m hola.sh
echo Hola mundo desde `hostname`

#
# A Hello World .. In Condor!
#
# I’m hello.sub
Universe = vanilla
Executable = hola.sh
Log = hola.log
Output = hola.out
Error = hola.err
Queue

34 Launching the computation

condor_submit

4. Condor Basics – An easy computation

35 Launching the computation

condor_q
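
The two commands on these slides can be chained into one short session. A minimal sketch, assuming the hola.sh and hello.sub files from the Hello World slide; the guard around the Condor commands is an addition so the script also runs on a machine without Condor installed.

```shell
#!/bin/sh
# Recreate the two Hello World files from the slide (same contents).
cat > hola.sh <<'EOF'
#!/bin/sh
echo Hola mundo desde `hostname`
EOF
chmod +x hola.sh

cat > hello.sub <<'EOF'
Universe   = vanilla
Executable = hola.sh
Log        = hola.log
Output     = hola.out
Error      = hola.err
Queue
EOF

# Submit and inspect the queue only if the Condor tools are on PATH
# (assumption: this guard is not part of the talk).
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit hello.sub   # hands the job description to Condor
    condor_q                  # lists the jobs currently in the queue
else
    echo "Condor not found; wrote hola.sh and hello.sub only"
fi
```

Once the job has run, the greeting ends up in hola.out and the job's event history in hola.log, as named in the submit file.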

36

HERMES

I3A HPC cluster

37

1500 executing jobs, 40000 in queue … Lookin’ good

38

Condor Tweaks

39

Proprietary Resources

40

Dynamic Partitioning
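
With dynamic partitioning, a machine advertises one large slot and Condor carves off a piece sized to each job's request. A minimal sketch of both sides using stock Condor settings follows; the talk does not show the actual HERMES configuration, so the values and file names here are assumptions.

```shell
#!/bin/sh
# Execute-node side: one partitionable slot that owns the whole machine.
# Written to a local file here; in a real pool these lines would go into
# the node's Condor configuration (location depends on the installation).
cat > partitionable.conf <<'EOF'
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = True
EOF

# Job side: ask for a slice of that slot (illustrative sizes;
# request_memory is given in megabytes).
cat > big.sub <<'EOF'
Universe       = vanilla
Executable     = hola.sh
request_cpus   = 4
request_memory = 8192
Log            = big.log
Queue
EOF

echo "wrote partitionable.conf and big.sub"
```

Jobs that omit request_cpus and request_memory fall back to the pool's defaults, so sizing requests explicitly matters more on a partitionable slot than on fixed slots.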

41

Long Jobs

42

Short Jobs

43

Big Jobs

44

Advanced Accounting

45

Dynamic Checkpointing

46

Condor_ssh

Interactive Access

47

GPU Integration

48

Extra Bonus

Future (always work in progress)

49

HA

50

Cgroups Isolation

51

Hadoop Integration

52

Green Computing

53

Nobody’s perfect ….

54

No MPI + Dynamic Partitioning (yet)

Job backfilling

No slot-wise preemption

HA is tough as nails

55

56

> Conclusions

57

Example

58

Fly like a bird with Condor: Powerful. Flexible. Free.

Antonio Sanz, ansanz@unizar.es, @antoniosanzalc

Slides here: http://slideshare.net/ansanz

59

Extra Bonus

60

I3A & Condor

61

Alzheimer's & Dementia Diagnosis

62

Tissue Modelling

63

Rare Diseases

64

Crash test simulations

65

Complete heart simulation

66 Communication Systems

67

Dynamic gaming AI

68

Autonomous robots
