
1

UHD Grid Computing Team
Department of Computer and Mathematical Sciences
University of Houston-Downtown

2

Outline

Grid Laboratory
Architecture
Interface
Clusters
Application Projects

3

Historical Review

An Integrated Lab Package for Introductory Computer Science Course – CS I
The Design of Phil2000
Java Application
Hierarchical Design: Labs -> Tasks -> Activities
Explorer
Plug-ins to run applications

4

Grid Infrastructure for Laboratory

Requirement: an interface that is extensible to incorporate more lab modules and customizable to different course structures
Solution: a lab explorer

Requirement: a computational backbone that provides services for various lab activities
Solution: an array of servers that run on a computational grid

5

The pyramid model of the project

Interface of functional units

Client/server model

Lab modules

Module language

Architecture specification

Design theme

Application theme

Multi-agent system

6

Labs to implement:
Topology: Circulating messages in a ring
Collective communications: Matrix transpose
Group management: Matrix multiplication with Fox's algorithm
Scientific computation: Solving linear systems with Jacobi's algorithm
Combinatorial search: Traveling salesman problem
Parallel I/O: Vector processing - summation
Performance analysis: Visualization with Upshot – trapezoidal rule problem
Parallel library: Solving linear systems with ScaLAPACK
Scalability analysis: Bitonic sorting
LAN configuration: The use of NICs and hubs
Network analysis: Monitoring a chat room
Address resolution: Experiment with ARP bursts
IP masquerading: Clustered web servers
WAN configuration: The use of routers
Performance tuning: Dealing with congestion
Service configuration: The configuration of a networked file system
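To make the first lab topic concrete, here is a minimal sketch of circulating a token around a ring of MPI processes. It is written for this outline, not taken from the lab materials; the payload value and the increment applied at each hop are illustrative.

// Circulate a token around a ring: rank 0 starts it, each rank passes it on.
// Sketch only; the payload and per-hop update are arbitrary choices.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 0; }      // a ring needs at least 2 ranks

    int token;
    if (rank == 0) {
        token = 42;                                  // arbitrary starting payload
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("token back at rank 0 with value %d\n", token);
    } else {
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        token += 1;                                  // each hop modifies the message
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}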

7

The Lab Layout

[Screenshot: the main menu frame, with a tree panel, upper and lower toolbars, screen cards, and buttons for Open, Save, Print, First, Previous, Next, Last, Help, and End]

8

The Main Menu

[Screenshot: frame title and icon, screen menu, tree, upper toolbar, screen panel in a scroll pane, lower toolbar, and buttons]

9

Open and Close Menus

Open or Save Lab message
The file *.mla

10

Print

[Screenshots: print layout frame with print dialog; data table in a scroll pane with print table and print dialog]

11

Help Menu

Help Frame

12

Other Software used

Visual C++

MPICH

JumpShot

13

Node Calling and Services Menu

[Screenshots: services menu with scroll pane, radio buttons, text area, and text field; node calling menu with scroll menu]

14

An Example

Performance evaluation of parallel programs

15

Performance Analysis - Activities

a. Performance prediction
b. Compilation and execution
c. Profiling
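As an illustration of activity (c), the sketch below times a parallel trapezoidal-rule integration with MPI_Wtime, the kind of measurement that Upshot/Jumpshot traces complement. It is an assumed example written for this outline, not the lab's code; the integrand, interval, and trapezoid count are arbitrary.

// Time a parallel trapezoidal-rule integration with MPI_Wtime.
// Assumes the number of trapezoids divides evenly among the processes.
#include <mpi.h>
#include <cmath>
#include <cstdio>

static double f(double x) { return std::sin(x); }     // example integrand

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const double a = 0.0, b = 3.141592653589793;       // integration interval
    const long n = 1000000;                            // total trapezoids (assumed)
    const double h = (b - a) / n;

    double t0 = MPI_Wtime();                           // start of timed region

    long local_n = n / size;                           // this rank's slice of [a, b]
    double local_a = a + rank * local_n * h;
    double local_sum = (f(local_a) + f(local_a + local_n * h)) / 2.0;
    for (long i = 1; i < local_n; ++i)
        local_sum += f(local_a + i * h);
    local_sum *= h;

    double total = 0.0;
    MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    double elapsed = MPI_Wtime() - t0;                 // wall-clock time on this rank
    if (rank == 0)
        std::printf("integral = %f, time = %f s on %d processes\n", total, elapsed, size);
    MPI_Finalize();
    return 0;
}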

16

Khoi Nguyen
Construct small cluster/NOW
CMS lab cluster (16 nodes)
Linux Beowulf cluster functional
Successfully run all sorting algorithms with expected results
Create a GUI interface as an overlay using Java

http://www.netlib.org/utk/papers/mpi-book/node2.html

17

16 nodes
Red Hat 9.0 as the operating system
Configure rsh in root and user directories
Install MPICH ver. 1.2.6 (Unix, all flavors)
Test with MPICH test programs & sorting program

18

The Cluster

19

Test results – Quick sort

Quick sort runtime (seconds) by number of processors:

Elements      1p      2p      4p      8p      16p
10,000        0.042   0.0145  0.0129  0.0158  0.019
100,000       0.162   0.181   0.198   0.189   0.224
1,000,000     1.49    1.62    1.51    1.34    1.34
10,000,000    16.085  14.68   14.58   14.15   12.49

20

Test results – Merge sort

Merge sort runtime (seconds) by number of processors:

Elements      1p      2p      4p      8p      16p
10,000        0.0491  0.0434  0.0302  0.031   0.029
100,000       0.248   0.223   0.179   0.147   0.155
10,000,000    31.75   25.72   17.89   13.85   12.14

Merge sort performance increase for 1,000,000 elements (relative to 1 processor): 2p: 16.59%, 4p: 51.70%, 8p: 99.25%, 16p: 118.85%

21

GUI will be a basic window applet
Action buttons:
Introduction
Run your MPI program
Run demo programs: sorting programs, distribution sample programs
Help
Output window

22

Finished GUI - Introduction

23

Open your MPI file!

24

Compile & Build your MPI Program!

25

Running your MPI program!

26

Program Running in Console…

27

ABSTRACT

The construction and performance of computer clusters running different operating systems are studied. A Windows XP cluster and a Linux 'Beowulf' cluster were constructed to conduct a time-based analysis. Details on the construction, configuration, and relative performance of the two clusters are discussed.

INTRODUCTION

The typical von Neumann architecture has directed us to increase processing power via more transistors, larger address spaces, and more physical memory. However, a more efficient way is through message passing between multiple processors. The concept of message passing is to achieve parallelism through a function that explicitly transmits data from one process to another. The Message Passing Interface (MPI) is simply a "library" of functions that can be called from C/C++ and FORTRAN programs. MPI programs make use of multiple processors by assigning each processor a task. Each processor works in parallel with another processor, where one sends a packet of data and one receives.

MPI programs are designed to operate most efficiently on multiple processors. They are used widely on Scalable Parallel Computers (SPCs) and Networks of Workstations (NOWs). A 'cluster' is simply a collection of computers (two or more) working in parallel to accomplish a given task. Here, two different clusters were constructed to run different sorting algorithms and sample MPI programs. For convenience, a Java applet was also developed to launch these programs. Subsequently, cluster construction and performance results are discussed.
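As a minimal illustration of that send/receive idea (written for this summary, not taken from the poster), one process sends a small packet of data and another receives it:

// One rank sends a packet, another receives it; run with at least 2 processes.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double packet[4] = {0.0, 0.0, 0.0, 0.0};
    if (rank == 0) {
        for (int i = 0; i < 4; ++i) packet[i] = i + 1.0;    // fill the packet
        MPI_Send(packet, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);        // sender
    } else if (rank == 1) {
        MPI_Recv(packet, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                                   // receiver
        std::printf("rank 1 received %.1f %.1f %.1f %.1f\n",
                    packet[0], packet[1], packet[2], packet[3]);
    }
    MPI_Finalize();
    return 0;
}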

Khoi Nguyen, Computer & Mathematical Sciences, University of Houston – Downtown
Advisor: Dr. Hong Lin

Fall 2004

CLUSTER CONSTRUCTION

XP Cluster

2 nodes: AMD Athlon 1.33GHz and Pentium III 850MHz w/ 512MB system memory were linked via a Router/Switch (see Figure 2).


Linux Beowulf Cluster

16 nodes: (15) Pentium II 350MHz w/ 128MB system memory and (1) server node: Pentium 550MHz w/ 256MB system memory. All nodes were linked via a 10/100Mbps Ethernet LAN switch. A KVM switch was installed for only 4 nodes (see Figure 3).

SORTING ALGORITHMS

Parallel implementations of merge sort (O(log₂ n)) and bitonic sort (O((log₂ n)²/2)) were used to conduct the time-based analysis. Serial implementations were also incorporated to serve as control variables.
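For illustration, here is a sketch of the pairwise-merge pattern a parallel merge sort typically uses under MPI. This is an assumed implementation written for this summary, not the project's code; it assumes every process starts with an equal-sized block of random integers.

// Parallel merge sort sketch: local sort, then log2(P) rounds of pairwise merges.
// After the loop, rank 0 holds the fully sorted data.
#include <mpi.h>
#include <algorithm>
#include <cstdlib>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int local_n = 1 << 16;                  // elements per process (assumed)
    std::vector<int> data(local_n);
    std::srand(rank + 1);
    for (int& x : data) x = std::rand();

    std::sort(data.begin(), data.end());          // phase 1: local sort

    // phase 2: in each round, half of the remaining ranks send their sorted
    // block to a partner, which merges it into its own block
    for (int step = 1; step < nprocs; step *= 2) {
        if (rank % (2 * step) != 0) {             // sender: ship block, then stop
            MPI_Send(data.data(), (int)data.size(), MPI_INT,
                     rank - step, 0, MPI_COMM_WORLD);
            break;
        }
        if (rank + step < nprocs) {               // receiver: merge partner's block
            std::vector<int> incoming(data.size());
            MPI_Status st;
            MPI_Recv(incoming.data(), (int)incoming.size(), MPI_INT,
                     rank + step, 0, MPI_COMM_WORLD, &st);
            int received = 0;
            MPI_Get_count(&st, MPI_INT, &received);
            std::vector<int> merged(data.size() + received);
            std::merge(data.begin(), data.end(),
                       incoming.begin(), incoming.begin() + received,
                       merged.begin());
            data.swap(merged);
        }
    }
    MPI_Finalize();
    return 0;
}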

XP CLUSTER CONFIGURATION

The configuration for this cluster required an older protocol, NetBEUI, although the more widely used protocol today is TCP/IP. The MPICH installation was mirrored on each node, user information and passwords had to be identical, and the executable program file had to be in the same location on each node. Either node could function as the server at the user's discretion; whichever node launched the program became the server. MPICH ran processes in a 'round-robin' fashion.

BEOWULF CLUSTER CONFIGURATION

This architecture requires the installation of a Linux distribution on each node. One node functions as a server, where the user interacts directly. The rest of the nodes serve as computational slaves (see Figure 3). Fedora (the latest Red Hat at the time) was installed on each node. Remote Shell ('rsh') was used for communication between the server and the nodes. Each node was configured the same way, differing only in IP address and hostname. The latest MPICH distribution for UNIX was installed on each node in the same directory. Sample programs included in the distribution were tested on 8 nodes.

[Figures 1-3: cluster diagrams showing the workstations, LAN switch, KVM switch, router/switch, and server node]

[Chart: XP Cluster – Mergesort – 2 Processors: serial vs. parallel time (s) for 10,000 to 10,000,000 elements]

CPI Test Program - Beowulf: runtime (s) by number of processors:
1p: 0.942, 2p: 0.65, 3p: 0.468, 4p: 0.367, 5p: 0.315, 6p: 0.356, 7p: 0.354, 8p: 0.257, 9p: 0.229, 10p: 0.307

TO BE CONTINUED…

Inconclusive and inconsistent data drawn from the XP cluster has led me to choose the Beowulf design. MPI programs compete for resources under XP, so priority scheduling is required to run MPI programs efficiently under XP. This is a rather cumbersome and inconvenient process; moreover, the Beowulf cluster offers more flexibility in configuration. The XP cluster has been abandoned, and continued research is currently being conducted on the Beowulf cluster.


REFERENCES

Pacheco, Peter. Parallel Programming with MPI. Morgan Kaufmann, 1997.

Pacheco, Peter. Parallel Programming with MPI. 2002. <http://www.cs.usfca.edu/mpi/>

28

External Support

NSF Major Research Instrumentation Grant
2 more clusters, each with 16 nodes
Master server node to distribute applications
External storage

29

2nd & 3rd Cluster Configuration

Danil Safin and Hooman Hemmati
Sharing Internet connection using Firestarter
Network configuration and booting on LAN
Enable remote/secure shells and allow passwordless login
Sharing files with Network File System
Install and configure MPICH2

30

Real-Time Intelligent Agent Traffic Controllers

Outline
Problem
Goals
Traffic model
Simulation test bed
Preliminary results

31

Problem
Coordinating a network of traffic lights on inner-city streets, such as downtown, can be very challenging
Traffic configurations change constantly
One might have to wait for the light to turn green at an intersection where there is no car on the cross street
How long should the light stay green on a busy street?
Timed traffic lights
Sensors

32

Goals

Develop a traffic model that can manage a network of traffic lights efficiently by maximizing traffic flow and minimizing travel time
Construct a visual simulation test bed to verify our model and perform comparisons with other models

33

Real-Time Intelligent Agent Traffic Controller Model

Consists of two types of agents:

Communication Agents: send and receive traffic information to and from their neighbors

Computation Agents: make decisions to keep or change the current traffic light based on the information from the Communication Agents

34

Communication Agents

Send and receive traffic information:
Number of cars passing through the intersection
Number of cars waiting at the intersection
Time the light has stayed green
Waiting time for a red light to change to green
Average speed of current traffic

35

Computation Agents

Establish rules from traffic information
Convert to fuzzy rules
Construct fuzzy controller
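As a purely illustrative example of those steps, a toy fuzzy rule a computation agent might evaluate is sketched below; the membership functions, thresholds, and the single rule are assumptions made for this outline, not the project's actual controller.

// Toy fuzzy rule: IF the cross-street queue is long AND the green has been
// stale THEN switch the light. Membership breakpoints are illustrative.
#include <algorithm>
#include <cstdio>

// Membership rising linearly from 0 at 'lo' to 1 at 'hi'.
double rising(double x, double lo, double hi) {
    if (x <= lo) return 0.0;
    if (x >= hi) return 1.0;
    return (x - lo) / (hi - lo);
}

// Degree to which the light should switch; min() acts as the fuzzy AND.
double switchDegree(int carsWaitingOnRed, double secondsGreen) {
    double queueIsLong  = rising(carsWaitingOnRed, 2.0, 10.0);
    double greenIsStale = rising(secondsGreen, 15.0, 60.0);
    return std::min(queueIsLong, greenIsStale);
}

int main() {
    // Defuzzify with a simple threshold: switch when the degree exceeds 0.5.
    bool change = switchDegree(6, 40.0) > 0.5;
    std::printf("change light: %s\n", change ? "yes" : "no");
    return 0;
}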

36

Simulation Test Bed

Implement clusters where each node serves as a traffic light at an intersection
Rules for moving cars

37

Simulations

City of 10 horizontal and 10 vertical streets, each with 3 lanes, containing a total of 100 traffic lights. Tests were run with a maximum of 1500 cars (heavy traffic), 1000 (moderate traffic), and 500 (light traffic). The cars entered the city map randomly, with no preference for any street or direction. Two traffic models were compared:

Timed traffic lights with a checkerboard pattern (no adjacent intersections have the same light)
Simple traffic controllers, where each controller based its decision on the number of cars on the two sides of the intersection and the maximum time that any car can be made to wait at a red light

The data gathered was the mean speed of a single car and the average of all cars' mean speed.
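A minimal sketch of the simple controller's decision rule described above (an assumed formulation for this outline, not the simulation's code):

// Switch the light if a car on the red side has exceeded the allowed wait,
// or if the red side has the longer queue.
bool shouldSwitch(int carsOnGreenSide, int carsOnRedSide,
                  double longestRedWaitSeconds, double maxWaitSeconds) {
    if (longestRedWaitSeconds > maxWaitSeconds) return true;   // fairness bound
    return carsOnRedSide > carsOnGreenSide;                    // serve the longer queue
}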

38

Simulations

Intelligent Agent Traffic Controllers
Standard Traffic Controllers

39

Preliminary Results

Two models:
Standard traffic lights
Simple intelligent agent controlled traffic lights

Measured average speed of all cars
IA traffic lights are 20% to 90% more efficient than standard lights

40

Intelligent Traffic Control by Agents

41

Data Mining

42

Data Mining Results

Gabriel Williams: Text mining on clusters
Danil Safin: HTML document preprocessing

[Charts: data mining runtime and speedup on 1, 8, and 16 nodes]

43

E-Learning Agent System Design

System arranged in a hierarchical tree structure
Two entry points into the system:
Command line
Network socket, to service web requests
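A minimal sketch of the second entry point (an assumption written for this outline, not the project's code): a TCP listener that accepts a web-originated request and hands the received text to the agent system. The port number and the dispatchToAgents() handoff are hypothetical, and error handling is omitted.

// Hypothetical socket entry point; port 5500 and dispatchToAgents() are
// placeholders, not the real system's names.
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <string>

void dispatchToAgents(const std::string& request) {    // placeholder handoff
    std::printf("forwarding to agent system: %s\n", request.c_str());
}

int main() {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5500);                        // assumed port
    bind(listener, (sockaddr*)&addr, sizeof(addr));
    listen(listener, 8);

    while (true) {                                      // serve one request at a time
        int client = accept(listener, nullptr, nullptr);
        char buf[1024] = {0};
        ssize_t n = read(client, buf, sizeof(buf) - 1);
        if (n > 0) dispatchToAgents(std::string(buf, (size_t)n));
        close(client);
    }
}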

44

E-Learning Agent System Design (continued)

[Diagram: agent hierarchy with a Master Control Agent, Control Agents, Maintenance Agent, Student Information Agent, Notification and Recommendation Agent, Instructor, Student Registrar, Link DB, and Student DB]

45

The Agents Developed For This Project

Master Control Agent
Main entry point into the system

Control Agent
Maintenance Agent

Instructor
Notification and Recommendation Agent

Student
Student Information Agent

Registrar

46

Implementing MPI into the Agent System

MPI provides the communication mechanism for the agents
Packs data into an MPI data structure
Uses MPI_Send and MPI_Recv
Common for all agents

MPI limitations:
Process creation limitation
Eliminates the 1:many relationship between control and information agents
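The sketch below shows the kind of packing and exchange this describes: a plain-old-data message shipped between two agents with MPI_Send/MPI_Recv as raw bytes. The Message struct is an illustrative stand-in for the JOB structure (pointer fields such as jobSocket are not sent), the command code and text are made up, and a homogeneous cluster is assumed so the byte layout matches on both ends.

// Illustrative agent-to-agent exchange of a POD message over MPI_BYTE.
#include <mpi.h>
#include <cstring>
#include <cstdio>

const int MAX_STRING_SIZE = 256;

struct Message {                  // stand-in for JOB; plain-old-data only
    int cmdSource;
    int cmdDestination;
    int cmdCode;
    int messageLength;
    char messageInfo[MAX_STRING_SIZE];
};

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {              // e.g. the Master Control Agent issues a command
        Message m{};
        m.cmdSource = 0;
        m.cmdDestination = 1;
        m.cmdCode = 42;           // hypothetical command code
        std::strcpy(m.messageInfo, "enroll student 123");
        m.messageLength = (int)std::strlen(m.messageInfo);
        MPI_Send(&m, (int)sizeof(Message), MPI_BYTE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {       // e.g. the Student Information Agent handles it
        Message m{};
        MPI_Recv(&m, (int)sizeof(Message), MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        std::printf("agent %d got command %d: %s\n", rank, m.cmdCode, m.messageInfo);
    }
    MPI_Finalize();
    return 0;
}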

47

E-Learning System Agent Code Architecture

Data structures: JOB and INPUTJOB

struct JOB {
    int cmdSource;                        // agent issuing the command
    int cmdDestination;                   // agent that should handle it
    int cmdCode;
    int cmdResult;
    JOB_SOURCE ioSource;
    Socket* jobSocket;                    // socket for web-originated requests
    int messageLength;
    char messageInfo[MAX_STRING_SIZE];
};

struct INPUTJOB {
    int jobType;
    int cmdCode;
    Socket* jobSocket;
    char commandText[MAX_STRING_SIZE];
};

48

E-Learning System Agent Code Architecture (continued)

Class baseAgent
Contains MPI communication and process creation mechanisms
All agent classes inherit from this class

Agent classes
Demos for each agent class represented

WEBAGENT.EXE
CGI program used to communicate with the agent system from a web server

49

Client/Server Web Interface

Multi-Agent Course-Scheduling System

The ultimate goal of this project is to formulate a formal system for creating multi-agent systems (MAS) so that one no longer has to rely on a high-level specification language. This will be accomplished by creating a gamma-calculus parser and running the parser on a prototype to formulate a method for a formal system of creating multi-agent systems. As it stands, a prototype E-Learning MAS has been created, along with a preliminary Beliefs-Desires-Intentions (BDI) model using argumentation-based negotiation.

Methods

Methods of implementation are as follows:
1. First, it was imperative to create a model MAS to run the calculus parser on. The chosen model was an E-Learning Environment MAS. This model was built using four main agents to distribute tasks: Master-Control Agent, Student Agent, Instructor Agent, and Registrar Agent. These agents handle the registration and enrollment of students and the managing of course content.
2. Second, create advanced logic to run with these agents. The logic chosen was argumentation-based negotiation. Using this with a BDI model, the agents argue among themselves to achieve their particular goals. What is important about this model is that each agent argues for its beliefs and, if other agents are coerced, they create a compromise among themselves.
3. Third, create a gamma-calculus parser to run on the MAS created. This allows data to be collected and interpreted to formulate a method for the development of a formal system for creating multi-agent systems.

Results

Successful completion of the multi-agent system prototype has been accomplished. Using MPI, the MAS divides tasks and sends each task to the appropriate agent. This system runs concurrently with a server/client socket structure. The Master-Control agent handles information received from the server socket, which waits for a client on the same machine to communicate. This client gathers its information through the use of Apache and Java Server Pages. Thus the client of this system is a simple web page in which a user enters data to be used by the MAS.

The argumentation-based negotiation logic with the BDI model has been created in preliminary form. The rough prototype successfully integrates desires in which the registrar argues beliefs about classes that cannot be taught and classes that must be taught. When the system begins, the instructor argues for the class it wants and the registrar responds, arguing what changes it may need to make. This is similar to the exchange between the student and the registrar, in which the student has a list of desired classes and must argue to get the registrar to accept its proposal. Current work is to make the model more complex and add a visual aid to the program. All of this was done using Jason with AgentSpeak.

55

Virus particle reconstruction from cryo-electron microscopy

• Step 1: extract individual particle images from cryo-electron micrographs or CCD images.
• Step 2: determine orientations.
• Step 3: 3-D reconstruction. Execute Steps 2 and 3 repeatedly until convergence.
• Step 4: dock atomic model into 3D density map.


56

Challenges

Increase resolution from 20-30 Å to 5 Å.
Increase the number of projections, N, from a few hundred to several thousand.
Increase the size of pixel frames from about 150² to 300².

57

Computational challenges

Assuming: 2,000 projections, each of size 300 × 300 pixels.
Then: we will solve at most 3 × 300² = 270,000 linear least squares problems, each having 2,000 equations and 300 unknowns.
The number of arithmetic operations: 270,000 × 2,000 × 300² = O(5 × 10¹³)
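The quoted total follows from the usual cost of a dense least-squares solve, on the order of m·n² operations for m equations and n unknowns (an assumption about the solver, shown only to make the arithmetic explicit):

\begin{align*}
\text{problems} &= 3 \times 300^2 = 270{,}000,\\
\text{cost per problem} &\approx m n^2 = 2000 \times 300^2 \approx 1.8 \times 10^{8}\ \text{ops},\\
\text{total} &\approx 270{,}000 \times 2000 \times 300^{2} \approx 5 \times 10^{13}\ \text{ops}.
\end{align*}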

58

Test on UHD cluster

[Charts: P3DR runtime and speedup on 1, 2, 4, 8, and 16 nodes]

59

60

61

Contact Information: Hong Lin (LinH@uhd.edu)
