Distributed Computing


Page 1: Distributed Computing

Distributed Computing

• Utilize unused PC resources
• Processing: complex calculations, load distribution
• Storage: much of the capacity on PCs and SANs sits idle. If only about 25% of each disk is used, 100 computers with 80 GB drives leave roughly 6 TB unused (a quick back-of-the-envelope calculation follows).
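The numbers above work out as a simple multiplication; the short Python sketch below uses the slide's figures (100 PCs, 80 GB drives) together with an assumed ~25% disk utilization, which is one reading of the slide's "25%" claim.

```python
# Back-of-the-envelope estimate of idle disk capacity across a PC fleet.
# Figures from the slide: 100 PCs with 80 GB drives; the ~25% utilization
# is an assumed reading of the slide's "25%" claim.
num_pcs = 100
drive_gb = 80
used_fraction = 0.25

total_tb = num_pcs * drive_gb / 1000           # 8 TB of raw capacity
unused_tb = total_tb * (1 - used_fraction)     # ~6 TB sitting idle

print(f"total: {total_tb:.1f} TB, unused: {unused_tb:.1f} TB")
```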

Page 2: Process Sharing Applications

Process Sharing Applications

• For large-scale computations: data analysis, data mining, scientific computing
• Research problems: SETI@Home, Folding@Home, distributed.net, Genome@Home, FightAIDS@Home, climate simulation, economics, medicine

Page 3: Distributed Computing

Distributed Computing

• P2P is not the same thing as distributed computing, but the two share challenges and issues: sharing and taking advantage of resources available at endpoints, and harnessing their power for computationally intensive problems
• Examples: SETI@home, FightAIDS@home, Genome@home
• Grid computing and e-science: computational grids to solve or simulate real-life problems
• Commercial applications: United Devices, Entropia, Avaki, etc.

Page 4: Distributed Computing

Distributed Computing

[Diagram: volunteer PCs connected to a coordinator over the Internet. Each volunteer PC downloads and executes the client application, exchanging parameters and results with the coordinator.]

• Dedicated applications: SETI@Home, distributed.net, Décrypthon (France)
• Production applications: Folding@home, Genome@home, Xpulsar@home, Folderol, Exodus, Peer review
• Research platforms: Javelin, Bayanihan, JET, Charlotte (based on Java)
• Commercial platforms: Entropia, Parabon, United Devices, Platform (AC)

A central coordinator schedules tasks on volunteer computers: the master-worker paradigm, also known as cycle stealing (a minimal sketch follows).
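To make the master-worker idea concrete, here is a minimal in-process sketch in Python. It is not any particular platform's API: the Coordinator class, the volunteer function, and the toy work units are invented for illustration; a real deployment would replace the shared queues with network calls over the Internet.

```python
import queue
import threading

class Coordinator:
    """Central coordinator: hands out work units, collects results."""
    def __init__(self, work_units):
        self.tasks = queue.Queue()
        self.results = queue.Queue()
        for unit in work_units:
            self.tasks.put(unit)

    def get_task(self):
        """A volunteer asks for work; None means nothing is left."""
        try:
            return self.tasks.get_nowait()
        except queue.Empty:
            return None

    def report(self, result):
        """A volunteer returns a finished work unit."""
        self.results.put(result)

def volunteer(coord, crunch):
    """A volunteer PC: fetch a task, process it, send the result back, repeat."""
    while (task := coord.get_task()) is not None:
        coord.report(crunch(task))

if __name__ == "__main__":
    coord = Coordinator(range(20))                      # 20 toy work units
    workers = [threading.Thread(target=volunteer,
                                args=(coord, lambda x: x * x))
               for _ in range(4)]                       # 4 "volunteer PCs"
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(sorted(coord.results.get() for _ in range(20)))
```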

Page 5: (figure slide; no text transcribed)
Page 6: Cycle Sharing Model

Cycle Sharing Model

• Chunks of data are sent to clients while they sit idle (in suspend mode)
• Data is processed by each client when the machine is not in use, then returned to the master
• Internet-based (master-slave) computing
• Example: SETI@Home, which scans radio-telescope data

[Diagram: a master sends raw data to clients Bob, Carol, Ted, and Alice; each performs its data crunching and returns processed data to the master.]

A worker-side sketch of this idle-time loop appears below.
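The sketch assumes the client can somehow tell whether its owner is using the machine; the idle check and the crunch function are placeholders, not part of any real screen-saver API.

```python
import time

def machine_is_idle():
    # Placeholder: a real client would check for recent keyboard/mouse
    # activity or low CPU load before stealing cycles.
    return True

def crunch(chunk):
    return sum(chunk)        # toy stand-in for the real data crunching

def cycle_stealing_worker(chunks, send_to_master):
    for chunk in chunks:
        # Only work while the owner is not using the machine.
        while not machine_is_idle():
            time.sleep(60)
        send_to_master(crunch(chunk))

# Raw data goes in as chunks, processed data comes back to the master.
processed = []
cycle_stealing_worker([[1, 2, 3], [4, 5, 6]], processed.append)
print(processed)             # [6, 15]
```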

Page 7: SETI@Home

SETI@Home Main Server

• Launched in 1999: a scientific experiment that uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI)
• Distributes a screen saver-based application to users
• Applies signal-analysis algorithms to different data sets to process radio-telescope data
• Has more than 3 million users, who have contributed over a million years of CPU time to date

Client/server workflow:

1. Install the screen saver
2. The SETI client (screen saver) starts
3. The client gets radio-telescope data from the server and runs the analysis
4. The client sends results back to the server

A sketch of this fetch-process-report loop follows.
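The four steps translate into a simple fetch-process-report loop. The sketch below is purely illustrative: the URLs, the JSON payload shape, and the analyze function are assumptions, not the real SETI@home protocol.

```python
import json
import urllib.request

WORK_URL = "http://example.org/getwork"     # hypothetical "get data" endpoint
RESULT_URL = "http://example.org/results"   # hypothetical "send results" endpoint

def analyze(samples):
    """Toy stand-in for the signal-analysis step."""
    return {"peak": max(samples)}

def run_once():
    # Step 3: the client gets data from the server and runs the analysis.
    with urllib.request.urlopen(WORK_URL) as resp:
        work_unit = json.load(resp)
    result = analyze(work_unit["samples"])

    # Step 4: the client sends results back to the server.
    req = urllib.request.Request(
        RESULT_URL,
        data=json.dumps(result).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```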

Page 8: Distributed Computing: SETI@home

Distributed Computing: SETI@home

• The Search for Extraterrestrial Intelligence has over two million computers crunching away, downloading data gathered from the Arecibo radio telescope in Puerto Rico.
• The SETI@Home project is widely regarded as the fastest computer in the world; it has already performed the single largest cumulative computation to date.
• From the architecture point of view, SETI@Home is based upon client-server: the centralised servers hold enormous amounts of data gathered from the Arecibo radio telescope "listening" to the skies.
• That data needs to be analysed for distinct or unusual radio waves that might suggest extraterrestrial communications.

http://setiathome.ssl.berkeley.edu

Page 9: SETI@Home

SETI@Home

Search for Extraterrestrial Intelligence

Page 10: (figure slide; no text transcribed)
Page 11: Processing

Processing

• Intel's NetBatch: 10,000 workstations across 25 locations used for chip design
• Shortened the time for chip development
• Reduced the outlay for new mainframes: roughly $500 million in savings

Page 12: Processing

Processing

• Amerada Hess connected 200 Dell PCs to handle complex seismic-data interpretation, allowing them to replace a pair of IBM supercomputers.

“We’re running seven times the throughput at a fraction of the cost.” (Richard Ross, CIO)

Page 13: Storage

Storage

• Intel: distribution of computer-based training content to PCs
• Prevents large downloads from central servers
• Preserves bandwidth and expensive network storage

Page 14: P2P Distributed Computing

P2P Distributed Computing

• Allows any node to play different roles (client, server, system infrastructure)
• A request may concern computations or data; likewise, what a node accepts and provides may be computation or data

[Diagram: a client PC sends a request into the P2P system and receives a result; server PCs accept requests and provide results; the cloud of PCs offers potential communications for parallel applications.]

• A very simple problem statement, but it leads to a lot of research issues: scheduling, security, message passing, data storage
• Large scale enlarges the problem space: volatility, trust (confidence), etc.

A minimal sketch of a peer that can play both the client and the server role follows.
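The sketch below illustrates "any node can play different roles": the same peer object can issue a request (client role) or accept one and provide a result (server role). The Peer class and its toy computation/data services are invented for illustration; a real system would communicate over sockets or an overlay network rather than direct object references.

```python
class Peer:
    """A node that can act as both client and server."""
    def __init__(self, name):
        self.name = name

    # Server role: accept a request (computation or data) and provide a result.
    def accept(self, request):
        kind, payload = request
        if kind == "computation":
            return sum(payload)                       # toy computation service
        if kind == "data":
            return f"{self.name}: block {payload}"    # toy data service
        raise ValueError(f"unknown request kind: {kind}")

    # Client role: send a request to another peer and receive the result.
    def request(self, other, kind, payload):
        return other.accept((kind, payload))

alice, bob = Peer("alice"), Peer("bob")
print(alice.request(bob, "computation", [1, 2, 3]))   # bob serves alice: 6
print(bob.request(alice, "data", 42))                 # roles reversed
```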

Page 15: Three Obstacles to Making P2P Distributed Computing Routine

“Three Obstacles to Making P2P Distributed Computing Routine”

1) New approaches to problem solving: data grids, distributed computing, peer-to-peer, collaboration grids, …

2) Structuring and writing programs (the programming problem): abstractions, tools

3) Enabling resource sharing across distinct institutions (the systems problem): resource discovery, access, reservation, allocation; authentication, authorization, policy; communication; fault detection and notification; …

Credit: Ian Foster

Page 16: P2P for Distributed Computing or Web Computing

P2P for Distributed Computing or Web Computing

The distributed-computing P2P applications are highlighted by the use of millions of Internet clients to analyze data looking for extraterrestrial life (SETI@home, http://setiathome.ssl.berkeley.edu/) and the newer project examining the folding of proteins (Folding@home, http://www.stanford.edu/group/pandegroup/Cosm/).

These are distributed computing solutions for a special class of applications: those that can be divided into a huge number of essentially independent computations, with a central server system doling out separate work chunks to each participating client. In the parallel computing community, such problems are called "pleasingly" or "embarrassingly" parallel; a small local illustration follows.
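As a small local illustration of an embarrassingly parallel job, the sketch below uses a pool of local processes standing in for Internet clients; the crunch function is a made-up placeholder.

```python
from multiprocessing import Pool

def crunch(chunk_id):
    # Each work chunk is completely independent: no communication between
    # workers beyond handing out chunks and collecting results.
    return chunk_id, sum(i * i for i in range(chunk_id * 1000))

if __name__ == "__main__":
    with Pool(processes=4) as pool:                 # 4 local "clients"
        results = pool.map(crunch, range(32))       # 32 independent chunks
    print(len(results), "chunks processed")
```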

This approach is included in the P2P category because the computing is peer based, even though it lacks the "peer-only communication" characteristic that Gnutella and Napster have for information transfer.

SETI@home and Folding@home are elegantly implemented as screen savers that you download.

Page 17: P2P space: Distributed Computing

P2P space: Distributed Computing

• Distributed collaboration: use under-utilized Internet and/or network resources to improve computation and data analysis. Examples: MetaComputing, CareScience, DataSynapse, Distributed.net, DistributedScience, Entropia, Parabon, The Open Lab
• Distributed search engines: used to easily look up and share files and offer content management. Examples: BearShare, Filetopia, Hotline Connect, InfraSearch, Plebio, Jibe, LimeWire, MusicBrainz.org, NeuroGrid, NextPage, Redfoot, Opencola, Project Pandango

Page 18: Entropia Financial Modeling I

Entropia Financial Modeling I

Page 19: Entropia Financial Modeling II

Entropia Financial Modeling II

• Each basic financial instrument can be calculated independently
• The central server interprets the total simulation
• Make money, or learn what causes market swings, or …

A hedged sketch of this per-instrument decomposition follows.
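To illustrate the decomposition (not Entropia's actual model), each instrument below is valued independently by a toy Monte Carlo routine and a central step aggregates the results; the instruments and parameters are invented.

```python
import random

def value_instrument(spot, drift, vol, n_paths=10_000):
    """Toy Monte Carlo estimate of an instrument's expected next-step value."""
    total = 0.0
    for _ in range(n_paths):
        total += spot * (1 + drift + vol * random.gauss(0, 1))
    return total / n_paths

# Independent work units: one per basic financial instrument.
portfolio = [("AAA", 100, 0.02, 0.10),
             ("BBB",  50, 0.01, 0.30),
             ("CCC",  75, 0.03, 0.20)]
values = {name: value_instrument(s, d, v) for name, s, d, v in portfolio}

# Central-server step: interpret the total simulation.
print("portfolio value estimate:", round(sum(values.values()), 2))
```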

Page 20: Drug Structure Simulations

Drug Structure Simulations

Page 21: United Devices Drug Simulation

United Devices also does drug simulation

• Parameter study: run billions of simulations, each with different parameters
• Search-engine-like interface to the simulation
• Works because each calculation fits on a single PC; a detailed molecular model would usually not

A small parameter-sweep sketch follows.
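In a parameter sweep, every combination of parameters becomes one independent work unit small enough for a single PC. The parameter names, ranges, and the simulate function below are all made up for illustration.

```python
from itertools import product

temperatures = [290, 300, 310]          # K
ph_values = [6.5, 7.0, 7.4]
ligands = ["L001", "L002", "L003"]

def simulate(temp, ph, ligand):
    """Toy stand-in for one docking/binding simulation."""
    return hash((temp, ph, ligand)) % 1000 / 1000.0    # fake "binding score"

# Each parameter combination is a self-contained work unit.
work_units = list(product(temperatures, ph_values, ligands))
scores = {params: simulate(*params) for params in work_units}

best = max(scores, key=scores.get)
print("best parameters:", best, "score:", scores[best])
```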

Page 22: Performance of Entropia Network

Performance of Entropia Network

Page 23: Peer to Peer “Illusion”

Peer to Peer (P2P): the “illusion” of peer-to-peer among collaborating clients, used for Napster-like services or collaboration

[Diagram: a ring of interconnected servers behind the scenes provides the peer-to-peer “illusion” to the collaborating clients.]