![Page 1: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/1.jpg)
Grid computing: An introduction
Lionel Brunie
National Institute of Applied Science (INSA)
LIRIS Laboratory/DRIM Team – UMR CNRS 5205
Lyon, France
http://liris.cnrs.fr/lionel.brunie
![Page 2: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/2.jpg)
A Brain is a Lot of Data!
(Mark Ellisman, UCSD)
We need to get to one micron to know the location of every cell. We’re just now starting to get to 10 microns.
And comparisons must be made among many
![Page 3: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/3.jpg)
Data Intensive Physical Sciences
• High energy & nuclear physics
• Simulation
– Earth observation, climate modeling
– Geophysics, earthquake modeling
– Fluids, aerodynamic design
– Pollutant dispersal scenarios
• Astronomy
– Digital sky surveys: modern telescopes are expected to produce over 10 Petabytes per year by 2008!
• Molecular genomics
• Chemistry and biochemistry
• Financial applications
• Medical images
![Page 4: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/4.jpg)
Performance evolution of computer components
• Network vs. computer performance
– Computer speed doubles every 18 months
– Network speed doubles every 9 months
– Disk capacity doubles every 12 months
• 1986 to 2000
– Computers: x 500
– Networks: x 340,000
• 2001 to 2010
– Computers: x 60
– Networks: x 4000
Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett, source Vinod Khosla, Kleiner, Caufield and Perkins.
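As a rough check on these figures, the doubling periods quoted above can be turned into growth factors. The sketch below assumes pure exponential growth with a constant doubling period, which is an idealization and not something the slide claims to be exact; the name `growth_factor` is illustrative.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Multiplicative gain after `months`, assuming a constant doubling period."""
    if doubling_period_months <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (months / doubling_period_months)


# Doubling periods quoted on the slide, in months.
DOUBLING_PERIODS = {"computer speed": 18, "network speed": 9, "disk capacity": 12}

if __name__ == "__main__":
    months = (2000 - 1986) * 12  # the 1986 to 2000 window
    for component, period in DOUBLING_PERIODS.items():
        print(f"{component}: x{growth_factor(months, period):,.0f}")
```

Over the 1986 to 2000 window, this idealized model gives a factor of a few hundred for computers and a few hundred thousand for networks, the same order of magnitude as the x 500 and x 340,000 quoted above.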
![Page 5: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/5.jpg)
Conclusion: invest in networks!
![Page 6: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/6.jpg)
Hansel and Gretel are lost in the forest of definitions
• Distributed system
• Parallel system
• Cluster computing
• Meta-computing
• Grid computing
• Peer to peer computing
• Global computing
• Internet computing
• Network computing
• Cloud computing
![Page 7: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/7.jpg)
Distributed system
• N autonomous computers (sites): n administrators, n data/control flows
• an interconnection network
• User view: one single (virtual) system
– « A distributed system is a collection of independent computers that appear to the users of the system as a single computer » (Distributed Operating Systems, A. Tanenbaum, Prentice Hall, 1994)
• « Traditional » programmer view: client-server
![Page 8: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/8.jpg)
Parallel System
• 1 computer, n nodes: one administrator, one scheduler, one power source
• memory: it depends
• Programmer view: one single machine executing parallel codes. Various programming models (message passing, distributed shared memory, data parallelism…)
![Page 9: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/9.jpg)
Examples of parallel systems
[Figure: a CC-NUMA architecture. Groups of CPUs, each group sharing a memory through a local network, are linked together and to peripherals by an interconnection network.]
[Figure: a shared nothing architecture. Each CPU has its own memory; the CPUs communicate through a network.]
![Page 10: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/10.jpg)
Cluster computing
• Use of PCs interconnected by a (high performance) network as a parallel (cheap) machine
• Two main approaches
– dedicated network (based on a high performance network: Myrinet, SCI, Infiniband, Fiber Channel...)
– non-dedicated network (based on a (good) LAN)
![Page 11: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/11.jpg)
Where are we today ?
• A source for efficient and up-to-date information: www.top500.org
• The 500 best architectures
• N° 1: 1.8 Pflops (2.3 Pflops peak)! N° 500: 20 Tflops
• Sum (1-500) = 20 Pflops
• 1 Flops = 1 floating point operation per second
• 1 TeraFlops = 1000 GigaFlops – 1 Pflops = 1000 TeraFlops
![Page 12: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/12.jpg)
How does it grow?
• in 1993 (prehistoric times!)
– n°1: 59.7 GFlops
– n°500: 0.4 Gflops
– Sum = 1.17 TFlops
• in 2004 (yesterday)
– n°1: 70 TFlops (about x1170)
– n°500: 850 Gflops (x2125)
– Sum = 1127.4 Tflops and 408629 processors
![Page 13: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/13.jpg)
2007/11 best:
Peak: 596 Tflops !!!
http://www.top500.org/
![Page 14: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/14.jpg)
2008/11 best:
Peak: 1457 Tflops !!!
http://www.top500.org/
![Page 15: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/15.jpg)
2009/11 best:
Peak: 2331 Tflops !!!
http://www.top500.org/
![Page 16: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/16.jpg)
Performance evolution
![Page 17: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/17.jpg)
Projected performance
![Page 18: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/18.jpg)
Architecture distribution
![Page 19: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/19.jpg)
Interconnection network distribution
![Page 20: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/20.jpg)
NEC Earth Simulator (1st in 2004; 30th in 2007)
Single stage crossbar: 2700 km of cables
700 TB disk space
1.6 PB mass storage
Area: 4 tennis courts, 3 floors
A MIMD with distributed memory
![Page 21: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/21.jpg)
NEC Earth Simulator
![Page 22: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/22.jpg)
![Page 23: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/23.jpg)
BlueGene
• 212992 processors – 3D torus
• Rmax = 478 Tflops ; Rpeak = 596 Tflops
![Page 24: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/24.jpg)
RoadRunner
• 3456 nodes (18 clusters) - 2 stage fat tree Infiniband (optical)
• 1 node = 2 AMD Opteron DualCore + 4 IBM PowerXCell 8i
• Rmax = 1.1 Pflops ; Rpeak = 1.5 Pflops
• 3.9 MW (0.35 Gflops/W)
![Page 25: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/25.jpg)
Jaguar
• 224162 cores – Memory: 300 TB – Disk: 10 PB
• AMD x86_64 Opteron Six Core 2600 MHz (10.4 GFlops)
• Rmax = 1759 Tflops – Rpeak = 2331 Tflops
• Power: 6.95 MW
• http://www.nccs.gov/jaguar/
![Page 26: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/26.jpg)
Tianhe-1A
• 186 368 cores – Memory: 229 TB – Disk: 10 PB
• 14336 Intel EM64T Xeon X56xx (Westmere-EP) 2930 MHz (11.72 GFlops) + 7168 NVidia GPU Tesla M2050
• Rmax = 2566 Tflops – Rpeak = 4701 Tflops
• Power: 4.04 MW… only!
![Page 27: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/27.jpg)
Network computing
• From LAN (cluster) computing to WAN computing
• Set of machines distributed over a MAN/WAN that are used to execute loosely coupled parallel codes
• Depending on the infrastructure (software and hardware), network computing takes the form of Internet computing, P2P, Grid computing, etc.
![Page 28: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/28.jpg)
Meta computing (early 1990s)
• Definitions become fuzzy...
• A meta computer = set of (widely) distributed (high performance) processing resources that can be combined to process a parallel, not so loosely coupled code
• A meta computer = parallel virtual machine over a distributed system
[Figure: two clusters of PCs (each on a SAN), a supercomputer and a visualization system, connected through LAN and WAN links.]
![Page 29: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/29.jpg)
Internet computing
• Use of (idle) computers interconnected by the Internet for processing high-throughput applications
• Ex: SETI@HOME
– 5M+ users since launch
– 2009/11: 930k users, 2.4M computers; 190k active users, 278k active computers, 2M years of CPU time
– 234 « countries »
– 10^21 floating point operations since 1999
– 769 Tflops!
– BOINC infrastructure (Décrypthon, RSA-155…)
• Programmer view: a single master, n servants
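The master/servant view can be sketched in a few lines. This is a minimal illustration only, assuming local threads stand in for remote volunteer machines; it is not how BOINC or SETI@HOME is implemented, and the names `analyze_work_unit` and `master` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_work_unit(unit: list[int]) -> int:
    """Stand-in for a volunteer's computation: here, a sum of squares."""
    return sum(x * x for x in unit)


def master(data: list[int], unit_size: int, n_workers: int) -> int:
    """Split the data into independent work units, farm them out, merge the results."""
    units = [data[i:i + unit_size] for i in range(0, len(data), unit_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Work units never talk to each other: only the master sees all results.
        partial_results = list(pool.map(analyze_work_unit, units))
    return sum(partial_results)
```

The key property is that work units are independent, so the servants need no communication among themselves, which is what makes slow, unreliable Internet links acceptable.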
![Page 30: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/30.jpg)
Global computing
• Internet computing on a pool of sites
• Meta computing with loosely coupled codes
• Grid computing with poor communication facilities
• Ex: Condor (invented in the 80’s)
![Page 31: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/31.jpg)
Peer to peer computing
• A site is both client and server: servent
• Dynamic servent discovery by « contamination »
• 2 approaches:
– centralized management: Napster, Kazaa, eDonkey…
– distributed management: Gnutella, KAD, Freenet, BitTorrent…
• Application: file sharing
![Page 32: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/32.jpg)
Grid computing (1)
“Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organisations” (I. Foster)
![Page 33: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/33.jpg)
Grid computing (2)
• Information grid
– large access to distributed data (the Web)
• Data grid
– management and processing of very large distributed data sets
• Computing grid
– meta computer
![Page 34: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/34.jpg)
Parallelism vs grids: some reminders
• Grids date back “only” to 1996
• Parallelism is older! (first classification in 1972)
• Motivations:
– need more computing power (weather forecast, atomic simulation, genomics…)
– need more storage capacity (Petabytes and more)
– in a word: improve performance! 3 ways...
Work harder --> Use faster hardware
Work smarter --> Optimize algorithms
Get help --> Use more computers !
![Page 35: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/35.jpg)
The performance ? Ideally it grows linearly
• Speed-up:
– if $T_S$ is the best time to process a problem sequentially,
– then the parallel processing time should be $T_P = T_S / P$ with $P$ processors
– speedup $= T_S / T_P$
– the speedup is limited by Amdahl's law: any parallel program has a purely sequential part $F$ and a parallelizable part $T_{\parallel}$, so $T_S = F + T_{\parallel}$,
– thus the speedup is bounded: $S = \dfrac{F + T_{\parallel}}{F + T_{\parallel} / P} < P$
• Scale-up:
– if $T_{P,S}$ is the time to solve a problem of size $S$ with $P$ processors,
– then $T_{P,S}$ should also be the time to process a problem of size $n \cdot S$ with $n \cdot P$ processors
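The speed-up bound above is easy to explore numerically. The sketch below evaluates it directly; the function and parameter names are illustrative, with `sequential_time` playing the role of F and `parallel_time` the role of the parallelizable part.

```python
def amdahl_speedup(sequential_time: float, parallel_time: float, processors: int) -> float:
    """Speedup S = (F + T_par) / (F + T_par / P) for P processors."""
    if processors < 1:
        raise ValueError("at least one processor is required")
    total = sequential_time + parallel_time
    return total / (sequential_time + parallel_time / processors)


def speedup_ceiling(sequential_time: float, parallel_time: float) -> float:
    """Limit of the speedup as P grows without bound: (F + T_par) / F."""
    if sequential_time <= 0:
        raise ValueError("no finite ceiling without a sequential part")
    return (sequential_time + parallel_time) / sequential_time
```

For example, with 10 time units of sequential work and 90 of parallelizable work, `amdahl_speedup(10, 90, 10)` is 100/19 (a bit over 5), and no number of processors can push the speedup past `speedup_ceiling(10, 90)`, which is 10.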
![Page 36: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/36.jpg)
Grid computing
![Page 37: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/37.jpg)
Starting point
• Real need for very high performance infrastructures
• Basic idea: share computing resources
– “The sharing that the GRID is concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering” (I. Foster)
![Page 38: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/38.jpg)
Applications
• Distributed supercomputing
• High throughput computing
• On demand (real time) computing
• Data intensive computing
• Collaborative computing
![Page 39: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/39.jpg)
An Example Virtual Organization: CERN’s Large Hadron Collider
Worldwide LHC Computing Grid (WLCG)
8000 Physicists, 170 Sites, 34 Countries
15 PB of data per year; 100,000 CPUs
![Page 40: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/40.jpg)
Why Grid Computing (CERN opinion) ?
• The answer is "money"... In 1999, the "LHC Computing Grid" was merely a concept on the drawing board for a computing system to store, process and analyse data produced from the Large Hadron Collider at CERN. However when work began on the design of the computing system for LHC data analysis, it rapidly became clear that the required computing power was far beyond the funding capacity available at CERN.
• On the other hand, most of the laboratories and universities collaborating on the LHC had access to national or regional computing facilities.
• The obvious question was: Could these facilities be somehow integrated to provide a single LHC computing service? The rapid evolution of wide area networking—increasing capacity and bandwidth coupled with falling costs—made it look possible. From there, the path to the LHC Computing Grid was set.
![Page 41: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/41.jpg)
Why Grid Computing (CERN opinion) ?
Additional benefits
• Multiple copies of data can be kept in different sites, ensuring access for all scientists involved, independent of geographical location.
• Allows optimum use of spare capacity for multiple computer centres, making it more efficient.
• Having computer centres in multiple time zones eases round-the-clock monitoring and the availability of expert support.
• No single points of failure.
• The cost of maintenance and upgrades is distributed, since individual institutes fund local computing resources and retain responsibility for these, while still contributing to the global goal.
• Independently managed resources have encouraged novel approaches to computing and analysis.
• So-called “brain drain”, where researchers are forced to leave their country to access resources, is reduced when resources are available from their desktop.
• The system can be easily reconfigured to face new challenges, making it able to dynamically evolve throughout the life of the LHC, growing in capacity to meet the rising demands as more data is collected each year.
• Provides considerable flexibility in deciding how and where to provide future computing resources.
• Allows community to take advantage of new technologies that may appear and that offer improved usability, cost effectiveness or energy efficiency.
![Page 42: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/42.jpg)
LCG System Architecture
• A 4-layer computing model
– Tier-0: CERN (accelerator)
  • Data acquisition and reconstruction
  • Data distribution to Tier-1 (~online)
– Tier-1
  • 24x7 access and availability
  • Quasi-online data acquisition
  • Data service on the Grid
  • “Heavy” analysis of the data
  • ~10 countries
– Tier-2
  • Simulation
  • Final user, analysis of the data (batch and interactive modes)
  • ~40 countries
– Tier-3
  • Final user, scientific analysis
[Figure: tier pyramid. Tier-0 (1), Tier-1 (11), Tier-2 (160), « Tier-3 » end user.]
LHC figures:
• 40 million collisions per second
• ~100 interesting collisions per second after filtering
• 1-10 MB of data per collision
• Acquisition rate: 0.1 to 1 GB/sec
• 10^10 collisions recorded every year
• ~10 PBytes/year
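The LHC figures above are consistent with each other, as a short calculation shows. The sketch assumes decimal units (1 MB = 10^6 bytes) and that the yearly count is 10^10 recorded collisions; the function names are illustrative.

```python
# Decimal byte units (an assumption; the slide does not say which convention it uses).
MB = 10**6
GB = 10**9
PB = 10**15


def acquisition_rate(collisions_per_second: int, bytes_per_collision: int) -> int:
    """Bytes per second written after filtering."""
    return collisions_per_second * bytes_per_collision


def yearly_volume(collisions_per_year: int, bytes_per_collision: int) -> int:
    """Bytes stored per year."""
    return collisions_per_year * bytes_per_collision


if __name__ == "__main__":
    low = acquisition_rate(100, 1 * MB) / GB
    high = acquisition_rate(100, 10 * MB) / GB
    print(f"acquisition rate: {low} to {high} GB/s")
    print(f"yearly volume at 1 MB per collision: {yearly_volume(10**10, MB) / PB} PB")
```

With about 100 retained collisions per second at 1 to 10 MB each, the rate is 0.1 to 1 GB/s, and 10^10 collisions at 1 MB each give 10 PB per year.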
![Page 43: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/43.jpg)
LCG System Architecture (Cont’d)
[Figure: Trigger and Data Acquisition System, Tier-0, Tier-1 and Tier-2 sites; 10 Gbps links over an Optical Private Network (to almost all sites); General Purpose/Academic/Research Network. From F. Malek – LCG France]
![Page 44: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/44.jpg)
Back to roots (routes)
• Railways, telephone, electricity, roads, bank system
• Complexity, standards, distribution, integration (large/small)
• Impact on society: how the US grew
• Important differences:
– clients (the citizens) are NOT providers (states or companies)
– small number of actors/providers
– small number of applications
– strong supervision/control
![Page 45: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/45.jpg)
Computational grid
• “Hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities” (I. Foster)
• Performance criteria:
– security
– reliability
– computing power
– latency
– throughput
– scalability
– services
![Page 46: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/46.jpg)
Grid characteristics
• Large scale
• Heterogeneity
• Multiple administrative domains
• Autonomy… and coordination
• Dynamicity
• Flexibility
• Extensibility
• Security
![Page 47: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/47.jpg)
Levels of cooperation in a computing grid
• End system (computer, disk, sensor…)– multithreading, local I/O
• Cluster– synchronous communications, DSM, parallel I/O
– parallel processing
• Intranet/Organization– heterogeneity, distributed admin, distributed FS and databases
– load balancing
– access control
• Internet/Grid– global supervision
– brokers, negotiation, cooperation…
![Page 48: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/48.jpg)
Basic services
• Authentication/Authorization/Traceability
• Activity control (monitoring)
• Resource discovery
• Resource brokering
• Scheduling
• Job submission, data access/migration and execution
• Accounting
![Page 49: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/49.jpg)
Layered Grid Architecture (By Analogy to Internet Architecture)
Grid layers (top to bottom):
• Application
• Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
• Resource: “Sharing single resources”: negotiating access, controlling use
• Connectivity: “Talking to things”: communication (Internet protocols) & security
• Fabric: “Controlling things locally”: access to, & control of, resources
Internet Protocol Architecture (top to bottom): Application, Transport, Internet, Link
From I. Foster
![Page 50: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/50.jpg)
Resources
• Description
• Advertising
• Cataloging
• Matching
• Claiming
• Reserving
• Checkpointing
![Page 51: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/51.jpg)
Resource management (1)
• Services and protocols depend on the infrastructure
• Some parameters
– stability of the infrastructure (same set of resources or not)
– freshness of the resource availability information
– reservation facilities
– multiple resource or single resource brokering
• Example of request: I need from 10 to 100 computing elements (CE), each with at least 512 MB RAM and a computing power of 150 Mflops
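The example request can be expressed as a simple matching step over a pool of known resources. This is a sketch under stated assumptions: the `ComputingElement` and `Request` types are hypothetical, the 150 Mflops figure is read as a minimum, and real brokers also deal with reservation and stale availability information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComputingElement:
    name: str
    ram_mb: int
    mflops: float


@dataclass(frozen=True)
class Request:
    min_count: int
    max_count: int
    min_ram_mb: int
    min_mflops: float  # assumption: the requested power is a lower bound


def match(request: Request, pool: list[ComputingElement]) -> Optional[list[ComputingElement]]:
    """Return between min_count and max_count eligible elements, or None if too few exist."""
    eligible = [ce for ce in pool
                if ce.ram_mb >= request.min_ram_mb and ce.mflops >= request.min_mflops]
    if len(eligible) < request.min_count:
        return None
    # Prefer the most powerful elements when more are eligible than needed.
    eligible.sort(key=lambda ce: ce.mflops, reverse=True)
    return eligible[:request.max_count]
```

The slide's request would be `Request(min_count=10, max_count=100, min_ram_mb=512, min_mflops=150)`.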
![Page 52: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/52.jpg)
Resource management and scheduling (1)
• Levels of scheduling
– job scheduling (global level ; perf: throughput)
– resource scheduling (perf: fairness, utilization)
– application scheduling (perf: response time, speedup, produced data…)
• Mapping/Scheduling process
– resource discovery and selection
– assignment of tasks to computing resources
– data distribution
– task scheduling on the computing resources
– (communication scheduling)
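The task assignment step can be illustrated with a simple greedy heuristic: take the longest tasks first and give each one to the resource that would finish it earliest. This heuristic is chosen here for illustration and is not one the slides prescribe; it ignores data distribution and communication, and the name `greedy_assign` is hypothetical.

```python
def greedy_assign(task_costs: dict[str, float],
                  resource_speeds: dict[str, float]) -> dict[str, list[str]]:
    """Map each task to a resource, longest task first, earliest finish time wins."""
    finish_time = {resource: 0.0 for resource in resource_speeds}
    plan: dict[str, list[str]] = {resource: [] for resource in resource_speeds}
    # Placing long tasks first leaves short ones to even out the load.
    for task, cost in sorted(task_costs.items(), key=lambda item: item[1], reverse=True):
        best = min(resource_speeds,
                   key=lambda r: finish_time[r] + cost / resource_speeds[r])
        finish_time[best] += cost / resource_speeds[best]
        plan[best].append(task)
    return plan
```

Speeds make the heuristic heterogeneity-aware: a resource twice as fast is charged half the time for the same task.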
![Page 53: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/53.jpg)
Resource management and scheduling (2)
• Individual performance goals are not necessarily consistent with the global (system) performance!
• Grid problems
– predictions are not definitive: dynamicity!
– heterogeneous platforms
– checkpointing and migration
![Page 54: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/54.jpg)
A Resource Management System Example (Globus)
[Figure: the Application submits an RSL request to a Broker, which performs RSL specialization using queries & info from the Information Service; the resulting ground RSL goes to a Co-allocator, which sends simple ground RSL to GRAM instances, each in front of a local resource manager (LSF, Condor, NQE).]
• RSL: Resource Specification Language
• NQE: Network Queuing Env. (batch management; developed by Cray Research)
• LSF: Load Sharing Facility (task scheduling and load balancing; developed by Platform Computing)
![Page 55: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/55.jpg)
Resource information (1)
• What is to be stored?
– virtual organizations, people, computing resources, software packages, communication resources, event producers, devices…
– what about data???
• A key issue in such dynamic environments
• A first approach: (distributed) directory (LDAP)
– easy to use
– tree structure
– distribution
– static
– mostly read; inefficient updates
– hierarchical
– poor procedural language
![Page 56: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/56.jpg)
Resource information (2)
• Goal:
– dynamicity
– complex relationships
– frequent updates
– complex queries
• A second approach: (relational) database
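To make the relational approach concrete, the sketch below stores resource information in an in-memory SQLite database and runs a query that joins and aggregates across sites, then updates a row. The schema, table names and sample rows are invented for illustration and do not come from any real grid information service.

```python
import sqlite3


def build_catalog() -> sqlite3.Connection:
    """Create a tiny resource catalog (hypothetical schema and data)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE site (id INTEGER PRIMARY KEY, name TEXT, vo TEXT);
        CREATE TABLE node (id INTEGER PRIMARY KEY,
                           site_id INTEGER REFERENCES site(id),
                           ram_mb INTEGER, free_cpus INTEGER);
    """)
    db.executemany("INSERT INTO site VALUES (?, ?, ?)",
                   [(1, "lyon", "biomed"), (2, "geneva", "physics")])
    db.executemany("INSERT INTO node VALUES (?, ?, ?, ?)",
                   [(1, 1, 2048, 4), (2, 1, 512, 0), (3, 2, 4096, 8)])
    return db


def sites_with_capacity(db: sqlite3.Connection, min_ram_mb: int, min_free_cpus: int):
    """Complex query: free CPUs per site, counting only nodes with enough RAM."""
    return db.execute("""
        SELECT s.name, SUM(n.free_cpus)
        FROM site s JOIN node n ON n.site_id = s.id
        WHERE n.ram_mb >= ?
        GROUP BY s.name
        HAVING SUM(n.free_cpus) >= ?
        ORDER BY s.name
    """, (min_ram_mb, min_free_cpus)).fetchall()
```

A frequent update is then a single statement, for example `db.execute("UPDATE node SET free_cpus = 0 WHERE id = 1")`, after which the same query reflects the new state.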
![Page 57: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/57.jpg)
Programming on the grid: potential programming models
• Message passing (PVM, MPI)
• Distributed Shared Memory
• Data Parallelism (HPF, HPC++)
• Task Parallelism (Condor)
• Client/server - RPC
• Agents
• Integration system (CORBA, DCOM, RMI)
![Page 58: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/58.jpg)
Program execution: issues
• Parallelize the program with the right job structure, communication patterns/procedures, algorithms
• Discover the available resources
• Select the suitable resources
• Allocate or reserve these resources
• Migrate the data
• Initiate computations
• Monitor the executions ; checkpoints ?
• React to changes
• Collect results
![Page 59: Grid computing: An introduction Lionel Brunie National Institute of Applied Science (INSA) LIRIS Laboratory/DRIM Team – UMR CNRS 5205 Lyon, France](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d375503460f94a0f592/html5/thumbnails/59.jpg)
Data management
• It was long forgotten !!!
• Though it is a key issue !
• Issues:
  – indexing
  – retrieval
  – replication
  – caching
  – traceability
  – (auditing)
• And security !!!
Bruni, not BruniE !!!
From computing grids to information grids
From computing grids to information grids (1)
• Grids lack most of the tools needed to share (index, search, access), analyze, secure and monitor semantic data (information)
• Several reasons:
  – history
  – money
  – difficulty
• Why is it so difficult ?
  – Sensitivity but openness
  – Multiple administrative domains, multiple actors, heterogeneity but a single global architecture/view/system
  – Dynamicity and unpredictability but robustness
  – Wideness but high performance
From computing grids to information grids (2) ex: Replica Management Problem
• Maintain a mapping between logical names for files and collections and one or more physical locations
• Decide where and when a piece of data must be replicated
• Important for many applications
• Example: CERN high-level trigger data
  – Multiple Petabytes of data per year
  – Copy of everything at CERN (Tier 0)
  – Subsets at national centers (Tier 1)
  – Smaller regional centers (Tier 2)
  – Individual researchers have copies of pieces of data
• Much more complex with sensitive and complex data like medical data !!!
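The mapping between logical names and physical locations can be illustrated with a toy catalog. This is a minimal sketch, not the interface of any real replica service: the class `ReplicaCatalog`, the file name, the site names and the transfer costs are all invented for the example.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy replica catalog: logical file name -> set of sites holding a copy."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, logical_name, site):
        """Record that a physical copy of the file exists at this site."""
        self._replicas[logical_name].add(site)

    def locations(self, logical_name):
        """Return all known sites for a logical name (empty if unknown)."""
        return sorted(self._replicas.get(logical_name, ()))

    def cheapest(self, logical_name, cost):
        """Pick the replica with the lowest (assumed) transfer cost."""
        sites = self.locations(logical_name)
        if not sites:
            raise KeyError(logical_name)
        return min(sites, key=lambda s: cost.get(s, float("inf")))


catalog = ReplicaCatalog()
for site in ("tier0-cern", "tier1-lyon", "tier2-regional"):
    catalog.register("run42/events.dat", site)

# Hypothetical transfer costs as seen from a client located in Lyon.
cost_from_lyon = {"tier0-cern": 30, "tier1-lyon": 5, "tier2-regional": 12}
best = catalog.cheapest("run42/events.dat", cost_from_lyon)
print(best)
```

Deciding where and when to create a new replica is the harder half of the problem and is not covered by this sketch.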
From computing grids to information grids (3): some (still…) open issues
• Security, security, security (incl. privacy, monitoring, traceability…) at a semantic level
• Access protocols (incl. replication, caching, migration…)
• Indexing tools
• Brokering of data (incl. accounting)
• (Content-based) Query optimization and execution
• Mediation of data
• Data integration, data warehousing and analysis tools
• Knowledge discovery and data mining
Functional View of Grid Data Management
[Diagram: an Application relies on three lookup services and two processing components.]
• Metadata Service: location based on data attributes
• Replica Location Service: location of one or more physical replicas
• Information Services: state of grid resources, performance measurements and predictions
• Planner: data location, replica selection, selection of compute and storage nodes
• Executor: initiates data transfers and computations
• Other components shown: Security and Policy, Data Movement, Data Access, Compute Resources, Storage Resources
A bit simplistic…
Grid Security (1): Why Grid Security is Hard
• The resources used may be extremely valuable & the problems to be solved extremely sensitive
• Resources are located in distinct administrative domains
  – Each resource has its own policies & procedures
• Users are diverse
• The set of resources used by a single computation may be large, dynamic, and/or unpredictable
– Not just client/server
• The security service must be broadly available & applicable
  – Standard, well-tested, well-understood protocols
  – Integration with a wide variety of tools
Grid security (2): Requirements
• Authentication
• Authorization and Delegation of authority
• Assurance
• Accounting
• Auditing and Monitoring
• Traceability
• Integrity and Confidentiality (ACID properties)
Access to data and Mediation
• Heavens, where are the data ?
• Use case: an Italian tourist has a heart attack in Lyon
• Data inside the grid ≠ data at the side of the grid !
• Basic idea
  – use of metadata/indexes. Problem: indexes are (sensitive) information
• Alternative
  – encrypted indexes, use of views, proxies
• Mediation
  – no single view of the world => mechanisms for interoperability, ontologies
• Negotiation: a key open issue
Caching
• Motivation:
  – Collaborative caching has been shown to be efficient
  – Each institution wants to control the access to its data
  – No standard exists in Grids for caching
• Proposal:
  – on demand caching
  – a two-level cache: local caches and a global virtual cache
  – use metadata to collaborate / index data
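One possible reading of the proposal is sketched below: each site keeps its own local cache, and the "global virtual cache" is nothing more than a shared metadata index recording which sites hold which keys. Everything here is an assumption made for illustration (the `Site` class, the `shares_with` access policy, the string labels returned); the slides do not specify a design.

```python
class Site:
    """A site with a local cache, taking part in a shared metadata index."""

    def __init__(self, name, index, shares_with=()):
        self.name = name
        self.cache = {}                      # level 1: local cache
        self.index = index                   # level 2: key -> set of site names
        self.shares_with = set(shares_with)  # sites allowed to read our cache
        self.peers = {}                      # site name -> Site object

    def _store(self, key, value):
        self.cache[key] = value
        self.index.setdefault(key, set()).add(self.name)

    def get(self, key, fetch_origin):
        """On-demand caching: local hit, else permitted peer, else origin."""
        if key in self.cache:
            return self.cache[key], "local"
        for holder in sorted(self.index.get(key, ())):
            peer = self.peers[holder]
            if self.name in peer.shares_with:   # the holder controls access
                value = peer.cache[key]
                self._store(key, value)
                return value, "peer:" + holder
        value = fetch_origin(key)
        self._store(key, value)
        return value, "origin"


index = {}
a = Site("a", index, shares_with={"b"})
b = Site("b", index)
c = Site("c", index)
for site in (a, b, c):
    site.peers = {"a": a, "b": b, "c": c}

origin = lambda key: key.upper()   # stands in for the real data source
print(a.get("img1", origin))       # first access goes to the origin
print(b.get("img1", origin))       # b is allowed to read a's cache
print(c.get("img1", origin))       # c is not, so it goes to the origin
```

The index itself reveals who holds what, which is exactly the kind of sensitive metadata a real design would have to protect.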
Query optimization and execution
• Old wine in new bottles ?
• Yes and no: the problem itself seems unchanged, but the operational context has changed so much that classical heuristics and methods are no longer pertinent
• Key issues: – Dynamicity
– Unpredictability
– Adaptability
• Very little work has specifically addressed this problem
An application example: GGM
Grille Geno-Médicale (Geno-Medical Grid)
An application example: GGM Biomedical grids
• Biomedical applications are perfect candidates for gridification:
  – Huge volumes of data (a hospital = several TB per year)
  – Dissemination of data
  – Collaborative work (health networks)
  – Very hard requirements (e.g. response time)
• But
  – Partially structured semantic data
  – Very strong privacy issues
→ a perfect playground for researchers !
An application example: GGM Motivation (1)
• Dissemination of new “high bandwidth” technologies in genome and proteome research (e.g. micro-arrays)
  – huge volumes of structural (gene localization) and functional (gene expression) data
• Generalization of digital patient files and digital medical images
• Implementation of (regional and international) health networks
• All information is available, people are connected to the network.
• The question is: How can we use it ?
An application example: GGM Motivation (2)
• Need for an information infrastructure to
  – index, exchange/share, process all this data
  – while preserving its privacy at a very large scale
• That is... just a good grid!
• Application objectives:
  – correlation of genomic and medical data: fundamental research and, later, the medical decision making process
  – patient-centered medical data integration: patient monitoring inside and outside the hospital
  – epidemiology
  – training
An application example: GGM Motivation (3)
• References:
  – “Synergy between medical informatics and bioinformatics: facilitating genomic medicine for future healthcare”, BIOINFOMED Working Group, Jan. 2003, European Commission
  – Proceedings of the Healthgrid conferences (1st edition in Lyon, 2003)
An application example: GGM Scope
“The goal of the GGM project is, on top of a grid infrastructure, to propose a software architecture able to manage heterogeneous and dynamic data stored in distributed warehouses for intensive analysis and processing purposes.”
• Distributed Data Warehouses
• Query Optimization
• Data Access [and Control]
• Data Mining
An application example: GGM Data
• A piece of medical data (age, image, biological result, salient object in an image) has a meaning
  – It conveys information that can be interpreted (in multiple ways !)
• Meta-data can be attached to medical data… or not
  – pre-processing is necessary
• Medical data are often private
  – privacy/delegation
• The medical data of a patient are often disseminated over multiple sites
  – access rights/authentication problem, collection/integration of data into partial views, identification of data/users
• Medical (meta-)data are complex and not yet (fully) standardized
  – no global structure
An application example: GGM Architecture
[Architecture diagram. Labels shown: GO, Medical records, Experiments; wrappers; DA+Cache, NDS; OQS; DW, DM; DW-GUI; GGM Middleware; Grid Middleware]
Virtual Data Warehouses on the Grid
Virtual Data Warehouses on the Grid (1)
• Almost nothing…
• Why is it so difficult ?
  – multiple administrative domains
  – very sensitive data => security/privacy issues
  – wide distribution
  – unpredictability
  – relationship with data replicas
  – heterogeneity
  – dynamicity (permanent production of large volumes of data)
• Centralized data warehouse ?
  – Not realistic at a large scale and not acceptable
Virtual Data Warehouses on the Grid (2)
• A possible direction of research: virtual data warehouses on the grid
• Components:
  – a federated schema
  – a set of partial views (“chunks”) materialized at the local system level
• Advantages
  – Flexibility with respect to users’ needs
  – Good use of the storage capacity of the grid and scalability
  – Security control at the local level
  – Global view of the disseminated data
Virtual Data Warehouses on the Grid (3)
• Drawbacks and open issues
  – maintenance protocols
  – indexing tools
  – access to data and negotiation
  – query processing
Access to data and collaborative brokers
Access to data and collaborative brokers (1)
• Brokers act as interfaces between data, services and applications
• Possible locations
  – at the interface between the grid and the external data repositories
  – on the grid storage elements
  – at the interface between the grid and the user
  – inside the network (e.g. routers)
• Open issues
  – caching: computation results, query partial results…
  – data indexing
  – prefetching
  – user customization
  – inter-broker collaboration
  – a key issue: security and privacy
Access to data and collaborative brokers (2): security and privacy
• Medical data belong to the patient, who should be able to grant access rights to whomever he or she wants
• To whom do processed (even anonymous) data belong ?
• How can one combine privacy and dissemination/replication/caching ?
• What about traceability ?
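As a small illustration of patient-controlled access rights combined with traceability, the sketch below keeps a per-patient grant list and logs every access attempt, whether it succeeds or not. The class `PatientRecordStore`, its method names and the sample identities are hypothetical; a real system would also need authentication, delegation and protection of the log itself.

```python
class PatientRecordStore:
    """Toy store: the patient decides who may read, and every read is logged."""

    def __init__(self):
        self._records = {}    # patient -> data
        self._grants = {}     # patient -> set of authorized readers
        self.audit_log = []   # (requester, patient, allowed) tuples

    def put(self, patient, data):
        self._records[patient] = data

    def grant(self, patient, reader):
        self._grants.setdefault(patient, set()).add(reader)

    def revoke(self, patient, reader):
        self._grants.get(patient, set()).discard(reader)

    def read(self, requester, patient):
        allowed = (requester == patient
                   or requester in self._grants.get(patient, set()))
        # Traceability: log the attempt before deciding the outcome.
        self.audit_log.append((requester, patient, allowed))
        if not allowed:
            raise PermissionError(f"{requester} may not read {patient}'s data")
        return self._records[patient]


store = PatientRecordStore()
store.put("alice", {"blood_type": "A+"})
store.grant("alice", "dr_martin")
print(store.read("dr_martin", "alice"))
store.revoke("alice", "dr_martin")
try:
    store.read("dr_martin", "alice")
except PermissionError as err:
    print("denied:", err)
print(store.audit_log)
```

The open questions on the slide begin where this sketch stops: once the data has been replicated or cached elsewhere, revoking a grant at the source no longer guarantees anything.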
Datamining and knowledge extraction on the grid
• Structure of the data: few records, many attributes
• Parallelizing data mining algorithms for the grid
  – volatility of the resources (data, processing)
  – fault tolerance, checkpointing
  – distribution of the data: local data exploration + aggregation function to converge towards a unified model
  – incremental production of the data => active data mining techniques
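The "local exploration + aggregation function" idea can be shown on the simplest possible model, a global mean and variance. In this sketch each site computes a small summary of its own data and only the summaries are combined, so raw records never leave the site. The function names and the sample values are invented for the example, and a real mining algorithm would exchange richer local models.

```python
def local_summary(values):
    """Local exploration: count, sum and sum of squares of the site's data."""
    return (len(values), sum(values), sum(v * v for v in values))


def aggregate(summaries):
    """Aggregation: combine local summaries into a global mean and variance."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    squares = sum(s[2] for s in summaries)
    mean = total / n
    variance = squares / n - mean ** 2   # population variance
    return mean, variance


# Two sites holding disjoint parts of the data (hypothetical values).
summaries = [local_summary([1, 2, 3]), local_summary([4, 5])]
mean, variance = aggregate(summaries)
print(mean, variance)

# Incremental production of data: a new batch only adds one more summary.
summaries.append(local_summary([6]))
print(aggregate(summaries))
```

Because the summaries are additive, a site that disappears or a batch that arrives late changes the list of summaries but not the aggregation code, which is one way to cope with volatile resources.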
A short overview of some grid middleware
The Legion system
• University of Virginia
• Object-oriented approach. Objects = data, applications, sensors, computing resources, codes…: everything is an object !
• Loosely coupled codes
• Single naming space
• Reuse of existing OS and protocols ; definition of message formats and high level protocols
• Core objects: naming, binding, object creation/activation/deactivation/destruction
• Methods: description via an IDL
• Security: in the hands of the users
• Resource allocation: a site can define its own policy
High-Throughput Computing: Condor
• High-throughput computing platform for mapping many tasks to idle computers
• Since 1986 !
• Major components
  – A central manager manages pool(s) of [distributively owned or dedicated] computers. A CM = scheduler + coordinator
  – DAGman manages user task pools
  – Matchmaker schedules tasks to computers using classified ads
  – Checkpointing and process migration
  – No simple communications
• Parameter studies, data analysis
• Condor married Globus: Condor-G
• Several hundreds of Condor pools in the world… or in your student room !
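The matchmaking idea mentioned above (tasks and computers each advertise what they offer and what they require) can be imitated in a few lines. This is a toy model and not Condor's ClassAd language or API: the ads are plain Python dictionaries, requirements are Python functions, and all attribute names and values are invented for the example.

```python
def matchmake(jobs, machines):
    """Pair each job with the best free machine when both sides agree."""
    matches, free = {}, list(machines)
    for job in jobs:
        acceptable = [m for m in free
                      if job["requirements"](m) and m["requirements"](job)]
        if not acceptable:
            continue                      # job stays idle for now
        best = max(acceptable, key=job["rank"])
        matches[job["name"]] = best["name"]
        free.remove(best)
    return matches


machines = [
    # m1 belongs to alice and only accepts her jobs (owner policy).
    {"name": "m1", "os": "linux", "memory": 2048,
     "requirements": lambda job: job["owner"] == "alice"},
    {"name": "m2", "os": "linux", "memory": 8192,
     "requirements": lambda job: True},
    {"name": "m3", "os": "windows", "memory": 16384,
     "requirements": lambda job: True},
]

jobs = [
    {"name": "j1", "owner": "bob",
     "requirements": lambda m: m["os"] == "linux" and m["memory"] >= 4096,
     "rank": lambda m: m["memory"]},
    {"name": "j2", "owner": "alice",
     "requirements": lambda m: m["os"] == "linux",
     "rank": lambda m: m["memory"]},
    {"name": "j3", "owner": "bob",
     "requirements": lambda m: m["os"] == "linux",
     "rank": lambda m: m["memory"]},
]

matches = matchmake(jobs, machines)
print(matches)
```

The point of the two-sided check is that machine owners keep control: a desktop can join the pool while still refusing jobs that do not satisfy its owner's policy.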
Defining a DAG
• A DAG is defined by a .dag file, listing each of its nodes and their dependencies:
    # diamond.dag
    Job A a.sub
    Job B b.sub
    Job C c.sub
    Job D d.sub
    Parent A Child B C
    Parent B C Child D
• Each node will run the Condor job specified by its accompanying Condor submit file
[Diagram: Job A → Job B and Job C → Job D]
From Condor tutorial
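To see what the dependencies in diamond.dag imply for execution order, the sketch below parses the file shown above and groups the jobs into waves that could run in parallel. It only illustrates the ordering logic and is not how DAGMan is implemented; the parser handles just the two line types used in this example, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

DAG_TEXT = """\
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
"""


def parse_dag(text):
    """Return (job -> submit file, job -> set of parent jobs)."""
    submit_files, parents_of = {}, {}
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        keyword = words[0].lower()
        if keyword == "job":
            submit_files[words[1]] = words[2]
            parents_of.setdefault(words[1], set())
        elif keyword == "parent":
            split = [w.lower() for w in words].index("child")
            for child in words[split + 1:]:
                parents_of.setdefault(child, set()).update(words[1:split])
    return submit_files, parents_of


def waves(parents_of):
    """Group jobs into successive waves; jobs in a wave are independent."""
    sorter = TopologicalSorter(parents_of)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


submit_files, parents_of = parse_dag(DAG_TEXT)
print(waves(parents_of))
```

For the diamond, B and C land in the same wave because neither depends on the other, while D has to wait for both.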
The Globus toolkit
• A set of integrated executable management grid services
• Services
  – resource management (GRAM-DUROC)
  – communication (NEXUS - MPICH-G2, globus_io)
  – information (MDS)
  – data management (replica catalog)
  – security (GSI)
  – monitoring (HBM)
  – remote data access (GASS - GridFTP - RIO)
  – executable management (GEM)
  – execution
  – commodity Grid Kits (Java, Python, CORBA, Matlab…)
Components in Globus Toolkit 3.0
[Component diagram. Column headings: Security, Data Management, Resource Management, Information Services, WS Core]
Components shown: GSI, WS-Security, RFT (OGSI), RLS, WU GridFTP, JAVA WS Core (OGSI), OGSI C Bindings, MDS2, WS-Index (OGSI), Pre-WS GRAM, WS GRAM (OGSI)
Components in Globus Toolkit 3.2
[Component diagram. Column headings: Security, Data Management, Resource Management, Information Services, WS Core]
Components shown: GSI, WS-Security, CAS (OGSI), SimpleCA, RFT (OGSI), RLS, OGSI-DAI, WU GridFTP, XIO, JAVA WS Core (OGSI), OGSI C Bindings, MDS2, WS-Index (OGSI), Pre-WS GRAM, WS GRAM (OGSI), OGSI Python Bindings (contributed), pyGlobus (contributed)
Components in Globus Toolkit 4
Components in Globus Toolkit 5
Conclusion (2005)
• Just a new toy for scientists or a revolution ?
• Huge investments
• Classical issues, but in a very complex functional, operational and application context
• Complexity from heterogeneity, wide distribution, security, dynamicity
• Functional shift from computing to information
• Data management in grids: not prehistory, but still the Middle Ages
• Still much work to do !!!
• A global framework for grid computing, pervasive computing and Web services ?
Conclusion (2008)
• Just a new toy for scientists or a revolution ? Neither !
• Huge investments: too much ?!
• Classical issues, but in a very complex functional, operational and application context
• Complexity from heterogeneity, wide distribution, security, dynamicity
• Functional shift from computing to information
• Data management in grids: no longer the Middle Ages, but not yet the 21st century => services
• Supercomputing is still alive
• A global framework for grid computing, pervasive computing and Web services… and SOA !
• Some convergence between P2P and grid computing
• The industrialization time
Conclusion (2010)
• 2008 conclusion is still valid
• … except that cloud computing has emerged !
• Will cloud computing kill grid computing? No!
• Is cloud computing just an avatar of grid computing? Yes and No
• See cloud computing presentation ;-) !