Resource Management in Volunteer Computing Grids
An analysis of the different approaches to maximizing
throughput on a BOINC grid
Presented by Geoffrey Oxholm and Beata Chrulkiewicz
CS-575 Position Paper Presentation, Fall 2007
Volunteer Grids
• A Type of Grid Computer
  – Decentralized, volunteer nodes
• Supercomputing for free
  – 1.1 PetaFLOPS vs. 360 TeraFLOPS
Images: http://www.di.unipi.it/groups/architetture/images/grid.gif
http://holistic.com.mt/h/?Page=Article&Ref=107
• Unreliable Nodes
  – Users can disconnect their computers anytime
  – Amount of donated resources is subject to change
  – Evil jerks can upload malicious data
Berkeley Open Infrastructure for Network Computing
• Duplicate work to ensure validity
  – R: the “Redundancy Factor”
• Validate computation results. If validation fails, repeat the computation.
  – Validation methods:
    • Majority Voting
      – More than R/2 nodes must agree
    • M-First Voting
      – Accept the first result that M nodes agree on
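The two validation methods can be sketched in a few lines of Python. This is only an illustration, not BOINC's validator: the function names are hypothetical, results are assumed to be directly comparable values, and `m_first_vote` assumes the common reading of M-first voting, in which a result is accepted as soon as M matching reports have arrived.

```python
from collections import Counter


def majority_vote(results, r):
    """Return the value reported by more than r/2 of the r replicas, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > r / 2 else None


def m_first_vote(results, m):
    """Scan results in arrival order and return the first value that
    collects m matching reports, else None (validation fails)."""
    counts = Counter()
    for value in results:
        counts[value] += 1
        if counts[value] >= m:
            return value
    return None
```

A return value of `None` stands for a failed validation, in which case the work unit would be computed again.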
Image: http://en.wikipedia.org/wiki/Image:BOINC_logo_July_2007.png
Success and Limitations of BOINC
• With proper configuration, high throughput can be achieved
• Still quite difficult to get volunteers
• Proper configuration is difficult
• Fixed configurations cannot account for constantly changing grid characteristics
Image: http://www.baseacid.com/imagesRR/workBand.jpg
Fix: User Encouragement Through Feedback and Reward
• Each node generates statistics
• Teams can be formed
• Sense of pride in commitment
• Encourages users to donate more time, resources
Image: http://teamocuk.com/cprojectcred1.php?p=PAH
Figure: Team OCUK, Predictor@home total credit. Go team!
Fix: Optimizing Configuration Through Usage Simulation
• Enumerate a set of possible configurations
• Test configurations in a fraction of the time
• Avoid disturbing volunteers by simulating
• Zero in on an effective configuration
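As a rough illustration of this idea, the sketch below enumerates a few redundancy factors and scores each one against a toy grid model. Everything here is an assumption made for the example: the function names, the single `error_rate` shared by all nodes, majority voting as the validator, and the worst case in which faulty nodes agree with each other. It is not the simulator discussed in the presentation.

```python
import random


def simulate(redundancy, error_rate, work_units=2000, seed=0):
    """Toy model: return (correct results per computation,
    fraction of work units where a bad result was accepted)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    computations = good = bad = 0
    for _ in range(work_units):
        while True:
            # Count replicas that returned the correct answer.
            correct = sum(rng.random() >= error_rate for _ in range(redundancy))
            computations += redundancy
            if correct > redundancy / 2:
                good += 1  # majority agrees on the correct answer
                break
            if redundancy - correct > redundancy / 2:
                bad += 1   # worst case: faulty nodes agree, bad result accepted
                break
            # Tie: validation fails and the work unit is repeated.
    return good / computations, bad / work_units


def best_configuration(candidates, error_rate, max_bad_fraction):
    """Pick the redundancy factor with the highest throughput among those
    whose accepted-bad-result fraction stays within the limit."""
    best = None
    for r in candidates:
        throughput, bad_fraction = simulate(r, error_rate)
        if bad_fraction <= max_bad_fraction and (best is None or throughput > best[1]):
            best = (r, throughput)
    return best
```

Calling `best_configuration([1, 3, 5], error_rate=0.1, max_bad_fraction=0.01)` runs three simulated configurations without involving any volunteer machine.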
Image: http://www.cyberroach.com/tron/tron3_circuit.jpg
Fix: Dynamic Redundancy Through Reliability Prediction
• Wait for a minimum number of nodes before assigning work
• Choose nodes which have higher reliability
• Higher reliability means less need for redundancy
• Successful completion yields higher reliability rating for the node
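One possible reading of these steps is sketched below. The names, the `target` and `min_pool` parameters, the stopping rule (add nodes until the chance that at least one of them is correct reaches the target) and the moving-average update are all assumptions chosen for illustration; the slides do not specify the actual formulas.

```python
def choose_nodes(reliability, target=0.99, min_pool=3):
    """reliability maps node id -> rating in [0, 1].
    Return the nodes to assign one work unit to, or None to keep waiting."""
    if len(reliability) < min_pool:
        return None  # wait for a minimum number of nodes before assigning work
    chosen = []
    p_all_fail = 1.0
    # Prefer the most reliable nodes first.
    for node, rating in sorted(reliability.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(node)
        p_all_fail *= 1.0 - rating
        if 1.0 - p_all_fail >= target:
            break  # reliable nodes reach the target with less redundancy
    # If the target is never reached, every available node is used.
    return chosen


def update_rating(rating, success, alpha=0.1):
    """Move the rating toward 1 after a success and toward 0 after a failure."""
    outcome = 1.0 if success else 0.0
    return rating + alpha * (outcome - rating)
```

With ratings `{"a": 0.995, "b": 0.5, "c": 0.5}` a single node is enough, while a pool of less reliable nodes leads to more replicas per work unit.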
Image: http://image.compusa.com/prodimages/44/8537c95c-8027-4840-b976-67deb0690e13.gif
Evaluation
• User Encouragement
  – Encourages cheating
  – Does nothing to maximize efficient use of resources
• Usage Simulation
  – Still requires researchers to configure system
  – Static configuration fails to match dynamic grid
• Reliability Rating
  – Subject to further exploitation
  – Further minimizes the value of slow nodes, working against incentives
Image: GPL Licensed
Conclusion
• Build on existing methods
  – Continue to encourage users
  – Create a starting point by using simulation
  – Update reliability system to avoid conflict with system of incentives
• Develop new technologies
  – Blacklist malicious nodes
  – Develop a more comprehensive reliability system which uses past schedules to predict future availability
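To make the last two proposals concrete, here is a minimal sketch under stated assumptions. The class and function names are hypothetical, availability is tracked per hour of the day, and the blacklist threshold is an arbitrary example value; none of this comes from an existing BOINC feature.

```python
from collections import defaultdict


class AvailabilityHistory:
    """Per-node record of how often the node was online at each hour of the day."""

    def __init__(self):
        self._seen = defaultdict(lambda: [0] * 24)
        self._online = defaultdict(lambda: [0] * 24)

    def record(self, node, hour, online):
        """Store one past observation for the given hour (0-23)."""
        self._seen[node][hour] += 1
        if online:
            self._online[node][hour] += 1

    def predict(self, node, hour):
        """Return the observed fraction of time the node was online at this
        hour, or None when there is no history to predict from."""
        seen = self._seen[node][hour]
        return self._online[node][hour] / seen if seen else None


def blacklist(invalid_counts, limit=3):
    """Return the set of nodes with at least `limit` invalid results."""
    return {node for node, n in invalid_counts.items() if n >= limit}
```

A scheduler could combine the two: skip blacklisted nodes, then prefer nodes whose predicted availability covers the expected run time of the work unit.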
Image: http://pixels.dessgeega.com/wp-content/uploads/2006/10/organize_big.gif
Questions?
Image: http://www.grid.phys.uvic.ca/
Geoff Oxholm Beata Churkiewicz