MWDriver: An Object-Oriented Library for Master-Worker
Applications
Mike Yoder, Jeff Linderoth,
Jean-Pierre Goux
February 26, 1999
Outline
• Introduction to MWDriver
• The three classes
– MW Driver
– MW Task
– MW Worker
• Limitations and future work
Introduction to MW
• Provides an object-oriented framework for developing master/worker applications
• Uses Condor-PVM to acquire and release nodes and to pass messages
• MW is fault tolerant
– Handles workers arriving / leaving
– Handles workers getting suspended and resumed
• Assigns tasks to workers
MW Driver
• Master “control center”
• Sits in a loop and handles messages regarding the state of all workers
• Must implement these pure virtual functions
– get_userinfo()
– setup_initial_tasks()
– pack_worker_init_data()
– act_on_completed_task()
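As a rough illustration of how a user class might plug into the driver, here is a minimal sketch. Only the four method names come from the slide; the class names (DriverSketch, ScaledSumDriver, ToyTask), all signatures, and the go() loop are assumptions made for this example, and the "worker" is simulated inline rather than reached over Condor-PVM.

```cpp
#include <cassert>
#include <deque>

// Hypothetical task: one input number and the result computed from it.
struct ToyTask {
    int input;
    int result;
};

// Hypothetical base class. Only the four method names come from the slides;
// the signatures are assumptions made for this sketch.
class DriverSketch {
public:
    virtual ~DriverSketch() = default;

    virtual void get_userinfo() = 0;                                  // read user parameters
    virtual void setup_initial_tasks(std::deque<ToyTask>& todo) = 0;  // fill the to-do list
    virtual int pack_worker_init_data() = 0;                          // data each worker gets once
    virtual void act_on_completed_task(const ToyTask& done) = 0;      // consume one result

    // Simplified control loop. The real master reacts to messages from
    // remote workers; here each task is "executed" inline as a stand-in.
    void go() {
        get_userinfo();
        std::deque<ToyTask> todo;
        setup_initial_tasks(todo);
        assert(!todo.empty());  // this sketch treats "no tasks" as a user error
        const int init = pack_worker_init_data();
        while (!todo.empty()) {
            ToyTask t = todo.front();
            todo.pop_front();
            t.result = t.input * init;  // stand-in for a worker doing the task
            act_on_completed_task(t);
        }
    }
};

// Example user class: scales the numbers 1..count and sums the results.
class ScaledSumDriver : public DriverSketch {
public:
    int total = 0;

    void get_userinfo() override { count_ = 3; scale_ = 10; }
    void setup_initial_tasks(std::deque<ToyTask>& todo) override {
        for (int i = 1; i <= count_; ++i) todo.push_back(ToyTask{i, 0});
    }
    int pack_worker_init_data() override { return scale_; }
    void act_on_completed_task(const ToyTask& done) override { total += done.result; }

private:
    int count_ = 0;
    int scale_ = 0;
};
```

Constructing a ScaledSumDriver and calling go() walks the four hooks in order; the user code never touches the loop itself, which is the point of keeping the hooks pure virtual.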
MW Task
• A “Task” is one unit of work
• Holds both
– Work to be done for that task
– Results of that work after it’s done
• Must implement these pure virtual functions
– pack_work();
– unpack_work();
– pack_results();
– unpack_results();
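The four packing functions can be pictured as a round trip: the master packs the work, the worker unpacks it, and the results travel back the same way. The sketch below assumes a hypothetical MessageBuffer in place of the real message-passing layer; TaskSketch, MidpointTask, and every signature are likewise illustrative, and only the four method names come from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical message buffer standing in for the real message-passing layer.
class MessageBuffer {
public:
    void put(double v) { data_.push_back(v); }
    double get() {
        assert(pos_ < data_.size());  // reading past the end is a packing bug
        return data_[pos_++];
    }

private:
    std::vector<double> data_;
    std::size_t pos_ = 0;
};

// Hypothetical base class; the signatures are assumptions for this sketch.
class TaskSketch {
public:
    virtual ~TaskSketch() = default;
    virtual void pack_work(MessageBuffer& out) const = 0;     // master to worker
    virtual void unpack_work(MessageBuffer& in) = 0;          // on the worker
    virtual void pack_results(MessageBuffer& out) const = 0;  // worker to master
    virtual void unpack_results(MessageBuffer& in) = 0;       // on the master
};

// Example task: the work is an interval [lo, hi], the result is its midpoint.
class MidpointTask : public TaskSketch {
public:
    double lo = 0.0, hi = 0.0;  // work to be done
    double mid = 0.0;           // result of that work

    void pack_work(MessageBuffer& out) const override { out.put(lo); out.put(hi); }
    void unpack_work(MessageBuffer& in) override { lo = in.get(); hi = in.get(); }
    void pack_results(MessageBuffer& out) const override { out.put(mid); }
    void unpack_results(MessageBuffer& in) override { mid = in.get(); }
};
```

Each unpack function must read fields in the order its matching pack function wrote them; keeping both in the same class makes that pairing easy to check.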
MW Worker
• Implements the worker program
• Steps
– initialization
– ask master for work
– work
– report results
– repeat until dead
• Must implement these pure virtual functions
– unpack_init_data()
– execute_task()
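The worker steps listed above can be sketched as a simple loop. FakeMaster is a hypothetical in-process stand-in for the master (the real exchange goes through messages), and WorkerSketch, OffsetWorker, go(), and all signatures are assumptions; only unpack_init_data() and execute_task() are names from the slide.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical in-process stand-in for the master's side of the conversation.
struct FakeMaster {
    int init_data = 0;
    std::queue<int> pending;   // work not yet handed out
    std::vector<int> results;  // results reported back

    // Returns false when there is no more work, which ends the worker loop.
    bool request_work(int& work) {
        if (pending.empty()) return false;
        work = pending.front();
        pending.pop();
        return true;
    }
    void report(int result) { results.push_back(result); }
};

// Hypothetical base class; the signatures are assumptions for this sketch.
class WorkerSketch {
public:
    virtual ~WorkerSketch() = default;
    virtual void unpack_init_data(int init) = 0;
    virtual int execute_task(int work) = 0;

    void go(FakeMaster& master) {
        unpack_init_data(master.init_data);          // initialization
        int work = 0;
        while (master.request_work(work)) {          // ask master for work
            const int result = execute_task(work);   // work
            master.report(result);                   // report results
        }                                            // repeat until dead
    }
};

// Example worker: adds an offset received at initialization to each work item.
class OffsetWorker : public WorkerSketch {
public:
    void unpack_init_data(int init) override { offset_ = init; ready_ = true; }
    int execute_task(int work) override {
        assert(ready_);  // init data must arrive before any task runs
        return work + offset_;
    }

private:
    int offset_ = 0;
    bool ready_ = false;
};
```

In this sketch "dead" simply means the master has nothing left to hand out; in a real run the worker could also be removed or suspended from outside.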
[Figure: diagram of the master and its workers. Labels: Master; Wid 1 to Wid 4; Workers W1 to W4; Running: T2 T3 T4 T5; To Do: T6 T7 T8 ...; Global Data]
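One way to picture the Running and To Do lists in the diagram is as two small containers on the master. The sketch below is an assumption-heavy illustration: MasterBook and its methods are hypothetical names, and putting a lost worker's task back at the front of the to-do list is one plausible reading of "fault tolerant", not something the slides spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>

// Hypothetical bookkeeping for the master: a to-do list of task ids and a
// map from worker id to the task that worker is currently running.
class MasterBook {
public:
    void add_task(int task_id) { todo_.push_back(task_id); }

    // Give an idle worker the next to-do task. Returns false if none is left.
    bool assign(int worker_id) {
        assert(running_.count(worker_id) == 0);  // worker must be idle
        if (todo_.empty()) return false;
        running_[worker_id] = todo_.front();
        todo_.pop_front();
        return true;
    }

    // The worker reported its result; it no longer holds a task.
    void complete(int worker_id) { running_.erase(worker_id); }

    // Assumed recovery policy: a lost worker's task goes back to the front
    // of the to-do list so another worker can pick it up.
    void worker_lost(int worker_id) {
        std::map<int, int>::iterator it = running_.find(worker_id);
        if (it == running_.end()) return;  // worker was idle, nothing to recover
        todo_.push_front(it->second);
        running_.erase(it);
    }

    std::size_t todo_count() const { return todo_.size(); }
    std::size_t running_count() const { return running_.size(); }
    int task_of(int worker_id) const { return running_.at(worker_id); }

private:
    std::deque<int> todo_;
    std::map<int, int> running_;
};
```

With tasks 2 through 8 queued and four workers assigned, the state matches the diagram: tasks 2 to 5 running and 6 to 8 waiting.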
Statistics Collected
• Each MWWorkerID instance keeps a record of every “interesting” event, including
– Creation / removal
– Get work / complete work
– Suspended / resumed
• MWStats class gathers statistics from these records
• Now reports:
– Run duration
– Worker total / suspended / working times
– Picture (.gif) of entire run (optional)
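To make the per-worker times concrete, the sketch below derives total, suspended, and working time from a chronological event log. The Event, Record, and WorkerTimes types and the summarize() function are hypothetical, not the MWStats interface, and the rule that suspended time does not count as working time is an assumption of this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical event kinds mirroring the "interesting" events on the slide.
enum class Event { Created, Removed, GotWork, CompletedWork, Suspended, Resumed };

struct Record {
    Event what;
    double time;  // seconds since the start of the run
};

struct WorkerTimes {
    double total = 0.0;      // creation to removal
    double suspended = 0.0;  // time spent suspended
    double working = 0.0;    // time on tasks, excluding suspension (assumption)
};

// Summarize one worker's log. Records must be in chronological order.
WorkerTimes summarize(const std::vector<Record>& log) {
    WorkerTimes t;
    double created = 0.0, suspended_at = 0.0, work_start = 0.0;
    bool working = false;
    double last = log.empty() ? 0.0 : log.front().time;
    for (const Record& r : log) {
        assert(r.time >= last);  // log must be chronological
        last = r.time;
        switch (r.what) {
        case Event::Created:
            created = r.time;
            break;
        case Event::Removed:
            t.total = r.time - created;
            break;
        case Event::GotWork:
            work_start = r.time;
            working = true;
            break;
        case Event::CompletedWork:
            t.working += r.time - work_start;
            working = false;
            break;
        case Event::Suspended:
            suspended_at = r.time;
            if (working) t.working += r.time - work_start;  // pause the work clock
            break;
        case Event::Resumed:
            t.suspended += r.time - suspended_at;
            if (working) work_start = r.time;  // restart the work clock
            break;
        }
    }
    return t;
}
```

Run duration for the whole job could then be taken as the span from the earliest creation to the latest removal across all workers.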
Current Work
– MW Sampling (Golbon Zakeri)
– MW LShaped (Jeff Linderoth)
– MW NLinBB (Jean-Pierre Goux)
– MW Development (Mike Yoder)
Future Work
• ‘If needed’ category
– Heterogeneous set of workers
– Discrete work ‘stages’ (aka ‘Work Steps’)
• ‘Would be nice’ category
– User-level checkpointing of the master
– More?