CAS@home
Wenjing Wu, [email protected]
Computer Center, Institute of High Energy Physics
Chinese Academy of Sciences, Beijing
23/4/20 BOINC workshop 2013 @Grenoble 1
Outline
• CAS@home project
• Applications:
– Lammps: molecular dynamics simulation
– treeThreader: protein structure prediction
• Remote Job Submission
CAS@HOME
• The first and only volunteer computing project in mainland China
• Launched in June 2010, hosted by the Computer Center of IHEP, CAS
• Supports scientific computing from the Chinese Academy of Sciences and other research institutes
• Hosts multiple applications from various research fields, including nanotechnology, bioinformatics, and physics
CAS@home status
Some project statistics, since the launch in June 2010:
• 10K active users, 1/3 of them Chinese
• 23K active hosts
• 7M CPU hours since Nov 2012
• 1.3 TFLOPS (real-time computing power)
• Peak: 1M validated CPU hours per month
• Hosting 3 applications: Lammps, treeThreader, Aevol
• Other ongoing applications: BOSS (VBoxwrapper based)
Application 1: Lammps
• Software for molecular dynamics simulation, widely used by scientists from various research fields.
• Restartable, developed in C++ by an international group, can be compiled on both Windows and Linux with some effort.
• Input/output: 3 mandatory input files (<10MB) / 1 compressed output file (hundreds of MB)
• Running time: 0.5 to 800 hours (it depends on a random number that determines the number of simulation steps)
Problems
• Results are numerical, and discrepancies arise for 2 reasons:
– floating-point calculations differ across platforms
– checkpoints also cause discrepancies, because precision is lost when values are printed to a text file
• Solutions
– Homogeneous Redundancy, or Homogeneous Application Version
• Running problems:
– Some long jobs (~hundreds of hours) crash midway without earning any credit.
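The checkpoint effect can be reproduced in a few lines. The sketch below is illustrative Python, not Lammps code: the 6-significant-digit format and the sample value are assumptions, chosen only to show how a text checkpoint can change a restarted value while a round-trip-safe encoding does not.

```python
import math


def checkpoint_roundtrip(value: float, fmt: str) -> float:
    """Write a value to text with the given format, then read it back."""
    return float(format(value, fmt))


state = math.sqrt(2.0)  # stand-in for one simulation variable (assumed)

lossy = checkpoint_roundtrip(state, ".6g")  # text with 6 significant digits
exact = float(repr(state))                  # repr() round-trips a Python float
hex_exact = float.fromhex(state.hex())      # hexadecimal text is also exact

print("lossy restart equals original:", lossy == state)
print("repr restart equals original:", exact == state)
print("hex restart equals original:", hex_exact == state)
```

A job restarted from the lossy checkpoint continues from a slightly different state, so two replicas of the same work unit can diverge even on identical hardware.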
Application 2: treeThreader
• For protein structure prediction
• Written in C by local scientists, can be compiled easily on both Windows and Linux, restartable
• Computing task: compare a protein sequence file against all existing protein templates.
• Input files: configuration files, a protein sequence file, ~50k protein templates (about 4GB)
• Output files: one text file per template file
• It needs about 42GFLOPS/hour to compare one sequence file against all templates.
Computing task
• On 1 host, a protein sequence is compared against Protein Templates 1 to 50,000
• Each comparison takes 6s
• It takes about 84 hours on a single core
Running it on BOINC
• A protein sequence is compared against sub packages of templates; each sub package is a sticky file on a host
• Each comparison takes 6s, each sub package takes 9000s on a host
• It takes 9000s (2.5 hours) to finish the task
• Host A1: Sub Package 1 (sticky file), Protein Templates 1 to 1500
• Host A2: Sub Package 2 (sticky file), Protein Templates 1501 to 3000
• Host Am: Sub Package 32 (sticky file), Protein Templates 46501 to 48000
• Host An: Sub Packages 14, 15, 16 (sticky files)
• Locality Scheduling (job goes to where the data is)
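The numbers on these two slides can be checked with a short sketch. The constants come from the slides; the function names and the host table are hypothetical, and `pick_host` is only a toy stand-in for the idea of locality scheduling, not the BOINC scheduler itself.

```python
from typing import Dict, List, Optional, Set

SECONDS_PER_COMPARISON = 6    # from the slides
TEMPLATES_PER_PACKAGE = 1500  # from the slides
PACKAGES_PER_BATCH = 32       # from the slides


def package_ranges(n_packages: int, size: int) -> List[range]:
    """1-based template numbers covered by each sub package."""
    return [range(i * size + 1, (i + 1) * size + 1) for i in range(n_packages)]


def runtime_seconds(n_templates: int) -> int:
    """Time to compare one sequence against n_templates templates."""
    return n_templates * SECONDS_PER_COMPARISON


def pick_host(package_id: int, host_files: Dict[str, Set[int]]) -> Optional[str]:
    """Return a host that already stores the sub package, or None."""
    for host, packages in sorted(host_files.items()):
        if package_id in packages:
            return host
    return None


ranges = package_ranges(PACKAGES_PER_BATCH, TEMPLATES_PER_PACKAGE)
hosts = {"A1": {1}, "A2": {2}, "Am": {32}, "An": {14, 15, 16}}  # as drawn on the slide

print("sub package 2 covers", ranges[1][0], "to", ranges[1][-1])
print("hours per sub package:", runtime_seconds(TEMPLATES_PER_PACKAGE) / 3600)
print("single-core hours for 50,000 templates:", runtime_seconds(50_000) / 3600)
print("host for sub package 15:", pick_host(15, hosts))
```

Because each sub package is a sticky file of templates, sending a job to a host that already holds the package avoids downloading that share of the roughly 4GB template set again.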
Problems
• Long tail batches
– A front-end server submits batches and does the pre-processing and post-processing of the sequences, hence it can only maintain/watch a maximum number of active batches (batches in progress) in parallel (300)
– A whole batch is delayed by its slowest job
– No new batches are submitted to the BOINC server because some batches are still “in progress” (waiting for the slowest jobs)
– A lot of hosts end up “starving”
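A small calculation shows why one slow job hurts. This is a simplified model under stated assumptions: every job of a batch runs on its own host, no new work arrives until the batch completes, and the 40-hour straggler is a made-up figure for illustration.

```python
from typing import List


def batch_finish_time(job_hours: List[float]) -> float:
    """A batch is complete only when its slowest job is complete."""
    return max(job_hours)


def idle_fraction(job_hours: List[float]) -> float:
    """Share of host time spent idle while waiting for the slowest job."""
    finish = batch_finish_time(job_hours)
    busy = sum(job_hours)
    return 1.0 - busy / (finish * len(job_hours))


# Hypothetical batch: 31 jobs take the nominal 2.5 hours, one takes 40 hours.
jobs = [2.5] * 31 + [40.0]

print("batch finishes after", batch_finish_time(jobs), "hours")
print("idle share of host time:", round(idle_fraction(jobs), 3))
```

With a cap on the number of active batches, the same effect applies to the whole project: hosts that finished early have nothing to fetch until stragglers release a batch slot.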
Remote Job Submission
• CAS@home hosts multiple applications
• Each application has multiple users
• Application users have no privileges to submit jobs directly via the CAS@home server
• This requires remote job submission, which allows authorized and authenticated users to submit jobs from remote machines.
• Basic remote job submission functions: batch submit/check_status/retire/abort/download results
• BOINC provides quite a rich set of APIs for remote batch (a set of jobs based on the same input files) operations, but each application still needs its own server-side CGI code and client-side code for remote job submission
– Some operations (batch retire/abort/status check) are generic and can directly use the BOINC API
– Other operations, such as batch submission and results downloading, are application specific and need to be customized.
– Can add fancy functions such as “test running” and “estimate running time”
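The split between generic and application-specific operations can be pictured as follows. Everything in this sketch is hypothetical (class names, method names, the in-memory store); it does not call or imitate the BOINC API, and only illustrates which parts an application would have to supply.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BatchApp(ABC):
    """Application-specific part: how jobs are made and results gathered."""

    @abstractmethod
    def make_jobs(self, inputs: List[str]) -> List[str]: ...

    @abstractmethod
    def collect_results(self, jobs: List[str]) -> str: ...


class BatchService:
    """Generic part: status, abort and retire are the same for every app."""

    def __init__(self) -> None:
        self._batches: Dict[int, dict] = {}
        self._next_id = 1

    def submit(self, app: BatchApp, inputs: List[str]) -> int:
        batch_id = self._next_id
        self._next_id += 1
        self._batches[batch_id] = {
            "app": app,
            "jobs": app.make_jobs(inputs),
            "state": "in process",
        }
        return batch_id

    def status(self, batch_id: int) -> str:
        return self._batches[batch_id]["state"]

    def abort(self, batch_id: int) -> None:
        self._batches[batch_id]["state"] = "aborted"

    def retire(self, batch_id: int) -> None:
        self._batches[batch_id]["state"] = "retired"

    def download(self, batch_id: int) -> str:
        batch = self._batches[batch_id]
        return batch["app"].collect_results(batch["jobs"])


class EchoApp(BatchApp):
    """Toy application: one job per input file, results joined as text."""

    def make_jobs(self, inputs: List[str]) -> List[str]:
        return [f"job:{name}" for name in inputs]

    def collect_results(self, jobs: List[str]) -> str:
        return ",".join(jobs)


service = BatchService()
batch_id = service.submit(EchoApp(), ["a.in", "b.in"])
print(service.status(batch_id))
print(service.download(batch_id))
service.retire(batch_id)
print(service.status(batch_id))
```

In this picture, adding a new application means writing one `BatchApp` subclass, while status checks, abort and retire need no new code.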
Lammps Job Submission
• Jobs are created in batches.
• A batch = 1 set of input files + different parameter-value pairs
• A batch comprises hundreds to thousands of jobs
• Remote Job Submission: batches are submitted through a web portal by authenticated and authorized users
• Authenticated and authorized users can “operate” the batches through the web portal (retire, abort, check status, download results)
Batch A (input file 1, input file 2)
Job 1: Ka1=Va1 Kb1=Vb1
Job 2: Ka2=Va2 Kb2=Vb2
…
Job N: KaN=VaN KbN=VbN
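A batch of this shape, one shared set of input files plus one parameter-value list per job, can be expanded mechanically. The sketch below is a minimal illustration; the function name, the job naming scheme, and the sample parameter names are assumptions, not the portal's actual code.

```python
from typing import Dict, List, Sequence


def expand_batch(
    input_files: Sequence[str],
    param_names: Sequence[str],
    value_lists: Sequence[Sequence[str]],
) -> List[Dict]:
    """Create one job description per value list, all sharing the input files."""
    jobs = []
    for n, values in enumerate(value_lists, start=1):
        if len(values) != len(param_names):
            raise ValueError(
                f"job {n}: expected {len(param_names)} values, got {len(values)}"
            )
        jobs.append(
            {
                "name": f"job_{n}",
                "input_files": list(input_files),
                "params": dict(zip(param_names, values)),
            }
        )
    return jobs


# Hypothetical example: two parameters, three jobs.
batch = expand_batch(
    ["input_file_1", "input_file_2"],
    ["Ka", "Kb"],
    [["1.0", "300"], ["1.5", "300"], ["2.0", "350"]],
)
for job in batch:
    print(job["name"], job["params"])
```

Checking the length of each value list up front rejects a malformed batch before any job is created.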
Figure: LAMMPS remote job submission. The CAS User Interface (Web Portal) offers File Sandbox, Test a Job, Submit a Batch, Check Batch Status, and Get Output. Each function talks over HTTP to the CAS@home server, which runs the LAMMPS CGI and the File Sandbox Service. A batch is described as Job1: Para List, Value List1 … JobN: Para List, Value ListN. Testing a job covers syntax check, GFLOPS and output size estimation; the batch is submitted once the job passes the test.
Figure: LAMMPS CGI on the CAS@home server. The user uploads File1 and File2 to the Sandbox, tests a job with chosen input files (Job Tester), and submits a batch (Batch Creator) over HTTP. The Batch Monitor / Job Monitor follows the jobs running on the volunteer hosts. Batch Operations: abort/retire a batch, zip results, download results.
BOINC Sandbox
• The same file cannot be uploaded twice
• Files used by a running batch cannot be deleted
Lammps Job Testing
• Test the job on the server
• Submit the batch
• Lammps specific!
Batch Monitoring
Admin can see the status of all batches
Batch status: In process, Completed, Aborted, Retired
Admin all batches
Job Status
Input files associated with this job
Results can be downloaded individually
Batch Operations
Download results of this batch
Retire a batch
Download results of a work unit
An unfinished batch can be aborted here
TreeThreader job submission
• Jobs are created in batches: 1 protein sequence corresponds to 1 batch (32 jobs)
• Remote Job Submission:
– Client side: a set of PHP APIs allows authenticated and authorized users to submit batches and operate on them (check status, retire, abort, get output) remotely
– Server side:
  • Generic operations such as batch abort/retire/status check are already included in the BOINC code
  • Operations such as batch submission and results downloading are application specific and are implemented in a CGI program on the server side
TreeThreader Job Submission CGI
• Batch submission
– Takes the sequence and configuration files uploaded by the client
– Creates a batch of jobs based on the input files and all the template files, which are already stored on the server side
– Returns a batch ID
• Batch result downloading
– Uncompresses all output files of the batch
– Puts the uncompressed output files into the same directory and compresses it
– Returns the download URL of the batch result file
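The result-downloading steps can be sketched as below. The slides do not say which compression format the output files use, so this example assumes each output is a gzip file and packs the merged directory as a tar.gz; the function name and file names are hypothetical, and building the download URL is left out.

```python
import gzip
import shutil
import tarfile
import tempfile
from pathlib import Path
from typing import Iterable


def merge_batch_results(outputs: Iterable[Path], work_dir: Path, batch_id: int) -> Path:
    """Uncompress every output into one directory, then compress that directory."""
    merged_dir = work_dir / f"batch_{batch_id}"
    merged_dir.mkdir(parents=True, exist_ok=True)
    for src in outputs:
        target = merged_dir / src.stem  # "x.out.gz" becomes "x.out"
        with gzip.open(src, "rb") as fin, open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    archive = work_dir / f"batch_{batch_id}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(merged_dir, arcname=merged_dir.name)
    return archive


# Demonstration with two made-up output files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    outputs = []
    for i in (1, 2):
        path = tmp_path / f"template_{i}.out.gz"
        with gzip.open(path, "wt") as f:
            f.write(f"score for template {i}\n")
        outputs.append(path)
    archive = merge_batch_results(outputs, tmp_path, batch_id=7)
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())

print(names)
```

Merging on the server means the client fetches a single file per batch instead of one file per job.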
TreeThreader Job Submission
Figure: ICT Web Services call the API (Submit a sequence, Status Check, Get Output) of the TreeThreader CGI on CAS@home. A sequence goes in, jobs are created against template packages P1 to P32, and merged results are returned.
Thoughts on a more generic Job submission interface
• The server side still requires application-specific functions for batch creation, result merging, testing, and estimation
• On the client side, the job submission and results downloading functions can be generalized
• Use an XML file on the client side to describe the input files and their types
<jobdesc>
  <file_info>
    <number>0</number>
    <type>upload</type>  <!-- file needs to be uploaded to BOINC server -->
  </file_info>
  <file_info>
    <number>1</number>
    <type>online</type>  <!-- file already stored on BOINC server -->
  </file_info>
  <file_ref>
    <file_number>0</file_number>
    <open_name>MySEQ.tar.gz</open_name>
  </file_ref>
  <file_ref>
    <file_number>1</file_number>
    <open_name>Templates</open_name>
  </file_ref>
</jobdesc>
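A generic client could read such a description to decide which files to send before submitting. The sketch below is one possible reading, not part of the proposal itself: the tag is written `file_info` so that the text is well-formed XML, and the function name and returned structure are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

JOBDESC = """
<jobdesc>
  <file_info><number>0</number><type>upload</type></file_info>
  <file_info><number>1</number><type>online</type></file_info>
  <file_ref><file_number>0</file_number><open_name>MySEQ.tar.gz</open_name></file_ref>
  <file_ref><file_number>1</file_number><open_name>Templates</open_name></file_ref>
</jobdesc>
"""


def parse_jobdesc(xml_text: str) -> List[Dict]:
    """Join each file_ref with the type declared in the matching file_info."""
    root = ET.fromstring(xml_text.strip())
    types = {
        int(info.findtext("number").strip()): info.findtext("type").strip()
        for info in root.findall("file_info")
    }
    files = []
    for ref in root.findall("file_ref"):
        number = int(ref.findtext("file_number").strip())
        files.append(
            {
                "number": number,
                "open_name": ref.findtext("open_name").strip(),
                "type": types[number],
            }
        )
    return files


files = parse_jobdesc(JOBDESC)
to_upload = [f["open_name"] for f in files if f["type"] == "upload"]
print("files to upload:", to_upload)
```

Files marked `online` are skipped by the client, since they are already stored on the BOINC server.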
The End!