
Page 1:

CAS@home

Wenjing Wu
wuwj@ihep.ac.cn
Computer Center, Institute of High Energy Physics
Chinese Academy of Sciences, Beijing

BOINC workshop 2013 @Grenoble

Page 2: Outline

• CAS@home project
• Applications:
– LAMMPS: molecular dynamics simulation
– treeThreader: protein structure prediction
• Remote job submission

Page 3: CAS@home

• The first and only volunteer computing project in mainland China
• Launched in June 2010, hosted by the Computer Center of IHEP, CAS
• Supports scientific computing for the Chinese Academy of Sciences and other research institutes
• Hosts multiple applications from various research fields, including nanotechnology, bioinformatics, and physics

Page 4: CAS@home status

Since the launch in June 2010:

• 10K active users, 1/3 of them Chinese
• 23K active hosts
• 7M CPU hours since Nov 2012
• 1.3 TFLOPS (real-time computing power)
• Peak: 1M validated CPU hours per month
• Hosting 3 applications: LAMMPS, treeThreader, Aevol
• Other ongoing applications: BOSS (VBoxwrapper based)

Page 5: Some project statistics

Page 6: Application 1: LAMMPS

• Software for molecular dynamics simulation, widely used by scientists in various research fields.
• Restartable, developed in C++ by an international group, can be compiled on both Windows and Linux with some effort.
• Input/output: 3 mandatory input files (<10 MB) / 1 compressed output file (hundreds of MB)
• Running time: 0.5 hours to 800 hours (it depends on a random number that determines the number of simulation steps)

Page 7: Problems

• Results are numerical, and they show discrepancies between hosts for 2 reasons:
– floating-point calculations differ across platforms
– checkpoints also cause discrepancies, because precision is lost when values are printed to a text file
• Solution:
– Homogeneous Redundancy, or Homogeneous Application Version
• Running problems:
– Some long jobs (hundreds of hours) crash midway without getting any credit.
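The checkpoint discrepancy can be illustrated with a minimal Python sketch. This is not LAMMPS code; the value and the 6-decimal format are assumptions chosen only for illustration. A double written to text with too few digits is read back as a different value at restart, while 17 significant digits are enough to round-trip an IEEE 754 double.

```python
# Illustrative only: how a text checkpoint can lose floating-point precision.
x = 0.1 + 0.2                      # a value produced mid-simulation (hypothetical)

short_text = f"{x:.6f}"            # checkpoint written with 6 decimal places
restored_short = float(short_text)

full_text = f"{x:.17g}"            # 17 significant digits round-trip a double
restored_full = float(full_text)

lost_precision = restored_short != x   # True: the restart sees a different value
round_trip_ok = restored_full == x     # True: the restart sees the same value
print(short_text, lost_precision, round_trip_ok)
```

Even with an exact checkpoint format, results from different platforms can still differ, which is why the slide pairs this problem with homogeneous redundancy.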

Page 8: Application 2: treeThreader

• For protein structure prediction
• Written in C by local scientists, restartable, can be compiled easily on both Windows and Linux
• Computing task: compare a protein sequence file against all existing protein templates.
• Input files: configuration files, a protein sequence file, ~50k protein templates (about 4 GB)
• Output files: one text file per template file
• It needs about 42 GFLOPS/hour to compare one sequence file against all templates.

Page 9: Computing task

Diagram: on 1 host, a protein sequence is compared against Protein Template 1, Protein Template 2, Protein Template 3, ..., Protein Template 50,000.

• Each comparison takes 6 s
• It takes about 84 hours on a single core
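The single-core estimate follows directly from the two numbers on this slide. A short Python check, using only those figures, gives roughly 83 hours, consistent with the "about 84 hours" quoted above.

```python
# Numbers taken from the slide.
TEMPLATES = 50_000            # protein templates to compare against
SECONDS_PER_COMPARISON = 6    # one sequence-vs-template comparison

total_seconds = TEMPLATES * SECONDS_PER_COMPARISON
total_hours = total_seconds / 3600
print(f"{total_seconds} s = {total_hours:.1f} hours on a single core")
```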

Page 10: Running it on BOINC

Diagram: a protein sequence is sent to hosts that each hold sub packages of templates as sticky files.

• Host A1: Sub Package 1 (sticky file), Protein Templates 1 to 1500
• Host A2: Sub Package 2 (sticky file), Protein Templates 1501 to 3000
• ...
• Host Am: Sub Package 32 (sticky file), Protein Templates 46501 to 48000
• Host An: Sub Packages 14, 15, and 16 (sticky files)

• Each comparison takes 6 s, so each sub package takes 9000 s on a host
• It takes 9000 s (2.5 hours) to finish the task
• Locality scheduling: the job goes to where the data is
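The split into sub packages and the idea of sending a job to a host that already holds the data can be sketched as follows. This is a hypothetical illustration, not the BOINC scheduler: the function names and the host inventory are assumptions, and only the package size, the 32 sub packages, and the 6 s per comparison come from the slides.

```python
# Hypothetical sketch: split templates into sub packages and route jobs by locality.
PACKAGE_SIZE = 1500           # templates per sub package (from the slide)
TOTAL_TEMPLATES = 48_000      # 32 sub packages x 1500 templates (from the slide)
SECONDS_PER_COMPARISON = 6

def make_packages(total, size):
    """Return {package_id: (first_template, last_template)}, 1-based and inclusive."""
    count = (total + size - 1) // size
    return {
        i + 1: (i * size + 1, min((i + 1) * size, total))
        for i in range(count)
    }

def pick_host(package_id, sticky_files):
    """Return a host that already holds the package as a sticky file, else None."""
    for host, held in sticky_files.items():
        if package_id in held:
            return host
    return None

packages = make_packages(TOTAL_TEMPLATES, PACKAGE_SIZE)
seconds_per_job = PACKAGE_SIZE * SECONDS_PER_COMPARISON

# Hypothetical host inventory, mirroring the diagram.
sticky_files = {"A1": {1}, "A2": {2}, "Am": {32}, "An": {14, 15, 16}}
print(len(packages), packages[32], seconds_per_job, pick_host(15, sticky_files))
```

A job for a sub package that no host holds yet returns None here; in that case the templates would first have to be downloaded to some host.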

Page 11: Problems

• Long-tail batches
– A front-end server submits the batches and does the pre-processing and post-processing of the sequences, so it can only maintain/watch a limited number of active batches (batches in progress) in parallel (at most 300)
– A whole batch is delayed by its slowest job
– No new batches are submitted to the BOINC server because some batches are still "in progress" (waiting for their slowest jobs)
– A lot of hosts end up "starving"
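The long-tail effect can be made concrete with a small Python sketch. The job run times below are invented for illustration; only the limit of 300 active batches comes from the slide. A batch finishes when its slowest job finishes, so one slow job keeps a batch slot occupied long after the typical job is done.

```python
# Hypothetical run times (hours) for one 32-job batch: most are fast, one is slow.
job_hours = [2.5] * 31 + [40.0]

batch_done_after = max(job_hours)                      # batch ends with its slowest job
typical_job = sorted(job_hours)[len(job_hours) // 2]   # median job time

MAX_ACTIVE_BATCHES = 300   # front-end limit from the slide

def can_submit(active_batches, limit=MAX_ACTIVE_BATCHES):
    """The front end only submits a new batch while it is below its limit."""
    return active_batches < limit

print(batch_done_after, typical_job, can_submit(300))
```

With all 300 slots held by batches that are waiting for a single slow job each, no new work reaches the server and idle hosts starve.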

Page 12: Remote Job Submission

• CAS@home hosts multiple applications
• Each application has multiple users
• Application users have no privileges to submit jobs directly on the CAS@home server
• This requires remote job submission, which allows authenticated and authorized users to submit jobs from remote machines.
• Basic remote job submission functions: batch submit / check status / retire / abort / download results
• BOINC provides a fairly rich set of APIs for remote batch operations (a batch is a set of jobs based on the same input files), but each application still needs its own server-side CGI code and client-side code for remote job submission
– Some operations (batch retire/abort/status check) are generic and can use the BOINC API directly
– Other operations, such as batch submit and results downloading, are application specific and need to be customized.
– Extra functions such as "test running" and "estimate running time" can be added

Page 13: LAMMPS Job Submission

• Jobs are created in batches.
• A batch = 1 set of input files + different parameter-value pairs
• A batch comprises hundreds to thousands of jobs
• Remote job submission: batches are submitted through a web portal by authenticated and authorized users
• Authenticated and authorized users can "operate" on the batches through the web portal (retire, abort, check status, download results)

Batch A (input file 1, input file 2)
Job 1: Ka1=Va1 Kb1=Vb1
Job 2: Ka2=Va2 Kb2=Vb2
...
Job N: KaN=VaN KbN=VbN
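The batch layout above (one shared set of input files, one set of parameter-value pairs per job) can be sketched in Python. The data structure, function names, file names, and parameter values are all hypothetical; the actual portal and CGI code are not shown in the slides.

```python
def build_batch(name, input_files, param_sets):
    """Return a batch: one job per parameter set, all sharing the same input files."""
    return {
        "name": name,
        "input_files": list(input_files),
        "jobs": [
            {"job_id": i, "params": dict(params)}
            for i, params in enumerate(param_sets, start=1)
        ],
    }

def job_parameters(job):
    """Render a job's parameters as 'key=value' pairs, as drawn on the slide."""
    return " ".join(f"{key}={value}" for key, value in job["params"].items())

# Hypothetical file names and parameter values.
batch = build_batch(
    "Batch A",
    ["input_file_1", "input_file_2"],
    [{"Ka": 1, "Kb": 10}, {"Ka": 2, "Kb": 20}, {"Ka": 3, "Kb": 30}],
)
print(len(batch["jobs"]), job_parameters(batch["jobs"][1]))
```

Keeping the input files at batch level rather than per job mirrors the slide's definition and avoids repeating the same file list hundreds or thousands of times.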

Page 14: LAMMPS

Diagram components:

• CAS User Interface: File Sandbox, Test a Job, Submit a Batch, Check Batch Status, Get Output
• CAS@home: LAMMPS CGI, File Sandbox Service
• Jobs of a batch:
Job1: Para List, Value List1
Job2: Para List, Value List2
Job3: Para List, Value List3
...
JobN: Para List, Value ListN

Page 15:

Diagram components (connected over http):

• User, through the Web Portal: test a job with chosen input files; once the job passes the test, submit a batch
• Sandbox: File1, File2
• LAMMPS CGI on CAS@home server:
– Job Tester: syntax check, GFLOPS, output size estimation
– Batch Creator
– Batch Monitor / Job Monitor
– Batch Operations: abort/retire a batch, download results, zip results
• Volunteer Hosts

Page 16: BOINC Sandbox

• A file cannot be uploaded twice
• Files used by a running batch cannot be deleted
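The two sandbox rules can be modelled with a minimal in-memory Python sketch. The class and method names are hypothetical and do not correspond to the BOINC sandbox code; the sketch only shows the behaviour described on the slide.

```python
class Sandbox:
    """Hypothetical in-memory model of the two sandbox rules on the slide."""

    def __init__(self):
        self.files = {}        # file name -> content
        self.in_use = set()    # names of files used by a running batch

    def upload(self, name, content):
        """Store a new file; uploading an existing name is rejected."""
        if name in self.files:
            raise ValueError(f"{name} is already in the sandbox")
        self.files[name] = content

    def mark_in_use(self, name):
        """Record that a running batch uses this file."""
        if name not in self.files:
            raise KeyError(name)
        self.in_use.add(name)

    def release(self, name):
        """Record that no running batch uses this file any more."""
        self.in_use.discard(name)

    def delete(self, name):
        """Remove a file unless a running batch still uses it."""
        if name in self.in_use:
            raise PermissionError(f"{name} is used by a running batch")
        del self.files[name]


sandbox = Sandbox()
sandbox.upload("File1", b"data")
sandbox.mark_in_use("File1")
```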

Page 17: LAMMPS Job Testing

• Test the job on the server
• Submit the batch
• LAMMPS specific!

Page 18: Batch Monitoring

• Admin can see the status of all batches
• Batch status: In progress, Completed, Aborted, Retired

Page 19: Admin: all batches

Page 20: Job Status

• Input files associated with this job
• Results can be downloaded individually

Page 21: Batch Operations

• Download the results of this batch
• Retire a batch
• Download the results of a work unit
• An unfinished batch can be aborted here

Page 22: treeThreader Job Submission

• Jobs are created in batches: 1 protein sequence corresponds to 1 batch (32 jobs)
• Remote job submission:
– Client side: a set of PHP APIs allows authenticated and authorized users to submit batches and operate on them remotely (check status, retire, abort, get output)
– Server side:
  • Generic operations such as batch abort/retire/status check are already included in the BOINC code
  • Operations such as batch submission and results downloading are application specific and are implemented in a CGI program on the server side

Page 23: treeThreader Job Submission CGI

• Batch submission
– Takes the sequence and configuration files uploaded by the client
– Creates a batch of jobs based on the input files and all template files, which are already stored on the server side
– Returns a batch ID
• Batch result downloading
– Uncompresses all output files of the batch
– Puts the uncompressed output files into a single directory and compresses it
– Returns the download URL of the batch result file
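The result-downloading steps can be sketched in Python with the standard `tarfile` module. This is an illustration under assumptions, not the project's CGI program: it assumes each job's output is a gzip-compressed tar archive, the function name and the `batch_results` directory name are hypothetical, and it returns a local path rather than a download URL.

```python
import tarfile
import tempfile
from pathlib import Path

def merge_batch_results(result_archives, merged_archive):
    """Unpack every per-job archive into one directory, then pack that directory."""
    merged_archive = Path(merged_archive)
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp) / "batch_results"
        out_dir.mkdir()
        for archive in result_archives:
            with tarfile.open(archive, "r:gz") as tar:
                for member in tar.getmembers():
                    if not member.isfile():
                        continue
                    # Keep only the base name so all outputs land in one directory
                    # and archive members cannot write outside it.
                    target = out_dir / Path(member.name).name
                    target.write_bytes(tar.extractfile(member).read())
        # Compress the directory holding all uncompressed outputs.
        with tarfile.open(merged_archive, "w:gz") as tar:
            tar.add(out_dir, arcname=out_dir.name)
    return merged_archive
```

A caller would pass the list of per-job output archives of one batch and then map the returned path to a URL that the client can download.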

Page 24: treeThreader Job Submission

Diagram components:

• ICT Web Services, through the API: submit a sequence, status check, get output
• treeThreader CGI on CAS@home: receives the sequence, returns the merged results
• Template P1, Template P2, Template P3, Template P4, ..., Template P32

Page 25: Thoughts on a more generic job submission interface

• The server side still requires application-specific functions to create batches, merge results, and do testing and estimation
• On the client side, the job submission and results downloading functions can be generalized
• Use an XML file, provided by the client side, to describe the input files and their types

Page 26:

<jobdesc>
  <file_info>
    <number>0</number>
    <type>upload</type>   <!-- file needs to be uploaded to BOINC server -->
  </file_info>
  <file_info>
    <number>1</number>
    <type>online</type>   <!-- file already stored on BOINC server -->
  </file_info>
  <file_ref>
    <file_number>0</file_number>
    <open_name>MySEQ.tar.gz</open_name>
  </file_ref>
  <file_ref>
    <file_number>1</file_number>
    <open_name>Templates</open_name>
  </file_ref>
</jobdesc>
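A client could read such a job description with Python's standard `xml.etree.ElementTree` module. The sketch below is hypothetical: it assumes the file entries use the tag name `file_info` (an XML tag name cannot contain a space), and the function name is invented for illustration. It joins each `file_ref` to its `file_info` entry to find out which files must be uploaded.

```python
import xml.etree.ElementTree as ET

# Job description from the slide, assuming the tag is spelled file_info.
JOBDESC = """
<jobdesc>
  <file_info><number>0</number><type>upload</type></file_info>
  <file_info><number>1</number><type>online</type></file_info>
  <file_ref><file_number>0</file_number><open_name>MySEQ.tar.gz</open_name></file_ref>
  <file_ref><file_number>1</file_number><open_name>Templates</open_name></file_ref>
</jobdesc>
"""

def parse_jobdesc(xml_text):
    """Return {open_name: type} by joining file_ref entries to file_info entries."""
    root = ET.fromstring(xml_text)
    types = {
        int(info.findtext("number")): info.findtext("type").strip()
        for info in root.findall("file_info")
    }
    return {
        ref.findtext("open_name").strip(): types[int(ref.findtext("file_number"))]
        for ref in root.findall("file_ref")
    }

files = parse_jobdesc(JOBDESC)
to_upload = [name for name, kind in files.items() if kind == "upload"]
print(files, to_upload)
```

With this description, only `MySEQ.tar.gz` would be sent by the client, while `Templates` refers to data already on the server.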

Page 27: The End!