Page 1: AliEn  v2-20

Experiment Support
CERN IT Department
CH-1211 Geneva 23
Switzerland
www.cern.ch/it

A. Abramyan, L. Betev, D. Goyal, A. Grigoras, C. Grigoras, M. Litmaath, N. Manukyan, M. Martinez, J. Porter, P. Saiz, S. Sankar, S. Schreiner

AliEn v2-20

Page 2: AliEn  v2-20

5 Oct 2012 Pablo Saiz ALICE offline week

Content

• New features on v2.20
  – TaskQueue
  – Catalogue
  – Service communication
• Deployment
• Summary

Page 3: AliEn  v2-20

Database Layout

• Single DB
• InnoDB tables
  – Row locking
  – Foreign keys
  – Transactions
    • not used…
• Lookup tables
• 2 JDLs per job
• JDL fields mapped to columns
• Link to full graph

Page 4: AliEn  v2-20

Brokering

• Avoid ClassAd matching
  – Fewer fields to parse
• Match in a single SQL statement
• Two attempts at matching:
  – With packages already installed
  – With any packages
  – (Add a third attempt with remote data??)
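
The two matching attempts can be sketched as follows. This is only an illustration, not the AliEn schema: the `queue` table, its columns, the site names and the package names are all hypothetical, and SQLite stands in for the production database. Each attempt is one SQL statement, with no ClassAd evaluation involved.

```python
import sqlite3

def match_job(conn, site, installed_packages):
    """Return the id of a WAITING job that can run at `site`, or None.

    Attempt 1 only accepts jobs whose package is already installed.
    Attempt 2 accepts a job with any package.
    (Table and column names are hypothetical.)
    """
    base = ("SELECT job_id FROM queue "
            "WHERE status = 'WAITING' AND (site = ? OR site IS NULL) ")
    if installed_packages:
        marks = ",".join("?" for _ in installed_packages)
        row = conn.execute(
            base + f"AND package IN ({marks}) ORDER BY job_id LIMIT 1",
            [site, *installed_packages],
        ).fetchone()
        if row:
            return row[0]
    row = conn.execute(base + "ORDER BY job_id LIMIT 1", [site]).fetchone()
    return row[0] if row else None

# Hypothetical queue content; site IS NULL means "can run anywhere".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (job_id INTEGER PRIMARY KEY, status TEXT, "
    "site TEXT, package TEXT)"
)
conn.executemany("INSERT INTO queue VALUES (?, ?, ?, ?)", [
    (1, "WAITING", "SiteA", "pkgB"),
    (2, "WAITING", None, "pkgA"),
    (3, "RUNNING", "SiteA", "pkgA"),
])
```

With this content, a site that already has `pkgA` installed gets job 2 on the first attempt, while a site with nothing installed falls through to the second attempt and gets job 1.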

Page 5: AliEn  v2-20

File brokering

[Diagram: File 1 to File 5 with replicas spread over Site A, Site B and Site C]

• Current schema: submit 4 jobs
  – File 1, File 4
  – File 2
  – File 3
  – File 5
• Broker per file: submit 3 empty subjobs
  – When a job starts, analyze as much as possible: File 1, 2, 4, 5
  – Next job: File 3
  – If nothing left, just exit
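
A minimal sketch of the per-file brokering idea: subjobs are submitted empty, and each one claims files only when it starts at a site. The replica layout below is an assumption made up for the example (the layout drawn on the slide is not recoverable from the transcript), and `claim_files` is a hypothetical helper, not an AliEn function.

```python
def claim_files(site, replicas, done):
    """Claim every not-yet-analyzed file that has a replica at `site`.

    `replicas` maps file name -> set of sites holding a copy.
    `done` is the shared set of files already claimed by earlier subjobs.
    """
    claimed = [f for f in sorted(replicas)
               if f not in done and site in replicas[f]]
    done.update(claimed)
    return claimed

# Hypothetical replica layout, chosen to reproduce the outcome on the slide.
replicas = {
    "File1": {"SiteA", "SiteB"},
    "File2": {"SiteA"},
    "File3": {"SiteC"},
    "File4": {"SiteA", "SiteB"},
    "File5": {"SiteA", "SiteC"},
}
done = set()
first = claim_files("SiteA", replicas, done)   # takes as much as possible
second = claim_files("SiteC", replicas, done)  # takes what is left there
third = claim_files("SiteB", replicas, done)   # nothing left: just exit
```

Under this layout the first subjob analyzes files 1, 2, 4 and 5, the second one gets file 3, and the third finds nothing and exits.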

Page 6: AliEn  v2-20

More TaskQueue

• MaxWaitingTime: amount of time that a job can stay in ‘WAITING’
  – If time exceeded, job ends up in error
  – New state: ERROR_EW (Expired Waiting)
• Retrial:
  – Number of times that a single job can be resubmitted
  – Resubmission done by central services
• Reusing JobId in resubmission
• Direct removal of KILLED jobs
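
The waiting-time limit and the retry counter could look roughly like this. It is a sketch under assumptions: the `Job` fields and both helper functions are hypothetical, and whether an expired job is itself eligible for resubmission is not stated on the slide (here it is, purely for illustration).

```python
from dataclasses import dataclass

@dataclass
class Job:
    job_id: int
    status: str
    waiting_since: float      # seconds, same clock as `now`
    max_waiting_time: float   # seconds the job may stay in WAITING
    retries_left: int = 0     # how many resubmissions are still allowed

def expire_waiting(job, now):
    """Move a job that waited too long to ERROR_EW (Expired Waiting)."""
    if job.status == "WAITING" and now - job.waiting_since > job.max_waiting_time:
        job.status = "ERROR_EW"
    return job

def resubmit(job, now):
    """Resubmit a failed job under the same job_id; return True if done."""
    if job.status.startswith("ERROR") and job.retries_left > 0:
        job.retries_left -= 1
        job.status = "WAITING"
        job.waiting_since = now
        return True
    return False

job = Job(job_id=7, status="WAITING", waiting_since=0.0,
          max_waiting_time=3600.0, retries_left=1)
expire_waiting(job, now=3601.0)            # waited too long
resubmitted = resubmit(job, now=3700.0)    # one retry available
```

The job keeps its identifier across the resubmission, which mirrors the "Reusing JobId" bullet.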

Page 7: AliEn  v2-20

Some results…

• DB time to insert a job and change its status 8 times:

Time to process all 230M ALICE jobs: 4.8 days
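
As a rough check of what that total means per job, the figures can be divided out. This assumes the jobs are processed one after another at a uniform rate, which the slide does not state.

```python
jobs = 230_000_000          # all ALICE jobs, from the slide
days = 4.8                  # total processing time, from the slide

seconds = days * 24 * 3600
ms_per_job = seconds / jobs * 1000   # database time per job (insert + status changes)
jobs_per_second = jobs / seconds

print(f"{ms_per_job:.2f} ms per job, {jobs_per_second:.0f} jobs/s")
```

Under that assumption, this works out to about 1.8 ms of database time per job, or roughly 555 jobs per second.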

Page 8: AliEn  v2-20

Service communication

• Replacing SOAP with JSON
  – Less overhead (no XML encoding)
  – Easier to interact with other clients
  – And even from a web browser
• Backward incompatible change
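
To illustrate the encoding overhead, the same call can be serialized both ways. The method name `changeStatus`, its parameters and the JSON message layout are invented for this example and are not the AliEn wire format. Only the SOAP 1.1 envelope namespace is standard.

```python
import json
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace

def soap_call(method, params):
    """Wrap a call in a minimal SOAP envelope (no headers, no type info)."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, method)
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(env, encoding="unicode")

def json_call(method, params):
    """Encode the same call as a small JSON object (hypothetical layout)."""
    return json.dumps({"method": method, "params": params})

params = {"jobId": 12345, "status": "RUNNING"}  # hypothetical parameters
soap_msg = soap_call("changeStatus", params)
json_msg = json_call("changeStatus", params)
print(len(soap_msg), len(json_msg))
```

Even with this stripped-down envelope the JSON message is shorter, and it decodes directly into native types with `json.loads`, which is what makes it easy to consume from other clients or a browser.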

Page 9: AliEn  v2-20

SOAP vs JSON

• Apache web server
• 32 hosts for clients
  – 16 cores
  – 8000 calls per client

Page 10: AliEn  v2-20

Catalogue

• InnoDB tables
  – Row locking
  – Transactions
  – Foreign keys
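
What transactions and foreign keys buy for a catalogue can be shown with a small stand-in. SQLite is used here instead of MySQL/InnoDB so the example runs anywhere, and the `dirs` / `files` tables are hypothetical, not the AliEn catalogue layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE dirs (dir_id INTEGER PRIMARY KEY, path TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE files (file_id INTEGER PRIMARY KEY, name TEXT, "
    "dir_id INTEGER NOT NULL REFERENCES dirs(dir_id))"
)
conn.execute("INSERT INTO dirs VALUES (1, '/alice')")
conn.commit()

def add_files(conn, entries):
    """Insert (name, dir_id) pairs atomically: all of them or none."""
    try:
        with conn:  # commits on success, rolls back on any exception
            for name, dir_id in entries:
                conn.execute(
                    "INSERT INTO files (name, dir_id) VALUES (?, ?)",
                    (name, dir_id),
                )
        return True
    except sqlite3.IntegrityError:
        return False

def count_files(conn):
    return conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]

ok = add_files(conn, [("a.root", 1), ("b.root", 1)])
# Second entry points to a directory that does not exist: the foreign key
# rejects it and the transaction rolls back the first entry as well.
bad = add_files(conn, [("c.root", 1), ("d.root", 99)])
```

The failed batch leaves no partial entries behind, which is the property the catalogue gains from transactional tables.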

Page 11: AliEn  v2-20

Deployment

• All the features already deployed on ALICE_TEST
• Instead of one single big-bang release, divide it in three:
  – TaskQueue
  – JSON
  – Catalogue
• Reduces amount of downtime
  – Increases complexity of deployment…

Page 12: AliEn  v2-20

Central Services

[Diagram: catalogue, TaskQueue, Transfers, LDAP; Central Services; Api (×4); aliensh; vobox; ROOT; JA; BACKUP]

• 3 machines (+1 slave, backups): AliEn v2-17
• 12 machines: AliEn v2-19**, v2-17
• 8 machines: AliEn v2-19**
• 80 sites: AliEn v2-19.(80-163)
• 40.000 wn: AliEn v2-19.(80-163)

Page 13: AliEn  v2-20

Deployment of TaskQueue

• Only needed on the central services
• Database migration of 1 hour (24 GB)
• Already done!
  – Monday, 1st Oct
    • Downtime of 12 hours
• Method:
  – Install new version
  – Stop services
  – Convert DB
  – Start services

Page 14: AliEn  v2-20

Deployment of JSON

• Full deployment
  – Once Central Services updated, old installation won’t be able to connect
• No database migration
• Plan:
  – Install new version everywhere
  – Stop all services
  – Restart everything with new version
• When:
  – ?

Page 15: AliEn  v2-20

Deployment of catalogue

• Only needed on central services
• Very delicate operation
• Database migration of 24 hours
  – 430 GB, 290 big tables
• Plan:
  – Prepare a hybrid version
  – Install v2-20 and hybrid
  – Restart services with hybrid
  – Convert DB
  – Restart services with v2-20
• When: ?
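
For scale, the two database migrations quoted in these slides can be compared with simple arithmetic. The figures come from the slides (24 GB in 1 hour for the TaskQueue, 430 GB and 290 tables in 24 hours for the catalogue). The uniform-rate assumption and the helper name are mine.

```python
def gb_per_hour(size_gb, hours):
    """Average conversion rate, assuming a uniform pace."""
    return size_gb / hours

taskqueue_rate = gb_per_hour(24, 1)      # TaskQueue migration
catalogue_rate = gb_per_hour(430, 24)    # catalogue migration

# Average time budget per catalogue table, in minutes.
minutes_per_table = 24 * 60 / 290

print(f"TaskQueue: {taskqueue_rate:.1f} GB/h")
print(f"Catalogue: {catalogue_rate:.1f} GB/h, {minutes_per_table:.1f} min per table")
```

The catalogue conversion is therefore planned at a somewhat lower rate than the TaskQueue one (about 18 GB/h against 24 GB/h), with an average of about 5 minutes per table.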

Page 16: AliEn  v2-20

Summary

• Parts of AliEn v2.20 already deployed!
• TaskQueue speed improved drastically
  – 40 times insertion rate
  – 20 times resubmission time
  – Improved concurrency
• Need to schedule 2 more upgrades
  – JSON: Improve service communication
  – New catalogue layout