iRODS/DDN User Group 20140908 Sanger

Future plans for iRODS John Constable Informatics Support Group [email protected]

Upload: john-constable

Post on 28-Nov-2014


DESCRIPTION

My presentation to the UCL-hosted DDN/iRODS user group, held on 8-9 September.

TRANSCRIPT

Page 1: iRODS/DDN User Group 20140908 Sanger

Future plans for iRODS

John Constable Informatics Support Group

[email protected]

Page 2: iRODS/DDN User Group 20140908 Sanger

About the Institute

Funded by Wellcome Trust.

2nd largest research charity in the world.

~700 employees.

Large scale genomic research.

Sequenced 1/3 of the human genome (largest single contributor).

We have active cancer, malaria, pathogen and genomic variation studies.

All data is made publicly available.

Websites, FTP, direct database access, programmatic APIs.

3 About Us

Page 3: iRODS/DDN User Group 20140908 Sanger

The Sanger Institute: A Little Background

1997 (yeast genome completed)

2001 (first draft of human genome; Sanger upped contribution to 1/3)

2003 (first mouse genome draft; malarial parasite sequence completed)

2005 (WTCCC established)

2008 (start of 1000 Genomes Project)

2010 (completion of 1000 Genomes; start of UK10K study)

4 About Us

Page 4: iRODS/DDN User Group 20140908 Sanger

Sequence till 2011

5 About Us

Page 5: iRODS/DDN User Group 20140908 Sanger

Original Design Brief

Image credit: Ryan Raffa, ryanraffa.com 6

Design Brief

Page 6: iRODS/DDN User Group 20140908 Sanger

Is the data safe?

7 Design Brief

Page 7: iRODS/DDN User Group 20140908 Sanger

Can the scientists find their data?

Image credit: searchengineland.com 8

Design Brief

Page 8: iRODS/DDN User Group 20140908 Sanger

Path of least surprise

Image credit: betanews.com 9

Design Brief

Page 9: iRODS/DDN User Group 20140908 Sanger

Minimal Maintenance

Image credit: failblog.cheezburger.com 10

Design Brief

Page 10: iRODS/DDN User Group 20140908 Sanger

Current Setup

11

Page 11: iRODS/DDN User Group 20140908 Sanger

Metadata Heavy Usage

Users query and access data largely from local compute clusters

Users access iRODS locally via the CLI

Metadata largely provided on creation with Baton APIs via pipelines

Example attribute fields:

attribute: library
attribute: total_reads
attribute: type
attribute: lane
attribute: is_paired_read
attribute: study_accession_number
attribute: library_id
attribute: sample_accession_number
attribute: sample_public_name
attribute: manual_qc
attribute: tag
attribute: sample_common_name
attribute: md5
attribute: tag_index
attribute: study_title
attribute: study_id
attribute: reference
attribute: sample
attribute: target
attribute: sample_id
attribute: id_run
attribute: study
attribute: alignment
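To give a feel for what a metadata-driven lookup from the CLI might look like, here is a minimal sketch that assembles an `imeta qu -d` command line from attribute/value pairs. The helper name `build_imeta_query`, the zone name `seq`, and the example values are illustrative assumptions, not taken from the slides; only the attribute names come from the list above.

```python
import shlex


def build_imeta_query(zone, criteria):
    """Build an `imeta qu -d` argument list from attribute/value pairs.

    `criteria` maps attribute names to the exact values to match.
    This helper is hypothetical; it only assembles the command line.
    """
    if not criteria:
        raise ValueError("at least one attribute/value pair is required")
    args = ["imeta", "-z", zone, "qu", "-d"]
    for index, (attribute, value) in enumerate(criteria.items()):
        if index > 0:
            # imeta joins multiple conditions with the keyword "and"
            args.append("and")
        args += [attribute, "=", str(value)]
    return args


# Example values below are made up for illustration.
cmd = build_imeta_query("seq", {"id_run": 1234, "lane": 1, "manual_qc": 1})
print(shlex.join(cmd))
```

Returning a list rather than a string means the result can be handed to `subprocess.run` without shell quoting concerns.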

12 Current Setup

Page 12: iRODS/DDN User Group 20140908 Sanger

Current Deployment

Replication between data centre ‘rooms’ (direct, not queued)

One resource set as default for incoming objects,

migration via cron script as it fills up

Checksum via iput strongly encouraged
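As a rough illustration of the checksum advice, the sketch below computes a local MD5 (which could be compared with the `md5` metadata attribute) and assembles an `iput -K` command, which asks iRODS to verify the checksum after upload. The function names and paths are hypothetical, and whether the pipelines do exactly this is an assumption.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading it in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_checked_iput(local_path, irods_path):
    """Assemble an iput command that verifies the checksum (-K) on upload.

    Hypothetical helper: it only builds the argument list.
    """
    return ["iput", "-K", local_path, irods_path]
```

A caller might record `md5_of_file(path)` before the upload and compare it with the value stored against the data object afterwards.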

13 Current Setup

Page 13: iRODS/DDN User Group 20140908 Sanger

Current Logical Design

Sanger1 (Portal Zone)

/seq

green red

/humgen

green red

Portal provides kerberised access

Federation using head zone accounts

14 Current Setup

Page 14: iRODS/DDN User Group 20140908 Sanger

The Future

Image copyright: flyinglow.ca 15

Page 15: iRODS/DDN User Group 20140908 Sanger

Future Logical Design

Sanger1 (Portal Zone)

/seq

green red Orange?

/humgen

Red green Orange?

Orange will be offsite at Infinity

16 The Future

Page 16: iRODS/DDN User Group 20140908 Sanger

Why the change?

17 The Future

Page 17: iRODS/DDN User Group 20140908 Sanger

Everyone has an airport, why is that noteworthy?

18 The Future

Page 18: iRODS/DDN User Group 20140908 Sanger

•  10G via JANET between sites
•  Tested on AWS first (successfully!)
•  Using Oracle failover (HA tnsnames entry to RAC clusters)
•  Build here and ship to new site
•  ~3PB to transfer
•  Need to have local replicas
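A back-of-envelope calculation suggests one possible reason for building locally and shipping: moving ~3PB over a 10G link takes weeks even at full line rate. The sketch below assumes decimal petabytes and a dedicated link; the 50% utilisation figure is an illustrative assumption, not a measurement from the talk.

```python
def transfer_days(size_bytes, link_bits_per_second, efficiency=1.0):
    """Days needed to move size_bytes over a link at the given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 86400


PETABYTE = 10**15          # decimal petabyte (assumption)
TEN_GIGABIT = 10 * 10**9   # 10 Gb/s link

ideal = transfer_days(3 * PETABYTE, TEN_GIGABIT)
half = transfer_days(3 * PETABYTE, TEN_GIGABIT, efficiency=0.5)
print(f"{ideal:.1f} days at line rate, {half:.1f} days at 50% utilisation")
# prints "27.8 days at line rate, 55.6 days at 50% utilisation"
```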

19 The Future

Page 19: iRODS/DDN User Group 20140908 Sanger

Image credit: diy.despair.com 20

Page 20: iRODS/DDN User Group 20140908 Sanger

The spindles spin, but does the data(base)?

[Figure: iRODS upload of genotyping data archive, started 2014-04-24 10:50. X axis: hours after start (0 to 160). Y axes: files uploaded per hour (0 to 4000) and mean files uploaded per minute (0 to 80). Annotations: 5 pm Friday, 9 am Monday; mean = 55.6/min.]

21 Experience

Page 21: iRODS/DDN User Group 20140908 Sanger

Our block storage deployment

22

Page 22: iRODS/DDN User Group 20140908 Sanger

Your mileage may vary

23 Experience

Page 23: iRODS/DDN User Group 20140908 Sanger

‘Wait, WHAT?’

24 Experience

Page 24: iRODS/DDN User Group 20140908 Sanger

Databases are good at data, right?

25 Experience

Page 25: iRODS/DDN User Group 20140908 Sanger

Can you spot the optimisation time point?

26 Experience

Page 26: iRODS/DDN User Group 20140908 Sanger

Database Tuning FTW

27 Experience

Page 27: iRODS/DDN User Group 20140908 Sanger

Features we’re not using.. yet

Image credit: Melissa Penta; mydigitalmind.com 28

Page 28: iRODS/DDN User Group 20140908 Sanger

More rules-based notification, e.g. notifying PIs on access to restricted data

PAM authentication instead of Kerberos

iDrop (metadata query non-trivial ATM)

29 Features

Page 29: iRODS/DDN User Group 20140908 Sanger

Features we want

Image credit: www.paperspencils.com 30

Features

Page 30: iRODS/DDN User Group 20140908 Sanger

Object store plugins/integration

•  Caching plugin thoughts
•  Streaming files
•  Local cache space
•  Managing local cache
•  Multi site, esp. updates
•  Integration with vendors
•  Replica count?
•  Site/geographical awareness
•  Metrics, metrics, metrics! (also reliability, manageability, low cost..)

31 Features

Page 31: iRODS/DDN User Group 20140908 Sanger

Instrumentation

32 Features

Page 32: iRODS/DDN User Group 20140908 Sanger

More like this, pls

33 Features

Page 33: iRODS/DDN User Group 20140908 Sanger

Oracle MySQL

34 Features

Page 34: iRODS/DDN User Group 20140908 Sanger

What our users are doing

Source: projectcartoon.com 35

Page 35: iRODS/DDN User Group 20140908 Sanger

Serapis (REST API, python, RabbitMQ, MongoDB)

Baton

Stuff they haven’t told us

36 Users

Page 36: iRODS/DDN User Group 20140908 Sanger

Thank you!

Acknowledgements: Dr Pete Clapham

John Constable Informatics Support Group

[email protected] @kript

37