Service and Support for Science IT - Peter Kunszt, University of Zurich

Service and Support for Science IT: Scientific Cloud Experiences. Dr. Peter Kunszt, Director, S3IT

Upload: mind-the-byte

Post on 22-Jun-2015


Page 1

Service and Support for Science IT
Scientific Cloud Experiences

Dr. Peter Kunszt
Director, S3IT

Page 2

Outline

• Introduction
  – What is Science IT
  – How are we organized
• UZH ScienceCloud Infrastructure and Implementation
• Science Data and Security/Privacy

Page 3

Challenge: Scale Up

• High Throughput Instruments
  – Much larger data volumes
  – Increased data complexity
• Large Collaborations
  – More people
  – More experiments and measurements
  – More coverage

BIG DATA

Page 4

Fire and forget...

• Scientists do not want to be bothered with infrastructure details

• IT JUST NEEDS TO WORK!

Page 5

Widening Complexity Gap: IT-Research

[Diagram labels: Local IT Resources; Research Labs; Core Facilities; Miracle; SCIENCE IT]

Page 6

What is Science IT?

FILL THE GAP
Dedicated Support Center for Science IT

• SPEED: faster time to solution
• ACCESS: to infrastructure, software, expertise
• ENABLE: use IT technology and software for new ideas

Page 7

Page 8

Supporting Science

• Be a partner to research projects for Science IT
• Provide services to individual researchers, groups and consortia
  – Consultancy for advanced usage of IT in Science
  – Research software development and support
  – Access to competitive IT infrastructure
  – Access to a library of tools and software
  – Project management and collaboration support
  – Training and education on the usage of infrastructure and software
• Collaborate internally, nationally and internationally with partners, suppliers and other Science IT units
• Maintain a high level of internal expertise on topics relevant to Science IT
• Advise UZH Governance on evolution of needs, assist in prioritization

Page 9

Organization Structures are Changing

[Diagram: two arrangements of organization boxes, Org A to Org H and Org A to Org D]

Old world: Hierarchical. New world: Federated.

http://www.fedsm.eu/

Page 10

S3IT Organization

[Organization chart: Core Team, Site Teams, and Embedded Experts (EE)]

• EE = Embedded Expert: working directly in projects or on-site in groups on specific tasks
• Site Teams: joint teams with other units providing local support and some global services
• Core Team: directorate, office, core services, central infrastructure and consultancy, project management

Page 11

Partner Interactions

[Diagram: services and agreements between S3IT and its internal and external partners]

• Partners / Clients: Core Facilities, Research Groups, Projects, Faculties, Institutes, Departments
• Partners / Suppliers: Central IT, CSCS, Vendors

Page 12

S3IT Core Business: Project Support

• Infrastructure is important but 'just' a means to an end
• Science IT Support: Applications, access, integration
• Data analysis
• Simulations
• Data Integration
• Application scaling, making use of big infrastructures
• Workflows, automation
• Visualization
• Software design and usage advice, Code Clinic
• Training and education
• ...

Page 13

Understand the science...

... to map Science IT services!

Page 14

Mapping Security and Privacy

• Most science follows 3 stages
  – Conception, preparation, proposition stage: private
  – Project stage (3-5 years): share in group
  – Publication of results: open to all
• Some have additional constraints (regulations)
  – Medicine: patient data records need consent (different per country)
  – Law and business: confidentiality in projects
  – Engineering, pharmacology, etc.: patents
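The three-stage mapping above can be sketched as a small lookup. This is only an illustration under stated assumptions: the names `Stage`, `Visibility` and `visibility_for` are hypothetical, and the rule that regulated data never becomes fully open by default is a simplification, not something the slides prescribe.

```python
from enum import Enum


class Stage(Enum):
    """The three project stages named on the slide."""
    CONCEPTION = "conception"    # conception, preparation, proposition
    PROJECT = "project"          # running project, typically 3-5 years
    PUBLICATION = "publication"  # results are published


class Visibility(Enum):
    PRIVATE = "private"
    GROUP = "group"
    OPEN = "open"


# Default visibility per stage, as listed on the slide.
DEFAULT_VISIBILITY = {
    Stage.CONCEPTION: Visibility.PRIVATE,
    Stage.PROJECT: Visibility.GROUP,
    Stage.PUBLICATION: Visibility.OPEN,
}


def visibility_for(stage: Stage, regulated: bool = False) -> Visibility:
    """Return who may see the data at a given stage.

    `regulated` is a hypothetical flag standing in for the extra
    constraints (patient consent, confidentiality, patents).
    Assumption: regulated data is not opened to all by default.
    """
    default = DEFAULT_VISIBILITY[stage]
    if regulated and default is Visibility.OPEN:
        return Visibility.GROUP
    return default
```

A real policy would depend on the applicable regulation per country and per domain; the flag here only marks where that decision would plug in.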

Page 15

Infrastructure

• Supercomputing
  – Used as a scientific instrument by theoretical physics, astrophysics, mathematics, computational chemistry, biochemistry, quantum chemistry
  – Continuous usage
• Cluster computing
  – Used as a workhorse by many groups: life science, biochemistry, geoscience, medicine, digital humanities, banking and finance, art history, ...
  – Data analysis, statistical analysis, parameter studies, etc.
  – Non-continuous usage
• Server computing
  – Used as interactive computers by many groups (in practice, all groups)
  – Interactive processing, visualization, steering of computation. Commercial and open-source tools.
  – Daily usage, non-continuous

Page 16

Storage Classes

• Large, cheap data store for projects, O(x PB)
  – No need to be backed up: easy to regenerate but time-consuming
• Reliable project data store, O(1 PB)
  – With secondary copy
  – Only addition, no changes
• Working storage, O(x100 TB)
  – Active data, databases, server-side processes
• Fast storage for streaming analysis, O(100 TB)
  – Fast changing data, immediate analysis, rare!
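As a rough sketch, the four storage classes could be selected from a few properties of a dataset. The `DataProfile` fields, the `storage_class` function and the order of the checks are all assumptions made for illustration; the slides only name the classes and their typical contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProfile:
    """Hypothetical description of what a dataset needs."""
    easy_to_regenerate: bool  # can be recomputed, even if that takes time
    append_only: bool         # data is only added, never changed
    streaming: bool           # fast-changing data that is analysed immediately


def storage_class(profile: DataProfile) -> str:
    """Pick one of the four storage classes from the slide.

    Assumption: streaming needs win first, then cheap storage for
    regenerable data, then the reliable store for append-only data.
    Everything else is treated as active working data.
    """
    if profile.streaming:
        return "fast storage for streaming analysis"
    if profile.easy_to_regenerate:
        return "large, cheap project store (no backup)"
    if profile.append_only:
        return "reliable project store (secondary copy)"
    return "working storage"
```

For example, raw instrument output that is written once and never modified would fall into the reliable project store under these assumptions.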

Page 17

Datacenter Consolidation

[Diagram: OCI – S3IT, ZMB, BIOC, MATH, PHYS and IMLS / Neuro consolidate into a new Central Datacenter]

Aim: Scale and Secure!

Page 18

UZH ScienceCloud Implementation

• OpenStack, based on Canonical
• Deployment using Ansible
• Vagrant-like system for configuration: Elasticluster (developed at UZH)
• Flexible submission and workflow framework for job control: GC3pie (developed at UZH)
• Database management framework openBIS for data lifecycle management (developed at ETH/SystemsX.ch)

Page 19

Business Model

• Supercomputing
  – Investment every 4 years into the system
  – Research groups to find 3rd party funding
• Commodity Cloud and Storage
  – Subscription per year: cores, TB
  – Per use fee
  – Subsidized: fees cover operations, not total cost of ownership (TCO)
• Servers / Pets
  – Yearly or monthly fee
  – Size matters
• Yearly acquisition / rollover
  – Easy to plan

Page 20

Experience so far:

• Supercomputing needed only by few groups
  – Can be completely outsourced to national center, done as of 2015
• Cloud is suitable for most Science Workloads
  – User support scales well
  – Can cover very many use cases
  – Build dedicated boxes for exceptions, don't be driven by them
  – Flexibility is key
• Must use local infrastructure for secure, data intensive and memory intensive workloads
  – Data locality needed for COST and (rarely) policy reasons; exception: medical data
  – Hybrid cloud: burst available for CPU intensive jobs
  – Deal with heterogeneity

Page 21

Future Cloud Strategy: HYBRID

• Run sizeable local cloud infrastructure for internal workloads
• Burst peak loads to public cloud providers
  – For selected workloads coherent with policy and cost

Advantages
• Plannable local infrastructure (plan for full usage)
• Flexibility in scaling, quick provisioning of needed capacity
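The hybrid idea can be illustrated with a minimal capacity split: fill the local cloud first, then burst the part of the overflow that is allowed to leave. The function `split_load` and its `burstable_fraction` parameter are hypothetical; they stand in for whatever policy and cost checks select the workloads that may go to a public provider.

```python
def split_load(demand_cores: int, local_cores: int,
               burstable_fraction: float = 1.0) -> tuple[int, int, int]:
    """Split a core demand into (local, burst, queued).

    The local cloud is used first, matching "plan for full usage".
    Only `burstable_fraction` of the overflow may go to a public
    cloud (an assumed stand-in for policy and cost constraints);
    the rest waits for local capacity.
    """
    if demand_cores < 0 or local_cores < 0:
        raise ValueError("core counts must be non-negative")
    if not 0.0 <= burstable_fraction <= 1.0:
        raise ValueError("burstable_fraction must be between 0 and 1")

    local = min(demand_cores, local_cores)
    overflow = demand_cores - local
    burst = int(overflow * burstable_fraction)  # rounds down to whole cores
    queued = overflow - burst
    return local, burst, queued
```

With 100 local cores and a demand of 150, half of which may be burst, this returns 100 cores local, 25 burst and 25 queued.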

Page 22

Open Questions

• Policies. What workloads can be burst to public clouds? Under what conditions?
  – Calculations, simulations usually OK
  – Data analysis: depends on data (network issues being resolved)
  – Check compliance of cloud providers: ISO, HIPAA, etc.
  – Adherence to Swiss cantonal data protection regulations
• Cost. How to buy public cloud services?
  – Public procurement of agreements?
  – How not to be bound to a single provider?
  – Is this necessary at all?
• How do I charge my users?
  – For internal and for external use?
  – Aim: consolidate their workload into our cloud. No TCO!
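One way to make the policy question concrete is a small decision function. This is a sketch under assumptions, not a policy from the talk: the `Workload` fields and the `may_burst` rules are hypothetical, and they only encode the hints above (calculations and simulations are usually fine, data analysis depends on the data and on provider compliance).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a job that might be burst."""
    kind: str                 # "calculation", "simulation" or "data_analysis"
    data_sensitive: bool      # e.g. patient records or confidential data
    provider_compliant: bool  # provider passed the required compliance checks


def may_burst(workload: Workload) -> bool:
    """Decide whether a workload may run on a public cloud.

    Assumed rules: pure calculations and simulations may burst;
    data analysis may burst only for non-sensitive data on a
    compliant provider; unknown kinds stay local by default.
    """
    if workload.kind in ("calculation", "simulation"):
        return True
    if workload.kind == "data_analysis":
        return workload.provider_compliant and not workload.data_sensitive
    return False
```

Keeping unknown workload kinds local is a deliberately conservative default; an actual policy would be set together with data protection officers.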

Page 23

Comments on Security in academia

• Users in academia are super smart. They remove barriers faster than you can erect them.
• Do risk assessment and risk analysis instead of prevention.
• Don't do anything 'for security reasons'; always qualify with real risk numbers
• Public Clouds are MUCH MORE secure than our own
  – Amazon, Microsoft, IBM etc. have whole teams of security experts; they hired our best students for this
• It is a question of TRUST
  – Regulations by countries
  – Do we trust the US not to do industrial and academic espionage, forcing their own companies to give out our data?

Page 24

Scientific Requirements

• Know your workload: Data, Privacy, Science, Sharing aspects are tightly connected
• Lots of hidden complexity and contradicting requirements

Page 25

1. What Data?

• Different kinds of 'BIG' data
• Volume, Variety, Velocity, Veracity
• Understanding is Knowledge is Science
  – Data vs. Information and Knowledge
  – What are the right questions?
  – What should be protected, till when?
  – How to navigate, explore, evolve

WHO OWNS THE DATA?
For science, proprietary data is a hindrance

Page 26

2. Data Reuse

• Currently a wealth of data is not reused for new discovery
• Lots of potential! Regulators need to be told...
• Data repositories with computing and search capability: perfect for Cloud Model
• Do the computation where the data is
  – Private, public, hybrid Cloud

IP on TOOLS, ease of data USE, not DATA itself.

Page 27

3. Motivate to annotate

• Scientists publish what is necessary and prescribed by the journals, not more: mandate better annotation
• Provide more recognition for producing 'good' datasets: Data Citation
• Check data quality: bad-quality data or data without annotation has no value

Creation of well annotated, sustained public resources

Page 28

4. Standard Formats

• Too many 'standards', or standards that are not used
  – Instrument vendors often at fault
• Protection of data by proprietary formats
  – Data is lost to research
• Do not pay for data in nonstandard formats
  – Data value is zero if unusable

Mandate standard formats for domain data

Page 29

5. Data Sharing/Publishing

• Share in collaborative mode
• Avoid Data Loss
• Motivate and enable data publication
• Establish business model for data publication (reward/career benefit)
• Journals adapt, see Scientific Data http://www.nature.com/scientificdata

New role for Archives and Libraries

Page 30

6. Patient Data Records

• Legal issues of data privacy
• People are not in control of their own data
• Difficult to get consent
• NSA effect: trust

Put citizens back in control

Page 31

Patient Data Records

• TRUST
  – Swiss Cooperative: citizen owned
• NEUTRALITY
  – A simple e-Banking system for any personal health data. Same level of security
• TRACTION
  – Volume: it is free, it's rewarded
• IMPACT
  – Request data directly, avoid legal issues

Page 32

• It is a cooperative, not a business
• Funding by running campaigns to ask people to participate in research & surveys
• Participants are REWARDED for sharing their data or providing new data
• Build tools on top
• Currently seeking funding
  – H2020, foundations
  – Projects with hospitals, clinics

Page 33

Approach at S3IT

• Early involvement with Research Groups
  – Proposal writing, partnership
  – Advice on Data Management, infrastructure, standards
• Strong cooperation with Libraries
  – Early involvement with publishers, archives
  – Joint information to research groups on data management plans, data citations
• Seeking contact with funding bodies and decision makers
  – Communicate business plan for Science IT 'project consumables'
  – Evaluation of projects based on technology cost and feasibility
  – Usage of public and each other's cloud resources for cash

Page 34

Links

• www.s3it.uzh.ch - Science IT at UZH
• www.sybit.net - Systems Biology IT, SystemsX.ch
• www.erasysapp.eu - Systems Biology, DMMCore project
• www.healthbank.ch - Public Cooperative being set up for patient-owned data. Seeking funding (H2020, pending, and other sources)