Open Science Data Cloud (June 21, 2010)

Open Science Data Cloud Robert Grossman Open Cloud Consortium

Upload: robert-grossman

Post on 24-May-2015


DESCRIPTION

This is a talk that I gave at the ScienceCloud 2010 Workshop in Chicago on June 21, 2010.

TRANSCRIPT

Page 1: Open Science Data Cloud (June 21, 2010)

Open Science Data Cloud

Robert Grossman, Open Cloud Consortium

Page 2: Open Science Data Cloud (June 21, 2010)

Today is a good day to get involved with the Open Science Data Cloud.

Page 3: Open Science Data Cloud (June 21, 2010)

• Astronomical data
• Biological data (Bionimbus)
• Networking data
• Image processing for disaster relief

Part 1: Basic Facts About the OSDC

Page 4: Open Science Data Cloud (June 21, 2010)

Who are we?

Page 5: Open Science Data Cloud (June 21, 2010)

• 501(c)(3) not-for-profit corporation
• Supports the development of standards, interoperability frameworks, and reference implementations.
• Manages testbeds: Open Cloud Testbed and Intercloud Testbed.
• Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
• Develops benchmarks.

www.opencloudconsortium.org

Page 6: Open Science Data Cloud (June 21, 2010)

OCC Members

• Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo

• Universities: CalIT2, Johns Hopkins, MIT Lincoln Lab, Northwestern Univ., University of Illinois at Chicago, University of Chicago

• Government agencies: NASA

• Open Source Projects: Sector Project

Page 7: Open Science Data Cloud (June 21, 2010)

Operates Clouds

• 500 nodes
• 3000 cores
• 1.5+ PB
• Four data centers
• 10 Gbps
• Target to refresh 1/3 each year.

• Open Cloud Testbed
• Open Science Data Cloud
• Intercloud Testbed
• Cloud-based Disaster Relief Services

Page 8: Open Science Data Cloud (June 21, 2010)

Open Cloud Consortium Perspective

• Vendor neutral
• Open, interoperable architecture
• Experiment at scale
• Operate infrastructure at the scale of a small data center
• Long term point of view (think like a library, not a cloud service provider)
• Think public, private & hybrid clouds

Page 9: Open Science Data Cloud (June 21, 2010)

What Are the Projects?

Page 10: Open Science Data Cloud (June 21, 2010)

Project 1: Bionimbus

www.cistrack.org

Page 11: Open Science Data Cloud (June 21, 2010)

Project 2: Bulk Download of the SDSS

Source    Destination  LLPR*  Link     Bandwidth
Chicago   Greenbelt    0.98   1 Gb/s   615 Mb/s
Chicago   Austin       0.83   10 Gb/s  8000 Mb/s

* LLPR = local / long distance performance
• Sector LLPR varies between 0.61 and 0.98

Recent Sloan Digital Sky Survey (SDSS) data release is 14 TB in size.
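To give a rough sense of scale for the bandwidth figures above, the sketch below estimates how long a 14 TB release would take to move at each measured rate. It assumes decimal terabytes and that the quoted rate is sustained for the whole transfer; neither assumption comes from the slide, and the function name is illustrative only.

```python
def transfer_hours(size_tb: float, rate_mbps: float) -> float:
    """Hours needed to move size_tb terabytes at rate_mbps megabits per second.

    Assumes decimal units (1 TB = 1e12 bytes) and a constant sustained rate.
    """
    bits = size_tb * 1e12 * 8            # terabytes -> bits
    seconds = bits / (rate_mbps * 1e6)   # megabits per second -> bits per second
    return seconds / 3600


SDSS_RELEASE_TB = 14  # size of the SDSS data release quoted on the slide

# Bandwidths measured from Chicago, as listed in the table on the slide.
for destination, rate_mbps in [("Greenbelt", 615), ("Austin", 8000)]:
    hours = transfer_hours(SDSS_RELEASE_TB, rate_mbps)
    print(f"Chicago -> {destination}: about {hours:.1f} hours")
```

Under these assumptions the 1 Gb/s path takes on the order of two days and the 10 Gb/s path a few hours.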

Page 12: Open Science Data Cloud (June 21, 2010)

Project 3: Image Processing in the Cloud

Step 1: Input to Mapper
• Mapper Input Key: Bounding Box + Timestamp (minx = -135.0 miny = 45.0 maxx = -112.5 maxy = 67.5)
• Mapper Input Value: (image)

Step 2: Processing in Mapper
• Mapper resizes and/or cuts up the original image into pieces to output Bounding Boxes

Step 3: Mapper Output (one key/value pair per piece)
• Mapper Output Key: Bounding Box + Timestamp
• Mapper Output Value: (image piece)
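The mapper step described on this slide can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes the image is a plain list of pixel rows (top row first), that the key is a (bounding box, timestamp) pair, and that the mapper cuts the image into an n x n grid. The names `sub_bbox` and `mapper` are hypothetical.

```python
from typing import Iterator, List, Tuple

BBox = Tuple[float, float, float, float]   # (minx, miny, maxx, maxy)
Image = List[List[int]]                    # rows of pixels, top row first
Key = Tuple[BBox, str]                     # (bounding box, timestamp)


def sub_bbox(bbox: BBox, n: int, row: int, col: int) -> BBox:
    """Bounding box of grid cell (row, col); row 0 is the top (maxy) edge."""
    minx, miny, maxx, maxy = bbox
    dx = (maxx - minx) / n
    dy = (maxy - miny) / n
    return (minx + col * dx, maxy - (row + 1) * dy,
            minx + (col + 1) * dx, maxy - row * dy)


def mapper(key: Key, image: Image, n: int = 2) -> Iterator[Tuple[Key, Image]]:
    """Cut the image into n x n pieces and emit one key/value pair per piece."""
    bbox, timestamp = key
    tile_h = len(image) // n
    tile_w = len(image[0]) // n
    for row in range(n):
        for col in range(n):
            piece = [r[col * tile_w:(col + 1) * tile_w]
                     for r in image[row * tile_h:(row + 1) * tile_h]]
            yield (sub_bbox(bbox, n, row, col), timestamp), piece


# Usage: a 4x4 dummy image covering the bounding box shown on the slide.
image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
key = ((-135.0, 45.0, -112.5, 67.5), "2010-06-21")
for out_key, out_value in mapper(key, image):
    print(out_key, out_value)
```

Keying each piece by its own bounding box lets a later reduce step group pieces that cover the same area, which is one plausible reason for the key choice shown on the slide.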

Page 13: Open Science Data Cloud (June 21, 2010)

Project 4: Anomalies in Network Data


Page 14: Open Science Data Cloud (June 21, 2010)

What is the OSDC?

Page 15: Open Science Data Cloud (June 21, 2010)

Hosted, managed, distributed facility to:
• Manage & archive your medium and large datasets
• Provide computational resources to analyze it
• Provide networking to share it with your colleagues and the public.

Page 16: Open Science Data Cloud (June 21, 2010)

Long Term Goal

Build a (small) data center for science.

Page 17: Open Science Data Cloud (June 21, 2010)

And preserve your data the same way that libraries preserve books & museums preserve art.

Page 18: Open Science Data Cloud (June 21, 2010)

Why do it?

Page 19: Open Science Data Cloud (June 21, 2010)

Work on Stuff That Matters
Tim O’Reilly, Jan 11, 2009

1. Work on something that matters to you more than money [and, presumably, papers].
2. Create more value than you capture.
3. Take the long view.

Page 20: Open Science Data Cloud (June 21, 2010)

What is similar?

Page 21: Open Science Data Cloud (June 21, 2010)

Internet Archive

Page 22: Open Science Data Cloud (June 21, 2010)

Wayback Machine

Page 23: Open Science Data Cloud (June 21, 2010)

Part 2: Why Another Cloud Project?

Page 24: Open Science Data Cloud (June 21, 2010)

Figure: variety of analysis (low, med, wide) plotted against data size (small, medium to large, very large).

• Small data, no infrastructure: scientist with laptop
• Medium to large data, general infrastructure: Open Science Data Cloud
• Very large data, dedicated infrastructure: high energy physics, astronomy

Page 25: Open Science Data Cloud (June 21, 2010)

Figure: computing platforms plotted by cycles and persistent data (small, med, large).

Labels: single workstations; small to medium clusters; HPC; large & spec. clusters; databases; data clouds.

Page 26: Open Science Data Cloud (June 21, 2010)

Who do you most trust to manage your data for 100 years?

Companies may not be here tomorrow.

Think of a not-for-profit with that mission.

Government agencies have a role, but are not always easy to use.

Page 27: Open Science Data Cloud (June 21, 2010)

Part 3: Technical Approach

Page 28: Open Science Data Cloud (June 21, 2010)

Condominium Clouds

• In a condominium cloud, you buy your own rack or bunch of racks.
• The racks are managed and operated by the condominium association, in this case the OCC.
• If your rack is 120 TB, you get the rights to c. 40 TB of storage in the cloud. The rest is a shared resource.
• The Open Cloud Testbed is a condo cloud managed by the OCC.
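The 120 TB / 40 TB example can be written as a small calculation. The slide gives only that single data point, so treating the owner's share as a fixed one-third of any rack is an assumption made here for illustration, as is the function name.

```python
def condo_allocation(rack_tb: float, owner_fraction: float = 1 / 3):
    """Split a contributed rack into owner-reserved and shared storage (TB).

    owner_fraction = 1/3 is an assumption extrapolated from the slide's
    single example (120 TB rack -> c. 40 TB for the owner).
    """
    owned = rack_tb * owner_fraction
    shared = rack_tb - owned
    return owned, shared


owned, shared = condo_allocation(120)
print(f"owner rights: about {owned:.0f} TB, shared pool: about {shared:.0f} TB")
```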

Page 29: Open Science Data Cloud (June 21, 2010)

Raywulf rack

Condo Clouds

Open source software stack: Hadoop, Sector, Eucalyptus, Nova, NoSQL DBs,

Page 30: Open Science Data Cloud (June 21, 2010)

Data Migration

• Challenge: data migration.
• Solution: use Hadoop style replication.
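One way to read "Hadoop style replication" as a migration mechanism: every block is kept on several racks, so when a rack is retired the system only has to re-copy under-replicated blocks onto the remaining racks. The sketch below illustrates that idea under stated assumptions (three replicas, least-loaded target selection, a hypothetical `retire_rack` helper); it is not Hadoop's or Sector's actual placement logic.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def retire_rack(placement: Dict[str, Set[str]], racks: List[str],
                retired: str, replicas: int = 3) -> List[Tuple[str, str, str]]:
    """Remove `retired` from the placement map and restore the replica count.

    placement maps block id -> set of racks holding a copy (updated in place).
    Returns the copy operations performed as (block, source_rack, target_rack).
    """
    live = [r for r in racks if r != retired]
    load = Counter(r for holders in placement.values()
                   for r in holders if r != retired)
    copies = []
    for block in sorted(placement):
        holders = placement[block]
        holders.discard(retired)
        # Re-replicate from a surviving copy until the target count is met.
        while holders and len(holders) < replicas:
            candidates = [r for r in live if r not in holders]
            if not candidates:
                break  # not enough racks left to hold another copy
            target = min(candidates, key=lambda r: load[r])  # least-loaded rack
            source = min(holders)                            # any surviving copy
            holders.add(target)
            load[target] += 1
            copies.append((block, source, target))
    return copies


# Usage: retiring "old" moves its data onto the newly added rack "new".
racks = ["old", "a", "b", "new"]
placement = {"blk1": {"old", "a", "b"}, "blk2": {"a", "b", "new"}}
print(retire_rack(placement, racks, "old"))
```

In this reading, adding new racks and retiring old ones migrates data as a side effect of normal replication rather than as a separate bulk-copy project.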

Page 31: Open Science Data Cloud (June 21, 2010)

Operating Model

Year  New Racks  Total Racks  New Cap  Total Cap  Net New Cap
1     10         10           1.28     1.28       0
2     10         20           1.92     3.20       1.92
3     10         30           2.88     6.08       2.88
4     10         30           4.32     9.12       3.04
5     10         30           6.48     13.68      4.56
6     10         30           9.72     20.52      6.85

Operating model requires constant cap ex investment each year, for example 10 racks or $1M. (Cap in PB)
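The table's figures appear to follow a simple rule, sketched below: each year's 10 new racks hold 1.5 times the capacity of the previous year's purchase, and racks are retired after three years. Both parameters are inferred from the numbers in the table (and from the "refresh 1/3 each year" target); the slide does not state them, so treat them as assumptions.

```python
def operating_model(years=6, racks_per_year=10, first_year_cap_pb=1.28,
                    growth=1.5, lifetime_years=3):
    """Project rack counts and capacity (PB) under a constant yearly purchase.

    growth and lifetime_years are inferred from the slide's table, not stated in it.
    """
    rows = []
    new_caps = []       # capacity bought in each year so far
    prev_total = 0.0
    for year in range(1, years + 1):
        new_caps.append(first_year_cap_pb * growth ** (year - 1))
        in_service = new_caps[-lifetime_years:]   # older purchases are retired
        total_cap = sum(in_service)
        rows.append({
            "year": year,
            "new_racks": racks_per_year,
            "total_racks": racks_per_year * len(in_service),
            "new_cap": round(new_caps[-1], 2),
            "total_cap": round(total_cap, 2),
            "net_new_cap": round(total_cap - prev_total, 2),
        })
        prev_total = total_cap
    return rows


for row in operating_model():
    print(row)
```

This reproduces the New Cap, Total Cap, and Total Racks columns. Net New Cap is computed here as the year-over-year change in total capacity, which gives 1.28 for year 1 (the slide lists 0) and 6.84 for year 6 (the slide lists 6.85).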

Page 32: Open Science Data Cloud (June 21, 2010)

Retiring Equipment

• Challenge: Adding & removing racks.
• Solution: Support virtual networks, virtual data centers, etc.

Page 33: Open Science Data Cloud (June 21, 2010)

We Have Several Ways of Defining Virtual Networks….

VN-Link

VLAN

VPNs

BGP

MPLS

OpenFlow

Open vSwitch

vSwitch

CloudSwitch

Page 34: Open Science Data Cloud (June 21, 2010)

But No Vendor Neutral VN Standard That:

• Scales to 100,000+ VMs
• Is supported by multiple vendors
• Spans multiple physical switches
• Supports VN mobility
• Provides strong isolation of VNs
• Is easy for VMs to join and leave VNs
• Includes management interfaces ….

OCC has a working group working on VN standards

Page 35: Open Science Data Cloud (June 21, 2010)

Bridging the Gaps…A Small Step

Infrastructure as a Service
– Virtual Data Centers (VDC)
– Virtual Networks (VN)
– Virtual Machines (VM)
– Physical Resources

Platform as a Service
– Cloud Compute Services
– Data as a Service

Open Virtualization Format (OVF)

Open Cloud Computing Interface (OCCI)

SNIA Cloud Data Management Interface (CDMI)

Large Data Cloud Interoperability Framework

Metadata service linking IaaS and DaaS

Metadata service naming and linking entities in the IaaS layers

Page 36: Open Science Data Cloud (June 21, 2010)

One Day We Hope to Peer

Open Science Data Cloud

Page 37: Open Science Data Cloud (June 21, 2010)

More Challenges: Finding a Business Model That Works Long Term

• Challenge: raising constant amount of funding each year.

• To date: talking to foundations.

Page 38: Open Science Data Cloud (June 21, 2010)

Thank You

• For more information:
– www.opencloudconsortium.org
– rgrossman.com (for research papers, etc.)