Open Science Data Cloud (June 21, 2010)

Post on 24-May-2015


DESCRIPTION

This is a talk that I gave at the ScienceCloud 2010 Workshop in Chicago on June 21, 2010.

TRANSCRIPT

Open Science Data Cloud

Robert Grossman

Open Cloud Consortium

Today is a good day to get involved with the Open Science Data Cloud.

Astronomical data

Biological data (Bionimbus)

Networking data

Image processing for disaster relief

Part 1: Basic Facts About the OSDC

Who are we?

• 501(c)(3) not-for-profit corporation.

• Supports the development of standards, interoperability frameworks, and reference implementations.

• Manages testbeds: Open Cloud Testbed and Intercloud Testbed.

• Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.

• Develops benchmarks.

www.opencloudconsortium.org

OCC Members

• Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo

• Universities: CalIT2, Johns Hopkins, MIT Lincoln Lab, Northwestern Univ., University of Illinois at Chicago, University of Chicago

• Government agencies: NASA

• Open Source Projects: Sector Project

Operates Clouds

• 500 nodes
• 3000 cores
• 1.5+ PB
• Four data centers
• 10 Gbps
• Target to refresh 1/3 each year.

• Open Cloud Testbed
• Open Science Data Cloud
• Intercloud Testbed
• Cloud-based Disaster Relief Services

Open Cloud Consortium Perspective

• Vendor neutral

• Open, interoperable architecture

• Experiment at scale

• Operate infrastructure at the scale of a small data center

• Long term point of view (think like a library, not a cloud service provider)

• Think public, private & hybrid clouds

What Are the Projects?

Project 1: Bionimbus

www.cistrack.org

Project 2: Bulk Download of the SDSS

Source    Destination   LLPR*   Link      Bandwidth
Chicago   Greenbelt     0.98    1 Gb/s    615 Mb/s
Chicago   Austin        0.83    10 Gb/s   8000 Mb/s

• *LLPR = local / long distance performance
• Sector LLPR varies between 0.61 and 0.98

Recent Sloan Digital Sky Survey (SDSS) data release is 14 TB in size.
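To put the 14 TB figure next to the bandwidths in the table, a back-of-the-envelope transfer time can be sketched as below. This is illustrative arithmetic only, not a measurement from the talk: it assumes decimal units (1 TB = 10^12 bytes, 1 Mb/s = 10^6 bits/s) and a rate sustained for the whole transfer. The function name `transfer_hours` is hypothetical.

```python
def transfer_hours(size_tb, rate_mbps):
    """Hours needed to move size_tb terabytes at a sustained rate_mbps.

    Assumes decimal units: 1 TB = 1e12 bytes, 1 Mb/s = 1e6 bits/s.
    """
    bits = size_tb * 1e12 * 8             # terabytes -> bits
    seconds = bits / (rate_mbps * 1e6)    # bits / (bits per second)
    return seconds / 3600


if __name__ == "__main__":
    # Bandwidths taken from the table above; 14 TB is the SDSS release size.
    for destination, rate in [("Greenbelt", 615), ("Austin", 8000)]:
        hours = transfer_hours(14, rate)
        print(f"Chicago -> {destination}: {hours:.1f} hours")
```

Under these assumptions the release takes roughly 50.6 hours at 615 Mb/s and roughly 3.9 hours at 8000 Mb/s.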

Project 3: Image Processing in the Cloud

Step 1: Input to Mapper

Mapper Input Key: Bounding Box

Mapper Input Value: (image in the original slide) + Timestamp

Step 2: Processing in Mapper

Mapper resizes and/or cuts up the original image into pieces to output Bounding Boxes.

Step 3: Mapper Output

Mapper Output Key: Bounding Box

Mapper Output Value: (image piece in the original slide) + Timestamp

One key/value pair is output per piece.

Example Bounding Box: (minx = -135.0 miny = 45.0 maxx = -112.5 maxy = 67.5)
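The mapper steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the image is a plain list of pixel rows, the mapper only cuts (it does not resize), and the names `tile_mapper` and `splits` are hypothetical. It also assumes row 0 of the image is the northern edge (maxy).

```python
def tile_mapper(bbox, image, timestamp, splits=2):
    """Cut an image into splits x splits pieces.

    Input key:    bbox = (minx, miny, maxx, maxy)
    Input value:  image (list of pixel rows) plus a timestamp
    Output:       one (piece_bbox, (piece, timestamp)) pair per piece
    """
    minx, miny, maxx, maxy = bbox
    rows, cols = len(image), len(image[0])
    dx = (maxx - minx) / splits   # width of one piece in map units
    dy = (maxy - miny) / splits   # height of one piece in map units
    for i in range(splits):       # piece row, top (maxy) to bottom (miny)
        for j in range(splits):   # piece column, left (minx) to right (maxx)
            r0, r1 = i * rows // splits, (i + 1) * rows // splits
            c0, c1 = j * cols // splits, (j + 1) * cols // splits
            piece = [row[c0:c1] for row in image[r0:r1]]
            piece_bbox = (minx + j * dx, maxy - (i + 1) * dy,
                          minx + (j + 1) * dx, maxy - i * dy)
            yield piece_bbox, (piece, timestamp)


if __name__ == "__main__":
    # 4 x 4 toy "image" over the bounding box shown on the slide.
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    bbox = (-135.0, 45.0, -112.5, 67.5)
    for key, (piece, ts) in tile_mapper(bbox, image, "2010-06-21"):
        print(key, piece, ts)
```

Keying the output by bounding box means a later reduce step can group all pieces (and timestamps) that cover the same region.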

Project 4: Anomalies in Network Data


What is the OSDC?

Hosted, managed, distributed facility to:

• Manage & archive your medium and large datasets

• Provide computational resources to analyze them

• Provide networking to share them with your colleagues and the public.

Long Term Goal

Build a (small) data center for science.

And preserve your data the same way that libraries preserve books & museums preserve art.

Why do it?

Work on Stuff That Matters

Tim O’Reilly, Jan 11, 2009

1. Work on something that matters to you more than money [and, presumably, papers].

2. Create more value than you capture.

3. Take the long view.

What is similar?

Internet Archive

Wayback Machine

Part 2: Why Another Cloud Project?

[Figure: variety of analysis (low, med, wide) versus data size (small, medium to large, very large). Infrastructure: none, general, dedicated. Labels: scientist with laptop; Open Science Data Cloud; high energy physics, astronomy.]

[Figure: cycles and persistent data (small, med, large). Labels: single workstations; small to medium clusters; HPC; data clouds; large & spec. clusters; databases.]

Who do you most trust to manage your data for 100 years?

Companies may not be here tomorrow.

Think of a not for profit with that mission.

Government agencies have a role, but not always easy to use.

Part 3: Technical Approach

Condominium Clouds

• In a condominium cloud, you buy your own rack or bunch of racks.

• The racks are managed and operated by the condominium association, in this case the OCC.

• If your rack is 120 TB, you get the rights to c. 40 TB of storage in the cloud. The rest is a shared resource.

• The Open Cloud Testbed is a condo cloud managed by the OCC.
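The 120 TB / c. 40 TB example above works out to roughly one third of raw capacity going to the owner. A small sketch of that split follows; the one-third fraction is inferred from the single example on the slide and is an assumption, as are the names `condo_allocation` and `owner_fraction`.

```python
def condo_allocation(rack_sizes_tb, owner_fraction=1 / 3):
    """Split raw rack capacity into owner rights and the shared pool.

    owner_fraction = 1/3 is an assumption inferred from the slide's
    example (120 TB rack -> c. 40 TB of owner rights).
    """
    total = sum(rack_sizes_tb)
    owner = total * owner_fraction
    return owner, total - owner


if __name__ == "__main__":
    owner, shared = condo_allocation([120])
    print(f"owner rights: {owner:.0f} TB, shared: {shared:.0f} TB")
```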


Raywulf rack

Condo Clouds

Open source software stack: Hadoop, Sector, Eucalyptus, Nova, NoSQL DBs,

Data Migration

• Challenge: data migration.

• Solution: use Hadoop style replication.
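One way to read "Hadoop style replication" as a migration tool: each block is kept on several racks, so when a rack goes away, its blocks are simply re-replicated onto the remaining racks. The sketch below is a toy model of that idea under stated assumptions (three replicas, least-loaded placement); it is not Hadoop's actual placement policy, and `retire_rack` is a hypothetical name.

```python
def retire_rack(placement, retired, racks, replicas=3):
    """Return a new block -> racks mapping with `retired` removed.

    placement: dict mapping block id -> list of racks holding a copy.
    Blocks that lose a copy are re-replicated to the least-loaded
    live rack that does not already hold them (a toy policy).
    """
    live = [r for r in racks if r != retired]
    load = {r: 0 for r in live}
    for holders in placement.values():
        for r in holders:
            if r in load:
                load[r] += 1

    new_placement = {}
    for block, holders in sorted(placement.items()):
        kept = [r for r in holders if r != retired]
        # Restore the replica count, capped by the number of live racks.
        while len(kept) < min(replicas, len(live)):
            candidates = [r for r in live if r not in kept]
            target = min(candidates, key=lambda r: (load[r], r))
            kept.append(target)
            load[target] += 1
        new_placement[block] = kept
    return new_placement


if __name__ == "__main__":
    racks = ["r1", "r2", "r3", "r4"]
    placement = {"b1": ["r1", "r2", "r3"], "b2": ["r1", "r3", "r4"]}
    print(retire_rack(placement, "r1", racks))
```

No bulk copy of the retired rack is needed in this model; the data moves block by block as replicas are restored.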

Operating Model

Year   New Racks   Total Racks   New Cap   Total Cap   Net New Cap
1      10          10            1.28      1.28        0
2      10          20            1.92      3.20        1.92
3      10          30            2.88      6.08        2.88
4      10          30            4.32      9.12        3.04
5      10          30            6.48      13.68       4.56
6      10          30            9.72      20.52       6.85

Operating model requires constant cap ex investment each year, for example 10 racks or $1M. (Cap in PB)
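The table's numbers are consistent with two simple rules, which the sketch below makes explicit. Both rules are inferred from the figures rather than stated on the slide: the capacity bought each year grows by a factor of 1.5 over the previous year, and racks are retired after three years (matching the 1/3 refresh target). The names `operating_model`, `growth`, and `lifetime_years` are hypothetical.

```python
def operating_model(years=6, first_year_cap_pb=1.28, growth=1.5,
                    racks_per_year=10, lifetime_years=3):
    """Reproduce the operating-model table under the inferred rules."""
    purchases = []      # capacity (PB) bought in each year so far
    rows = []
    prev_total = None
    for year in range(1, years + 1):
        new_cap = first_year_cap_pb * growth ** (year - 1)
        purchases.append(new_cap)
        live = purchases[-lifetime_years:]   # older racks are retired
        total_cap = sum(live)
        # Year 1 is treated as the baseline (0), as in the table.
        net_new = 0.0 if prev_total is None else total_cap - prev_total
        rows.append({
            "year": year,
            "new_racks": racks_per_year,
            "total_racks": racks_per_year * len(live),
            "new_cap": new_cap,
            "total_cap": total_cap,
            "net_new": net_new,
        })
        prev_total = total_cap
    return rows


if __name__ == "__main__":
    for r in operating_model():
        print(f"{r['year']}  {r['new_racks']:>2}  {r['total_racks']:>2}  "
              f"{r['new_cap']:.2f}  {r['total_cap']:.2f}  {r['net_new']:.2f}")
```

With these rules the total rack count levels off at 30 while capacity keeps growing. The computed net new capacity for year 6 is 6.84 PB; the table's 6.85 differs in the last digit, presumably from rounding.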

Retiring Equipment

• Challenge: Adding & removing racks.

• Solution: Support virtual networks, virtual data centers, etc.

We Have Several Ways of Defining Virtual Networks….

VN-Link

VLAN

VPNs

BGP

MPLS

OpenFlow

Open vSwitch

vSwitch

CloudSwitch

But No Vendor Neutral VN Standard That

• Scales to 100,000+ VMs
• Is supported by multiple vendors
• Spans multiple physical switches
• Supports VN mobility
• Provides strong isolation of VNs
• Is easy for VMs to join and leave VNs
• Includes management interfaces ….

OCC has a working group working on VN standards

Bridging the Gaps…A Small Step

Infrastructure as a Service
– Virtual Data Centers (VDC)
– Virtual Networks (VN)
– Virtual Machines (VM)
– Physical Resources

Platform as a Service
– Cloud Compute Services
– Data as a Service

Open Virtualization Format (OVF)

Open Cloud Computing Interface (OCCI)

SNIA Cloud Data Management Interface (CDMI)

Large Data Cloud Interoperability Framework

Metadata service linking IaaS and DaaS

Metadata service naming and linking entities in the IaaS layers

One Day We Hope to Peer

Open Science Data Cloud

More Challenges: Finding a Business Model That Works Long Term

• Challenge: raising constant amount of funding each year.

• To date: talking to foundations.

Thank You

• For more information:
– www.opencloudconsortium.org
– rgrossman.com (for research papers, etc.)
