Off the Grid

Off the Grid: Introduction to Grid Computing with GridGain. QJUG, February 2007. Nick Partridge (Veitch Lister Consulting) and Tom Adams (Workingmouse).

Uploaded by tom-adams on 11 May 2015


DESCRIPTION

Grid computing is a form of distributed computing that is increasingly popular in fields with high computation and/or data storage requirements. In this presentation we give an overview of grid computing, describe our experiences using grid tools on a real project, and develop a working grid across a two-node cluster using GridGain, an open source grid toolkit.

TRANSCRIPT

Page 1: Off the Grid

Off the Grid

QJUG, February 2007

Nick Partridge, Veitch Lister Consulting

Tom Adams, Workingmouse

Introduction to Grid Computing with GridGain

Page 2: Off the Grid

Why are we here?

Page 3: Off the Grid

Large distributed application

Page 4: Off the Grid

Grid-based solution worked

Page 5: Off the Grid

Flow

Page 6: Off the Grid

Grid?

•Multiple independent computing clusters which act like a "grid" (Wikipedia)

•Many nodes, each node indistinguishable from the others

•Complete machines over co-located CPUs?

•Multiple processes?

•Commodity hardware?

•Homogeneous machines?

Page 7: Off the Grid

A tale of two grids

Page 8: Off the Grid

Partition data across grid

Page 9: Off the Grid

Partition processing across grid

Page 10: Off the Grid

http://www.jroller.com/nivanov/entry/grid_computing_compute_grid_data

Page 11: Off the Grid

Selection

Page 12: Off the Grid

Requirements

•Callable from a Rails webapp

•Real-time: synchronous responses in under 30 seconds

•Large dataset: 100 GB (computation runs across all data)
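The 30-second requirement above means the web service has to give up rather than hang. A minimal sketch of one way to enforce such a deadline in plain Java (not a GridGain API): run the computation on an executor and bound the wait with `Future.get(timeout, unit)`. The class and method names and the 30-second constant are illustrative assumptions.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class DeadlineCall {
    // Illustrative budget taken from the "under 30 seconds" requirement.
    static final long DEADLINE_SECONDS = 30;

    // Runs `work` on its own thread and waits at most `timeout`.
    // Returns the result, or Optional.empty() if the deadline passed
    // (a null result is also reported as empty).
    static <T> Optional<T> callWithDeadline(Callable<T> work, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(work);
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the overdue computation
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("computation failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the real grid computation.
        Optional<Integer> answer = callWithDeadline(() -> 6 * 7, DEADLINE_SECONDS, TimeUnit.SECONDS);
        System.out.println(answer.orElse(-1)); // prints 42
    }
}
```

Returning an empty `Optional` lets the web service layer turn a missed deadline into an error response instead of propagating a checked exception.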

Page 13: Off the Grid

Rails webapp

•Simple document-literal web service

•Ruby - soap4r

•Java - GlassFish, Spring-WS

•Not really interesting for this talk... see Brisbane.rb

Page 14: Off the Grid

Data

•Read-only

•Full control

•45 TB (became 100 GB with pre-processing)

•SQL? 3 tables, one query w/ 2 joins

Page 15: Off the Grid

Don’t want to roll our own

Page 16: Off the Grid

(Row) database good enough

Page 17: Off the Grid

And we can federate them

Page 18: Off the Grid

Result?

Page 20: Off the Grid

What about BigTable?

Page 21: Off the Grid

Column database

Page 22: Off the Grid

Result?

Page 24: Off the Grid

Where are we?

Page 25: Off the Grid

Progress

•Don’t need to distribute data ⇒ no data grid

•No off the shelf solutions that scale/go fast

•Understand data better ⇒ happy to roll our own as fallback

Page 26: Off the Grid

Data solution

Page 27: Off the Grid

Data

•CSV files on filesystem (now binary)

•Directories form indices

•Data files broken up into chunks
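The slides don't show the actual file layout, so the following is only a sketch under assumed names: rows for each index key live under a directory named after that key (so the directory tree acts as the index) and are split into numbered chunk files of at most `rowsPerChunk` rows. The `root/<key>/chunk-NNNNN.csv` naming and the class itself are hypothetical; the project itself later moved from CSV to a binary format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ChunkedStore {
    // Writes `rows` for one index key into root/<key>/chunk-00000.csv, chunk-00001.csv, ...
    // with at most rowsPerChunk rows per file. Layout and file names are hypothetical.
    static List<Path> writeChunks(Path root, String key, List<String> rows, int rowsPerChunk) {
        if (rowsPerChunk <= 0) throw new IllegalArgumentException("rowsPerChunk must be positive");
        try {
            Path dir = Files.createDirectories(root.resolve(key)); // the directory is the index entry
            List<Path> written = new ArrayList<>();
            for (int start = 0, n = 0; start < rows.size(); start += rowsPerChunk, n++) {
                List<String> chunk = rows.subList(start, Math.min(start + rowsPerChunk, rows.size()));
                Path file = dir.resolve(String.format("chunk-%05d.csv", n));
                Files.write(file, chunk); // one row per line, UTF-8
                written.add(file);
            }
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Finds a key's chunks by listing only that key's directory (no scan of other keys).
    static List<Path> chunksFor(Path root, String key) {
        Path dir = root.resolve(key);
        if (!Files.isDirectory(dir)) return List.of();
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(p -> p.getFileName().toString().endsWith(".csv"))
                        .sorted() // zero-padded names sort in chunk order
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Small chunks also give a natural unit of work to hand to individual grid jobs.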

Page 28: Off the Grid

What about the code?

Page 29: Off the Grid

Need to distribute the computation

Page 30: Off the Grid

Options?

Page 31: Off the Grid

Erlang

Page 32: Off the Grid

Scala

Page 33: Off the Grid

Java

Page 35: Off the Grid

GridGain

Page 36: Off the Grid

GridGain

•“fully open source full-stack grid computing platform for Java”

•Map/reduce-based computation

•Easy to set up and use

•Can be extended via SPI implementations

•Just works

•“Scalable” (we’ve had it up to 32 nodes)

Page 37: Off the Grid

Map/reduce

Page 38: Off the Grid

When does it work?

•When data is independent (pure/referentially transparent)

•When data can be combined (reduce) based solely on input
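The second condition is what lets a grid combine partial results in whatever order the nodes finish. A small plain-Java sketch, using word counts as the example (class and method names are illustrative): merging two partial counts depends only on the two inputs, and the merge is associative and commutative, so any grouping or ordering gives the same total.

```java
import java.util.Map;
import java.util.TreeMap;

class Combine {
    // Merges two partial word counts by summing the counts of matching words.
    // The output depends only on the two inputs, never on which node produced them.
    static Map<String, Integer> merge(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> out = new TreeMap<>(a);
        b.forEach((word, n) -> out.merge(word, n, Integer::sum));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> node1 = Map.of("foo", 1, "bar", 2, "baz", 1);
        Map<String, Integer> node2 = Map.of("bar", 2, "baz", 1, "quux", 1);
        System.out.println(merge(node1, node2)); // {bar=4, baz=2, foo=1, quux=1}
        System.out.println(merge(node2, node1)); // same result in the other order
    }
}
```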

Page 39: Off the Grid

Input: "foo bar", "bar baz", "quux bar", "baz bar"

split → foo, bar, bar, baz, quux, bar, baz, bar

map → foo: 1, bar: 1, bar: 1, baz: 1, quux: 1, bar: 1, baz: 1, bar: 1

reduce → foo: 1, bar: 4, baz: 2, quux: 1
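The word-count diagram above can be traced in a few lines of plain Java. This sketch runs split, map and reduce in a single process with no grid involved; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class WordCount {
    // split: break each input line into words.
    static List<String> split(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            for (String w : line.trim().split("\\s+")) {
                if (!w.isEmpty()) words.add(w);
            }
        }
        return words;
    }

    // map: emit a (word, 1) pair for every word.
    static List<Map.Entry<String, Integer>> map(List<String> words) {
        return words.stream()
                    .map(w -> Map.entry(w, 1))
                    .collect(Collectors.toList());
    }

    // reduce: sum the 1s for each distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("foo bar", "bar baz", "quux bar", "baz bar");
        System.out.println(reduce(map(split(input)))); // {bar=4, baz=2, foo=1, quux=1}
    }
}
```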

Page 40: Off the Grid

GridGain grid

Page 41: Off the Grid

Grid

Input: "foo bar", "bar baz", "quux bar", "baz bar"

Output: foo: 1, bar: 4, baz: 2, quux: 1

Page 42: Off the Grid

Input: "foo bar", "bar baz", "quux bar", "baz bar"

Output: foo: 1, bar: 4, baz: 2, quux: 1

?

Node: "foo bar", "bar baz" → foo: 1, bar: 2, baz: 1

Node: "quux bar", "baz bar" → bar: 2, baz: 1, quux: 1

Page 43: Off the Grid

Input: "foo bar", "bar baz", "quux bar", "baz bar"

Master Node

Node: "foo bar", "bar baz" → foo: 1, bar: 2, baz: 1

Node: "quux bar", "baz bar" → bar: 2, baz: 1, quux: 1

Output: foo: 1, bar: 4, baz: 2, quux: 1
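To see the master/node split without a real grid, the following sketch uses a local thread pool in place of nodes: the master cuts the input into one slice per "node", each thread counts its slice, and the master merges the partial counts. Threads standing in for nodes is an assumption for illustration; in a real grid each slice would run on a separate node. Names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MasterNode {
    // The work one "node" does: count the words in its slice of the input.
    static Map<String, Integer> countSlice(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines)
            for (String w : line.trim().split("\\s+"))
                if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    // The master: cut the input into at most `nodes` contiguous slices, count each
    // slice on its own thread (standing in for a grid node), then merge the results.
    static Map<String, Integer> run(List<String> lines, int nodes) {
        if (nodes <= 0) throw new IllegalArgumentException("nodes must be positive");
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        try {
            List<Future<Map<String, Integer>>> partials = new ArrayList<>();
            int size = (lines.size() + nodes - 1) / nodes; // ceiling division
            for (int start = 0; start < lines.size(); start += size) {
                List<String> slice = lines.subList(start, Math.min(start + size, lines.size()));
                partials.add(pool.submit(() -> countSlice(slice)));
            }
            Map<String, Integer> total = new TreeMap<>();
            for (Future<Map<String, Integer>> f : partials)
                f.get().forEach((w, n) -> total.merge(w, n, Integer::sum));
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With two "nodes", the partial counts match the two Node lines of the slide before being merged into the final output.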

Page 44: Off the Grid

Master Node: "foo bar", "bar baz" → Node: "foo bar", Node: "bar baz" → foo: 1, bar: 2, baz: 1

Master Node: "quux bar", "baz bar" → Node: "quux bar", Node: "baz bar" → bar: 2, baz: 1, quux: 1
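The slide shows a node acting as a master for its own share of the work. Java's fork/join framework (`RecursiveTask`, `ForkJoinPool`) gives a local, single-JVM analogy of that recursive split: a task with more than one line splits itself in half, runs both halves and merges the results, while a single line is counted directly. This is an analogy only, not how GridGain schedules jobs; the class name is illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class RecursiveCount extends RecursiveTask<Map<String, Integer>> {
    private final List<String> lines;

    RecursiveCount(List<String> lines) {
        this.lines = lines;
    }

    @Override
    protected Map<String, Integer> compute() {
        if (lines.size() <= 1) {
            // Leaf: a "node" counts its own line directly.
            Map<String, Integer> counts = new TreeMap<>();
            for (String line : lines)
                for (String w : line.trim().split("\\s+"))
                    if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
            return counts;
        }
        // Otherwise act as a "master": split in half, run both halves, merge.
        int mid = lines.size() / 2;
        RecursiveCount left = new RecursiveCount(lines.subList(0, mid));
        RecursiveCount right = new RecursiveCount(lines.subList(mid, lines.size()));
        left.fork(); // left half runs asynchronously
        Map<String, Integer> total = new TreeMap<>(right.compute());
        left.join().forEach((w, n) -> total.merge(w, n, Integer::sum));
        return total;
    }

    static Map<String, Integer> count(List<String> lines) {
        return ForkJoinPool.commonPool().invoke(new RecursiveCount(lines));
    }
}
```

With the four input lines, the first split produces two "masters" of two lines each, and each of those splits into two single-line leaves, mirroring the two-level picture on the slide.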

Page 45: Off the Grid

Did you say map/reduce?

Page 46: Off the Grid

Input: "foo bar", "bar baz", "quux bar", "baz bar"

Master Node (reduce)

Node (map): "foo bar", "bar baz" → foo: 1, bar: 2, baz: 1

Node (map): "quux bar", "baz bar" → bar: 2, baz: 1, quux: 1

Output: foo: 1, bar: 4, baz: 2, quux: 1

Page 47: Off the Grid

Show me the types!

Page 48: Off the Grid

Input: "foo bar", "bar baz", "quux bar", "baz bar"

Master Node

Node: "foo bar", "bar baz" → foo: 1, bar: 2, baz: 1

Node: "quux bar", "baz bar" → bar: 2, baz: 1, quux: 1

Output: foo: 1, bar: 4, baz: 2, quux: 1

map[A, B](List[A], A → B) → List[B]

reduce[B, C](List[B], C, (C, B) → C) → C
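The two signatures can be written down almost directly with Java generics. This sketch reads reduce as a left fold over the list, so it returns a single accumulated C; `java.util.function.Function` and `BiFunction` stand in for the arrows. It is illustrative only and unrelated to GridGain's own interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

class MapReduceTypes {
    // map[A, B](List[A], A -> B) -> List[B]
    static <A, B> List<B> map(List<A> xs, Function<? super A, ? extends B> f) {
        List<B> out = new ArrayList<>(xs.size());
        for (A x : xs) out.add(f.apply(x));
        return out;
    }

    // reduce[B, C](List[B], C, (C, B) -> C) -> C, as a left fold from `initial`
    static <B, C> C reduce(List<B> xs, C initial, BiFunction<C, ? super B, C> f) {
        C acc = initial;
        for (B x : xs) acc = f.apply(acc, x);
        return acc;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("foo bar", "bar baz", "quux bar", "baz bar");
        // map each line to its number of words, then fold the numbers into a total
        List<Integer> perLine = map(lines, line -> line.split(" ").length);
        Integer totalWords = reduce(perLine, 0, Integer::sum);
        System.out.println(perLine + " " + totalWords); // prints [2, 2, 2, 2] 8
    }
}
```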

Page 49: Off the Grid

Terminology

Page 50: Off the Grid

Task: "foo bar", "bar baz", "quux bar", "baz bar"

Master Node

Node, Job: "foo bar", "bar baz" → foo: 1, bar: 2, baz: 1

Node, Job: "quux bar", "baz bar" → bar: 2, baz: 1, quux: 1

Result: foo: 1, bar: 4, baz: 2, quux: 1
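A hypothetical, simplified model of this terminology in plain Java: a Task splits its input into Jobs, each Job produces a partial result, and the Task reduces those into the final Result. `SimpleTask`, `WordCountTask` and `LocalRunner` are invented for illustration and are not GridGain's API; the real `GridTask` interface is documented in the javadoc linked on the "Task execution" slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;

// Hypothetical model of the terminology; NOT GridGain's GridTask API.
// T = task input, P = partial (job) result, R = final result.
interface SimpleTask<T, P, R> {
    List<Callable<P>> split(T input);   // Task -> Jobs
    R reduce(List<P> jobResults);       // Job results -> Result
}

class WordCountTask implements SimpleTask<List<String>, Map<String, Integer>, Map<String, Integer>> {
    private final int jobCount;

    WordCountTask(int jobCount) {
        if (jobCount <= 0) throw new IllegalArgumentException("jobCount must be positive");
        this.jobCount = jobCount;
    }

    @Override
    public List<Callable<Map<String, Integer>>> split(List<String> lines) {
        List<Callable<Map<String, Integer>>> jobs = new ArrayList<>();
        int size = Math.max(1, (lines.size() + jobCount - 1) / jobCount);
        for (int start = 0; start < lines.size(); start += size) {
            List<String> slice = lines.subList(start, Math.min(start + size, lines.size()));
            // Each Job counts the words in its own slice.
            jobs.add(() -> {
                Map<String, Integer> counts = new TreeMap<>();
                for (String line : slice)
                    for (String w : line.trim().split("\\s+"))
                        if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
                return counts;
            });
        }
        return jobs;
    }

    @Override
    public Map<String, Integer> reduce(List<Map<String, Integer>> jobResults) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : jobResults)
            partial.forEach((w, n) -> total.merge(w, n, Integer::sum));
        return total;
    }
}

class LocalRunner {
    // Runs every Job in the calling thread; a grid would send each Job to a Node.
    static <T, P, R> R execute(SimpleTask<T, P, R> task, T input) {
        List<P> results = new ArrayList<>();
        for (Callable<P> job : task.split(input)) {
            try {
                results.add(job.call());
            } catch (Exception e) {
                throw new IllegalStateException("job failed", e);
            }
        }
        return task.reduce(results);
    }
}
```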

Page 51: Off the Grid

Task: "foo bar", "bar baz", "quux bar", "baz bar"

Master Node

Node, Node

Jobs (two per node): "bar baz", "foo bar", "quux bar", "baz bar"

Result: foo: 1, bar: 4, baz: 2, quux: 1

Page 52: Off the Grid

Task: "foo bar", "bar baz", "quux bar", "baz bar"

Master Node

Node, Node, Node, Node

Jobs (one per node): "bar baz", "foo bar", "quux bar", "baz bar"

Result: foo: 1, bar: 4, baz: 2, quux: 1
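How Jobs land on Nodes is decided by the grid, not by the Task author. As a hedged illustration only, the sketch below assigns Jobs to Nodes round-robin (Job i goes to Node i mod n). Round-robin is an assumption here, not a description of GridGain's default behaviour, and the node names are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RoundRobin {
    // Assigns job i to node (i mod nodes.size()); returns node -> jobs, in node order.
    static Map<String, List<String>> assign(List<String> jobs, List<String> nodes) {
        if (nodes.isEmpty()) throw new IllegalArgumentException("no nodes in the grid");
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String node : nodes) plan.put(node, new ArrayList<>());
        for (int i = 0; i < jobs.size(); i++) {
            plan.get(nodes.get(i % nodes.size())).add(jobs.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> jobs = List.of("bar baz", "foo bar", "quux bar", "baz bar");
        // {node-1=[bar baz, quux bar], node-2=[foo bar, baz bar]}
        System.out.println(assign(jobs, List.of("node-1", "node-2")));
        // {node-1=[bar baz], node-2=[foo bar], node-3=[quux bar], node-4=[baz bar]}
        System.out.println(assign(jobs, List.of("node-1", "node-2", "node-3", "node-4")));
    }
}
```

Adding nodes changes only the plan, not the Jobs or the reduce step, which is why the same Task scales from two nodes to four.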

Page 53: Off the Grid

What defines a grid?

Page 54: Off the Grid

Node, Node, Node (IP MCast: 228.1.2.4)

Node, Node, Node (IP MCast: 228.1.2.5)
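The slide suggests that what defines a grid here is the IP multicast group its nodes use to find each other: nodes on 228.1.2.4 form one grid and nodes on 228.1.2.5 another. A plain-Java sketch of that grouping rule, with made-up node names; `InetAddress.isMulticastAddress()` checks that each group address is in the IPv4 multicast range (224.0.0.0 to 239.255.255.255). It models membership only and performs no network discovery.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GridMembership {
    // True if `address` is an IP multicast address (224.0.0.0/4 for IPv4).
    // Pass literal addresses: getByName() on a host name would do a DNS lookup.
    static boolean isMulticast(String address) {
        try {
            return InetAddress.getByName(address).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // Groups nodes by their configured multicast group: one entry per grid.
    static Map<String, List<String>> gridsByGroup(Map<String, String> nodeToGroup) {
        Map<String, List<String>> grids = new TreeMap<>();
        nodeToGroup.forEach((node, group) -> {
            if (!isMulticast(group))
                throw new IllegalArgumentException(group + " is not a multicast address");
            grids.computeIfAbsent(group, g -> new ArrayList<>()).add(node);
        });
        grids.values().forEach(Collections::sort); // deterministic member order
        return grids;
    }
}
```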

Page 55: Off the Grid

Failover

Page 56: Off the Grid

Task: "foo bar", "bar baz", "quux bar", "baz bar"

Master Node

Node, Node, Node, Node

Jobs (one per node): "bar baz", "foo bar", "quux bar", "baz bar"

Result: foo: 1, bar: 4, baz: 2, quux: 1

Page 57: Off the Grid

Task: "foo bar", "bar baz", "quux bar", "baz bar"

Master Node

Node, Node, Node, Node

Jobs (one per node): "bar baz", "foo bar", "quux bar", "baz bar"

X: a node fails

Result: foo: 1, bar: 4, baz: 2, quux: 1

Page 58: Off the Grid

Task: "foo bar", "bar baz", "quux bar", "baz bar"

Master Node

Node, Node, Node, Node

Jobs: "bar baz", "quux bar", "baz bar"

X: Job "foo bar" moves off the failed node

Result: foo: 1, bar: 4, baz: 2, quux: 1

Page 59: Off the Grid

Task: "foo bar", "bar baz", "quux bar", "baz bar"

Master Node

Node, Node, Node, Node

Jobs: "quux bar", "baz bar"

X X: Jobs "bar baz" and "foo bar" move off the failed nodes

Result: foo: 1, bar: 4, baz: 2, quux: 1
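The failover slides show Jobs from failed Nodes being picked up by the surviving Nodes, with the same Result at the end. A simplified plain-Java model of that idea, not GridGain's failover mechanism: each Job is first placed round-robin (an assumption), any Job whose Node has failed is moved to a surviving Node, and because Jobs are independent the merged counts do not change. Node names are invented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class Failover {
    // Places job i on node (i mod n); if that node has failed, the job is moved
    // to a surviving node instead (also chosen round-robin). Returns job -> node.
    // Assumes job descriptions are distinct, since they are used as map keys.
    static Map<String, String> place(List<String> jobs, List<String> nodes, Set<String> failed) {
        List<String> alive = new ArrayList<>();
        for (String n : nodes) if (!failed.contains(n)) alive.add(n);
        if (alive.isEmpty()) throw new IllegalStateException("every node has failed");
        Map<String, String> placement = new LinkedHashMap<>();
        for (int i = 0; i < jobs.size(); i++) {
            String first = nodes.get(i % nodes.size());
            placement.put(jobs.get(i), failed.contains(first) ? alive.get(i % alive.size()) : first);
        }
        return placement;
    }

    // Runs every placed job and merges the word counts. The node a job ran on
    // does not affect its output, so the Result is the same with or without failures.
    static Map<String, Integer> run(List<String> jobs, List<String> nodes, Set<String> failed) {
        Map<String, Integer> total = new TreeMap<>();
        for (String job : place(jobs, nodes, failed).keySet())
            for (String w : job.trim().split("\\s+"))
                if (!w.isEmpty()) total.merge(w, 1, Integer::sum);
        return total;
    }
}
```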

Page 60: Off the Grid

Task execution

Page 61: Off the Grid

http://www.gridgain.com/javadoc/org/gridgain/grid/GridTask.html

Page 62: Off the Grid

GridGain demo

Page 63: Off the Grid

The good, the bad, the ugly

Page 64: Off the Grid

Just works, fast, easy, extensible, scalable

Page 65: Off the Grid

Error messages, doco, code quality, coupling, odd APIs, management overview

Page 66: Off the Grid

Nomenclature, JMS?

Page 67: Off the Grid

References

•http://wiki.workingmouse.com/

•http://www.gridgain.com/

•http://labs.google.com/papers/mapreduce.html