MBrace: Cloud Computing with F#

Post on 24-May-2015

Category:

Technology

DESCRIPTION

MBrace is a programming model and cluster infrastructure for effectively defining and executing large-scale computations in the cloud. Based on the .NET framework, it builds upon and extends F# asynchronous workflows. https://skillsmatter.com/skillscasts/5157-mbrace-large-scale-distributed-computation-with-f

TRANSCRIPT

Cloud Computing with F#

About Nessos IT

Athens based ISV company

Specialize in the .NET framework and C#/F#

Various business fields

◦ Business process management

◦ GIS

◦ Application framework development

R&D Development

◦ OR Mappers

◦ MBrace and related frameworks

◦ Open Source development

What is MBrace?

A Programming Model.

◦ Leverages the power of the F# language.

◦ Inspired by F#’s asynchronous workflows.

◦ Declarative, compositional, higher-order.

A Cluster Infrastructure.

◦ Based on the .NET framework.

◦ Elastic, fault tolerant, multitasking.

Hello World

The MBrace Programming Model

val hello : Cloud<unit>

let hello = cloud {
    printfn "hello, world!"
    return ()
}

MBrace.CreateProcess <@ hello @>

Sequential Composition

The MBrace Programming Model

let first = cloud { return 15 }
let second = cloud { return 27 }

cloud {
    let! x = first
    let! y = second
    return x + y
}

Example : Sequential fold

The MBrace Programming Model

val foldl : ('S -> 'T -> Cloud<'S>) -> 'S -> 'T list -> Cloud<'S>

let rec foldl f s ts = cloud {
    match ts with
    | [] -> return s
    | t :: ts' ->
        let! s' = f s t
        return! foldl f s' ts'
}
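The recursion in foldl can be tried without a cluster by writing it against F#'s built-in asynchronous workflows, which the cloud builder is modelled on. The sketch below is a local analogy only: Async<'S> stands in for Cloud<'S>, and the names foldlAsync and add are illustrative, not part of MBrace.

```fsharp
// Local stand-in: Async<'S> plays the role of Cloud<'S>.
let rec foldlAsync (f : 'S -> 'T -> Async<'S>) (s : 'S) (ts : 'T list) : Async<'S> =
    async {
        match ts with
        | [] -> return s
        | t :: ts' ->
            // Run one step, then fold the rest with the updated state.
            let! s' = f s t
            return! foldlAsync f s' ts'
    }

// Hypothetical step function: add each element to the running total.
let add (acc : int) (x : int) = async { return acc + x }

let total = foldlAsync add 0 [1 .. 10] |> Async.RunSynchronously
printfn "%d" total   // prints 55
```

Each step waits for the previous state, so the fold stays strictly sequential regardless of how many workers are available.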

Parallel Composition

The MBrace Programming Model

val (<||>) : Cloud<'T> -> Cloud<'S> -> Cloud<'T * 'S>

cloud {
    let first = cloud { return 15 }
    let second = cloud { return 27 }

    let! x,y = first <||> second

    return x + y
}

Parallel Composition (Variadic)

The MBrace Programming Model

val Cloud.Parallel : Cloud<'T> [] -> Cloud<'T []>

cloud {
    let sqr x = cloud { return x * x }
    let jobs = Array.map sqr [|1 .. 100|]

    let! sqrs = Cloud.Parallel jobs

    return Array.sum sqrs
}

Non-Deterministic Parallelism

The MBrace Programming Model

val Cloud.Choice : Cloud<'T option> [] -> Cloud<'T option>

let tryPick (f : 'T -> Cloud<'S option>) (ts : 'T []) = cloud {
    let jobs = Array.map f ts
    return! Cloud.Choice jobs
}
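The behaviour of tryPick can be explored locally with Async.Choice from FSharp.Core, which has the same shape as the Cloud.Choice signature shown above. This is a sketch under that analogy, not MBrace code; tryPickAsync and multipleOf7 are illustrative names.

```fsharp
// Local stand-in: Async.Choice plays the role of Cloud.Choice.
let tryPickAsync (f : 'T -> Async<'S option>) (ts : 'T []) : Async<'S option> =
    ts |> Array.map f |> Async.Choice

// Hypothetical predicate: succeeds only for multiples of 7.
let multipleOf7 (n : int) =
    async { return (if n % 7 = 0 then Some n else None) }

// No candidate succeeds, so the combined result is None.
let noneFound = tryPickAsync multipleOf7 [| 1 .. 6 |] |> Async.RunSynchronously

// Exactly one candidate succeeds, so the result is Some 7.
let oneFound = tryPickAsync multipleOf7 [| 1 .. 10 |] |> Async.RunSynchronously
```

When several jobs can return Some, whichever finishes first wins, which is where the non-determinism comes from; the example keeps a single successful candidate so the outcome is fixed.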

Exception handling

The MBrace Programming Model

let first = cloud { return 17 }
let second = cloud { return 25 / 0 }

cloud {
    try
        let! x,y = first <||> second
        return x + y
    with :? DivideByZeroException ->
        return -1
}

Example: Map-Reduce

The MBrace Programming Model

let mapReduce (mapF : 'T -> ICloud<'S>)
              (reduceF : 'S -> 'S -> ICloud<'S>)
              (identity : 'S) (inputs : 'T list) =

    let rec aux inputs = cloud {
        match inputs with
        | [] -> return identity
        | [t] -> return! mapF t
        | _ ->
            // List.split is assumed to be a helper that splits a list into two halves;
            // it is not part of FSharp.Core.
            let left,right = List.split inputs
            let! s1, s2 = aux left <||> aux right
            return! reduceF s1 s2
    }

    aux inputs

Demo 1

About that MapReduce workflow…

Communication Overhead.

◦ Data captured in cloud workflow closures.

◦ Needlessly passed between worker machines.

Granularity issues.

◦ Each input entails a scheduling decision by the cluster.

◦ Cluster size not taken into consideration.

◦ Multicore capacity of worker nodes ignored.
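One way to address the granularity issues listed above is to schedule one job per chunk of inputs rather than one per input, with the number of chunks tied to the number of workers. The sketch below shows the idea locally with F# async workflows standing in for cloud workflows; chunkedMapReduce and its workers parameter are illustrative assumptions, not the MBrace API.

```fsharp
// Local stand-in: Async.Parallel plays the role of Cloud.Parallel.
// reduceF is assumed to be associative, with identity as its neutral element.
let chunkedMapReduce (workers : int)
                     (mapF : 'T -> 'S)
                     (reduceF : 'S -> 'S -> 'S)
                     (identity : 'S)
                     (inputs : 'T []) : Async<'S> =
    async {
        // One job per chunk: at most `workers` scheduling decisions in total.
        let jobs =
            inputs
            |> Array.splitInto workers
            |> Array.map (fun chunk ->
                async {
                    // Map and reduce the whole chunk inside a single job.
                    return chunk |> Array.map mapF |> Array.fold reduceF identity
                })
        let! partials = Async.Parallel jobs
        // Combine the per-chunk results.
        return Array.fold reduceF identity partials
    }

// Hypothetical usage: count the words in a handful of lines with 4 workers.
let lines = [| "a b c"; "d e"; "f"; "g h i j" |]
let countWords (line : string) = line.Split(' ').Length
let wordCount =
    chunkedMapReduce 4 countWords (+) 0 lines |> Async.RunSynchronously
```

Inside each chunk, Array.map could be swapped for Array.Parallel.map to use the cores of a single worker; that choice is independent of how chunks are spread across machines.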

The Cloud Ref

Distributed Data in MBrace

let createRef (data : string list) = cloud {
    let! cref = CloudRef.New data
    return cref : CloudRef<string list>
}

let deRef (cref : CloudRef<string list>) = cloud {
    return cref.Value
}

The Cloud Ref

Distributed Data in MBrace

Simplest data primitive in MBrace.

References a value stored in the cluster.

Conceptually similar to ML ref types.

Immutable by design.

Values cached in worker nodes for performance.

Disposable types

Distributed Data in MBrace

cloud {
    use! data = CloudRef.New [| 1 .. 1000000 |]

    let! x,y = doSomething data <||> doSomethingElse data

    return x + y
}

Demo 2

Performance

We tested MBrace against Hadoop.

Tests were staged on Windows Azure.

Clusters of 4, 8, 16 and 32 Large Azure instances.

Two algorithms were tested, grep and k-means.

Source code available on GitHub.

Distributed grep

Performance

Find occurrences of given pattern in text files.

Straightforward Map-Reduce algorithm.

Input data was 32, 64, 128 and 256 GB of text.

K-means

Performance

Centroid computation out of a set of vectors.

Iterative algorithm.

Not naturally describable in Map-Reduce workflows.

Hadoop implementation using Apache Mahout.

Input was 10^6 randomly generated 100-dimensional points.

Future

Better C# support.

◦ LinqOptimizer, LinqOptimizer.GPU and CloudLINQ.

◦ Support for the upcoming C# interactive.

Open Source.

◦ FsPickler, Thespian, CloudLINQ, etc. components of MBrace already published.

Mono/Linux support.

Find more at

http://github.com/nessos

http://www.m-brace.net
