MBrace: Cloud Computing with F#
Post on 24-May-2015
Cloud Computing with F#
About Nessos IT
Athens-based ISV company
Specializes in the .NET framework and C#/F#
Various business fields
◦ Business process management
◦ GIS
◦ Application framework development
R&D
◦ OR mappers
◦ MBrace and related frameworks
◦ Open Source development
What is MBrace?
A Programming Model.
◦ Leverages the power of the F# language.
◦ Inspired by F#’s asynchronous workflows.
◦ Declarative, compositional, higher-order.
A Cluster Infrastructure.
◦ Based on the .NET framework.
◦ Elastic, fault tolerant, multitasking.
Hello World
The MBrace Programming Model
val hello : Cloud<unit>
let hello = cloud {
    printfn "hello, world!"
    return ()
}
MBrace.CreateProcess <@ hello @>
Sequential Composition
The MBrace Programming Model
let first = cloud { return 15 }
let second = cloud { return 27 }
cloud {
    let! x = first
    let! y = second
    return x + y
}
Example : Sequential fold
The MBrace Programming Model
val foldl : ('S -> 'T -> Cloud<'S>) -> 'S -> 'T list -> Cloud<'S>
let rec foldl f s ts = cloud {
    match ts with
    | [] -> return s
    | t :: ts' ->
        let! s' = f s t
        return! foldl f s' ts'
}
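The fold above can be tried locally, without a cluster, by swapping the cloud builder for F#'s built-in async builder. The following is a minimal sketch under that assumption; `foldlAsync` and `addAsync` are hypothetical names introduced for illustration and are not part of MBrace.

```fsharp
// Local analog of the sequential fold, using F#'s async workflows
// in place of MBrace's cloud workflows (an assumption for illustration).
let rec foldlAsync (f : 'S -> 'T -> Async<'S>) (s : 'S) (ts : 'T list) : Async<'S> =
    async {
        match ts with
        | [] -> return s
        | t :: ts' ->
            // Each step waits for the previous state before continuing.
            let! s' = f s t
            return! foldlAsync f s' ts'
    }

// Hypothetical step function: add an element to the running total.
let addAsync (acc : int) (x : int) = async { return acc + x }

let total = foldlAsync addAsync 0 [ 1 .. 10 ] |> Async.RunSynchronously
printfn "%d" total
```

Because every `let!` waits for the previous state, the elements are processed strictly one after another.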
Parallel Composition
The MBrace Programming Model
val (<||>) : Cloud<'T> -> Cloud<'S> -> Cloud<'T * 'S>
cloud {
    let first = cloud { return 15 }
    let second = cloud { return 27 }
    let! x,y = first <||> second
    return x + y
}
Parallel Composition (Variadic)
The MBrace Programming Model
val Cloud.Parallel : Cloud<'T> [] -> Cloud<'T []>
cloud {
    let sqr x = cloud { return x * x }
    let jobs = Array.map sqr [|1 .. 100|]
    let! sqrs = Cloud.Parallel jobs
    return Array.sum sqrs
}
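The same fork-join shape exists in F#'s standard library as `Async.Parallel`. This hedged sketch mirrors the slide on a single machine, assuming async workflows as a stand-in for cloud workflows; the name `sumOfSquares` is introduced here for illustration.

```fsharp
// Local analog of Cloud.Parallel using Async.Parallel from FSharp.Core.
let sqr (x : int) = async { return x * x }

let sumOfSquares =
    async {
        // Build one job per input, then run them all and collect the results.
        let jobs = Array.map sqr [| 1 .. 100 |]
        let! sqrs = Async.Parallel jobs
        return Array.sum sqrs
    }
    |> Async.RunSynchronously

printfn "%d" sumOfSquares
```

The results come back as an array in the same order as the jobs, so the sum does not depend on which job finishes first.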
Non-Deterministic Parallelism
The MBrace Programming Model
val Cloud.Choice : Cloud<'T option> [] -> Cloud<'T option>
let tryPick (f : 'T -> Cloud<'S option>) (ts : 'T []) = cloud {
    let jobs = Array.map f ts
    return! Cloud.Choice jobs
}
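FSharp.Core offers a comparable combinator, `Async.Choice`, which returns the first `Some` result or `None` if every job returns `None`. The sketch below uses it as a local stand-in for `Cloud.Choice`; `tryPickAsync` and `divisibleBy7` are hypothetical names.

```fsharp
// Local analog of tryPick, built on Async.Choice from FSharp.Core.
let tryPickAsync (f : 'T -> Async<'S option>) (ts : 'T []) : Async<'S option> =
    async {
        let jobs = Array.map f ts
        return! Async.Choice jobs
    }

// Hypothetical predicate: succeed only for multiples of 7.
let divisibleBy7 (n : int) =
    async { return (if n % 7 = 0 then Some n else None) }

// Exactly one input in 1 .. 10 matches, so the result is deterministic.
let found = tryPickAsync divisibleBy7 [| 1 .. 10 |] |> Async.RunSynchronously
// No input in 1 .. 6 matches, so every job returns None.
let missing = tryPickAsync divisibleBy7 [| 1 .. 6 |] |> Async.RunSynchronously
printfn "%A %A" found missing
```

When several inputs could match, which one is returned depends on scheduling, which is the non-determinism the slide title refers to.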
Exception handling
The MBrace Programming Model
let first = cloud { return 17 }
let second = cloud { return 25 / 0 }
cloud {
    try
        let! x,y = first <||> second
        return x + y
    with :? DivideByZeroException ->
        return -1
}
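Exceptions raised inside parallel children can be handled the same way in a local async workflow. This is a minimal sketch assuming `Async.Parallel` as a substitute for `<||>`; the `divide` helper is hypothetical.

```fsharp
open System

// Hypothetical helper: integer division wrapped in an async workflow.
let divide (a : int) (b : int) = async { return a / b }

let result =
    async {
        try
            // The second job divides by zero; its exception surfaces here.
            let! values = Async.Parallel [ divide 34 2; divide 25 0 ]
            return Array.sum values
        with
        | :? DivideByZeroException -> return -1
    }
    |> Async.RunSynchronously

printfn "%d" result
```

The handler sits in the parent workflow, so the failure of one child is handled at the point where the parallel results are awaited.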
Example: Map-Reduce
The MBrace Programming Model
let mapReduce (mapF : 'T -> ICloud<'S>)
              (reduceF : 'S -> 'S -> ICloud<'S>)
              (identity : 'S) (inputs : 'T list) =
    let rec aux inputs = cloud {
        match inputs with
        | [] -> return identity
        | [t] -> return! mapF t
        | _ ->
            let left,right = List.split inputs
            let! s1, s2 = aux left <||> aux right
            return! reduceF s1 s2
    }
    aux inputs
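To experiment with this divide-and-conquer workflow on one machine, the sketch below replaces cloud workflows with async workflows and uses `List.splitAt` from FSharp.Core to halve the input, since `List.split` is not a standard library function. The names `mapReduceAsync`, `wordCount` and `add` are hypothetical.

```fsharp
// Local analog of the map-reduce workflow, assuming async in place of cloud.
let mapReduceAsync (mapF : 'T -> Async<'S>)
                   (reduceF : 'S -> 'S -> Async<'S>)
                   (identity : 'S)
                   (inputs : 'T list) : Async<'S> =
    let rec aux (inputs : 'T list) : Async<'S> =
        async {
            match inputs with
            | [] -> return identity
            | [ t ] -> return! mapF t
            | _ ->
                // Halve the input and process both halves in parallel.
                let left, right = List.splitAt (List.length inputs / 2) inputs
                let! results = Async.Parallel [ aux left; aux right ]
                return! reduceF results.[0] results.[1]
        }
    aux inputs

// Hypothetical map step: count the words in one line.
let wordCount (line : string) =
    async { return line.Split([| ' ' |], System.StringSplitOptions.RemoveEmptyEntries).Length }

// Hypothetical reduce step: add two partial counts.
let add (a : int) (b : int) = async { return a + b }

let lines = [ "the quick brown fox"; "jumps over"; "the lazy dog" ]
let totalWords = mapReduceAsync wordCount add 0 lines |> Async.RunSynchronously
printfn "%d" totalWords
```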
Demo 1
About that MapReduce workflow…
Communication Overhead.
◦ Data captured in cloud workflow closures.
◦ Needlessly passed between worker machines.
Granularity issues.
◦ Each input entails a scheduling decision by the cluster.
◦ Cluster size not taken into consideration.
◦ Multicore capacity of worker nodes ignored.
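One common way to reduce the number of scheduling decisions is to batch inputs into roughly one chunk per worker and process each chunk sequentially. The sketch below illustrates that idea locally with async workflows; it is not taken from the MBrace sources, and `chunkedMapReduce` and its `workerCount` parameter are hypothetical.

```fsharp
// Coarse-grained map-reduce sketch: one job per chunk rather than per input.
let chunkedMapReduce (workerCount : int)
                     (mapF : 'T -> 'S)
                     (reduceF : 'S -> 'S -> 'S)
                     (identity : 'S)
                     (inputs : 'T list) : Async<'S> =
    async {
        if List.isEmpty inputs then
            return identity
        else
            let jobs =
                inputs
                // Split into at most workerCount chunks of similar size.
                |> List.splitInto workerCount
                |> List.map (fun chunk ->
                    // Each chunk is mapped and reduced sequentially inside one job.
                    async { return chunk |> List.map mapF |> List.fold reduceF identity })
            let! partials = Async.Parallel jobs
            // Combine the per-chunk partial results.
            return Array.fold reduceF identity partials
    }

let sumSquares =
    chunkedMapReduce 4 (fun x -> x * x) (+) 0 [ 1 .. 100 ]
    |> Async.RunSynchronously
printfn "%d" sumSquares
```

With 100 inputs and `workerCount` set to 4, only four jobs are scheduled instead of one hundred.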
The Cloud Ref
Distributed Data in MBrace
let createRef (data : string list) = cloud {
    let! cref = CloudRef.New data
    return cref : CloudRef<string list>
}
let deRef (cref : CloudRef<string list>) = cloud {
    return cref.Value
}
The Cloud Ref
Distributed Data in MBrace
Simplest data primitive in MBrace.
References a value stored in the cluster.
Conceptually similar to ML ref types.
Immutable by design.
Values cached in worker nodes for performance.
Disposable types
Distributed Data in MBrace
cloud {
    use! data = CloudRef.New [| 1 .. 1000000 |]
    let! x,y = doSomething data <||> doSomethingElse data
    return x + y
}
Demo 2
Performance
We tested MBrace against Hadoop.
Tests were staged on Windows Azure.
Clusters of 4, 8, 16 and 32 Large Azure instances.
Two algorithms were tested, grep and k-means.
Source code available on GitHub.
Distributed grep
Performance
Find occurrences of given pattern in text files.
Straightforward Map-Reduce algorithm.
Input data was 32, 64, 128 and 256 GB of text.
K-means
Performance
Centroid computation out of a set of vectors.
Iterative algorithm.
Not naturally describable in Map-Reduce workflows.
Hadoop implementation using Apache Mahout.
Input was 10^6 randomly generated 100-dimensional points.
Future
Better C# support.
◦ LinqOptimizer, LinqOptimizer.GPU and CloudLINQ.
◦ Support for the upcoming C# interactive.
Open Source.
◦ FsPickler, Thespian, CloudLINQ, etc. components of MBrace already published.
Mono/Linux support.
Find more at
http://github.com/nessos
http://www.m-brace.net