X10 Tutorial, 12.11.2010 at FAU Erlangen
Christoph von Praun

0 Introduction
==============

Welcome! This is a tutorial for the programming language X10 version 2.1. Let’s do a brief round of introduction.

0.1 Sources
===========

Some examples and discussions in this tutorial are adopted from the following documents:

[1] Vijay Saraswat, Bard Bloom, Igor Peshansky, Olivier Tardieu, and David Grove: Report on the Programming Language X10 - Version 2.1, October 2010. http://dist.codehaus.org/x10/documentation/languagespec/x10-latest.pdf

[2] Bard Bloom: X10 2.01 Tutorial, May 2010. http://docs.codehaus.org/display/XTENLANG/X10+2.0.1+Tutorial

0.2 Goals and Methodology
=========================

You learn:
- language mechanics on small example programs, with emphasis on constructs for parallel and distributed programming

You don't learn:
- much about the sequential core language (much of it adopted from Scala/Java)
- how to tune/adapt/design algorithms for parallel and/or distributed processing
- the art of performance engineering: locality and load balancing

Methodology of this tutorial:
- start with X10 principles and sequential core (~15%)
- exercise canonical problems on small code snippets in Eclipse, read/edit/write X10 code! (60%)
- development of a small programming exercise (25%)

1 X10 Design Principles
=======================

X10 is a "middleware" language for high-performance computing; middleware means that the programmer has a fair amount of control over machine-level and runtime-level behavior. X10 operates at levels (2) and (3) in the following abstraction hierarchy:

Tiers of parallelism:

(1) automatic parallelizing compiler
-----------------------------------------------------------
(2) deterministic fully independent computations or serialization
-----------------------------------------------------------
(3) explicitly synchronized critical sections, transactions,
    data race free threads, actors
-----------------------------------------------------------
(4) low-level implementation of threads, with data race synchronization
    mechanisms, conditions, non-blocking data structures

X10 enables seamless progression from sequential to parallel and from parallel to distributed programs.

X10 supports multi-level parallelism.

X10 has strong typing and a powerful, versatile type system that supports and encourages error detection at compile time.

X10 provides a controlled runtime environment:
- language/type safety
- garbage collection
- simplified error analysis

Altogether: X10 aims at a 10-times productivity increase of the programmer.

Caveat: X10 simplifies many fairly mundane aspects in the life of a programmer who would otherwise use C++ or Fortran. X10 as a programming language has only limited effect on the following two major challenges in the transition to highly efficient concurrent programs: X10 does not solve the grand challenges of access locality and load balancing. The construction of an efficient parallel program often requires a redesign of your (formerly sequential) algorithm and data structures. X10 does not solve this problem for you.

Caveat: X10 cannot (at this point in the development) handle programs with data races correctly (we cannot develop highly efficient concurrent data structures such as the Michael/Scott queue).

1.1 X10 vs Java
===============

- checked exceptions
+ structs
+ mechanisms for intra-thread concurrency are part of the language, not library: async/finish/atomic
+ type inference
+ var/val, val is the default
+ anonymous functions
+ better constructors (constant fields are definitely initialized, this does not leak)
+ better exception model (exceptions are not dropped if they reach the top of the call stack of an activity)
+ operator overloading

1.2 X10 vs. Scala
=================

o syntax, in particular syntax for type expressions, close to Scala
- no closures

1.3 Significant changes from version 2.06
=========================================

- ateach
- foreach
- future (has been replaced by library construct x10.util.Future)
- await ... or
- Rail and ValRail (replaced by Array)
o Array is only local Array
+ Type DistArray
o handling / copying of objects when place shifting
+ GlobalRef

2 Hello World
=============

------------------------------------------------------
public class HelloWorld {
  public static def main(Array[String]): Void {
    Console.OUT.println("Hello World");
  }
}
------------------------------------------------------

Note that the type Array[String] does not specify the number of dimensions (called rank). If you want to access the array with an ordinary Int valued index then you have to specify the rank of the array: Array[String](1), which is a short form for Array[String]{self.rank==1}.

------------------------------------------------------
public class HelloWorld2 {
  public static def main(args: Array[String](1)): Void {
    Console.OUT.println("Hello World - my first arg is '" + args(0) + "'");
  }
}
------------------------------------------------------

3 Need-to-Know of Sequential Core Language
==========================================

3.1 Types and Type Inference
=============================

X10 is a strongly typed language like Java or Scala. Objects are described by classes. Each class defines a type. A subtyping relation exists between two classes if the programmer specifies so (keywords extends or implements). Unlike Scala, X10 does not support structural subtyping.

3.1.1 Type Any [LangSpec 9.3]
==============================

In X10, all objects and structs inherit implicitly from the interface type x10.lang.Any; even functions are treated like objects of type Any. Classes also inherit from class x10.lang.Object. Any requires the following methods, which can be overridden:

  public def equals(Any):Boolean;
  public def hashCode():Int;
  public def typeName():String;
  public def toString():String;

If the programmer does not explicitly implement those methods, they are either inherited from class x10.lang.Object, or default implementations are generated.
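
As an illustration of the methods required by Any, the following sketch overrides toString, hashCode, and equals in a small class. The class Celsius and its field are hypothetical names chosen for this example; the code assumes X10 2.1 syntax as used elsewhere in these notes and has not been checked against a particular compiler release.

--------------------------------------------------
public class AnyMethods {
  // Hypothetical example class, not part of the X10 library
  static class Celsius {
    val degrees: Int;
    def this(d: Int) { degrees = d; }
    // textual representation used by println
    public def toString(): String = "" + degrees + " C";
    // equal objects must yield equal hash codes
    public def hashCode(): Int = degrees;
    // two Celsius objects are equal if they hold the same value
    public def equals(other: Any): Boolean {
      if (other instanceof Celsius)
        return (other as Celsius).degrees == degrees;
      return false;
    }
  }
  public static def main(Array[String]): Void {
    val a = new Celsius(21);
    val b = new Celsius(21);
    Console.OUT.println(a);            // calls toString()
    Console.OUT.println(a.equals(b));  // comparison by value as defined above
    Console.OUT.println(a.typeName()); // typeName() is not overridden here
  }
}
--------------------------------------------------

Since typeName() is not overridden, the last call falls back to whatever the language provides by default.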

3.1.2 Values and Variables
==========================

The syntax for the declaration of types is adopted from Scala. X10 distinguishes between values (constant) and variables. Values have to be initialized. The following program initializes the value of sum with 34.

------------------------------------------------------
val sum: Int = 13 + 21;
------------------------------------------------------

It is not necessary to tell the compiler about the type, since it can infer the type from the arguments. Hence the following program is equivalent to the above.

------------------------------------------------------
val sum = 13 + 21;
------------------------------------------------------

If you still would like to specify the type as a hint or documentation, always do so using <: as follows:

------------------------------------------------------
public class Types1 {
  // identity function; must be called with an Int
  static def id(i: Int) = i;
  public static def main(Array[String]{self.rank==1}): Void {
    val x1 = 42;
    val x2 : Int = 42;
    val x3 : Any = 42;
    val x4 <: Int = 42;
    val x5 <: Any = 42;

    id(x1); // OK
    id(x2); // OK
    id(x3); // compile-time error
    id(x4); // OK
    id(x5); // OK
  }
}
------------------------------------------------------

The symbol <: allows the compiler to internally infer a possibly stronger type than the one specified and to use that 'internal' type information when validating further uses of the data. In the above code, the declaration of x5 reads as follows: "x5 is of type Any or a subtype of it".

The declaration of variables is slightly different. First, initialization of vars is not mandatory. Second, the specification of the type is mandatory, and you have to specify the precise type using the operator ':'.

------------------------------------------------------
public class Types2 {
  // identity function on Int (not used in this example)
  static def id(i: Int) = i;
  public static def main(Array[String]): Void {
    var sum1 = 13 + 21;        // compile-time error
    var sum2 <: Int = 13 + 21; // compile-time error
    var sum3: Int = 13 + 21;   // ok
  }
}
------------------------------------------------------

Note that in some situations, e.g. in the declaration of method arguments, the keyword val or var can be omitted. In this case, val is the default. Hence the declaration

  public static def main(args: Array[String]): Void { ...}

means

  public static def main(val args: Array[String]): Void {
    args([1]) = "Hello";             // OK
    args = new Array[String](0..1);  // error
  }

An assignment to variable args is flagged as a compile-time error. An assignment to one of the array elements is OK.


3.1.3 Structs and Classes [LangSpec 8, 9]
==========================================

X10 supports a light-weight / efficient form of classes called structs. All fields of a struct must be values - no variables. Structs cannot extend classes or other structs, but they can implement interfaces and in particular are subtypes of type Any. Instances of a struct are called 'value objects' or, for short, 'values'.

Primitive types (e.g. types int, double, float) are implemented as structs in X10 and can be used like other value objects. Although X10 supports 'int' and 'double', these names are aliases of the structs 'Int' and 'Double', and you should generally use the latter when writing your X10 program.
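
The following sketch shows what a user-defined struct might look like. The struct Pair and its members are hypothetical names for illustration; the sketch assumes that, as with GlobalRef later in these notes, a struct instance is created by calling the constructor without 'new'.

--------------------------------------------------
public class StructDemo {
  // Hypothetical struct for illustration; all fields are vals
  static struct Pair {
    val x: Int;
    val y: Int;
    def this(x: Int, y: Int) { this.x = x; this.y = y; }
    // methods may read the fields but cannot reassign them
    def sum(): Int = x + y;
  }
  public static def main(Array[String]): Void {
    val p = Pair(3, 4);            // no 'new' for structs
    Console.OUT.println(p.sum());  // sum of the two fields
  }
}
--------------------------------------------------

Because a struct has only val fields, an 'update' means constructing a new value, e.g. Pair(p.x + 1, p.y).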

3.2 Guards and Dependent Types [LangSpec 4.4 and 8.4]
=======================================================

The X10 compiler knows more about types than conventional compilers do. For example, the values of Int values are 'tracked'.

---------------------------------------------------------
public class ConstrainedType1 {
  // function must be called with Int that have value 123
  static def id(i: Int{self == 123 }) = i;
  public static def main(Array[String]): Void {
    val x1 = 123;
    val x2 = 42;
    val x3 : Int = 123;
    val x4 : Int = 42;
    val x5 <: Int = 123;
    val x6 <: Int = 42;
    id(x1); // static check OK
    id(x2); // compile-time error
    id(x3); // dynamic check (error with option -STATIC_CALLS)
    id(x4); // run-time error (ClassCastException: x10.lang.Int{self==123})
    id(x5); // static check OK
    id(x6); // compile-time error
  }
}
---------------------------------------------------------

The specification Int{self==123} narrows the type Int. The suffix "{self==123}" is called a type constraint. Int{self==123} specifies the subtype of Int objects that have the value 123. 'self' refers to the objects described by the current type. In {<constraint>}, <constraint> refers to a boolean expression over constants in the environment and the constant fields (called properties) of objects referred to by the type.

An earlier example (Types1) uses another constrained type, namely Array[String]{self.rank==1}. Property rank specifies the number of dimensions in a multi-dimensional array. The field is constant, i.e. the value is determined and set when the array is allocated.

The extended type check of the compiler enforces constraints as follows: If a violation is certain, the compiler issues a static error. If a violation is possible, the compiler inserts a runtime check. If a violation is impossible, the runtime check is omitted.

The above method could also be defined as follows:

---------------------------------------------------------
public class MethodGuard1 {
  // function must be called with Int that have value 123
  static def id(i: Int){i == 123} = i;
  ...
---------------------------------------------------------

This time, the constraint is associated with the method, not with a type. Constraints associated with methods are called 'guards' and can be understood as preconditions that have to hold when the method starts execution.

Current limitation: The static verification of constraints and the generation of dynamic checks is currently only available for equality constraints. The following program compiles and executes without error.

---------------------------------------------------------
public class ConstrainedType2 {
  static def id(i: Int{self >= 123 }) = i;
  public static def main(Array[String]): Void {
    val x : Int = 42;
    id(x);
    Console.OUT.println("Hello");
  }
}
---------------------------------------------------------

Another example:

In the following program, the array access fails. The compiler reports that the accessor method of the array (Array::apply(Int)) is not available; in fact this accessor method is only available for arrays with rank==1. The compiler cannot establish this condition at the access site.

---------------------------------------------------------
public class ConstrainedType3 {
  public static def main(args: Array[String]): Void {
    Console.OUT.println("Hello" + args(0));
  }
}
---------------------------------------------------------

To aid the compiler in proving that the method guard is established at the access site, add a type constraint for the argument of the main method:

---------------------------------------------------------
public class ConstrainedType4 {
  public static def main(args: Array[String]{self.rank==1}): Void {
    Console.OUT.println("Hello" + args(0));
  }
}
---------------------------------------------------------

The following program computes the dot product of two vectors. Note that the method is available only at call sites where the constraint analysis can guarantee that the regions of both arguments are the same. Preview: The region of an array defines the number of dimensions and their respective sizes. The property 'region' of the array is of type Region and is defined when the array is created.

---------------------------------------------------------
public class MethodGuard {
  public static def main(Array[String]) {
    val v = new Array[Double](0..3, ([i]:Point(1)) => 6.0);
    x10.io.Console.OUT.println("v*v = " + dot(v,v));
  }
  static def dot(v: Array[Double], w: Array[Double]) {v.region == w.region} = {
    var s : Double = 0.0;
    for(i in v.region) s += v(i) * w(i);
    return s;
  }
}
---------------------------------------------------------

3.3 Point / Array / Regions [LangSpec 16]
===========================

3.3.1 Points [LangSpec 16.1]
=============================

Objects of type Point represent points in the cartesian space. An object of type Point has a property called rank that specifies the number of dimensions. The type Point stands for all objects that are instances of Point, regardless of their rank.

There is a short syntax for type constraints that refer to properties of a class. For example, the type Point{rank==1} stands for 1-dimensional points. Point(1) is a short syntax for Point{rank==1}.

X10 supports special syntax for Point literals. For example, the expression [17,13,42] stands for a 3-dimensional Point object with coordinates 17, 13, and 42 (type Point(3)).

API Doc: http://dist.codehaus.org/x10/xdoc/2.1.0/x10/array/Point.html


3.3.2 Region [LangSpec 16.2]
============================

A region is a set of points, typically a rectangular region in the cartesian space.

X10 supports special syntax for Region literals. E.g. 0..123 stands for a 1-dimensional region that contains the points {[0], ..., [123]}; (0..2)*(0..1) stands for the cartesian product and contains the points {[0,0], [0,1], [1,0], ..., [2,1]}.

There are various constructors and operations on regions that you find in the API documentation at http://dist.codehaus.org/x10/xdoc/2.1.0/x10/array/Region.html.

Regions implement the type Iterable[Point], which means that you can enumerate the points of a region in their lexicographic order using a for statement with the following convenient syntax:

---------------------------------------------------------
public class PointIteration1 {
  public static def main(Array[String]): Void {
    val reg = (0..2)*(0..1);
    for (p in reg)
      Console.OUT.print(p);
    Console.OUT.println();
    val reg2 = (17..5);
    // nothing is printed here - region is empty
    for (p in reg2)
      Console.OUT.print(p);
  }
}
// prints
// [0,0][0,1][1,0][1,1][2,0][2,1]
---------------------------------------------------------

Variable p has type Point and takes on subsequent values of Points in the region set. A special syntax can be used to extract Int coordinate values from the Point as follows:

---------------------------------------------------------
public class PointIteration2 {
  public static def main(Array[String]): Void {
    val reg = (0..2)*(0..1);
    for ([x,y] in reg) {
      Console.OUT.print(" ");
      Console.OUT.print(x);
      Console.OUT.print(",");
      Console.OUT.print(y);
      Console.OUT.print(" ");
    }
  }
}
// prints: 0,0 0,1 1,0 1,1 2,0 2,1
---------------------------------------------------------

3.3.3 Arrays
============

Array[T] is the type of an array object with elements of type T. All elements of the array are allocated in the same place. The type does not specify the dimensions of the array, called the rank. rank is a property of an array and hence can be used to constrain the type: Array[T]{rank==1}, or shorter Array[T](1), is a 1-dimensional array.

An array is allocated like other objects using the 'new' operator. A region specifies the extension and rank of the array to be allocated.

  val arr = new Array[Double](0..10); // eleven variables with default value 0.0

Arrays are accessed with round bracket notation (ordinary call of function apply()). The argument to the accessor is either a Point or a series of Int coordinates. For example, for a 2-dimensional array arr: arr([1,1]) or arr(1,1).
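
The access forms described above can be sketched on a small 2-dimensional array as follows. The class name ArrayAccess and the initializer 10*x + y are choices made for this illustration; the explicit type Array[Int](2) is added on the assumption that it helps the compiler establish the rank needed for the access with two Int coordinates.

--------------------------------------------------
public class ArrayAccess {
  public static def main(Array[String]): Void {
    // 3x2 array; element (x,y) is initialized to 10*x + y
    val arr: Array[Int](2) =
      new Array[Int]((0..2)*(0..1), ([x,y]: Point(2)) => 10*x + y);
    Console.OUT.println(arr(2,1));    // access with Int coordinates
    Console.OUT.println(arr([2,1]));  // access with a Point literal
    arr(2,1) = 99;                    // element assignment
    for (p in arr.region) {           // access with a Point variable
      Console.OUT.print(arr(p));
      Console.OUT.print(" ");
    }
    Console.OUT.println();
  }
}
--------------------------------------------------

The loop over arr.region does not depend on the rank, while the accesses arr(2,1) and arr([2,1]) are only meaningful for rank 2.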

Since the type for arrays does not necessarily specify the rank, we can write rank-oblivious code, i.e. code that operates correctly on arrays with arbitrary rank.

--------------------------------------------------
public class RankOblivious {
  static def addInto(src: Array[Int], dest:Array[Int]) {src.region == dest.region} = {
    for (p in src.region)
      dest(p) += src(p);
  }
  static def printArray(arr: Array[Int]) {
    for (p in arr.region) {
      Console.OUT.print(p);
      Console.OUT.print("=");
      Console.OUT.print(arr(p));
      Console.OUT.print(", ");
    }
    Console.OUT.println();
  }
  public static def main(Array[String]): Void {
    val reg1 = 0..3;
    val src1 = new Array[Int](reg1, ([x]:Point(1)) => x);
    val dst1 = new Array[Int](reg1);
    val reg2 = (0..3)*(0..3);
    val src2 = new Array[Int](reg2, ([x,y]:Point(2)) => x+y);
    val dst2 = new Array[Int](reg2);
    addInto(src1, dst1);
    printArray(dst1);
    addInto(src2, dst2);
    printArray(dst2);
  }
  // prints:
  // [0]=0, [1]=1, [2]=2, [3]=3,
  // [0,0]=0, [0,1]=1, [0,2]=2, [0,3]=3, [1,0]=1, [1,1]=2, [1,2]=3, [1,3]=4, [2,0]=2, [2,1]=3, [2,2]=4, [2,3]=5, [3,0]=3, [3,1]=4, [3,2]=5, [3,3]=6,
}
--------------------------------------------------

3.4 Function Literals (Anonymous Functions) [LangSpec 10.2]
===========================================================

In the previous examples, we already learned that the initial values of array elements can be specified by a function that maps Points to values. The function is executed after the array is allocated and before the array is made available to the context of array construction.

Like Scala and C#, X10 allows functions to be treated as first-class citizens, i.e. like objects. That means that functions can be stored in variables, and they can be passed around the calling chain like objects. X10 supports syntax for specifying function literals (also called anonymous functions, since they do not have a name). The general syntax for such function literals is (<args>) => <expr>. <expr> can refer to the arguments and it can also refer to constant values defined in the environment where the function literal is defined. Unlike Scala, X10 does not support full closures: a function literal cannot refer to variables (var) of its environment.

Function literals are commonly used to define operations that are repeatedly applied on a set of values, e.g., the comparison operation in a sort function or the accumulation operator in a reduction.

--------------------------------------------------
import x10.util.ArrayList;

public class FunctionLiterals {
  static def count[T](f: (T) => Boolean, xs: Iterable[T]): Int = {
    var acc: Int = 0;
    for (x: T in xs)
      if (f(x)) acc++;
    return acc;
  }
  static def printIterable[T](itbl: Iterable[T]) {
    for (i in itbl) {
      Console.OUT.print(i);
      Console.OUT.print(" ");
    }
    Console.OUT.println();
  }
  public static def main(Array[String]): Void {
    val l = new ArrayList[Int](10);
    for ([i] in 0..9)
      l.add(10-i);
    printIterable(l);
    l.sort((a1:Int, a2:Int) => (a1 == a2) ? 0 : (a1 < a2) ? -1 : 1);
    printIterable(l);
    val isEven = (i: Int) : Boolean => i%2 == 0;
    Console.OUT.println(count(isEven,l));
  }
}
// prints
// 10 9 8 7 6 5 4 3 2 1
// 1 2 3 4 5 6 7 8 9 10
// 5
--------------------------------------------------

4 Places [LangSpec 10.2]
========================

An X10 place is a repository for data and activities, corresponding loosely to a process or a processor. Places induce a concept of "local".

Objects and structs (with the exception of distributed arrays) are created in a single place - the place that the constructor call was running in. They cannot change places.

Places are numbered 0 through Place.MAX_PLACES-1; the number of a place pl is stored in the field pl.id. The Sequence[Place] Place.places() contains the places of the program, in numeric order. The set of places in your program, Place.places(), is chosen when you start your program, and does not change. In the default configuration, places are homogeneous and flat. Configurations are conceivable where places are arranged in a logical hierarchy and/or where places have different computational capabilities or even architectures.

The variable here is always bound to the current place -- that is, the place that the current line of code is executing at.

--------------------------------------------------
public class PlaceIteration {
  public static def main(Array[String]): Void {
    Console.OUT.println("Here is " + here);
    for (p in Place.places()) {
      Console.OUT.print(p);
      Console.OUT.print(" isCuda = ");
      Console.OUT.println(p.isCUDA());
    }
  }
}
// prints
// Here is (Place 0)
// (Place 0) isCuda = false
// (Place 1) isCuda = false
// (Place 2) isCuda = false
// (Place 3) isCuda = false
--------------------------------------------------

4.1 Computing in different places
=================================

A programmer can direct the computation to continue in another place. This is called "place shifting". In the following program, the computation of the square root takes place in a different place than the computation of main.

--------------------------------------------------
public class PlaceShifting {
  public static def main(Array[String]): Void {
    Console.OUT.println("Main starts " + here);
    val input = 625.0;
    val sqrt_625 = at (here.next()) {
      Console.OUT.println("Sqrt computes at " + here);
      return Math.sqrt(input);
    };
    Console.OUT.println(sqrt_625);
  }
}
// prints:
// Main starts (Place 0)
// Sqrt computes at (Place 1)
// 25.0
--------------------------------------------------

Note that certain data is transferred between place #0 and place #1: the input is copied, the output is copied back. We will look at the copying semantics of objects that are input to computations in places other than 'here' in the following section.

4.2 Copying Semantics
======================

at(p) S implicitly copies nearly all data that S might reference, and sends it to place p, before executing S there. The only things that are not copied are values only reachable through GlobalRefs and transient fields; more about these two special cases in the following sections.

--------------------------------------------------
public class ImplicitCopying1 {
  public static def main(Array[String]): Void {
    val c = new Cell[Int](123);
    at (here.next()) {
      Console.OUT.println("Cell has value " + c() + " at place #" + here.id);
      c.set(42);
      Console.OUT.println("Cell has value " + c() + " at place #" + here.id);
    }
    Console.OUT.println("Cell has value " + c() + " at place #" + here.id);
  }
}
// prints
// Cell has value 123 at place #1
// Cell has value 42 at place #1
// Cell has value 123 at place #0
--------------------------------------------------

Note that the copying also occurs with at(here) S, even though the computation of S occurs in the same place as the execution of the at statement.

The semantics of the copy is a "deep copy". Hence be careful which objects are used in an at statement, since there can be hidden / non-obvious performance implications. All objects that are transitively reachable are copied, regardless of whether they are actually accessed or not. In the following example, both the array and the individual cell objects are copied to place here.next().

--------------------------------------------------
public class ImplicitCopying2 {
  public static def main(Array[String]): Void {
    val arr = new Array[Cell[Int]](0..9, ([x]: Point(1)) => new Cell[Int](x));
    Console.OUT.println("Cell at idx 0 has value " + arr(0)() + " at place #" + here.id);
    at (here.next()) {
      Console.OUT.println("Cell at idx 0 has value " + arr(0)() + " at place #" + here.id);
      arr(0).set(42);
      Console.OUT.println("Cell at idx 0 has value " + arr(0)() + " at place #" + here.id);
    }
    Console.OUT.println("Cell at idx 0 has value " + arr(0)() + " at place #" + here.id);
  }
}
// prints
// Cell at idx 0 has value 0 at place #0
// Cell at idx 0 has value 0 at place #1
// Cell at idx 0 has value 42 at place #1
// Cell at idx 0 has value 0 at place #0
--------------------------------------------------

There are two instruments to break the chain of deep recursive copying. First, transient fields are not copied; instead the field/variable is initialized with the default value (e.g. 0 for Int, null for object references). Second, a reference can be 'boxed' into a GlobalRef object. We discuss both in the next sections.

4.3 Transient Fields
======================

In the following program, copies of HugeThing objects are made on several occasions during program execution. Can you identify where?

Already during the copy process, the value of the transient field is lost - in fact this is how a programmer can recognize that a copy has happened.

--------------------------------------------------
public class HugeThingDemo1 {
  static class HugeThing {
    transient val trans_id: Int = 12345;
    val ordinary_id: Int = 98765;
    def this() {
      Console.OUT.println("HugeThing trans_id=" + trans_id + " ordinary_id=" + ordinary_id + " allocated at place " + here.id);
    }
  }
  static def useHugeThing(ht: HugeThing) {
    Console.OUT.println("HugeThing trans_ids=" + ht.trans_id + " ordinary_id=" + ht.ordinary_id + " used at place " + here.id);
  }
  public static def main(Array[String]): Void {
    val arr = new Array[HugeThing](0..Place.MAX_PLACES-1);
    for (p in Place.places())
      arr(p.id) = at (p) new HugeThing();

    at (here.next()) {
      for (i in arr) {
        useHugeThing(arr(i));
      }
    }
  }
}
// prints
// HugeThing trans_id=12345 ordinary_id=98765 allocated at place 0
// HugeThing trans_id=12345 ordinary_id=98765 allocated at place 1
// HugeThing trans_id=12345 ordinary_id=98765 allocated at place 2
// HugeThing trans_id=12345 ordinary_id=98765 allocated at place 3
// HugeThing trans_ids=0 ordinary_id=98765 used at place 1
// HugeThing trans_ids=0 ordinary_id=98765 used at place 1
// HugeThing trans_ids=0 ordinary_id=98765 used at place 1
// HugeThing trans_ids=0 ordinary_id=98765 used at place 1
--------------------------------------------------

4.4 Global References
=====================

GlobalRef looks at first like an ordinary library class, but this type is special and recognized by the compiler and runtime system. Objects that are boxed in a GlobalRef object are not copied to another place when passed into the scope of an at statement. Instead, only the reference is transmitted to the target place (i.e. the GlobalRef object is copied). Through the GlobalRef, the computation can refer back to the original place where the boxed object resides and perform an operation on, or an access to, the boxed object at that place.

--------------------------------------------------
public class HugeThingDemo2 {
  // this should not be copied
  static class HugeThing {
    def this() {
      Console.OUT.println("HugeThing allocated at place " + here.id);
    }
  }
  static def useHugeThing(ht: HugeThing) {
    Console.OUT.println("HugeThing used at place " + here.id);
  }
  public static def main(Array[String]): Void {
    val arr = new Array[GlobalRef[HugeThing]](0..Place.MAX_PLACES-1);
    for (p in Place.places())
      arr(p.id) = at (p) GlobalRef[HugeThing](new HugeThing());

    at (here.next()) {
      for (i in arr) {
        val tmp = arr(i);
        at (tmp.home) useHugeThing(tmp());
      }
    }
  }
}
// prints
// HugeThing allocated at place 0
// HugeThing allocated at place 1
// HugeThing allocated at place 2
// HugeThing allocated at place 3
// HugeThing used at place 0
// HugeThing used at place 1
// HugeThing used at place 2
// HugeThing used at place 3
--------------------------------------------------

5 Concurrency
=============

5.1 async
=========

The statement async S creates an activity in the current place that executes the statement or statement block S concurrently to the calling activity. Statement S can refer to values (val) in the invocation context, but not to local variables (var). The following program illustrates how you can communicate values into and out of an async block.

--------------------------------------------------
public class AsyncVarAccess {

  public static def main(Array[String]) {
    val one = 1;
    var two: Int = 2;
    val three = new Cell[Int](3);
    val retval = new Cell[Int](0);
    async {
      Console.OUT.println(one);      // OK
      // Console.OUT.println(two);   // compile-time error
      Console.OUT.println(three());  // OK
      retval.set(4);                 // OK
    }
    // this happens to work on some machines - but it is broken, more on that later
    while (retval() == 0) ;
    Console.OUT.println("Main saw four");
  }
}
// output
// 1
// 3
// Main saw four
--------------------------------------------------

When accessing shared resources such as shared variables or the console output stream, special care has to be taken to avoid unintended interference. We will discuss this aspect in the section on concurrency control.

--------------------------------------------------
public class AsyncWithRaces {

  public static def main(Array[String]) {
    async Console.OUT.println("Hello, World");
    async Console.OUT.println("Bore Da, Byd");
    async Console.OUT.println("Bonjour, Monde");
  }
}
// prints
// HBeolnljoo,u rW,o rMlodn
// Bore Da, Byd
// de
--------------------------------------------------

5.2 Concurrent For
==================

The following program computes prime numbers. The iterations of the loop in the main method are independent, hence they can execute concurrently without interference. Parallel execution of the loop iterations can be achieved by adding an async to the for loop (as done in the program).

--------------------------------------------------
public class ComputePrimes {
  static def isPrime(n: Int): Boolean {
    var prime: Boolean = true;
    for (var i:Int = 3; i <= Math.sqrt(n as Double); i += 2) {
      if (n % i == 0) {
        prime = false;
        break;
      }
    }
    if (( n%2 != 0 && prime && n > 2) || n == 2)
      return true;
    else
      return false;
  }
  static val MAX = 100l;
  public static def main(Array[String]) {
    val doSomething = () => {};
    for ([i] in 0..MAX) async {
      if (isPrime(i)) doSomething();
    }
  }
}
--------------------------------------------------

The parallelization is correct, however not efficient: The granularity of async is too fine; if the granularity is coarsened, there may be a load balancing problem.

Exercise: Think about a better strategy to parallelize this problem and implement it in X10.
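
One possible direction for the exercise, given only as a sketch and not as the intended solution: spawn a small, fixed number of activities and let each one test an interleaved subset of the odd candidates, so that expensive and cheap candidates are spread over all activities. The names ComputePrimesStrided and WORKERS, the value 4, and the rewritten isPrime are assumptions of this example. It uses finish (see section 5.3.1) to wait for all activities before the partial counts are added up.

--------------------------------------------------
public class ComputePrimesStrided {
  static def isPrime(n: Int): Boolean {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (var i: Int = 3; i * i <= n; i += 2)
      if (n % i == 0) return false;
    return true;
  }
  public static def main(Array[String]) {
    val MAX = 100;
    val WORKERS = 4; // assumed number of activities, adapt to your machine
    // one slot per activity, so no two activities write the same element
    val counts = new Array[Int](0..WORKERS-1);
    finish for ([w] in 0..WORKERS-1) async {
      var local: Int = 0;
      // activity w tests the odd numbers 2*k+1 for k = w, w+WORKERS, ...
      for (var k: Int = w; 2*k+1 <= MAX; k += WORKERS)
        if (isPrime(2*k+1)) local++;
      counts(w) = local;
    }
    // all activities have terminated here; 2 is the only even prime
    var total: Int = (MAX >= 2) ? 1 : 0;
    for ([w] in 0..WORKERS-1) total += counts(w);
    Console.OUT.println("primes found: " + total);
  }
}
--------------------------------------------------

The interleaved assignment is a simple static form of load balancing; whether it beats a block-wise assignment depends on how the cost of isPrime varies over the range.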

One notable property of X10 programs is that the activities at any point in time during program execution form a logical hierarchy according to the lexical scoping of the program text. This hierarchy induces a tree structure that allows relating activities as 'parent' or 'children'.

5.3 Global vs. Local Termination [LangSpec 14]==============================================

X10 uses this logical tree of activities in crucial ways. First is thedistinction between local termination and global termination of astatement. The execution of a statement by an activity is said toterminate locally when the activity has finished all itscomputation. (For instance the creation of an asynchronous activityterminates locally when the activity has been created.) It is said toterminate globally when it has terminated locally and all activitiesthat it may have spawned at any place have,recursively, terminatedglobally. For example, consider:

--------------------------------------------------
import x10.util.Timer;
public class LocalTermination {
  public static def main(Array[String]) {
    val start = Timer.milliTime();
    async { Activity.sleep(2000); }
    async { Activity.sleep(3500); }
    Console.OUT.println("Local termination after " + (Timer.milliTime() - start) + "ms");
  }
}
// prints
// Local termination after 15ms
--------------------------------------------------

The primary activity spawns two child activities and then terminates
locally, very quickly. The child activities take 2000 and 3500
milliseconds, respectively, to terminate (... and may in general spawn
grandchildren). When both child activities and all their descendants
have terminated locally, the primary activity terminates globally.

5.3.1 Finish
============

A finish block converts global termination to local termination.

--------------------------------------------------
import x10.util.Timer;
public class Finish {
  public static def main(Array[String]) {
    val start = Timer.milliTime();
    finish {
      async {
        Activity.sleep(2000);
        async { Activity.sleep(2000); }
      }
      async {
        async { Activity.sleep(3500); }
        Activity.sleep(3500);
      }
      Console.OUT.println("Local termination after " + (Timer.milliTime() - start) + "ms");
    }
    Console.OUT.println("Global termination after " + (Timer.milliTime() - start) + "ms");
  }
}
// prints
// Local termination after 19ms
// Global termination after 4022ms
--------------------------------------------------

Finish provides a mechanism for coordination across places, i.e., a
finish block terminates locally only when all (transitively) spawned
activities, including those in different places, have terminated.

--------------------------------------------------
import x10.util.Timer;
public class FinishAcrossPlaces {
  public static def main(Array[String]) {
    val start = Timer.milliTime();
    finish {
      async {
        Activity.sleep(100);
        at (here.next()) async { Activity.sleep(2000); }
      }
      Console.OUT.println("Local termination after " + (Timer.milliTime() - start) + "ms");
    }
    Console.OUT.println("Global termination after " + (Timer.milliTime() - start) + "ms");
  }
}
// prints
// Local termination after 11ms
// Global termination after 2484ms
--------------------------------------------------

5.3.2 Rooted Exception Model [LangSpec 14.1]
============================================

The tree of activities also guides the propagation of exceptions. If
an exception occurs during the execution of an activity and that
activity does not handle the exception (try-catch), then the exception
is propagated up the hierarchy of activities. If activity B throws an
exception that is not caught by itself, then the activity is
terminated (abnormally) and the exception can be caught at an activity
that is the root of activity B. An activity A is a root of an
activity B if A is an ancestor of B and A is blocked at a statement
(such as the finish statement §14.4) awaiting the termination of B
(and possibly other activities).

In the following example, the root activity is also the parent
activity of the activity that throws the exception.

--------------------------------------------------
public class RootedExceptionModel {
  public static def main(args:Array[String]) {
    var ctr: Int = 0;
    try {
      finish {
        async {
          throw new RuntimeException("not handled");
        }
        for (var i: Int = 0; i < 625; i++)
          ctr++;
      }
    } catch (e: RuntimeException) {
      Console.OUT.println("Got Exception " + e);
      Console.OUT.println("Counter value is " + ctr);
    }
  }
}
// prints
// Got Exception x10.lang.RuntimeException: not handled
// Counter value is 625
--------------------------------------------------

5.4 A Program with a Data Race
===============================

Let’s turn the following program that computes a numeric integration
into a parallel version:

--------------------------------------------------
public class NumericIntegration_Sequential {
  public static def main(args:Array[String]) {
    val NSTEPS: Long = 1000000l;
    val step: Double = 1.0 / NSTEPS;
    var sum: Double = 0.0;
    for (var i: Int = 0; i < NSTEPS; i++) {
      var x: Double = (i + 0.5) * step;
      sum += 4.0 / (1.0 + x * x) ;
    }
    Console.OUT.println("S = " + (sum * step));
  }
}
// prints S = 3.1415926535897643
--------------------------------------------------

The compiler would not allow us to simply turn the sequential loop
into a parallel loop. The reason is that the body of the loop
effectively becomes an async statement, and the variables i and sum
from the surrounding context cannot be accessed within the async.

This hardship for the programmer is intended: if concurrent
activities were to read and write the same variables (in this case i
and sum) concurrently, the program would not behave correctly.

Here is another approach: We try to circumvent the restrictions on
variable access inside asyncs as follows: We let the compiler manage
allocation and access to the iteration count (i) and we allocate sum
in a Cell object.

--------------------------------------------------
// this program is broken
public class NumericIntegration_DataRace1 {
  public static def main(args:Array[String]) {
    val NSTEPS: Int = 10000;
    val step: Double = 1.0 / NSTEPS;
    val sum = new Cell[Double](0.0);
    for ([i] in 0..NSTEPS-1) async {
      val x: Double = (i + 0.5) * step;
      val tmp = 4.0 / (1.0 + x * x) ;
      sum.set(sum + tmp);
    }
    Console.OUT.println("S = " + (sum * step));
  }
}
// prints: S = 3.1350940423806692 (or other, behavior is non-deterministic)
--------------------------------------------------

This program compiles, but the computed value is not correct. There
are several problems: First, there is a data race on the Cell
variable since different loop iterations read and write this variable.
Second, the print statement may execute before all iterations of the
loop have executed - and the print statement also reads variable sum.
We will see how to use concurrency control to avoid data races and
coordinate the execution of activities among each other in the next
section.

6 Concurrency Control
=====================

6.1 fork/join style synchronization
===================================

6.1.1 Iterative (finish-for-async)
==================================

One simple way to coordinate activities in a fork-join style is a
simple finish-for-async loop.

The following program corrects one problem of the previous
program. The access to sum in the print statement occurs strictly
after all activities forked at for-async have terminated.

--------------------------------------------------
// this program is broken
public class NumericIntegration_DataRace2 {
  public static def main(args:Array[String]) {
    val NSTEPS: Int = 10000;
    val step: Double = 1.0 / NSTEPS;
    val sum = new Cell[Double](0.0);
    finish for ([i] in 0..NSTEPS-1) async {
      val x: Double = (i + 0.5) * step;
      val tmp = 4.0 / (1.0 + x * x) ;
      sum.set(sum + tmp);
    }
    Console.OUT.println("S = " + (sum * step));
  }
}
// prints: S = 3.1350940423806692 (or other, behavior is non-deterministic)
--------------------------------------------------

6.1.2 Recursive (Future)
========================

Another common pattern to implement (recursive) fork-join style
parallelism is through Futures. The following program illustrates the
functionality of Future that is provided as a library construct
x10.util.Future.

--------------------------------------------------
import x10.util.Future;

public class Fib(n: Int) {
  static val CUTOFF = 5;
  def this (n: Int) {
    property(n);
  }
  def apply(): Int {
    return fib(n);
  }
  def fib (i: Int): Int {
    if (i < CUTOFF)
      return seqFib(i);
    else {
      val f1 = Future.make( ()=>fib(i-1) );
      val f2 = Future.make( ()=>fib(i-2) );
      return f1() + f2(); // force futures
    }
  }
  def seqFib(i: Int): Int {
    if (i <= 1)
      return i;
    else
      return seqFib(i-1) + seqFib(i-2);
  }

  public static def main(Array[String]) {
    val f13 = new Fib(13)();
    Console.OUT.println(f13);
  }
}
// prints 233
--------------------------------------------------

6.2 Critical Sections (atomic) [LangSpec 14.7]
===============================================

An atomic S guarantees that the execution of S occurs as if the
executing activity was the only activity in the system. S is
restricted as follows:

- S may not spawn another activity
- S may not use any blocking statements: when, next, finish.
  (The use of a nested atomic is permitted.)
- S may not force() a Future
- S may not use at expressions.

Those restrictions are (currently) not enforced by the compiler - if
violated, an x10.lang.IllegalOperationException is thrown.

Atomic is the perfect mechanism to control race conditions among
concurrent activities. The race condition does not go away but the
number of possible program behaviors is restricted to serializations
of the atomic blocks - which enables the programmer to reason more
intuitively about the possible behaviors.

--------------------------------------------------
public class AsyncWithRacesControlledByAtomicBlocks {

  public static def main(Array[String]) {
    async atomic Console.OUT.println("Hello, World");
    async atomic Console.OUT.println("Bore Da, Byd");
    async atomic Console.OUT.println("Bonjour, Monde");
  }
}
// prints (the order of the three lines may vary between runs)
// Hello, World
// Bore Da, Byd
// Bonjour, Monde
--------------------------------------------------

The previous NumericIntegration can now finally be corrected:

--------------------------------------------------
// this program is now correct
public class NumericIntegration_Correct {
  public static def main(Array[String]) {
    val NSTEPS: Int = 10000;
    val step: Double = 1.0 / NSTEPS;
    val sum = new Cell[Double](0.0);
    finish for ([i] in 0..NSTEPS-1) async {
      val x: Double = (i + 0.5) * step;
      val tmp = 4.0 / (1.0 + x * x) ;
      atomic sum.set(sum + tmp);
    }
    Console.OUT.println("S = " + (sum * step));
  }
}
// prints: S = 3.141592654423134
--------------------------------------------------
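
The corrected program still spawns one activity and executes one
atomic block per integration step, which is the fine-granularity
problem already noted for ComputePrimes. The following is a minimal
sketch of a coarser variant; it is not part of the original tutorial
material. Each activity accumulates a private partial sum over a
strided subset of the steps and enters the atomic block only once.
The class name and the constant NCHUNKS are made up for illustration,
and the Cell is read with sum() as in the CASTest example.

--------------------------------------------------
// sketch only: coarse-grained variant of NumericIntegration_Correct
public class NumericIntegration_Chunked {
  public static def main(Array[String]) {
    val NSTEPS: Int = 10000;
    val NCHUNKS: Int = 4; // hypothetical number of worker activities
    val step: Double = 1.0 / NSTEPS;
    val sum = new Cell[Double](0.0);
    finish for ([c] in 0..NCHUNKS-1) async {
      // private to this activity, hence no synchronization needed
      var partial: Double = 0.0;
      // activity c handles steps c, c+NCHUNKS, c+2*NCHUNKS, ...
      for (var i: Int = c; i < NSTEPS; i += NCHUNKS) {
        val x: Double = (i + 0.5) * step;
        partial += 4.0 / (1.0 + x * x);
      }
      // one atomic update per activity instead of one per step
      atomic sum.set(sum() + partial);
    }
    Console.OUT.println("S = " + (sum() * step));
  }
}
--------------------------------------------------

Because the partial sums are added in a different order than in the
sequential program, the last digits of the result may differ slightly.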

Atomic is not only a prefix for a statement or statement sequence, but
also a method modifier. The following program implements a generic
CAS operation:

--------------------------------------------------
import x10.util.Future;
public class CASTest {
  static val theCell = new Cell[Int](0);
  static def bang(times: Int): Int {
    var ret: Int = 0;
    for (var i: Int = 0; i < times; ++i) {
      val old= theCell();
      if (CAS(old, old+1, theCell))
        ret++;
    }
    return ret; // number of successful CAS
  }
  // generic CAS operation
  static atomic def CAS[T](old_val: T, new_val: T, target: Cell[T]): Boolean {
    if (target().equals(old_val)) {
      target.set(new_val);
      return true;
    }
    return false;
  }

  public static def main(Array[String]) {
    val t1 = Future.make( ()=>bang(2000));
    val t2 = Future.make( ()=>bang(2000));
    Console.OUT.println("# of successes for future-1: " + t1() + "/2000");
    Console.OUT.println("# of successes for future-2: " + t2() + "/2000");
  }
}
// prints
// # of successes for future-1: 1974/2000
// # of successes for future-2: 1970/2000
--------------------------------------------------

Notice that the atomicity of an atomic block is only guaranteed in the
event of normal termination. In the event of an abrupt termination
due to an exception, the modifications made by the atomic block before
the occurrence of the exception remain in memory. It is the
programmer’s responsibility to recover or undo prior updates if
necessary.
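
The behavior described above can be illustrated with a minimal sketch
(the class name AtomicNoRollback is made up; the program only uses
Cell, atomic and try-catch as they appear in the examples above). The
atomic block updates the Cell and then throws; the update is still
visible in the handler.

--------------------------------------------------
// sketch only: an exception does not undo updates made inside atomic
public class AtomicNoRollback {
  public static def main(Array[String]) {
    val c = new Cell[Int](0);
    try {
      atomic {
        c.set(1); // update made before the exception
        throw new RuntimeException("abrupt termination");
      }
    } catch (e: RuntimeException) {
      // the update is not rolled back
      Console.OUT.println("Cell value after failed atomic: " + c());
      c.set(0); // undo by hand if the old value is required
    }
  }
}
// prints
// Cell value after failed atomic: 1
--------------------------------------------------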

Exercise: Augment the class GlobalRefDemo, such that every HugeThing
object obtains a global unique reference, regardless of the place
where the object is allocated.

6.3 Conditional Critical Sections (when)
========================================

Conditional atomic blocks allow the activity to wait for some
condition to be satisfied before executing an atomic block.

--------------------------------------------------
import x10.util.Timer;

public class ConcurrentQueue[T] {
  static val QSIZE = 10;
  static val NITEMS = 10000;
  var head: Int = 0;
  var tail: Int = 0;
  val items = new Array[T](0..QSIZE-1);
  public def enq(val x: T) {
    when (tail-head != QSIZE) {
      items(tail % QSIZE) = x;
      tail++;
    }
  }
  public def deq(): T {
    var item: T;
    when (tail != head) {
      item = items(head % QSIZE);
      head++;
    }
    return item;
  }
  public static def main(Array[String]) {
    val q = new ConcurrentQueue[Int]();
    val start = Timer.milliTime();
    finish {
      async { // producer
        for (var i: Int = 0; i < NITEMS; ++i)
          q.enq(i);
      }
      async { // consumer
        for (var i: Int = 0; i < NITEMS; ++i)
          q.deq();
      }
    }
    Console.OUT.println("Piped "+ NITEMS + " items through concurrent queue in " + (Timer.milliTime() - start) + "ms");
  }
}
// prints:
// Piped 10000 items through concurrent queue in 736ms
--------------------------------------------------

6.4 Clocks (lockstep computations)
===================================

A clock is a concurrent object that multiple activities can use to
coordinate their progress in a parallel computation. In the simplest
case, a clock object serves as a simple barrier. The behavior of the
following main method is the same as the version in the previous
example:

--------------------------------------------------
public static def main(Array[String]) {
  val q = new ConcurrentQueue[Int]();
  val start = Timer.milliTime();
  val c = Clock.make();
  async clocked (c) { // producer
    for (var i: Int = 0; i < NITEMS; ++i)
      q.enq(i);
    next;
  }
  async clocked (c) { // consumer
    for (var i: Int = 0; i < NITEMS; ++i)
      q.deq();
    next;
  }
  next;
  Console.OUT.println("Piped "+ NITEMS + " items through concurrent queue in " + (Timer.milliTime() - start) + "ms");
}
--------------------------------------------------

The operation next means that an activity is ready to cross the
barrier, hence continue its computation. It may do so only if all
other activities registered on the barrier (clock) have reached next as
well. In the above program, the use of the clock is as follows: The
main activity creates a clock and is implicitly registered with this
clock. It creates two further concurrent activities to which it
passes the clock - those activities are registered on the clock as
well. Finally, all three activities eventually invoke the next
operation, which advances all clocks that an activity is registered
with. The effect here is that the main method prints the time only
after both child activities have completed their for loops.

A more sophisticated usage scenario of a clock is illustrated in the
following program, where two activities print yin and yang in strict
alternation. The activities are said to operate in lockstep. This
computational pattern is commonly found in SPMD codes.

--------------------------------------------------
public class ClockedComputation {
  public static def main(Array[String]) {
    val NITERS = 10;
    val yin = () => {atomic Console.OUT.print("yin ");};
    val yang = () => {atomic Console.OUT.print("yang ");};
    clocked finish{
      clocked async {
        for (var i: Int = 0; i < NITERS; ++i) {
          yin();
          next;
          ;
          next;
        }
      }
      clocked async {
        for (var i: Int = 0; i < NITERS; ++i) {
          ;
          next;
          yang();
          next;
        }
      }
    }
  }
}
// prints:
// yin yang yin yang yin yang yin yang yin yang yin yang yin yang yin yang yin yang yin yang
--------------------------------------------------

Note that the use of clocks entails the possibility of deadlock.

--------------------------------------------------
public class DeadClock {
  public static def main(Array[String]) {
    val NITERS = 10;
    val c1 = Clock.make();
    val c2 = Clock.make();
    async clocked (c1, c2) {
      c1.next();
    }
    async clocked (c1, c2) {
      c2.next();
    }
  }
}
--------------------------------------------------

If you follow this simple syntactic discipline, the use of
clocks is safe, i.e. deadlock cannot occur.

- Never invoke the next() method on individual clock objects - only
  use the operator ’next’, which advances the phase of all clocks the
  current activity is registered with.
- Inside of finish S, all clocked asyncs must be in the scope of an
  unclocked async.

A simple correctness argument for the deadlock freedom of this regime
is given in the language report in chapter 15.4.
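
Applied to the DeadClock example, the first rule can be sketched as
follows (the class name SafeClock is made up; everything else is taken
from DeadClock). Both activities use the operator next instead of
calling next() on a single clock, so each of them advances c1 and c2
together and neither waits on a clock that the other has not yet
advanced.

--------------------------------------------------
// sketch only: DeadClock rewritten with the operator next
public class SafeClock {
  public static def main(Array[String]) {
    val c1 = Clock.make();
    val c2 = Clock.make();
    async clocked (c1, c2) {
      next; // advances c1 and c2 together
      atomic Console.OUT.println("first activity passed the barrier");
    }
    async clocked (c1, c2) {
      next; // advances c1 and c2 together
      atomic Console.OUT.println("second activity passed the barrier");
    }
  }
}
--------------------------------------------------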

Clocks can be used like split-phase barriers, see language report 15.1.3.

7 Distribution of Data
======================

7.1 Distributions
=================

Class Dist stands short for ’distribution’. A Dist object adds to a
Region a mapping that assigns a Place to every Point in the Region.
The distribution says how the variables of an array are distributed
over the Places in our system, e.g. it might say that the first half
of the array is on processor 3 and the second half on processor 4.

There are several factory functions in class Dist that return
distribution instances. Here are the most common Distributions:

- Constant Distribution: Sometimes it’s useful to put all your data
  in one place. R -> here is a Dist that maps every Point in R to
  here. R -> p maps every point to p.

- Cyclic Distribution: This puts adjacent points into different
  places. Dist.makeCyclic(R) puts the first Point of R on the first
  place, the second Point on the second place, etc. It wraps around,
  so if there are 4 places, the fifth Point gets back to the first
  Place.

- Block Distribution: This arranges the points as evenly as possible
  among the places. The first fraction of the points goes to the
  first place, the second fraction to the second, and so on. The
  fractions are as close as possible to the same size - some might be
  one item bigger than others if the number of places doesn’t evenly
  divide the number of points, but they’ll never be more than one
  item off.


--------------------------------------------------
public class ShowDistributions {
  static def show(s:String, d:Dist) {
    x10.io.Console.OUT.print(s + " = ");
    for(p:Point in d.region)
      x10.io.Console.OUT.print("" + d(p).id);
    x10.io.Console.OUT.println("");
  }
  public static def main(Array[String]) {
    val R : Region = 1..35;
    show("R->here ", R->here);
    show("Dist.makeConstant(here) ", Dist.makeConstant(R, here));
    show("Dist.makeBlock(R) ", Dist.makeBlock(R));
  }
}
// prints
// R->here = 00000000000000000000000000000000000
// Dist.makeConstant(here) = 00000000000000000000000000000000000
// Dist.makeBlock(R) = 00000000011111111122222222233333333
--------------------------------------------------
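
The cyclic and block mappings described above can also be reproduced
with plain index arithmetic. The following sketch does not call class
Dist at all; the class name and the helpers cyclicPlace and blockPlace
are made up for illustration. It assumes that the larger blocks come
first, which matches the Dist.makeBlock output shown above for 35
points on 4 places.

--------------------------------------------------
// sketch only: index arithmetic behind cyclic and block mappings
public class DistMappingSketch {
  // place of the k-th point (k counts from 0), cyclic mapping
  static def cyclicPlace(k: Int, nPlaces: Int): Int {
    return k % nPlaces;
  }
  // place of the k-th point, block mapping; assumption: the first
  // (nPoints % nPlaces) blocks hold one extra point
  static def blockPlace(k: Int, nPoints: Int, nPlaces: Int): Int {
    val small = nPoints / nPlaces;   // size of the smaller blocks
    val extra = nPoints % nPlaces;   // number of larger blocks
    val cut = extra * (small + 1);   // points held by the larger blocks
    if (k < cut)
      return k / (small + 1);
    else
      return extra + (k - cut) / small;
  }
  public static def main(Array[String]) {
    val NPOINTS = 35;
    val NPLACES = 4;
    Console.OUT.print("cyclic = ");
    for (var k: Int = 0; k < NPOINTS; ++k)
      Console.OUT.print("" + cyclicPlace(k, NPLACES));
    Console.OUT.println("");
    Console.OUT.print("block  = ");
    for (var k: Int = 0; k < NPOINTS; ++k)
      Console.OUT.print("" + blockPlace(k, NPOINTS, NPLACES));
    Console.OUT.println("");
  }
}
// prints
// cyclic = 01230123012301230123012301230123012
// block  = 00000000011111111122222222233333333
--------------------------------------------------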

7.2 Distributed Arrays
======================

Distributed arrays are implemented in class DistArray. Unlike all
other objects that have a single home place, DistArray objects are
typically partitioned to several places. A DistArray instance is
allocated through factory methods in class DistArray (not through
operator ’new’).

--------------------------------------------------
public class DistArraySum1 {
  public static def main(Array[String]) {
    val R : Region= 1..8;
    val D = Dist.makeBlock(R);
    val a = DistArray.make[Int](D, ([i]:Point) => 10*i);
    val b = DistArray.make[Int](D, ([i]:Point) => i*i);
    x10.io.Console.OUT.println("str(a) = " + str(a));
    x10.io.Console.OUT.println("str(b) = " + str(b));
    val c = add(a,b);
    x10.io.Console.OUT.println("str(c) = " + str(c));
  }
  static def str[T](a:DistArray[T]):String = {
    var s : String = "";
    var first : Boolean = true;
    for(x in a) {
      if (first) first=false;
      else s += ",";
      s += at (a.dist(x)) a(x);
    }
    return s;
  }
  static def add(a:DistArray[Int], b:DistArray[Int]) {a.dist == b.dist}
      :DistArray[Int]{self.dist == a.dist} = {
    val c = DistArray.make[Int](a.dist, (p:Point)=>0);
    for (val p in a.dist) { // could be done in parallel
      at(a.dist(p)) {
        c(p) = a(p) + b(p);
      }
    }
    return c;
  }
}
--------------------------------------------------

This code has very fine-grained place shifting, e.g. in method str
(when accessing the array a) or in method add inside the for loop.
The result is a very inefficient program.

Exercise: Remove the place shifting operation in method str and
observe what happens. [A BadPlaceException will be thrown since the
activity that invokes method str tries to access an array variable
that is allocated in a place different from ’here’].

7.3 Computation on distributed data (SPMD)
==========================================

Referring to the previous example, it is preferable to arrange the
computation such that processors/places can proceed mostly
independently, i.e. without frequent place shifting or
communication. That is, each place will execute the same operation (in
this case sum) but on different data, namely those data that are local
to each place. This idea follows the SPMD style of parallel
computation. We modify the add method of the previous example to SPMD
style:

Method add can be improved according to the SPMD paradigm:

--------------------------------------------------
static def add(a:DistArray[Int], b:DistArray[Int]){a.dist == b.dist}
    : DistArray[Int]{self.dist == a.dist} = {
  val c = DistArray.make[Int](a.dist, (p:Point)=>0);
  val places = a.dist.places();
  finish for (place in places) async {
    at (place) {
      for (pt in a.dist|here) async { // hierarchical parallelism
        c(pt) = a(pt) + b(pt);
      }
    }
  }
  return c;
}
--------------------------------------------------


8 Distribution of Computation
=============================

It is sometimes desirable to move a computation to a specific place,
e.g. if the place has specialized hardware to perform the computation
with high efficiency. In this case, the data has to be moved/copied to
the target place as well, and X10 helps you to do this.

We assume in the following example that our hardware configuration
has a specific place (place #1) that can efficiently evaluate
function isPrime(). The for loop in the main program spawns its
activities at the place where main runs, but the execution of isPrime
is always directed to place PRIME_PROCESSOR.

--------------------------------------------------
public class ComputePrimesDist {
  static def isPrime(n: Int): Boolean {
    var prime: Boolean = true;
    atomic Console.OUT.println("isPrime("+ n +") evaluated on place #" + here.id);
    for (var i:Int = 3; i <= Math.sqrt(n as Double); i += 2) {
      if (n % i == 0) {
        prime = false;
        break;
      }
    }
    if (( n%2 != 0 && prime && n > 2) || n == 2)
      return true;
    else
      return false;
  }
  static val MAX = 20;
  public static def main(Array[String]) {
    val PRIME_PROCESSOR = here.next();
    for ([i] in 0..MAX) async {
      if (at (PRIME_PROCESSOR) isPrime(i))
        atomic Console.OUT.println("found prime "+ i +" on place #" + here.id);
    }
  }
}
// prints
// isPrime(1) evaluated on place #1
// isPrime(20) evaluated on place #1
// isPrime(0) evaluated on place #1
// isPrime(2) evaluated on place #1
// isPrime(4) evaluated on place #1
// isPrime(3) evaluated on place #1
// isPrime(19) evaluated on place #1
// found prime 3 on place #0
// found prime 2 on place #0
// isPrime(5) evaluated on place #1
// found prime 19 on place #0
// isPrime(6) evaluated on place #1
// isPrime(18) evaluated on place #1
// found prime 5 on place #0
// isPrime(7) evaluated on place #1
// isPrime(17) evaluated on place #1
// isPrime(8) evaluated on place #1
// found prime 7 on place #0
// isPrime(9) evaluated on place #1
// found prime 17 on place #0
// isPrime(10) evaluated on place #1
// isPrime(16) evaluated on place #1
// isPrime(11) evaluated on place #1
// isPrime(12) evaluated on place #1
// isPrime(13) evaluated on place #1
// isPrime(15) evaluated on place #1
// found prime 13 on place #0
// isPrime(14) evaluated on place #1
// found prime 11 on place #0
--------------------------------------------------

9 Exercises
===========

9.1 Heat Transfer
=================

The example illustrates also how X10 interoperates with Java modules.

9.2 Parallel Prefix Sum
=======================

9.3 Map-Reduce
==============

9.4 Traveling Salesman
======================

Take the example code and modify it such that the computation of
the Banker’s sequence is always performed on place #2 (e.g. since the
computation is expensive and that place may have special hardware
support).

9.5 Merge Sort
==============
