
Collection-Oriented Languages

Jay M. Sipelstein
Guy E. Blelloch

March 18, 1991
CMU-CS-90-127

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

This research was supported in part by the Defense Advanced Research Projects Agency (DOD) and monitored by the Avionics Laboratory, Air Force Wright Aeronautical Laboratories, Aeronautical Systems Division (AFSC), Wright-Patterson AFB, Ohio 45433-6543 under Contract F33615-87-C-1499, ARPA Order No. 4976, Amendment 20 and in part by the National Science Foundation.

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of DARPA, the National Science Foundation or the U.S. government.

Abstract

Several programming languages arising from widely diverse practical and theoretical considerations share a common high-level feature: their basic data type is an aggregate of other more primitive data types and their primitive functions operate on these aggregates. Examples of such languages (and the collections they support) are FORTRAN 90 (arrays), APL (arrays), Connection Machine LISP (xectors), PARALATION LISP (paralations), and SETL (sets). Acting on large collections of data with a single operation is the hallmark of data-parallel programming and massively parallel computers. These languages, which we call collection-oriented, are thus ideal for use with massively parallel machines, even though many of them were developed before parallelism and associated considerations became important. This paper examines collections and the operations that can be performed on them in a language-independent manner. It also critically reviews and compares a variety of collection-oriented languages with respect to their treatment of collections, gives many examples and code fragments from these languages, and elucidates certain problems that may arise when defining and implementing collection operations.

Keywords: massively parallel programming, data parallelism, collection-oriented language

1 Introduction

We call a programming language collection-oriented if aggregate data structures and operations for manipulating them “as a whole” are primitives in the language. Common kinds of collections supported by these languages include sets, sequences, arrays, vectors and lists. Common collection operations include summing all the elements of a collection, permuting the order of the elements and applying a function to all elements of the collection. Table 1 shows examples of such operations in several collection-oriented languages. Many conventional languages, such as C, PASCAL, and FORTRAN 77, supply an aggregate data structure, typically the array. However, the only primitive operations on these aggregates are accessors to single elements. This often forces the user to write explicit loops to operate on elements in an aggregate fashion. Precisely because collection-oriented languages eschew explicit loops, they are in most cases ideally suited for implementation on massively parallel machines: the parallelism inherent in the operations removes the need for the sophisticated compiler analysis normally needed to uncover available parallelism.

When implemented on massively parallel machines, collections can have their elements distributed across the available processors. Then, for example, to perform the componentwise multiplication of two vectors, corresponding elements are assigned to the same processor and each processor multiplies these values in standard Single-Instruction Multiple-Data (SIMD) fashion. An operation taking linear serial time may thus be implemented in constant parallel time. Summing the elements of a sequence quickly in parallel is also straightforward. Each processor is assigned an element. Then, in logarithmic time, the elements are summed with a tree-like computation: at each step half the remaining elements are removed by pairing up adjacent values and replacing each pair with its sum. An operation like transposing a matrix can be accomplished using the communication facilities of the parallel machine: each processor computes the location to which its piece of the data must go and then sends it there.
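The tree-like summation described above can be sketched serially. The following Python fragment is only an illustration under stated assumptions: the function name `tree_sum` is hypothetical, and each pass of the loop stands in for one parallel step in which all pairs would be added simultaneously on a real machine.

```python
def tree_sum(values):
    """Sum values by repeatedly replacing adjacent pairs with their sum.

    Each pass of the while loop models ONE parallel step. Returns a pair
    (total, number_of_steps); the step count grows logarithmically with
    the number of elements.
    """
    level = list(values)
    steps = 0
    while len(level) > 1:
        # Pair up adjacent values: (0,1), (2,3), ...
        paired = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            paired.append(level[-1])  # an unpaired last element carries over
        level = paired
        steps += 1
    total = level[0] if level else 0
    return total, steps


total, steps = tree_sum([4, 5, 2, 11, 7])
print(total, steps)  # 29 3
```

Five elements need three pairing steps and eight elements also need three, which matches the logarithmic bound stated in the text.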

In view of the available parallelism, it is no surprise that most high-level languages developed for massively parallel machines are collection-oriented. The high-level languages for the Connection Machine [17, 36], namely C* [29], *LISP [26], Connection Machine LISP (or CM-LISP) [34], and PARALATION LISP [30], are all collection-oriented. Likewise, AL [37] and Apply [16] for the Warp, Parallel PASCAL [11] for the MPP [6], and the array extensions to FORTRAN 90 [3] (which were designed for use on parallel and vector computers) are all collection-oriented. In the context of massively parallel implementation, collection-oriented languages are also called data-parallel languages [18]. This is because the parallelism comes from applying a single operation over a potentially large set of data, in contrast to control-parallel languages, in which different operations can be executed in parallel.

Collection-oriented languages have been developed independently of and prior to parallel machines. The programming language community has long recognized that aggregate data structures and general operations on them give great flexibility to programmers and implementors of a language, even for serial machines. This idea is the basis for many programming languages designed long before the advent of parallel machines. The first such language, APL [22], appeared in the early 1960s and utilized a compact notation for representing array operations.¹ In the 1970s, several other collection-oriented languages were developed, including SETL [32] and FP [4]. Each has features that are in large part derived from APL. A number of successors to APL appeared in the 1980s: APL2 [21], NIAL [24], and Dictionary APL [23]. Many modern functional languages such as HASKELL [19] and MIRANDA² [39] also have collection-oriented features, in particular list comprehensions. With the advent of massively parallel machines, researchers have started to work on compiling these older serial languages for these new architectures [10, 15, 40, 14].

¹ APL and APL2 are used as generic terms in this paper. The former refers to any version of APL without nested collections and the latter refers to any of the newer dialects, including APL2 [21] and Dictionary APL [23], that do support nested collections.

This paper outlines, compares, and contrasts the collections and operations found in many collection-oriented languages by putting them into a common framework. In the process, many problems that can occur in specifying such languages will be elucidated. Parallel implementation of the collection operations and issues involving representations of collections are not covered in this paper; that would require a paper of its own. For further information on implementation details the reader is referred to other sources [17, 2, 7].

This paper is organized into five sections.

• Section 2 gives some extended examples of collection operations in several languages to provide the reader with a sample of the issues covered in this paper.

• Section 3 introduces a taxonomy of collections. Issues examined include the type of elements a collection can contain, whether a collection must be homogeneously typed, and the ordering among the elements of a collection.

• Section 4 examines a wide variety of collection operations in great detail. Operations covered include using a function to combine the elements of a collection together (reduce), extracting elements satisfying certain properties from a collection (select or set comprehension) and rearranging the elements of a collection (permute).

• Section 5 examines the apply-to-each form in collection-oriented languages. This form applies a function to each element of a collection. Issues treated include whether the extension of a function over the elements is explicit or implicit and how the extension is applied to functions with multiple arguments.

• Finally, Section 6 explores a variety of collection operations in specific collection-oriented languages. A variety of languages (including APL, SETL, CM-LISP, PARALATION LISP, and FORTRAN 90) are critically compared in this regard.

The distinction of whether or not a language is collection-oriented is not a precise one. Although many conventional languages allow the user to create a collection-oriented layer on top of the existing language, the basic language is not collection-oriented. What we believe important is not the language on which the collection operations are based, but rather the set of operations that are used in programming. If a user sticks to a good set of collection operations, in whatever language the operations are embedded, it should be easy to convert their programs to another collection-oriented language with a similar set of collection types and collection operations while preserving efficiency of implementation.

2 Examples of Collection-Oriented Operations

This section illustrates four collection-oriented operations specified in several different languages. The purpose of these examples is to give an overview of various collection-oriented languages and to introduce many of the important issues discussed in detail later in the paper. Implementations of these four examples in APL, SETL, PARALATION LISP, CM-LISP and FORTRAN 90 are shown in Table 1. Each code fragment is quite concise in comparison to the equivalent code in a conventional language. Table 2 shows some examples of complete routines.

² Miranda is a trademark of Research Software Ltd.

Table 1: Some basic collection-oriented operations

  Example 1: Negating each element of a collection

    APL:              -A
    CM-LISP:          (α- A)
    SETL:             [-e : e in A]
    PARALATION LISP:  (elwise ((e A)) (- e))
    FORTRAN 90:       -A

    A = [4 5 -2 11 -7]  ⇒  [-4 -5 2 -11 7]

  Example 2: Summing corresponding elements of two collections and a scalar

    APL:              A + B + 2
    CM-LISP:          (α+ (α+ A B) α2)
    SETL:             [A[i] + B[i] + 2: i in domain(A)]
    PARALATION LISP:  (elwise ((e1 A) (e2 B))
                        (+ (+ e1 e2) 2))
    FORTRAN 90:       A + B + 2

    A = [4 5 2 11 7]
    B = [2 5 1 4 6]  ⇒  [8 12 5 17 15]

  Example 3: Permuting a collection; P is a permutation of the indices of A

    APL:              A[P]
    CM-LISP:          (�@ P A)
    SETL:             [A[i] : i in P]
    PARALATION LISP:  (<- A :by (match P (index A)))
    FORTRAN 90:       A[P]

    A = [r e s e t]
    P = [1 2 4 3 0]
    CM-LISP  ⇒  [t r e e s]
    Others   ⇒  [e s t e r]

  Example 4: Summing subcollections of a nested collection

    APL2:             +/¨A
    CM-LISP:          (αβ+ A)
    SETL:             [+/e : e in A]
    PARALATION LISP:  (elwise ((e A)) (vref e :with '+))

    A = [[3 4 2] [2 8]]  ⇒  [9 10]

Example 1: Unary Apply-to-each

An operator that applies a function to each element of a collection is called an apply-to-each. The first example in Table 1 shows how the negate function can be applied over the elements of a collection. Apply-to-each is specified in a variety of ways. In APL and FORTRAN 90, just placing the negate symbol in front of a vector signifies that each element should be negated. This syntax is called an implicit apply-to-each, since there is no explicit declaration that the negate should be applied over the elements. In FP and CM-LISP it is necessary to place an α in front of the negate; the α can be thought of as taking the function and distributing it over the collection. We call this form an explicit apply-to-each. In SETL and PARALATION LISP, it is necessary to bind a variable name to a representative element of the collection and then apply the negate to this variable. One can think of the expression as stating: “for each e in A, negate e.” We call this form a binding apply-to-each. In SETL, this form is actually a special case of the more general set / tuple comprehension primitive discussed in Section 5.3. What effect does the inclusion of one of these three forms have on a collection-oriented language? We argue, for example, that implicit apply-to-each interacts badly with overloading of functions based on argument type (see Section 5.1). Another issue is which functions can be used in an apply-to-each. Some languages allow all primitive or user-defined functions to be so used; others only allow a fixed set of primitive functions.
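The three styles of apply-to-each can be imitated in a single host language. The Python sketch below is only an analogy, not the semantics of any of the surveyed languages: the class `Vec` and the helper `alpha` are hypothetical names introduced here for illustration.

```python
from operator import neg


class Vec:
    """Hypothetical vector type whose unary minus is implicitly elementwise."""

    def __init__(self, items):
        self.items = list(items)

    def __neg__(self):
        # Implicit apply-to-each: "-v" negates every element of v.
        return Vec(-x for x in self.items)


def alpha(f):
    """Explicit apply-to-each: lift f from elements to whole collections."""
    return lambda coll: [f(x) for x in coll]


A = [4, 5, -2, 11, -7]

implicit = (-Vec(A)).items   # no marker at the call site
explicit = alpha(neg)(A)     # the lifting is spelled out with alpha
binding = [-e for e in A]    # "for each e in A, negate e"

print(implicit, explicit, binding)
```

All three expressions produce the result shown for Example 1 in Table 1. The implicit form depends on `Vec` overloading the operator, which hints at why implicit extension can interact with overloading on argument type.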

Example 2: Non-unary Apply-to-each

The second example in Table 1 demonstrates the case of applying the function + (addition) over the corresponding elements of two collections and then adding the constant 2 to each element of the result. Unlike the previous example, addition takes more than a single argument, and one of the arguments is not a collection. Two new issues arise from this example: element correspondence and argument extension.

What does the phrase “corresponding elements of two collections” in the previous paragraph mean? Intuitively, we can think of lining up the two collections and applying the function + at each location. But what if the two collections cannot be “lined up” (they may be of different lengths, dimension or nesting level)? What if the collections are not ordered (as with sets in SETL)? Each of the languages considered in Table 1 has a different way of defining apply-to-each on multiple-argument functions. APL requires the two arguments to be of equal length. SETL requires the use of an explicit index set. PARALATION LISP requires that the two arguments come from the same paralation, which is an even stronger requirement than being of the same length. CM-LISP puts no requirements on the relationship between the two arguments: elements having the same key are added.
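Two of these correspondence rules can be contrasted in a small Python sketch. The function names are hypothetical, and the treatment of keys that occur in only one operand (they are dropped) is an assumption of this sketch rather than something stated in the text.

```python
def add_by_position(a, b):
    """Positional rule: operands must have equal length (APL-like)."""
    if len(a) != len(b):
        raise ValueError("length error: operands are not conformable")
    return [x + y for x, y in zip(a, b)]


def add_by_key(a, b):
    """Key rule: add the values of entries that share a key.

    ASSUMPTION: keys present in only one operand are dropped.
    """
    return {k: a[k] + b[k] for k in a if k in b}


A = [4, 5, 2, 11, 7]
B = [2, 5, 1, 4, 6]
print(add_by_position(A, B))

print(add_by_key({"a": 1, "b": 2}, {"b": 10, "c": 20}))
```

The positional version rejects operands of different lengths, whereas the keyed version accepts any two mappings and lets the keys decide which elements meet.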

Another issue raised by this example is how to define what it means to “add a scalar to a collection.” There must be some mechanism for specifying that the scalar should be treated as a collection, each of whose elements has that particular value. We call this argument extension.³ In CM-LISP this is accomplished via the same α form used for specifying an apply-to-each; this is called explicit extension. In APL scalars are automatically extended as needed; this is called implicit extension. Implicit argument extension can lead to ambiguity when the collections to be extended are nested (see Section 5.2).

Table 2: Examples of routines with collections in a variety of languages

  FORTRAN 90: Compute first difference of a vector

    M[2:5] - M[1:4]

    M = [3 4 9 3 5]  ⇒  [1 5 -6 2]

  APL: Find the index of the first minimum element of A

    1↑(A=⌊/A)/⍳⍴A

    A = [3 1 4 1 5 9]  ⇒  6

  CM-LISP: Compute Shannon entropy of A: $H(i) = -\sum p(i) \lg p(i)$, where $p(i)$ is the probability that a particular element of A is $i$: $p(i) = (\#i \text{ in } A)/(\text{length } A)$

    (let ((len (length A))
          (p (α/ (β+ A α1.0) αlen)))
      (- (β+ (α* p (αlg p)))))

    A = [a b c a d c b d]  ⇒  2

  FP: Compute the dot product of two vectors

    (/+) ∘ (α×) ∘ trans : ⟨A,B⟩

    A = [1 0 5 3]
    B = [3 4 3 7]  ⇒  3 + 0 + 15 + 21 = 39

  SETL: Find prime numbers with the Sieve of Eratosthenes

    a := [2..N];
    result := [];
    loop while #a > 0 do
        p := first a;
        a := [x in a | (x mod p) /= 0];
        result := result with p;
    end;
    print result;

    N = 10  ⇒  [2 3 5 7]
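For readers unfamiliar with the notations of Table 2, four of its routines can be approximated in Python. This is a hedged transliteration under stated assumptions: the function names are hypothetical, list comprehensions stand in for the collection primitives, and nothing here reflects how the original languages are implemented.

```python
from collections import Counter
from math import log2


def first_difference(m):
    """Difference of each element and its predecessor (cf. M[2:5] - M[1:4])."""
    return [b - a for a, b in zip(m, m[1:])]


def entropy(a):
    """Shannon entropy of the empirical distribution of the elements of a."""
    n = len(a)
    return -sum((c / n) * log2(c / n) for c in Counter(a).values())


def dot(a, b):
    """Dot product: multiply corresponding elements, then plus-reduce."""
    return sum(x * y for x, y in zip(a, b))


def sieve(n):
    """Primes up to n, mirroring the SETL loop of Table 2."""
    a = list(range(2, n + 1))
    result = []
    while a:
        p = a[0]                           # smallest survivor is prime
        a = [x for x in a if x % p != 0]   # drop all multiples of p
        result.append(p)
    return result


print(first_difference([3, 4, 9, 3, 5]))  # [1, 5, -6, 2]
print(entropy(list("abcadcbd")))          # 2.0
print(dot([1, 0, 5, 3], [3, 4, 3, 7]))    # 39
print(sieve(10))                          # [2, 3, 5, 7]
```

Each function body is a single collection-level expression or a short loop over whole collections, which is the style the table is meant to convey.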

Example 3: Rearranging Elements

In addition to having a mechanism for applying a function to each element of a collection, collection-oriented languages supply operations that affect the structure of a collection, independently of the values of the collection’s elements. The third example in Table 1 illustrates one such operation: permute. The permute operation rearranges the elements of a collection according to a collection of indices. Permute is an example of an operation that has different definitions in different languages: APL performs a permute that is the mathematical inverse of that performed by CM-LISP. Permute is a special case of indexing elements in APL, FORTRAN 90 and SETL and the permutation indices all refer to the result collection. In CM-LISP these indices refer to the argument collection. In languages like CM-LISP with more complex collection types, permute can be generalized greatly (Section 4.6.3).
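The two mutually inverse definitions of permute can be made concrete with the data of Example 3. The Python sketch below uses the hypothetical names `permute_gather` and `permute_scatter` and assumes zero-based indices, as in the example.

```python
def permute_gather(a, p):
    """Indexing-style permute: position i of the result receives a[p[i]]."""
    return [a[i] for i in p]


def permute_scatter(a, p):
    """Send-style permute: element a[i] is placed at position p[i]."""
    result = [None] * len(a)
    for value, dest in zip(a, p):
        result[dest] = value
    return result


A = list("reset")
P = [1, 2, 4, 3, 0]

print("".join(permute_gather(A, P)))   # ester
print("".join(permute_scatter(A, P)))  # trees
```

The gather form reproduces the result listed for APL, SETL, PARALATION LISP and FORTRAN 90 in Table 1, and the scatter form reproduces the CM-LISP result. Applying one after the other with the same P returns the original collection, which is the sense in which the two are inverses.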

Example 4: Nested Collections and Operators

The final example demonstrates the utility of nested collections. A is a collection of collections, and the sum of the elements in each of these subcollections is needed. Not all collection-oriented languages allow nested collections: FORTRAN 90 and APL do not permit them, whereas APL2 does. Unless a language supplies nested collections, the set of functions that can be used in an apply-to-each is necessarily restricted. When used in an apply-to-each, a function that normally acts on a collection must act on a nested collection. For example, in an apply-to-each, a function that sorts a sequence would sort each element of a collection of sequences. Because of these restrictions imposed by non-nested collections, all recent successors to APL have been extended to include nested collections.

Example 4 combines the apply-to-each concept with the operation of summing the elements of a collection. The latter computation is described in all of these languages as a plus reduction. Reduction (Section 4.2) is one example of a higher-order function: it takes as arguments both a combining function and a collection of elements to be combined. Each of the collection-oriented languages in these examples, except FORTRAN 90 and APL, possesses such operators. APL and FORTRAN 90 both have a particular fixed set of reductions and no general mechanism for defining others. Once again, the APL2 extensions correct this deficiency in APL.
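The combination of apply-to-each with a general reduction can be sketched in Python using the standard `functools.reduce`. The helper name `reduce_each` is hypothetical; because no initial value is supplied, this sketch assumes every subcollection is non-empty.

```python
from functools import reduce
import operator


def reduce_each(f, nested):
    """Apply the reduction 'reduce with f' to each subcollection."""
    # ASSUMPTION: each subcollection has at least one element.
    return [reduce(f, sub) for sub in nested]


A = [[3, 4, 2], [2, 8]]

print(reduce_each(operator.add, A))  # [9, 10]
print(reduce_each(max, A))           # [4, 8]
```

Because the combining function is an ordinary argument, any binary function can be used, which is the generality the text attributes to languages with a reduction operator rather than a fixed set of reductions.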

3 Collections

The exact definition of a collection varies greatly from language to language. The simplest and most general characterization of a collection is that it is a group of objects viewed as a whole [27]. This captures the intent of our usage: collection-oriented programming languages should be able to encapsulate a group of objects into a collection and then manipulate this conglomeration of elements in useful ways. It is the methods of encapsulation and manipulation that are interesting. This section surveys the different kinds of collections that collection-oriented languages support.

³ The term scalar promotion is sometimes used in the compiler community for this case of argument extension. Our term has a more general meaning.

3.1 General Classification of Collections

We categorize collections along three axes: the kinds of elements allowed, whether or not these elements may be of mixed type, and whether the elements are implicitly and/or explicitly ordered within the collection.

Elements of Collection

Perhaps the greatest distinction between the kinds of collections that a language supports is that between simple collections and nested collections. The elements of a simple collection may not be collections themselves. Languages supporting only simple collections include APL and FORTRAN 90. Nested collections are the more general type of collection: they may have collections as elements. Languages with nested collections include APL2, SETL, CM-LISP, and PARALATION LISP.

Nested collections are useful for a great many reasons. First, they allow a greater degree of data abstraction: many complex data types have direct representations as nested collections. For example, an image might be represented as a collection of polygons, each of which is a collection of edges and vertices. Second, nested collections allow an added degree of parallelism to be specified: to process an image we may wish to apply a polygon operation (which may itself be a parallel operation) to each polygon in the image in parallel. Third, nested collections allow any function to be the argument of an apply-to-each form: it no longer matters whether the function acts on scalars or on collections.

A useful subclass of the simple collections are the structure collections. An element is a structure if it has a fixed number of fields and the only operations that can be performed on the element are extraction and insertion of a field (for example, a PASCAL record, or a C or LISP structure). Both CM-LISP and PARALATION LISP support collections of structures.

Type Homogeneity of Elements

An issue orthogonal to the types of the elements allowed in a collection is whether (and how) elements of differing types may be present in the same collection. A homogeneous collection is one in which all elements have the same type; a heterogeneous collection has no such constraint.

The characterization of a collection as being heterogeneous or homogeneous can only be made relative to the type system of the underlying language. For example, consider the following nested collection of integers:

[[5 12 13] [7 24 25] [12345 76199512 76199513] [1 2 3 4]].

In some languages the length of a collection might be part of the type of the collection. PASCAL has no type consistent with each element of the above example; array [1..3] of integer is fine for the first three elements, but not for the fourth. In a language with a type system like PASCAL’s, this collection is heterogeneous: the last subcollection is of a different length from the others and hence of different type. The type systems of most collection-oriented languages are not this strict and this collection is viewed as just a vector of vectors and therefore homogeneous. To push this example further, if 16 bit and 32 bit integers were of distinct types in a language (say integer and long integer), then the third subcollection might itself be considered heterogeneous.

Table 3: The collection types available in various collection-oriented languages.

a) Relation Among Element Types

  Language          heterogeneous   homogeneous
  CM-LISP           X               -
  SETL              X               -
  PARALATION LISP   X               -
  APL               -               X
  APL2              X               -
  FORTRAN 90        -               X
  C*                -               X

b) Kinds of Elements

  Language          atomic   structure   collection
  CM-LISP           X        X           X
  SETL              X        -           X
  PARALATION LISP   X        X           X
  APL               X        -           -
  APL2              X        -           X
  FORTRAN 90        X        -           -
  C*                X        X           -

c) Ordering of Elements

  Language          unordered   linear-ordered   grid-ordered   key-ordered
  CM-LISP           xet         xector           -              xapping
  SETL              set         tuple            -              map
  PARALATION LISP   -           field            -              -
  APL               -           -                array          -
  APL2              -           -                array          -
  FORTRAN 90        -           -                array          -
  C*                -           -                field          -

Table 3a shows the type homogeneity of some collection-oriented languages.

Collection Ordering

Another important property of collections is whether or not there is an ordering associated with the positions of elements in the collection independent of their values: can we say that one element comes before another in the collection? For example, the elements of an array in FORTRAN are ordered (by their index), while the elements of a mathematical set are not. The nature of the ordering of a collection has a strong influence on the collection operations that can be defined on it (Section 4).

We distinguish between four classes of collection orderings:

Unordered: Unordered collections are essentially sets of elements, except that sets do not allow repetitions and a general unordered collection does.⁴

Sequence-Ordered: The elements of a sequence-ordered collection are linearly ordered by their position within the collection. Vectors and lists are examples of sequence-ordered collections.

Grid-Ordered: Grid-ordered collections are arrays of arbitrary dimension. There is an index function which maps ordered tuples of integers in some interval to elements of the collection.

Key-Ordered: Key-ordered collections are indexed via an arbitrary mapping function that has keys as its domain and values as its range. Some languages further require all the keys of a collection to be unique.

Unordered sets are the foremost collection type in SETL. Sequence-ordered collections are the basic data structure of LISP-like languages. Grid-ordered collections are the basic data structure of APL-like languages. Key-ordered collections are the most general since the domain of the mapping function can be a sequence of integers (giving a sequence-ordering) or can be tuples from a sequence of integers (for a grid-ordering). Table 3c shows the orderings supported by the languages under consideration and the names each language gives to these ordered collections.
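The claim that key-ordering subsumes the other orderings can be illustrated with Python's built-in containers. The mapping of each ordering class onto a Python type is an analogy chosen for this sketch, and the helper names are hypothetical.

```python
from collections import Counter

# Rough analogues of the four ordering classes.
unordered = Counter(["a", "b", "a"])    # multiset: repetitions, no positions
sequence = ["a", "b", "a"]              # sequence-ordered
grid = [[1, 2, 3], [4, 5, 6]]           # grid-ordered (2 x 3 array)
keyed = {"b": "boat", "c": "car"}       # key-ordered


def sequence_as_keyed(seq):
    """View a sequence as a mapping whose keys are consecutive integers."""
    return dict(enumerate(seq))


def grid_as_keyed(rows):
    """View a 2-D grid as a mapping whose keys are (row, column) tuples."""
    return {(i, j): v for i, row in enumerate(rows) for j, v in enumerate(row)}


print(sequence_as_keyed(sequence))
print(grid_as_keyed(grid)[(1, 0)])
```

Both conversions lose nothing: the original sequence or grid can be rebuilt from the keys alone, which is why the text calls key-ordered collections the most general.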

An important distinction between ordered and unordered collections is that the former make possible an unambiguous correspondence between elements of two differing collections. If two collections are ordered in the same manner and are of equal sizes, we may “superimpose” one collection on top of another by associating elements with the same index or key. Such collections are said to be of the same shape or to be conformable.

An additional kind of collection found in some functional languages is the infinite collection. In languages supporting either lazy or normal order evaluation⁵ it is possible to create collections that are potentially infinite in extent. Implementations of such collections only compute the values of elements of the collection as they are needed by the functions acting on them. As long as the manipulations deal with the collection itself, and not the individual elements, the implementation is free to avoid computing the values of those elements. Languages supporting lazy collections include MIRANDA, HASKELL, and SCHEME [1]. Although the infinite collections in these particular languages are linearly ordered, this does not have to be true in general. For instance, it would be relatively easy in any of these languages to create the SETL-style set of the natural numbers: the collection would pick a new “random” natural number each time it is accessed and would guarantee no repetitions. Similarly, it would be easy to build infinite collections of any of the other forms discussed in this section. One nice example of this is FAC [38], a lazy functional version of APL with infinite, ragged-edged arrays.

⁴ The distinction here is one of sets vs. multisets.

⁵ The basic idea behind these languages is call-by-need: arguments to functions are not evaluated until they are used. Hughes has an excellent introductory article on lazy functional languages and their suitability for scientific computation [20].
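The idea of an infinite collection whose elements are computed only on demand can be imitated with Python generators. This is merely an analogy to lazy evaluation in the functional languages named in the text; the names `naturals` and `squares` are hypothetical.

```python
from itertools import count, islice


def naturals():
    """Conceptually infinite, linearly ordered collection 0, 1, 2, ..."""
    return count(0)


def squares():
    """Apply-to-each over an infinite collection; nothing is computed yet."""
    return (n * n for n in naturals())


# Only the elements actually demanded are ever evaluated.
first_five = list(islice(squares(), 5))
print(first_five)  # [0, 1, 4, 9, 16]

# Selection over an infinite collection is equally lazy.
evens = (n for n in naturals() if n % 2 == 0)
print(list(islice(evens, 3)))  # [0, 2, 4]
```

Constructing `squares()` takes constant time regardless of the collection's extent; work happens only when `islice` asks for specific elements.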

3.2 Language-specific Collections

Table 3c summarizes the differences that exist between the kinds of collections supported by some collection-oriented languages. This section explores the distinctions between individual languages in greater detail.

The heterogeneous nestable array is the fundamental collection in APL2. This contrasts with APL, which only allows homogeneous simple arrays. The introduction of heterogeneous nested collections into APL2 allows an arbitrary number of arguments and return values for user-defined functions (by wrapping the multiple values into a collection), as well as the combination of numeric and character data in vectors. APL2 also adds a great many new operators to APL for handling nested collections. The following APL2 collection contains vectors, strings and arrays (the boxes indicate nesting):

[Figure: boxed display of a nested APL2 array; surviving text: 1 2 3 4 5 6, 7 8 9, TEN]

The fourth element is itself an array containing a scalar and a vector and the fifth is a vector of characters.

SETL supports three kinds of collections: the set, the tuple and the map. Sets are unordered and enclosed by { } (curly braces). Sets are constrained not to have duplicates and SETL enforces this constraint by removing repeated elements. The sets

{1,1,2}, {2,2,1,1,1}, {1,2}, {2,1,2,2}

are all equivalent. Tuples are sequence-ordered and enclosed by [ ] (square brackets). Maps are key-ordered collections represented as a set of ordered pairs (two-element tuples); there may be multiple elements with the same key. Any valid SETL object can be an element of a set, an element of a tuple, or the key or value field of a map. This allows arbitrary levels of nesting in collections.
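The behaviour described for SETL's three collection kinds has close analogues in Python, which the following sketch uses purely for illustration. It does not reproduce SETL syntax, and the helper `image` is a hypothetical name for looking up all values paired with a key.

```python
# Sets: duplicates are removed, so all four literals denote the same set.
sets = [{1, 1, 2}, {2, 2, 1, 1, 1}, {1, 2}, {2, 1, 2, 2}]
print(all(s == {1, 2} for s in sets))  # True

# Tuples: sequence-ordered, so order and repetition both matter.
print((1, 1, 2) == (1, 2, 1))  # False

# Maps: a set of ordered pairs; one key may be paired with several values.
m = {("a", 1), ("a", 2), ("b", 3)}


def image(mapping, key):
    """Set of all values paired with key in a map stored as a set of pairs."""
    return {v for k, v in mapping if k == key}


print(image(m, "a") == {1, 2})  # True

# Nesting: collections may themselves be elements (frozenset is hashable).
nested = {frozenset({1, 2}), (3, 4)}
print(len(nested))  # 2
```

Storing a map as a set of pairs, rather than as a dictionary, is what permits several elements with the same key.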

The primitive collection of CM-LISP is the xapping. A xapping is a key-ordered heterogeneous collection. Each element is an ordered pair of the form

index→value.

The index (also called the key) and value can be any LISP object (including another xapping) but the indices in a given xapping must all be distinct. An example of a xapping is

{ b→boat c→car a→apple }.

CM-LISP also provides a shorthand notation for sequence-ordered collections (called xectors):

[one two three] ≡ { 1→one 2→two 3→three }.

A second shorthand notation is available to describe sets (called xets):

{ one two three } ≡ { one→one two→two three→three }.
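The relationship between xappings, xectors and xets can be modelled with Python dictionaries. This is only a sketch: the function names are hypothetical, and the xector indices start at 1 simply to follow the example in the text.

```python
def xector(items):
    """Xector shorthand: consecutive integer indices paired with the items.

    ASSUMPTION: numbering starts at 1, as in the example in the text.
    """
    return {i: v for i, v in enumerate(items, start=1)}


def xet(items):
    """Xet shorthand: every element is paired with itself."""
    return {v: v for v in items}


# A general xapping: distinct indices, and element order is irrelevant.
xapping = {"b": "boat", "c": "car", "a": "apple"}
print(xapping == {"a": "apple", "b": "boat", "c": "car"})  # True

print(xector(["one", "two", "three"]))
print(xet(["one", "two", "three"]))
```

A dictionary enforces the distinct-index requirement automatically, and both shorthands reduce to the same underlying key→value representation.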

PARALATION LISP adds a new data type to COMMON LISP [35]: the paralation, a contraction of “parallel” and “relation”. A paralation consists of two parts: a fixed number of sites indexed from 0 and a dynamic number of fields. Each field of a paralation has a value for each site of the paralation. It is helpful to think of a paralation as a database and a field as holding a piece of data for every element of the database. A typical paralation with two data fields and an index field looks something like:

index-field name-field year-field0 King George III 17601 Washington 17892 Adams 17973 Jefferson 1801...

......

40 Reagan 198141 Bush 1989

Fields are named (in this case, name-field and year-field) and field values may be heterogeneous or fields (of other paralations) themselves, allowing nested collections. Paralations can be created in two ways. The make-paralation function creates a new paralation of given length with one field (the index field) whose values are the numbers from 0 to (length − 1). Alternatively, a new paralation can be created by specifying the values of a field using PARALATION LISP reader syntax:⁶

(make-paralation 5) ≡ #F(0 1 2 3 4).

Additional fields of an existing paralation can be created with the elwise function, which takes a list of fields in the same paralation, performs an elementwise computation on the values of those fields, and returns a new paralation containing the results. This construct is discussed in depth in Section 5. An alternative way of thinking about paralations is to view them as types: many fields may be created in one paralation, but each must have the same length. Conversely, fields from different paralations may not be included in the same elwise, even if their paralations have the same length. Paralations (or their fields) are essentially sequence-ordered, but PARALATION LISP also supports key-ordered operations on and between paralations.

⁶ The #F is an addition to the PARALATION LISP syntax indicating that the LISP reader should interpret the following data as belonging to a paralation field.


FORTRAN 90 has extensions to FORTRAN 77 that support arrays as a fundamental data structure. There is a sophisticated syntax for assigning elements into arrays and for indexing elements from arrays that is discussed in Section 6.3. The vector [1 2 3 4] can be denoted by the two constructs

(/ 1 2 3 4 /)
(/ I, I=1,4 /).

4 Aggregate Operations

A collection-oriented language is characterized by two features: the kinds of collections it supports and the operations permitted on those collections. The previous section focused on the first of these issues. This section and the next examine the second in a language-independent manner. In Section 6 we compare the specific manner by which these operations are supported in various collection-oriented languages.

The collection operations described here should be thought of as abstract mathematical constructs. The result of performing a collection operation should depend only on the semantics of that operation in the language and not on its implementation. The machine on which a particular program is run or the particular algorithm used in the implementation of an operation should not affect the results of the calculation. For example, parallel implementation of an operation is disallowed (or has unpredictable results, depending on the language) if it causes multiple side-effects to the same memory location. Problems such as this have resulted in the historical difficulty of defining language semantics for parallel machines.

We divide the set of collection operations into two groups: aggregate operations and apply-to-each forms. This distinction can be quite fuzzy; what appears to be an apply-to-each in one language may be an aggregate operation in another. We emphasize that the space of collection-oriented operations has no simple topology. The classification scheme used in this paper is only one of several possible.

This section discusses aggregate operations. These operations act on collections in their entirety. In contrast, apply-to-each operations can be factored into the application of a function to the individual elements of a collection. Different aggregate operations are applicable to different kinds of collections: ordered, unordered, sequence-ordered, arrays, etc. We call an aggregate operation generic if it is applicable to all varieties of collections. Table 4 lists a variety of these operations and the names under which they are found.

4.1 Information Operations

Perhaps the simplest generic operations are those returning basic information about a collection. One familiar example is the length operation that returns the length (number of components) of a collection:⁷

(length [3 1 4 1 5 9]) ⇒ 6

(length   1 2
          3 4 ) ⇒ [2 2].

⁷ Disclaimer: We will be using LISP notation throughout this paper in our non-language-specific examples. This is merely a syntactic convenience to avoid ambiguity and is not meant as an endorsement.


Operation    Other Names             Reference
reduce       reduce                  APL [22], COMMON LISP [35]
             vref                    PARALATION LISP [30]
scan         scan                    APL, Blelloch [7]
             parallel prefix         Ladner [25]
append       catenate                APL
             concatenate             COMMON LISP
pack         pack                    Schwartz [33]
             compress                APL
             irregular compression   Batcher [5]
permute      permute                 Blelloch
index        index                   PARALATION LISP
             count                   APL
enumerate    enumerate               Christman [13]

Table 4: Summary of Simple Operations

A summary of some aggregate operations and some of the places they have appeared in the past. Operations in the second half of the table are not discussed in this paper.

The length of a collection is independent of the nature of the elements of the collection. This means the length of a nested collection is the number of “top-level” components it encapsulates:

(length [1 [2 3] [4 5 6] [7 8 9 10]]) ⇒ 4.

This useful operation is something most non-collection languages do not support. For example, in C there is no way to find the length of an array once it has been passed to a function, where it is seen only as a pointer. Such information must either be supplied as an extra parameter or be specified as a separate field in the data structure.

Other examples of information operations include predicates and functions that describe the kind of collection being used, for example, (set? A) and (nested? A).

4.2 Restricted Reduce

A powerful generic operation is restricted reduction. The restricted reduction operator (reduce) takes as arguments a collection C and a binary function f that is associative and commutative. It returns the result of combining the elements of C with f:

(reduce f [a b c d]) ≡ (f a (f b (f c d)))
(reduce + [2 4 6 8]) ⇒ 20
(reduce max [2 4 6 8]) ⇒ 8.


The requirement that f be associative and commutative⁸ guarantees that the result of a restricted reduction is the same, regardless of the manner by which it might be evaluated. Section 4.7 discusses the consequences of removing these restrictions.
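The order-independence claim can be checked on a small case. The sketch below is an illustration in Python, not any of the surveyed languages; `restricted_reduce` is a hypothetical name, and `functools.reduce` always folds from the left, so the experiment only varies the order of the elements, not the grouping.

```python
from functools import reduce
from itertools import permutations
import operator

def restricted_reduce(f, collection):
    """Combine all elements with f; f is assumed associative and commutative."""
    return reduce(f, collection)

data = [2, 4, 6, 8]

# Reduce every ordering of the data and collect the distinct results.
sums = {restricted_reduce(operator.add, p) for p in permutations(data)}
maxes = {restricted_reduce(max, p) for p in permutations(data)}
diffs = {restricted_reduce(operator.sub, p) for p in permutations(data)}

print(sums)   # a single value: + is associative and commutative
print(maxes)  # a single value: max is associative and commutative
print(diffs)  # several values: subtraction is neither
```

With addition or max every ordering agrees, while subtraction yields a different answer depending on which element comes first.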

Typical uses of reduce are: summing the elements of a collection, finding the minimum or maximum element of a collection and using logical reduction to determine the and or or of a boolean collection. Some languages may impose restrictions on the functions f over which reductions may be performed: APL and FORTRAN 90 allow only a fixed set of reductions, while APL2, CM-LISP, PARALATION LISP, and SETL allow reduction on any binary function, including those defined by the user. See Section 6 to see how different languages support reduction on generic collections.

The reduce operation gets its name from its use in APL. In APL reduce applied to an array combines elements along the final dimension of the array, reducing the overall rank (number of axes) of the array by one:

(reduce min   3 1 4 1
              5 9 2 6 ) ⇒ 1 2.

Languages with this array-reduce operation usually allow an extra argument to indicate the dimension of the array upon which the operator should act.

4.3 Set Operations

One common kind of collection is the set. Most of the collection-oriented languages examined here support the standard corpus of set operations on collections: member?, intersection, and union.

(member? 2 [1 2 3]) ⇒ True
(intersection [1 2 3] [2 3 4]) ⇒ [2 3]
(union [1 2 3] [2 3 4]) ⇒ [1 2 3 4] or [1 2 3 2 3 4]

Although languages like SETL enforce a no-repetition constraint on sets, the exact definition of a set is orthogonal to the support of set operations in a language. These set operations can be defined analogously on multisets (as in the union example).

4.4 Append

A natural operation on two collections is to append one collection to another. If these collections are ordered, append might simply concatenate one collection to the end of the other. If they are not ordered, append could act like set union (with or without the removal of any duplicate entries). For example, any of the following are reasonable outcomes of an append operation:

M ⇒ [m i c k e y]
(append M [m o u s e]) ⇒ [m i c k e y m o u s e]
                       or [c e e i k m m o s u y]
                       or [m i c k e y o u s].

⁸ Floating point addition is an example of an operation that is commutative but not associative; append (Section 4.4) is associative, but not commutative.


The actual output is determined by the particular language used and by the specific ordering imposed on M.

The append function can be defined to work on grid-ordered collections as well as on sequence-ordered collections. The only change is that the operation can now concatenate the collections along any of their axes, if the dimensions are conformable.

(append 2   1 2 3    7  8  9 )  ⇒   1  2  3
            4 5 6   10 11 12        4  5  6
                                    7  8  9
                                   10 11 12

(append 1   1 2 3    7  8  9 )  ⇒   1 2 3  7  8  9
            4 5 6   10 11 12        4 5 6 10 11 12

The first argument specifies the dimension whose length will be modified by the operation.

Extending the append function to act on key-ordered collections is trickier. If the keys of the two collections are distinct from one another, or if the language allows repeated keys (SETL), there is no difficulty: just merge the two collections.

(append { a→1 b→2 c→3 } { d→4 e→5 f→6 }) ⇒ { a→1 b→2 c→3 d→4 e→5 f→6 }

However, if a language does not allow repeated keys and the intersection of the argument key sets is non-empty, there is a problem: what happens to values with the same key? One general solution to this is to apply some function to the colliding values that either creates a new value or signals an error.⁹ This is similar to what is done with a key-ordered permute. See Section 4.6.3 for more details, as well as for an indication of the power of this kind of operation.
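A minimal sketch of this collision-handling append, using Python dicts as key-ordered collections. The function name `append_keyed` and its `combine` parameter are assumptions made for illustration; none of the surveyed languages is claimed to spell it this way.

```python
def append_keyed(left, right, combine=None):
    """Merge two key-ordered collections.

    Colliding values are merged with `combine`; if no combining function
    is supplied, a collision signals an error instead.
    """
    result = dict(left)
    for key, value in right.items():
        if key in result:
            if combine is None:
                raise KeyError(f"key collision on {key!r}")
            result[key] = combine(result[key], value)
        else:
            result[key] = value
    return result

disjoint = append_keyed({"a": 1, "b": 2, "c": 3}, {"d": 4, "e": 5, "f": 6})
summed = append_keyed({"a": 1, "b": 2}, {"b": 10, "c": 20},
                      combine=lambda x, y: x + y)
print(disjoint)
print(summed)
```

Passing a combining function corresponds to "creates a new value"; omitting it corresponds to "signals an error".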

4.5 Select and Pack

Two closely related aggregate operations are select and pack. The select operation is generic and takes collection and predicate arguments. It returns a new collection containing those elements satisfying the predicate:

(select even? [1 2 3 4 5]) ⇒ [2 4]
(select prime? [1 2 3 4 5]) ⇒ [2 3 5].

The collection argument to select can be either ordered or unordered.

The pack function takes two conformable (Section 3.1) ordered collections as arguments: one containing data and another containing boolean values. The result of the pack is another collection consisting of those elements of the first collection for which the corresponding element of the second collection is true (or 1):

(pack [m i c k e y] [1 1 1 0 1 0]) ⇒ [m i c e].

⁹ Another alternative is to not guarantee which of the values will appear in the result. This might be the correct decision for a parallel language.


The pack operation can be used to implement select on ordered collections: apply a predicate to all the elements of the collection to generate the conforming boolean collection, and use pack to extract the elements satisfying the predicate. An important difference between these operations is that pack depends on the ordering of the collection arguments while select does not. Also, select requires higher-order functions, and is therefore not possible in languages like APL.
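The construction of select from pack described above can be sketched in Python as follows. The names `pack`, `select` and `is_even` are illustrative, and Python lists stand in for ordered collections.

```python
def pack(data, flags):
    """Keep the elements of data whose corresponding flag is true (or 1)."""
    if len(data) != len(flags):
        raise ValueError("pack requires conformable collections")
    return [x for x, keep in zip(data, flags) if keep]

def select(predicate, data):
    """Select on an ordered collection, implemented with pack."""
    flags = [predicate(x) for x in data]   # the conforming boolean collection
    return pack(data, flags)

def is_even(n):
    return n % 2 == 0

print(pack(list("mickey"), [1, 1, 1, 0, 1, 0]))
print(select(is_even, [1, 2, 3, 4, 5]))
```

Note that `select` takes a function argument, which is the higher-order requirement mentioned in the text, while `pack` only takes data.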

4.6 Permute Operations

Thus far none of the operations considered rearrange the order of the elements in a collection. This is accomplished with the permute class of functions. We discuss some general varieties of permute in this section. Less general permute operations that are found in some languages include shift, rotate, and transpose.

4.6.1 Permute

The arguments to permute are two conformable ordered collections: the data collection and the index collection. The latter is a collection whose elements are a permutation of the indices (no index is repeated and all are present) of the first collection. The result of the operation is a collection in which the index collection specifies where the corresponding element of the data collection goes in the result collection:

out[P[i]] ⇐ in[i],

where P is the permutation vector and in and out are the input and output sequences respectively. For example:¹⁰

(permute [l e a s t s] [6 5 2 3 1 4]) ⇒ [t a s s e l].

In array-based languages, permuting arrays can be done in either an element-by-element manner (which may require that the index collection is a collection of pairs) or by column or row.

4.6.2 Inverse Permute

The inv-permute operator is similar to permute, except that the index vector specifies where the corresponding result element comes from, instead of where it goes:

out[i] ⇐ in[P[i]]

(inv-permute [l e a s t s] [6 5 2 3 1 4]) ⇒ [s t e a l s].

This function can be generalized to the case where the index collection is not a proper permutation of the indices of the data collection. Each element can be put into its proper position as long as the elements of the index collection are a subset of the indices of the data collection:

(inv-permute [m i c k e y] [2 1 1 2 4 5]) ⇒ [i m m i k e]
(inv-permute [m i c k e y] [1 5]) ⇒ [m e]
(inv-permute [m i c k e y] [1 2 3 4 5 6 1]) ⇒ [m i c k e y m].

¹⁰ We use the convention that indices start at 1.
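Both operations on sequence-ordered collections can be sketched directly from the two defining relations, out[P[i]] ⇐ in[i] and out[i] ⇐ in[P[i]]. The Python below is an illustration only; it keeps the 1-based index convention of the examples, and the function names are assumptions.

```python
def permute(data, index):
    """out[P[i]] <= in[i]: the index says where each element goes (1-based)."""
    if sorted(index) != list(range(1, len(data) + 1)):
        raise ValueError("index must be a permutation of 1..len(data)")
    out = [None] * len(data)
    for value, target in zip(data, index):
        out[target - 1] = value
    return out

def inv_permute(data, index):
    """out[i] <= in[P[i]]: the index says where each element comes from.

    The index need not be a permutation; repeats and omissions are allowed
    as long as every entry is a valid index of data.
    """
    for source in index:
        if not 1 <= source <= len(data):
            raise IndexError(f"index {source} is outside 1..{len(data)}")
    return [data[source - 1] for source in index]

print("".join(permute("leasts", [6, 5, 2, 3, 1, 4])))
print("".join(inv_permute("leasts", [6, 5, 2, 3, 1, 4])))
print("".join(inv_permute("mickey", [2, 1, 1, 2, 4, 5])))
```

The generalized inv-permute needs no extra machinery, since gathering from a repeated or missing source index is always well defined; the same is not true of permute, which is why it validates its index.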


4.6.3 Key-ordered Permute

The definitions of permute and inv-permute given above apply only to sequence-ordered collections. These definitions can be extended to apply to key-ordered collections as well. This extension demonstrates some of the power of more general kinds of collections.

The index collection of a key-ordered permute is a collection of elements whose keys are the same as the keys of the data and whose values are all distinct. The result is a collection whose keys are the values of the index collection and whose values are the values of the data collection, and the pairing up is done according to the keys of both collections:

(permute values indices)
(permute {1→a 2→b 3→c} {1→2 2→3 3→1}) ⇒ {1→c 2→a 3→b}.

This is exactly what we would expect if we viewed sequence-ordered collections as key-ordered collections with their indices as keys. In this formalism, inv-permute just switches the roles of the key and value of the index collection. The result has keys from the keys of the index collection, and has values from the values of the data collection, and the pairing up is according to the values of the index collection (compare with above):

(inv-permute {1→a 2→b 3→c} {1→2 2→3 3→1}) ⇒ {1→b 2→c 3→a}.

This definition of a key-ordered permute can be extended to cover the cases when the sets of keys for the index and data collections are not the same by simply omitting the keys not in both:

(permute {a→x b→y c→z} {a→apple b→book d→dog}) ⇒ { apple→x book→y }.

A further generalization can be made to the case where the values of the index collection are not distinct.¹¹ The permute function can take an extra argument specifying a way to resolve collisions (elements that are supposed to be moved to locations with the same key). In CM-LISP this argument is a function and colliding elements are combined with this function:

(permute {1→17 2→19 3→23} {1→1 2→5 3→1} +) ⇒ {1→40 5→19}

The combining function must be associative and commutative because the key-ordered collections are unordered and there is no a priori way to decide the grouping or ordering of colliding elements.
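The key-ordered permute, including the omission of unmatched keys and the combining of collisions, can be sketched with Python dicts. This is an assumption-laden illustration rather than CM-LISP's definition: the name `permute_keyed` and the `combine` parameter are hypothetical, and iteration over the shared keys happens in an arbitrary order, which is exactly why `combine` should be associative and commutative.

```python
import operator

def permute_keyed(data, index, combine=None):
    """Send data[k] to key index[k] for every key k present in both."""
    result = {}
    for key in data.keys() & index.keys():   # keys not in both are omitted
        target = index[key]
        if target in result:
            if combine is None:
                raise ValueError(f"collision at key {target!r}")
            result[target] = combine(result[target], data[key])
        else:
            result[target] = data[key]
    return result

rotated = permute_keyed({1: "a", 2: "b", 3: "c"}, {1: 2, 2: 3, 3: 1})
partial = permute_keyed({"a": "x", "b": "y", "c": "z"},
                        {"a": "apple", "b": "book", "d": "dog"})
collided = permute_keyed({1: 17, 2: 19, 3: 23}, {1: 1, 2: 5, 3: 1},
                         operator.add)
# Sending every element to the same key combines all of them.
everything = permute_keyed({1: 2, 2: 4, 3: 6, 4: 8},
                           {1: 1, 2: 1, 3: 1, 4: 1}, operator.add)
print(rotated, partial, collided, everything)
```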

This key-ordered, collision-resolving permute is a powerful operation and generalizes many of the operations given so far. For example, by forcing all elements to collide, permute can implement the restricted reduce function from Section 4.2:

(reduce A f) ≡ (permute A { 1→1 2→1 ··· n→1 } f).

4.7 Reduce and Scan

¹¹ This is for languages in which the index set of a key-ordered collection must have distinct values.

Section 4.2 examined the restricted reduce operator for unordered collections, which required the combining function to be commutative and associative. When operating on a linearly-ordered collection, the combining function need not be commutative. The result of the reduce can unambiguously be defined as that obtained when the values are combined in the same order in which they appear in the collection:

(reduce append [[quick] [brown] [fox]]) ⇒ [quickbrownfox].

The scan operator is a generalization of reduce defined on linearly ordered collections.¹² This function returns a collection whose ith element is the reduce of the first i elements of a sequence-ordered collection by f, a binary associative function:

(scan f [a1 a2 ... an]) ≡ [a1 (f a1 a2) ... (reduce f [a1 a2 ... an])].

For example, scan with addition gives the “running sum” of the argument collection:

(scan + [3 1 4 1 5 9]) ⇒ [3 4 8 9 14 23].

As with restricted-reduce, these operations extend to arrays by reducing or scanning across rows or columns of a specified dimension.
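For sequence-ordered collections, the scan defined above corresponds closely to `itertools.accumulate` in Python's standard library. The sketch below is only an illustration of the definition (the wrapper name `scan` is an assumption); it checks that the last element of a scan equals the reduce of the whole collection.

```python
from functools import reduce
from itertools import accumulate
import operator

def scan(f, seq):
    """The i-th result is the reduce by f of the first i elements of seq."""
    return list(accumulate(seq, f))

digits = [3, 1, 4, 1, 5, 9]
running_sum = scan(operator.add, digits)
running_max = scan(max, digits)

print(running_sum)
print(running_max)
# The final element of a scan is the reduce of the entire collection.
print(running_sum[-1] == reduce(operator.add, digits))
```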

Can we now relax the associativity constraint on the combining function, in addition to the commutativity constraint? Unfortunately, this is more difficult. Suppose we define reduce on linearly ordered collections so that the combining function is “put in text”:

(reduce f [1 2 3 4]) ⇒ 1 f 2 f 3 f 4.

The evaluation of this expression depends on whether the language or function in question is right or left associative. Indeed, since APL groups from the right and SETL from the left,

(reduce − [1 2 3 4]) ⇒ −2 in APL
                     ⇒ −8 in SETL.

Because subtraction is not associative, these results do not agree.

In general, if the collection argument to a non-associative reduce or scan is ordered, the result of evaluating the expression will depend on the associativity direction of the combining function or language. If the collection is unordered, the applications of the function could be grouped in an arbitrary manner, with no a priori correct result.
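The two groupings can be reproduced with a left fold and a right fold. This is a sketch in Python and makes no claim about how APL or SETL implement reduction; `reduce_left` and `reduce_right` are hypothetical names.

```python
from functools import reduce
import operator

def reduce_left(f, seq):
    """Group from the left: ((1 f 2) f 3) f 4."""
    return reduce(f, seq)

def reduce_right(f, seq):
    """Group from the right: 1 f (2 f (3 f 4))."""
    items = list(seq)
    # Start from the last element and fold the rest in from right to left.
    return reduce(lambda acc, x: f(x, acc), reversed(items[:-1]), items[-1])

values = [1, 2, 3, 4]
print(reduce_right(operator.sub, values))  # right grouping, as in APL
print(reduce_left(operator.sub, values))   # left grouping, as in SETL
print(reduce_left(operator.add, values) == reduce_right(operator.add, values))
```

For an associative function such as addition the two folds agree, which is why the issue only surfaces once associativity is dropped.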

5 Apply-to-each

The apply-to-each forms are the second major class of collection operations. Apply-to-each forms apply a function to every element of a collection. This kind of operation maps perfectly to the massively parallel programming paradigm: all processors apply the function to different elements at the same time. A simple example of apply-to-each is negating each element of a collection (see Table 1a).

We use form instead of function to describe apply-to-each because there are a number of distinct methods for specifying apply-to-each in different languages. Only in those languages that have higher-order functions might apply-to-each really be a function.

¹² Versions of scan for trees [7] and other data structures can also be defined.


There are two styles of apply-to-each in collection-oriented languages: extension and binding. In a binding apply-to-each¹³ a representative element of a collection is given a name and the computation to be performed on that element is described. This is the method used by PARALATION LISP (the elwise statement) and by SETL (in the guise of set comprehension). On the other hand, extensions modify the evaluation of a function so that it iterates over the elements of a collection. These extensions are specified in one of two ways: implicitly or explicitly. In some languages (APL for example), extensions are performed automatically if the operation in question needs them in order to be well defined: this is the implicit case. Alternatively, explicit extension requires some notation for precisely describing those functions and/or arguments that must be extended. The tradeoff between these alternatives is largely one of convenience and conciseness versus lack of ambiguity. Extensions are used by APL, FORTRAN 90, CM-LISP, and FP.

Probably the most important difference between collection-oriented languages is whether there are any limitations on which functions are permitted as the functional argument to an apply-to-each form. All the languages under consideration in this paper with explicit apply-to-each allow any function, whether primitive or user-defined, to be so used, and therefore also allow nested collections (see Section 3.1). The languages with implicit apply-to-each, FORTRAN 90 and APL, restrict the functions to a fixed set of primitive operators. It is no coincidence that these languages have primitive type schemes, no polymorphism, and no nested collections; they can use the syntactically succinct implicit apply-to-each form and yet incur no ambiguity.

5.1 Function Extension and Unary Functions

This section explores the complications that develop in the simplest case of apply-to-each: functions taking a single argument. Consider an example. Suppose we have a collection consisting of five integers. What should be the value of applying the square function to this collection, where square of an integer returns its square? One possible way of defining this result is to square each element of the collection and put these values into a new collection, while preserving any ordering:

(square [1 2 3 4 5]) ⇒ [1 4 9 16 25].

This can be viewed as extending the domain of the function square from integers to collections of integers or, alternatively, as overloading the function definition for collection types. In the same manner, the domain of negate could be extended to key-ordered collections:

(negate { a→2 b→3 c→5 d→7 }) ⇒ { a→-2 b→-3 c→-5 d→-7 }.

In general, suppose f is a unary function of type O1 → O2. The domain of f can then be extended to C_O1, the collections containing elements of O1:

f : C_O1 → C_O2,
f([x_α x_β ··· x_ω]) ≡ [f(x_α) f(x_β) ··· f(x_ω)],

where the Greek subscripts indicate a general ordering that is preserved by the operation.

¹³ There does not appear to be a standard term for this in the literature.


Unfortunately, this can lead to possible ambiguity. Suppose that the square example occurred in the context of some linear algebra code written in a language allowing functions to be overloaded based on the type of their arguments. It would be reasonable to add a definition of square for vectors that calculates the inner product of an argument with itself. In this case our example evaluates:

(square A) ⇒ 55.

To remove this possible ambiguity and exactly specify which result is desired, explicit functional extension can be introduced into the language. Implicit functional extension uses no extra notation to denote the apply-to-each of the function. Explicit extension denotes an apply-to-each operation using a special syntax. FP and CM-LISP have explicit extension and both denote apply-to-each operations with the symbol α; this paper borrows their syntax. With explicit extension, the square example becomes:

(αsquare [1 2 3 4 5]) ⇒ [1 4 9 16 25].

If no α is used, then the overloaded vector square is called instead. Explicit functional extension allows either functionality to be achieved: the α specifies which version of square to use.
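The two readings of square can be made concrete in Python. In this sketch the overloading is imitated with a type test, and `alpha` is a hypothetical stand-in for the α marker: it turns a function on elements into a function on collections.

```python
def square(x):
    """Overloaded square: an integer is squared, while a vector (list)
    yields the inner product of the vector with itself."""
    if isinstance(x, list):
        return sum(a * a for a in x)
    return x * x

def alpha(f):
    """Hypothetical stand-in for explicit extension: apply f to each element."""
    return lambda collection: [f(x) for x in collection]

A = [1, 2, 3, 4, 5]
print(square(A))         # the vector version: inner product of A with itself
print(alpha(square)(A))  # the extended version: each element squared
```

Because the extension is requested explicitly, both results remain reachable and neither call is ambiguous.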

Another case in which ambiguity can occur is with nested collections. Consider a nested collection C of three collections, each of which is a collection of three elements:

C ⇒ [[1 2 3] [4 5 6] [7 8 9]].

What should the value of (reverse C) be? There are two possible solutions, depending on the level of nesting at which reverse acts: reverse may apply to the whole collection, or to each element of the collection separately. These two cases may be disambiguated with an explicit apply-to-each:

(reverse C) ⇒ [[7 8 9] [4 5 6] [1 2 3]]
(αreverse C) ⇒ [[3 2 1] [6 5 4] [9 8 7]].

The second example should be read as “apply reverse to the elements of C.” Also,

(reverse (αreverse C)) ≡ (αreverse (reverse C)) ⇒ [[9 8 7] [6 5 4] [3 2 1]],

both of which reverse the collection and each subcollection.

An apply-to-each operation can also be described by a binding apply-to-each form. This is an explicit construct similar in form to a loop over the elements of a collection, but with no explicit loop bounds. A variable name is bound to a representative element of the collection, and the computation to be performed on that element is described. In PARALATION LISP, the binding apply-to-each is denoted with the keyword elwise, which we adopt for this paper.

An elwise form consists of a list of pairs and a function body. Each pair comprises a collection and a dummy name for a representative element of the collection. All collections in a single elwise must be conformable. Using this notation, the square example given previously becomes:

(elwise ((a [1 2 3 4 5])) (square a)) ⇒ [1 4 9 16 25].


This should be read as “take each element a of the collection and square it.” The two versions of the nested reverse example are written:

(reverse C)
(elwise ((c C)) (reverse c)).

The full reversal is written

(elwise ((c (reverse C))) (reverse c))

or

(reverse (elwise ((c C)) (reverse c))).

The set comprehension primitive of SETL and MIRANDA is another way to denote a binding apply-to-each. In SETL, the reverse example is expressed as:

[reverse a : a in A],

which is read as “create a tuple consisting of the reverse of each element a in A.” Set comprehension is further explained in Section 5.3.
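Python's list comprehension is itself a binding apply-to-each in this sense, so the nested reverse examples can be sketched with it. The code is an illustration only; `reverse` is a small helper defined here, not a built-in.

```python
def reverse(seq):
    """Return a reversed copy of a sequence-ordered collection."""
    return list(reversed(seq))

C = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

outer = reverse(C)                          # reverse the whole collection
inner = [reverse(c) for c in C]             # bind c to each element of C
full_a = [reverse(c) for c in reverse(C)]   # reverse outside, then each element
full_b = reverse([reverse(c) for c in C])   # each element, then outside

print(outer)
print(inner)
print(full_a == full_b)
```

As in the elwise and SETL forms, the bound name `c` makes the level of nesting at which reverse acts explicit.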

From a purely notational perspective, both kinds of explicit apply-to-each forms, extension and binding, have advantages and disadvantages over their implicit counterpart. The primary advantage is one of semantic and syntactic clarity: there is no ambiguity about the operations being performed and code is quite clean and easy to read. Unfortunately, for trivial operations, or when there is no possibility of ambiguity, some people find the extra syntax tedious.

5.2 Argument extension and non-unary functions

The preceding section discussed issues arising when only considering unary functions. How can these ideas be extended to n-ary functions? Now the primary new issues are argument extension¹⁴ and conformance. Given an operation like (+ A B), where A and B are collections of integers, how must A and B be related? What if A is an integer and B is a collection? What if A and B are unordered? What if they are nested at different levels?

First consider the case of binary functions. One possible way of defining apply-to-each for binary functions is to proceed in the same manner as for unary functions. This means extending the domain of definition of a binary function so that it takes a collection of ordered pairs as arguments and applies the function to each pair. The elements of the ordered pair are then the arguments to the primitive function:

f([(x_α, y_α) (x_β, y_β) ··· (x_ω, y_ω)]) ≡ [f(x_α, y_α) f(x_β, y_β) ··· f(x_ω, y_ω)].

This definition is used by FP¹⁵ and generalizes nicely to n-ary functions: instead of collections of pairs we can have collections of n-tuples. Unfortunately this does not really solve the binary case. What we have done is change the definition of f from a binary function to a unary function whose single argument is a pair, and then apply the new unary function to the collection. Also, using such a mechanism requires constructing these sequences of tuples when we usually have separate collections.

¹⁴ Once again, there seems to be no standard terminology for this.

¹⁵ FP provides a primitive function transpose that takes two sequences and creates the sequence of pairs of associated elements.


An alternative approach is for f to take two conformable collections as arguments and operate on each pair of values with the same index or key. The result is a new collection conformable with the old that preserves the ordering of elements:

f([x_α x_β ··· x_ω], [y_α y_β ··· y_ω]) ≡ [f(x_α, y_α) f(x_β, y_β) ··· f(x_ω, y_ω)].

For example, with sequence-ordered collections:

A ⇒ [1 2 3 4 5]
B ⇒ [2 3 5 7 11]
(+ A B) ⇒ [3 5 8 11 16]
(* A B) ⇒ [2 6 15 28 55],

and with key-ordered collections:

City ⇒ { 1→‘Pittsburgh ’ 2→‘NY ’ 3→‘Boston ’ }
Team ⇒ { 1→‘Pirates’ 2→‘Yankees’ 3→‘Red Sox’ }
(append City Team) ⇒ { 1→‘Pittsburgh Pirates’ 2→‘NY Yankees’ 3→‘Boston Red Sox’ }.

This element-by-element generalization of a binary function, and the obvious extension to n-ary functions, is the method used in all the collection-oriented languages under consideration in this paper, except FP.

As in the previous section, collection-oriented languages may use explicit function extension (instead of the implicit extension just shown) to indicate this kind of operation. In this case, the same examples are written:

(α+ A B) ⇒ [3 5 8 11 16]
(α* A B) ⇒ [2 6 15 28 55]
(αappend City Team) ⇒ { 1→‘Pittsburgh Pirates’ 2→‘NY Yankees’ 3→‘Boston Red Sox’ }.

Examining the definition of binary apply-to-each closely reveals a few tacit assumptions. First note that the collections under consideration must be ordered. There is an inherent matching-up of indices that cannot occur with unordered collections. This really is not too undesirable: elementwise addition on two sets does not seem to make much sense.

A deeper issue to consider is what happens when the collections being operated upon are not conformable. This can happen for one of two reasons. The first is when the index/key sets of the collections are not identical. If the collections are sequence-ordered this means they are of different lengths. One way to handle this situation is simply to signal a runtime error. Both APL and FORTRAN 90 do this. Some languages try to make this ill-matched apply-to-each meaningful. In particular, the apply-to-each is well-defined on the intersection of the index sets and can have the standard interpretation there. Thus a new collection should be created whose index set is the intersection of the index sets of the arguments, and the values of the answers should be correct for these indices:

(+ { a→2 b→5 c→3 } { b→1 c→9 d→7 }) ⇒ { b→6 c→12 }.

For sequences of differing lengths, this means making a new sequence whose length is the minimum of the lengths of the argument sequences and whose elements represent the apply-to-each of the truncated sequences:

(+ [2 3] [4 5 6 7]) ⇒ [6 8].

The problem now is what to do with the rest of the elements. There are a few possibilities. The extra elements can just be dropped (CM-LISP does this). Alternatively, the new index set could be the union of the index sets of the arguments with any elements with unmatched indices inserted into the result:

(+ [2 3] [4 5 6 7]) ⇒ [6 8 6 7].

(+ {a→2 b→5 c→3} {b→1 c→9 d→7}) ⇒ {a→2 b→6 c→12 d→7}.

These solutions have the problem that they are fairly arbitrary: there is no real reason to prefer one to the other. One way to solve this problem is to prevent its occurrence. This situation never arises in PARALATION LISP because elwise can only be used for fields of the same paralation. These are guaranteed to be of the same length.
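The two policies for mismatched index sets can be sketched in Python, modelling a key-ordered collection as a dict. This is only an illustration: the function names `apply_each_intersection` and `apply_each_union` are hypothetical and do not come from any of the languages discussed.

```python
from operator import add

def apply_each_intersection(f, a, b):
    """Apply f only at keys present in both mappings."""
    return {k: f(a[k], b[k]) for k in a if k in b}

def apply_each_union(f, a, b):
    """Apply f at shared keys; copy unmatched entries through unchanged."""
    result = dict(a)
    for k, v in b.items():
        result[k] = f(a[k], v) if k in a else v
    return result

x = {"a": 2, "b": 5, "c": 3}
y = {"b": 1, "c": 9, "d": 7}
by_intersection = apply_each_intersection(add, x, y)  # keys b and c only
by_union = apply_each_union(add, x, y)                # keys a, b, c and d
```

A sequence can be treated the same way by using its positions as keys, in which case the intersection policy truncates to the shorter argument.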

The other case of non-conformance is when the collection arguments of an apply-to-each do not have the same rank (number of dimensions). The most common instance of this is when one argument is a collection and the other is not (we call this value a scalar). The collection-oriented languages under consideration here each combine each element of the collection with the scalar:

(f s [x1 x2 ... xn]) ⇒ [(f s x1) (f s x2) ... (f s xn)].

This allows the following:

B ⇒ [2 3 5 7 11]

(+ B 2) ⇒ [4 5 7 9 13]

(* B 7) ⇒ [14 21 35 49 77].

Another way of looking at this is to say that we argument-extend¹⁶ the scalar s by converting it into a collection all of whose elements are s and that conforms with the collection argument. As with function extension, argument extension may be either implicit (as above) or explicit:

(α* B α7) ⇒ [14 21 35 49 77].

The α before the 7 may be thought of as specifying that enough copies of the scalar are created to conform to the shape of the collection argument. At this point each function is applied to its arguments. The CM-LISP manual [34] discusses this idea in depth.
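As a rough illustration of argument extension, the scalar can be replicated into a conforming collection before an ordinary elementwise operation is applied. The helper `extend_to` below is a hypothetical name introduced only for this sketch.

```python
def extend_to(collection, scalar):
    """Replicate scalar into a list that conforms with collection."""
    return [scalar] * len(collection)

B = [2, 3, 5, 7, 11]
sevens = extend_to(B, 7)                       # plays the role of α7
product = [b * s for b, s in zip(B, sevens)]   # plays the role of (α* B α7)
```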

It may seem that if a language allows explicit function extension there is no need for any explicit argument extension. In particular, in the previous example the α7 seems unnecessary. Since * has been extended, the interpreter or compiler can deduce that 7 must be extended as well. However, just as with implicit function extension, implicit argument extension may result in ambiguity that requires explicit clarification. An example using nested collections demonstrates this:

¹⁶ This is sometimes called scalar promotion in the FORTRAN compiler community.

A ⇒ [[2 3] [4 5]]

B ⇒ [[6 7] [8 9]]

(append A B) ⇒ [[2 3] [4 5] [6 7] [8 9]]

(αappend A B) ⇒ [[2 3 6 7] [4 5 8 9]]

(αappend A αB) ⇒ [[2 3 [6 7] [8 9]] [4 5 [6 7] [8 9]]]

(αappend αA B) ⇒ [[[2 3] [4 5] 6 7] [[2 3] [4 5] 8 9]].

Each of these results has a different collection structure. In general, without some sort of explicit argument extension, it may be impossible to specify which of these results is desired.
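The four append variants can be mimicked with Python lists, where `+` is list concatenation. This is a sketch of the intended results only, not of how any of the languages implements extension.

```python
A = [[2, 3], [4, 5]]
B = [[6, 7], [8, 9]]

plain = A + B                                   # append applied once, at the top level
both_extended = [a + b for a, b in zip(A, B)]   # append applied to corresponding elements
b_replicated = [a + B for a in A]               # whole of B appended to each element of A
a_replicated = [A + b for b in B]               # each element of B appended to whole of A
```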

Ambiguity can also result if overloaded operators are present in the language. A nested version of the square from the beginning of section 5.1 is an example of this. If square is overloaded to compute inner products when given a vector argument, all the problems of the append case are present, in addition to the confusion regarding which square is actually being applied (the vector version or the scalar version).

The binding apply-to-each discussed in the previous section can also be used with binary functions. In this case, the binding preserves the correspondence between elements:

(elwise ((a C1) (b C2)) (+ a b)) ≡ (α+ C1 C2)

when the collection arguments are conformable. Argument extension for creating equal rank arguments becomes trivial with binding apply-to-each:

(elwise ((c C)) (+ c 4))

adds four to each element of the collection C.

Using a binding apply-to-each is actually more general than using α for specifying explicit argument and function extension. Consider an example involving collections that are nested to different depths:

A ⇒ [[1 2] [3 4]]

B ⇒ [3 10]

(αα* A αB) ⇒ ?

The intent here is that two α's are needed to get two levels into collection A: one to apply the multiplication to each subcollection and a second to apply it to each element of the subcollection. The problem now is to decide which elements of B correspond to which elements of A. There are two possibilities, both of which are consistent (as we have defined α) with the given code:¹⁷

(elwise ((C A))
  (elwise ((c C) (b B)) (* c b)))
⇒ [[3 20] [9 40]]

(elwise ((C A) (b B))
  (elwise ((c C)) (* c b)))
⇒ [[3 6] [30 40]].

Since the explicitly extended code for these operations is ambiguous, exactly one alternative (whichever the language decides) can be specified in the language. This is a general problem with the α notation as we have described it (and how it is used in CM-LISP): if there is an apply-to-each involving a function with more than a single argument, and the function must be applied at different levels of nesting in each, then there may be no way to specify the operation using only α. Binding apply-to-each allows the nesting levels to be explicitly described.

¹⁷ In CM-LISP, the first of these is what actually would be returned.
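The two elwise readings correspond to pairing B with A at different nesting levels. In Python the level is made explicit by where `zip` appears; this is offered as an analogy to binding apply-to-each, not as a rendering of PARALATION LISP semantics.

```python
A = [[1, 2], [3, 4]]
B = [3, 10]

# B is paired with the elements of each subcollection (inner level).
inner_pairing = [[c * b for c, b in zip(C, B)] for C in A]

# B is paired with the subcollections themselves (outer level).
outer_pairing = [[c * b for c in C] for C, b in zip(A, B)]
```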

5.3 Collection Comprehensions

Collection comprehensions are powerful constructs becoming popular in modern high-level languages. Named after the Axiom of Comprehension in set theory, collection comprehensions are a way to create a new collection by applying a function to all elements of an existing collection that satisfy some property.

Comprehensions first appeared as a programming construct in SETL (these comprehensions on sets and tuples were called set formers [31]). Since then they have been incorporated into a number of modern functional languages: MIRANDA [39] (where they were called Z-F expressions), HASKELL [19], and the dataflow language ID [28]. In this section we borrow the notation of SETL.

A typical comprehension operation has the form:¹⁸

{e(x) : x in S | p(x)}.

Here S is any collection-valued expression, e is a function and p is a boolean predicate. Both e and p are defined on elements of S. The result of this statement is a collection of all e(x) with x chosen from all the elements in S that satisfy p. We use the term set comprehension to indicate a comprehension construct in which the final result is a set. Similarly, we have tuple comprehensions, list comprehensions, array comprehensions, etc.

A number of collection operations defined on unordered collections can be described quite easily with set comprehensions. For example:

(intersection A B) ≡ {x : x in A, x in B}

(select A P) ≡ {x : x in A | P(x)}.

Similarly, a unary apply-to-each operation on ordered (or unordered) collections is sequence (or set) comprehension without an elimination predicate:

(αf X) ≡ [f(x) : x in X].

The set comprehension notation extends naturally to ordered collections if we require that keys of values are preserved by the comprehension:

A ⇒ {a→2 b→3 c→4 d→5 e→6}

{a : a in A | (prime? a)} ⇒ {a→2 b→3 d→5}.
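A key-preserving comprehension maps naturally onto a Python dict comprehension. The `is_prime` helper below is a simple trial-division stand-in for the `prime?` predicate, introduced only for this sketch.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small integers."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

A = {"a": 2, "b": 3, "c": 4, "d": 5, "e": 6}

# Keys travel with their values, so the result is still key-ordered.
primes_only = {key: value for key, value in A.items() if is_prime(value)}
```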

All the languages mentioned in this section allow any number of in clauses in a comprehension. This is defined to iterate through all combinations of elements in the collections that satisfy the selection predicate (in a manner analogous to a nested loop):

[[x y]: x in [1 2], y in [a b]] ⇒ [[1 a] [1 b] [2 a] [2 b]].

¹⁸ If we surround the comprehension with curly braces {}, the result is a set; if we use square braces [], the result is a tuple.

The order in which the elements are selected depends on the ordering of the collections involved. For sequence-ordered collections both SETL and MIRANDA iterate through the elements sequentially, with the rightmost clause being iterated most quickly. This was used in the previous example. Since MIRANDA has infinite lists, this scheme could not work if any of the lists being iterated over were infinite: there is no way to ever get to the next element of the other lists. MIRANDA has a construct for diagonal sequencing through multiple lists, for just this situation:

[[x, y] // x in [1..], y in [5..]]
⇒ [[1 5] [1 6] [2 5] [1 7] [2 6] [3 5] ...].
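Diagonal sequencing can be sketched with a Python generator that walks one anti-diagonal at a time, caching the elements seen so far. The function `diagonal` is a hypothetical helper; the sketch assumes both inputs are infinite (a finite input would need extra handling that is omitted here).

```python
from itertools import count, islice

def diagonal(xs, ys):
    """Yield [x, y] pairs from two infinite iterables, one anti-diagonal at a time."""
    xs, ys = iter(xs), iter(ys)
    seen_x, seen_y = [], []
    for d in count():
        # Pull one more element from each input before visiting diagonal d.
        seen_x.append(next(xs))
        seen_y.append(next(ys))
        for i in range(d + 1):
            yield [seen_x[i], seen_y[d - i]]

first_six = list(islice(diagonal(count(1), count(5)), 6))
```

Every pair is reached after finitely many steps, which is what the nested-loop order cannot guarantee when the inner list is infinite.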

Comprehensions have one major limitation: there is no easy way to create a correspondence between elements of two collections. For example, to use comprehensions to specify a binary apply-to-each on two sequences of the same length, one cannot do the following:

[f(x,y): x in X, y in Y].

This produces the value of f(x,y) for each ordered pair [x,y] in the set product of X and Y:

[x * y : x in [3, 4], y in [5, 7]] ⇒ [15, 21, 20, 28].

To compute the pairwise product of the tuples one must write:

[f(X(i), Y(i)) : i in domain(X)].

This explicitly uses the domain of the sets as indices in the specification. An interesting extension to set comprehension would be to allow some notation for implicitly creating a correspondence between tuples (which could be used for binary apply-to-each operations) without resorting to index lists.
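The difference between the product form and the indexed form shows up directly in Python list comprehensions. The last line uses Python's built-in `zip`, which is one example of a notation that creates the correspondence implicitly; it is shown as an analogy, not as something SETL provides.

```python
X = [3, 4]
Y = [5, 7]

# Two clauses behave like nested loops: every combination is visited.
cross = [x * y for x in X for y in Y]

# Index-based form, mirroring [f(X(i), Y(i)) : i in domain(X)].
by_index = [X[i] * Y[i] for i in range(len(X))]

# zip pairs corresponding elements without an explicit index list.
pairwise = [x * y for x, y in zip(X, Y)]
```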

5.4 Side Effects

An important issue that we have been avoiding until now is side-effects and order of evaluation in apply-to-each forms. To execute collection-oriented languages on parallel machines we would like not to be constrained to a particular order of evaluation. The implementation should be free to schedule the function calls in any manner, so that all available concurrency can be utilized. The only situation in which order of evaluation might make a difference in the final output is when the called function has side-effects.¹⁹

Consider the following fragment of code:

(define f (x) (assign global-variable x))

(αf C).

What is the value of global-variable when the code has been completed? We have no idea which value of the collection was the last to be assigned.

There are a number of ways to get around this problem. Some languages explicitly define an ordering for the evaluation of an apply-to-each (SETL does this with tuple formers). Some may explicitly say that the result of such an operation is undefined (PARALATION LISP). Some restrict the functional arguments (APL). While these solutions evade the problem, they do not solve it if we require parallel evaluation.

¹⁹ The most common instance of a side-effect is when a procedure modifies a global variable while performing a computation.
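The dependence on evaluation order can be demonstrated with a small sequential simulation in which the visiting order is passed in explicitly. The names `state`, `f` and `apply_to_each` are hypothetical; a real parallel implementation would choose the order itself.

```python
state = {"global_variable": None}   # stands in for global-variable

def f(x):
    """Side-effecting function: records the last value it was called with."""
    state["global_variable"] = x

def apply_to_each(func, collection, order):
    """Call func on every element, visiting positions in the given order."""
    for i in order:
        func(collection[i])

C = [10, 20, 30]

apply_to_each(f, C, order=[0, 1, 2])
after_forward = state["global_variable"]

apply_to_each(f, C, order=[2, 1, 0])
after_backward = state["global_variable"]
```

The same call leaves a different value behind depending only on the schedule, which is why a language must either fix the order, forbid the side effect, or declare the result undefined.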

6 Language-Specific Collection Operations

Section 4 described various collection operations from an abstract point of view. We now shift our focus to the specific collection operations supported by several collection-oriented languages: FORTRAN 90, APL, SETL, CM-LISP, and PARALATION LISP. Particular note should be made of the interaction between the collection types provided and the definition of the collection operators.

6.1 Reduction

FORTRAN 90 has intrinsic functions for computing reductions with a fixed set of operators: and, or, +, ×, min, and max reductions are each specified by a keyword. For example, if A is an array or vector, then (+-reduce A) is denoted by SUM(A). Each of the reduction operations has two optional arguments delineated with the keywords DIM and MASK. The DIM argument is a list of integers indicating the dimensions of the array over which the reduction is to be performed:

SUM(PRODUCT(A, DIM = 1)) ⇒ $\sum_j \prod_i A_{ij}$.

MASK allows a select operation to be performed on the array before the reduction is carried out:

SUM(X, MASK = X .GT. 0.0) ⇒ $\sum_{X_i > 0} X_i$.

MASK may be either a boolean valued function or a boolean array conformable with the argument. Each of these optional arguments defaults to allow the reduction of the entire array. If the array being reduced is empty, or if the MASK is true for no elements, the identity element is returned. Of the allowed reductions, all are commutative and only floating point SUM and PRODUCT are not associative. The current draft standard for FORTRAN 90 [3] only specifies that the evaluation of a reduction should produce a “processor-dependent approximation” to the correct value: no explicit order of operation is defined, permitting efficient parallel or vector implementation.
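The role of MASK and of the identity element can be sketched as follows. The function `masked_reduce` is a hypothetical helper written for this illustration; it works on flat lists and does not model the DIM argument.

```python
from functools import reduce
import operator

def masked_reduce(op, identity, values, mask=None):
    """Reduce the selected values with op; return identity if none are selected."""
    if mask is None:
        mask = [True] * len(values)
    selected = [v for v, keep in zip(values, mask) if keep]
    return reduce(op, selected, identity)

X = [1.5, -2.0, 4.0, -0.5]

# Analogue of SUM(X, MASK = X .GT. 0.0).
positive_sum = masked_reduce(operator.add, 0.0, X, [x > 0.0 for x in X])

# A mask that is true nowhere yields the identity of the operator.
empty_product = masked_reduce(operator.mul, 1.0, X, [False] * len(X))
```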

The expression f/A is used in APL to specify reduction of an array A by a binary function f. Standard APL requires f to be one of a fixed set of built-in dyadic scalar functions, but the more recent dialects generalize this. The semantics of the reduce operation define the result of reducing a vector by f to be equivalent to that of evaluating the expression obtained by writing the vector with an f between each two adjacent elements. Since the evaluation of any APL expression is carried out from right to left, the following is an identity:

f/1 2 3 4 ≡ 1 f 2 f 3 f 4 ≡ (1 f (2 f (3 f 4)))

This gives a well-defined result for non-commutative reduce:


-/1 2 3 ≡ (1 - (2 - 3)) ⇒ 2.
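The right-to-left grouping can be reproduced with `functools.reduce` by folding from the last element backwards. Both helper names are hypothetical, and the sketch assumes a non-empty sequence. A left-to-right fold is included for contrast, since the grouping matters for an operator such as subtraction.

```python
from functools import reduce
import operator

def reduce_right(f, items):
    """Right-to-left reduction: f(x1, f(x2, ... f(x_{n-1}, x_n)))."""
    items = list(items)
    return reduce(lambda acc, x: f(x, acc), reversed(items[:-1]), items[-1])

def reduce_left(f, items):
    """Left-to-right reduction: f(... f(f(x1, x2), x3) ..., x_n)."""
    return reduce(f, items)

right_grouped = reduce_right(operator.sub, [1, 2, 3])   # 1 - (2 - 3)
left_grouped = reduce_left(operator.sub, [1, 2, 3])     # (1 - 2) - 3
```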

Unlike FORTRAN 90, APL only permits a single dimension of a multi-dimensional array to be reduced at once. An optional axis argument may be used to select this dimension, which defaults to the last axis:²⁰

A ⇒
    3 4 5
    6 7 8

+/[1] A ⇒ 9 11 13

+/[2] A ⇒ 12 21

+/A ⇒ 12 21.

SETL’s reduce operation is very similar to APL’s and uses the same syntax. The differences are that there is no restriction to built-in operations, there are no extra dimensions to worry about (since the basic collection type is not grid-ordered), and evaluation proceeds left to right. Also, since SETL’s sets are unordered, reduction with a non-commutative function should use tuples to obtain a well-defined result. SETL permits an optional left argument to reduce that specifies an element with which to begin the reduction (instead of the identity element). This argument acts as a default value if the collection is empty:

0 +/S

gives the sum of the elements of S ∪ {0}.

CM-LISP uses the construct (βf A) to indicate reduction of A by f, where f is any CM-LISP function or lambda expression:

(β+ '{1→2 2→4 3→4}) ⇒ 10.

CM-LISP explicitly states that the order of evaluation of a reduce is undefined for a general xapping: the compiler or interpreter is free to structure the calculation in the most efficient manner. When reducing a xector, the ordering of the xector is respected, yielding a reduction order predictable up to associativity. The reduce function is a special case of the more general three argument CM-LISP β (see Section 6.3).

In PARALATION LISP, reduce is performed by the vref function. Given the field field of some paralation and a binary function f, (vref field :with f) computes the reduction of field with f:

(vref '#F(1 2 3 4) :with #'+) ⇒ 10

(the #'+ must be used to specify the functional argument + in COMMON LISP). PARALATION LISP allows a default value to be specified for a reduction of an empty field with the optional :else keyword. vref can be implemented and described in terms of the more general move operator, discussed later (see Section 6.3).

²⁰ This example assumes that the index origin is set to 1.

6.2 Append

In this section we examine mechanisms to append one collection to another. Once again, the definition of this operation is highly dependent on the order type of the collections being combined.

In SETL, two sets or tuples may be appended by using the + operator. In the case of sets, the resulting set is the union of the two arguments (all duplicates are removed):

{2, 3, 4, 5} + {4, 5, 6, 7} ⇒ {2, 3, 4, 5, 6, 7}.

With tuples, the + operator just concatenates the two arguments:

[2, 3, 4, 5] + [4, 5, 6, 7] ⇒ [2, 3, 4, 5, 4, 5, 6, 7].

If instead of concatenating in this manner one wishes to add an element to a set or a tuple, the with operator can be used:

{2, 3, 4, 5} with {4, 5, 6, 7} ⇒ {2, 3, 4, 5, {4, 5, 6, 7}}

[2, 3, 4, 5] with [4, 5, 6, 7] ⇒ [2, 3, 4, 5, [4, 5, 6, 7]].

Each of these results can also be obtained by using +, and making the second argument a nested set or tuple; for example,

{2, 3, 4, 5} + {{4, 5, 6, 7}} ⇒ {2, 3, 4, 5, {4, 5, 6, 7}}.
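The distinction between + and with has a close analogue in Python. Sets are modelled with `frozenset` here because Python only allows hashable values as set elements; that is a detail of the sketch, not of SETL.

```python
s = frozenset({2, 3, 4, 5})
t = frozenset({4, 5, 6, 7})

set_plus = s | t        # union: duplicates collapse
set_with = s | {t}      # t is added as one nested element

u = [2, 3, 4, 5]
v = [4, 5, 6, 7]

tuple_plus = u + v      # concatenation
tuple_with = u + [v]    # v becomes a single nested element
```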

In APL there are two types of append operations. The first is for appending elements onto a vector, or for appending one vector to another, and is denoted by the binary , (comma). This operation concatenates (or just catenates, in APL lingo) the second argument onto the end of the first:

1 2 3 , 4 5 6 ⇒ 1 2 3 4 5 6.

The other possibility is to join the arguments together “side-by-side”, creating a new array of one greater rank and joining the arguments along the new dimension. This operation is called lamination, and is denoted by giving the catenate operator an extra axis specifier argument. This is a fractional value indicating where the new dimension should be added:

1 2 3 ,[.5] 4 5 6 ⇒
    1 2 3
    4 5 6

1 2 3 ,[1.5] 4 5 6 ⇒
    1 4
    2 5
    3 6

1 2 3 ,[.5] 4 ⇒
    1 2 3
    4 4 4

The axis specifier of .5 indicates that the new dimension should go before the first dimension (any value between 0 and 1 will work). Similarly, the 1.5 indicates that the new dimension should be after the first dimension. This is readily extended to arbitrarily dimensioned arrays. In the final example, implicit scalar extension changes the 4 into a vector long enough to be laminated.
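For vectors, the two lamination positions amount to stacking the arguments as rows or as columns. The sketch below uses nested Python lists; `extend` and `laminate` are hypothetical helpers, and at least one argument is assumed to be a vector.

```python
def extend(value, length):
    """Scalar extension: replicate a scalar into a vector of the given length."""
    return list(value) if isinstance(value, (list, tuple)) else [value] * length

def laminate(a, b, before_first_axis=True):
    """Join two vectors along a new axis placed before or after the existing one."""
    n = len(a) if isinstance(a, (list, tuple)) else len(b)
    a, b = extend(a, n), extend(b, n)
    if before_first_axis:                      # like ,[.5]: arguments become rows
        return [a, b]
    return [[x, y] for x, y in zip(a, b)]      # like ,[1.5]: arguments become columns

as_rows = laminate([1, 2, 3], [4, 5, 6])
as_columns = laminate([1, 2, 3], [4, 5, 6], before_first_axis=False)
with_scalar = laminate([1, 2, 3], 4)
```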

The append operation in CM-LISP is complicated by the general nature of the key-ordered collections the language supports. The primitive xunion is the provided mechanism for appending xappings. xunion takes three arguments: a combining function and two xappings. The index set of the resulting xapping is the union of the index sets of the two argument xappings. If an index occurs in both xappings, the corresponding values are combined by the combining function. xunion with an arbitrary combining function is equivalent to append for xappings with disjoint key sets:

(xunion #'foo '{1→one 2→two} '{3→three}) ⇒ {1→one 2→two 3→three}

where foo is any function of two variables. If the combining function instead selects its first or second argument, xunion is precisely set union:

(xunion #'first '{one two} '{two three}) ⇒ {one two three}

where first returns the first of its two arguments.

Appending one xector to another presents a problem. A xector is just shorthand notation for a xapping whose index set is the first n natural numbers, where n is the length of the vector. Any two non-empty vectors thus have intersecting index sets (the smaller set will be a subset of the larger). To perform an append operation, all elements in the index set of the second vector must be incremented by the length of the first vector. Then xunion can be applied. This can be accomplished with a β operation, but the resulting function uses features of the language that we do not discuss in this paper.
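The behaviour of xunion, and the key shifting needed to append vectors, can be sketched with Python dicts. This is an illustration only: `xunion` and `append_vectors` are Python stand-ins, vectors are assumed to be indexed from 0, and a set is modelled as a mapping from each element to itself, which is an assumption of the sketch.

```python
def xunion(combine, a, b):
    """Union of two mappings; values at shared keys are merged with combine."""
    result = dict(a)
    for k, v in b.items():
        result[k] = combine(result[k], v) if k in result else v
    return result

def append_vectors(a, b):
    """Append two 0-indexed vectors by shifting b's keys past the end of a."""
    shifted = {k + len(a): v for k, v in b.items()}
    return xunion(lambda x, y: x, a, shifted)   # keys are disjoint after the shift

first = lambda x, y: x

disjoint = xunion(lambda x, y: (x, y), {1: "one", 2: "two"}, {3: "three"})
as_set_union = xunion(first, {"one": "one", "two": "two"},
                      {"two": "two", "three": "three"})
appended = append_vectors({0: "a", 1: "b"}, {0: "c", 1: "d", 2: "e"})
```

With disjoint key sets the combining function is never called, which is why any function of two arguments gives the same result in that case.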

As with CM-LISP, the operation of appending two fields in PARALATION LISP is not as simple as just concatenating the values together: a new paralation must be created in which to hold the results. The primitive function for append is field-append-2. This function takes two fields, which may be from any (different) paralations, and returns a new field, in a new paralation. This field contains the concatenated contents of the two argument fields:

(field-append-2 #F(a b c) #F(d e)) ⇒ #F(a b c d e).

A second command, expand, is provided, which concatenates all the field-valued elements of a field together into a single field of a new paralation:

(expand #F(#F(a b c) #F(d e))) ⇒ #F(a b c d e).

expand can be implemented using reduce and field-append-2:

(defun expand (field)
  (vref field :with #'field-append-2
              :else (make-paralation 0)))
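The same structure, a reduction whose combining function is a two-argument append and whose default is an empty collection, can be written in Python with lists standing in for fields. The Python names mirror the Lisp ones purely for readability.

```python
from functools import reduce

def field_append_2(a, b):
    """Concatenate two fields (modelled as lists) into a new one."""
    return a + b

def expand(field):
    """Flatten one level of nesting by reducing with field_append_2."""
    return reduce(field_append_2, field, [])   # [] plays the role of the :else default

flat = expand([["a", "b", "c"], ["d", "e"]])
empty = expand([])
```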

6.3 Permute

FORTRAN 90, APL and SETL each implement permute as one form of a generalized subscripting mechanism. In contrast, both CM-LISP and PARALATION LISP express permute in terms of general communication primitives. This section examines both subscripting and communication in detail.

FORTRAN 90 has a wide range of methods for indexing arrays. An index into an array is a tuple whose length is the rank of the array. Each element of the tuple can be a scalar or a triple of the form start:end:stride or a vector. For example, suppose A has been declared to be of type INTEGER A (10,5):

A(3,4) ⇒ A(3,4)

A(2:7:2, 3) ⇒ (/ A(2,3), A(4,3), A(6,3) /)

A( (/ 1, 2 /), 4) ⇒ (/ A(1,4), A(2,4) /)

A( (/ 1, 2 /), (/ 2, 5 /) ) ⇒
    A(1,2) A(1,5)
    A(2,2) A(2,5)

These general indices may also appear on the left hand side of an assignment, thereby permitting selective assignment to portions of an array. The only restriction (to allow vector and parallel implementation) is that no array location is selected more than once. An inverse-permute can be performed by creating a permutation vector and indexing into the array to be permuted. A regular permute is accomplished by using the permutation vector to specify assignment into the array (if I is an array of size 4):

I( (/ 3, 4, 2, 1 /) ) = (/ 8, 9, 10, 11 /)

I ⇒ (/ 11, 10, 8, 9 /).
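Permute and inverse-permute correspond to scatter and gather. The sketch below uses 0-based Python lists, so the index vector (/ 3, 4, 2, 1 /) becomes [2, 3, 1, 0]; the function names are hypothetical.

```python
def inverse_permute(data, index):
    """Gather: result[i] = data[index[i]] (vector subscript on the right-hand side)."""
    return [data[j] for j in index]

def permute(data, index):
    """Scatter: result[index[i]] = data[i] (vector subscript on the left-hand side)."""
    if len(set(index)) != len(index):
        raise ValueError("each destination may be selected only once")
    result = [None] * len(data)
    for value, j in zip(data, index):
        result[j] = value
    return result

values = [8, 9, 10, 11]
index = [2, 3, 1, 0]

scattered = permute(values, index)
gathered = inverse_permute(scattered, index)   # undoes the scatter
```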

APL has its own version of generalized indexing. In APL, any part of an indexing subscript may be a scalar, vector or array. The shape of the resulting array is equal to the catenation of the shapes of each subscript. For any vector component, all the matching elements of the array will be chosen. An empty element indicates that the entire column should be selected. As with FORTRAN 90, if these indices are used on the left-hand side of an assignment only those selected elements of the array are assigned. APL allows subscripts on the left-hand side to be repeated, but the outcome of the assignment is dependent on the implementation. An inverse-permute is done by indexing into the data with the appropriate subscripts:

A ⇒ 1 2 3 4 5

A[3 4 2 1 5] ⇒ 3 4 2 1 5

Permuting elements of a higher dimensional array cannot be done in this manner since too many indices will match the list:

A ⇒
    1 2 3
    4 5 6

A[1 2 ; 3 1] ⇒
    3 1
    6 4

This does not select only A[1;3] and A[2;1]. However, note that these are the diagonal elements in the array produced; this will be true in the general case. APL provides a mechanism (dyadic transpose) to pull out the generalized diagonal of a multidimensional array.

The permute operation on tuples in SETL is expressed in terms of indexing and set comprehension:

A ⇒ [2 4 6 8]

[A[i] : i in [3 1 4 2]] ⇒ [6 2 8 4].

There is no generalized indexing facility as in FORTRAN 90 or APL. Indices can only be scalars or intervals; sets and tuples are not permitted.

CM-LISP has a general form of permute that arises quite naturally from the data-parallel origins of the language. CM-LISP is intended for massively parallel machines with a very large number of processors and arbitrary communication facilities. It is expected that every element of a xapping is governed by a different (virtual) processor. In this model, a permute may be thought of as a primitive for interprocessor communication: a method to send information from one processor to another. The natural question to ask about such a primitive is “What happens when more than one piece of data is sent to the same processor?” CM-LISP answers this by combining the colliding data with a combining function, just as with the xunion function described above. CM-LISP has a primitive β operator used to describe communication. This operator takes three arguments: a combining function and two xappings. The result of this key-ordered permute is fully described in Section 4.6.3. This function also allows the computation of a key-ordered reduce: (βf d x) computes the xapping whose indices come from the values of xapping d and whose values are the f reduction of values of xapping x with those same indices:

(β+ '{1→2 2→4 3→4} '{1→5 2→6 3→7}) ⇒ {2→5 4→13}
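The three-argument form can be sketched as a send-with-combine over dicts. The function `beta` is a Python stand-in, and restricting attention to keys present in both arguments is an assumption of this sketch. The visiting order is deliberately unspecified, so the combining function should be commutative and associative for the result to be well defined.

```python
from operator import add

def beta(combine, dest, values):
    """Send values[k] to key dest[k]; merge collisions with combine."""
    result = {}
    for k in dest.keys() & values.keys():
        target = dest[k]
        if target in result:
            result[target] = combine(result[target], values[k])
        else:
            result[target] = values[k]
    return result

routed = beta(add, {1: 2, 2: 4, 3: 4}, {1: 5, 2: 6, 3: 7})
```

Here the values 6 and 7 both arrive at key 4 and are summed, while 5 arrives alone at key 2.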

PARALATION LISP, like CM-LISP, also uses permute for specifying interprocessor communication. The basic primitive for all data movement is the <- (move) operation. This command takes up to four arguments: a source field, a combining function, a default field and a mapping. A mapping is a fixed communication pattern and can be created by the match command. match takes two fields (to-field and from-field) as arguments and returns a communication pattern. This pattern connects a site in from-field’s paralation with a site in the to-field’s paralation if and only if the values in the key fields of these paralations are equal. The <- works as follows: data is transferred according to the mapping from the source field to a new field in the destination paralation. Any collisions are reconciled with the combining function, and the default field fills all the gaps. Here is a simple example that computes the product of the elements in primes that are also in nums:

(setq nums (make-paralation 8)) ⇒ #F(0 1 2 3 4 5 6 7)

(setq primes '#F(2 3 5 7 11 13))

(setq lil-primes
      (<- primes :by (match nums primes)
                 :default (elwise ((nums)) '1)))
⇒ #F(1 1 2 3 1 5 1 7)

(<- lil-primes :with #'*
               :by (match (make-paralation 1)
                          (elwise (lil-primes) 0)))
⇒ 210

The first <- creates a field of all the elements in primes that are also in nums, with 1 as the default. The second <- creates a mapping that forces all sites of a field in the lil-primes paralation to be sent to the same place: the one site of (make-paralation 1). These are then combined with multiply. The vref command can be implemented in terms of <- using this same technique.
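The match and move steps can be imitated with Python lists standing in for fields. This is a loose sketch rather than PARALATION LISP semantics: `match` and `move` are hypothetical helpers, `match` takes its arguments in the (to-field, from-field) order given in the prose, and the size of the destination is passed explicitly.

```python
from operator import mul

def match(to_field, from_field):
    """Mapping as (source_site, destination_site) pairs whose key values are equal."""
    return [(i, j) for i, v in enumerate(from_field)
                   for j, w in enumerate(to_field) if v == w]

def move(source, mapping, size, combine=None, default=None):
    """Send source values along mapping into a new field with the given size."""
    result = list(default) if default is not None else [None] * size
    filled = [False] * size
    for i, j in mapping:
        if filled[j] and combine is not None:
            result[j] = combine(result[j], source[i])   # reconcile a collision
        else:
            result[j] = source[i]                       # first arrival replaces the default
            filled[j] = True
    return result

nums = list(range(8))
primes = [2, 3, 5, 7, 11, 13]

lil_primes = move(primes, match(nums, primes), size=len(nums), default=[1] * len(nums))

# Route every site to the single site of a one-element field and multiply.
to_one_site = match([0], [0] * len(lil_primes))
product = move(lil_primes, to_one_site, size=1, combine=mul)
```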

6.4 Pack and Select

The pack and select operations both involve pulling particular elements out of a collection and putting them into a new collection. They are examined jointly in this section.

FORTRAN 90 has a primitive pack command that packs arrays into vectors. The arguments to pack are an array, a boolean mask (which may be a constant or a predicate) conformable with the array, and an optional vector that specifies both the minimum length of and default values for the result:

M ⇒
    1 2 3
    4 5 6

PACK(M, M .GE. 3) ⇒ (/4, 5, 3, 6/)

PACK(M, M .GE. 5, VECTOR = (/0, 0, 0, 0/)) ⇒ (/5, 6, 0, 0/).

The selected elements appear in array element order, which is column-major in FORTRAN.
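A pure-Python sketch of pack for a two-dimensional array follows. The function `pack` is a stand-in written for this illustration; it walks the array column by column to follow FORTRAN array element order, and pads from the optional vector.

```python
def pack(array, mask, vector=None):
    """Pack the masked elements of a 2-D list into a flat list, column by column."""
    rows, cols = len(array), len(array[0])
    packed = [array[r][c] for c in range(cols) for r in range(rows) if mask[r][c]]
    if vector is not None:
        packed += vector[len(packed):]   # remaining slots keep the vector's values
    return packed

M = [[1, 2, 3],
     [4, 5, 6]]

at_least_3 = pack(M, [[x >= 3 for x in row] for row in M])
at_least_5 = pack(M, [[x >= 5 for x in row] for row in M], vector=[0, 0, 0, 0])
```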

The APL version of pack is called compress and is denoted by mask/value, where mask is a boolean array whose shape can be extended to conform with value:

1 0 1 0 1 / 1 2 3 4 5 ⇒ 1 3 5

1 / 1 2 3 4 5 ⇒ 1 2 3 4 5

           1 2 3       1 3
1 0 1  /   4 5 6   ⇒   4 6

To perform a select, a boolean mask can be created that can be used for a compress operation:

(A > 3) / A.

The pack and select functions of SETL are described as simple set comprehensions:

{x : x in A | P(x)}

extracts all those elements of A that satisfy the boolean function P. This is discussed in Section 5.3.

In PARALATION LISP, pack requires the movement of data and the creation of a new paralation. As such, it must be implemented with <-. PARALATION LISP provides the choose function for creating a new paralation and a mapping between this paralation and the true elements of a boolean field of another paralation. This allows data to be moved from a field of the original paralation to the new paralation:

(setq a (make-paralation 5)) ⇒ #F(0 1 2 3 4)

(setq mask (elwise ((a)) (prime? a))) ⇒ #F(NIL NIL T T NIL)

(<- a :by (choose mask)) ⇒ #F(2 3).

This may look clumsy, but the only extra operation that is performed is the creation of the mapping from the mask. This mapping may be stored and then used later to move data in other fields of the paralation with only a <-.
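The benefit of storing the mapping can be seen in a small Python sketch in which the mapping is simply the list of selected sites. The names `choose` and `move` echo the Lisp functions but are hypothetical simplifications.

```python
def choose(mask):
    """Build a mapping: the sites at which the mask is true."""
    return [site for site, keep in enumerate(mask) if keep]

def move(field, mapping):
    """Move the values at the mapped sites into a new, shorter field."""
    return [field[site] for site in mapping]

a = [0, 1, 2, 3, 4]
mask = [False, False, True, True, False]   # true where the value is prime

mapping = choose(mask)                     # computed once
packed_a = move(a, mapping)

labels = ["zero", "one", "two", "three", "four"]
packed_labels = move(labels, mapping)      # reused for another field of the same shape
```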


7 Conclusion

This paper has reviewed and compared a set of languages we call “collection-oriented”. These languages simultaneously:

1. Supply high-level data abstractions and operations on them. These tend to lead to code that is clearer, easier to write and more concise than code in standard serial languages.

2. Provide implicit parallelism. Most collection operations have efficient parallel implementations so the compiler does not have to do complex analysis on loops to find parallelism.

Currently there is a significant gap between the collection-oriented languages that have been implemented to run efficiently on parallel machines, and the most interesting and powerful of the languages. In particular, most of the implemented languages, such as C*, FORTRAN 90, and AL, do not support nested collections, and the two that do, CM-LISP and PARALATION LISP, do not implement nested collections in parallel, except for a subset of PARALATION LISP [9].

We are currently embarking on a project to define a language with the best attributes of the languages discussed [8, 12]. A final goal is to implement a compiler that generates efficient code for a variety of parallel machines and architectures, with performance that approaches hand-optimized code. Our hope is to debunk the myth that programming for parallel machines is necessarily more complex than programming for serial machines. Collection-oriented languages are one mechanism for accomplishing this goal.

Acknowledgements

We wish to thank Siddhartha Chatterjee, Stewart Clamen, Gary Sabot, Alan Sussman and Skef Wholey for their comments, conversations and ideas regarding the contents of this paper.

References

[1] Harold Abelson and Gerald Jay Sussman. Structure and Interpretation of ComputerPrograms. MIT Press, Cambridge, MA, 1985.

[2] Selim G. Akl. The Design and Analysis of Parallel Algorithms. Prentice Hall, EnglewoodCliffs, N.J., 1989.

[3] ANSI. ANSI Fortran Draft S8, Version 111. ANSI.

[4] J. Backus. Can Programming be Liberated from the von Neumann Style? A FunctionalStyle and Its Algebra of Programs. Communications of the ACM, 21(8):613–641, August1978.

[5] Kenneth E. Batcher. The Flip Network of STARAN. In Proceedings International Con-ference on Parallel Processing, pages 65–71, 1976.

[6] Kenneth E. Batcher. The Massively Parallel Processor System Overview. In J. L. Potter,editor, The Massively Parallel Processor, pages 142–149. MIT Press, Cambridge, MA,1985.

[7] Guy E. Blelloch. Vector Models for Data-Parallel Computing. MIT Press, Cambridge,MA, 1990.

[8] Guy E. Blelloch and Siddhartha Chatterjee. VCODE: A data-parallel intermediate lan-guage. In Proceedings Frontiers of Massively Parallel Computation, October 1990.

[9] Guy E. Blelloch and Gary W. Sabot. Compiling Collection-Oriented Languages ontoMassively Parallel Computers. Journal of Parallel and Distributed Computing, 8(2),February 1990.

[10] Timothy A. Budd. An APL Compiler for a Vector Processor. ACM Transactions on Programming Languages and Systems, 6(3):297–313, July 1984.

[11] T. Busse. MPP Pascal. In The 2nd Symposium on the Frontiers of Massively Parallel Computation, pages 595–599, 1988.

[12] Siddhartha Chatterjee, Guy E. Blelloch, and Marco Zagha. Scan primitives for vector computers. In Proceedings Supercomputing ’90, November 1990.

[13] David Christman. Programming the Connection Machine. Master’s thesis, Massachusetts Institute of Technology, January 1984.

[14] Janice Glasgow, Michael Jenkins, Carl McCrosky, and Henk Meijer. Expressing Parallel Algorithms in Nial. Parallel Computing, 11:331–347, 1989.

[15] R. Greenlaw and L. Snyder. Achieving Speedups for APL on an SIMD Parallel Computer. APL Quote Quad, 18(4):3–8, June 1988.

[16] Leonard G. C. Hamey, Jon A. Webb, and I-Chen Wu. An Architecture Independent Programming Language for Low-Level Vision. Computer Vision, Graphics, and Image Processing, 48:246–264, 1989.

[17] W. Daniel Hillis. The Connection Machine. MIT Press, Cambridge, MA, 1985.

[18] W. Daniel Hillis and Guy L. Steele Jr. Data Parallel Algorithms. Communications of the ACM, 29(12), December 1986.

[19] Paul Hudak and Philip Wadler. Report on the Functional Programming Language HASKELL. Technical Report 1.0, Yale University, New Haven, April 1990.

[20] J. Hughes. Why functional programming matters. The Computer Journal, 32(2), 1989.

[21] IBM. APL2 Programming: Language Reference, first edition, August 1984. Order Number SH20-9227-0.

[22] Kenneth E. Iverson. A Programming Language. Wiley, New York, 1962.

[23] Kenneth E. Iverson. A Dictionary of APL. APL Quote Quad, 18(1):5–40, September 1987.

[24] M. A. Jenkins, J. I. Glasgow, and C. McCrosky. Programming Styles in Nial. IEEE Transactions on Software Engineering, January 1986.

[25] Richard E. Ladner and Michael J. Fischer. Parallel Prefix Computation. Journal of the Association for Computing Machinery, 27(4):831–838, October 1980.

[26] Clifford Lasser. The Essential *Lisp Manual. Technical report, Thinking Machines Corporation, Cambridge, MA, July 1986.

[27] Trenchard More. The Nested Rectangular Array as a Model of Data. In APL 79 Conference Proceedings, pages 55–73. ACM, 1979.

[28] Rishiyur S. Nikhil. ID Reference Manual. Technical Report Computation Structures Group Memo 284-1, Laboratory for Computer Science, Massachusetts Institute of Technology, July 1990.

[29] John Rose and Guy L. Steele Jr. C*: An Extended C Language for Data Parallel Programming. Technical Report PL87-5, Thinking Machines Corporation, April 1987.

[30] Gary Sabot. Paralation Lisp Reference Manual, May 1988.

[31] J. Schwartz. Set Theory as a Language for Program Specification and Programming. Technical report, Computer Science Department, Courant Institute of Mathematical Sciences, New York University, 1970.

[32] J. T. Schwartz, R. B. K. Dewar, E. Dubinsky, and E. Schonberg. Programming with Sets: An Introduction to SETL. Springer-Verlag, New York, 1986.

[33] Jacob T. Schwartz. Ultracomputers. ACM Transactions on Programming Languages and Systems, 2(4):484–521, October 1980.

[34] Guy L. Steele Jr. CM-Lisp. Technical report, Thinking Machines Corporation, 1986.

[35] Guy L. Steele Jr., Scott E. Fahlman, Richard P. Gabriel, David A. Moon, and Daniel L. Weinreb. Common LISP: The Language. Digital Press, Burlington, MA, 1984.

[36] Thinking Machines Corporation. Model CM-2 Technical Summary. Technical Report HA87-4, Thinking Machines Corporation, Cambridge, Massachusetts, April 1987.

[37] Ping-Sheng Tseng. A Parallelizing Compiler for Distributed Memory Parallel Computers. PhD thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, May 1989.

[38] Hai-Chen Tu and Alan J. Perlis. FAC: A Functional APL Language. IEEE Software, pages 36–45, January 1986.

[39] David Turner. An Overview of MIRANDA. SIGPLAN Notices, December 1986.

[40] C. Walinsky and D. Banerjee. A Functional Programming Language Compiler for Massively Parallel Computers. In ACM Conference on Lisp and Functional Programming, pages 131–138, 1990.
