group algebras and speckle imaging

William DeMeo Paul Billings

[email protected] [email protected]

Group Algebras and Speckle Imaginganisoplanatism, blind deconvolution, generalizations

14 March 2004

TEXTRON Systems Corporation

Hawaii Operations

535 Lipoa Parkway, Suite 149

Kihei, HI 96753

Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Group Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.1 Finite Abelian Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.1.1 Quotient groups and periodic functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.1.2 Abelian group convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.1.3 Cyclic groups, binary composition, general notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2 Finite Nonabelian Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.2.1 Two distinctions of consequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.2.2 Nonabelian group convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.2.3 Abelian by abelian semidirect product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.3 The Group Algebra CG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.3.1 Ideals of CG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.4 Nonabelian Group Fourier Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3 Speckle Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153.1 Traditional Imaging Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.1.1 Poisson noise model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163.1.2 Gradient with respect to f . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173.1.3 Gradient with respect to sk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.2 Some New Developments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183.2.1 The spatially varying case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.3 New Imaging Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233.3.1 Ideal image in CG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233.3.2 Noisy image in CG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

A Mathematical Supplement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26A.1 Real Analysis, Probability and Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

A.1.1 Topologies, open sets, measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26A.1.2 State space, events, random quantities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27A.1.3 Distributions, densities, parametric families . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28A.1.4 Exponential families . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29A.1.5 Maximum likelihood estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

A.2 Matrix Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32A.2.1 Subspace, span, basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33A.2.2 Linear transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34A.2.3 Similar matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35A.2.4 Change of basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

A.2.5 The four fundamental subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37A.3 Abstract Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

A.3.1 Partitions and equivalence relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38A.3.2 Groups and subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38A.3.3 Orbits, cycles, and the alternating groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41A.3.4 Cosets and the theorem of Lagrange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42A.3.5 Structure-relating maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43A.3.6 Factor groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44A.3.7 Series of groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47A.3.8 Rings and fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

3

Summary. Multi-frame blind deconvolution (MFBD) is a technique for reducing image degradation due to atmo-spheric turbulence. In this work we describe some mathematical methods for generalizing the MFBD algorithm. Inparticular, we describe nonabelian group translations, the superposition of which result in a more general definitionof convolution. Such a “generalized convolution” may be used to model the effects of anisoplanatism, or space-variantblur.1 Our basic atmospheric turbulence model takes the observed image to be a convolution of the true object with aspatially-invariant point spread function. A more general definition of convolution enables the basic model to accountfor more complex spatial and temporal variations of the point spread function. Since the transformation is still writtenand implemented as a convolution, the increase in model complexity comes at no increase in computational cost.

Section 1 briefly discusses the role that translation and translation-invariance play in DSP and how finite groupsprovide insights into these important concepts. Section 2 sets down some notation and mathematical prerequisites inpreparation for section 3 which describes how to formulate the MFBD algorithm using this mathematical formalism.

1 Space-variant blur (or spatially varying blur), is discussed in [8], among other places.

4

1

Introduction

The translation-invariance of most classical signal processing transforms and filtering operations is largelyresponsible for their widespread use, and is crucial for efficient algorithmic implementation and interpretationof results. Underlying most digital signal processing (DSP) algorithms is the group Z/N of integers moduloN , which serves as the data indexing set. Translations are defined using addition modulo N , and basicoperations, including convolutions and Fourier expansions, are developed relative to these translations.

DSP on finite abelian groups such as Z/N is well-understood and has great practical utility. Recently,however, interest in the practical utility of nonabelian groups has grown significantly. Although the theoreticalfoundations of finite nonabelian groups is well established, application of the theory to DSP has yet to becomecommon-place; cf. the NATO ASI “Computational Non-commutative Algebras,” Italy, 2003. Another notableexception is [1], which develops theory and algorithms for indexing data with nonabelian groups, definingtranslations with a (non-commutative) group multiply operation, and performing typical DSP operationsrelative to these translations. The work of An and Tolimieri demonstrates that including nonabelian groupsamong the possible data indexing strategies significantly broadens the range of useful signal processingtechniques.

The present work reviews some of the group theory that is useful for applications and then applies thetheory to the problem of imaging through turbulence. In particular, we demonstrate that nonabelian-groupindexing sets generalize the translation operator which, by superposition, yields a more general convolution.Our basic atmospheric turbulence model takes the observed image to be a (noisy) convolution of the trueobject with a spatially-invariant point spread function. A more general definition of convolution enables thebasic model to account for more complex spatial and temporal variations of the point spread function. Sincethe transformation is still written and implemented as a convolution, this increase in model complexity comesat no increase in computational cost.

2

Group Algebras

2.1 Finite Abelian Groups

We first collect some required notations and definitions. The presentation style is terse, but more detailsare provided in Section A.3. The books [10] and [1] treat similar material in a more thorough and rigorousmanner.

Recall that, by isomorphism, we take Z/N to denote the group of elements 0, 1, . . . , N−1 with additionmodulo N , and

Z/N1 × Z/N2 = (x1, x2) : 0 ≤ x1 < N1, 0 ≤ x2 < N2

is the direct product group, each element of which represents a 2-dimensional spatial coordinate.Multiplication modulo N is a ring product on the group of integers Z/N . An element m ∈ Z/N is called

a unit if there exists an n ∈ Z/N such that mn = 1. The set U(N) of all units in Z/N is a group with respectto multiplication modulo N , and is called the unit group of Z/N . The unit group U(N) can be characterizedas the set of all integers 0 < m < N such that m and N are relatively prime. For example, U(8) = 1, 3, 5, 7.

For a finite set A of order N , let L(A) denote the vector space of all complex valued functions on A withaddition and scalar multiplication defined by

(f + g)(a) = f(a) + g(a), f, g ∈ L(A), a ∈ A,

(αf)(a) = αf(a), f, g ∈ L(A), α ∈ C, a ∈ A.

The space L(A) has dimension N .

2.1.1 Quotient groups and periodic functions

Suppose A is an abelian group of order N and B is a subgroup of A. The vector space L(A/B) is the subspaceof all functions in L(A) which are constant on the B-cosets in A,

f(a + B) = f(a), f ∈ L(A), a ∈ A.

Such functions are called B-periodic. If B has order M , where LM = N , and al : 0 ≤ l < L is a completesystem of B-coset representatives in A, then functions in L(A/B) are completely determined by their valueson the set al : 0 ≤ l < L.

f(al + b) = f(al), 0 ≤ l < L, b ∈ B.

Since the collection of B-cosets, A/B = al + B : 0 ≤ l < L, is a partition of A, it determines a direct sumdecomposition,

L(A) =L⊕

l=0

L(al + B).

Such decompositions underlie many divide-and-conquer strategies; e.g., the fast Fourier transform (FFT).The following is a generic example of a periodic function in two variables. Such functions are useful in

image processing applications.

Example 2.1. Identify L(Z/N1 × Z/N2) with the space of all N1 × N2 complex matrices M(N1, N2). Inparticular, represent f ∈ L(Z/N1 × Z/N2) by the N1 ×N2 matrix

Mat(f) = [f(n1, n2)]0≤n1<N10≤n2<N2

.

Suppose A = Z/N1 × Z/N2 and B = L1Z/N1 × L2Z/N2, where N1 = L1M1 and N2 = L2M2. The set

(l1, l2) : 0 ≤ l1 < L1, 0 ≤ l2 < L2

is a complete system of B-coset representatives in A. Therefore, the (L1Z/N1 ×L2Z/N2)-periodic functionsin L(Z/N1 × Z/N2) can be identified with the collection of all complex N1 ×N2 matrices of the formM · · · M

.... . .

...M · · · M

, M = [f(l1, l2)]0≤l1<L10≤l2<L2

having block matrix structure with M1M2 identical copies of the matrix M ∈ CL1×L2 .

2.1.2 Abelian group convolution

Image data are often indexed by elements of the abelian group A = Z/N1 ×Z/N2. The following definitionsspecialize translation and convolution operations for such groups. Unless otherwise noted, we assume Adenotes the direct product group Z/N1 × Z/N2 throughout this section.

Definition 2.2 (Translation by y ∈ A).For A = Z/N1 × Z/N2 and y ∈ A, the mapping T(y) of L(A) takes f ∈ L(A) to a function T(y)f ∈ L(A)having the following values

(T(y)f)(x) = f(x− y) = f(x1 − y1, x2 − y2), x ∈ A.

T(y) is a linear operator on L(A) called translation by y.

Definition 2.3 (Convolution by g ∈ L(A)).Let A = Z/N1 × Z/N2 and g ∈ L(A). The mapping C(g) of L(A) defined by

C(g)f =∑y∈A

g(y)T(y)f f ∈ L(A)

is a linear operator of L(A) called convolution by g. Evaluated at a point,

(C(g)f)(x) =∑

y1∈Z/N1

∑y2∈Z/N2

g(y1, y2)f(x1 − y1, x2 − y2), x = (x1, x2) ∈ A

Finally, we define the characters for our special case. First recall the general case. The group of all N th

roots of unity is the setUN = ei2π n

N : 0 ≤ n < N

If τ is a character of the group A, then τ is a group homomorphism from A into UN , where N is the leastcommon multiple of the orders of the elements in A.

7

Definition 2.4 (Characters of Z/N1 × Z/N2).Let A = Z/N1 × Z/N2. For each a = (a1, a2) ∈ A, the mapping τa : A → U[N1,N2] defined by

τa(x) = ei2πx1a1N1 ei2π

x2a2N2 , x = (x1, x2) ∈ A.

is a character of A.

Any isomorphism from a finite abelian group A onto its character group A∗ is called a presentation of A.The presentation given by Definition 2.4 is called the standard presentation of A. We use this presentationexclusively, and for the remainder assume the identifications

A = Z/N1 × Z/N2 = (Z/N1 × Z/N2)∗ = A∗.

2.1.3 Cyclic groups, binary composition, general notations

Most of the groups discussed above have elements in the set Z/N , and we wrote the binary composition on thegroup as addition modulo N . Since our work involves nonabelian groups, for which the binary compositionis usually written as multiplication, it is simpler to write the binary operations of all groups, abelian ornonabelian, as multiplications. As the following discussion illustrates, groups such as Z/N with additionmodulo N have a simple multiplicative representation.

Definition 2.5 (Cyclic group).A group C is called a cyclic group if there exists x ∈ C such that every y ∈ C has the form y = xn for someinteger n. In this case, we call x a generator of C.

Cyclic groups are frequently constructed as special subgroups of arbitrary groups. Throughout this discussionG is an arbitrary group, not necessarily abelian.

Definition 2.6. For x ∈ G, the set of powers of x

gpG(x) = xn : n ∈ Z

is a cyclic subgroup of G called the group generated by x in G. When G is understood, we will write gpG(x)as gp(x).

It will be convenient to have notation for a cyclic group of order N without reference to a particularunderlying group. Denote by CN (x) the set of formal symbols

xn : 0 ≤ n < N

with binary composition satisfying

xmxn = xm+n, 0 ≤ m,n < N,

where m+n is addition modulo N . Then CN (x) is a cyclic group of order N having generator x. The identityof CN (x) is x0 and the inverse of xn is xN−n. Clearly the cyclic group CN (x) is isomorphic to the group Z/N ,and it is by this identification that the binary composition on Z/N is equivalently written as multiplication.

For an integer L ∈ Z/N , denote by gpN (xL) the subgroup generated by xL in CN (x). If L divides N ,then

gpN (xL) = xmL : 0 ≤ m < M, LM = N

and gpN (xL) is a cyclic group of order M .In image processing applications the set used to index the data is an important factor influencing perfor-

mance of the resulting algorithms. Typically image data are indexed by elements of direct products of cyclicgroups, such as

CN (x)× CN (y) = xmyn : 0 ≤ m,n < N,

8

Implicit in our discussions of this set will be some standard identifications, such as x1y0 = (x, 1) = x,x0y1 = (1, y) = y. Though they should not cause any ambiguity, such conventions may take some gettingused to and unfamiliar readers are encouraged to consult [1].

Let A = CN (x)× CN (y) and suppose B is the subgroup of A with elements in

gpN (xL)× gpN (yL) = xpLyqL : 0 ≤ p, q < M, where LM = N.

The group B is a direct product of cyclic groups of order M . The quotient group A/B is given by

A/B = xjykB : 0 ≤ j, k < L.

Each element xjykB ∈ A/B is a direct product of cyclic subgroups of A. The element xjykB is called theB-coset of A with representative xjyk. The elements within a particular coset are called equivalent moduloB. A complete set of B-coset representatives in A is

H = xjyk : 0 ≤ j, k < L = CL(x)× CL(y).

Thus, H is a direct product of cyclic groups of order L. Furthermore, any element a ∈ A can be uniquelywritten as

a = hb, h ∈ H, b ∈ B

where h specifies that a belongs to the coset hB, and b identifies a within that coset. We give concreteexamples of B-cosets for a few special cases.

Example 2.7. For N = 8, M = 2, L = 4,

A = C8(x)× C8(y) = xmyn : 0 ≤ m,n < 8, (2.1)

andB = gp8(x4)× gp8(y4) = xp4yq4 : 0 ≤ p, q < 2.

In the following figure, the numbers denote exponents mn on the elements xmyn ∈ A in (2.1).

nm

0 1 2 3 4 5 6 7

0 00 01 02 03 | 04 05 06 071 10 11 12 13 | 14 15 16 172 20 21 22 23 | 24 25 26 273 30 31 32 33 | 34 35 36 374 40 41 42 43 | 44 45 46 475 50 51 52 53 | 54 55 56 576 60 61 62 63 | 64 65 66 677 70 71 72 73 | 74 75 76 77

The boldface exponents comprise the B-coset with representative x0y0; that is,

x0y0B = B =[00 0440 44

]The B-coset with representative x1y0 is

x1y0B = xB =[10 1450 54

]A few more examples are the sets

yB =[01 0541 45

], xyB =

[11 1551 55

]x2B =

[20 2460 64

], x2yB =

[21 2561 65

]which are the B-cosets with representatives x0y1, x1y1, x2y0, and x2y1, respectively.

9

2.2 Finite Nonabelian Groups

This section describes a few tools from nonabelian group theory that can be usefully applied to the space-varying blur problem. Some of the material merely re-iterates or emphasizes previously stated facts aboutgroups and quotient groups, reinforcing these concepts and making this section is less dependent on thosepreceding it.

2.2.1 Two distinctions of consequence

Abelian group DSP can be completely described in terms of a special class of signals called the charactersof the group. (For Z/N , the characters are simply the exponentials.) Each character of an abelian grouprepresents a one-dimensional translation-invariant subspace, and the set of all characters spans the space ofsignals indexed by the group; any such signal can be uniquely expanded as a linear combination over thecharacters.

In contrast, the characters of a nonabelian group G, do not determine a basis for expanding signalsindexed by G. However, a basis can be constructed by extending the characters of an abelian subgroupA of G, and then taking certain translations of these extensions. Some of the characters of A cannot beextended to characters of G, but only to proper subgroups of G. This presents some difficulties involvingthe underlying translation-invariant subspaces, some of which are now multi-dimensional. However, it alsopresents opportunities for alternative views of local signal domain information on these translation-invariantsubspaces.

The other abelian/nonabelian distinction of primary importance concerns translations defined on thegroup. In the abelian group case, translations represent simple linear shifts in space or time. When nonabeliangroups index the data, however, translations are no longer so narrowly defined.

2.2.2 Nonabelian group convolution

Above we defined translation and convolution of L(A) for A an abelian group. We now define these for L(G)for the general (nonabelian) group G. (See [1] for elaboration.) Throughout, C denotes complex numbers, Gan arbitrary (nonabelian) group, and L(G) the collection of complex valued functions on G.

Definition 2.8 (Translation by y ∈ G).For y ∈ G, the mapping T(y) of L(G) defined by

(T(y)f)(x) = f(y−1x), x ∈ G

is a linear operator of L(G) called left translation by y.

Definition 2.9 (Convolution by f ∈ L(G)).The mapping C(f) of L(G) defined by

C(f) =∑y∈G

f(y)T(y), f ∈ L(G)

is a linear operator of L(G) called left convolution by f .

By definition,(C(f)g)(x) =

∑y∈G

f(y)g(y−1x), g ∈ L(G), x ∈ G

For f, g ∈ L(G), the compositionf ∗ g = C(f)g

is called the convolution product. The vector space L(G) paired with the convolution product is an algebra,the convolution algebra over G.

10

2.2.3 Abelian by abelian semidirect product

To determine whether a particular group is useful for a DSP application, we must specify exactly how thisgroup represents the data. The group representation may reduce computational complexity, or it may simplymake it easier to state, understand, or model a given problem.

In this section we describe procedures for specifying and studying a simple class of nonabelian groupsthat have proven useful in applications – the abelian by abelian semidirect products. These are perhaps thesimplest extension of abelian groups and DSP over such groups closely resembles that over abelian groups.However, the resulting processing tools can have vastly different characteristics.

Definition 2.10 (Internal semidirect product).Let G be a finite group of order N , K a subgroup of G, and H a normal subgroup of G. If G = HK andH ∩K = 1, then we say that G is the internal semidirect product G = HK.

It can be shown that G = HK if and only if every x ∈ G has a unique representation of the formx = yz, y ∈ H, z ∈ K.

The mapping Ψ : K → Aut(H), defined by

Ψz(x) = zxz−1, z ∈ K, x ∈ H

is a group homomorphism. Define the binary composition in G in terms of Ψ as follows:

x1x2 = (y1z1)(y2z2) = y1Ψz1(y2)z1z2,

y1, y2 ∈ H, z1, z2 ∈ K.

If G = HK and K is a normal subgroup of G, then G is the “usual” direct product (i.e., the cartesianproduct H × K along with component-wise multiplication). What is new in the semidirect product is thepossibility that K acts nontrivially on H.

The mapping Ψ : U(N) → Aut(CN (x)) is a group isomorphism. Under this identification, we can formCN (x)K for any subgroup K of U(N). A typical point in CN (x)K is denoted (xn, u), 0 ≤ n < N, u ∈ Kwith multiplication given by

(xm, u)(xn, v) = (xm+un, uv), 0 ≤ m,n < N, u, v ∈ K

where m + un is taken modulo N .

Action group

Recall that GL(2, Z/N) denotes the set of all 2 × 2 invertible matrices with coefficients in Z/N . For c ∈GL(2, Z/N) such that cM = 1 is the identity, consider the action group Kc with elements

CM (kc) = kmc : 0 ≤ m < M, c =

(c0 c1

c2 c3

)∈ GL(2, Z/N).

The semidirect product of H and Kc has elements

HKc = xjykkmc : 0 ≤ j, k < L, 0 ≤ m < M

and binary composition satisfying the following relations:

xL = yL = kMc = 1,

x−1 = xL−1, y−1 = yL−1, k−1c = kM−1

c ,

kcxjyk = xc0j+c1kyc2j+c3kkc.

where the summands in the exponents are modulo |H| = L.

11

Rotation.

Let A = CN (x)× CN (y) with binary composition satisfying

(xmyj)(xnyk) = xm+n mod Nyj+k mod N ,

xN = yN = 1, x−1 = xN−1, y−1 = yN−1.

Consider the action group Kc, c ∈ GL(2, Z/N), with cM = 1. The group generated by kc is the cyclic groupof order M with elements CM (kc) = km

c : 0 ≤ m < M. Now suppose

c(θ) =(

cos θ − sin θsin θ cos θ

).

Example 2.11 (Rotation by π/2). The action group Kc(π/2) has

c(π/2) =(

0 −11 0

).

Since c4(π/2) is the identity, the group has order M = 4.

The semidirect product AKc(θ) has elements xjykkmc(θ) : 0 ≤ j, k < N, 0 ≤ m < M, and binary

composition satisfyingxN = yN = kM

c(θ) = 1,

x−1 = xN−1, y−1 = yN−1, k−1c(θ) = kM−1

c(θ) ,

andkc(θ)x

jyk = xj cos θ−k sin θyj sin θ+k cos θkc(θ),

Additive operations in the exponents are modulo |A| = N .

2.3 The Group Algebra CG

The group algebra CG is the space of all formal sums

f =∑x∈G

f(x)x, f(x) ∈ C

with the following operations:

f + g =∑x∈G

(f(x) + g(x))x, f, g ∈ CG,

αf =∑x∈G

(αf(x))x, α ∈ C, f ∈ CG,

fg =∑x∈G

∑y∈G

f(y)g(y−1x)

x, f, g ∈ CG.

For g ∈ CG, the mapping L(g) of CG defined by

L(g)f = gf, f ∈ CG

is a linear operator on the space CG called left multiplication by g.Since y ∈ G can be identified with the formal sum ey ∈ CG consisting of a single nonzero term,

12

yf = L(ey)f =∑x∈G

f(y−1x)x (2.2)

In relation to translation of L(G), (2.2) is the CG analog.The mapping Θ : L(G) → CG defined by

Θ(f) =∑x∈G

f(x)x, f ∈ L(G)

is an algebra isomorphism of the convolution algebra L(G) onto the group algebra CG. Thus we can identifyΘ(f) with f , using context to decide whether f refers to the function in L(G) or the formal sum in CG.

An important aspect of the foregoing isomorphism is the correspondence between the translations of thespaces. Translation of L(G) by y ∈ G corresponds to left multiplication of CG by y ∈ G. Convolution ofL(G) by f ∈ L(G) corresponds to left multiplication of CG by f ∈ CG. To put it symbolically,

L(G) ↔ CG

T(y) ↔ L(y)C(f) ↔ L(f)

2.3.1 Ideals of CG

Definition 2.12. A subspace V of the space CG is called a left ideal if

uV = uf : f ∈ V ⊂ V, u ∈ G.

A left ideal of CG corresponds to a subspace of L(G) invariant under all left translations. If V is a left ideal,then, by linearity, gV ⊂ V, g ∈ CG. For g ∈ CG, the set CGg defined by fg : f ∈ CG is called the leftideal generated by g in CG. A left ideal V of CG is called irreducible if the only left ideals of CG containedin V are 0 and V.

The sum of two distinct, irreducible left ideals is always a direct sum. For an abelian group A, the groupalgebra CA is decomposed into a direct sum of irreducible ideals. The ideals are one-dimensional translation-invariant subspaces. Similarly, for a nonabelian group G, the group algebra CG is decomposed into a directsum of left ideals. Again, the ideals are translation-invariant subspaces, but some of them must now be multi-dimensional, and herein lies the potential advantage of using nonabelian groups for indexing the data. Theleft translations are more general and represent a broader class of transformations. Therefore, projections ofdata into the resulting left ideals can reveal more complicated partitions and structures as compared withthe Fourier components in the abelian group case.

2.4 Nonabelian Group Fourier Analysis

A character of G is a group homomorphism of G into C×, where C× = C \0. In other words, the mapping% : G → C× is a character of G if it satisfies %(xy) = %(x)%(y), x, y ∈ G.

By the aforementioned identification between L(G) and CG, a character % ∈ G∗ can be viewed as aformal sum,

% =∑x∈G

%(x)x, (2.3)

and therefore G∗ ⊂ CG.

Theorem 2.13. If G has a nontrivial character %, then

1N

∑x∈G

%(x) = 0

13

Proof. For any y ∈ G, by a change of variables,

%(y)Xx∈G

%(x) =Xx∈G

%(yx) =Xx∈G

%(x)

Therefore, either (a) %(y) = 1,∀y ∈ G, or (b)P

%(x) = 0. Since (a) contradicts the hypothesis, (b) must be true. ut

Theorem 2.14. Suppose % is a character of G. If y ∈ G,

y% = %y = %(y−1)%

Proof. As above, write % as a formal sum and change variables.

Corollary 2.15. Suppose % is a character of G. If f ∈ CG, then

f% = %f = f(%)%

where f(%) =∑

y∈G f(y)%(y−1).

Proof. By Theorem 2.14,

f% =Xy∈G

f(y)y% =

Xy∈G

f(y)%(y−1)

!%

Similarly for %f , mutatis mutandis. ut

14

3

Speckle Imaging

We present two descriptions of one image processing problem – reducing turbulence-induced image blur. Thefirst description (Section 3.1) is developed using mathematics that are common to this problem area, whilethe second description (Section 3.3) presents a more general framework that we believe is better suited tothe problem, especially in the presence of spatially varying (anisoplanatic) turbulence. To the best of ourknowledge, the problem has never before been approached from the point of view we describe in Section 3.3.

We have retained the traditional description of the problem for two reasons. First, it demonstrates the(well-known) fact that the computational cost incurred in the anisoplanatic case can be prohibitive, makingit infeasible to process reasonably-sized images under this paradigm. Second, by juxtaposing the traditionalapproach with our new approach, we hope readers will find the new methods more accessible, and theircomputational advantages more obvious.

3.1 Traditional Imaging Model

This section is based on Paul Billing’s notes [2], but the present exposition is set in the algebraic frameworkdescribed above. Throughout this section, unless otherwise noted, a 2-dimensional spatial coordinate planewill be represented by the indexing set

A = 0, 1, . . . , N1 × 0, 1, . . . , N2 ' Z/N1 × Z/N2,

though it will often be the case that N1 = N2.The most basic components of our model will be denoted by the following symbols:

f = object s = incoherent PSFg = noiseless image d = detected image

We assume that f, s, g, d ∈ L(A). By an image “ensemble” we mean a set of K images, or frames, indexedby k ∈ B = 0, 1, 2, . . . ,K − 1. For the kth frame in the ensemble, suppose1

gk = C(f)sk =∑y∈A

f(y)T (y)sk (3.1)

where C(f)sk denotes convolution of sk by f , and T (y) denotes translation of sk by y. In particular, at anypoint x in the image plane,1 We previously used gk = t ·C(f)sk, where t ∈ L(A) was a truncation window, and · denotes point-wise multiplica-

tion. However, since the function t does not influence any derivations of this section, it is notationally cleaner topostpone inclusion of the truncation factor.

gk(x) =∑y∈A

f(y)sk(x− y) =∑y∈A

f(y)T (y)sk(x), k ∈ B,

To be even more explicit, for x = (x1, x2) ∈ A,

gk(x1, x2) =N1−1∑y1=0

N2−1∑y2=0

f(y1, y2)sk(x1 − y1, x2 − y2), k ∈ B.

3.1.1 Poisson noise model

Recall the Poisson distribution with mean λ has a probability mass function (pmf) given by:

p(x|λ) =λxe−λ

x!

Suppose that, at each point x ∈ A, the value dk(x) is a realization of a Poisson process with mean gk(x).Then

p (dk(x)|gk(x)) =gk(x)dk(x)e−gk(x)

dk(x)!

To be more precise, we could suppose dk(x) is a realization of a doubly stochastic Poisson process2 withrate function λ(t;x) and mean,

gk(x) =∫ tk+1

tk

∫Nδ(x)

λ(t; z) dz dt

where Nδ(x) = z : ‖x− z‖ < δ is a δ-neighborhood3 of the spatial coordinate x, and (tk, tk+1] representsthe interval of time over which the image was exposed. Then we could take the (continuous) rate function λto represent the superposition of the object and the point spread function as follows:

λ(t;x) =∫

f(t; z)s(t;x, z) dz

Though this point of view may provide a more precise representation of reality, we must settle for theapproximation that results from estimating the discrete values gk(x). Interpolation of such values could giverise to an estimate of the rate function λ, but instead we take gk(x) as an approximation of the discretesuperposition of object and point spread function.

Denote the set of all pixels in images belonging to the detected ensemble by

d = dk(x)k∈Bx∈A

Similarly, denote the set of all pixels in images from the noiseless ensemble by

g = gk(x)k∈Bx∈A

Assuming independence, the joint pmf for pixels in the detected ensemble is

p (d|g) =∏k

∏x

p (dk(x)|gk(x)) =∏k

∏x

gk(x)dk(x) e−gk(x)

dk(x)!(3.2)

Here – and throughout unless otherwise noted – k ranges over B and x ranges over A.2 See Section A.1.4 for more about Poisson processes.3 Here we assume the (continuous) 2-dimensional image plane has been equally partitioned into δ-neighborhoods

about the points in our discrete index set.

16

Given that we observed d, we want to find the values g which yield a joint density, p (·|g), from whichthe observed data set, d, is the most likely outcome. In other words, we seek a maximum likelihood estimate(MLE) of g.

Given the data d, we maximize the right hand side of (3.2) over all values of g. Therefore, it makes senseto write (3.2) as a function of g given d:

L (g|d) =∏k

∏x

gk(x)dk(x) e−gk(x)

dk(x)!

This is typically called the likelihood function. It’s easier, and equivalent, to maximize the natural log of thelikelihood function, which is:

` (g|d) =K−1∑k=0

∑x

dk(x) log gk(x)− gk(x)− log dk(x)! (3.3)

The last term on the right hand side is constant, so it can be ignored for the purposes of maximizing the socalled “log likelihood function,” `.

Recall from (3.1), gk is defined as C(f)sk. Here, t is known and we are trying to estimate f and sk. Asa consequence, we arrive at an estimate of gk. Thus, instead of maximizing the likelihood over values of gk,we maximize over values of f and sk simultaneously.

3.1.2 Gradient with respect to f

Maximizing the log likelihood function over possible values of the object leads to the MLE of the object,which we denote by f . To derive this estimate we find that value of f at which the derivative, with respectto f(z), equals 0 for each z ∈ A.

∂` (g|d)∂f(z)

=K−1∑k=0

∑x∈A

∂

∂f(z)[dk(x) log gk(x)− gk(x)]

=K−1∑k=0

∑x∈A

∂

∂gk(x)[dk(x) log gk(x)− gk(x)]

∂gk(x)∂f(z)

=K−1∑k=0

∑x∈A

[dk(x)gk(x)

− 1]

∂gk(x)∂f(z)

and∂gk

∂f(z)=

∂

∂f(z)

∑y∈A

f(y)T(y)sk = T(z)sk

Therefore,∂` (g|d)∂f(z)

=K−1∑k=0

∑x∈A

[dk(x)gk(x)

− 1]

T(z)sk(x) (3.4)

In simple optimization problems, the gradient is a function of the variable over which we optimize. In thepresent case, the gradient is a function of gk, which in turn is a function of f via equation (3.1). Therefore,given a set of K PSF estimates sk0≤k<K , we seek a vector f , which, through (3.1), minimizes the magnitudeof (3.4) for each z ∈ A. That is, we must find the vector f which minimizes∣∣∣∣∣

K−1∑k=0

∑x∈A

[dk(x)gk(x)

− 1]

T(z)sk(x)

∣∣∣∣∣ , z ∈ A

17

3.1.3 Gradient with respect to sk

In order to maximize the likelihood function with respect to the PSF, sk, we could proceed by estimatingsk directly, but it is often useful to write sk as a function of phase-aberration, and then estimate the formof this function.

PSF phase parameterization

There are unknown phase errors across the aperture that distort the image. We represent this distortionwith a phase-aberration function, Φk, which may vary with k; in other words, the phase-aberration may bedifferent for each frame.

In deriving the MLE of sk using the second approach mentioned above, we first expand the phase-aberration function in a suitable basis. (Often a Zernike basis is used.) If we denote the set of basis functionsby ϕj0≤j<J , then the phase-aberration for the kth frame is expanded as follows:

Φk =J−1∑j=0

αkjϕj

We usually take the basis functions ϕj as given and not depending on k. On the other hand, the coefficientsαkj must be estimated for each k ∈ B.

Remark 3.1. If the choice of basis was physically motivated, it can help our intuition to think of αkj as theprojection of Φk onto the basis function ϕj . In fact, when ϕj0≤j<J is an orthonormal basis, the coefficientsreally are projections. In that case, we often write the expansion as follows:

Φk =J−1∑j=0

〈Φk, ϕj〉ϕj

The incoherent PSF for the kth frame is

sk = |hk|2 = hkh∗k (3.5)

where hk is the coherent PSF, which is the inverse Fourier transform of the coherent transfer function, orpupil function.

3.2 Some New Developments

Above we defined a character τa : A → UN by the mapping

τa(x) = ei2πx1a1N1 ei2π

x2a2N2 , x ∈ Z/N1 × Z/N2

Using this notation, we state some standard Fourier transform relations on which our model depends.

hk =1

N1N2F−1Hk =

1N1N2

∑u∈A∗

Hk(u)τu (3.6)

The index set of the summation in (3.6) is A∗, the group of characters of A. However, as noted above, weidentify the elements of A∗ with those of A by the standard presentation, and simply write the charactergroup as A∗ = Z/N1 × Z/N2.

At any x ∈ A, (3.6) evaluates to

18

hk(x) =1

N1N2

∑u∈A∗

Hk(u) ei2πx1u1

N1 ei2πx2u2

N2

The transfer function is the Fourier transform of the point spread function.

Hk = Fhk =∑x∈A

hk(x)τ∗x (3.7)

At any u ∈ A∗, (3.7) evaluates to

Hk(u) =∑x∈A

hk(x) e−i2πx1u1

N1 e−i2πx2u2

N2

Again, the summations are over the group A = Z/N1×Z/N2 and its character group, A∗. By isomorphism,the character group is the same index set, A∗ = Z/N1 × Z/N2.

The pupil function is complex-valued, and we take as its complex argument the phase-aberration function.That is,

Hk(u) = |Hk(u)| eiΦk(u) = |Hk(u)| eiP

j αkjϕj(u), u ∈ A (3.8)

From equations (3.5)–(3.8) we can write sk as a function of αkj .

sk = hkh∗k

=

(1

N1N2

∑u∈A∗

Hk(u)τu

)(1

N1N2

∑v∈A∗

H∗k(v)τ∗v

)

=1

(N1N2)2∑

u∈A∗

∑v∈A∗

Hk(u)H∗k(v)τuτ−v (3.9)

Remark 3.2. At this point, we diverge slightly to make an observation which simplifies equation (3.9). Sucha simplification may only hold when working in the group algebra C(Z/N1 × Z/N2) (see [1]). Therefore, wemake this point to indicate the potential benefits of switching to the C(Z/N1 ×Z/N2) framework for futureanalysis. Thereafter, we resume without benefit of this simplifying assumption.Assume the following character formula:

τuτv =

o(A)τu, τv = τu

0, otherwise(3.10)

Expression (3.10) holds for τu, τv ∈ CA. Under this assumption, (3.9) simplifies to

sk =1

N1N2

Xu∈A∗

Hk(u)H∗k (−u)τu

Returning to expression (3.9), since H∗k(u) = |Hk(u)| e−iϕk(u), we have

sk =1

(N1N2)2∑

u∈A∗

∑v∈A∗

|Hk(u)||Hk(v)| eiP

j αkj [ϕj(u)−ϕj(v)]τu−v (3.11)

Now that we have written sk as a function of αkj0≤j<J , we can maximize the likelihood function (3.3)with respect to αkj . For each index pair (a, b), we have

∂` (g|d)∂αab

=K−1∑k=0

∑x∈A

[dk(x)gk(x)

− 1]

∂gk(x)∂αab

19

and∂gk

∂αab=∑y∈A

∂gk

∂sk(y)∂sk(y)∂αab

(3.12)

By the definition of gk and commutativity of convolution,

gk = C(f)sk = C(sk)f =∑x∈A

sk(x)T(x)f

Therefore,∂gk

∂sk(y)= T(y)f (3.13)

It remains only to compute the partial of sk with respect αab. From (3.11),

∂sk

∂αab=

1(N1N2)2

∑u∈A∗

∑v∈A∗

|Hk(u)||Hk(v)| i[ϕb(u)− ϕb(v)]δ(k − a) eiP

j αkj [ϕj(u)−ϕj(v)]τu−v

Over the domain (u, v) ∈ A∗ ×A∗, define

Hab(u, v) =1

(N1N2)2|Ha(u)||Ha(v)| [ϕb(u)− ϕb(v)] ei

Pj αaj [ϕj(u)−ϕj(v)]+i π

2

Then,∂sa

∂αab=∑

u∈A∗

∑v∈A∗

Hab(u, v)τu−v, and∂sk

∂αab= 0, k 6= a (3.14)

Inserting equations (3.13) and (3.14) into equation (3.12) yields

∂ga

∂αab=∑y∈A

∑u∈A∗

∑v∈A∗

Hab(u, v)τu−v(y)T(y)f

=∑

u∈A∗

∑v∈A∗

Hab(u, v)C(τu−v)f

=∑

u∈A∗

∑v∈A∗

Hab(u, v)f(u− v)τu−v

wheref(u) =

∑y∈A

f(y)τu(y)

denotes the Fourier coefficient of f at u.We can now formally express the gradient of ` with respect to αab as follows:

∂` (g|d)∂αab

=∑x∈A

∑u∈A∗

∑v∈A∗

[da(x)ga(x)

− 1]Hab(u, v)f(u− v)τu−v(x) (3.15)

If we let ra = da

ga− 1, another Fourier coefficient in expression (3.15) becomes apparent.

ra(u− v) =∑x∈A

ra(x)τu−v(x)

from which∂` (g|d)

∂αab=∑

u∈A∗

∑v∈A∗

Hab(u, v)f(u− v)ra(u− v) (3.16)

20

Equation (3.16) provides a formal expression of the gradient. However, we require an expression whichbetter facilitates algorithmic implementation. Note that

sa = hah∗a = Re[ha]2 + Im[ha]2

is real valued. Therefore, we can write equation (3.11) as

sk(x) = Re[sk(x)] =1

(N1N2)2∑u,v

|Hk(u)||Hk(v)| cos

∑j

αkj [ϕj(u)− ϕj(v)] + 2π(u1 − v1)x1

N1+ 2π

(u2 − v2)x2

N2

from which, we have

∂sa(y)∂αab

=1

(N1N2)2∑u,v

|Ha(u)||Ha(v)| [ϕb(u)−ϕb(v)] sin

∑j

αaj [ϕj(u)− ϕj(v)] + 2π(u1 − v1)y1

N1+ 2π

(u2 − v2)y2

N2

(3.17)

Substituting (3.17) for∑

u,v Hab(u, v)τu−v in equation (3.15) yields

∂` (g|d)∂αab

=1

(N1N2)2∑x,y

∑u,v

[da(x)ga(x)

− 1]

t(x)f(x− y)|Ha(u)||Ha(v)| [ϕb(u)− ϕb(v)] (3.18)

× sin

∑j

αaj [ϕj(u)− ϕj(v)] + 2π(u1 − v1)y1

N1+ 2π

(u2 − v2)y2

N2

By (3.18) we see that one way to find a parameter set αajj yielding a zero gradient is to find one whichsatisfies

J−1∑j=0

αaj [ϕj(u)− ϕj(v)] + 2π(u1 − v1)y1

N1+ 2π

(u2 − v2)y2

N2= nπ, n ∈ Z (3.19)

3.2.1 The spatially varying case

The foregoing assumes that the point spread function is spatially invariant. We now state the problem forthe more general spatially varying case. Focusing at first on a single image frame, we need not involve theframe index k until later.

As before, identify A = Z/N1 × Z/N2 with its character group by the standard presentation.

A = Z/N1 × Z/N2 = (Z/N1 × Z/N2)∗ = A∗

Let y ∈ A be a spatial coordinate in the object plane. Then we model the noiseless image as thesuperposition,

g =∑y∈A

f(y)sy (3.20)

where g, f, sy ∈ L(A). Note that in this model our point spread function varies with y.Assume sy = hyh∗y, where

hy =1

N1N2F−1Hy =

1N1N2

∑u∈A∗

Hy(u)τu (3.21)

At any point x in the focal plane, (3.21) evaluates to

hy(x) =1

N1N2

∑u∈A∗

Hy(u) ei2πx1u1

N1 ei2πx2u2

N2 , x ∈ A

21

Thus far, the equations defining the PSFs and transfer function are identical to the invariant case, withthe frame index k replaced by a spatial coordinate in the object frame. However, a basic difference arises inthe functional form of the coherent transfer function.

Hy(u) = |Hy(u)| eiΨy(u) (3.22)

where

Ψy(u) =L−1∑`=0

Φ`(β`y + (1− β`)u)

The index ` represents locations along the elevation axis. Thus the total phase-aberration for the pair u, ytakes contributions from phase-aberration functions Φ` at various altitudes, evaluated at a point along theline connecting u and y.

If we denote the set of basis functions by ϕj0≤j<J , then the phase-aberration for a point ` along theelevation axis is represented by the following expansion:

Φ` =J−1∑j=0

α`jϕj

The basis functions ϕj usually do not depend on `, whereas the coefficients α`j must be estimated for each` ∈ 0, 1, . . . , L− 1. Finally, we have

Ψy(u) =L−1∑`=0

J−1∑j=0

α`jϕj(β`y + (1− β`)u) (3.23)

Remark 3.3. During implementation, we should be able to exploit periodicities of the basis functions. Thiswould facilitate, among other things, estimation of the coefficients α`j in equation (3.23). In particular,suppose ϕj is B-periodic; that is,

ϕj(x) = ϕj(x + y), y ∈ B (3.24)

If the order of B divides L, say M = L/o(B), then

L−1∑`=0

α`jϕj(β`y + (1− β`)u) =M−1∑m=0

∑y∈B

α(m+y)j

ϕj(βmy + (1− βm)u) (3.25)

We see that, without the benefit of periodicity, there are L coefficients to compute. Exploiting B-periodicity,we compute only M = L/o(B) coefficients.

N.B. the form of (3.25) is probably wrong due to the form of the operand of ϕj . To correct for this, weneed to represent the periodicities of ϕj along the line connecting y and u, rather than in the simple formgiven by (3.24).

Periodic PSF

Since s is ultimately a function of the basis ϕj, it should be possible to exploit periodicities in ϕj whenworking with s. This would greatly facilitate algorithmic implementation of the space-variant model.

Suppose the group A has order N . Let B be a subgroup of A with order L = o(B), where L = N/M .Later we address the problem of writing sy as a periodic function. For now, assume sy is B-periodic in y;that is, for each x ∈ A,

sa(x) = sa+b(x), a ∈ A, b ∈ B

Then, by (3.20),

22

g =∑y∈A

f(y)sy

=∑

a∈A/B

∑b∈B

f(a + b)sa+b

=∑

a∈A/B

(∑b∈B

f(a + b)

)sa

This reduces the order of spatial variance to o(A/B) = M .In order to find periodicities in sy, consider its functional form. By 3.21

sy = hyh∗y

=1

(N1N2)2

(∑u∈A∗

Hy(u)τu

)(∑v∈A∗

H∗y (v)τv

)

=1

(N1N2)2∑

u∈A∗

∑v∈A∗

Hy(u)H∗y (v)τu−v

Thus, we must locate periodicities inHy(u) = |Hy(u)| eiΨy(u)

where

Ψy(u) =L−1∑`=0

Φ`(β`y + (1− β`)u)

3.3 New Imaging Model

3.3.1 Ideal image in CG

Assume the object of interest is f ∈ CG, where G = HKc is as defined above. Then,

f =∑a∈G

f(a)a =∑j,k

∑m

f(xjykkmc )xjykkm

c

Take the function g ∈ CH to be some imperfect representation of f . For example, a blurry image is formedby mixing the object with a point spread function s ∈ CH.

g = fs =

∑j,k

∑m

f(xjykkmc )xjykkm

c

s (3.26)

This is a superposition in which the function s is left-multiplied by elements of G. Recall that left-multiplication of s ∈ CH generalizes translation of s ∈ L(H). Thus, the product fs in CG generalizesthe convolution f ∗ s in L(G). The following further elucidates.

Consider a single left-multiply of s ∈ CH by hk ∈ HKc.

L(hk)s = xjykkmc s

=∑p,q

s(xpyq)(xjykkmc )xpyq

=∑p,q

s((xjykkmc )−1xpyq)xpyq

23

So, by (3.26), the function g ∈ CH is defined by the values

g(xpyq) =∑j,k

∑m

f(xjykkmc )s((xjykkm

c )−1xpyq) (3.27)

The important thing to take from equation (3.27) is that left-multiplying by an element of a nonabeliangroup is a linear transformation that is quite general and includes classical translation as a special case.When the operand is an element of the abelian group H, left-multiply is a simple shift operator; e.g.,

L(xjyk)s(xpyq) = s((xjyk)−1xpyq) = s(xp−jyq−k) (3.28)

where, as usual, arithmetic in exponents is performed modulo |H|. A superposition of such left-multiplies ofCH is equivalent to classical convolution of L(H),

(C(f)s)(x) =∑y∈H

f(y)ys(x) =∑y∈H

f(y)s(y−1x)

However, when the operand is an element of the action group Kc, a much richer class of transformations isavailable, and this class can be constructed to include rotations, scale changes, and other revealing trans-formations. This is an important and powerful consideration since it allows us to apply our existing fastalgorithms for the standard convolution to operations that are more general than simple spatial or temporalshifts.

3.3.2 Noisy image in CG

This physical model underlying the algorithms of this section is based on theory found in the book [6], thepaper [3] and the notes [2]. However, the present exposition is set in the mathematical framework describedabove.

Let f denote the object of interest and define

g = ideal (noiseless) image, d = detected image,

h = coherent OTF, s = incoherent OTF.

and assume f, g, d ∈ CG, and h ∈ CG∗; that is the domain of h is G∗, the character group of G, which canbe interpreted as frequency space. The incoherent OTF, s ∈ CG∗, is defined as the autocorrelation of thecoherent OTF. We denote this autocorrelation by h ? h and define it as follows:

s(τ) = (h ? h)(τ) =∑

λ∈G∗

h(λ)h(τ−1λ), τ ∈ G∗.

As an element of the group algebra CG∗, s is represented as the formal sum

s =∑

τ∈G∗

s(τ)τ.

The coefficients s(τ) in this basis can be derived using the following identities:

h(λ) =∑x∈G

h(x)λ(x−1), τ(x) = τ−1(x) = τ(x−1), (τ−1λ)(x) = τ−1(x)λ(x)

The coefficient of s at τ ∈ G∗ is now readily derived as follows:

24

s(τ) = (h ? h)(τ) =∑

λ∈G∗

h(λ)h(τ−1λ)

=∑

λ∈G∗

∑y∈G

h(y)λ(y−1)

(∑x∈G

h(x)(τ−1λ)(x−1)

)

=∑x∈G

∑y∈G

h(y)h(x) τ−1(x−1)∑

λ∈G∗

λ(y−1)λ(x−1)

=∑x∈G

∑y∈G

h(y)h(x)τ(x−1)∑

λ∈G∗

λ(y−1x)

= |G|∑x∈G

h(x)h(x)τ(x−1)

The last equality holds by the following character formula:

∑λ∈G∗

λ(y−1x) =

|G|, x = y,

0, x 6= y.

In our case the rate function is g, the noiseless image. Thus λ(t) = g(t) is a sample function of a randomprocess Λ(t) and, according to Section A.1.4, U(t) is called a doubly stochastic Poisson process.

25

A

Mathematical Supplement

Here we present some background, notation, tools, and conventions that are useful for mathematical modelingand analysis.

A.1 Real Analysis, Probability and Statistics

What follows is a sparse collection of results from the theory of measures, probabilities and statistics.The presentation style may seem terse and incoherent, especially to those without some background inprobability. There are many other treatments of this material, some of which are far more comprehen-sive and rigorous; my personal favorite is the book by Schervish [9]. Durrett [4] is a standard probabilitytextbook aimed at the first-year graduate student. A set of notes by Varadhan [11] is available online athttp://www.math.nyu.edu/faculty/varadhan/limittheorems.html.

A.1.1 Topologies, open sets, measures

The concept of a measure enables us to assign numerical values to the “sizes” of sets. Thus the domain ofa measure is a collection of sets. In general, we cannot assign sizes to all sets, but we need enough sets sothat we can take unions and complements. The first step is to agree on what we mean by an “open set on atopological space.”

Definition A.1 (Topological space, topology, open set).A space T is a topological space if it has a collection τ of subsets which satisfies the following conditions:

1. ∅ ∈ τ2. T ∈ τ3. τ is closed under finite intersections.4. τ is closed under arbitrary unions.

Conditions 3 and 4 are equivalently stated as follows:

N−1⋂n=0

An ⊂ τ,⋃n

An ⊂ τ, for all An ∈ τ.

The collection τ is called a topology, and the sets in τ are called open sets.

For the remainder let Ω denote a nonempty set.

Definition A.2 (σ-field).A collection of sets that is closed under complements and finite unions is called a field. A field that is closedunder countable unions is called a σ-field.

Example A.3 (Trivial σ-field, power set).Let Ω be any set. Let A = Ω, ∅. This is called the trivial σ-field. The largest σ-field is the collection of allsubsets of Ω, called the power set of Ω and denoted 2Ω. Further examples of σ-fields includeA = Ω, A, AC , ∅for A ⊂ Ω. If B is another subset of Ω, then A = Ω, A, B, AC , BC , A ∩B,A ∩BC , . . ., etc.

It is easy to prove that the intersection of an arbitrary collection of σ-fields is itself a σ-field. Since 2Ω

is a σ-field, it is easy to see that, for every collection of subsets C of Ω, there is a smallest σ-field A thatcontains C, namely the intersection of all σ-fields that contain C. This smallest σ-field is called the σ-fieldgenerated by C.

Definition A.4 (Borel σ-field).The σ-field generated by the collection C of open subsets of a topological space is called the Borel σ-field.

Definition A.5 (Measure, measure space).If A is a σ-field of subsets of a set Ω, then a measure µ on Ω is a function from A to the extended realnumbers, R ∪ ±∞, that satisfies

• µ(∅) = 0,• An∞n=0 mutually disjoint implies µ (∪∞n=0An) =

∑∞n=0 µ (An) .

If µ is a measure, the triple (Ω,A, µ) is called a measure space.

Definition A.6 (Probability, probability space).If (Ω,A, µ) is a measure space and µ(Ω) = 1, then µ is called a probability and (Ω,A, µ) is called a probabilityspace.

A.1.2 State space, events, random quantities

A concept which is often useful for describing natural phenomena is the “state of nature.” Typically, ωdenotes a particular state of nature. For example, ω may represent the population size, positions of planets,etc. We denote by Ω the set of all possible states of nature, and refer to this as the state space. If A is aσ-field of subsets of Ω, we call each element A ∈ A an event. Thus, an event may contain multiple states ofnature; for instance,1 A = ωα(0), ωα(1), ωα(2), . . . , ωα(N−1) ∈ A.

As the foregoing discussion suggests, given a probability space (Ω,A, µ), we take Ω to be the state space,and A a collection of events in Ω. The function µ, constructed to measure “sizes” of sets in A, also has anobvious probabilistic interpretation; the “size” µ(A) of the set A ∈ A represents the probability that eventA occurs, and is sometimes denoted P (A).

Definition A.7 (Random quantity). Suppose (Ω,A, µ) is a probability space and (X,X ) is a measurablespace. A measurable function X : Ω → X is called a random quantity. Often the measurable space is(X,X ) = (R,B) – where B is the Borel σ-field– in which case X is called a random variable.

Usually the set X of all possible values of X is called the sample space, or outcome space. We denoterandom quantities by capital letters, and their sample spaces by the corresponding letters in fraktur font.For example

X : Ω → X, U : Ω → U, N : Ω → Nn.

As in the third example, bold font is used to indicate that the random quantity is vector-valued. Commonsample spaces are R, C, Zn, subsets thereof, etc.

To each state of nature ωk ∈ Ω we associate a particular realization of the random quantity, whichwe denote by the corresponding lower case symbol. For example, a realization of the random quantityX : Ω → RN is

x = (x0, x1, . . . , xN−1) = (X0(ωk), X1(ωk), . . . , XN−1(ωk)) = X(ωk) (A.1)

1 The symbol α indicates that the set from which indices are chosen may be uncountably infinite.

27

The sample space is usually some subset of a Euclidean space, but will always be a Borel space.2 Let Bdenote the Borel σ-field of all measurable subsets of X, and let AX denote the sub-σ-field of A generated byX. That is, AX = X−1(B). Since (X,B) is a Borel space, B contains all singletons. This allows us to claimthat functions of X qualify as random functions if and only if they are measurable with respect to AX . Wewill often refer to particular realizations of X as the data.

Recap.

As described thus far, the probability modeling process reduces to only a few main ideas.

1. Conduct experiments and/or make observations; the observed state of nature is ω ∈ Ω.2. Evaluate the function X at the point ω; the result is a particular realization of the random quantity X,

namely the random sample x = X(ω).

A.1.3 Distributions, densities, parametric families

Recall that, if (Ω,A, µ) is a probability space and (X,B) is a measurable space, then the measurable functionX : Ω → X is a random quantity. If X = R and B is the Borel σ-field, then X is called a random variable.To compute the probability that a random quantity X has a particular realization requires the distributionof X.

Definition A.8 (Distribution).The probability measure µX induced on (X,B) by X from µ is called the distribution of X. The distributionis called discrete if there exists a countable set A ⊆ X such that µX(A) = 1, and continuous if µX(x) = 0for all x ∈ X.

For example, if B ∈ B, the probability that X(ω) ∈ B is given by

µX(B) = µω : X(ω) ∈ B = P (X ∈ (B))

The symbol P (·) denotes the probability µ of those ω ∈ Ω for which the expression in the parentheses istrue.

The distribution of a random variable is often described by its cumulative distribution function.

Definition A.9 (Cumulative distribution fucntion).A function F is called a cumulative distribution function (CDF) if it has the following properties:

• F is nondecreasing;• limx→−∞ F (x) = 0;• limx→+∞ F (x) = 1;• F is continous from the right.

If X is a random variable, then the function FX(x) = P (ω : X(ω) ≤ x) is a CDF. In this case, FX iscalled the CDF of X.

Definition A.10 (Density).Let (Ω,A, µ) be a probability space, and let (X,B, ν) be a measure space. Suppose that X : Ω → X ismeasurable. Let µX be the measure induced on (X,B) by X from µ. Suppose µX ν. Then the density ofX with respect to ν is the Radon-Nikodym derivative fX = dµX/dν.

Proposition A.11. If h : X → R is measurable and fX is the density of X with respect to ν, then∫h(x) dFX(x) =

∫h(x)fX(x)dν(x).

2 Definition B.31. Schervish [9], page 618.

28

Definition A.12 (Expected value).If X is a random variable with CDF FX(·), then the expected value (or mean, or expectation) of X is

E (X) =∫

xdFX(x)

If X is a random vector, then E (X) denotes the vector with coordinates equal to the means of the coordinatesof X.

Following most paradigms for statistical inference, we suppose that some random quantities X0, X1, . . . , XN−1

all have the same distribution. However, we may be unwilling or unable to precisely state what that dis-tribution is. Instead, we create a collection of distributions, called a parametric family and denoted Π. Forexample Π might consist of all normal distributions, or just those normal distributions with variance 1, orall Poisson distributions, etc. Each of these cases has the property that the collection of distributions can beindexed by a finite dimensional real quantity, which is commonly called a parameter. We let P denote the setof all possible parameter values, and (P,P), the associated measurable space, which we call the parameterspace.

It can be useful – especially from a Bayesian point of view – to adopt the modern convention of writingparameters in upper case; The reason becomes apparent when we view parameters as random quantities. Assuch, they are measurable functions from Ω to P. For example, the parameter-space analog of (A.1) is

θ = (θ0, θ1, . . . , θN−1) = (Θ0(ωk), Θ1(ωk), . . . , ΘN−1(ωk)) = Θ(ωk)

As mentioned, the collection of distributions of a parametric family is indexed by elements of P, and itis often useful to make this explicit. Thus, to each value θ ∈ P, corresponds the following:

• a probability measure µθ, defined on (Ω,A),• a distribution µX|θ, induced on (X,B) by X from µθ,• a density (or mass) function fX|Θ (·|θ).

For example, if X is a single random variable with continuous distribution, then the probability that X takeson a value in the interval [x0, x1) is denoted

P (x0 ≤ X(ω) < x1) (A.2)

With respect to a probability µ on (Ω,A), such as that of Definition A.6, the quantity (A.2) is found bymeasuring the set of all ω for which X(ω) ∈ [x0, x1). That is,

P (x0 ≤ X(ω) < x1) = µω : x0 ≤ X(ω) < x1.

P· (θ)

Similarly, we can add a subscript to E, the expectation operator, to make it more clear that E· (θ) refers tointegration with respect to probability measure µθ.

A.1.4 Exponential families

Definition A.13 (Exponential family).(Schervish [9], p. 102) A parametric family with parameter space P and density fX|Θ(x|θ) with respect toa measure ν on (X,B) is called an exponential family if

fX|Θ(x|θ) = c(θ)h(x) exp

[K−1∑k=0

πk(θ)tk(x)

](A.3)

for some measurable functions πk0≤k<K , tk0≤k<K , and some integer K.

29

Since fX|Θ in Definition A.13 is a probability density, the function c(θ) can be written as

c(θ) =

∫X

h(x) exp

[K−1∑k=0

πk(θ)tk(x)

]dν(x)

−1

so that dependence on θ is through the vector π(θ) = (π0(θ), . . . , πK−1(θ)) ∈ RK .

Example A.14 (Gaussian).Suppose Xn∞n=0 are i.i.d. normal random variables with mean µ and variance σ2; let θ = (µ, σ2) andx = (x0, x1, . . . , xN−1). Then,

fX|Θ(x|θ) =(σ√

2π)−N

exp

[− 1

2σ2

N−1∑n=0

(xn − µ)2]

= (2π)−N/2σ−N exp(−Nµ2

2σ2

)exp

(µ

σ2N x− 1

2σ2

N−1∑n=0

x2n

)

In this form it is clear that fX|Θ is expressed in terms of equation (A.3) with the following definitions:

K = 2, c(θ) = σ−N exp(−Nµ2

2σ2

), h(x) = (2π)−N/2,

π0(θ) =µ

σ2, π1(θ) = − 1

2σ2, t0(x) = N x, t1(x) =

N−1∑n=0

x2n

Example A.15 (Poisson).Suppose Xn∞n=0 are i.i.d. Poisson λ random variables; let θ = λ and x = (x0, x1, . . . , xN−1). Then,

fX|Θ(x|θ) =N−1∏n=0

e−θθxn

xn!=

exp(−Nθ + log θ ·

∑N−1n=0 xn

)∏N−1

n=0 xn!

= e−Nθ

(N−1∏n=0

xn!

)−1

exp (log θ ·N x)

In this form it is clear that K = 1,

c(θ) = e−Nθ, h(x) =

(N−1∏n=0

xn!

)−1

, π0(θ) = log θ, t0(x) = N x.

A particular exponential family of distributions has a fixed K and defines the (vector-valued) functionsionsπ : P → RK and t : X → RK . By composition with the functions Θ : Ω → P and X : Ω → X, we arrive atthe following functions of Ω:

π(Θ) : Ω → RK , t(X) : Ω → RK

The inner product of the resulting vectors is a mapping from Ω to R, and the result is the value in theexponent of (A.3). In the statistics literature the function π(Θ) = (π0(Θ), . . . , πK−1(Θ)) is called the naturalparameter and

Π =

π ∈ RK :∫

X

h(x) exp[π(θ)tt(x)

]dν(x) < ∞

the natural parameter space. The function t(X) is a sufficient statistic – the natural sufficient statistic –and is usually denoted T (X). We adopt this notation in the sequel. We make two further abuses of notationletting t denote a particular realization of t(X) and θ a particular realization of π(Θ).

30

Lemma A.16. (Schervish [9], p. 103) If X has an exponential family distribution, then so does the naturalsufficient statistic T (X) and the natural parameter space for T is the same as that for X. In particular, thereexists a dominating measure νT such that

fT |Θ(t|θ) =dPΘ,T

dνT(t) = c(θ) exp(θtt)

is the density of T given Θ = θ, with respect to νT .

Theorem A.17. (Schervish [9], p. 105)Let the density of T with respect to a measure νT be c(θ) exp(θtt). If φ : T → R is measurable and∫

|φ(t)| exp(θtt) dνT (t) < ∞

thenf(z) =

∫φ(t) exp(ztt) dνT (t)

is an analytic function of z in the region where the real part of z is interior to the natural parameter space,and

∂

∂zkf(z) =

∫tkφ(t) exp(ztt) dνT (t)

Theorem A.17 allows us to calculate moments of sufficient statistics in exponential families by taking deriva-tives of the function log c(θ).

Example A.18. Let φ(t) = 1. Then,

Eθ(Tk) =∫

c(θ)tk exp(θtt) dνT (t) = c(θ)∂

∂θk

∫exp(θtt) dνT (t)

= c(θ)∂

∂θk

1c(θ)

(by Lemma A.16)

= − 1c(θ)

∂c(θ)∂θk

= − ∂

∂θklog c(θ).

Poisson impulse processes

In many studies of physical phenomena the number of times something occurs during a given interval oftime is of primary importance. Often it is appropriate to model such counting processes as stochastic witha Poisson distribution. The counting of photon emissions is an important example.3

Definition A.19 (Poisson process).Let U(t) be a random process with sample functions u(t). We call U(t) a Poisson process if

1. The probability that k events occur within the time interval (t0, t1] is given by

P (k; t0, t1) =

(∫ t1t0

λ(t) dt)k

k!exp

(−∫ t1

t0

λ(t) dt

)where λ(t) ≥ 0 is called the rate of the process.

2. The number of events occurring in any two non-overlapping time intervals are statistically independent.3 A good reference covering statistical optics, in general, and photon counts in particular, is [6].

31

Example A.15 showed that a family of Poisson distributions, with parameter space P = λk, is anexponential family of distributions.

For a given rate function λ(t), the mean and second moment of the number of events occurring in theinterval (t0, t1] is readily seen to be

EK|Λ(k|λ) =∫ t1

t0

λ(t) dt (A.4)

andEK|Λ(k2|λ) = EK|Λ(k|λ) +

[EK|Λ(k|λ)

]2,

respectively.

Definition A.20 (Doubly stochastic Poisson process).If U(t) is a Poisson process with a rate function λ(t) that is itself a sample function of a random processΛ(t), then U(t) is called a doubly stochastic Poisson process.

Starting from (A.4), we arrive at the unconditional mean of K by integrating with respect to the probabilitymeasure µΛ as follows:

E (K) = EEK|Λ(k|λ) (Λ) =∫ ∫ t1

t0

λ(t) dt dµΛ(λ) =∫ t1

t0

E (Λ(t)) dt.

If we assume Λ(t) is a stationary random process, then

E (K) = E (Λ) (t1 − t0).

A.1.5 Maximum likelihood estimation

In exponential families, there is a simple method for finding MLEs in most cases. The logarithm of thelikelihood function will be

log L(θ) = log c(θ) + xtθ

If the MLE occurs in the interior of the parameter space, it occurs where the partial derivatives of log L(θ)are 0; that is, where

xk = − ∂

∂θklog c(θ)

By the result of Example A.18, the MLE is that θ for which x = EθX.

A.2 Matrix Analysis

The main source of most material in this section is Horn and Johnson, Matrix Analysis, [7].

Definition A.21 (Scalar field).Underlying a vector space is the field, or set of scalars, on which addition and multiplication occurs. Forour purposes, that underlying field will usually be the real numbers R or the complex numbers C under theusual addition and multiplication operations, but it could be the rational numbers Q, the integers moduloa specified number Z/NZ, or some other field. When the field is unspecified, we use the symbol F. Toqualify as a field, a set of scalars must be closed under two specified binary operations (“addition” and“multiplication”); both operations must be associative and commutative and have an identity element inthe set; inverses must exist in the set for all elements under the addition operation and for all elementsexcept the additive identity (0) under the multiplication operation; the multiplication operation must alsobe distributive over the addition operation. (See section A.3.8 for a more formal definition of a field.)

32

Definition A.22 (Vector space).A vector space V over a field F is a set V of objects (“vectors”) which is closed under a binary operation(“addition”) and such that an identity (0) and additive inverses exist in the set V. The set is also closedunder an associative and commutative operation of left multiplication of vectors by elements of the scalarfield, with the following properties: for all α, β ∈ F and all x,y ∈ V:

1. α(x + y) = αx + αy2. (α + β)x = αx + βx3. α(βx) = (αβ)x4. ex = x, for the multiplicative identity e ∈ F

For a given field F, the set Fn of n-tuples (n a positive integer) with components from F forms a vectorspace over F under the obvious operations (component-wise addition in Fn). The special cases with whichwe are usually concerned are the n-dimensional vector spaces over the real and complex fields, Rn and Cn,respectively.

A.2.1 Subspace, span, basis

Definition A.23 (Subspace).A subspace U of a vector space V is a subset of V that is, by itself, a vector space over the same scalar field.

Usually a subspace of a vector space V is defined by some relation that identifies particular elements ofV in such a way that the resulting set is closed under the addition operation – for example, the elements ofR3 with the last component 0, denoted (α, β, 0) : α, β ∈ R, is a subspace of R3. It is in this regard thatwe find it useful to think of the resulting set as a subspace rather than as a vector space in its own right. Inany event, the intersection of two subspaces is again a subspace.

Definition A.24 (Span, n.).If S is a subset of a vector space V, denoted S ⊂ V, then the span of S is the set

span(S) = α0v0 + · · ·+ αk−1vk−1 | α0, . . . , αk−1 ∈ F,v0, . . . ,vk−1 ∈ S, k = 1, 2, . . .

Notice that span(S) is always a subspace even if S is not a subspace.The subset S ⊂ V is said to “span” the vector space V if span(S) = V. In other words, span is also a

verb, defined as follows:

Definition A.25 (Span, v.).The subset S spans V if every element of V may be written as a linear combination of elements of S.

For example, the set 1

00

,

010

,

001

,

111

(A.5)

spans R3.

Definition A.26 (Basis).A a linearly independent set which spans a vector space V is called a basis for V.

The set (A.5) does not comprise a basis since the vectors are linearly dependent. However, three vectorsfrom that set do form a basis.

The basis for Rn given by

10...0

,

01...0

, . . . ,

0...01

(A.6)

33

is called the “elementary basis,” often denoted e0, e1, . . . , en−1.Bases are highly non-unique, but are very efficient in that each element of V can be represented uniquely

in terms of the basis. In this sense a basis is a minimal spanning set. For example, if B0 = v0,v1, . . . ,vn−1is a basis for V, and x ∈ V, then x can be written uniquely as the linear combination

x = α0v0 + α1v1 + · · ·+ αn−1vn−1 (A.7)

Exercise A.27. Show that the set of coefficients describing x in terms of B0 is unique.

Bases are important because we can represent all the objects of our vector space in terms of the elements ofa given basis for the space. Thus, the choice of basis will determine how the objects under investigation appear,and an analysis that appears complex or intractable in one basis might appear simple when represented inanother basis.

A.2.2 Linear transformation

Throughout, M(m,n, F) denotes the set of all m × n matrices with elements in F, and M(n, F) the set ofn×n matrices over F. When the field is of no consequence, or is clear from the context, these are abbreviatedto M(n, m) and M(n). The subset of all invertible elements, or units, in M(n, F) is denoted GL(n, F).

Let U be an n-dimensional vector space, V an m-dimensional vector space over the same scalar fieldF, and let B0 and B1 be bases for U and V, respectively. Recall that, as a linear combination of the basiselements, ui ∈ B0,

x = α0u0 + α1u1 + · · ·+ αn−1un−1 =n−1∑k=0

αkuk (A.8)

We denote the vector of coefficients by [x]B0 = (α0, α1, . . . , αn−1) ∈ Fn. Similarly, any vector y ∈ V repre-sented in terms of B1 is denoted [y]B1 .

Definition A.28 (B0-basis representation).The vector [x]B0 is called the B0-basis representation of x.

It is sometimes helpful to let x(uk) denote the coefficient αk, by which equation (A.8) becomes

x =n−1∑k=0

x(uk)uk

The scalar quantities x(uk) ∈ F are called the coordinates of x with respect to B0.

Definition A.29 (Linear transformation).A mapping T from a vector space U to a vector space V is a linear transformation if

T(αx0 + βx1) = αTx0 + βTx1 (A.9)

for all vectors x0,x1 ∈ U and all scalars α, β ∈ F.

A linear transformation T : U → V always has a matrix representation, A, which satisfies the followingcorrespondence:

Tx = y ⇔ A[x]B0 = [y]B1

The matrix A represents the linear transformation T relative to the bases B0 and B1. When we study aparticular matrix, we are studying a linear transformation relative to a particular choice of bases. Thuswe see that, when changing the basis used to represent objects, though the appearance of the objects maychange, some fundamental structure is preserved.

34

A.2.3 Similar matrices

The notion that two matrices have a common “fundamental structure” is an important idea in linear algebra,and it has a precise definition.

Definition A.30 (Similar matrices).A matrix B ∈ M(n) is similar to a matrix A ∈ M(n) if there exists a nonsingular matrix S ∈ M(n) suchthat

B = S−1AS

The transformation A → S−1AS is called a similarity transformation and S is called the similarity matrix.The relation “A and B are similar” is sometimes abbreviated A ∼ B. Similarity is an equivalence relation onM(n). Equivalence relations are discussed further in section A.3.1. The important point here is that similarity,like any equivalence relation, partitions the set M(n) into disjoint equivalence classes. Each equivalence classis the set of all matrices in M(n) that are similar to a given matrix, a representative of the class.

To summarize the foregoing, the statement that two distinct matrices are similar means that they aremerely different basis representations of the same linear transformation – they have a common “fundamentalstructure.” Such structure partitions the set of all matrices into classes, thus reducing it to a subset consistingof one “canonical” matrix per class.

A.2.4 Change of basis

Let U be an n-dimensional vector space over the field F. Let B0 = u0,u1, . . . ,un−1 be a basis for U . Asremarked above,

x =n−1∑k=0

x(uk)uk, x ∈ U

The linear mapping

x → [x]B0 =

x(u0)x(u1)

...x(un−1)

from U to Fn, is well defined, one-to-one, and onto.

Given a linear transformation, T : U → U , the action of T on x ∈ U is determined once we know [x]B0

and the n vectors Tu0,Tu1, . . . ,Tun−1. This is clear by the linearity – equation (A.9) – according to which,

Tx = T

(n−1∑k=0

x(uk)uk

)=

n−1∑k=0

x(uk)Tuk

Let B1 = v0,v1, . . . ,vn−1 be another basis for U . For j = 0, 1, . . . , n− 1, let

[Tuj ]B1 =

Tuj(v0)Tuj(v1)

...Tuj(vn−1)

=

t0j

t1j

...tn−1j

denote the B1-basis representation of Tuj . Then, for x ∈ U ,

35

[Tx]B1 =

n−1∑j=0

x(uj)Tuj

B1

=n−1∑j=0

x(uj) [Tuj ]B1

(A.10)

=n−1∑j=0

x(uj)

t0j

t1j

...tn−1j

=

t00 · · · t0n−1

.... . .

...tn−10 · · · tn−1n−1

x(u0)

...x(un−1)

The n-by-n array tij0≤i,j<n in equation (A.10) depends on T and on the choice of bases B0 and B1, butit does not depend on x. The following definition generalizes this discussion slightly by letting T take x ∈ Uinto another vector space V.

Definition A.31. Given a basis B0 = u0,u1, . . . ,un−1 for U , a basis B1 = v0,v1, . . . ,vm−1 for V, anda linear transformation T : U → V, the B0-B1-basis representation of T is given by the m× n array

B1[T]B0 =([Tu0]B1 · · · [Tum−1]B1

)=

Tu0(v0) · · · Tun−1(v0)...

. . ....

Tu0(vm−1) · · · Tun−1(vm−1)

By the foregoing definition and equation (A.10),

[Tx]B1 =B1 [T]B0 [x]B0 , x ∈ U (A.11)

In practice, the case B1 = B0 is the most common one for presenting a basis representation of T, and thearray B0[T]B0 is called the B0-basis representation of T.

Consider the identity transformation I : U → U , defined by Ix = x, for all x ∈ U . By (A.11),

[x]B1 = [ Ix]B1 =B1 [ I ]B0 [x]B0

Thus, for x ∈ U , the B0-B1-basis representation of I maps [x]B0 to [x]B1 , and the matrix B1[ I ]B0 is called theB0-B1 change of basis matrix. Furthermore,

B1[T]B1 [x]B1 = [Tx]B1 = [ I T Ix]B1

= B1[ I ]B0 B0[T]B0 [ Ix]B0

= B1[ I ]B0 B0[T]B0 B0[ I ]B1 [x]B1

Hence,B1[T]B1 =B1 [ I ]B0 B0[T]B0 B0[ I ]B1

andB0[ I ]B1 B1[T]B1 =B0 [T]B0 B0[ I ]B1

Exercise A.32. Given two bases, B0 = u0,u1, . . . ,un−1 and B1 = v0,v1, . . . ,vm−1, let U and V bethe matrices in GL(n, F) having the basis vectors as their columns; that is,

U = (u0 u1 · · · un−1 ), V = (v0 v1 · · · vm−1 )

Show that the B0-B1 change of basis matrix satisfies

V B1[ I ]B0 = U (A.12)

36

As Exercise A.32 shows, given two matrices, U and V , whose column vectors define bases B0 and B1,respectively, it is trivial to write down the B0-B1 change of basis matrix in terms of these two matrices:

B1[ I ]B0 = V −1U

Every invertible matrix is a change-of-basis matrix, and every change-of-basis matrix is invertible. Thus,if B0 is a basis for vector space U , if T is a linear transformation on U , and if A =B0 [T]B0 denotes theB0-basis representation of T, then the set of all possible basis representations of T is

B1[ I ]B0 B0[T]B0 B0[ I ]B1 : B1 is a basis of U =S−1AS : S is an invertible matrix

(A.13)

This is just the set of all matrices that are similar to the given matrix A. Therefore, two distinct matrices thatare similar are different basis representations of a single linear transformation. We use the term similaritytransformation when referring to the operation A → S−1AS.

Exercise A.33. 4 Generalize the set in (A.13) to include the case: A =B1 [T]B0 is an m × n matrix repre-sentation of the transformation T : U → V.

One would expect similar matrices to share some significant properties – at least, those properties thatare intrinsic to the underlying linear transformation. In fact, this is an important theme in linear algebra. Itis often useful to step back from a question about a matrix to a question about some intrinsic property of theunderlying linear transformation, of which this matrix is only one particular manifestation.

A.2.5 The four fundamental subspaces

Let U be an n dimensional vector space and let V be an m dimensional vector space. Suppose we representan operator T : U → V by the m× n matrix A ∈ M(m,n), using the elementary basis (A.6). That is

T(x) = y ⇔ Ax = y

Then the matrix A is a linear transformation with domain U . The domain is one of four fundamentalsubspaces associated with A. Another one is the range. The range of A is the subspace of V given by

R(A) = Ax : x ∈ U ⊂ V

A third fundamental subspace is the nullspace of A,

N(A) = x : Ax = 0 ⊂ U

The nullspace of A is sometimes called the kernel of T. The range of A is sometimes called the column spaceof A because is represents the space of all linear combinations of the columns of A. Similarly, the space of alllinear combinations of the rows of A is called the row space of A and is essentially the same space as R(At)– the column space of the transpose of A.

A.3 Abstract Algebra

The main source of most material in this section is Fraleigh, A First Course in Abstract Algebra, [5].

4 Hint: see Definition A.31, and the comment preceding it.

37

A.3.1 Partitions and equivalence relations

Definition A.34 (Partition).A partition of a set S is a decomposition into nonempty subsets such that every element of the set is in oneand only one of the subsets. We call these subsets the cells of the partition.

Theorem A.35 (Equivalence class).Let S be a nonempty set and let ∼ be a relation between elements of S that satisfy the following propertiesfor all a, b, c ∈ S:

1. (reflexive) a ∼ a.2. (symmetric) If a ∼ b, then b ∼ a.3. (transitive) If a ∼ b and b ∼ c, then a ∼ c.

Then ∼ yields a natural partition of S, where

a = x ∈ S : x ∼ a

is the cell containing a for all a ∈ S. Conversely, each partition of S gives rise to a natural relation ∼satisfying the reflexive, symmetric, and transitive properties if a ∼ b is defined to mean a ∈ b.

Definition A.36 (Equivalence relation).A relation ∼ on a set S satisfying the reflexive, symmetric, and transitive properties described in The-orem A.35 is an equivalence relation on S. Each cell a in the natural partition given by an equivalencerelation is an equivalence class.

(Notation: The symbol ∼ is usually reserved for an equivalence relation. We will use R for a relation betweenelements of a set S which is not necessarily an equivalence relation on S.)

Definition A.37 (Congruence modulo n).Let h and k be two integers in Z and let n be any positive integer. We define h congruent to k modulo n,written h ≡ k (mod n), if h − k is evenly divisible by n, so that h − k = ns for some s ∈ Z. Equivalenceclasses for congruence modulo n are residue classes modulo n.

Definition A.38 (Addition modulo n).Let n be a fixed positive integer and let h and k be any integers. The remainder r when h + k is divided byn is called the sum of h and k modulo n.

A.3.2 Groups and subgroups

Definition A.39 (Binary operation).A binary operation ∗ on a set S is a rule that assigns to each ordered pair (a, b) of elements of S someelement of S.

Note that a binary operation on S must assign to each ordered pair (a, b) an element that is again in S. Thisrequirement that the element be again in S is known as the closure condition; we require that S be closedunder a binary operation on S.

Example A.40. Our usual addition operator, +, is not a binary operation on the set R+ = (0,∞) of positivereal numbers because 2 + (−2) is not in the set R+; that is, R+ is not closed under +.

There are two important points to remember when defining a binary operation ∗ on a set S. They are:

1. exactly one element is assigned to each possible ordered pair of elements of S2. for each ordered pair of elements of S, the element assigned to it is again in S.

38

Definition A.41 (Commutative operation).A binary operation on a set S is commutative if a ∗ b = b ∗ a for all a, b ∈ S.

Definition A.42 (Associative operation).A binary operation on a set S is associative if (a ∗ b) ∗ c = a ∗ (b ∗ c) for all a, b, c ∈ S.

Theorem A.43 (Associativity of function composition).f (g h) = (f g) h whenever this composition is defined.

Proof. If this composition is defined, then for each x in the domain of h, we have

(f (g h))(x) = f((g h)(x)) = f(g(h(x)))

= (f g)(h(x)) = ((f g) h)(x)

ut

Remark A.44. This is important because, for some operations (e.g., tranlation and scaling) that we use inFourier analysis, order may matter. That is, the operations may not be commutative. However, the foregoingshows that composition of functions is always associative.

One motive for defining a binary operation on a set is the desire to solve simple linear equations involvingthat binary operation. For example, we might expect that the binary operation permits a solution x to theequation 2∗x = 3. A little experimentation will show that such equations are solvable when there are specialconditions on the set S and the operator ∗. A most basic set of conditions is given by the requirement that〈S, ∗〉 define a group.

Definition A.45 (Group).A group 〈G, ∗〉 is a set G, closed under a binary operation ∗, such that the following axioms are satisfied:

1. (associativity) The binary operation ∗ is associative.2. (identity) There is an element e in G such that e ∗ x = x ∗ e = x for all x ∈ G.3. (inverse) For each a in G, there is an element a′ in G with the property that a ∗ a′ = a′ ∗ a = e.

Theorem A.46 (Unicity of identity and inverse).The identity and inverses are unique in a group. To be precise, in a group 〈G, ∗〉, there is only one identitye such that

e ∗ x = x ∗ e = x

for all x ∈ G. Likewise, for each a ∈ G, there is only one element a′ such that

a ∗ a′ = a′ ∗ a = e

Definition A.47 (Abelian group).A group G is Abelian if its binary operation ∗ is commutative.

Example A.48. The axioms for a vector space V pertaining just to vector addition can be summarized byasserting that V with the operation of vector addition is an Abelian group.

Example A.49. The set Mm×n(R) of all real-valued m × n matrices with binary operation matrix additionis a group. The m× n matrix with all entries 0 is the identity matrix. This group is Abelian.

Example A.50. The set Mn(R) of all real-valued n× n matrices with operation matrix multiplication is nota group. The n× n matrix with all entries 0 has no inverse.

Thus far our work deals primarily with integers, so the following example has special relevance:

Example A.51. The set of non-negative integers, 0, 1, 2, . . ., with the usual addition operation + is not agroup. There is an identity element 0, but no inverse for, e.g., 2.

39

The previous example is not meant to disparage this particular set of integers or binary operation.Algebraic structures consisting of sets with binary operations for which not all of the group axioms hold canalso be useful. Of these weaker structures, the semigroup, a set with an associative binary operation, hasperhaps had the most attention in the math literature. Another example is the monoid – a semigroup thathas an identity element for the binary operation.

Definition A.52 (Subgroup).If a subset H of a group G is closed under the binary operation of G and if H itself is a group, then H isa subgroup of G. We shall let H ≤ G or G ≥ H denote that H is a subgroup of G, and H < G or G > Hshall mean H ≤ G but H 6= G.

The elements of Z4 = 0, 1, 2, 3 are the possibilities for the remainder when an integer is divided by4. With the operator + denoting addition modulo 4, 〈Z4,+〉 satisfies the three group properties given indefinition (A.45).5 The only nontrivial proper subgroup of Z4 is 0, 2 Note that 0, 3 is not a subgroup ofZ4, since 0, 3 is not closed under + (addition modulo 4). For example, 3 + 3 = 2 and 2 6∈ 0, 3. Similarly,the set H = 0, 1, 2 does not satisfy the definition of a subgroup. For 1 + 2 = 3 6∈ H.

If a is a member of the group 〈G, ∗〉 then it is not hard to show that all elements of the set an |n ∈ Zare also members of 〈G, ∗〉. The box below contains the (trivial) demonstration of this fact. A few otherpropositions about the set H ≡ an |n ∈ Z are

1. H is a subgroup of G.2. H is the smallest subgroup containing a.3. H is called the cyclic subgroup of G generated by a, denoted 〈a〉.

Also, suppose we are given a group G and an element a ∈ G, such that

G ≡ an |n ∈ Z

Then a is the generator of G and the group G = 〈a〉 is cyclic.Notice that, if a is a member of the group 〈G, ∗〉 then the closure property guarantees that a ∗ a is also

a member of G. Denote this element by a2 ≡ a ∗ a. Similarly, it must be the case that a2 ∗ a ≡ a3 ∈ G.Proceeding by induction, it is clear that

a ∈ G ⇒ an ∈ G, for n = 1, 2, . . . (A.14)

For any member a of the group 〈G, ∗〉 there exists an inverse a−1, also a member of G. By an argumentanalogous to that of the preceding paragraph, it must be the case that a−1 ∗ a−1 ∈ G, and, in general,

a ∈ G ⇒ a−n ∈ G, for n = 1, 2, . . . (A.15)

By convention, let a0 = 1. Then equations (A.14) and (A.15) prove that, for any member a of the group〈G, ∗〉 it must be the case that all elements of the set an |n ∈ Z are also members of 〈G, ∗〉

Below we will use the direct product of groups. The definition is made evident in the following theorem.

Theorem A.53 (Direct product of groups).Let G1, G2, . . . , Gn be groups and let · denote their respective binary operators. For (a1, a2, . . . , an) and(b1, b2, . . . , bn) in ∏

i

Gi ≡ G1 ×G2 × · · · ×Gn

define (a1, a2, . . . , an) · (b1, b2, . . . , bn) to be (a1 · b1, a2 · b2, . . . , an · bn). Then∏

i Gi is a group, the directproduct of the groups Gi, under this binary operation.

The following theorem is a fundamental result about direct products of integer groups.5 To simplify notation, when the definition of the operator + is clear from the context, let the group 〈G, +〉 be

denoted simply by G, e.g., we will speak of the group Z4, when technically we mean 〈Z4, +〉.

40

Theorem A.54 (Direct product cyclic group decomposition).The group Zm × Zn is isomorphic to Zmn if and only if m and n are relatively prime, that is, if and only ifthe gcd of m and n is 1.

As a result, when m and n are relatively prime, Zm × Zn is cyclic of order mn.

A.3.3 Orbits, cycles, and the alternating groups

Definition A.55 (Permutation).A permutation of a set A is a function φ : A → A that is both one to one and onto. In other words, apermutation of A is a one-to-one function from A onto A.

Function composition is a binary operation on the collection of all permutations of a set A. We callthis operation permutation multiplication. Thus, if σ and τ are both permutations of the set A, then thecomposite function τσ is also a permutation of A.

Theorem A.56 (Permutation group).Let A be a nonempty set, and let SA be the collection of all permutations of A. Then SA is a group underpermutation multiplication.

Definition A.57 (Symmetric group).Let A be the finite set 1, 2, . . . , n. The group of all permutations of A is the symmetric group on n letters,and is denoted by Sn.

Note that Sn has n! elements.Each permutation σ of a set A determines a natural partition of A into cells with the property that

a, b ∈ A are in the same cell if and only if b = σn(a) for some n ∈ Z. We establish this partition using anappropriate equivalence relation:

For a, b ∈ A, let a ∼ b if and only if b = σn(a) for some n ∈ Z (A.16)

Definition A.58 (Orbits of σ).Let σ be a permutation of a set A. The equivalence classes in A determined by the equivalence relation (A.16)are the orbits of σ.

Definition A.59 (Cycle).A permutation σ ∈ Sn is a cycle if it has at most one orbit containing more than one element. The length ofa cycle is the number of elements in its largest orbit.

Theorem A.60. Every permutation of a finite set is a product of disjoint cycles.

Definition A.61 (Transposition).A cycle of length 2 is a transposition.

Thus, a transposition leaves all but two elements fixed. A computation shows that

(a1, a2, . . . , an) = (a1, an)(a1, an−1) · · · (a1, a3)(a1, a2)

Therefore, any cycle is a product of transpositions. We then have the following as a corollary to Theorem A.60.

Corollary A.62. Any permutation of a finite set of at least two elements is a product of transpositions.

Corollary A.62 simply states that any rearrangement of n objects can be achieved by successively interchang-ing pairs of them.

Lemma A.63. Let σ ∈ Sn and let τ be a transposition in Sn. The number of orbits of σ and the number oforbits of τσ differ by 1.

41

Theorem A.64. No permutation can be expressed both as a product of an even number of transpositionsand as a product of an odd number of transpositions.

A permutation of a finite set is called even (resp. odd) if it can be expressed as a product of an even(resp. odd) number of transpositions.

Proposition A.65. If n ≥ 2, then the collection of all even permutations of 1, 2, . . . , n forms a subgroupof order n!/2 of the symmetric group Sn. The same is true for the subgroup of all odd permutations.

Definition A.66 (Alternating group).The subgroup of Sn consisting of the even permutations of n letters is the alternating group An on n letters.

A.3.4 Cosets and the theorem of Lagrange

Let H be a subgroup of a group G, which may be of finite or infinite order. We exhibit two partitions of Gby defining two equivalence relations, ∼L and ∼R on G.

Theorem A.67. Let H be a subgroup of G. Let the relation ∼L be defined on G by

a ∼L b ⇔ a−1b ∈ H

Let ∼R be defined bya ∼R b ⇔ ab−1 ∈ H

Then ∼L and ∼R are both equivalence relations on G.

Definition A.68 (Cosets).Let H be a subgroup of a group G. The subset aH = ah : h ∈ H of G is the left coset of H containing a,while Ha = ha : h ∈ H is the right coset of H containing a.

Proposition A.69. For an abelian subgroup H of G, the partition of G into left cosets of H and the partitioninto right cosets are the same.

Proposition A.70. Every coset (left or right) of a subgroup H has the same number of elements as H.

Theorem A.71 (Theorem of Lagrange).Let H be a subgroup of a finite group G. Then the order of H is a divisor of the order of G.

Proof. Let n be the order of G and let m be the order of H. By the preceding proposition, every coset of H has melements. Let r be the number of cells in the partition of G into cosets of H. Then n = rm, so m is indeed a divisorof n. ut

Corollary A.72. Every group of prime order is cyclic

Proof. Let G be of prime order p and let a be an element of G different from the identity. Then the cyclic subgroup〈a〉 of G generated by a has at least two elements, a and e. But by Theorem A.71, the order m ≥ 2 of 〈a〉 must dividethe prime number p. Thus, we must have m = p and 〈a〉 = G, so G is cyclic. ut

Theorem A.73. The order of an element of a finite group divides the order of the group.

Proof. Recalling that the order of an element is the same as the order of the cyclic group generated by that element,we see that the theorem follows directly from Theorem A.71. ut

Definition A.74 (Index of H in G).Let H be a subgroup of a group G. The number of left cosets of H in G is the index (G : H) of H in G.

The index may be finite or infinite. If G is finite, then (G : H) is finite and

(G : H) = |G|/|H|

since every coset of H contains |H| elements.

Theorem A.75 (Subgroup index theorem).Suppose H and K are subgroups of a group G such that K ≤ H ≤ G, and suppose (H : K) and (G : H) areboth finite. Then (G : K) is finite and (G : K) = (G : H)(H : K).

42

A.3.5 Structure-relating maps

Homomorphism

Let G and G′ be groups. We are interested in a map φ : G → G′ that relates the group structure of G to thegroup structure of G′. Such a map often gives us information about the structure of G′ from known structuralproperties of G, or information about the structure of G from known structural properties of G′. Now groupstructure is completely determined by the binary operation on the group. We define such a structure-relatingmap for groups, and then point out how the binary operations of G and G′ are related by such a map.

Definition A.76 (Homomorphism).A map φ of a group G into a group G′ is a homomorphism if

φ(ab) = φ(a)φ(b) (A.17)

for all a, b ∈ G.

In equation (A.17), the product ab on the left-hand side takes place in G while the product φ(a)φ(b) on theright takes place in G′. Thus φ gives a relation between the two binary operations, and hence between thetwo group structures.

Example A.77 (Reduction Modulo n). Let γ be the natural map of Z into Zn given by γ(m) = r where r isthe remainder when m is divided by n. Then γ is a homomorphism.

In more familiar notation, this example implies that, for all s, t ∈ Z,

(s + t)(modn) = s(modn) + t(modn)

Remember: the addition on the right hand side takes place in Zn, thus it is addition modulo n. Forinstance, if we were to implement this example in Matlab, where the syntax mod(s,n) yields s(modn), wewould observe the following:

>> mod((s+t),n) == mod(s,n) + mod(t,n)

ans: 0

>> mod((s+t),n) == mod(mod(s,n)+mod(t,n),n)

ans: 1

In the first line of code, we used the standard Z addition operator on the right hand side, which does not yieldthe desired equality. The second line uses the appropriate addition modulo n operator. We turn to structuralfeatures of G and G′ that are preserved by a homomorphism φ : G → G′. We first give a set-theoreticdefinition. Note the use of square brackets when we apply a function to a subset of its domain.

Definition A.78 (Image and inverse image).Let φ be a mapping of a set X into a set Y , and let A ⊆ X and B ⊆ Y . The image φ[A] of A in Y under φis φ(a) : a ∈ A. The set φ[X] is sometimes called the range of φ. The inverse image φ−1[B] of B in X isx ∈ X : φ(x) ∈ B.

Theorem A.79. Let φ be a homomorphism of a group G into a group G′.

1. If e is the identity in G, then φ(e) is the identity in G′.2. If a ∈ G, then φ(a−1) = φ(a)−1.3. If H is a subgroup of G, then φ[H] is a subgroup of G′.

43

4. If K ′ is a subgroup of G′, then φ−1[K ′] is a subgroup of G.

Loosely speaking, the theorem states that a homomorphism preserves the identity, inverses, and subgroups.

Definition A.80 (Fibre).Let φ : G → G′ be a group homomorphism. For each a′ ∈ G′ the inverse image φ−1[a′] is the fibre over a′

under φ.

When the argument of a set function is a singleton, e.g., a′, we will omit the curly braces in order tosimplify notation. Thus, φ−1[a′] will appear as φ−1[a′].

Definition A.81 (Kernel).Let φ : G → G′ be a group homomorphism. The subgroup φ−1[e′] = x ∈ G : φ(x) = e′ is the kernel of φ,denoted by ker(φ).

Theorem A.82. Let φ : G → G′ be a group homomorphism, and let H = ker(φ). For a ∈ G, the set

φ−1[φ(a)] = x ∈ G : φ(x) = φ(a)

is the left coset aH of H, and is also the right coset Ha of H. Consequently, the two partitions of G intoleft cosets and into right cosets are the same.

Corollary A.83. A group homomorphism φ : G → G′ is a one-to-one map if and only if ker(φ) = e.

Definition A.84 (Normal subgroup).A subgroup H of a group G is normal if its left and right cosets coincide, that is, if gH = Hg for all g ∈ G.

Note that all subgroups of abelian groups are normal. Also, Theorem A.82 shows that the kernel of ahomomorphism φ : G → G′ is a normal subgroup of G.

Isomorphism and Cayley’s theorem

Definition A.85 (Isomorphism).An isomorphisms φ : G → G′ is a homomorphism that is one-to-one and onto G′. The relation G ' G′

denotes the existence of an isomorphism from G onto G′, in which case we call G and G′ isomorphic.

Theorem A.86 (Equivalence of isomorphic groups).Let G be any collection of groups, and define G ' G′ for G and G′ in G if there exists an isomorphismφ : G → G′. Then ' is an equivalence relation.

Theorem A.87 (Cayley’s theorem).Every group is isomorphic to a group of permutations.

A.3.6 Factor groups

Theorem A.82 shows that for each a ∈ G, the fibre φ−1[φ(a)] = x ∈ G : φ(x) = φ(a) is the left coset aHof H and is also the right coset Ha of H. Since these left and right cosets coincide, we will simply refer tothem as the cosets of H.

Now φ[G] is a group by Theorem A.79. We associate with each φ(a) = a′ ∈ φ[G] the coset φ−1[a′] (thefibre over a′ under φ). By renaming a′ ∈ φ[G] by the name of the associated coset, φ−1[a′], we can considerthe cosets to form a group. This group of cosets will be isomorphic to the group φ[G] since its elements arejust the elements of φ[G] renamed.

In summary, the cosets of the kernel of a group homomorphism φ : G → G′ form a group isomorphic tothe subgroup φ[G] of G. The binary operation on the cosets can be computed in terms of the group operationof G′. This group of cosets is the factor group of G modulo H, and is denoted by G/H.

44

Theorem A.88. Let φ : G → G′ be a group homomorphism with kernel H. Then the cosets of H form agroup, G/H, whose binary operation defines the product (aH)(bH) of two cosets by choosing elements a andb from the cosets, and letting (aH)(bH) = (ab)H. Also, the map µ : G/H → φ[G] defined by µ(aH) = φ(a)is an isomorphism.

Example A.89. Example A.77 considered the map γ : Z → Zn where γ(m) is the remainder when m isdivided by n. We know that γ is a homomorphism and, of course, ker(γ) = nZ. By Theorem A.88, we seethat the factor group Z/nZ is isomorphic to Zn. The cosets of nZ are the residue classes modulo n describedin Definition A.37. For example, taking n = 12, we see the cosets of 12Z are

12Z = . . . ,−24,−12, 0, 12, 24, . . .1 + 12Z = . . . ,−23,−11, 1, 13, 25, . . .2 + 12Z = . . . ,−22,−10, 2, 14, 26, . . .

...11 + 12Z = . . . ,−13,−1, 11, 23, 35, . . .

Note that the isomorphism µ : Z/12Z → Z12 of Theorem A.88 assigns to each coset of 12Z its smallestnonnegative element. That is, µ(12Z) = 0, µ(1 + 12Z) = 1, etc.

The factor group Z/nZ in the preceding example is classic. Recall that we refer to the cosets of nZ as residueclasses modulo n. Two integers in the same coset are congruent modulo n. This terminology is carried overto other factor groups. A factor group G/H is called the factor group of G modulo H. Elements in the samecoset of H are called congruent modulo H. By abuse of notation, we may sometimes write Z/nZ = Zn andthink of Zn as the additive group of residue classes of Z modulo 〈n〉, or abusing notation further, modulo n.

Factor groups from normal subgroups.

So far, we have obtained factor groups only from homomorphisms. The following theorem again takes H tobe a subgroup of G, but it does not assume that H is the kernel of a homomorphism.

Theorem A.90 (Left coset multiplication).Let H be a subgroup of a group G. Then left coset multiplication is well defined by the equation

(aH)(bH) = (ab)H

if and only if left and right cosets coincide, so that aH = Ha for all a ∈ G.

Corollary A.91. Let H be a subgroup of a group G whose left and right cosets coincide. Then the cosetsof H form a group G/H under the binary operation (aH)(bH) = (ab)H.

Definition A.92 (Factor group).The group G/H in the preceding corollary is the factor group (or quotient group) of G modulo H.

In summary, if H is a normal subgroup of G, then the cosets of H form the factor group, G/H. Recallingthat the kernel of a homomorphism φ : G → G′ is a normal subgroup of G, we see again that G/ ker(φ) is agroup.

The following three conditions are equivalent characterizations for a normal subgroup H of a group G.

1. ghg−1 ∈ H for all g ∈ G and h ∈ H.2. gHg−1 = H for all g ∈ G.3. gH = Hg for all g ∈ G.

45

Condition 2 is often taken as the definition of a normal subgroup of H of a group G.The map ig : G → G defined by ig(x) = gxg−1 is a homomorphism of G into itself. Clearly

ig(a) = gag−1 = gbg−1 = ig(b)

if and only if a = b, so ig is one-to-one. Since g(g−1yg)g−1 = y, we see that ig is onto G, so it is anisomorphism of G with itself.

Definition A.93 (Automorphism).An isomorphism φ : G → G is an automorphism of G. The automorphism ig : G → G where ig(x) = gxg−1

is the inner automorphism of G by g.

The equivalence of conditions 2. and 3. states that gH = Hg for all g ∈ G if and only if ig[H] = H for allg ∈ G; that is, if and only if H is invariant under all inner automorphisms of G. It is important to realizethat ig[H] = H is an equation in sets; we need not have ig(h) = h for all h ∈ H. That is, ig may performa nontrivial permutation of the set H. Thus, we see that the normal subgroups of a group G are preciselythose that are invariant under all inner automorphisms.

Fundamental homomorphism theorem.

We have seen that every homomorphism φ : G → G′ gives rise to a natural factor group, namely G/Ker(φ).We now show that each factor group G/H gives rise to a natural homomorphism having H as kernel.

Theorem A.94 (Natural homomorphism theorem).Let H be a normal subgroup of G. Then γ : G → G/H given by γ(x) = xH is a homomorphism with kernelH.

We have seen in Theorem A.88 that if φ : G → G′ is a homomorphism with kernel H, then µ : G/H → φ[G]where µ(gH) = φ(g) is an isomorphism. Theorem A.94 shows that γ : G → G/H defined by γ(g) = gH is ahomomorphism. Thus, we see that the homomorphism φ can be factored, φ = µγ, where γ is a homomorphismand µ is a one-to-one homomorphism with image φ[G]. We state this as a theorem.

Theorem A.95 (Fundamental homomorphism thoerem).Let φ : G → G′ be a group homomorphism with kernel H. Then:

1. φ[G] is a group,2. the map µ : G/H → φ[G] given by µ(gH) = φ(g) is an isomorphism,3. the map γ : G → G/H given by γ(g) = gH is a homomorphism,4. for each g ∈ G we have φ(g) = µγ(g).

The isomorphism µ is sometimes called the natural or canonical isomorphism, and the same adjectives areused to describe the homomorphism γ.

To summarize the foregoing, every homomorphism with domain G and kernel H gives rise to a factorgroup G/H, and every factor group gives rise to a homomorphism γ : G → G/H.

Let N be a normal subgroup of G. In the factor group G/N , the subgroup N acts as the identityelement. We may regard N as being collapsed to a single element, either to 0 in additive notation or to e inmultiplicative notation. This collapsing of N together with the algebraic structure of G require that othersubsets of G, namely the cosets of N , also collapse into a single element in the factor group.

Theorem A.96. Let G = H × K be the direct product of groups H and K. Then H = (h, e) : h ∈ His a normal subgroup of G, and G/H ' K. Also, K = (e, k) : k ∈ K is a normal subgroup of G, andG/K ' H.

Example A.97. Let G = Z3 × Z4 and let H = 〈(1, 0)〉 = (0, 0), (1, 0), (2, 0) ' Z3. Then the factor groupG/H = (Z3 × Z4)/〈(1, 0)〉 is the collection of cosets (0, 0) + H, (0, 1) + H, (0, 2) + H, (0, 3) + H. Sincewe can compute in the factor group by choosing representative elements (0, 0), (0, 1), (0, 2), (0, 3) fromthe cosets, it is clear that the factor group is isomorphic to Z4. Thus,

46

(Z3 × Z4)/〈(1, 0)〉 ' Z4

as predicted by the preceding theorem. Similarly, letting K = 〈(0, 1)〉 = (0, 0), (0, 1), (0, 2), (0, 3) ' Z4, wesee that

G/K = (Z3 × Z4)/〈(0, 1)〉 ' Z3

A.3.7 Series of groups

Definition A.98 (Subnormal series).A subnormal (or subinvariant) series of a group G is a finite sequence Hkn

k=0 of subgroups of G such thatHk < Hk+1 and Hk is a normal subgroup of Hk+1 with H0 = e and Hn = G.

Definition A.99 (Normal series).A normal (or invariant) series of a group G is a finite sequence Hkn

k=0 of normal subgroups of G suchthat Hk < Hk+1 with H0 = e and Hn = G.

A.3.8 Rings and fields

Definition A.100 (Ring).A ring 〈R,+, ·〉 is a set R together with two binary operations + and · that satisfy the following threeconditions:

1. 〈R,+〉 is an Abelian group.2. The operator · is associative.3. For all a, b, c ∈ R the following left and right distributive laws hold:

a · (b + c) = a · b + a · c

(a + b) · c = a · c + b · c

We refer to 〈R,+〉 as the additive group of the ring.

Example A.101. Consider the cyclic group 〈Zn,+〉. For any a, b ∈ Zn, define multiplication, a · b, to be theremainder when the usual product, ab, is divided by n. Then 〈Zn,+, ·〉 is a ring. For instance, on Z10, wehave 3 · 7 = 1. Thus defined, the operator · is called “multiplication modulo n.”

Definition A.102 (Ring homomorphism).Let R and R′ be rings. A map φ : R → R′ is a homomorphism if the following hold for all a, b ∈ R:

1. φ(a + b) = φ(a) + φ(b)2. φ(a · b) = φ(a) · φ(b)

Definition A.103 (Ring isomorphism).An isomorphism φ : R → R′ from ring R to ring R′ is a homomorphism that is one-to-one and onto. Therings R and R′ are then isomorphic.

Definition A.104 (Commutative ring).A ring in which multiplication is commutative is a commutative ring.

Definition A.105 (Ring with unity).A ring with a multiplicative identity is a ring with unity. A multiplicative identity is called unity.

(In a ring with unity it is standard to let the symbol 1 denote unity.)

Example A.106. Denote the greatest common divisor of two integers r, s ∈ Z, by gcd(r, s). For integersr, s ∈ Z, where gcd(r, s) = 1, the cyclic rings Zrs and Zr × Zs are isomorphic.

47

Definition A.107 (Multiplicative inverse).A multiplicative inverse of an element a in a ring R is an element a−1 ∈ R such that a−1 · a = a · a−1 = 1

Finally, we arrive at a precise definition of a field.

Definition A.108 (Unit, division ring, field, skew field).Let R be a ring with unity. An element u ∈ R is a unit if it has a multiplicative inverse in R. If every non-zeroelement of R is a unit, then R is a division ring. A field is a commutative division ring. A non-commutativedivision ring is a skew field.

Example A.109. The ring Z12 is isomorphic to Z3 × Z4 (see Example A.106). The elements of Z12 are0, 1, 2, . . . , 11. Let us find the units. Of course 1 is a units. Since 72 is 49, in the modulo 12 arithmeticof the ring Z12, 7 · 7 = 1. Therefore, 7 is its own inverse and is a unit. The other units are 5 and 11. It isclear, however, that the remaining members of Z12 are not units. We conclude that Z12 is not a division ring(hence, not a field).

In general, the units of Zn are those m ∈ Zn satisfying gcd(m,n) = 1. An equivalent statement is that mis a unit of Zn if and only if m and n are relatively prime. This fact makes it easy to check whether Zn isa division ring. Since multiplication on this ring is always commutative, Zn is a field if and only if it is adivision ring.

Example A.110. The previous example showed that not all non-zero elements of Z12 are units and, thus,Z12 is not a field. The ring Z12 is isomorphic to the ring Z3 × Z4. The members of Z3 are 0, 1, 2. Clearly1 is a unit. Since 2 · 2 = 1, the element 2 is also a unit. Therefore, all non-zero members of Z3 are units,and it follows that Z3 is a field. On the other hand, Z4 is not a field since, for example, 2 does not have amultiplicative inverse in this ring.

To conclude the general discussion in the paragraph above example A.110, let + and · represent additionmodulo n and multiplication modulo n, respectively. That is, a · b is equal to the remainder when the usualproduct, ab, is divided by n. Let 0 be the additive identity, and −a the additive inverse of a, and 1 themultiplicative identity. To determine that Zn is not a field, it is sufficient to find one non-zero member thatdoes not have a multiplicative inverse (i.e., a member that is not a unit). This is equivalent to finding amember a of Zn such that a and n are not relatively prime.

Example A.111. Suppose n = 32 so that the elements of Zn are 0, 1, 2, . . . , 31. In this case it is easy to findmembers of this ring – e.g., 2, 4, 6, . . . – that are not relative primes of 32. That there exists a member withno multiplicative inverse proves Z32 is not a field.

48

References

1. Myoung An and Richard Tolimieri. Group Filters and Image Processing. Psypher Press, Boston, 2003.2. Paul A. Billings. Poisson noise model. personal notes, 2001.3. Paul A. Billings, Michael F. Reiley, and Bruce E. Stribling. Mitigating turbulence-induced image blur using

multiframe blind deconvolution. In Proc. AMOS Technical Conference, 2001.4. R. Durrett. Probability: Theory and Examples. Duxbury Press, second edition, 1996.5. John B. Fraleigh. A First Course in Abstract Algebra. Addison-Wesley Publishing Company, fifth edition, 1994.6. Joseph W. Goodman. Statistical Optics. John Wiley & Sons, Inc., 1985.7. R. Horn and C. Johnson. Matrix Analysis. Cambridge University Press, 1985.8. Richard Paxman, Brian Thelen, and John Seldin. Phase-diversity correction of turbulence-induced space-variant

blur. Optics Letters, 1994.9. Mark J. Schervish. Theory of Statistics. Springer Series in Statistics. Springer-Verlag, New York, NY, 1995.

10. Richard Tolimieri and Myoung An. Time-Frequency Representations. Birkhauser, Boston, 1998.11. S. R. S. Varadhan. Probability Theory. Courant Institute of Mathematical Sciences, New York, NY, August 2000.

group algebras and speckle imaging

Documents