INTRODUCTION TO QUANTUM COMPUTING AND QUANTUM INFORMATION

DRAFT COPY

Pavithran S Iyer, 3rd yr BSc Physics, Chennai Mathematical Institute
H-1 SIPCOT IT-Park, Siruseri, Padur Post, Chennai - 603103
Email: [email protected] & [email protected]

Typeset using LaTeX
LAST UPDATED: December 26, 2010



Contents

I INTRODUCTION

1 Brief overview
  1.1 Linear Algebra
  1.2 Basic Quantum Mechanics
  1.3 Qubit - basic unit of quantum information
  1.4 Multiple Qubits
  1.5 Quantum Gates
    1.5.1 Other single qubit gates
  1.6 Bloch Sphere
    1.6.1 Generalizing Quantum Gates - Universal Gates
  1.7 Important conventions
  1.8 Measurement Basis
    1.8.1 Quantum Circuits
    1.8.2 Quantum Copying or Cloning circuits
  1.9 Quantum Teleportation
    1.9.1 Bell States
    1.9.2 EPR paradox and Bell's inequality
    1.9.3 Application of Bell States: Quantum Teleportation
    1.9.4 Resolving some ambiguities
  1.10 Quantum Algorithms
    1.10.1 Simulating classical circuits using Quantum circuits
  1.11 Quantum Parallelism
    1.11.1 Example of Quantum Parallelism - Deutsch-Jozsa Algorithm

II PREREQUISITES - MATHEMATICS

2 Linear Algebra
  2.1 Vector Spaces
  2.2 Linear dependence and independence
  2.3 Dual Spaces
  2.4 Dirac's Bra-Ket notation
  2.5 Inner and outer products
  2.6 Orthonormal Basis and Completeness Relations
  2.7 Projection operator
  2.8 Gram-Schmidt Orthonormalization
  2.9 Linear Operators
  2.10 Hermitian Matrices
  2.11 Spectral Theorem
  2.12 Operator functions
    2.12.1 Trace
  2.13 Simultaneous Diagonalization Theorem
  2.14 Polar Decomposition
  2.15 Singular Value Decomposition
    2.15.1 Proving the theorem for the special case of square matrices
    2.15.2 Proving the theorem for the general case

3 Elementary Group Theory
  3.1 Structure of a Group
    3.1.1 Cayley Table
    3.1.2 Subgroups
    3.1.3 Quotient Groups
    3.1.4 Normalizers and centralizers
  3.2 Group Operations
    3.2.1 Direct product of groups
    3.2.2 Homomorphism
    3.2.3 Conjugation
  3.3 Group Actions
    3.3.1 Generating set of a group
    3.3.2 Symmetric group
    3.3.3 Action of a Group on a set
    3.3.4 Orbits and Stabilizers
    3.3.5 Orbit-Stabilizer theorem

III PREREQUISITES - QUANTUM MECHANICS

4 Identical Particles
  4.0.6 Describing a two state system
  4.0.7 Permutation operator
  4.0.8 Symmetry and Asymmetry in the wave functions
  4.0.9 Extending to many state systems
  4.0.10 Bosons and Fermions

5 Angular Momentum

IV PREREQUISITES - COMPUTATION

6 Introduction to Turing Machines
  6.0.11 Informal description
  6.0.12 Elements of a Turing machine
  6.0.13 Configurations and Acceptance
  6.0.14 Classes of languages
  6.1 Examples: Turing machines for some languages
    6.1.1 L(M) = {ωω | ω ∈ {a, b}*}
    6.1.2 {aᵖ | p is a prime}
  6.2 Variations of the Turing Machine
    6.2.1 Multi-Track Turing Machines
    6.2.2 Multi-Tape Turing Machines
    6.2.3 Multi-Dimensional Turing Machines
    6.2.4 Non-Deterministic Turing Machines
    6.2.5 Enumeration Machines
    6.2.6 Equivalence of Turing machines and Enumeration machines
  6.3 Universal Turing machines
    6.3.1 Encoding Turing machines over {0, 1}*
    6.3.2 Working of a Universal Turing machine
  6.4 Set operations on Turing machines
    6.4.1 Union of two Turing machines
    6.4.2 Intersection of two Turing machines
    6.4.3 Complement of a Turing machine
    6.4.4 Concatenation of two Turing machines
  6.5 Halting Problem
    6.5.1 Membership Problem
  6.6 Decidability and Undecidability
  6.7 Quantum Turing Machines

7 Computational Complexity

V INFORMATION THEORY

8 Fundamentals of Information Theory
  8.1 Introduction
  8.2 Axiomatic Definition of the Shannon Entropy
  8.3 Interpretations of the Uncertainty Function

VI CODING

9 Classical Coding Theory
  9.1 Introduction
    9.1.1 Definitions
    9.1.2 Notations from graphs
    9.1.3 Unique Decipherability
  9.2 Classifying instantaneous and uniquely decipherable codes
    9.2.1 Part 1: Kraft's Inequality
    9.2.2 Part 2: Macmillan's Inequality
    9.2.3 Part 3: Converse of Kraft's inequality
    9.2.4 Bound on codeword length - Shannon's Noiseless Coding Theorem [26]
  9.3 Error Correcting Codes
    9.3.1 Definitions
    9.3.2 Code parameters for a good error correcting code
    9.3.3 Bound on the code distance - The Distance Bound
    9.3.4 Bound on the number of codewords
    9.3.5 Parity Check codes [25]
    9.3.6 Linear Codes
  9.4 Examples
    9.4.1 Repetition Code [24]

10 Quantum Codes
  10.1 Introduction
  10.2 Errors in Quantum codes
    10.2.1 Operator sum representation [3][22]
    10.2.2 Lindblad form using the Master Equation [24]
    10.2.3 Error Correction Condition for Quantum Codes [24]
  10.3 Distance Bound [1]
  10.4 The Quantum Singleton Bound [1]
  10.5 Quantum Hamming Bound
  10.6 The Quantum Gilbert-Varshamov Bound [2]
  10.7 Examples
    10.7.1 Bit Flip code [24]
    10.7.2 Phase flip Errors [24]
    10.7.3 Bit-Phase flip errors - Shor Code [24][23]

11 Stabilizer Codes
  11.1 Pauli Group
  11.2 Motivation for Stabilizer codes
  11.3 Conditions on stabilizer subgroups
    11.3.1 Generating set of the stabilizer subgroup
    11.3.2 Structure of the stabilizer subgroup
  11.4 Error Correction for Stabilizer codes
    11.4.1 Notion of an Error in a Stabilizer code
    11.4.2 Measurement on the stabilizer code
    11.4.3 Error Correction condition for Stabilizer codes
  11.5 Fault tolerance
    11.5.1 Unitary gates in the stabilizer formalism

VII References

A Solutions to Exercises


Part I

INTRODUCTION



Chapter 1

Brief overview

1.1 Linear Algebra

1. Tensor Product: The tensor product (represented as ⊗) is a method of combining two tensors (matrices), given by the following mnemonic block form:

[a11 a12 a13 a14 ...]       [a11M a12M a13M a14M ...]
[a21 a22 a23 a24 ...]       [a21M a22M a23M a24M ...]
[a31 a32 a33 a34 ...] ⊗ M = [a31M a32M a33M a34M ...]
[a41 a42 a43 a44 ...]       [a41M a42M a43M a44M ...]
[ .   .   .   .     ]       [  .    .    .    .     ]

where M is any matrix. If the matrices on the LHS have dimensions (D1r × D1c) and (D2r × D2c) respectively, then the resulting matrix on the RHS has dimensions (D1r · D2r) × (D1c · D2c).

It is really important to note that the above is NOT a formal definition of the tensor product; it is just a mnemonic form. The result in this form seems to have the same dimensions as the first matrix on the LHS. Only when we expand each block aijM on the RHS into a full copy of M do we obtain the matrix with the dimensions given above.
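The block form above can be sketched in pure Python. This is a hypothetical helper, not part of the notes; it makes the dimension rule (D1r·D2r) × (D1c·D2c) explicit:

```python
# A minimal sketch of the tensor (Kronecker) product of two matrices,
# each stored as a list of row lists.
def kron(A, B):
    """Return A ⊗ B: every entry a_ij of A is expanded into the block a_ij * B."""
    rows_a, cols_a = len(A), len(A[0])
    rows_b, cols_b = len(B), len(B[0])
    # The result has dimensions (rows_a * rows_b) x (cols_a * cols_b).
    result = [[0] * (cols_a * cols_b) for _ in range(rows_a * rows_b)]
    for i in range(rows_a):
        for j in range(cols_a):
            for k in range(rows_b):
                for l in range(cols_b):
                    result[i * rows_b + k][j * cols_b + l] = A[i][j] * B[k][l]
    return result

I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
print(kron(I, X))  # 4x4: X in the top-left and bottom-right blocks
```

Note that a (2 × 2) ⊗ (2 × 2) product yields a 4 × 4 matrix, not a 4 + 4 sized one.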

1.2 Basic Quantum mechanics

We shall look at the fundamentals of quantum mechanics in greater detail in a later chapter. Some important definitions are:

1. State of a system:1 The state of a quantum system is a vector in a complex vector space (in general infinite dimensional) known as the Hilbert space. The state of a classical system is represented by its position and momentum in a phase space. But this is not possible in quantum mechanics because of the built-in concept of Heisenberg's uncertainty principle. If we close in on one definite value of the position, the possible values that the momentum can take are infinite. So we can only close in on a given region, and say, with some probability, that the particle lies within that region. So, the (classical) state of the particle lies anywhere

1A deeper picture of this is given in the quantum mechanics section. For now, this description would do.


in that continuous region that we define. So, the state of the particle has infinitely many position and momentum coordinates. Hence, it is represented as a vector in an infinite dimensional complex vector space (Hilbert space). For an N particle system of independent particles, where the individual particle wave functions are |ψ1⟩, |ψ2⟩, |ψ3⟩ ... |ψN⟩, the combined state of the N particle system is given by:

|ψ⟩ = |ψ1⟩ ⊗ |ψ2⟩ ⊗ |ψ3⟩ ⊗ ... ⊗ |ψN⟩

In general, if |ψ1⟩, |ψ2⟩, |ψ3⟩, ... are vectors in N1, N2, N3, ... dimensional complex vector spaces respectively, then |ψ⟩ is a vector in an (N1 × N2 × N3 × ...) dimensional complex vector space.

2. Dirac Bra-Ket Notation: The Dirac notation is well known and frequently used here. In this notation, every column vector ψ is represented as |ψ⟩, which is called a Ket vector. Similarly, a row vector is represented as ⟨ψ|, which is called a Bra vector. Hence the name: Bra-Ket notation.

3. Local and non-local processes: By a "local" process between two particles, it is meant that influences between the particles must travel in such a way that they pass through space continuously; i.e. the simultaneous disappearance of some quantity in one place cannot be balanced by its appearance somewhere else unless that quantity travelled, in some sense, across the space in between. In particular, this influence cannot travel faster than light, in order to preserve relativity theory.

4. Canonical Commutation Relations: The some observables in Quantum mechanics do not commute.That is, they have a non zero commutator. The commutator for a pair of operators is defined as:

[A, B] = AB −BA ——— CommutatorThe basic commutator relations in quantum mechanics are:

[xi, pj ] = i[xi, xj ] = 0[pi, pj ] = 0

1.3 Qubit - basic unit of quantum information

A 'bit', a classical two state system, represents the smallest unit of information. A classical bit is represented by 1 or 0. It can be thought of as 'true' and 'false', or any two complementary quantities whose union is the universe and whose intersection is the null set. There is a profound reason why the smallest unit of information is a 1 or 0, or 'true' or 'false': any logical query can be split into a series of 'yes' or 'no' questions. That is, with a series of 'yes' or 'no' answers, we can settle any logical query. This is why a 'bit' (which can be thought of as the most general structure for storing information) is a 1 or a 0.
A quantum bit is just an example of a two state quantum system. A qubit can also be an electron (with spin up and down), an ammonia molecule, etc. In quantum mechanics, a two state quantum system does not mean that the system has only two states. This is what distinguishes a qubit from a classical bit. The difference comes from a very important quantum mechanical phenomenon known as interference. Just as a classical bit can take the value 0 or 1, a quantum bit can take the values |0⟩ or |1⟩, and any value produced by a superposition of the states |0⟩ and |1⟩ (like α|0⟩ + β|1⟩). Since there are infinitely many such superpositions (where the states |0⟩ and |1⟩ enter with probability amplitudes α and β respectively), a qubit can exist in infinitely many states. If each state could store a unit of information, then the qubit could hold infinitely many units of information.

A qubit, like any other two level quantum system, is (conventionally) represented by its state:

|ψ⟩ = α|↑⟩ + β|↓⟩
|ψ⟩ = α|+⟩ + β|−⟩
|ψ⟩ = α|0⟩ + β|1⟩

where |α|² + |β|² = 1.


This can be misleading, since it gives the impression that a classical bit can carry at most two distinct values whereas a quantum bit can carry infinitely many. But if a measurement is done on the state of the qubit, it collapses into one of the eigenstates of the measurement. So, if measurement of |ψ⟩ gives 'a', then after this measurement the state of the qubit remains |a⟩ (the eigenstate of the measurement corresponding to the eigenvalue a). This new state |a⟩ will now always give the same measurement result a; it will not respond to any other outcome.
Why this happens → Postulates of Quantum Mechanics.
Hence, only a single unit of information can be retrieved from a qubit.
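The collapse rule above can be illustrated with a small sketch (a hypothetical helper, not from the notes): the outcome is 0 with probability |α|² and 1 with probability |β|², after which the state is the observed basis state.

```python
import random

# Measuring a qubit α|0⟩ + β|1⟩ in the computational basis.
# The state is assumed normalized: |α|² + |β|² = 1.
def measure(alpha, beta, rng=random.random):
    p0 = abs(alpha) ** 2          # probability of outcome 0
    if rng() < p0:
        return 0, (1, 0)          # collapsed state |0⟩
    return 1, (0, 1)              # collapsed state |1⟩

# Measuring the SAME (collapsed) state again always repeats the result,
# which is why only one classical bit can be retrieved from a qubit.
outcome, state = measure(1 / 2 ** 0.5, 1 / 2 ** 0.5)
print(outcome, state)
```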

1.4 Multiple Qubits

Any two state system (like the electron, which has a spin) can be represented by a qubit. But what about representing the state of two electrons (which are independent of each other)? Such a representation, as we saw at the beginning of the chapter, is possible. If the state of one electron is |ψ1⟩ and the state of the other electron is |ψ2⟩, then the system of two independent electrons can be collectively represented by the state |ψ⟩, where:

|ψ⟩ = |ψ1⟩ ⊗ |ψ2⟩

Since we take a tensor product of the two states, we may represent the new state, which is the two qubit state, as (using the convention |i⟩ ⊗ |j⟩ ≡ |ij⟩):

|ψ⟩ = α00|00⟩ + α01|01⟩ + α10|10⟩ + α11|11⟩
where: |α00|² + |α01|² + |α10|² + |α11|² = 1

and |αij|² is the probability of the first qubit being in state |i⟩ and the second qubit being in state |j⟩. If we want the probability for only one of them, we must sum over the other: the probability of the first qubit being in state |i⟩ is |αij|² summed over all j. So, the probability of measuring the first qubit to be 0 is |α00|² + |α01|², and the measurement will collapse the state of the system into:

|ψ⟩ = (α00|00⟩ + α01|01⟩) / √(|α00|² + |α01|²)
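The marginal probability and post-measurement state can be sketched as follows (a hypothetical helper illustrating the formulas above):

```python
import math

# Two-qubit state a00|00⟩ + a01|01⟩ + a10|10⟩ + a11|11⟩, amplitudes as a tuple.
# Compute P(first qubit = 0) and the renormalized collapsed state.
def measure_first_qubit_as_0(amps):
    a00, a01, a10, a11 = amps
    p0 = abs(a00) ** 2 + abs(a01) ** 2     # sum |a0j|² over j
    norm = math.sqrt(p0)
    # Only the |00⟩ and |01⟩ components survive, renormalized.
    collapsed = (a00 / norm, a01 / norm, 0.0, 0.0)
    return p0, collapsed

p0, state = measure_first_qubit_as_0((0.5, 0.5, 0.5, 0.5))
print(p0)  # 0.5
```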

Therefore, we can say that from this 2 qubit system we can retrieve 2 units of information. It can certainly deliver more information than a single qubit system, but there are some difficulties too. In general, we need to carry out the measurement process twice to determine the information stored in both qubits. So, where earlier we carried out the measurement only once, we now need to do it twice. Can we do better? Can we get away with one measurement? In other words, can we store some amount of information about one qubit in another, such that we can infer both qubits by measuring only one of them? The answer is yes. We can prepare a state that lets us, with certainty, retrieve the information stored in one qubit by measuring the other. Such a two qubit state is called a Bell state or an EPR pair.

The Bell state is given by: |ψ⟩ = (|00⟩ + |11⟩)/√2

Here, the first qubit is measured to be 0 with probability 1/2 (changing the state to |ψ⟩ = |00⟩) and 1 with probability 1/2 (changing the state to |ψ⟩ = |11⟩).

Hence, P(measuring the first qubit to be 0) = P(measuring the second qubit to be 0), and the state after measuring the first qubit to be 0 equals the state after measuring the second qubit to be 0. Similarly, P(measuring the first qubit to be 1) = P(measuring the second qubit to be 1), and the state after measuring the first qubit to be 1 equals the state after measuring the second qubit to be 1. Therefore the measurement of the first qubit always gives the same result as the measurement of the second qubit. Hence, we can say: by knowing the result of a measurement on the first qubit, we can tell with certainty


the result of a measurement on the second qubit.
We can also say "the two measurements are perfectly correlated". In the language of quantum mechanics, the two qubits are "entangled".

It is also easy to see that this property is not satisfied by an arbitrary state. Hence, we need to classify these states separately.2 Their applications and significance become more prominent as we proceed to later sections.
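The perfect correlation can be checked with a tiny sampling sketch (an illustrative toy, not from the notes): for the Bell state (|00⟩ + |11⟩)/√2, each shot yields 00 or 11 with probability 1/2 each, so the two qubits always agree.

```python
import random

# Sample a computational-basis measurement of both qubits of the Bell state.
# The only outcomes with non-zero amplitude are 00 and 11, each with p = 1/2.
def measure_bell(rng=random.random):
    return (0, 0) if rng() < 0.5 else (1, 1)

shots = [measure_bell() for _ in range(1000)]
assert all(a == b for a, b in shots)   # the two results always agree
```

For a product state such as |+⟩ ⊗ |+⟩, by contrast, the two results would be independent.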

1.5 Quantum Gates

Just as we have classical gates that operate on classical bit(s), we also have their quantum analogues. To start with, consider a simple classical gate, the NOT gate. We can now think of its quantum mechanical analogue: consider the gate G that is a "quantum mechanical NOT gate". G "flips" the state of a qubit:

G : (α|0⟩ + β|1⟩) → (α|1⟩ + β|0⟩)

We can now try to see how this process of "flipping" the qubit is carried out. We know that, in the case of two level (or spin half) systems, the state |0⟩ can be changed to |1⟩ and vice-versa using the ladder operators S+ and S−. The action of these operators is given by:

S+ = Sx + iSy
S− = Sx − iSy

S+|0⟩ = |1⟩ —— Raising operator
S+|1⟩ = 0 —— Raising operator

Similarly:
S−|1⟩ = |0⟩ —— Lowering operator
S−|0⟩ = 0 —— Lowering operator

Therefore we can say: G ≡ (S+ + S−)

(S+ + S−)(α|0⟩ + β|1⟩) = α(S+|0⟩ + S−|0⟩) + β(S+|1⟩ + S−|1⟩)
⇒ α(|1⟩ + 0) + β(0 + |0⟩)
⇒ α|1⟩ + β|0⟩

Hence, G(α|0⟩ + β|1⟩) = α|1⟩ + β|0⟩

A very important aspect to take note of is that the state |0⟩ is very different from 0. The latter is the null vector; it represents void. The former represents an actual state that a particle can be in. The state |0⟩ does not mean void.

2I still cannot see what is so special; it almost seems like they are two identical quantum states. It is like taking two classical bits 0 and 0 and saying that the result of measuring one 0 equals the result of measuring the other 0. What property of QM is being used?


Since the state of the qubit is a ket vector, we can also give it a column vector representation:

|ψ⟩ = [α]
      [β]

Now the gate G can be defined as:

G [α] = [β]
  [β]   [α]

By looking at this property, we can "guess" the matrix form of G to be:

G = [0 1]
    [1 0]

We can verify that this is the σx (Pauli X) operator, i.e. S+ + S− = 2Sx, as computed above.

1.5.1 Other single qubit gates:

There are many single qubit gates. A major requirement for a quantum gate is that its operator must be unitary. This has 2 consequences:

1. Probability is conserved.

2. Since the inverse of a unitary matrix is also a unitary matrix, each single qubit quantum gate can be "undone" by some other single qubit quantum gate. So, the input can be obtained by performing some operation on the output. Therefore there is no "loss of information", unlike the classical case where the gates are not invertible.

Let us consider a Z gate that leaves |0⟩ unchanged and flips the state |1⟩ to −|1⟩:

Z (α|0⟩ + β|1⟩) = α|0⟩ − β|1⟩

We can also guess that:

Z = [1  0]
    [0 −1]

This is the same as the σz or Sz operator (Pauli spin matrix).

Let us now consider yet another important single qubit gate, the Hadamard gate. This gate transforms the state |0⟩ to a state "between" |0⟩ and |1⟩. It can be viewed as a reflection of the qubit vector about the line θ = π/8. It changes |0⟩ to (|0⟩ + |1⟩)/√2 and changes |1⟩ to (|0⟩ − |1⟩)/√2:

H(α|0⟩ + β|1⟩) = α (|0⟩ + |1⟩)/√2 + β (|0⟩ − |1⟩)/√2

The matrix form of H can be guessed as:

H = (1/√2) [1  1]
           [1 −1]

1.6 Bloch Sphere

The state of any two state system is represented by a point in a two dimensional complex vector space. Since this is a two dimensional complex vector space, each dimension (like the x, y, and z dimensions in the Cartesian frame) is complex. Each complex number needs two real numbers to represent it. So, the state


can now be described by four real quantities, rather that two complex quantities. Hence, we have:

The state in the two dimensional complex vector space is: |ψ = α|0+ β|1Since, α and β are complex, each of them can be described by two

real quantities: α = (rα, iα) and β = (rβ , iβ)

The normalization condition: |α|2 + |β|2 = 1 , now translates to the condition on the

four real quantities as: |rα|2 + |iα|2 + |rβ |2 + |iβ |2 = 1. (1.1)

Just like how the equation x2 + y2 + z2 = 1 represents the surface of a sphere, placed in a three dimensionalspace, the above equation (1.1) represents the surface of the sphere kept in a four dimensional space. Now,this is the motivation for us to try to describe the state of a two state system as a point on the sphere. Afour dimensional space is still a bizarre object for us to imagine. So, we need to try and remove one degreeof freedom here so that we get our usual two dimensional sphere in a three dimensional space. The twodimensional sphere hence obtained is known as the Bloch Sphere. For this purpose, we need to work outthe above process in the polar form. So, we shall have:

The complex numbers: α = rαeiφα and β = rβeiφβ

Therefore, the state of the system, |ψ = rαeiφα + rβeiφβ

So, till this point, we have been working with a sphere kept in a four dimensional space, as there are fourreal quantities in the equation of the state. Now, we need to remove one degree of freedom, that is eliminateone real quantity from the equation of the state.For doing so, we need to recollect a very important feature of quantum mechanics that any quantum mechan-ical system is invariant under rotation by a overall phase. The following sentence can be realized if we goback to our basic state of the quantum mechanical system. The state is represented by |ψ which is actuallya probability amplitude. But what we can measure is the probability density, denoted by |ψ|2. So, even ifthere is an overall phase factor, like γ, in the probability amplitude, it will not affect our measurement, andfor all values of γ, the system will be identical3. So, let us multiply the system by any an overall phase angle.The choice can be decided by us, to suit our requirements.

Since the choice of γ is arbitrary, let e^{iγ} = e^{-iφ_α}, so that we can eliminate one real variable.

Therefore, the state of the system becomes: e^{-iφ_α}|ψ⟩ = r_α e^{iφ_α} e^{-iφ_α}|0⟩ + r_β e^{iφ_β} e^{-iφ_α}|1⟩. The LHS may still be written as |ψ⟩, since the system is invariant under an overall phase change.

Let φ = φ_β − φ_α. Then, on simplification, we have: |ψ⟩ = r_α|0⟩ + r_β e^{iφ}|1⟩

Now, in the above expression, the second term on the RHS has a complex coefficient. We can write it in Cartesian form (x + iy):

⇒ |ψ⟩ = r_α|0⟩ + (x + iy)|1⟩, where x = r_β cos φ and y = r_β sin φ

Now, let us apply the normalization condition: |r_α|² + |x|² + |y|² = 1. This is the equation of the Bloch sphere. We now need to find how the state of a system can be represented on the Bloch sphere. For this, let us go to spherical coordinates, mapping x, y and r_α onto a sphere of unit radius.

Let us now make the transformations:

x = cos φ sin θ
y = sin φ sin θ
r_α = cos θ

Therefore, the state now becomes: |ψ⟩ = cos θ|0⟩ + sin θ(cos φ + i sin φ)|1⟩

We can write this as: |ψ⟩ = cos θ|0⟩ + e^{iφ} sin θ|1⟩

³In fact, this probability amplitude is the reason for interference.


Now, by convention, in spherical polar coordinates θ goes from 0 to π. But here, if we put θ = 0 we get |ψ⟩ = |0⟩, and on putting θ = π/2 we get |ψ⟩ = e^{iφ}|1⟩. So we see that θ = 0 to θ = π/2 already covers the entire sphere. So we modify our θ by changing it to θ/2; now θ = 0 to θ = π covers the entire sphere. The final equation of the state of the system on the Bloch sphere is:

|ψ⟩ = cos(θ/2)|0⟩ + e^{iφ} sin(θ/2)|1⟩    (1.2)
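The correspondence between the pair (α, β) and the Bloch angles (θ, φ) of equation (1.2) can be checked numerically. The sketch below is our own illustration (the function names `bloch_angles` and `state_from_angles` are not from the text); it removes the overall phase exactly as the derivation above does.

```python
import numpy as np

def bloch_angles(alpha, beta):
    """Convert a normalized state alpha|0> + beta|1> to the Bloch
    angles (theta, phi) of equation (1.2)."""
    # Remove the overall phase so that the |0> coefficient is real.
    g = np.angle(alpha)
    alpha, beta = alpha * np.exp(-1j * g), beta * np.exp(-1j * g)
    theta = 2 * np.arccos(np.clip(alpha.real, -1, 1))
    phi = np.angle(beta) if abs(beta) > 1e-12 else 0.0
    return theta, phi

def state_from_angles(theta, phi):
    """Equation (1.2): |psi> = cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

# Round trip: a state, its Bloch angles, and the state rebuilt from them
psi = np.array([1, 1j]) / np.sqrt(2)          # expect theta = pi/2, phi = pi/2
theta, phi = bloch_angles(psi[0], psi[1])
psi_back = state_from_angles(theta, phi)
```

Every normalized qubit state (up to overall phase) corresponds to exactly one point (θ, φ) on the sphere, which is the content of the derivation above.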

So, the above equation represents a single qubit on the Bloch Sphere. Now, what about multiple two-state systems, or multiple qubits? Let us take a two-qubit system represented by the state |ψ⟩ = |ψ1⟩ ⊗ |ψ2⟩. The state |ψ⟩ is now a vector in a four-dimensional complex vector space, because each of the component states is a vector in a two-dimensional Hilbert space. Following an argument exactly similar to the one above, we can see that the state |ψ⟩ can be represented as a point on a seven-dimensional sphere, kept in an eight-dimensional space. This seems confusing, because in the above case we claimed that a two-state system (represented as a vector in H₂) can be represented on a two-dimensional surface. Extending the same argument, one should say that a composition of two two-state systems (represented as a vector in H₂ × H₂) should be representable on a four-dimensional surface (not necessarily a sphere, but still some four-dimensional surface). Hence, we see that the surface used to represent |ψ⟩ has more dimensions than expected. A seven-dimensional surface has more points than a four-dimensional one, so it can represent more states. Therefore, there is some information that cannot be represented on the four-dimensional surface: when we take a tensor product of two two-state systems, the new system formed contains information about each of the individual systems, as well as some excess information that cannot be attributed to any single one of them. We arrived at the four-dimensional surface from the assumption that all compositions of two two-state systems can be represented as a tensor product of two two-state systems. From the fact that the sphere representing all the compositions has more states, we conclude that this assumption fails. So there are states that are composed of two two-state systems, but cannot be expressed as a tensor product. These states have information (properties) that are not related to either one of the two-state systems. This information is lost when the qubits are separated; hence, it is due to the tie, or (in more sophisticated words) the entanglement, of the two two-state systems. It is quite clear that not all states have this property. States that have this property are called entangled states. An example is the Bell state:

|B00⟩ = (|00⟩ + |11⟩)/√2
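Whether a two-qubit state factors as a tensor product can be tested directly from its amplitudes. The criterion used below (a product state satisfies c00·c11 = c01·c10) is a standard determinant test, not stated in the text; the code is our own illustration.

```python
import numpy as np

def is_product_state(c, tol=1e-12):
    """For a two-qubit state with amplitudes c = [c00, c01, c10, c11],
    |psi> = |a> (x) |b> can hold iff c00*c11 == c01*c10."""
    c00, c01, c10, c11 = c
    return abs(c00 * c11 - c01 * c10) < tol

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)           # |B00>: entangled
ket0 = np.array([1, 0])
ket_plus = np.array([1, 1]) / np.sqrt(2)
product = np.kron(ket_plus, ket0)                    # |+> (x) |0>: separable
```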

1.6.1 Generalizing Quantum Gates - Universal Gates

Now, since gates are unitary operators that act on the state of a qubit, we can think of them as rotation operators on the Bloch Sphere: each gate performs some rotation. But any rotation, in general, can be broken into a sequence of standard rotations in the x-y, y-z and x-z planes. Since any rotation can be broken into a sequence of standard rotations, we can draw the same analogy and say that any quantum gate can be represented by a sequence of standard gates that act on the state of the qubit |ψ⟩ and produce the same answer as the original gate. Consider a rotation operator U. We can decompose U into several basic rotation operators:

So, writing each 2×2 matrix row by row:

U = e^{iα} · [[e^{iβ/2}, 0], [0, e^{-iβ/2}]] · [[cos(γ/2), sin(γ/2)], [−sin(γ/2), cos(γ/2)]] · [[e^{iδ/2}, 0], [0, e^{-iδ/2}]]
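The decomposition can be checked numerically: for any choice of the four angles, the product of the three matrices above (times the overall phase) is unitary. The converse, that every single-qubit unitary arises from some choice of angles, is what the text asserts but is not checked here. The sign conventions below follow the matrices exactly as quoted above; `decompose` is our own name.

```python
import numpy as np

def decompose(alpha, beta, gamma, delta):
    """Build U = e^{i alpha} * diag-rotation * plane-rotation * diag-rotation,
    using the three matrices quoted in the text."""
    rz_b = np.diag([np.exp(1j * beta / 2), np.exp(-1j * beta / 2)])
    ry_g = np.array([[np.cos(gamma / 2),  np.sin(gamma / 2)],
                     [-np.sin(gamma / 2), np.cos(gamma / 2)]])
    rz_d = np.diag([np.exp(1j * delta / 2), np.exp(-1j * delta / 2)])
    return np.exp(1j * alpha) * rz_b @ ry_g @ rz_d

# Any random set of angles yields a valid single-qubit gate
rng = np.random.default_rng(0)
U = decompose(*rng.uniform(0, 2 * np.pi, size=4))
```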

In other words, these are "universal" operators. Similarly, we can consider universal quantum gates that,


when manipulated appropriately, can mimic any other quantum gate. Let us first consider a simple classical universal gate, the NAND gate. The XOR gate, by contrast, is not a universal gate, because it cannot change the parity of the bits⁴:

Figure 1.1: A classical universal gate: the NAND gate.

Let us now consider a universal quantum gate: a slight modification of the NOT gate, the controlled-NOT or CNOT gate.

Figure 1.2: A CNOT Gate

A CNOT gate is a "two-input NOT gate". It takes two inputs, DATA and CONTROL. It flips DATA if CONTROL = 1, and leaves DATA unchanged if CONTROL = 0. Classically, this is analogous to the XOR gate.

The gate operation is represented as: CNOT: |A, B⟩ → |A, B ⊕ A⟩ (here A is the control and B the data). The action of this gate can be described explicitly, in the |DATA, CONTROL⟩ ordering:

|00⟩ → |00⟩
|01⟩ → |11⟩
|10⟩ → |10⟩
|11⟩ → |01⟩

4This is not clear to me


In a Quantum circuit, the CNOT gate is represented as:

Figure 1.3: Circuit representation of a CNOT gate. The top wire carries the control bit and the bottom wire the target (data) bit.

We can now look at the matrix representation of this gate, written in the basis {|00⟩, |01⟩, |10⟩, |11⟩} with the control qubit first:

CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]
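The matrix above can be checked against the truth table directly: with the control qubit first, only the basis states with control = 1 have their data bit flipped. A small sketch:

```python
import numpy as np

# CNOT in the |control, data> ordering, basis {|00>, |01>, |10>, |11>}
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

def basis(i):
    """Computational basis column vector |i> for two qubits."""
    v = np.zeros(4)
    v[i] = 1
    return v

# Apply CNOT to each basis state and record where it lands
mapping = {i: int(np.argmax(CNOT @ basis(i))) for i in range(4)}
```

The mapping fixes |00⟩ and |01⟩, and swaps |10⟩ ↔ |11⟩, exactly as a controlled NOT should.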

1.7 Important conventions

α|0⟩ + β|1⟩ —————– X (NOT gate) ————→ α|1⟩ + β|0⟩

α|0⟩ + β|1⟩ —————– Z (Z gate) —————→ α|0⟩ − β|1⟩

α|0⟩ + β|1⟩ —————– H (Hadamard gate) —→ α(|0⟩ + |1⟩)/√2 + β(|0⟩ − |1⟩)/√2

A most remarkable feature of a quantum gate is that it operates on single qubits, which are like superpositions of two probabilistic classical bits. The unitary nature of these operators (the existence of an inverse) enables the input qubit to be easily retrieved.
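The three conventions above can be reproduced with the standard matrix forms of the gates; a minimal sketch, applying each to an example state with real amplitudes α = 0.6, β = 0.8:

```python
import numpy as np

# Single-qubit gates of Section 1.7
X = np.array([[0, 1], [1, 0]])                 # NOT gate
Z = np.array([[1, 0], [0, -1]])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard

alpha, beta = 0.6, 0.8
psi = np.array([alpha, beta])                  # alpha|0> + beta|1>

out_X = X @ psi    # alpha|1> + beta|0>
out_Z = Z @ psi    # alpha|0> - beta|1>
out_H = H @ psi    # alpha(|0>+|1>)/sqrt2 + beta(|0>-|1>)/sqrt2
```

Each gate is its own inverse here (X² = Z² = H² = I), which illustrates the retrievability remark above.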

1.8 Measurement Basis

Measurement is an operation performed on a system to determine, with certainty, the state of the system. Measurement also has a meaning for classical bits. A probabilistic classical bit may, say, have an even chance of being 0 or 1; when a measurement is done, we know with certainty what value the bit has. The classical bit is 0 with probability one half or 1 with probability one half, and both choices forever exclude each other. In the quantum case, the difference appears here: we have superposition states. For example, a qubit can be |0⟩ with probability one half and |1⟩ with probability one half. Note the usage of and and or.

Measuring some property (operator) of a state |ψ⟩ is nothing but finding the eigenvalues of the operator corresponding to that property. The allowed values of any property (the possible results of its measurement) are limited to the eigenvalues of the operator representing it. But it is not so straightforward: what if we are trying to measure the property corresponding to an operator A, and the given state |ψ⟩ is not an eigenstate of A?


For example, till now we took |ψ⟩ = α|0⟩ + β|1⟩. This means we took the ket vectors |0⟩ and |1⟩ as our basis states. These vectors are nothing but the eigenvectors of the σz (or Sz) operator, so we were in the Sz eigenbasis; that is, the operator Sz is diagonal in this basis. Now, if we measure the property corresponding to the σz operator, the measurement can have only two allowed results, 1 and −1, because these are the eigenvalues of σz. But what if we want to measure the property corresponding to the Sx operator? Here |0⟩ and |1⟩ are certainly not eigenvectors of the Sx operator (they had better not be, because if they were also eigenvectors of Sx, that would mean Sx and Sz commute). Here we make use of a key property of an eigenbasis: it is a complete basis. So any state |ψ⟩ can be expressed as a superposition of the Sx eigenstates, and the corresponding eigenvalues are the possible results of the measurement. So, to start with, we have a state |ψ⟩ in the Sz basis. When we want to measure a property corresponding to Sx, we find the eigenstates of Sx and write |ψ⟩ as a superposition of those eigenstates. The squared magnitudes of the coefficients of the eigenstates give the probabilities of the corresponding measurement results, and the eigenstates are those to which |ψ⟩ shall collapse after measurement.

So, whichever property (operator) we are measuring, we must expand |ψ⟩ in the eigenbasis of that operator (corresponding to the property), and then find the coefficients of those states in the expansion of |ψ⟩.
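The recipe above, expand in the eigenbasis of the measured operator and square the coefficients, can be sketched numerically for a σx measurement on the state |0⟩ (our own illustration):

```python
import numpy as np

# Pauli sigma_x; its eigenvectors are (|0>+|1>)/sqrt2 and (|0>-|1>)/sqrt2,
# with eigenvalues +1 and -1
sx = np.array([[0, 1], [1, 0]])
evals, evecs = np.linalg.eigh(sx)      # columns of evecs are the eigenstates

psi = np.array([1.0, 0.0])             # the state |0>, written in the sigma_z basis

# Expand |psi> in the sigma_x eigenbasis; squared coefficients give the
# probabilities of obtaining the corresponding eigenvalues
coeffs = evecs.conj().T @ psi
probs = np.abs(coeffs) ** 2
```

As expected, |0⟩ is an equal-weight superposition of the two σx eigenstates, so each eigenvalue occurs with probability 1/2.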

1.8.1 Quantum Circuits

A quantum circuit, like any other, is read from left to right. A line in the circuit represents a wire. A wire here may just denote the path of a photon (or the passage of time, etc.; there need not be a physical path or channel). The input to a circuit is a qubit (represented by |ψ⟩, which by convention is assumed to be in a computational basis state). An important feature not allowed in quantum circuits is loops: there is no feedback from one part of the circuit to another. Such feedback takes two general forms: several wires joined together (FANIN), and a copy of the qubit on one wire going to multiple wires (FANOUT). Neither is allowed; refer to the figure. As an interesting consequence of this, the copying of a qubit is an impossible task.

Figure 1.4: FANIN and FANOUT characteristics:

Also, we can generalize the representation of the controlled-NOT gate to one that takes N qubits. The generalized representation is shown in the figure:


Figure 1.5: Generalized CNOT gate

Another important aspect of quantum circuits is the meters used to measure the qubits⁵. As we have already discussed, the measurement collapses the qubit into a probabilistic classical bit (distinguished by drawing a double-line wire)⁶. |ψ⟩ (= α|0⟩ + β|1⟩), upon measurement, will change to a classical bit that gives the result 0 with probability |α|² and 1 with probability |β|². A representation of the quantum meter is shown below:

Figure 1.6: Representation of a Quantum Bit measuring device:

1.8.2 Quantum Copying or Cloning circuits

One of the key features of quantum computation is that it disallows the replication or copying of qubits. We can see why this happens. Let us first take up the task of replicating or copying a classical bit. To accomplish this classically, one could do the following:

⁵Measurement here means to identify the qubit: measuring a state α|0⟩ + β|1⟩ means to extract α and β.
⁶The subtle difference here is that a probabilistic classical bit is fundamentally different from a quantum bit. This is because a classical bit has some finite probability of carrying only one piece of information, and it gives that upon measurement. A qubit, on the other hand, carries infinite information, but when measured it has some probability of collapsing onto one piece of it.


Figure 1.7: Classical method of copying a bit. Here ⊕ stands for the XOR operation. We take an arbitrary bit x and perform an XOR operation with a bit y that is held constant at 0.

Let us now try the same with a qubit, blindly replacing the classical XOR with the quantum XOR, ⊕. A valid question would be: have we copied all the information stored in |ψ⟩? The answer, obviously, is no. It is obvious because we agreed that an infinite amount of information is stored in |ψ⟩, so all of it cannot possibly be copied. This result is called the no-cloning theorem. There is a short proof of this theorem. Suppose we have two states |x⟩ and |y⟩, such that there exists a machine U which copies the state |x⟩ into |y⟩ (|y⟩ is called the target state). The system of these two qubits is in the state |x⟩ ⊗ |y⟩. So, we have:

U(|x⟩ ⊗ |y⟩) = |x⟩ ⊗ |x⟩    (1.3)

Here U is copying the value of one state into another. We claim that U is some unitary, universal operator that can clone any quantum state. Let us now take two arbitrary quantum states |ψ⟩ and |φ⟩, and let the target state be represented as |s⟩. According to the above claim, we have:

U(|ψ⟩ ⊗ |s⟩) = |ψ⟩ ⊗ |ψ⟩    (1.4)
similarly, U(|φ⟩ ⊗ |s⟩) = |φ⟩ ⊗ |φ⟩    (1.5)

We can take the inner product of equations (1.4) and (1.5). To take the inner product of the equations means to take the inner product of the two RHS's to form a new RHS, and of the two LHS's to form a new LHS. The inner product of any two matrices (this applies even to column vectors, since they too are matrices) is nothing but the product of the hermitian conjugate of the first with the second matrix. The LHS of equation (1.4) is of the form A|x⟩, which is a product A·B; the hermitian conjugate of (A·B) is B†·A†. The new LHS is then:

[U(|ψ⟩ ⊗ |s⟩)]† [U(|φ⟩ ⊗ |s⟩)]
⇒ (|ψ⟩ ⊗ |s⟩)† U† [U(|φ⟩ ⊗ |s⟩)]
⇒ (⟨ψ| ⊗ ⟨s|) U†U (|φ⟩ ⊗ |s⟩)

In the last step, we used (A ⊗ B)† = A† ⊗ B†. Also, since U is unitary, U†U = I. Hence the LHS becomes:

(⟨ψ| ⊗ ⟨s|)(|φ⟩ ⊗ |s⟩) ⇒ ⟨ψ|φ⟩ ⟨s|s⟩

In deriving the above step, we have used the formula (A ⊗ B)·(C ⊗ D) = (A·C) ⊗ (B·D). Also, since the target state |s⟩ is normalized, ⟨s|s⟩ = 1.

∴ LHS: ⟨ψ|φ⟩


Coming to the RHS and applying similar simplifications, we have:

(|ψ⟩ ⊗ |ψ⟩)† (|φ⟩ ⊗ |φ⟩)
⇒ (⟨ψ| ⊗ ⟨ψ|)(|φ⟩ ⊗ |φ⟩)
⇒ ⟨ψ|φ⟩²

Therefore, we can write the new equation formed as:

⟨ψ|φ⟩ = ⟨ψ|φ⟩²

This equation is of the form x = x², whose solutions are x = 1 and x = 0. The first case, ⟨ψ|φ⟩ = 1, means that |ψ⟩ and |φ⟩ are identical states. But then U only copies states that are identical; that is, U can only copy one specific state. This contradicts our assumption that U is a universal operator. In the other case, ⟨ψ|φ⟩ = 0, we see that |ψ⟩ and |φ⟩ are orthogonal states, so the cloning operator U will only clone orthogonal states. Therefore, we can conclude that there is no universal operator that can copy an arbitrary state: if it can copy a given quantum state, then the only other states it can copy are those orthogonal to the given state. In the figure given below:

Figure 1.8: Repeating the same process as above, for the quantum case

The process can be concisely described as:

[α|0⟩ + β|1⟩]|0⟩ ——————— CNOT ———————→ α|00⟩ + β|11⟩

The output, however, is not equal to |ψ⟩|ψ⟩, because when we multiply [α|0⟩ + β|1⟩] by [α|0⟩ + β|1⟩] we get α²|00⟩ + αβ|01⟩ + αβ|10⟩ + β²|11⟩. This certainly isn't equal to our result obtained from the CNOT gate, since the cross terms |01⟩ and |10⟩ are not present there. One can now say that we can copy the quantum bit if αβ = 0. But that means at least one of α, β must be 0, and if so, the bit will no longer be a "qubit". Hence it is impossible to copy the state of a qubit.
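The failed copying attempt above can be checked with the CNOT matrix: the circuit produces α|00⟩ + β|11⟩, while a true copy |ψ⟩ ⊗ |ψ⟩ would also contain the cross terms. A small sketch:

```python
import numpy as np

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])    # control = first qubit

alpha, beta = 0.6, 0.8
psi = np.array([alpha, beta])
ket0 = np.array([1.0, 0.0])

copied = CNOT @ np.kron(psi, ket0)   # alpha|00> + beta|11>
target = np.kron(psi, psi)           # a^2|00> + ab|01> + ab|10> + b^2|11>
```

Since αβ ≠ 0 here, the two vectors differ, exactly as the argument above predicts.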

1.9 Quantum Teleportation

1.9.1 Bell States

We have encountered these Bell states earlier. They are known to be the most correlated pair: the result of a measurement on the first qubit is the same as the result of the same measurement on the second qubit. On measuring the first qubit of the Bell state, one obtains 2 possible answers: 0 with probability 1/2 and 1 with probability 1/2. On measuring the second qubit, one likewise obtains 0 with probability 1/2 and 1 with probability 1/2, and the measurements on the two qubits always yield the same result. These Bell states are formed by taking a two-qubit system (in a computational basis state) and passing it through a Hadamard gate followed by a CNOT gate. A simple table for this would be:

Table 1.1: Input and output states of the Hadamard-then-CNOT circuit used to produce Bell states (In, Process, Out):

|00⟩ —— H ——→ (|00⟩ + |10⟩)/√2 —— CNOT ——→ (|00⟩ + |11⟩)/√2 ≡ |B00⟩
|01⟩ —— H ——→ (|01⟩ + |11⟩)/√2 —— CNOT ——→ (|01⟩ + |10⟩)/√2 ≡ |B01⟩
|10⟩ —— H ——→ (|00⟩ − |10⟩)/√2 —— CNOT ——→ (|00⟩ − |11⟩)/√2 ≡ |B10⟩
|11⟩ —— H ——→ (|01⟩ − |11⟩)/√2 —— CNOT ——→ (|01⟩ − |10⟩)/√2 ≡ |B11⟩

The generalized Bell state is given by:

|B_{x,y}⟩ ≡ (|0, y⟩ + (−1)^x |1, ȳ⟩)/√2

where ȳ denotes the complement of y.
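Table 1.1 and the general formula can be checked against each other numerically. In the sketch below (function names ours), the Hadamard acts on the first qubit and the CNOT is controlled by the first qubit, matching the table:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])    # control = first qubit

def bell_circuit(x, y):
    """(CNOT)(H (x) I)|x, y>: the circuit of Table 1.1."""
    ket = np.zeros(4)
    ket[2 * x + y] = 1
    return CNOT @ np.kron(H, I2) @ ket

def bell_formula(x, y):
    """|B_{x,y}> = (|0, y> + (-1)^x |1, ybar>)/sqrt(2)."""
    v = np.zeros(4)
    v[y] = 1                        # |0, y>
    v[2 + (1 - y)] = (-1) ** x      # (-1)^x |1, ybar>
    return v / np.sqrt(2)

checks = [np.allclose(bell_circuit(x, y), bell_formula(x, y))
          for x in (0, 1) for y in (0, 1)]
```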

1.9.2 EPR paradox and Bell’s inequality

We shall look at these in detail in the next chapter, Fundamentals of Quantum Mechanics. We just saw that once we have the result of a measurement on one qubit of a Bell state, the result of the same measurement on the second qubit is determined. Not only this, but even if the two qubits are (practically) infinitely far apart, the result of the second measurement is determined instantaneously once one of them is measured. It is surprising how this is possible, because we know that information cannot be transmitted at a speed greater than the velocity of light. Following this difficulty, EPR suggested that the two halves of the Bell state already carry some additional information, describing the state of the two-particle system at any given time, and that we have not accounted for that information in our formulation of quantum mechanics. EPR called this information hidden variables. Later on, John Bell suggested a mechanism to test for the presence of local hidden variables: he devised an inequality, and claimed that if a quantum system does not satisfy it, it is impossible for the system to have local hidden variables.

1.9.3 Application of Bell States: Quantum Teleportation

Consider the following problem. Bob and Alice are two friends living far apart. Together they generated an EPR pair, or Bell state, and Alice and Bob each hold one half of it. Now Bob is in hiding, and Alice's mission is to deliver a state |ψ⟩ to Bob. Looking at the problem, we can see that things are bad for Alice. She can only send classical information. She cannot copy the state |ψ⟩. Alice does not even know the state |ψ⟩, and even if she knew it, it would take an infinite amount of information (and time) to describe |ψ⟩, since its amplitudes take values in a continuum. So the only thing left for Alice is to utilize the EPR pair (to take advantage of the correlation of measurements on the two halves). Briefly, what Alice does is the following:


1. Alice interacts |ψ⟩ with her half of the EPR pair, and gets one of four possible results: 00, 01, 10, 11.

2. She sends the obtained result as classical information to Bob.

3. Bob knows that his measurement⁷, i.e. when he interacts with his half of the EPR pair, will give the same result. So, since Bob knows the result (sent by Alice), he decides on an appropriate operation⁸ for his half of the EPR pair, and he gets |ψ⟩.

Note that here |ψ⟩ has been communicated from Alice to Bob without actually being transmitted. The information was conveyed without any transport. So it can be called teleported, and hence the name: Quantum Teleportation.

Now let us look into the process more closely:

Let (|00⟩ + |11⟩)/√2 be the Bell state, or EPR state, that Alice and Bob together created. Let the state to be conveyed to Bob (the state to be teleported) be |ψ⟩ = α|0⟩ + β|1⟩, where α and β are unknown amplitudes. Combining the EPR pair with the state |ψ⟩, we get:

|ψ0⟩ = |ψ⟩|B00⟩

|ψ0⟩ = (1/√2)[α|0⟩(|00⟩ + |11⟩) + β|1⟩(|00⟩ + |11⟩)]

Here, the first qubit represents the message to be teleported, the second is Alice's half of the Bell state, and the last is Bob's half of the Bell state.

Alice now sends her two qubits through a CNOT gate and obtains |ψ1⟩, where:

|ψ1⟩ = (1/√2)[α|0⟩(|00⟩ + |11⟩) + β|1⟩(|10⟩ + |01⟩)]

Now, on sending the first qubit of |ψ1⟩ through a Hadamard gate, we get |ψ2⟩, where:

|ψ2⟩ = (1/√2)[α (|0⟩ + |1⟩)/√2 (|00⟩ + |11⟩) + β (|0⟩ − |1⟩)/√2 (|10⟩ + |01⟩)]

⇒ |ψ2⟩ = (1/2)[α(|0⟩ + |1⟩)(|00⟩ + |11⟩) + β(|0⟩ − |1⟩)(|10⟩ + |01⟩)]

On expanding and rearranging the terms, we obtain:

|ψ2⟩ = (1/2)[|00⟩(α|0⟩ + β|1⟩) + |01⟩(α|1⟩ + β|0⟩) + |10⟩(α|0⟩ − β|1⟩) + |11⟩(α|1⟩ − β|0⟩)]

So, as per the convention:

Figure 1.9: The first two qubits are Alice's and the last is Bob's, as shown.

⁷Measurement means any interaction with the system.
⁸A kind of inverse operation of what Alice did.


If Alice performs a measurement and obtains, say, 00, she sends this to Bob; Bob then sees which term in the expression above is tagged by |00⟩, and concludes that the state he wants is the one accompanying |00⟩. So, we have:

Table 1.2: Alice's measurement result and the corresponding state |ψ3⟩ that Bob recovers as the final message from Alice:

00    |ψ3(00)⟩ ≡ α|0⟩ + β|1⟩
01    |ψ3(01)⟩ ≡ α|1⟩ + β|0⟩
10    |ψ3(10)⟩ ≡ α|0⟩ − β|1⟩
11    |ψ3(11)⟩ ≡ α|1⟩ − β|0⟩

From the measurement result, Bob applies a certain operation and retrieves the corresponding qubit. For example, if the measurement result is:

1. 00, then Bob leaves |ψ3⟩ as it is.

2. 01, then he acts on |ψ3⟩ with an X gate⁹.

3. 10, then he applies a Z gate to |ψ3⟩.

4. 11, then he first applies an X gate to |ψ3⟩ and then a Z gate.

Therefore, the resultant operation summarizes to:

Z^{M1} X^{M2} |ψ3⟩ = |ψ⟩

where M1 and M2 are the two bits of information sent by Alice; these are the results of Alice's measurement.
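The whole protocol, state preparation, Alice's CNOT and Hadamard, and Bob's Z^{M1} X^{M2} correction, can be simulated end to end. This sketch is our own (the qubit ordering is message, Alice's half, Bob's half, as in the text); it checks that all four of Alice's outcomes lead Bob back to |ψ⟩:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]])   # control = first qubit

rng = np.random.default_rng(1)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)                      # random state to teleport

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)      # |B00> shared by Alice and Bob
state = np.kron(psi, bell)                      # qubits: message, Alice, Bob

# Alice: CNOT on (message, her half), then H on the message qubit
state = np.kron(CNOT, I2) @ state
state = np.kron(np.kron(H, I2), I2) @ state

# For each of Alice's four possible results (m1, m2), read off Bob's qubit
# and apply the correction Z^{m1} X^{m2}
recovered = {}
for m1 in (0, 1):
    for m2 in (0, 1):
        idx = 4 * m1 + 2 * m2
        bob = state[idx: idx + 2] * 2           # each outcome carries weight 1/2
        recovered[(m1, m2)] = (np.linalg.matrix_power(Z, m1)
                               @ np.linalg.matrix_power(X, m2) @ bob)
```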

Summarizing all the above operations, we can now draw the circuit diagram that Bob must follow to retrieve the state |ψ⟩:

Figure 1.10: The top 2 input lines are Alice's (the first 2 qubits belong to Alice); the bottom qubit belongs to Bob. The correction operation that Bob must apply is described in generalized fashion as Z^{M1} X^{M2}.

⁹The NOT gate; these conventions will be used in most places.


1.9.4 Resolving some ambiguities

The whole process is slightly surprising, because it raises several doubts which suggest that teleportation violates the laws of quantum computation that we earlier agreed upon. The following are the ambiguities:

1. Teleportation seems to imply that |ψ⟩ is being copied by Bob from Alice, against the no-cloning theorem that we discussed. The subtlety hidden here is that the two copies of |ψ⟩ never coexist: by the time Bob gets the result of Alice's measurement (in order for him to recreate |ψ⟩), Alice's measurement has already destroyed her copy.

2. The process of teleportation seems to say that the information |ψ⟩ is conveyed instantly, since it does not explicitly involve the passage of any data. This is misleading, because the fact that Bob must receive Alice's measurement result, which is classical information, is the key point without which the teleportation is impossible. So the speed of conveying the information is limited by the speed of light¹⁰, and teleportation does not violate the theory of relativity.

1.10 Quantum Algorithms

Now, we can look at a few questions, such as why quantum computation is preferred to classical computation, and whether a quantum computer can do everything a classical computer is capable of¹¹. However, quantum gates cannot be used directly as classical logic gates, because the former are inherently reversible, while the latter are irreversible. But we can still build reversible classical gates.

1.10.1 Simulating classical circuits using Quantum circuits

Any classical circuit or gate can be replaced by an equivalent reversible gate known as the Toffoli gate. A Toffoli gate has three input bits. The last input bit, the target bit, is flipped if all the other bits are 1: the gate leaves the first two bits unchanged and XORs the last bit with the AND of all the other bits. The Toffoli gate can be used to simulate NAND¹². The gate is reversible, and its inverse is the Toffoli gate itself. It can be used as a quantum as well as a classical gate; in the quantum case, the Toffoli gate takes a state such as |110⟩ to |111⟩, simply permuting the computational basis states.

¹⁰The speed of classical information cannot exceed the velocity of light (theory of relativity).
¹¹It would be surprising if this were not possible, because we know that all classical phenomena can be explained through quantum mechanics.
¹²If it can simulate a NAND gate, which is a classical universal gate, then it can simulate all classical gates.
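The two claims above, that the Toffoli gate is its own inverse and that it simulates NAND when the target is held at 1, can be checked classically; `toffoli` and `nand` below are our own illustrative names:

```python
def toffoli(a, b, c):
    """Reversible Toffoli gate: flip the target c iff both controls a, b are 1."""
    return a, b, c ^ (a & b)

def nand(a, b):
    """NAND via Toffoli: with the target held at 1, the output is 1 XOR (a AND b)."""
    return toffoli(a, b, 1)[2]

# Toffoli is its own inverse: applying it twice restores every input triple
roundtrip = [toffoli(*toffoli(a, b, c)) == (a, b, c)
             for a in (0, 1) for b in (0, 1) for c in (0, 1)]
truth_table = [nand(a, b) for a in (0, 1) for b in (0, 1)]
```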


Figure 1.11: A Toffoli gate - Reversible classical gate: Truth table and circuit representations. The last bitis flipped if the first two bits are 1.

1.11 Quantum Parallelism

Parallelism has a different meaning in the quantum case than in the classical case. Classically, at any given instant, only one unit of a task is in execution. In the quantum case, parallelism takes its real meaning: at any given instant, all the tasks run in parallel. For multiple qubits, this feature is achieved through the presence of superpositions of bit strings such as Σ_{i,j} α_ij |ij⟩. One of the simplest operations that can be performed on two qubits is the XOR operation (denoted by ⊕). For all practical purposes, let us assume the XOR gate to be a universal gate; so if we can do two XOR operations in parallel, then we have shown quantum parallelism in action. Let us now claim that corresponding to any function f : x → f(x), we can always define a unitary transformation U_f such that:

U_f |x, y⟩ = |x, y ⊕ f(x)⟩

Note that the inputs to the quantum transformation or circuit U_f need not be computational basis states. A rough circuit diagram of our setup is given below. Here, since y is permanently set to |0⟩, the second register of |x, 0 ⊕ f(x)⟩ is equal to f(x) itself. In other words:

U_f |x, 0⟩ = |x, 0 ⊕ f(x)⟩ = |x, f(x)⟩

Figure 1.12: Quantum circuit that computes f(x) (since |y⟩ = |0⟩) simultaneously for 0 and 1
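The transformation U_f can be built explicitly as a permutation matrix over the computational basis. A sketch with our own helper name `oracle` and a hypothetical example function f:

```python
import numpy as np

def oracle(f, n):
    """Permutation matrix for U_f |x, y> = |x, y XOR f(x)>, with an n-bit
    input register x and a single output qubit y."""
    dim = 2 ** (n + 1)
    U = np.zeros((dim, dim))
    for x in range(2 ** n):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return U

f = lambda x: x % 2            # an example (hypothetical) one-bit function
Uf = oracle(f, 1)

# On a basis state |x, 0> the output is |x, f(x)>: here |1,0> -> |1,1>
out = Uf @ np.array([0.0, 0.0, 1.0, 0.0])
```

Since applying ⊕ f(x) twice undoes itself, U_f is its own inverse, confirming it is a valid (unitary) gate for any classical f.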


The output in this case is (|0, f(0)⟩ + |1, f(1)⟩)/√2, where the input x was prepared in the superposition (|0⟩ + |1⟩)/√2. Hence, in one run of U_f, we have computed both f(0) and f(1): a single execution of U_f was able to compute f(x) for multiple values of x, hence the name "parallelism". This idea of parallelism can be extended to multiple-qubit systems as well. We can have a quick look at how this is done. Our main objective, generalized to the n-qubit case, is that given n input qubits, we must compute the values of f(x) for all the input strings in parallel, that is, in a single evaluation of U_f. To do this, just like in the previous case, let us take an (n+1)-qubit system with the last qubit equal to |0⟩ (it plays the role of the |y⟩ that we set to |0⟩ in the two-qubit case). We now need a superposition of all permutations of the first n qubits (with each qubit in the computational basis). To produce all the permutations of an n-qubit system where initially all qubits are set to |0⟩, we require n Hadamard gates. Each Hadamard gate, acting on a qubit, produces:

H|0⟩ = (|0⟩ + |1⟩)/√2

therefore: (H|0⟩)(H|0⟩) = [(|0⟩ + |1⟩)/√2] [(|0⟩ + |1⟩)/√2]

⇒ (H|0⟩)(H|0⟩) = (|00⟩ + |01⟩ + |10⟩ + |11⟩)/(√2)²

This can now be generalized to:

(H|0⟩)(H|0⟩)....(H|0⟩) = [(|0⟩ + |1⟩)/√2] [(|0⟩ + |1⟩)/√2] .... [(|0⟩ + |1⟩)/√2]

⇒ (H|0⟩)(H|0⟩)....(H|0⟩) = (1/√2ⁿ) Σ_{i ∈ all permutations of n qubits} |i⟩

The above statement is represented in short as:

H^⊗n |0⟩^⊗n = (1/√2ⁿ) Σ_{i ∈ all permutations of n qubits} |i⟩    (1.6)

where H^⊗n denotes the parallel action of n Hadamard gates; this is also called a Hadamard transform on the first n qubits. Now this (n+1)-qubit system can be sent to the function U_f, and for each permutation x of the n qubits, U_f will produce f(x). The final output of U_f is:

U_f (H^⊗n |0⟩^⊗n |0⟩) = (1/√2ⁿ) Σ_{x ∈ all permutations of n qubits} |x, f(x)⟩

Therefore, we can compute the 2ⁿ values of f(x) for different values of x in one single evaluation of the function f. It is clear that this parallelism would not be possible if the qubits were not in a superposition of states. In other words, parallelism works only in the quantum case, and not in the case of probabilistic classical bits, because classical bits cannot be in a superposition of states. But the story does not end here: there is a catch. The problem is that even if U_f computes all the values of f(x) in one single evaluation and returns the result shown above, we are still restricted (by the nature of measurement in quantum mechanics) to obtaining only one value of f(x). After this measurement, the entire output collapses to an eigenstate. So, though we have shown how to compute f(x) for 2ⁿ values of x, we are allowed to look at f(x) for only one value of x. But there is a way of getting out of this, and we shall see how in later sections.
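Equation (1.6) and the parallel evaluation of f can be reproduced numerically. In the sketch below (helper names and the example f are ours), all 2ⁿ pairs |x, f(x)⟩ end up with equal amplitude 1/√2ⁿ, and a measurement would reveal only one randomly chosen pair, which is exactly the catch described above:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def hadamard_transform(n):
    """H^(x)n acting on |0>^(x)n, i.e. equation (1.6)."""
    state = np.array([1.0])
    for _ in range(n):
        state = np.kron(state, H @ np.array([1.0, 0.0]))
    return state

n = 3
reg = hadamard_transform(n)             # uniform superposition over 2^n strings

f = lambda x: bin(x).count("1") % 2     # an example (hypothetical) function

# Append the |0> ancilla and apply U_f |x, y> = |x, y XOR f(x)>
full = np.kron(reg, np.array([1.0, 0.0]))
out = np.zeros_like(full)
for x in range(2 ** n):
    for y in (0, 1):
        out[2 * x + (y ^ f(x))] += full[2 * x + y]

# Amplitude of each pair |x, f(x)> after one evaluation of U_f
amps = [out[2 * x + f(x)] for x in range(2 ** n)]
```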


1.11.1 Example of Quantum Parallelism - The Deutsch-Jozsa Algorithm

In the previous few sections, we have been discussing features that are present only in quantum computers and not in classical ones. In this section, we shall see how these features come together to outperform a classical computer on a particular problem, posed by Deutsch and Jozsa. The problem is formulated as the following situation:

Alice, in Amsterdam, selects a number x between 0 and 2^n − 1 and mails it in a letter to Bob, in Boston. Bob takes the number x (sent by Alice), computes f(x), and replies with the result, which is either 0 or 1. Bob has promised to use a function f of one of two kinds: either f(x) is constant for all values of x, or f(x) is balanced, i.e., equal to 1 for exactly half of all possible values of x and 0 for the other half. Alice's goal is to determine with certainty whether Bob has chosen a constant or a balanced function, corresponding with him as little as possible. How far can she succeed?

Let us start with the naive classical method, in which Alice queries Bob at most 2^{n−1} + 1 times, stopping as soon as she receives two different answers. If she gets two different answers at any point, she can conclude that f is balanced. If instead she receives the same answer for all 2^{n−1} + 1 queries, the function must be constant: had it been balanced, only 2^{n−1} of the values could be of one kind, but here 2^{n−1} + 1 values agree. So, with this strategy, Alice and Bob need to communicate for at most 2^{n−1} + 1 values of x.

Let us try to do better by considering the quantum analogue of this problem, in which Alice and Bob are allowed to exchange qubits instead of classical bits; f itself still takes bit strings to a single output bit, as before. Let us work through the process for the simplest case: a single query qubit plus one answer qubit, so the joint input is the two-qubit state |ψ0⟩ = |01⟩ (query qubit |0⟩, answer qubit |1⟩). Now, let us send each qubit through a Hadamard gate to obtain the new state |ψ1⟩; the parallel action of the two Hadamard gates on |ψ0⟩ produces |ψ1⟩. So, we have:

input state: |ψ0⟩ = |01⟩, and H^⊗2 |ψ0⟩ = |ψ1⟩

⇒ |ψ1⟩ = [(|0⟩ + |1⟩)/√2] [(|0⟩ − |1⟩)/√2]

Now, let us apply the transformation Uf to the two qubits obtained (Uf is the quantum circuit that we discussed in the last subsection). To recap:

Uf |x, y⟩ = |x, y ⊕ f(x)⟩

We can now send the two qubits of |ψ1⟩ into the quantum circuit Uf, as |x⟩ and |y⟩ respectively. Let us define the new state |ψ2⟩, obtained by acting on |ψ1⟩ with Uf. So, we get:

|ψ2⟩ = Uf [ ((|0⟩ + |1⟩)/√2) ((|0⟩ − |1⟩)/√2) ]

On expanding the product, we get:

|ψ2⟩ = Uf [ ( |0⟩(|0⟩ − |1⟩) + |1⟩(|0⟩ − |1⟩) ) / 2 ]

On separating the terms:

|ψ2⟩ = Uf [ |0⟩(|0⟩ − |1⟩)/2 ] + Uf [ |1⟩(|0⟩ − |1⟩)/2 ]    (1.7)

Here, both of the separated terms have the same form:

Uf [ |x⟩ (|0⟩ − |1⟩)/√2 ]    (1.8)


Therefore, let us first compute a general form for the expression Uf [ |x⟩(|0⟩ − |1⟩)/√2 ]:

Uf [ |x⟩(|0⟩ − |1⟩)/√2 ] = Uf [ (|x, 0⟩ − |x, 1⟩)/√2 ]

On the action of Uf, the expression becomes:

( |x, 0 ⊕ f(x)⟩ − |x, 1 ⊕ f(x)⟩ )/√2

which can be further simplified to:

|x⟩ ( |0 ⊕ f(x)⟩ − |1 ⊕ f(x)⟩ )/√2

Using the properties of the ⊕ (XOR) operation, let us consider the two cases f(x) = 0 and f(x) = 1.

Taking the first case, f(x) = 0:

⇒ |x⟩ ( |0 ⊕ 0⟩ − |1 ⊕ 0⟩ )/√2 = |x⟩ (|0⟩ − |1⟩)/√2

Similarly, taking the second case, f(x) = 1:

⇒ |x⟩ ( |0 ⊕ 1⟩ − |1 ⊕ 1⟩ )/√2 = |x⟩ (|1⟩ − |0⟩)/√2

Therefore, by inspecting the above two cases, we can summarize with a general form:

Uf [ |x⟩ (|0⟩ − |1⟩)/√2 ] = (−1)^{f(x)} |x⟩ (|0⟩ − |1⟩)/√2    (1.9)
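Equation (1.9), the "phase kickback", can be checked numerically. In this sketch, Uf is built as a permutation matrix on the computational basis; that matrix form is our assumption, chosen to be consistent with Uf|x, y⟩ = |x, y ⊕ f(x)⟩.

```python
import numpy as np

def Uf(f):
    """Permutation matrix for U_f |x,y> = |x, y xor f(x)> on two qubits.
    Basis ordering: index = 2*x + y."""
    U = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return U

minus = np.array([1, -1]) / np.sqrt(2)   # (|0> - |1>)/sqrt(2)

# Check (1.9) for all four one-bit functions and both inputs x.
for f in (lambda x: 0, lambda x: 1, lambda x: x, lambda x: 1 - x):
    for x in (0, 1):
        ket_x = np.eye(2)[x]
        out = Uf(f) @ np.kron(ket_x, minus)
        expected = (-1) ** f(x) * np.kron(ket_x, minus)
        assert np.allclose(out, expected)
```

The value of f shows up only as a global sign on the |x⟩ register, which is exactly what makes the interference step of the algorithm possible.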

Now that we have obtained a general form, we can substitute values of x to get the terms in the expression for |ψ2⟩ obtained earlier in (1.7). The first and second terms on the RHS of (1.7) are evaluated by substituting x = 0 and x = 1 respectively into (1.9) (each term carries an extra factor of 1/√2 from (1.7)). Therefore, we obtain:

|ψ2⟩ = (−1)^{f(0)} |0⟩ (|0⟩ − |1⟩)/2 + (−1)^{f(1)} |1⟩ (|0⟩ − |1⟩)/2

The two terms carry the same sign if f(0) = f(1), and opposite signs otherwise. So, we have two different possibilities: one where f(0) = f(1), the other where f(0) ≠ f(1).

Let us consider the case f(0) = f(1). So, let f(0) = f(1) = c. Hence, we get:

|ψ2⟩ = (−1)^c [ (|0⟩ + |1⟩)/√2 ] [ (|0⟩ − |1⟩)/√2 ]

Here, (−1)^c is simply ±1.

Similarly, for the other case, f(0) ≠ f(1), we have the following argument. If f(0) ≠ f(1), then (−1)^{f(0)} = −(−1)^{f(1)}. So, let (−1)^{f(0)} = −(−1)^{f(1)} = c. Hence, the expression becomes:

|ψ2⟩ = c [ (|0⟩ − |1⟩)/√2 ] [ (|0⟩ − |1⟩)/√2 ]

Here again, c = ±1.

So, we obtain two expressions for |ψ2⟩, which we can write together as:

|ψ2⟩ = ± [ (|0⟩ + |1⟩)/√2 ] [ (|0⟩ − |1⟩)/√2 ]    if f(0) = f(1)
|ψ2⟩ = ± [ (|0⟩ − |1⟩)/√2 ] [ (|0⟩ − |1⟩)/√2 ]    if f(0) ≠ f(1)    (1.10)


Now that we have obtained |ψ2⟩, let us act with H on its first qubit to produce |ψ3⟩; that is, |ψ3⟩ = (H ⊗ I)|ψ2⟩. Let us see the action of H on the first qubit separately for the two cases in (1.10). Expanding the product in the first case of (1.10):

|ψ2⟩ = ± ( |00⟩ − |01⟩ + |10⟩ − |11⟩ )/2

We act with H on the first qubit and leave the second one unchanged. So, we get:

(H ⊗ I)|ψ2⟩ = ± (1/2) [ ((|0⟩ + |1⟩)/√2)|0⟩ − ((|0⟩ + |1⟩)/√2)|1⟩ + ((|0⟩ − |1⟩)/√2)|0⟩ − ((|0⟩ − |1⟩)/√2)|1⟩ ]

Hence, on simplifying:

(H ⊗ I)|ψ2⟩ = ± (1/(2√2)) [ |00⟩ − |01⟩ + |10⟩ − |11⟩ + |00⟩ − |10⟩ − |01⟩ + |11⟩ ]

⇒ (H ⊗ I)|ψ2⟩ = ± |0⟩ (|0⟩ − |1⟩)/√2

Similarly, on carrying out the calculation for the other case of |ψ2⟩, we get:

(H ⊗ I)|ψ2⟩ = ± |1⟩ (|0⟩ − |1⟩)/√2

So, we obtain two expressions for |ψ3⟩, just as we did for |ψ2⟩. We can write them together as:

|ψ3⟩ = ± |0⟩ (|0⟩ − |1⟩)/√2    if f(0) = f(1)
|ψ3⟩ = ± |1⟩ (|0⟩ − |1⟩)/√2    if f(0) ≠ f(1)    (1.11)

The above expression (1.11) can be written in a shorter, more concise manner:

|ψ3⟩ = ± |f(0) ⊕ f(1)⟩ (|0⟩ − |1⟩)/√2    (1.12)

Hence, by measuring the first qubit, Alice can say with certainty whether f is balanced or constant. That is, with a single evaluation of the function, Alice was able to determine a global property of the function. Earlier we saw that, though we could compute the function for many inputs at once, we could measure only one result; here, under the same constraint, we have accomplished our task. Summarizing, we have performed the following steps:

1. The input state was prepared.

2. A Hadamard transform was applied to the input state.

3. The resulting state was passed through the Uf transformation.

4. The first qubit of the resultant state was passed through a Hadamard gate.
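The four steps above can be simulated directly with numpy for the two-qubit case. This is a hedged sketch: the oracle matrix and the helper names (Uf, deutsch) are ours, with Uf built as an assumption consistent with Uf|x, y⟩ = |x, y ⊕ f(x)⟩.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)

def Uf(f):
    """Oracle U_f |x,y> = |x, y xor f(x)> as a 4x4 permutation matrix."""
    U = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return U

def deutsch(f):
    psi0 = np.kron(np.eye(2)[0], np.eye(2)[1])   # step 1: prepare |01>
    psi1 = np.kron(H, H) @ psi0                  # step 2: Hadamard transform
    psi2 = Uf(f) @ psi1                          # step 3: oracle U_f
    psi3 = np.kron(H, I) @ psi2                  # step 4: H on the first qubit
    # Probability that the first qubit measures 1; equals f(0) xor f(1).
    p1 = psi3[2] ** 2 + psi3[3] ** 2
    return int(round(p1))

assert deutsch(lambda x: 0) == 0 and deutsch(lambda x: 1) == 0      # constant
assert deutsch(lambda x: x) == 1 and deutsch(lambda x: 1 - x) == 1  # balanced
```

The measured bit is exactly f(0) ⊕ f(1), matching equation (1.12): 0 for a constant function, 1 for a balanced one.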

We can now generalize the whole procedure to the multiple-qubit case, in which Alice can choose any number between 0 and 2^n − 1, where n can be as large as we like. In this case, Alice's query register is an n-qubit state, with all qubits initially set to |0⟩. Alice's full input state is an (n+1)-qubit state, with the last qubit set to |1⟩:

Alice's input state: |ψ0⟩ = |0⟩^⊗n |1⟩

Now, let us perform a Hadamard transform on the query register, and also pass the last qubit (the answer register, which Bob will later modify) through a Hadamard gate, to get |ψ1⟩. In the two-qubit case, instead of the transform, we had only a single gate. From (1.6), we know the result of the Hadamard


transform on n qubits, all initially set to |0⟩: it gives Σ_{x ∈ {0,1}^n} |x⟩ / √(2^n). The Hadamard gate on the single qubit |1⟩ gives (|0⟩ − |1⟩)/√2. So, from their parallel action, we get:

|ψ1⟩ = [ Σ_{x ∈ {0,1}^n} |x⟩ / √(2^n) ] [ (|0⟩ − |1⟩)/√2 ]    (1.13)

where x ∈ {0,1}^n means that x runs over all bit strings of length n. Bob now takes the input state sent by Alice and computes the function f using the transformation Uf. Bob then sends back the answer in |ψ2⟩ (each term of |ψ2⟩ has the form given in (1.9)):

|ψ2⟩ = [ Σ_{x ∈ {0,1}^n} (−1)^{f(x)} |x⟩ / √(2^n) ] [ (|0⟩ − |1⟩)/√2 ]    (1.14)
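Equation (1.14) can be verified numerically for a small n: after the oracle, the amplitude of each |x⟩ carries the sign (−1)^{f(x)}. As before, building the oracle as a permutation matrix is our assumption, consistent with Uf|x, y⟩ = |x, y ⊕ f(x)⟩; the example function f is an arbitrary balanced choice.

```python
import numpy as np

def oracle(f, n):
    """U_f on n query qubits plus one answer qubit; index = 2*x + y."""
    dim = 2 ** (n + 1)
    U = np.zeros((dim, dim))
    for x in range(2 ** n):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return U

n = 3
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = np.array([[1.0]])
for _ in range(n):
    Hn = np.kron(Hn, H)

minus = np.array([1, -1]) / np.sqrt(2)
query = Hn @ np.eye(2 ** n)[0]            # uniform superposition over {0,1}^n
psi1 = np.kron(query, minus)              # equation (1.13)

f = lambda x: bin(x).count("1") % 2       # parity: an example balanced function
psi2 = oracle(f, n) @ psi1

signs = np.array([(-1) ** f(x) for x in range(2 ** n)])
assert np.allclose(psi2, np.kron(signs / np.sqrt(2 ** n), minus))   # equation (1.14)
```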

Alice now has a set of qubits in which the result of Bob's function is stored in the amplitudes of the superposition state (1.14). She now interferes the terms in the superposition with a Hadamard transform, to get |ψ3⟩. To calculate the effect of the Hadamard transform, we can first see how the transform acts in general on the n-qubit state |x1, x2, x3, ..., xn⟩; that is, we need to calculate H^⊗n |x1, x2, x3, ..., xn⟩. Let us go one more level down, and first calculate the result of a Hadamard gate on a single qubit: H|x⟩.


Part II

PREREQUISITES - MATHEMATICS


Chapter 2

Linear Algebra

2.1 Vector Spaces

A vector space over a field is a set that is closed under finite addition and scalar multiplication, where the scalars belong to the field. In this context, we shall deal with vector spaces over the field of complex numbers, denoted by C. A vector space has the following properties:

1. Closure under vector addition: (u + v) ∈ V ∀ u, v ∈ V

2. Closure under scalar multiplication: cu ∈ V ∀ u ∈ V, c ∈ F

3. Commutativity of vector addition and associativity of scalar multiplication: u + v = v + u ∀ u, v ∈ V; (cd)u = c(du) = d(cu) ∀ u ∈ V, c, d ∈ F

4. Distributivity of scalar multiplication over vector addition: c(u + v) = cu + cv ∀ u, v ∈ V, c ∈ F

5. Existence of an additive inverse: ∀ u ∈ V, ∃ v ∈ V such that u + v = 0. The vector v is denoted −u and called the inverse of u under addition.

6. Existence of a scalar identity under multiplication: ∃ 1 ∈ F such that 1u = u ∀ u ∈ V.

7. Existence of a zero vector: ∃ 0 ∈ V such that u + 0 = u ∀ u ∈ V.

A vector can be represented by a matrix; the entries of the matrix are called the components of the vector.


For example, writing vectors as column matrices (^T denotes transpose):

v = (v1, v2, v3, ..., vn)^T,  u = (u1, u2, u3, ..., un)^T,  v + u = (v1 + u1, v2 + u2, v3 + u3, ..., vn + un)^T

Here n is the number of components, which for these spaces equals the dimension of the vector space (defined precisely below). The vector space can also be infinite dimensional.

2.2 Linear dependence and independence

A set of vectors v1, v2, v3, ..., vn is said to be linearly dependent iff there exist constants ci, not all zero, such that:

Σ_{i=1}^{n} ci vi = 0.

Each vector then depends on the rest of the vectors through a linear relationship; hence the name linearly dependent. The vectors are called linearly independent iff they satisfy no such relation: for linearly independent vectors, no vector can be expressed as a linear combination of the rest. The maximum number of linearly independent vectors in a vector space is called the dimension of the vector space, denoted by n. Any vector x in the vector space can be uniquely expressed as a linear combination of n such linearly independent vectors. The proof is quite simple. An n-dimensional vector space has a set of n linearly independent vectors. If we add the vector x to this set, the space is still n-dimensional, so the set {v1, v2, v3, ..., vn, x} is linearly dependent. Hence there exist constants ci and cx, not all zero (and cx ≠ 0, since the vi alone are independent), with:

Σ_{i=1}^{n} ci vi + cx x = 0

so: Σ_{i=1}^{n} ci vi = −cx x

⇒ x = Σ_{i=1}^{n} (−ci/cx) vi

Hence, the general vector x can be expressed as a linear combination of the n linearly independent vectors of the vector space; uniqueness follows by subtracting two such expansions and using independence. The set of n linearly independent vectors is called a basis of the vector space, and the set of all vectors expressible in this basis is called the span of the basis.
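A quick numerical test of linear independence (a standard trick, not from the text) is to stack the vectors as columns and compare the matrix rank with the number of vectors.

```python
import numpy as np

def linearly_independent(vectors):
    """Vectors are independent iff the matrix of columns has full column rank."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
assert linearly_independent([v1, v2])
assert not linearly_independent([v1, v2, v1 + 2 * v2])   # third is a linear combination
```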

2.3 Dual Spaces

A linear function can be defined on the elements of a vector space. Let f be such a function, f : x → f(x), with f(x) ∈ C. The function maps an element of the vector space to an element of the field over which the vector space is defined. By our convention, we represent a vector x by a column


vector. So, we can see by inspection that f should be represented by a row vector; on multiplying the row vector with the column vector, we get a single scalar. The function satisfies the linearity requirement:

f(a x + b y) = a f(x) + b f(y),  where a, b ∈ F and x, y ∈ V.

So, we can propose a representation for the function: if x = (x1, x2, x3, ..., xn)^T, then f = (f1 f2 f3 ... fn). Hence, on matrix multiplication of f with x, we get f1 x1 + f2 x2 + f3 x3 + ... + fn xn, which is a scalar in F.

The space of such functions is called the dual space to the corresponding vector space. If the vector spaceis V, then the corresponding dual space is represented by V∗.

2.4 Dirac’s Bra Ket notation

The Dirac bra-ket notation is widely used in quantum mechanics. Here, any vector x (represented as a column matrix) is written |x⟩ and called a ket vector. The corresponding dual vector is called the bra vector, denoted ⟨x|; hence the name bra-ket notation. We will follow this notation from now on; the reasons will become evident as we proceed. So, we have the conventions:

⟨x| ↔ |x⟩    (2.1)
c∗⟨x| ↔ c|x⟩    (where c is some complex number)    (2.2)

2.5 Inner and outer products

Consider a vector |x⟩ and its dual vector ⟨x|. We saw that the former is a column matrix and the latter a row matrix; ⟨x| is called the bra dual of |x⟩. The product of a bra vector with its corresponding ket vector is a scalar (as we saw, this is multiplying a row matrix with a column matrix). The product is written ⟨x|x⟩ and is called the inner product. In components, the inner product of a bra and a ket vector is defined as:

⟨x|y⟩ = Σ_i x∗_i y_i    (2.3)

The inner product satisfies the linearity properties:

⟨x| (c1|y1⟩ + c2|y2⟩) = c1 ⟨x|y1⟩ + c2 ⟨x|y2⟩    (2.4)
⟨c1 x1 + c2 x2 | y⟩ = c∗1 ⟨x1|y⟩ + c∗2 ⟨x2|y⟩    (2.5)

The first follows by pulling the constants out of the ket vectors; the second is obtained in the same way, using the rule in (2.2). Another important property satisfied by the inner product is:

⟨x|y⟩ = (⟨y|x⟩)∗    (2.6)


This can be proved as follows:

from equation (2.3): ⟨x|y⟩ = Σ_i x∗_i y_i

Consider the RHS of (2.6): (⟨y|x⟩)∗ = ( Σ_i y∗_i x_i )∗ = Σ_i y_i x∗_i = Σ_i x∗_i y_i

Coming back to the definition of the inner product, this is exactly ⟨x|y⟩. Hence we have proved that ⟨x|y⟩ = (⟨y|x⟩)∗.

The inner product of a vector with its bra dual gives the square of the norm of the vector:

⟨x|x⟩ = || |x⟩ ||²    (2.7)

The norm is a scalar. There is, however, another possibility that we have not yet explored: what about the product |x⟩⟨x|? This is a product of a column matrix with a row matrix, giving an (n × n) matrix, where n is the dimension of the vector space containing |x⟩. This product is called the outer product. A matrix can be thought of as a transformation on a vector; in quantum mechanics, these transformations or matrices are called operators.
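Both products can be demonstrated in numpy. Note that np.vdot conjugates its first argument, matching the convention ⟨x|y⟩ = Σ_i x∗_i y_i, and np.outer(x, y.conj()) gives the matrix |x⟩⟨y|; the variable names are ours.

```python
import numpy as np

x = np.array([1 + 1j, 2 - 1j])
y = np.array([3 + 0j, 1j])

inner = np.vdot(x, y)                                  # <x|y>, conjugates x
assert np.isclose(inner, np.conj(np.vdot(y, x)))       # property (2.6)
assert np.isclose(np.vdot(x, x).real, np.linalg.norm(x) ** 2)   # property (2.7)

outer = np.outer(x, y.conj())                          # |x><y|, an operator
assert outer.shape == (2, 2)                           # (n x n) matrix for n = 2
```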

2.6 Orthonormal Basis and Completeness Relations

We saw that any set of n linearly independent vectors |v1⟩, |v2⟩, |v3⟩, ..., |vn⟩ forms a basis for an n-dimensional vector space. We impose the condition that:

⟨vi|vj⟩ = δij  for all i, j ∈ {1, ..., n}

A basis satisfying this condition is called an orthonormal basis, conventionally written |e1⟩, |e2⟩, |e3⟩, ..., |en⟩. This basis has some useful properties. Consider a vector |x⟩ in the vector space spanned by the basis {|ei⟩}. We can use the orthonormality property to get the coefficients multiplying the basis states:

|x⟩ = Σ_i ci |ei⟩

Multiplying both sides by ⟨ej|:  ⟨ej|x⟩ = Σ_i ci ⟨ej|ei⟩

Since ⟨ej|ei⟩ = δji, the only surviving term in the sum is the one with i = j. Hence:

⟨ej|x⟩ = cj    (2.8)

Therefore, to get the coefficient multiplying the jth basis (orthonormal) state, in some basis expansion for avector, we must take an inner product of this basis state with the vector. Now, we can see some results withthe outer product of the basis states. The outer product leads to an operator. Now, let us take the basis


expansion of the vector, and substitute the result obtained in (2.8):

|x⟩ = Σ_i ci |ei⟩

from (2.8): |x⟩ = Σ_i ⟨ei|x⟩ |ei⟩

The expression inside the summation is a scalar multiplying a vector, and scalar multiplication commutes with the vector. Hence:

|x⟩ = Σ_i |ei⟩⟨ei|x⟩

The RHS now has the form A|x⟩, where A is some operator:

|x⟩ = ( Σ_i |ei⟩⟨ei| ) |x⟩

Looking at the form of this operator, we can easily guess that it is the identity operator. Therefore:

Σ_i |ei⟩⟨ei| = I    (2.9)

This relation given above is called the completeness relation.
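The completeness relation (2.9) is easy to check for a concrete orthonormal basis. In this sketch (our construction), the columns of a randomly generated orthogonal matrix serve as the basis, and the outer products of the basis vectors are summed.

```python
import numpy as np

# The Q factor of a QR decomposition has orthonormal columns.
rng = np.random.default_rng(0)
Q = np.linalg.qr(rng.normal(size=(4, 4)))[0]
basis = [Q[:, i] for i in range(4)]          # an orthonormal basis of R^4

S = sum(np.outer(e, e.conj()) for e in basis)   # sum_i |e_i><e_i|
assert np.allclose(S, np.eye(4))                # equals the identity, as in (2.9)
```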

2.7 Projection operator

Consider the operator |ek⟩⟨ek|. Acting on any vector, it gives the coefficient of |ek⟩ (in the {|ei⟩} basis expansion of that vector) times |ek⟩. In other words, it takes any vector and projects it along the |ek⟩ direction; the projection arises from the inner product of the bra ⟨ek| with the arbitrary ket vector. Hence the operator |ek⟩⟨ek| is called the projection operator, denoted Pk. The projection operator satisfies some important properties:

(Pk)^n = Pk    (2.10)
Pi Pj = 0  for i ≠ j    (2.11)
Σ_i Pi = I    (2.12)

The first, (2.10), can be proved by writing Pk = |ek⟩⟨ek|, so that

(Pk)^n = |ek⟩⟨ek|ek⟩⟨ek|ek⟩⟨ek| ... |ek⟩⟨ek|.

Using the fact that ⟨ek|ek⟩ = 1, all the interior inner products become 1, and only the terminal factors |ek⟩ and ⟨ek| remain; hence the RHS is Pk.
The second statement, (2.11), is proved by considering Pi Pj = |ei⟩⟨ei|ej⟩⟨ej|. For i ≠ j, the inner product ⟨ei|ej⟩ = 0, so the entire expression reduces to 0. Hence the statement is proved.
The third statement, (2.12), was already proved in (2.9), when we were looking at the orthonormal basis and the completeness relation.

2.8 Gram-Schmidt Orthonormalization

Given any basis (a linearly independent spanning set) for some vector space, we can construct an orthonormal basis for the same vector space. Our objective is to construct an orthonormal set {|Oi⟩},


corresponding to the given set {|vi⟩}. By definition of an orthonormal basis, we need ⟨Oi|Oi⟩ = 1; that is, the norm of each vector must be 1. So we begin by normalizing, defining

|Oi⟩ = |vi⟩ / || |vi⟩ ||    (2.13)

Our vectors are now all of unit norm; we still need to construct orthogonal vectors out of them. Briefly, the idea is the following, illustrated with vectors in a 2-dimensional real vector space, which can be represented on a plane.

1. Without any loss of generality, take the first vector of the given basis (call it |v1⟩, normalized) as the first orthonormal vector |O1⟩. Hence, we have:

Figure 2.1: |v1⟩ and |v2⟩ are the given basis vectors; choose |O1⟩ = |v1⟩.

2. Take the second vector |v2⟩ from the given basis, and take the projection of |v2⟩ along |O1⟩ (take the projection by multiplying |O1⟩⟨O1| with |v2⟩). We now have a vector lying along |O1⟩; call this vector |PO1⟩.

Figure 2.2: diagram for step 2: project |v2⟩ along |O1⟩.

3. Now, subtract |PO1⟩ from |v2⟩ (using the triangle law of vector addition), and normalize the result to get |O2⟩.

Figure 2.3: diagram for step 3: |O2⟩ ∝ |v2⟩ − |PO1⟩. The dotted line indicates that we are subtracting vectors using the triangle law of addition.

4. By construction, this vector |O2⟩ is orthogonal to |O1⟩.

Figure 2.4: final diagram: |O1⟩ and |O2⟩ are formed.

5. Similarly, we go on to construct all the |Oi⟩. In the ith step, we subtract from |vi⟩ all of its projections onto the previously constructed vectors.

So, summing up: we got |O2⟩ by performing |v2⟩ − |PO1⟩, and we got |PO1⟩ by performing (|O1⟩⟨O1|)|v2⟩. Hence, we can write:

|Oj⟩ = ( |vj⟩ − Σ_{i=1}^{j−1} |Oi⟩⟨Oi|vj⟩ ) / || |vj⟩ − Σ_{i=1}^{j−1} |Oi⟩⟨Oi|vj⟩ ||,  where 1 ≤ j ≤ n    (2.14)

The denominator normalizes the resultant vector (sets its norm to 1); n is the dimension of the vector space spanned by the given basis.

2.9 Linear Operators

A linear operator between two vector spaces V and U is any function A : V → U which is linear in its inputs:

A ( Σ_i ai |vi⟩ ) = Σ_i ai (A|vi⟩)    (2.15)

From the above definition, we see that this function has a value in U for each element in V. If V and U are n- and m-dimensional respectively, then the function carries each of the n basis directions of V to a


direction in U. In other words, if V is spanned by the basis |v1⟩, |v2⟩, |v3⟩, ..., |vn⟩ and U is spanned by |u1⟩, |u2⟩, |u3⟩, ..., |um⟩, then for every j in 1 to n there exist complex numbers A1j to Amj such that

A|vj⟩ = Σ_i Aij |ui⟩

If V and U are n- and m-dimensional respectively, then A is m × n dimensional.

2.10 Hermitian Matrices

A matrix is called self-adjoint or Hermitian if the matrix is equal to its own conjugate transpose; that is, element-wise,

Aij = (Aji)∗    (2.16)

The conjugate transpose of a matrix A is denoted A†, where relation (2.16) gives its elements. Hence, a matrix A is called Hermitian iff A = A†. From this property, we can see that the diagonal entries of a Hermitian matrix must be real. A very similar definition holds for operators as well: an operator is Hermitian if its matrix representation is Hermitian. The conjugate transpose of an operator is defined by:

⟨u|A†|v⟩ = (⟨v|A|u⟩)∗    (2.17)

Therefore, if an operator is Hermitian,

⟨u|A|v⟩ = (⟨v|A|u⟩)∗    (2.18)

is satisfied. The definition of the conjugate transpose applies to vectors as well. We have:

(|x⟩)† ≡ (x∗1 x∗2 x∗3 ... x∗n) ≡ ⟨x|    (2.19)

Let us look at some identities that we need before proceeding:

(A + B)† = A† + B†    (2.20)
(cA)† = c∗A†    (2.21)
(AB)† = B†A†    (2.22)

Let the matrix A have elements Aij and the matrix B have elements Bij. If we prove that the above laws hold for an arbitrary element of A and B, then we have proved them in general. For the first law, (A + B) has elements (Aij + Bij); call this sum Cij. Now, C† has elements (Cji)∗, and by the laws of matrix addition,

(Cji)∗ = (Aji + Bji)∗ = (Aji)∗ + (Bji)∗

The right-hand side is the (i, j) element of A† + B†. Since the law holds for each individual element, it also holds for the whole matrix or operator; therefore:

(A + B)† = C† = A† + B†

The other two identities are proved similarly, working element-wise.
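All three adjoint identities (2.20)-(2.22) can be verified on random complex matrices (an illustrative sketch; the helper name dag is ours).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
c = 2 - 3j
dag = lambda M: M.conj().T                  # conjugate transpose

assert np.allclose(dag(A + B), dag(A) + dag(B))       # (2.20)
assert np.allclose(dag(c * A), np.conj(c) * dag(A))   # (2.21)
assert np.allclose(dag(A @ B), dag(B) @ dag(A))       # (2.22): order reverses
```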


2.11 Spectral Theorem

Theorem: For every normal operator M acting on a vector space V, there exists an orthonormal basis for V in which the operator M has a diagonal representation.
Proof: We prove this theorem by induction on the dimension of the vector space V. For a one-dimensional vector space, any operator is diagonal. So, let M be a normal operator on an n-dimensional vector space V. Let λ be an eigenvalue of M with eigenvector |a⟩, so that M|a⟩ = λ|a⟩. Let P be the projector onto the λ eigenspace of M, and let Q be the complementary projector, so that P + Q = I.
We now make a series of manipulations that express the operator M, acting on V, in terms of operators acting on subspaces of V. This is because, by the induction hypothesis, the spectral theorem already holds on the lower-dimensional subspaces; that is, every normal operator on a proper subspace of V has a diagonal representation in some orthonormal basis for that subspace.

M = IMI
using P + Q = I:
M = (P + Q)M(P + Q) = PMP + PMQ + QMP + QMQ

From these terms, we now need to filter out those on the RHS which evaluate to 0. Take QMP: P projects a vector onto the λ eigenspace; M then acts on the resulting vector, returning a vector in the same eigenspace; Q then projects onto the orthogonal complement of the λ eigenspace. Since, after the action of P and M, the result still lies in the λ eigenspace, the subsequent action of Q produces 0. Therefore, QMP = 0.
We can now look at the term PMQ. Since M is normal,

MM† = M†M
MM†|a⟩ = M†M|a⟩ ⇒ M(M†|a⟩) = λ(M†|a⟩)

We see that M†|a⟩ is also an eigenvector of M with eigenvalue λ. Therefore, by the same argument as above, M† maps the λ eigenspace into itself, and hence QM†P = 0. Taking the adjoint of this equation gives (QM†P)† = PMQ = 0 (as P and Q are Hermitian operators).
Therefore, we have:

M = PMP + QMQ    (2.23)

The operators on the RHS act on subspaces of V, where we have already assumed that the spectral theorem holds. So, if these operators are normal, then they are diagonal in some basis for their respective subspaces.
We can easily show that PMP is normal. Since P projects any vector onto the λ eigenspace, on which M acts simply as multiplication by λ, we have

PMP = λP    (2.24)

P is Hermitian and hence normal, and a scalar multiple of a normal operator is normal; therefore PMP is also normal.


Similarly, QMQ is also normal. (Note: Q² = Q and Q is Hermitian.) We also have:

QM = QMI = QM(P + Q) = QMP + QMQ = QMQ    (2.25)
Similarly: QM† = QM†Q    (2.26)

Using the above equations, we can prove that QMQ is normal:

(QMQ)† = QM†Q
∴ (QMQ)†(QMQ) = QM†Q QMQ
since Q² = Q:  = QM†QMQ
since QM†Q = QM†:  = QM†MQ
since MM† = M†M:  = QMM†Q
since QM = QMQ:  = QMQM†Q
since Q = Q²:  = QMQ QM†Q
therefore:  = (QMQ)(QMQ)†

Therefore, QMQ is normal. Now, since PMP and QMQ are normal operators on the subspaces onto which P and Q project, they are diagonal in some orthonormal bases for those respective subspaces.
Since PMP and QMQ are diagonal, their sum is also diagonal in the combined basis. Therefore, M = PMP + QMQ is diagonal in some orthonormal basis for V. Hence, proved.

2.12 Operator functions

It is possible to extend the notion of functions, defined over complex numbers, to functions defined over operators. It is necessary that these operators be normal. A function on a normal matrix or operator is defined in the following way: let A be some normal operator with spectral decomposition A = Σ_a λa |a⟩⟨a|; then:

f(A) = Σ_a f(λa) |a⟩⟨a|    (2.27)

So, in the above equation, we represent the operator in a diagonal form and then apply the function to each diagonal entry. We can verify the above equation for special functions; take for example A^n for some positive integer n.

A^n |a⟩⟨a| = λa A^{n−1} |a⟩⟨a| = λa² A^{n−2} |a⟩⟨a| = ... = λa^n |a⟩⟨a|    (2.28)

From the completeness theorem, we have Σ_a |a⟩⟨a| = I. Therefore:

A^n = A^n Σ_a |a⟩⟨a| = Σ_a A^n |a⟩⟨a|

From equation (2.28), we see that A^n |a⟩⟨a| = λa^n |a⟩⟨a|. Therefore:

A^n = Σ_a λa^n |a⟩⟨a|    (2.29)

Hence, we have proved equation (2.27) for the special case of the function being a power operation.
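Equation (2.27) can be checked numerically for f(A) = A³ on a Hermitian (hence normal) matrix, using the spectral decomposition returned by np.linalg.eigh (a sketch with our variable names).

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(4, 4))
A = (X + X.T) / 2                     # real symmetric, hence Hermitian and normal

# Spectral decomposition A = sum_a lambda_a |a><a|
evals, evecs = np.linalg.eigh(A)

f = lambda lam: lam ** 3
# Build f(A) = sum_a f(lambda_a) |a><a|, as in (2.27)
fA = sum(f(lam) * np.outer(v, v) for lam, v in zip(evals, evecs.T))

# Compare against computing A^3 directly
assert np.allclose(fA, A @ A @ A)
```

The same construction defines √A or log A, provided f is defined on every eigenvalue, which is one reason the spectral decomposition (and hence normality) is required.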

1. Why must we, in general, consider only normal operators, i.e., operators having a spectral decomposition?

2. How do we prove equation (2.27) for the case of a square-root or a logarithm operation, that is, f(A) = √A and f(A) = log A?


2.12.1 Trace

The trace of a matrix is also a function on the matrix. The trace of a matrix A is defined as the sum of all the diagonal elements of A (A need not be a diagonal matrix). The trace can be written in matrix notation as well as in the outer product form:

tr(A) = ∑_i Aii (2.30)

tr(A) = ∑_i ⟨i|A|i⟩ (2.31)
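A quick numerical check of equations (eq. 2.30) and (eq. 2.31): both give the same value, and (eq. 2.31) does not depend on which orthonormal basis |i⟩ is used. The matrix and basis are randomly generated illustrative choices.

```python
import numpy as np

# Eq. (2.30) vs eq. (2.31); A and the basis Q are illustrative assumptions.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

tr_diagonal = sum(A[i, i] for i in range(3))             # eq. (2.30)

Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))         # random orthonormal basis
tr_outer = sum(Q[:, i] @ A @ Q[:, i] for i in range(3))  # eq. (2.31)

assert np.isclose(tr_diagonal, np.trace(A))
assert np.isclose(tr_outer, np.trace(A))                 # basis independent
```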

2.13 Simultaneous Diagonalizable Theorem

Theorem: Two Hermitian matrices A and B commute if and only if there exists an orthonormal basis in which the matrix representations of A and B are both diagonal. We say that A and B are simultaneously diagonalizable if there exists some basis in which the matrix representations of A and B are diagonal.
Proof: Let A and B be two operators that are diagonal in a common orthonormal basis; that is, they have a common eigenbasis. Let the eigenbasis be denoted by the set of eigenstates labelled |a, b⟩, where a and b are the corresponding eigenvalues of A and B. So, we have:

AB|a, b⟩ = bA|a, b⟩ = ab|a, b⟩ (2.32)
BA|a, b⟩ = aB|a, b⟩ = ba|a, b⟩ (2.33)

On subtracting the above two equations: (eq. 2.32) - (eq. 2.33), we get

(AB − BA)|a, b⟩ = (ab − ba)|a, b⟩ = 0 ⇒ AB − BA = 0 ⇒ [A, B] = 0 (2.34)

This shows that if A and B have a common eigenbasis, then they commute.
Proof of the converse: Let |a, j⟩ be the eigenstates of the operator A with eigenvalue a, where the index j labels the degeneracy. Let the eigenstates with eigenvalue a span the vector space Va, and let Pa be the projection operator onto the Va eigenspace. Now let us assume that [A, B] = 0. Therefore, we have:

AB|a, j⟩ = BA|a, j⟩ = aB|a, j⟩ (2.35)

Therefore, B|a, j⟩ is also an element of the eigenspace Va. Let us define an operator

Ba = PaBPa (2.36)

We can now see how Ba acts on an arbitrary vector. From definition (def. 2.36), the rightmost Pa cuts off all components of the vector which do not belong to the Va eigenspace. By equation (eq. 2.35), B maps vectors in Va to vectors in Va, so the leftmost Pa leaves the result unchanged. Thus Ba acts entirely within Va. Its adjoint is Ba† = PaB†Pa, and since B is Hermitian (B† = B), we get Ba† = PaBPa = Ba. Therefore Ba and Ba† act identically on every vector; in other words, the restriction of Ba to the space Va is Hermitian on Va.
Since Ba is a Hermitian operator on Va, it must have a spectral decomposition in terms of an orthonormal set of eigenvectors in Va. These eigenvectors are eigenvectors of both A (since they belong to the Va eigenspace) and Ba (since they are part of the spectral decomposition of Ba). Let us call them |a, b, k⟩, where the indices a, b denote the eigenvalues of the A and B operators respectively, and k labels the degeneracy.
We have the eigenvalue equation Ba|a, b, k⟩ = b|a, b, k⟩. Since |a, b, k⟩ is an element of the Va eigenspace, we have:

Pa|a, b, k⟩ = |a, b, k⟩ (2.37)

From equation (eq. 2.35), we also have that B|a, b, k⟩ is an element of the space Va. So, similarly, we can say:

PaB|a, b, k⟩ = B|a, b, k⟩

We can now modify the above equation by replacing |a, b, k⟩ on the LHS with Pa|a, b, k⟩ (refer to equation (eq. 2.37)):

B|a, b, k⟩ = PaBPa|a, b, k⟩ (2.38)

Comparing the RHS of the above equation with equation (eq. 2.36), we see that it is the same as Ba|a, b, k⟩. Since |a, b, k⟩ is an eigenstate of Ba with eigenvalue b, we can rewrite equation (eq. 2.38) as:

B|a, b, k⟩ = b|a, b, k⟩ (2.39)

Therefore, in the above equation, we see that |a, b, k⟩ is also an eigenstate of B with eigenvalue b. Hence, the set of vectors |a, b, k⟩ forms a common eigenbasis for A and B. We have thus proved that if [A, B] = 0, then there exists an orthonormal basis in which A and B are both diagonal.
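The theorem can be checked numerically. In the sketch below, two Hermitian matrices built to be diagonal in the same randomly chosen orthonormal basis are seen to commute, and conversely the eigenbasis of one diagonalizes the other; all specific matrices are illustrative constructions.

```python
import numpy as np

# Illustrative check of the simultaneous diagonalizability theorem.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # common orthonormal eigenbasis

A = Q @ np.diag([1.0, 2.0, 3.0, 4.0]) @ Q.T        # Hermitian, diagonal in basis Q
B = Q @ np.diag([5.0, 6.0, 7.0, 8.0]) @ Q.T

# Forward direction: a common eigenbasis implies [A, B] = 0.
assert np.allclose(A @ B, B @ A)

# Converse: the eigenvectors of A (distinct eigenvalues) also diagonalize B.
_, V = np.linalg.eigh(A)
BV = V.T @ B @ V
assert np.allclose(BV, np.diag(np.diag(BV)))       # V^T B V is diagonal
```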

2.14 Polar Decomposition

Theorem: For every matrix A, there exist a unitary matrix U and positive semidefinite matrices J and K, such that:

A = UJ = KU (2.40)

In other words, every matrix A factors both as a unitary times a positive matrix (A = UJ) and as a positive matrix times a unitary (A = KU), with the same unitary U in both, so that KU = UJ.

Proof: Consider the operator J = √(A†A). By construction, the operator J is Hermitian and positive. Therefore, there exists a spectral decomposition for J (involving its eigenvalues and its eigenstates). Let J = ∑_i λi|i⟩⟨i|, where λi (≥ 0) and |i⟩ are the eigenvalues and the corresponding eigenstates of J.

Let us define

|ψi⟩ = A|i⟩ (2.41)

From this, we can see that ⟨ψi|ψi⟩ = λi². Now, consider all the non-zero eigenvalues, that is, all λi ≠ 0.

Let

|ei⟩ = |ψi⟩/λi (2.42)

Therefore, we have ⟨ei|ej⟩ = ⟨ψi|ψj⟩/(λiλj) = δij (since all the λi are real). We are still considering the λi ≠ 0 case. Let us now construct an orthonormal basis using the Gram–Schmidt orthonormalization technique, by


starting off with the vectors |ei⟩ and completing them to a full orthonormal basis {|ei⟩}. Let us now define an operator U = ∑_i |ei⟩⟨i|. When λi ≠ 0, we have UJ|i⟩ = U(λi|i⟩), since |i⟩ is an eigenstate of J. Now, U(λi|i⟩) = λiU|i⟩, and using the definition of the operator U, we have U|i⟩ = |ei⟩. Therefore UJ|i⟩ = λi|ei⟩. From definition (def. 2.42), λi|ei⟩ = |ψi⟩, where again from definition (2.41) we have |ψi⟩ = A|i⟩. So, we can summarize to say that

UJ|i⟩ = A|i⟩ (2.43)

When λi = 0, we have UJ|i⟩ = U(λi|i⟩) = 0, and also |ψi⟩ = A|i⟩ = 0, since ⟨ψi|ψi⟩ = λi² = 0. So we again have UJ|i⟩ = A|i⟩. Therefore, the operators UJ and A agree with each other on all vectors of the |i⟩ basis, and hence

A = UJ (2.44)

This gives the right polar decomposition of A. Also, we can prove that the matrix J is unique for every A. This can be done by multiplying equation (eq. 2.44) with its adjoint. So, we have:

A†A = JU†UJ

Since U is unitary and J is Hermitian: A†A = J²

∴ J = √(A†A) (2.45)

We can also get an expression for the operator U from equation (eq. 2.44), by post-multiplying both sides by J⁻¹. We have:

U = AJ⁻¹ (2.46)

The above definition is, however, possible only if J is invertible (non-singular), and J is invertible exactly when A is. Therefore, we see from equation (eq. 2.46) that if A is invertible, then U is also uniquely defined for every A. We can also obtain the left polar decomposition, starting from equation (eq. 2.44). On post-multiplying the RHS by U†U (since U is unitary, this preserves the equality) we get:

A = UJU†U

Now, let UJU† = K. So, we can rewrite the above equation as:

A = KU (2.47)

This now gives us the left polar decomposition of A. We can also show that the matrix K is uniquely defined for every A. For this, we multiply equation (eq. 2.47) with its adjoint. We get

AA† = KUU†K

Since U is unitary, we have AA† = K²

∴ K = √(AA†) (2.48)

Therefore, we see that K is uniquely defined. Similarly, if A is invertible, then the matrix U is uniquely defined. Combining both the left and right parts, we have proved the polar decomposition.
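Both factorizations can be computed in practice; the sketch below uses numpy's built-in SVD routine to form U, J and K as in equations (eq. 2.44)–(eq. 2.48) and verifies them. The matrix A is an arbitrary illustrative (real, invertible) example, not taken from the text.

```python
import numpy as np

# Polar decomposition A = UJ = KU (eq. 2.40), computed via the SVD:
# if A = W S Vh, then U = W Vh, J = Vh^T S Vh = sqrt(A^dag A),
# K = W S W^T = sqrt(A A^dag). All numbers are illustrative.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

W, S, Vh = np.linalg.svd(A)
U = W @ Vh                           # unitary factor
J = Vh.T @ np.diag(S) @ Vh           # right positive factor, eq. (2.45)
K = W @ np.diag(S) @ W.T             # left positive factor, eq. (2.48)

assert np.allclose(U @ J, A)                 # A = UJ
assert np.allclose(K @ U, A)                 # A = KU
assert np.allclose(U.T @ U, np.eye(2))       # U is unitary (orthogonal here)

# J agrees with sqrt(A^dag A) computed from the spectral decomposition.
evals, evecs = np.linalg.eigh(A.T @ A)
assert np.allclose(J, evecs @ np.diag(np.sqrt(evals)) @ evecs.T)
```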

2.15 Singular value decomposition

Theorem: For every matrix A, there exist unitary matrices U and V and a diagonal matrix D with non-negative entries, such that:

A = UDV (2.49)

The diagonal entries of the matrix D are known as the singular values of A.


2.15.1 Proving the theorem for the special case of square matrices:

Before proving, we first establish a simple fact:

1. If A and B are two unitary matrices, then their product AB is also a unitary matrix.
Proof: To prove (AB)†(AB) = I, given A†A = AA† = B†B = BB† = I:

(AB)†(AB) = B†A†AB = B†IB = B†B = I

∴ (AB)†(AB) = I

Proof: From the polar decomposition of A, we know that there exist a unitary matrix S and a positive operator J such that A = SJ. Since J is Hermitian (it is defined as √(A†A); refer to equation (eq. 2.45)), there exists a spectral decomposition for J. So we have some unitary matrix T and a diagonal matrix D (with non-negative entries) such that J = TDT†. Now, we can rewrite A as:

A = SJ = STDT†

Since S and T are unitary, from fact (1) we can define another unitary operator U = ST. Also, since T is unitary (which implies that T† is unitary too), we can define a new unitary operator V = T†. Putting all the definitions together, we have:

A = STDT† = UDV (2.50)

The above equation (eq. 2.50) proves the singular value decomposition theorem for this case.

2.15.2 Proving the theorem for the general case:

Proof: Before getting into this proof, we first need to make some assumptions.

1. For every operator A, the operator A†A is Hermitian.

2. The eigenvalues of a Hermitian operator are real; for a positive semidefinite operator such as A†A they are, moreover, non-negative.

3. The eigenvectors, corresponding to different eigenvalues, of a Hermitian operator are orthogonal.

Let us now construct the Hermitian operator A†A. Consider the eigenvalue equation for this operator:

A†A|λi⟩ = λi|λi⟩ (2.51)

From assumption (2), the eigenvalues λi are all real and non-negative. Order them so that λi > 0 for 0 < i ≤ r, and λi = 0 for (r+1) ≤ i ≤ n. Then the set of eigenvectors |λ1⟩, |λ2⟩, |λ3⟩, . . . , |λr⟩ is orthogonal to the set of eigenvectors |λr+1⟩, |λr+2⟩, . . . , |λn⟩, as they correspond to different eigenvalues (see assumption 3).


Let us now construct a few operators:

V = ( |λ1⟩ |λ2⟩ |λ3⟩ . . . |λr⟩ |λr+1⟩ . . . |λn⟩ ) (2.52)

D = diag( √λ1, √λ2, . . . , √λr, . . . , √λn ) (2.53)

U = ( |µ1⟩ |µ2⟩ |µ3⟩ . . . |µr⟩ |µr+1⟩ . . . |µm⟩ ) (2.54)

where |µi⟩ = (1/√λi) A|λi⟩ for 1 ≤ i ≤ r (2.55)

and the remaining |µi⟩ are chosen so that the set {|µ1⟩, |µ2⟩, . . . , |µr⟩} is orthogonal to {|µr+1⟩, . . . , |µm⟩} (2.56)

Now, having these definitions, we can perform the product UDV†:

UD = ( √λ1|µ1⟩ √λ2|µ2⟩ √λ3|µ3⟩ . . . √λr|µr⟩ √λr+1|µr+1⟩ . . . √λm|µm⟩ ) (2.57)

From equation (eq. 2.55), the entries √λ1|µ1⟩, . . . , √λr|µr⟩ of the matrix UD simplify to A|λ1⟩, . . . , A|λr⟩. The entries after the r-th column can be left unchanged.

∴ UD = ( A|λ1⟩ A|λ2⟩ A|λ3⟩ . . . A|λr⟩ √λr+1|µr+1⟩ . . . √λm|µm⟩ ) (2.58)

⇒ UDV† = ( A|λ1⟩ A|λ2⟩ . . . A|λr⟩ √λr+1|µr+1⟩ . . . √λm|µm⟩ ) · V†, where V† is the column of bras ( ⟨λ1| ; ⟨λ2| ; . . . ; ⟨λr| ; ⟨λr+1| ; . . . ; ⟨λn| ) (2.59)

∴ UDV† = A|λ1⟩⟨λ1| + A|λ2⟩⟨λ2| + · · · + A|λr⟩⟨λr| + √λr+1|µr+1⟩⟨λr+1| + · · · + √λm|µm⟩⟨λm| (2.60)

Since the eigenvalues λi are all zero for i > r, the terms in equation (eq. 2.60) that succeed A|λr⟩⟨λr| vanish (their coefficients √λi are zero).

UDV† = A ∑_{i=1}^{r} |λi⟩⟨λi| (2.61)

Also, since λi = 0 for r < i ≤ n, we have ⟨λi|A†A|λi⟩ = λi = 0, so the norm of A|λi⟩ vanishes; hence A|λi⟩ = 0 and A|λi⟩⟨λi| = 0 for r < i ≤ n. So, we can write the above equation (eq. 2.61) as:

UDV† = A ∑_{i=1}^{r} |λi⟩⟨λi| + ∑_{i=r+1}^{n} A|λi⟩⟨λi| (2.62)

⇒ UDV† = A ∑_{i=1}^{n} |λi⟩⟨λi| (2.63)

The sum above is the completeness relation. So, we can write:

UDV† = AI ∴ UDV† = A (2.64)

Hence, we have proved the singular value decomposition theorem.
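The construction can be verified numerically: the singular values are the square roots of the eigenvalues of A†A (eq. 2.51). The matrix below is an arbitrary illustrative example, and numpy's svd routine is assumed.

```python
import numpy as np

# Numerical check of eq. (2.64): A = U D V^dag, with singular values equal
# to sqrt(eigenvalues of A^dag A). A is an illustrative random matrix.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))

U, s, Vh = np.linalg.svd(A)
assert np.allclose(U @ np.diag(s) @ Vh, A)        # A = U D V^dag

lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # eigenvalues of A^dag A
assert np.allclose(lam, s**2)                     # lambda_i = (singular value)^2
```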


Chapter 3

Elementary Group Theory

3.1 Structure of a Group

Group: A group (G, ∗) is a set G, along with a binary operation ∗, such that:

• Closure: G is closed under the binary operation: (a ∗ b) ∈ G ∀ a, b ∈ G

• Associativity: the binary operation on the elements of G is associative: a ∗ (b ∗ c) = (a ∗ b) ∗ c ∀ a, b, c ∈ G

• Identity: ∃ a unique element e ∈ G such that a ∗ e = e ∗ a = a ∀ a ∈ G. e is called the identity element of G.

• Inverse: for every a ∈ G, ∃ b ∈ G such that a ∗ b = b ∗ a = e, where e is the identity element of G. b is called the inverse1 of a, denoted a⁻¹.

Abelian: A group is called Abelian if, in addition to the above, the following property also holds:

• Commutativity: a ∗ b = b ∗ a ∀ a, b ∈ G.

Order of a group: The order of a group (G, ∗), denoted |(G, ∗)|, is the cardinality of the set G: |(G, ∗)| = #G. Order of an element of the group: the order of an element g ∈ (G, ∗) is the smallest positive integer n for which gⁿ = e.

3.1.1 Cayley Table

As the group is defined along with a binary operation ∗, we need to define this operation for every pair of elements in G. To do this in a compact manner, we have the following table.

1Note that the inverse of an element depends upon the individual element and is not unique for a group, unlike the identity element.


Table 3.1: Cayley table for (G, ∗). The (i, j) entry of the table gives gi ∗ gj. g1 is taken as the identity element e.

∗  | g1       g2       g3       g4       . . .
g1 | g1 ∗ g1  g1 ∗ g2  g1 ∗ g3  g1 ∗ g4  . . .
g2 | g2 ∗ g1  . . .    . . .    g2 ∗ g4  . . .
g3 | g3 ∗ g1  . . .    g3 ∗ g3  . . .    . . .
g4 | g4 ∗ g1  . . .    . . .    g4 ∗ g4  . . .
. . .

Constructing the Cayley Table

Note that:

1. a ∗ b = a ∗ c implies b = c ∀ a, b, c ∈ G.
Proof: Pre-multiplying by a⁻¹ on both sides: a⁻¹ ∗ (a ∗ b) = a⁻¹ ∗ (a ∗ c). Since a⁻¹ ∗ a = e ∀ a ∈ G, we have e ∗ b = e ∗ c ⇒ b = c. Hence the binary operation on two pairs of elements produces distinct results unless the pairs are the same. Since along a particular row (or column) each pair of elements involved in the binary operation is distinct, so is the result. Hence each element along a particular row (or column) appears exactly once, as there are #G positions along a row (or column).

2. Now, in the previous result, putting b → a⁻¹ and c → b⁻¹, we see that a ∗ a⁻¹ = a ∗ b⁻¹ implies a⁻¹ = b⁻¹. Hence each element in G has a unique inverse.

3. ∀ a, b ∈ G for which [a, b] = 0, i.e. a ∗ b = b ∗ a, the entries for a ∗ b and b ∗ a are symmetric w.r.t. the diagonal. Since ∀ a ∈ G we have [a, a⁻¹] = 0 and a ∗ a⁻¹ = e, all the e’s are placed symmetric w.r.t. the diagonal.

We now take a fixed example. Consider G = {e, a, b, c, d, f} with the binary operation ‘·’. Let us construct the Cayley table for (G, ·).

1. As there are 5 elements (apart from e) and each of them must have a unique inverse, we see that (since there are an odd number of them) at least one must be its own inverse. Just by choice we take a, b, c to be their own inverses: a⁻¹ = a, b⁻¹ = b, c⁻¹ = c. For the other two elements d and f, we assume them to be inverses of each other: d⁻¹ = f and f⁻¹ = d. Hence a · a = b · b = c · c = d · f = f · d = e. We now begin by placing the e’s. Notice that they are symmetric about the diagonal.

2. Also note that g · e = g ∀g ∈ G. Hence the first row and first column are trivially filled.

3. Just by choice, we take a · b = c. From this, we get:

a · b = c (3.1)

a · c = a · (a · b) = (a · a) · b = b   ∴ a · c = b (3.2)

Using (eq. 3.2): b · c = (a · c) · c = a · (c · c) = a   ∴ b · c = a (3.3)


4. Notice that the first row has two vacant positions. Using (statement 1), these two positions must be filled by d and f. Hence consider the following two possibilities:

• a · d = d:

d = a · d
d · f = (a · d) · f
e = a · (d · f) = a · e
∴ e = a

This is a contradiction, as it would make the identity element non-unique.

• Hence the only other option is a · d = f:

f = a · d (3.4)

f · f = (a · d) · f = a · (d · f) = a   ∴ f · f = a (3.5)

Using (eq. 3.4): f · d = (a · d) · d ⇒ e = a · (d · d)
a · e = (a · a) · (d · d) ⇒ a = d · d   ∴ d · d = a (3.6)

Using (eq. 3.6): a · (f · d) = a · e = a = d · d
(a · (f · d)) · f = (d · d) · f ⇒ a · f = d · (d · f) = d   ∴ a · f = d (3.7)

5. Similarly, we need to determine (c · a), (c · b) and (b · a):

Using (eq. 3.1): c · a = (a · b) · a = a · (b · a)
Since a, b, c are their own inverses, c · a = (a · c)⁻¹ = b⁻¹ = b, so b = a · (b · a).
Pre-multiplying by a: a · b = (a · a) · (b · a) = b · a, i.e. c = b · a   ∴ b · a = c (3.8)

6. Hence the first-row elements as well as their symmetric counterparts are determined. Notice that the positions marked by ‘♣’ have to be filled with either d or f, as the row containing them already has the other symbols. But neither can be used, as the column containing each already has both d and f. This shows that the initial assumption a · b = c is wrong.


Table 3.2: Cayley table for G = {e, a, b, c, d, f}. Positions marked ‘−’ have not yet been filled.

· | e a b c d f
e | e a b c d f
a | a e c b f d
b | b c e a ♣ ♣
c | c − − e − −
d | d − − − − e
f | f − − − e a

7. Notice that a · b = a and a · b = b are invalid, as we would then get b = e and a = e respectively. With a · b = c also ruled out, we see that the only two options are a · b = d and a · b = f. Let us take a · b = d. With this we have:

a · b = d (3.9)
Using (eq. 3.9): a · (a · b) = a · d ⇒ b = a · d   ∴ a · d = b (3.10)
Using (eq. 3.9): (a · b) · b = d · b ⇒ a = d · b   ∴ d · b = a (3.11)
Using (eq. 3.10): b · f = (a · d) · f = a · (d · f) = a   ∴ b · f = a (3.12)
Using (eq. 3.12): b · (b · f) = b · a ⇒ f = b · a   ∴ b · a = f (3.13)
Using (eq. 3.13): (b · a) · a = f · a ⇒ b = f · a   ∴ f · a = b (3.14)

Filling the table now gives:

Table 3.3: Cayley table for G = {e, a, b, c, d, f}. Positions marked ‘−’ have not yet been filled.

· | e a b c d f
e | e a b c d f
a | a e d − b −
b | b f e − − a
c | c − − e − −
d | d − a − − e
f | f b − − e −


8. Notice that in the row corresponding to a, we have two vacancies, for a · c and a · f. These must be filled using c or f, as the other elements are already contained in this row. Also, a · c ≠ c (for if it were, we would have a = e, which is incorrect), so the only assignments for the vacancies are:

a · c = f (3.15)
a · f = c (3.16)

Using (eq. 3.15): (a · c) · c = f · c ⇒ a = f · c   ∴ f · c = a (3.17)
Using (eq. 3.16): (a · f) · d = c · d ⇒ a · (f · d) = c · d ⇒ a = c · d   ∴ c · d = a (3.18)
Using (eq. 3.18): c · (c · d) = c · a ⇒ d = c · a   ∴ c · a = d (3.19)
Using (eq. 3.19): (c · a) · a = d · a ⇒ c = d · a   ∴ d · a = c (3.20)

Filling the table, we get:

Table 3.4: Cayley table for G = {e, a, b, c, d, f}. Positions marked ‘−’ have not yet been filled.

· | e a b c d f
e | e a b c d f
a | a e d f b c
b | b f e − − a
c | c d − e a −
d | d c a − − e
f | f b − a e −

9. Notice that in the row corresponding to b, there are two vacancies, for b · c and b · d, which must be filled using c and d, as this row already contains the rest of the elements. The option b · c = c (and hence b · d = d) is ruled out since we would then get b = e. Hence the only option for filling the two positions is to put:

b · c = d (3.21)
b · d = c (3.22)

Using (eq. 3.21): (b · c) · c = d · c ⇒ b = d · c   ∴ d · c = b (3.23)

Now, the only vacant position in the row corresponding to d (for d · d) must be filled with f:

d · d = f (3.24)
Using (eq. 3.22): (b · d) · f = c · f ⇒ b · (d · f) = c · f ⇒ b = c · f   ∴ c · f = b (3.25)

Finally, in the row corresponding to f, the only vacancy (for f · f) must be filled with d:

f · f = d (3.26)


Hence we have the final Cayley table:

Table 3.5: Final Cayley table for G = {e, a, b, c, d, f}.

· | e a b c d f
e | e a b c d f
a | a e d f b c
b | b f e d c a
c | c d f e a b
d | d c a b f e
f | f b c a e d
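The completed table can be checked mechanically. The sketch below encodes Table 3.5 and verifies the group axioms (closure, associativity, identity, unique inverses); it also confirms that the group is non-Abelian.

```python
from itertools import product

# Verify that Table 3.5 defines a group.
elements = ["e", "a", "b", "c", "d", "f"]
rows = {
    "e": ["e", "a", "b", "c", "d", "f"],
    "a": ["a", "e", "d", "f", "b", "c"],
    "b": ["b", "f", "e", "d", "c", "a"],
    "c": ["c", "d", "f", "e", "a", "b"],
    "d": ["d", "c", "a", "b", "f", "e"],
    "f": ["f", "b", "c", "a", "e", "d"],
}
op = {(x, y): rows[x][elements.index(y)] for x in elements for y in elements}

# Closure: every table entry is one of the six elements.
assert all(op[x, y] in elements for x, y in product(elements, repeat=2))

# Associativity: x*(y*z) == (x*y)*z for all 216 triples.
assert all(op[x, op[y, z]] == op[op[x, y], z]
           for x, y, z in product(elements, repeat=3))

# Identity and a unique inverse for each element.
assert all(op["e", x] == x and op[x, "e"] == x for x in elements)
assert all(sum(op[x, y] == "e" for y in elements) == 1 for x in elements)

# Non-Abelian: a.b = d while b.a = f.
assert op["a", "b"] == "d" and op["b", "a"] == "f"
```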

3.1.2 Subgroups

Subgroup: (H, ∗) is a subgroup of (G, ∗) if H ⊆ G and (H, ∗) satisfies the group properties. It is called an Abelian subgroup if it also satisfies the commutative law. We can now infer a few general properties:

• Every subgroup of a group contains the identity element of the group. As (H, ∗) is also a group, it has an identity element. Moreover, since the identity element of a group is unique, (G, ∗) and (H, ∗) must contain the same identity element, which is eG. Therefore (H, ∗) contains eG.

• For every group, the set containing just the identity element is a subgroup (of that group). This is because the identity element is contained in the set by definition, it is its own inverse, and the set is closed under the group operation. The subgroup containing just the identity element is called a trivial subgroup.

Cosets

The coset of a subgroup (with respect to an element of the group) is the set containing the results of the binary operation between the given element and every element of the subgroup. Since the group in general is non-Abelian, if (H, ∗) is a subgroup of (G, ∗), then for some g ∈ (G, ∗) and h ∈ (H, ∗), g ∗ h need not equal h ∗ g. Therefore we need to specify on which side a particular binary operation acts; hence we have ‘left’ and ‘right’ cosets.
Left and Right Cosets: The left and right cosets of (H, ∗) in (G, ∗) are defined, for some g ∈ (G, ∗), by:

Left Coset : g ∗ (H, ∗) = {g ∗ h | h ∈ (H, ∗)} (3.27)
Right Coset : (H, ∗) ∗ g = {h ∗ g | h ∈ (H, ∗)} (3.28)

We can now look at a few properties of cosets.

1. From the definitions in (eq. 3.27) and (eq. 3.28), the number of elements in the left and right cosets ofa subgroup in a group is equal to the order of the subgroup.

2. Claim: g ∗ (H, ∗) = (H, ∗) ∗ g = (H, ∗) ∀ g ∈ (H, ∗).
Justification: Since (H, ∗) forms a group, it is closed under the operation. Hence from (eq. 3.27) we see that ∀ g, h ∈ (H, ∗), g ∗ h ∈ (H, ∗), and similarly from (eq. 3.28), h ∗ g ∈ (H, ∗). So every element of g ∗ (H, ∗) and of (H, ∗) ∗ g lies in (H, ∗). Conversely, every h′ ∈ (H, ∗) can be written as g ∗ (g⁻¹ ∗ h′), so it lies in g ∗ (H, ∗); similarly for the right coset. Hence we have the justification.

3. Claim: the identity element of (G, ∗), eG, is contained in some left coset and some right coset of (H, ∗), namely eG ∗ (H, ∗) = (H, ∗) itself.
Justification: It suffices to exhibit g ∈ (G, ∗) and h ∈ (H, ∗) such that eG = g ∗ h and eG = h ∗ g. Since both (G, ∗) and (H, ∗) contain eG, setting g = h = eG provides the justification.

4. Claim: every element of (G, ∗) is present in exactly one of the left cosets of (H, ∗) in (G, ∗).
Justification: Suppose an element is present in two left cosets g1 ∗ (H, ∗) and g2 ∗ (H, ∗). Then ∃ h1, h2 ∈ (H, ∗) such that g1 ∗ h1 = g2 ∗ h2, so g1 = g2 ∗ (h2 ∗ h1⁻¹), with h2 ∗ h1⁻¹ ∈ (H, ∗). Hence every element g1 ∗ h of the first coset equals g2 ∗ ((h2 ∗ h1⁻¹) ∗ h) and lies in the second, and conversely; the two left cosets are identical. Hence we have the justification.

5. Claim: every pair of left cosets (of (H, ∗) in (G, ∗)) is either disjoint or identical.
Justification: In the previous statement we showed that no two left cosets can share an element unless they are identical, implying that every pair of left cosets is disjoint unless they are alike. Hence we have the justification.

6. Claim: ∃ no g ∈ (G, ∗) which is not present in any left coset of (H, ∗) in (G, ∗).
Justification: It suffices to show that g ∈ g ∗ (H, ∗) ∀ g ∈ (G, ∗). Since (H, ∗) contains the identity element eG, we have g = g ∗ eG ∈ g ∗ (H, ∗), thereby justifying the claim.

7. From the claims in (statement 5) and (statement 6), we see that the left cosets of a subgroup in a group are all disjoint and cover all the elements of the group. In other words, they partition the group, with each partition being a coset. Notice that the # elements in (G, ∗) is |(G, ∗)|, and from (statement 1) the # elements in a left coset is |(H, ∗)|. Hence the # left cosets needed to partition (G, ∗) is |(G, ∗)|/|(H, ∗)|. This quantity is denoted by [(G, ∗) : (H, ∗)] and is called the index of (H, ∗); the statement that it is a whole number, i.e. that the order of a subgroup divides the order of the group, is Lagrange’s Theorem.
Index of a Subgroup: The index of a subgroup of a group is the # left cosets of the subgroup required to partition the group:

[(G, ∗) : (H, ∗)] = |(G, ∗)| / |(H, ∗)| (3.29)
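The partition and the index formula can be illustrated concretely. The sketch below uses the hypothetical example G = Z12 (integers under addition mod 12) with subgroup H = {0, 4, 8}; these specific groups are illustrative choices, not from the text.

```python
# Cosets and Lagrange's theorem (eq. 3.29) for the illustrative example
# G = Z_12, H = {0, 4, 8}.
G = list(range(12))
H = [0, 4, 8]

left_cosets = {tuple(sorted((g + h) % 12 for h in H)) for g in G}

# The distinct cosets partition G: disjoint, and together they cover G.
covered = sorted(x for coset in left_cosets for x in coset)
assert covered == G

# Index of H in G: |G| / |H| distinct cosets.
assert len(left_cosets) == len(G) // len(H)
```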

Normal Subgroup: a normal subgroup (N, ∗) of a group (G, ∗) is one for which the left and right cosets of every element a ∈ (G, ∗) are equal:

Normal Subgroup (N, ∗) of (G, ∗) : a ∗ N = N ∗ a ∀ a ∈ (G, ∗) (3.30)

3.1.3 Quotient Groups

A single coset does not by itself form a group. So we consider the set of all cosets of a normal subgroup (left or right; for a normal subgroup it does not matter). Let (N, ∗) be a normal subgroup of (G, ∗). Consider the set S = {a ∗ N | a ∈ (G, ∗)}. We claim this set forms a group. To check this we have:

• Identity, inverse and associativity are readily verified: the coset N = eG ∗ N is the identity element, the inverse of a ∗ N is a⁻¹ ∗ N (since a has an inverse in (G, ∗)), and associativity is inherited from (G, ∗).

• Closure: we need to show that (a ∗ N) ∗ (b ∗ N) ∈ S ∀ (a ∗ N), (b ∗ N) ∈ S.
For this it suffices to show that (a ∗ N) ∗ (b ∗ N) = (c ∗ N) for some c ∈ (G, ∗). Now notice that:

(a ∗ N) ∗ (b ∗ N) = a ∗ ((N ∗ b) ∗ N)
using the normal subgroup’s coset property (N ∗ b = b ∗ N): = (a ∗ b) ∗ (N ∗ N)
= (a ∗ b) ∗ N


Now, since a, b ∈ (G, ∗), we have (a ∗ b) ∈ (G, ∗), thereby justifying the claim. Hence we see that S forms a group under the operation ∗. This group formed by the elements of S is called the quotient group of (G, ∗) and is represented as (G, ∗)/(N, ∗).
Quotient Group: The quotient group of (G, ∗) by its normal subgroup (N, ∗) is the set {g ∗ N | g ∈ (G, ∗)} containing all the cosets (right or left) of its elements in the normal subgroup. It is represented as (G, ∗)/(N, ∗). More generally, since the cosets of a subgroup partition the group, the quotient group is the group formed by that partition of (G, ∗), with the operation defined analogously.
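A concrete sketch of the quotient construction, again with the illustrative choice G = Z12 and normal subgroup N = {0, 4, 8} (every subgroup of an Abelian group is normal):

```python
# Quotient group sketch: the cosets of N in Z_12 form a group of order 4.
N = frozenset({0, 4, 8})
cosets = {frozenset((g + n) % 12 for n in N) for g in range(12)}
assert len(cosets) == 4                     # |G| / |N| cosets

def coset_product(A, B):
    """(a*N)*(b*N): combine every pair of representatives."""
    return frozenset((a + b) % 12 for a in A for b in B)

# Closure: the product of two cosets is again a coset.
for A in cosets:
    for B in cosets:
        assert coset_product(A, B) in cosets

# N itself is the identity element of the quotient group.
assert all(coset_product(N, A) == A for A in cosets)
```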

3.1.4 Normalizers and centralizers

For any two elements A, B of a group (G, ∗), we note that unless the group is Abelian, the result of the binary operation of the two elements depends upon the order in which they are considered: A ∗ B ≠ B ∗ A in general. The difference between these two unequal quantities is defined as the commutator of A and B.
Commutator: The commutator of any two elements A and B of a group (G, ∗) is defined as: [A, B] = A ∗ B − B ∗ A.
From the definition of the commutator we can verify that ∀ g1, g2 ∈ (G, ∗): [g1, g1] = 0 and [g1, g2] = −[g2, g1].
Based on this commutator we have the following sets associated with the group elements:
Center of a group: The center of a group, Z(G, ∗), is the set of all elements in (G, ∗) that commute with all the elements in (G, ∗). Hence Z(G, ∗) = {g ∈ (G, ∗) | g ∗ x = x ∗ g ∀ x ∈ (G, ∗)}.
Since in an Abelian group all the elements commute with each other, we see that the center of an Abelian group is the group itself, i.e., Z(G, ∗) = (G, ∗) for all Abelian groups (G, ∗).
For any subgroup (of a group) we define the following two sets:
Normalizer: The normalizer of a subgroup (H, ∗) of (G, ∗) is the set:

N(H, ∗) = {g ∈ (G, ∗) | (g ∗ h) ∗ g⁻¹ ∈ (H, ∗) ∀ h ∈ (H, ∗)} (3.31)

Centralizer: The centralizer of a subgroup (H, ∗) of (G, ∗) is the set:

Z(H, ∗) = {g ∈ (G, ∗) | (g ∗ h) ∗ g⁻¹ = h ∀ h ∈ (H, ∗)} (3.32)

We can now look at some properties of the centralizers and the normalizers of a subgroup.

1. Immediately one can see that the elements of Z(H, ∗) form a subset of those of N(H, ∗).

2. The centralizer of a subgroup forms a subgroup of the underlying group, i.e., Z(H, ∗) is a subgroup of (G, ∗).
Justification: Notice that the definition of the centralizer can also be given as the set of elements of the group that commute with every element of the subgroup: Z(H, ∗) = {g ∈ (G, ∗) | g ∗ h = h ∗ g ∀ h ∈ (H, ∗)}. With this definition we can verify the group properties of Z(H, ∗).

• Identity: trivially, the identity commutes with all the elements of the group, and hence it is in Z(H, ∗).

• Inverse: if x ∈ Z(H, ∗) then x⁻¹ ∈ Z(H, ∗).
Justification: We have x ∗ y = y ∗ x ∀ y ∈ (H, ∗). It suffices to show x⁻¹ ∗ y = y ∗ x⁻¹ ∀ y ∈ (H, ∗). Since x ∈ (G, ∗), which is a group, ∃ x⁻¹ ∈ (G, ∗) such that x ∗ x⁻¹ = eG.

x ∗ y = y ∗ x
x⁻¹ ∗ (x ∗ y) ∗ x⁻¹ = x⁻¹ ∗ (y ∗ x) ∗ x⁻¹
(x⁻¹ ∗ x) ∗ (y ∗ x⁻¹) = (x⁻¹ ∗ y) ∗ (x ∗ x⁻¹)
y ∗ x⁻¹ = x⁻¹ ∗ y

Hence justifying the claim.

• Closure: a ∗ x ∈ Z(H, ∗) ∀ x, a ∈ Z(H, ∗).
Justification: It suffices to show that (a ∗ x) ∗ y = y ∗ (a ∗ x) ∀ y ∈ (H, ∗). Since x and a are elements of Z(H, ∗), each commutes with every element of (H, ∗):

(a ∗ x) ∗ y = a ∗ (x ∗ y) = a ∗ (y ∗ x) = (a ∗ y) ∗ x = (y ∗ a) ∗ x = y ∗ (a ∗ x)

Hence justifying the claim.

Hence, from the above statements, it can be seen that Z(H, ∗) is a group; moreover, it is a subgroup of (G, ∗).

Note that the center of a group is not to be confused with the centralizer of a subgroup in a group. The former is the set of elements in (G, ∗) which commute with every element in (G, ∗), while the latter is the set of all elements in (G, ∗) that commute with every element in the subgroup (H, ∗). Hence the latter is defined with respect to a subgroup, unlike the former. However, both the center (of a group) and the centralizer (of any subgroup in that group) are subgroups of the underlying group.

3.2 Group Operations

3.2.1 Direct product of groups

Direct product: The direct product of two groups (H, ∗) and (K, ∗), represented by (H, ∗) × (K, ∗), is the group containing the pairs (h, k), ∀ h ∈ (H, ∗), k ∈ (K, ∗). The group operation on (H, ∗) × (K, ∗) is defined component-wise, each component using its own group’s operation:

(h1, k1) ∗ (h2, k2) = (h1 ∗ h2, k1 ∗ k2) (3.33)

The identity element of this direct product group is the tuple containing the identity elements of the individual groups, (eH, eK). The inverse of an element is likewise the tuple containing the inverses of the corresponding elements from the two groups (in the product).
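A minimal sketch of the direct product, using the illustrative choice Z2 × Z3 (addition mod 2 and mod 3 as the two group operations):

```python
from itertools import product

# Direct product for eq. (3.33): elements are pairs, combined component-wise.
H, K = range(2), range(3)
G = list(product(H, K))                     # the 6 elements (h, k)

def times(p, q):
    """(h1, k1) * (h2, k2) = (h1 + h2 mod 2, k1 + k2 mod 3)."""
    return ((p[0] + q[0]) % 2, (p[1] + q[1]) % 3)

assert len(G) == len(H) * len(K)            # order of the product group
assert all(times(p, q) in G for p in G for q in G)   # closure
assert all(times((0, 0), p) == p for p in G)         # identity (eH, eK)
# Each element has an inverse: the pair of component-wise inverses.
assert all(any(times(p, q) == (0, 0) for q in G) for p in G)
```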

3.2.2 Homomorphism

Homomorphism: For any two groups (G, ∗) and (H, ◦), a group homomorphism is a function f : (G, ∗) → (H, ◦) such that, for all a, b ∈ G:

f(a ∗ b) = f(a) ◦ f(b) (3.34)

Notice that f preserves the group structure; i.e., since the elements of G form a group, the image f(G) forms a group inside H. We can now see some properties of f. Let e_G and e_H denote the identity elements of the groups (G, ∗) and (H, ◦) respectively.

Using (eq. 3.34):

f(a ∗ e_G) = f(a) ◦ f(e_G)
f(a) = f(a) ◦ f(e_G)
(f(a))⁻¹ ◦ f(a) = (f(a))⁻¹ ◦ f(a) ◦ f(e_G)
e_H = f(e_G) (3.35)

Hence we see that the identity element of (G, ∗) is mapped to the identity element of (H, ◦). We now consider the mapping of the inverse of an element of (G, ∗). From (eq. 3.34), for u ∈ (G, ∗):

f(u) ◦ f(u⁻¹) = f(u ∗ u⁻¹) = f(e_G)
Using (eq. 3.35): f(u) ◦ f(u⁻¹) = e_H
[f(u)]⁻¹ ◦ f(u) ◦ f(u⁻¹) = [f(u)]⁻¹ ◦ e_H
∴ f(u⁻¹) = [f(u)]⁻¹ (3.36)

Hence we see that f maps the inverse of every element to the inverse of its image in (H, ◦).

Types of Homomorphisms

Isomorphism: a homomorphism f : (G, ∗) → (H, ◦) where f is a bijection (one-to-one and onto). Hence f⁻¹ too is a homomorphism.
Automorphism: an isomorphism from a group onto itself: f : (G, ∗) → (G, ∗).
Endomorphism: a homomorphism from a group to itself: f : (G, ∗) → (G, ∗). Note: f need not be one-to-one.

Kernel

We now consider the elements of (G, ∗) that are mapped to the same element of (H, ◦) by the homomorphism. As the identity element e_H is unique in (H, ◦), we consider all the elements of (G, ∗) that are mapped to e_H (note that, from (eq. 3.35), e_G is always one of them). The set containing all such elements is called the kernel of f, denoted ker(f).

ker(f) = {g ∈ (G, ∗) | f(g) = e_H} (3.37)

The kernel is useful in associating a homomorphism to a set: we can check whether an element is contained in a set by checking the action, on that element, of the homomorphism associated with the set. When (G, ∗) and (H, ◦) are linear codes, or vector spaces over the field F_q, the group operations ∗ and ◦ both become addition modulo q, +_q, and the identity element is the null vector. The kernel of the homomorphism (in this case a linear map, represented by a matrix) is then the set of vectors in (G, +_q) that are mapped to the null vector of (H, +_q). We now pick a linear map whose kernel is exactly the linear code. The advantage of doing this is that we can quickly identify a code element by checking whether it gives the null vector under the action of this linear map. Such a linear map (represented by a matrix) is called the Parity Check Matrix of the linear code. It is used to check for errors in the codewords: if there is an error in a codeword, the vector undergoes a translation such that the new vector no longer belongs to the vector space, and hence does not lie in the kernel of the parity check matrix. Therefore, under the action of this matrix it will not give the null vector, thereby indicating the presence of an error.
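The kernel-membership check described above can be sketched numerically. The example below is an assumed illustration (not from the notes): H is the parity-check matrix of the length-3 single-parity-check code over F₂, whose kernel is the set of even-weight vectors, and a single bit flip moves a codeword out of the kernel:

```python
import numpy as np

# Parity-check matrix of the length-3 single-parity-check code over F_2.
# Its kernel (codewords) is the set of even-weight binary vectors.
H = np.array([[1, 1, 1]])

def in_kernel(v):
    # v is a codeword iff H v = 0 (mod 2), i.e. v lies in ker(H)
    return not np.any(H.dot(v) % 2)

codeword = np.array([1, 0, 1])                   # even weight: in the kernel
errored = (codeword + np.array([1, 0, 0])) % 2   # single bit flip

print(in_kernel(codeword), in_kernel(errored))   # True False
```

The nonzero result of H applied to the errored vector (the syndrome) signals the presence of an error, exactly as in the paragraph above.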

3.2.3 Conjugation

Two elements of a group a, b ∈ (G, ∗) are said to be conjugate to each other if ∃g ∈ (G, ∗) such that g ∗ a = b ∗ g. Restating the previous statement: b is conjugate to a if ∃g ∈ (G, ∗) such that b = (g ∗ a) ∗ g⁻¹. We can further formalize this by viewing the RHS of the previous equation as a function of a: b = f(a), where f(a) = (g ∗ a) ∗ g⁻¹. This function, or automorphism (as it takes an element of (G, ∗) to an element of (G, ∗)), is called an inner automorphism, or conjugation.
Conjugation, or Inner Automorphism: an automorphism f : (G, ∗) → (G, ∗) such that, for some fixed g ∈ (G, ∗), f(a) = g ∗ a ∗ g⁻¹ for all a ∈ (G, ∗).

We now consider the set of all elements (in a group) that are conjugate to a given element (of the same group); that is, the set {b | g ∗ a = b ∗ g, g ∈ (G, ∗)}. Note that since a is the free variable in the definition of this set, the set is indexed by a. We can also write this set as S_a = {g ∗ a ∗ g⁻¹ | g ∈ (G, ∗)}. Such a set is called the conjugacy class of a.
Conjugacy Class: The conjugacy class of an element of a group is the set containing all the elements of that group which are conjugate to the given element.

Cl(a) = {g ∗ a ∗ g⁻¹ | g ∈ (G, ∗)} (3.38)

We can now look at some properties of this set (note that, from (eq. 3.35), e_G always lies in the kernel discussed above):


1. Claim: If (G, ∗) is Abelian, then ∀a ∈ (G, ∗), #Cl(a) = 1; that is, Cl(a) is a singleton set.
Justification: To see this, we write the definition of an Abelian group and that of the conjugacy class of an element in the same form:

Cl(a) = {b | g ∗ a = b ∗ g, g ∈ (G, ∗)}
Abelian: g ∗ b = b ∗ g, ∀b, g ∈ (G, ∗)

Comparing the two conditions above, we see that the only 'solution', or satisfying assignment, for b in the definition of Cl(a) is b = a. Hence Cl(a) = {b | g ∗ a = b ∗ g, b = a, g ∈ (G, ∗)} = {a}, which justifies the claim.
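Conjugacy classes as in (eq. 3.38) can be computed mechanically for a small non-Abelian group. The sketch below (an assumed example, not from the notes) takes the symmetric group S₃ of permutations of {0, 1, 2} and collects the classes {g ∗ a ∗ g⁻¹ | g ∈ G}:

```python
from itertools import permutations

# Conjugacy classes Cl(a) of (eq. 3.38), computed for S_3.
G = list(permutations(range(3)))

def compose(f, g):          # (f * g)(i) = f(g(i))
    return tuple(f[g[i]] for i in range(3))

def inverse(f):
    inv = [0] * 3
    for i, fi in enumerate(f):
        inv[fi] = i
    return tuple(inv)

def conj_class(a):
    return {compose(compose(g, a), inverse(g)) for g in G}

classes = {frozenset(conj_class(a)) for a in G}
print(sorted(len(c) for c in classes))   # class sizes of S_3: [1, 2, 3]
```

The identity forms a singleton class, the two 3-cycles form one class and the three transpositions another; for an Abelian group the same computation returns only singleton classes, in line with the claim above.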

3.3 Group Actions

3.3.1 Generating set of a group

As the elements of a group satisfy certain properties, given a set X it must be possible to construct (or mechanically generate from this set) a set satisfying these properties. In other words, we can take a set of elements and then generate a group such that the group properties are satisfied by construction. Such a set is called a generating set of the group, and the elements of this set are called generators.
Generating Set of a Group: A generating set of a group (G, ∗) is a set X such that every element of (G, ∗) can be expressed as a combination (using ∗) of finitely many elements of X. We denote this by (G, ∗) = ⟨X⟩.
Since a group is also a set of elements, it can itself be used to generate another group. More generally, we can use more than one group to generate a single group: the elements of groups (H, ∗) and (K, ∗) can be used to generate the elements of (G, ∗). In this case we say that (G, ∗) is a direct sum of the subgroups (H, ∗) and (K, ∗). For this to be possible, notice that (H, ∗) and (K, ∗) must be normal subgroups of (G, ∗).
Direct Sum of Groups: A group (G, ∗) is a direct sum of groups (H, ∗) and (K, ∗), represented as (G, ∗) = (H, ∗) ⊕ (K, ∗), if H ∪ K is a generating set of (G, ∗), (H, ∗) and (K, ∗) are normal subgroups of (G, ∗), and H ∩ K = {e_G}.

3.3.2 Symmetric group

We now try to explore the properties of a given set X using groups. By exploring the properties, we mean looking at relations between the elements of the set. For this we consider functions h : X → X, since each function relates one element of X to another. For the sake of simplicity, we avoid relations between a given element and many other elements, and hence only consider one-to-one, onto functions (bijections). The set containing all possible bijections from X to X forms a group under the composition (of two functions) operation.

Symmetric Group: The symmetric group on a set X is the group formed by the set G containing all possible bijections f : X → X, under the binary operation ∘ denoting the composition of two functions. This group (G, ∘) satisfies the group properties.

We can now verify the group properties of this symmetric group:

• Closure: ∀f, g ∈ (G, ∘), (f ∘ g) ∈ (G, ∘).
It suffices to show that (f ∘ g) is also a bijection. Notice that ∀x ∈ X, (f ∘ g)(x) = f(g(x)). Since g is a bijection g : X → X, it is injective and surjective (g(X) = X). Hence the values g(x) are distinct for distinct x ∈ X. As f is again a bijection from X to X, it takes these distinct values g(x) to distinct values f(g(x)) ∈ X; hence f ∘ g is injective. As f(X) = X, we see that f(g(X)) = X, so f ∘ g is also surjective, and thereby bijective, justifying the claim.


• Associativity: ∀α, β, γ ∈ (G, ∘), α ∘ (β ∘ γ) = (α ∘ β) ∘ γ.
The above statement is true since [α ∘ (β ∘ γ)](x) = [(α ∘ β) ∘ γ](x) = α(β(γ(x))), ∀x ∈ X. This holds in general for any three functions. Hence the claim is justified.

• Identity: ∃e_G ∈ (G, ∘) such that ∀f ∈ (G, ∘), e_G ∘ f = f ∘ e_G = f.
It suffices to show that there is a bijection e_G such that ∀f ∈ (G, ∘) and ∀x ∈ X, we have f(e_G(x)) = e_G(f(x)) = f(x). We can now see that e_G is nothing but the identity map, e_G ≡ I_X, which clearly is a bijection and hence lies in (G, ∘). Hence the claim is justified.

• Inverse: ∀f ∈ (G, ∘), ∃g ∈ (G, ∘) such that f ∘ g = g ∘ f = e_G.
It suffices to show that ∀f ∈ (G, ∘), ∃g ∈ (G, ∘) such that ∀x ∈ X we have f(g(x)) = g(f(x)) = x. We can now see that g is nothing but the inverse map f⁻¹, which clearly is a bijection since f is a bijection. Hence there is an inverse in (G, ∘) for every element in it, justifying the claim.
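The four checks above can be carried out exhaustively for a small set. The sketch below (an assumed example, X = {0, 1, 2}) represents each bijection of X as a tuple and verifies closure, associativity, identity and inverses under composition:

```python
from itertools import permutations

# The set of all bijections f: X -> X under composition forms a group.
X = range(3)
G = list(permutations(X))                 # all bijections of X, as tuples

def comp(f, g):                           # (f o g)(x) = f(g(x))
    return tuple(f[g[x]] for x in X)

e = tuple(X)                              # identity map I_X

assert all(comp(f, g) in G for f in G for g in G)        # closure
assert all(comp(comp(f, g), h) == comp(f, comp(g, h))    # associativity
           for f in G for g in G for h in G)
assert all(comp(e, f) == f == comp(f, e) for f in G)     # identity
assert all(any(comp(f, g) == e == comp(g, f) for g in G) # inverses
           for f in G)

print(len(G))  # 3! = 6
```

For a set of n elements the symmetric group has n! elements, here 3! = 6.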

3.3.3 Action of Group on a set

In the above section (sec. 3.3.2) we considered the group of bijections of a set X. Now we consider a general group (Q, ∗) and a homomorphism h from (Q, ∗) to the symmetric group of X, (G, ∘). Every element of (Q, ∗) is mapped to a bijection of X. As a result, we can describe the action of the bijection (on some x ∈ X) as the action of the element of (Q, ∗) which has been mapped to this bijection by the homomorphism h. Therefore the homomorphism is defined as h : (Q, ∗) → (G, ∘) such that:

∀q₁, q₂ ∈ (Q, ∗), h(q₁ ∗ q₂) = h(q₁) ∘ h(q₂) (3.39)

Note that h(q) is a bijection ∀q ∈ (Q, ∗); that is, in the above expression, h(q₁) : X → X. The homomorphism maps the identity of (Q, ∗) to the identity element of (G, ∘), and the inverse of every element of (Q, ∗) is mapped to the inverse of that element's image:

h(e_Q) = I_X
∀q ∈ (Q, ∗), h(q⁻¹) = (h(q))⁻¹

Now, when we want to describe the operation [h(q₁)](x) for x ∈ X, we denote it as the action of q₁ on x: q₁ · x. Similarly, {[h(q)](x) | q ∈ (Q, ∗)} can be denoted {q · x | q ∈ (Q, ∗)}, and hence Q · X = {q · x | x ∈ X, q ∈ (Q, ∗)}. This operation is called the action of a group on a set, or group action. The group action can be described as a function that takes an element of (Q, ∗) and an element of X, giving another element of X.
Group Action: The group action of (Q, ∗) on a set X is defined as the map f : (Q, ∗) × X → X such that, for x ∈ X, q ∈ (Q, ∗), we have f(q, x) = q · x, where q · x ≡ [h(q)](x) with h as defined in (eq. 3.39).

3.3.4 Orbits and Stabilizers

We now adopt a geometric picture by considering X to be a set of points (in ℝ, ℂ, or anything else) and the action of elements q ∈ (Q, ∗) (which is the action of h(q), as defined in (eq. 3.39)). When q · x = x′, where x, x′ ∈ X, we say that x has been transported to the point x′, and the path taken by x is the set {x, x′}. Consider the set (Q, ∗) · x = {q · x | q ∈ (Q, ∗)}. This contains all the points which can be obtained by acting with elements of (Q, ∗) on x. In other words, it gives the path traced by the point x ∈ X under the action of elements of (Q, ∗). This path, represented by the set of points, is called the Orbit of x, denoted O_(Q,∗)(x).

Orbit of an element x ∈ X: O_(Q,∗)(x) = (Q, ∗) · x ≡ {q · x | q ∈ (Q, ∗)} (3.40)

Since X is a finite set, it may happen that the path traced by x contains x itself. Suppose the path taken by x contains the points x, x₁, x₂, . . . , x_m, x, . . . , x_n, x, . . . , x_k, x, . . . ; then we have:


x →[q₁] x₁ →[q₂ ∗ · · · ∗ q_{m−1}] x_m →[q_m] x →[q_{m+1} ∗ · · · ∗ q_{n−1}] x_n →[q_n] x →[q_{n+1} ∗ · · · ∗ q_{k−1}] x_k → · · ·

which can now be rewritten as:

x →[q₁ ∗ q₂ ∗ · · · ∗ q_m] x →[q_{m+1} ∗ · · · ∗ q_n] x →[q_{n+1} ∗ · · · ∗ q_k] · · ·

We now see that the set of operators S_x(Q, ∗) = {q₁ ∗ q₂ ∗ · · · ∗ q_m, q_{m+1} ∗ · · · ∗ q_n, q_{n+1} ∗ · · · ∗ q_k, . . .} leaves the point x invariant. Now, since q₁, q₂, . . . , q_m, . . . , q_n, . . . ∈ (Q, ∗), which is closed under ∗, the operators in S_x(Q, ∗) are also in (Q, ∗). Moreover, the identity operator e_Q ∈ S_x(Q, ∗) (since it corresponds to the identity mapping), and since each of these operators corresponds to a bijection (of X), an inverse can be defined easily, which is also a bijection that leaves x invariant. Hence S_x(Q, ∗) contains the inverse of every element in it. Note that the set S_x(Q, ∗) is closed under ∗, has an identity element, and every element in this set has its inverse in the same set. Therefore the set S_x(Q, ∗), along with the operation ∗, forms a group, and trivially a subgroup of (Q, ∗). This subgroup is called the Stabilizer Subgroup.
Stabilizer Subgroup: The stabilizer subgroup S_x(Q, ∗) (of a group) consists of the elements that leave the element x ∈ X invariant.

S_x(Q, ∗) = {q ∈ (Q, ∗) | q · x = x} (3.41)

3.3.5 Orbit Stabilizer theorem

We now have a theorem, similar to Lagrange's theorem in (eq. 3.29), relating the sizes of O(x), S_x(Q, ∗) and (Q, ∗).

Theorem: For any group (Q, ∗) acting on a set X, and any element x of the set, the number of elements in the orbit of x and in the stabilizer subgroup with respect to x are related by:

|O(x)| × |S_x(Q, ∗)| = |(Q, ∗)| (3.42)

Proof: Let us denote S_x(Q, ∗) ≡ S(x). Consider the following slight modification of S(x) (eq. 3.41):

H_y(x) = {q ∈ (Q, ∗) | q · x = y} (3.43)
∴ H_x(x) = S(x) (3.44)

We now claim that the sets H_y(x) for different y are disjoint.
Claim: ∀y₁, y₂ ∈ X with y₁ ≠ y₂, the sets H_{y₁}(x) and H_{y₂}(x) are disjoint.
Justification: It suffices to show that if ∃q ∈ H_{y₁} ∩ H_{y₂}, then y₁ = y₂. This is true since in that case we would have q · x = y₁ and q · x = y₂, which clearly implies y₁ = y₂, thereby justifying the claim.
We now have (Q, ∗) = ∪_{y ∈ O(x)} H_y(x), and hence:

|(Q, ∗)| = Σ_{y ∈ O(x)} |H_y(x)| (3.45)

We now claim that each set H_y(x), for every y ∈ O(x), has the same cardinality, which can be equated to that of H_x(x), which from (eq. 3.44) is |S(x)|.
Claim: For every y ∈ O(x), #H_y(x) = |S(x)|.
Justification: It suffices to exhibit a bijection f : S(x) → H_y(x), ∀y ∈ O(x). We now construct this bijection. For a fixed t ∈ H_y(x) and ∀h ∈ S(x), define f(h) = t ∗ h. Notice that (t ∗ h) ∈ H_y(x), since (t ∗ h) · x = t · (h · x), which from (eq. 3.41) equals t · x, and this equals y using (eq. 3.43). To show that f is a bijection, we show that f is injective and surjective.
Injective: If f(h₁) = f(h₂), then t ∗ h₁ = t ∗ h₂, which implies h₁ = h₂. So if h₁ ≠ h₂ then f(h₁) ≠ f(h₂).
Surjective: We need to show that every element of H_y(x) is covered in the image of f. It suffices to show that every element of H_y(x) can be represented as the image under f of some element of S(x). Equivalently, we can show that ∀u ∈ H_y(x), t⁻¹ ∗ u ∈ S(x), so that u = f(t⁻¹ ∗ u).

From (eq. 3.43): u · x = y and t · x = y (3.46)
∴ x = t⁻¹ · y
Using (eq. 3.46): x = t⁻¹ · (u · x)
∴ x = (t⁻¹ ∗ u) · x

Now, from (eq. 3.41), we have (t⁻¹ ∗ u) ∈ S(x). Hence we have the justification.

Now we see that f is a bijection, and ∀y ∈ O(x), #H_y(x) = |S(x)|. In (eq. 3.45) we can replace the sum by a product, as all the quantities being summed over have the same value. We then have the statement of the theorem (eq. 3.42). Hence we have proved the theorem.
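The counting identity (eq. 3.42) can be verified numerically on a small example. The sketch below (assumed conventions, not from the notes) takes the symmetric group S₃ acting on X = {0, 1, 2} by q · x = q[x], and checks |O(x)| · |S_x| = |Q|:

```python
from itertools import permutations

# Numerical check of the orbit-stabilizer theorem, (eq. 3.42):
# S_3 acting on X = {0, 1, 2} by q . x = q[x].
Q = list(permutations(range(3)))
x = 0

orbit = {q[x] for q in Q}                 # O(x), as in (eq. 3.40)
stab  = [q for q in Q if q[x] == x]       # S_x(Q), as in (eq. 3.41)

assert len(orbit) * len(stab) == len(Q)   # |O(x)| * |S_x| = |Q|
print(len(orbit), len(stab), len(Q))      # 3 2 6
```

Here the orbit of 0 is all of X (size 3) and the stabilizer is the set of permutations fixing 0 (size 2), and 3 · 2 = 6 = |S₃|.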


Part III

PREREQUISITES - QUANTUM MECHANICS


Chapter 4

Identical Particles

4.0.6 Describing a two state system

Consider a system comprising two two-state systems; for example, a system of two electrons. The state of the composite system is given by the tensor product of the individual states. Let the vectors |k₁⟩ and |k₂⟩ represent the individual states of systems (1) and (2) respectively. (From now on, it is implicit that the subscript denotes the particle number.) The state of this composite system is given by |k₁k₂⟩ or, equivalently, |k₂k₁⟩. Physically, we see no reason to prefer one to the other; mathematically, however, they are orthogonal states whenever k₁ ≠ k₂:

⟨k₁k₂|k₂k₁⟩ = δ_{k₁,k₂} (4.1)

So it is now evident that if we are given a state of the system, we do not know a priori whether the system is in the state |k₁k₂⟩ or in |k₂k₁⟩. (In other words, if we are told that the state of the composite system is |a, b⟩, then we do not know whether the state of the first system is |a⟩ or |b⟩.) More generally, by the principle of superposition, the state of the composite system can be any linear combination of the states |k₁k₂⟩ and |k₂k₁⟩:

|ψ⟩ = c₁|k₁k₂⟩ + c₂|k₂k₁⟩ (4.2)

Now, when a measurement (given by some measurement operator) is performed on this composite system in the state |ψ⟩, the eigenvalues produced by the two states |k₁k₂⟩ and |k₂k₁⟩ will be identical (since eigenvalues are just numbers, and k₁k₂ = k₂k₁). So different eigenkets of the system have the same eigenvalues, thereby introducing a degeneracy. This is called the exchange degeneracy.

4.0.7 Permutation operator

In the previous subsection, we saw that we could describe the same composite system using two orthogonal states. If the state of the system is |k₁k₂⟩ and we interchange particles 1 and 2, then we get the state |k₂k₁⟩; the system is physically the same as before. To perform this exchange of particles, we define an operator called the permutation operator, with the following property:

P₂₁|k₁k₂⟩ = |k₂k₁⟩ (4.3)

From the above definition, it is evident that P₂₁ ≡ P₁₂. Also,

P₂₁P₂₁|k₁k₂⟩ = P₂₁|k₂k₁⟩ = |k₁k₂⟩ ∴ (P₂₁)² = I

Since (P₂₁)² = I, any eigenvalue λ of P₂₁ satisfies λ² = 1, so the eigenvalues of P₂₁ are ±1. P₁₂ is also Hermitian.


Hence, the permutation operator changes the state of particle 1 to |k₂⟩ and that of particle 2 to |k₁⟩. Let us now take an operator T = T₁ ⊗ T₂, whose action is defined by:

T₁|t₁t₂⟩ = t₁|t₁t₂⟩ (4.4)
T₂|t₁t₂⟩ = t₂|t₁t₂⟩ (4.5)

Now, applying P₁₂ to both sides of (eq. 4.4), we have:

P₁₂T₁|t₁t₂⟩ = t₁P₁₂|t₁t₂⟩

Since P₁₂ is unitary (easy to check from the above properties, as P₁₂ is Hermitian and squares to I), we have:

P₁₂T₁P₁₂†P₁₂|t₁t₂⟩ = t₁P₁₂|t₁t₂⟩
P₁₂T₁P₁₂†|t₂t₁⟩ = t₁|t₂t₁⟩

Now, on comparing the above equation with (eq. 4.5), we obtain the relation:

P₁₂T₁P₁₂† = T₂ (4.6)
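Relation (eq. 4.6) can be checked with matrices. Assuming two-level subsystems (an illustration not in the notes), P₁₂ becomes the 4×4 swap matrix in the basis {|00⟩, |01⟩, |10⟩, |11⟩}, and an operator acting on particle 1 is conjugated into the same operator acting on particle 2:

```python
import numpy as np

# Swap (permutation) operator on C^2 (x) C^2 in the basis |00>,|01>,|10>,|11>.
P = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=complex)   # |ab> -> |ba>

assert np.allclose(P @ P, np.eye(4))          # P^2 = I
assert np.allclose(P, P.conj().T)             # P is Hermitian (hence unitary)

A = np.array([[1, 2], [3, 4]], dtype=complex) # arbitrary single-particle operator
T1 = np.kron(A, np.eye(2))                    # A acting on particle 1
T2 = np.kron(np.eye(2), A)                    # A acting on particle 2

assert np.allclose(P @ T1 @ P.conj().T, T2)   # (eq. 4.6)
print("checks passed")
```

The same conjugation applied to T₂ returns T₁, so P₁₂ interchanges the particle indices of operators, as stated.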

This shows that the permutation operator P₁₂ can permute the particle index of operators as well. Let us now take a general Hamiltonian describing the two-particle system:

H = p₁²/2m + p₂²/2m + V_int(|x₂ − x₁|) + V₁(x₁) + V₂(x₂) (4.7)

Let us now see the action of the permutation operator, or in other words, the change in the Hamiltonian of the composite system under the exchange of the two particles:

P₁₂HP₁₂† = P₁₂(p₁²/2m)P₁₂† + P₁₂(p₂²/2m)P₁₂† + P₁₂V_int(|x₂ − x₁|)P₁₂† + P₁₂V₁(x₁)P₁₂† + P₁₂V₂(x₂)P₁₂†
= p₂²/2m + p₁²/2m + V_int(|x₁ − x₂|) + V₁(x₂) + V₂(x₁)

Since the particles are identical, V₁ and V₂ have the same functional form, so the last expression is again H:

∴ P₁₂HP₁₂† = H (4.8)

Hence, we see that the Hamiltonian of the composite system does not change under the exchange of the two particles. Hence [H, P₁₂] = 0 and d⟨P₁₂⟩/dt = 0; P₁₂ is a constant of the motion.

4.0.8 Symmetry and Antisymmetry in the wave functions

From (eq. 4.8), we see that P₁₂ is a constant at all times. This means that if the action of P₁₂ on |ψ⟩ is known initially, then it is known for all times. We also have the physical requirement that the state produced by acting with P₁₂ on any physical state must not be physically different from the original state. With this requirement, we see that only those states which are invariant under the action of P₁₂ are physically consistent. Hence, we must look for the eigenstates of P₁₂. We already know that the eigenvalues are ±1. The corresponding eigenstates of P₁₂ are:

|ψ₊⟩ = (1/√2)(|k₁k₂⟩ + |k₂k₁⟩) (4.9)
|ψ₋⟩ = (1/√2)(|k₁k₂⟩ − |k₂k₁⟩) (4.10)

These two eigenstates are the only physically valid states out of all the states given in (eq. 4.2). Therefore, the composite system can only exist in one of these two states.

The eigenstate |ψ₊⟩ is such that, if we exchange the two particles of the composite system (exchange the particle indices k₁ and k₂), the state remains the same. In other words, the state of the system is symmetric under the exchange of the two particles. On the other hand, the eigenstate |ψ₋⟩ is such that, if we exchange the two particles in the system, the state of the composite system picks up a negative sign. In other words, the state is antisymmetric under the exchange of the two particles.
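Continuing the assumed two-level illustration (in the basis {|00⟩, |01⟩, |10⟩, |11⟩}, not from the notes), the states of (eq. 4.9) and (eq. 4.10) can be built explicitly and checked to be eigenvectors of the swap operator with eigenvalues +1 and −1:

```python
import numpy as np

# Swap operator in the basis |00>, |01>, |10>, |11>.
P = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)

k1k2 = np.array([0, 1, 0, 0], dtype=float)   # |k1 k2> = |0 1>
k2k1 = np.array([0, 0, 1, 0], dtype=float)   # |k2 k1> = |1 0>

psi_plus  = (k1k2 + k2k1) / np.sqrt(2)       # (eq. 4.9)
psi_minus = (k1k2 - k2k1) / np.sqrt(2)       # (eq. 4.10)

assert np.allclose(P @ psi_plus,  psi_plus)    # symmetric: eigenvalue +1
assert np.allclose(P @ psi_minus, -psi_minus)  # antisymmetric: eigenvalue -1

# Expectation values of P: approximately +1.0 and -1.0
print(psi_plus @ P @ psi_plus, psi_minus @ P @ psi_minus)
```

Any other superposition c₁|k₁k₂⟩ + c₂|k₂k₁⟩ is not an eigenvector of P, in line with the statement that only |ψ±⟩ are physically consistent.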


4.0.9 Extending to many state systems

We may now extend the idea of the two-particle system to a system comprising N identical particles. The state of the system can be given by any permutation of:

|ζ⟩ = |k₁k₂k₃ . . . k_i k_{i+1} . . . k_j k_{j+1} . . . k_n⟩ (4.11)

The permutation operators P_ij are defined by:

P_ij|ζ⟩ = |k₁k₂k₃ . . . k_j k_{i+1} . . . k_i k_{j+1} . . . k_n⟩ (4.12)

(P_ij exchanges particles i and j in the system.) Here again, the system should be physically the same under any of the permutations, and therefore we must consider only the eigenstates of the permutation operators as possible states of the composite system. These eigenstates again are the symmetric and the antisymmetric states.

4.0.10 Bosons and Fermions

Let us take the systems that are described by the antisymmetric wave-function |ψ₋⟩. If the two particles in the composite system were in the same state, that is k₁ = k₂, then the two terms in the wave-function cancel each other. Hence, if the two particles are in the same state, the wave-function vanishes. A more physical interpretation of this scenario is that we cannot expect both particles to be in the same state. These "particles" are called fermions, and the famous rule that no two fermions can be in the same quantum state is called the Pauli Exclusion Principle. The statistics used to study fermions is Fermi-Dirac statistics.

On the other hand, let us examine the systems given by the symmetric wave-function |ψ₊⟩. Here, if the two particles constituting the composite system are in the same quantum state, i.e., k₁ = k₂, then |ψ₊⟩ = |k₁k₁⟩ = |k₂k₂⟩. Therefore, two quantum systems can in fact be in the same state. We can also generalize this (without proof) to the statement that any number of particles can occupy the same quantum state. These "particles" are called bosons, and the statistics used to study them is Bose-Einstein statistics. An important consequence of this is Bose-Einstein condensation, where, near absolute zero temperature, a macroscopic fraction of the particles settles into the single lowest-energy state, so that the occupation of this state becomes macroscopically large.


Chapter 5

Angular Momentum

Any quantum system has to be identified with some property that is conserved in that system; this is true for classical systems as well. In many cases we define a quantum system by its total Hamiltonian (as its total energy remains conserved). We can also consider another observable: the total angular momentum, represented by J. Every quantum system has a total angular momentum associated with it (just as every quantum system has a position and a momentum associated with it). Also, just like X and P, J is a vector observable (unlike the Hamiltonian, which is a scalar). In the Cartesian system of coordinates:

J⃗ = J_x î + J_y ĵ + J_z k̂ (5.1)

Going back to our classical concepts, we see that the angular momentum is defined as:

L⃗ = r⃗ × p⃗ (5.2)

The same definition carries over to the quantum case. The difference is that the quantities which are variables in the classical case (eq. 5.2) are now operators. The quantum case becomes:

J⃗ = x⃗ × p⃗ (5.3)

By expanding the above equation (eq. 5.3) we can extract the relation between the total angular momentum and the known observables. Let us now solve the cross product by considering the components of the physical quantities, using the determinant form of the cross product:

A⃗ × B⃗ = det [ î ĵ k̂ ; A_x A_y A_z ; B_x B_y B_z ]

Solving for J⃗, we get:

J⃗ = (yp_z − zp_y)î + (zp_x − xp_z)ĵ + (xp_y − yp_x)k̂

On equating the components of J⃗, we get:

J_x = yp_z − zp_y (5.4)
J_y = zp_x − xp_z (5.5)
J_z = xp_y − yp_x (5.6)

The above three equations (eq. 5.4, eq. 5.5 and eq. 5.6) represent the various components of the angular momentum operator. From these we can extract more about the operators. First we should see whether they


commute. Let us start by finding the commutator of the J_x and J_y operators. For this we need equations (eq. 5.4, eq. 5.5 and eq. 5.6):

[J_x, J_y] = [(yp_z − zp_y), (zp_x − xp_z)]
= [yp_z, zp_x] − [yp_z, xp_z] − [zp_y, zp_x] + [zp_y, xp_z]

To solve the above commutators we need the expansions:

[A, BC] = [A, B]C + B[A, C]
[AB, C] = A[B, C] + [A, C]B

From the above two expressions, we can make a new commutation expansion for [AB, CD], which is useful in evaluating the terms of the above expression:

[AB, CD] = [AB, C]D + C[AB, D]
= (A[B, C] + [A, C]B)D + C(A[B, D] + [A, D]B)

Therefore, we write:

[AB, CD] = A[B, C]D + [A, C]BD + CA[B, D] + C[A, D]B (5.7)

Using the above equation (eq. 5.7), with the right substitutions for A, B, C and D in each term, we get the results:

Table 5.1: Examining the terms of the commutator
Term | [B, C] | [A, C] | [B, D] | [A, D] | Result
[yp_z, zp_x] | [p_z, z] = −iħ | 0 | 0 | 0 | A[B, C]D ≡ −iħ yp_x
[yp_z, xp_z] | 0 | 0 | 0 | 0 | 0
[zp_y, zp_x] | 0 | 0 | 0 | 0 | 0
[zp_y, xp_z] | 0 | 0 | 0 | [z, p_z] = iħ | C[A, D]B ≡ iħ xp_y

Therefore, the result is iħ(xp_y − yp_x). If we refer back to (eq. 5.6), we can see that this is the expression for J_z, the z-component of the total angular momentum (eq. 5.1). Therefore, we have the relation [J_x, J_y] = iħJ_z. Similarly, we can work out the other relations: [J_y, J_z] = iħJ_x and [J_z, J_x] = iħJ_y. The relations can be summarized as:

[J_p, J_q] = iħ ε_{pqr} J_r (5.8)
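The commutation relations of (eq. 5.8) can be checked numerically in a concrete representation. The sketch below (an assumed example, in units ħ = 1) uses the spin-1/2 matrices J_k = σ_k/2:

```python
import numpy as np

# Spin-1/2 representation of angular momentum, J_k = sigma_k / 2 (hbar = 1).
Jx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
Jy = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
Jz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)

def comm(A, B):
    return A @ B - B @ A

assert np.allclose(comm(Jx, Jy), 1j * Jz)   # [Jx, Jy] = i Jz
assert np.allclose(comm(Jy, Jz), 1j * Jx)   # cyclic permutations
assert np.allclose(comm(Jz, Jx), 1j * Jy)
print("commutation relations verified")
```

Any other spin representation (spin-1, spin-3/2, ...) satisfies the same relations, since (eq. 5.8) follows from the operator algebra alone.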

The equation above (eq. 5.8) summarizes all the commutation relations discussed above. Let us now go back and see why we introduced J⃗. We said that we would like to associate a total angular momentum with a quantum system in order to characterize it; we have now seen a way of identifying a system. A quantum system may also exist in various phases or forms; these various forms or phases are called the states of the quantum system. J⃗ does not say anything about the states in which the quantum system may exist. To identify the states, we need some property that is different for each of the individual states (if it were not different, the states would be indistinguishable) and that can be compared with J⃗. For this, we can take some component of the total angular momentum; take for example the z-component (this is just a convention), J_z. The number of values of J_z will tell us how many states the system exists in, because each state will have a contribution to J_z. So, by comparing J⃗ and J_z for a system, we can see how many states the system exists in. There is another minor problem before we proceed. The two properties J⃗ and J_z must be comparable. When we say that two properties can be compared, we mean that there exists a common basis for the two operators; in other words, the two operators commute. Comparing the two properties J⃗ and J_z is slightly odd, because


the former is a vector and the latter is a scalar. So, let us consider J² instead of the vector J⃗; this makes the comparison logically sound. Now we have:

J² = J_x² + J_y² + J_z² (5.9)

From the above expression itself, it is clear that [J², J_k] = 0, ∀k ∈ {x, y, z}. Since J² commutes with J_z, these two operators can be diagonalized simultaneously; that is, they have simultaneous eigenkets. Let us consider the equations:

J²|α, β⟩ = α|α, β⟩ (5.10)
J_z|α, β⟩ = β|α, β⟩ (5.11)

Here |α, β⟩ denotes the simultaneous eigenket of J² and J_z, with the respective eigenvalues. We know J² is a property of a system that exists in various states, each state corresponding to a value of J_z. If we expect that a particular system exists in some given number of states, then we must expect that many values of J_z (corresponding to each state) for a single value of J² (corresponding to the system as a whole). So, in mathematical terminology, if we expect a system to exist in n different quantum states, then we must expect n different eigenvalues of J_z (corresponding to each state) for a single eigenvalue of J² (corresponding to the system). Going back to the above eigenvalue equations (eq. 5.10 and eq. 5.11), we can now claim that for every α there shall be a number of values of β; the number of such β's for a given α tells us the number of states in which the system exists. So far we have argued on physical grounds; we can verify this mathematically too. From (eq. 5.11) we have:

J_z²|α, β⟩ = βJ_z|α, β⟩ = β²|α, β⟩ (5.12)

Therefore, on subtracting (eq. 5.12) from (eq. 5.10), we get:

(J² − J_z²)|α, β⟩ = (α − β²)|α, β⟩
⇒ ⟨α, β|(J² − J_z²)|α, β⟩ = ⟨α, β|(α − β²)|α, β⟩

On the LHS, from equation (eq. 5.9), we can write (J² − J_z²) as (J_x² + J_y²). The RHS becomes (α − β²) if we assume that the simultaneous eigenkets are normalized. Hence, we have:

⟨α, β|(J_x² + J_y²)|α, β⟩ = (α − β²)

The LHS of the above equation is the expectation value of a positive operator, and is therefore non-negative. Hence, we can write:

0 ≤ (α − β²)
⇒ β² ≤ α (5.13)

Therefore, from the above equation (eq. 5.13), we can see that the value of β2 is bounded by α. This meansthere are a finite number of states for a finite value of α. In other words, there are two distinct bounds for thevalue of β. Let the extreme values of β be denoted as βmin and βmax. But to examine all the states (valuesof β between βmin and βmax) we need to construct some operator that can help us to traverse through eachof the states from βmin up to βmax and vice-versa. A higher value of β corresponds to a higher value ofangular momentum in the z-direction. We call call this a higher state.Let us now define two operators that take us to a higher and a lower state respectively, from a given state.These are the raising and lowering operators represented as J+ and J−. These operators are definedas:

J+ = Jx + iJy (5.14)
J− = Jx − iJy (5.15)

GO TO FIRST PAGE↑ 73

Page 74: INTRODUCTION TO QUANTUM COMPUTING AND QUANTUM INFORMATIONpavithra/pavithranHomepage/NOtes_file… ·  · 2010-12-26QUANTUM COMPUTING AND QUANTUM INFORMATION Pavithran S Iyer,

DRAFT COPY

CHAPTER 5. ANGULAR MOMENTUM

The action of these operators can be defined as:

J+|α, β⟩ = c+|α, (β + ℏ)⟩ (5.16)
J−|α, β⟩ = c−|α, (β − ℏ)⟩ (5.17)

and more importantly:

J+|α, βmax⟩ = 0 (5.18)
J−|α, βmin⟩ = 0 (5.19)

We can just pause for a moment and explore the mathematics of the raising and lowering operators. Let uslook at some commutation relations:

[J+, J−] = [(Jx + iJy), (Jx − iJy)]
RHS: [Jx, Jx] + i([Jy, Jx] − [Jx, Jy]) + [Jy, Jy]
using equation (eq. 5.8): 0 + i(−iℏJz − iℏJz) + 0

⇒ [J+, J−] = 2ℏJz (5.20)

Similarly, we can see:

[Jz, J+] = [Jz, (Jx + iJy)]
RHS: [Jz, Jx] + i[Jz, Jy]

⇒ iℏJy + i(−iℏJx) = ℏ(Jx + iJy)

⇒ [Jz, J+] = ℏJ+ (5.21)
similarly, we also have: [Jz, J−] = −ℏJ− (5.22)

There is another important relation that we should work out, as we will use it shortly. Let us simplify the products J+J− and J−J+:

J+J− = (Jx + iJy)(Jx − iJy)

RHS: Jx² + i(JyJx − JxJy) + Jy²

with equation (eq. 5.9), we can replace Jx² + Jy² with J² − Jz²

⇒ J² − Jz² + i(JyJx − JxJy)

⇒ J² − Jz² + i[Jy, Jx]

with equation (eq. 5.8) we have: J+J− = J² − Jz² + ℏJz (5.23)

Similarly, J−J+ = J² − Jz² − ℏJz (5.24)
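These operator identities are easy to check numerically in the smallest representation. The sketch below (an illustration, not part of the derivation) uses the spin-1/2 matrices Jᵢ = σᵢ/2 with ℏ set to 1, and hand-rolled 2×2 complex matrix arithmetic so that nothing beyond plain Python is needed; all helper names are ours.

```python
def mul(A, B):
    """2x2 complex matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(A, B, sign=1):
    """Elementwise A + sign*B."""
    return [[A[i][j] + sign * B[i][j] for j in range(2)] for i in range(2)]

def scale(c, A):
    return [[c * A[i][j] for j in range(2)] for i in range(2)]

# Spin-1/2 angular momentum matrices, J_i = sigma_i / 2 (hbar = 1).
Jx = [[0, 0.5], [0.5, 0]]
Jy = [[0, -0.5j], [0.5j, 0]]
Jz = [[0.5, 0], [0, -0.5]]

Jp = add(Jx, scale(1j, Jy))    # J+ = Jx + iJy  (eq. 5.14)
Jm = add(Jx, scale(-1j, Jy))   # J- = Jx - iJy  (eq. 5.15)
J2 = add(add(mul(Jx, Jx), mul(Jy, Jy)), mul(Jz, Jz))   # J^2

# [J+, J-] = 2*hbar*Jz  (eq. 5.20)
assert add(mul(Jp, Jm), mul(Jm, Jp), sign=-1) == scale(2, Jz)

# J+J- = J^2 - Jz^2 + hbar*Jz  (eq. 5.23)
assert mul(Jp, Jm) == add(add(J2, mul(Jz, Jz), sign=-1), Jz)

# J-J+ = J^2 - Jz^2 - hbar*Jz  (eq. 5.24)
assert mul(Jm, Jp) == add(add(J2, mul(Jz, Jz), sign=-1), Jz, sign=-1)
```

The same check goes through, with larger matrices, for any spin-j representation.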

Armed with these commutators, we can now see what the ladder operators (the raising and lowering operators) do to the states of a system, and how to find the number of states in which a system exists. Let us start with the assumption made in equation (eq. 5.18):

J+|α, βmax⟩ = 0

since any operator acting on 0 gives 0, we have:

J−J+|α, βmax⟩ = 0

from equation (eq. 5.24): (J² − Jz² − ℏJz)|α, βmax⟩ = 0

on expanding: (α − βmax² − ℏβmax)|α, βmax⟩ = 0

We know that |α, βmax⟩ cannot be a null ket, since it represents a state of some system, so the coefficient preceding the ket must equal 0. On equating the eigenvalue to 0, we have:

α − βmax² − ℏβmax = 0

⇒ α = βmax(βmax + ℏ) (5.25)

GO TO FIRST PAGE↑ 74

Page 75: INTRODUCTION TO QUANTUM COMPUTING AND QUANTUM INFORMATIONpavithra/pavithranHomepage/NOtes_file… ·  · 2010-12-26QUANTUM COMPUTING AND QUANTUM INFORMATION Pavithran S Iyer,

DRAFT COPY

CHAPTER 5. ANGULAR MOMENTUM

Similarly, on considering equation (eq. 5.23) applied to |α, βmin⟩, we have:

α = βmin(βmin − ℏ) (5.26)

On comparing the above two equations (eq. 5.25 and eq. 5.26), we see that:

βmax = −βmin (5.27)

We know that |α, βmin⟩ is the lowest possible state and |α, βmax⟩ the highest. The raising operator can be applied repeatedly to the lowest state to reach the highest one; each time we act on a state with J+, the value of β increases by ℏ. So if we apply the raising operator n times to |α, βmin⟩, we reach |α, βmax⟩. Therefore:

βmax = βmin + nℏ    (n = number of applications of J+)

⇒ from equation (eq. 5.27): βmax = nℏ/2

To eliminate the factor of ℏ, let us define: j = βmax/ℏ

⇒ j = n/2 (5.28)

Since n is an integer, j is either an integer or a half-integer. From equation (eq. 5.25) we can get the form of α. On substituting βmax = nℏ/2 ≡ jℏ in equation (eq. 5.25), we have:

α = ℏ²j(j + 1) (5.29)

Referring to equation (eq. 5.13), we have all the possible values of β. We know that β is related to α by a power-half law: |β| ≤ √α. Since α carries a factor of ℏ², the expression for β must carry a factor of ℏ; the rest is just a number. Hence, we may write:

β = mℏ (5.30)

where m is some number depending on the value of j. Rearranging the above equation gives β/ℏ = m, and from equation (eq. 5.28) we have βmax/ℏ = j. We discussed earlier that for every βmax, β takes all values from −βmax (which is βmin) to βmax, in steps of ℏ; so for every value of βmax, β takes (2βmax/ℏ + 1) values. The same relation holds between j and m, since they are just βmax and β divided by ℏ. Hence, for every value of j, m takes (2j + 1) values, running from −j to j in integer steps:

−j ≤ m ≤ j (5.31)

Now, the number of values of m determines the number of states in which the system (characterized by the value of j) exists. The simultaneous eigenstates of J² and Jz can now be labelled |j, m⟩. Therefore, we have:

J²|j, m⟩ = j(j + 1)ℏ²|j, m⟩ (5.32)
Jz|j, m⟩ = mℏ|j, m⟩ (5.33)

where j tells us the total angular momentum of the system, and the number of values of m gives the number of states in which the system exists.
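As a quick concrete check of eqs. 5.32 and 5.33 in the spin-1/2 case (again with ℏ = 1; the matrices are the standard ones, the helper names are ours), the basis kets |1/2, ±1/2⟩ are the columns of the identity, J² gives j(j + 1) = 3/4 on both, and Jz gives m = ±1/2:

```python
def matvec(A, v):
    """2x2 matrix acting on a 2-vector."""
    return [sum(A[i][k] * v[k] for k in range(2)) for i in range(2)]

Jz = [[0.5, 0], [0, -0.5]]
J2 = [[0.75, 0], [0, 0.75]]    # Jx^2 + Jy^2 + Jz^2 for spin-1/2

up, down = [1, 0], [0, 1]      # |1/2, +1/2> and |1/2, -1/2>

j = 0.5
# J^2 |j, m> = j(j+1) |j, m>   (eq. 5.32, hbar = 1)
assert matvec(J2, up) == [j * (j + 1) * c for c in up]
# Jz |j, m> = m |j, m>         (eq. 5.33, hbar = 1)
assert matvec(Jz, up) == [0.5 * c for c in up]
assert matvec(Jz, down) == [-0.5 * c for c in down]
```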

For any general quantum system, the total angular moment of the system is given by:

J = L + S (5.34)

where J is the total angular momentum, L the orbital angular momentum, and S the spin angular momentum. If the wave function associated with the quantum system is a scalar function, then J = L (there is no spin part). If the wave function is a vector function, then we use the relation given in equation (eq. 5.34).

We can relate the above result for determining the number of states in which a system exists to the spin of the quantum system.


Part IV

PREREQUISITES - COMPUTATION


Chapter 6

Introduction to Turing Machines

6.0.11 Informal description

A Turing machine is an extension of a pushdown automaton. A pushdown automaton, if we recall, had a stack on which it could temporarily store some symbols. The restrictions we faced in the case of a pushdown automaton were that only the topmost element of the stack could be accessed, and that a symbol was erased from the stack once read: the position of the read-write head was always on the topmost element of the stack. In the case of the Turing machine, we have a "tape" instead of a stack. As a result, the read-write head can be moved anywhere on the tape to access the elements of the tape, moving one cell at a time. Apart from this, the machine has a finite control (a finite number of states and transition functions), just like a PDA, that controls the position of the read-write head.

6.0.12 Elements of a turing machine

To give a description of a Turing machine, we must provide its basic elements. A Turing machine is described by a 9-tuple:

M = (Q, Σ, Γ, ⊢, ␣, δ, s, t, r) (6.1)

• Q = finite set of states

• Σ = finite input alphabet

• Γ = finite tape alphabet (since the input symbols can also be written to the tape, this set contains Σ). ∴ Σ ⊂ Γ

• ⊢ = left endmarker (marks the left end of the tape, so that the read-write head does not go off the tape; it is not an input symbol but a property of the tape). ∴ ⊢ ∈ Γ − Σ

• ␣ = blank symbol, filling the tape to the right of the input. ∴ ␣ ∈ Γ − Σ

• δ = transition function. The transition function tells, based on the state the machine is in and the tape symbol at the current head position, which state to enter, what to write, and where the read-write head must move (it can only move one cell to the right or left, so it gives the direction of the move). The transition function for a Turing machine is given as:

δ : Q × Γ → Q × Γ × {L, R} (6.2)

where L, R stand for the direction in which the read-write head must move ('R' stands for right and 'L' stands for left). For example,

δ(p, a) = (q, b, R) (6.3)

79

Page 80: INTRODUCTION TO QUANTUM COMPUTING AND QUANTUM INFORMATIONpavithra/pavithranHomepage/NOtes_file… ·  · 2010-12-26QUANTUM COMPUTING AND QUANTUM INFORMATION Pavithran S Iyer,

DRAFT COPY

CHAPTER 6. INTRODUCTION TO TURING MACHINES

means that "when the machine is in state p and the read-write head reads a on the tape, it must go to state q, the read-write head must write b into its current cell (overwriting a), and move one cell to the right on the tape".

• s = start state. s ∈ Q.

• t = accept (final) state. t ∈ Q.

• r = reject state. r ∈ Q.

In addition to the above, there are certain restrictions which need to be met:

1. The left endmarker, which denotes the end of the tape, must never be overwritten (if it is, the read-write head may go off the tape). Therefore, if there is some transition that involves the left endmarker, it must write that endmarker back to the tape (where d ∈ {L, R}):

δ(p, ⊢) = (q, ⊢, d)

This effectively means that we cannot overwrite ⊢.

2. We also require that once the machine enters the final or the reject state, it can never exit. This requirement can be formalized as in the previous case, by stating that ∀b ∈ Γ, ∃c, c′ ∈ Γ and d, d′ ∈ {L, R} such that:

δ(t, b) = (t, c, d)

which states that if there is some transition that provides the next move for a machine in its final state, then the next move keeps the machine in its final state. Similarly,

δ(r, b) = (r, c′, d′)
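A transition function like eq. 6.3 can be sketched as a lookup table keyed by (state, symbol); the dictionary, state names, and list-based tape below are illustrative, not part of the formal definition:

```python
# Transition table mirroring eq. 6.3: delta(p, a) = (q, b, R).
delta = {
    ('p', 'a'): ('q', 'b', 'R'),   # in state p reading 'a': go to q,
                                   # write 'b', move one cell right
}

def step(state, tape, head):
    """Apply one transition; return the new (state, tape, head)."""
    new_state, write, move = delta[(state, tape[head])]
    tape = tape[:head] + [write] + tape[head + 1:]   # overwrite the cell
    head += 1 if move == 'R' else -1                 # move the head
    return new_state, tape, head

# One application of delta(p, a) = (q, b, R):
assert step('p', ['a', 'a'], 0) == ('q', ['b', 'a'], 1)
```

Restriction 1 above would appear in such a table as entries of the form `delta[(p, '⊢')] = (q, '⊢', d)`.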

Figure 6.1: Figure showing the elements of a Turing machine and its pictorial view


6.0.13 Configurations and Acceptance

Configuration of a Turing machine

A configuration of a Turing machine is a tuple of elements that accurately describes the state of the Turing machine at some instant; it should describe the values of the machine's free variables. A configuration at some instant of time can therefore be given by [(the contents of the read-write tape), (the state of the Turing machine), (the position of the read-write head on the tape)]. We need to look in more detail at how to represent some of these quantities. Firstly, the position of the read-write head can be given as an integer (≥ 0). The state of the Turing machine can be represented as some index. To see how to represent the contents of the tape, we must first see what the tape looks like. At any instant of time, the tape is a semi-infinite string (bounded on the left side) with a finite prefix of symbols from Γ∗ (these are the ones of interest to us) followed by a semi-infinite sequence of blanks ␣. That is:

Input tape: y ␣ ␣ ␣ . . . ⇒ y␣^ω

where y ∈ Γ∗. Therefore, formally, we define a configuration to be an element of a set of tuples:

Configuration ∈ Q × {y␣^ω | y ∈ Γ∗} × ℕ (6.4)

where the three components are, respectively, the state, the contents of the tape, and the position of the read-write head.

In the start configuration, the read-write head of the Turing machine is at the first index of the tape, 0, scanning the left endmarker; the state is the start state s; and the contents of the tape are the left endmarker, the input, and a semi-infinite sequence of blanks: ⊢x␣^ω.

Start configuration: (s, ⊢x␣^ω, 0) ; x ∈ Σ∗

Similarly, one can also define an accept and a reject configuration:

Accept configuration: (t, ⊢y␣^ω, n) ; y ∈ Γ∗
Reject configuration: (r, ⊢y␣^ω, n) ; y ∈ Γ∗

The running of the Turing machine can now be given as a sequence of such tuples, where consecutive tuples are related by some transition. To denote that a configuration (p, q, 0) is related to (e, s, 1) through a single transition, we write:

(p, q, 0) →¹_M (e, s, 1) (6.5)

This means that there exists some derivation in M of unit length that takes (p, q, 0) to (e, s, 1). We now need some inductive way of defining the "next" configuration of the Turing machine in terms of its "previous" configuration. The next configuration, after reading a symbol from the tape, is given by:

(p, z, n) →¹_M (q, s^n_b(z), n − 1)  if δ(p, z_n) = (q, b, L)
(p, z, n) →¹_M (q, s^n_b(z), n + 1)  if δ(p, z_n) = (q, b, R)    (6.6)

where:
z = the string denoting the contents of the tape
z_n = the nth symbol of the string z
s^n_b(z) = the string produced when the nth symbol of z is replaced by b

The meaning of (eq. 6.6) is quite clear. It says: if the machine is initially in state p, its read-write head reads the symbol z_n at position n on the tape, and the transition function requires the state to change to q, b to be written at the current position of the tape (overwriting z_n), and the read-write head to move left (to position n − 1), then the new configuration will be (new state, new tape contents formed by replacing the nth symbol of z by b, n − 1). The other case is analogous.


Acceptance

We can now extend the concept of →¹_M by defining its reflexive transitive closure →*_M inductively:

• α →⁰_M α

• α →ⁿ⁺¹_M γ if α →ⁿ_M β and β →¹_M γ

• α →*_M β if α →ⁿ_M β for some n ≥ 0

We can now use →*_M to define the notion of acceptance and rejection. Intuitively, by acceptance of a string x (x ∈ Σ∗) we mean that there exists some derivation from the start configuration that, upon reading the symbols of x, leads to the accept configuration. Hence, for a string x ∈ Σ∗:

acceptance: (s, ⊢x␣^ω, 0) →*_M (t, ⊢y␣^ω, n) (6.7)
rejection: (s, ⊢x␣^ω, 0) →*_M (r, ⊢y␣^ω, n) (6.8)
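The closure →*_M is exactly what a simulator computes: iterate single steps until the accept state t or the reject state r appears. A minimal sketch (the toy machine, which accepts any string of a's, is invented for illustration, and the left endmarker is omitted for brevity):

```python
BLANK = '_'

# Toy machine: scan right over a's; accept on reaching the first blank.
delta = {
    ('s', 'a'): ('s', 'a', 'R'),
    ('s', BLANK): ('t', BLANK, 'L'),
}

def run(x, max_steps=1000):
    """Iterate ->1_M until t or r is reached (a finite stand-in for ->*_M)."""
    tape = list(x) + [BLANK] * max_steps   # finite stand-in for x followed by blanks
    state, head = 's', 0
    for _ in range(max_steps):
        if state == 't':
            return True       # accept configuration reached (eq. 6.7)
        if state == 'r':
            return False      # reject configuration reached (eq. 6.8)
        state, write, move = delta[(state, tape[head])]
        tape[head] = write
        head += 1 if move == 'R' else -1
    return False              # step budget exhausted: treat as looping

assert run('aaa')
assert run('')
```

The `max_steps` cutoff is only an artifact of the sketch; a real machine may loop forever, which is exactly the distinction drawn in the next section.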

6.0.14 Classes of languages

Now we look at how the set of languages (subsets of Σ∗) is divided based on Turing machines. First, note that if a string is accepted or rejected by a Turing machine, the machine is said to "halt" on that string. If the string is neither accepted nor rejected, that is, the Turing machine runs indefinitely on that string, the machine is said to "loop" on that string. Languages that can be accepted by some Turing machine (but not necessarily rejected) are called Recursively Enumerable languages (we will see how this name originated a little later). Some Turing machines halt on all their inputs; these are called Total Turing machines, and the languages accepted by total Turing machines are called Recursive languages. That is:

L(M) = {x | x ∈ Σ∗, M accepts x} is:
  recursively enumerable, if M is some Turing machine
  recursive, if M is some total Turing machine    (6.9)

Clearly, Recursive languages form part of the larger set of Recursively Enumerable languages: every recursive language is recursively enumerable, but the converse is not true. To see how the languages are classified, we can look at the figure below:


Figure 6.2: Classification of Languages. The universal set corresponds to the set of all subsets of Σ∗. This is an uncountably infinite set.

Existence of Non-Recursively enumerable languages

The set of all languages is nothing but the set of all subsets of Σ∗. Since Σ∗ is a countably infinite set, the set of all its subsets is uncountably infinite. But we will later see that the set of Turing machines is countably infinite; that is, we can find some way of enumerating all the Turing machines. Therefore, it is quite evident that 2^Σ∗ has elements that are not in the R.E. set (as shown in the figure above). These languages are called Not Recursively Enumerable; there exists no Turing machine that accepts them. Another subtle point is that the set of R.E. languages is a countably infinite subset of an uncountably infinite set. This means that a large portion of 2^Σ∗ is not R.E.

6.1 Examples: Turing machines for some languages

6.1.1 L(M) = {ωω | ω ∈ {a, b}∗}

We know that the language L(M) above is not even context-free. It turns out that we can find a Turing machine that accepts L(M). Let us try to see intuitively how to construct an algorithm that checks whether a string is in L(M). The algorithm may look something like the following. On some input string x:

1. Mark the end of the string (on the tape) with a special end symbol.

2. Check if the input string is of even length. If not, reject; else proceed.

3. Find the middle of the string:

• Scan the tape from one end to the other, each time marking the leftmost unmarked symbol with ` and the rightmost unmarked symbol with ´.

• On finding that ` and ´ are next to each other, the middle lies between these two (between the last ` and the first ´).

4. Compare corresponding symbols of the two halves (the i-th symbol of the first half with the i-th symbol of the second half). If at any stage they are unequal, reject. Else, keep comparing until all symbols have been checked, and accept.
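The membership test that the machine implements can be phrased directly in Python, one line per step of the algorithm above (this is the logic only, not the tape-level construction):

```python
def in_ww(x):
    """True iff x = ww for some w over {a, b}."""
    n = len(x)
    if n % 2 != 0:                 # step 2: odd length -> reject
        return False
    half = n // 2                  # step 3: find the middle
    # step 4: compare corresponding symbols of the two halves
    return all(x[i] == x[half + i] for i in range(half))

assert in_ww('abab')               # w = ab
assert not in_ww('aba')            # odd length
assert not in_ww('abba')           # a palindrome, but not of the form ww
```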

We have now given a description of how the Turing machine must work. We can try to write down the Turing machine by describing all the parameters that constitute it (see eq. 6.1). The Turing machine for this problem can be defined as (à, á, b̀, b́ denote the marked copies of a and b):

Q = {q01, q02, q11, q12, q001, q002, q3, q300, q301, q302}
Γ = {a, b, à, á, b̀, b́, ⊢, ␣}
Σ = {a, b}
t = accept state
r = reject state
q01 = start state

Now we write down the transitions as follows:
δ(q01, a) = (q02, a, R)
δ(q02, a) = (q01, a, R)
δ(q01, b) = (q02, b, R)
δ(q02, b) = (q01, b, R)
δ(q01, ␣) = (q11, ␣, L)
δ(q02, ␣) = (r, ␣, L)
δ(q11, a/b) = (q11, a/b, L)
δ(q11, à/b̀/⊢) = (q001, à/b̀/⊢, R)
δ(q001, a/b) = (q12, à/b̀, R)
δ(q12, a/b) = (q12, a/b, R)
δ(q12, ␣) = (q002, ␣, L)
δ(q12, á/b́) = (q002, á/b́, L)
δ(q002, a/b) = (q11, á/b́, L)
δ(q001, á/b́/␣) = (q3, á/b́/␣, L)
δ(q002, à/b̀/⊢) = (q3, à/b̀/⊢, R)
come to the beginning of the tape and move to state q300:
δ(q300, ⊢/␣) = (q300, ⊢/␣, R)
δ(q300, à) = (q301, ␣, R)
δ(q300, b̀) = (q302, ␣, R)
δ(q301, ␣/à/b̀) = (q301, ␣/à/b̀, R)
δ(q301, á) = (q32, ␣, L)
δ(q301, b́) = (r, b́, L)
δ(q302, ␣/à/b̀) = (q302, ␣/à/b̀, R)
δ(q302, b́) = (q32, ␣, L)
δ(q302, á) = (r, á, L)
δ(q301/q302, ␣) = (r, ␣, L)
come to the beginning of the tape and move to state q300:
δ(q32, a/b/à/b̀/␣) = (q300, a/b/à/b̀/␣, L)
δ(q32, ⊢) = (q300, ⊢, R)
δ(q300, ␣) = (t, ␣, L)


6.1.2 L(M) = {a^p | p is a prime}

The language is over a single-letter alphabet, so the problem reduces to deciding whether a number (the length of a string in the language) is prime. Intuitively, to determine whether a number is prime, we use the Sieve of Eratosthenes. To find whether a number n is prime:

1. If n ≤ 1, n is not prime. If n = 2, then n is prime.

2. List down all the numbers from 2 to n.

3. Declare the first unmarked number in the list as prime, if this number happens to be n itself, then n isprime. Else, mark all its multiples (including itself) on the list.

4. If the last number n, is marked in the list, then n is not prime.

5. Repeat the steps 3 and 4 for all the numbers on the list.
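The sieve steps above translate directly into a function on the number n (the length of the input string a^n); this is a sketch of the logic, not of the tape operations:

```python
def is_prime_by_sieve(n):
    """Sieve-of-Eratosthenes primality test, following steps 1-5 above."""
    if n < 2:                             # step 1: n <= 1 is not prime
        return False
    marked = [False] * (n + 1)            # step 2: the list 2..n, all unmarked
    for p in range(2, n + 1):
        if marked[p]:
            continue
        if p == n:                        # step 3: the first unmarked number
            return True                   # happens to be n itself -> prime
        for multiple in range(p, n + 1, p):
            marked[multiple] = True       # mark p and all its multiples
        if marked[n]:                     # step 4: n got marked -> not prime
            return False
    return False

assert [k for k in range(2, 20) if is_prime_by_sieve(k)] == [2, 3, 5, 7, 11, 13, 17, 19]
```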

We can now see how the above steps translate when we think of a Turing machine following the algorithm:

1. Determine if there are at least three a's by scanning the first three cells of the tape. If there are only two, accept. If there is one or none, reject.

2. Create an identification for the last a (which corresponds to the number n itself): erase the last a and replace it with $. Also erase the first a (replace it with a blank).

3. Identifying multiples:

• Start from ⊢, scan right and find the first non-blank symbol (let its position be m). If this happens to be $, accept. Else, erase the symbol (replace it with a blank).

• We now need to delete all multiples of m, that is, all a's that occur at positions that are multiples of m:

a. Move left until ⊢, marking each symbol on the way.

b. Erase the a (replace it with a blank).

c. Start from the leftmost marked blank, move right until the first unmarked symbol, and mark it.

d. Repeat the above step until the mark has been carried from ⊢ to the first unmarked a.

e. If $ gets marked, reject.

f. Repeat steps b to e until all the symbols on the tape (except ⊢) are marked.

• Repeat steps 1 and 2.

We can now try to write down the Turing machine by describing its elements (and its transitions):

Q = {q0, q000, q001, q002, q1, q100, q101, q102, q200, q201, q202, q203, q204, q205}
Σ = {a}
Γ = {a, ā, â, ⊢, ␣, $}

Transitions:
δ(q0, ⊢) = (q000, ⊢, R)
δ(q000, ␣) = (r, ␣, L)
δ(q000, a) = (q001, a, R)
δ(q001, ␣) = (r, ␣, L)
δ(q001, a) = (q1, a, L)

Stage Pass: go to the beginning of the string, move to state q100, and erase the first a.
δ(q1, a) = (q1, a, L)
δ(q1, ⊢) = (q100, ⊢, R)

Marking the first a as ␣:


δ(q100, a) = (q101, ␣, R)

Keep moving right until ␣ and mark the last a as $:
δ(q101, a) = (q101, a, R)
δ(q101, ␣) = (q102, ␣, L)
δ(q102, a) = (q2, $, L)

Go to the beginning of the tape and move to state q200:
δ(q2, a) = (q2, a, L)
δ(q2, ⊢) = (q200, ⊢, R)
δ(q200, ␣/ā) = (q200, ␣/ā, R)
δ(q200, a) = (q201, a, L)

If $ is the first unmarked symbol, accept:

δ(q200, $) = (t, $, L)

From â to ⊢, mark all the symbols:
δ(q201, a/␣/$) = (q201, ā/␣/$, L)
δ(q201, ⊢) = (q202, ⊢, R)

Shifting the markers: collect the first marked symbol, and keep moving right to find the first unmarked one after it, and mark it.
δ(q202, ā/␣) = (q203, ā/␣, R)
δ(q203, ā/␣) = (q203, ā/␣, R)
δ(q203, a) = (q204, ā, R)
δ(q204, ā) = (q204, ā, R)

Do the same marking for all the multiples:
δ(q204, a) = (q202, ā, L)
δ(q204, $) = (q202, $, L)

If ⊢ itself needs to be shifted, then move right to the first unmarked a, and mark it as â.
δ(q202, a) = (q205, â, R)
δ(q205, a) = (q205, a, R)

Do the same striking off for multiples of all the numbers:
δ(q205, ā) = (q200, a, L)

If $ happens to be one of the multiples (to be marked), reject:
δ(q205, $) = (r, $, L)

Therefore, we now have an explicit construction of a Turing machine that accepts L(M) = {a^p | p is prime}.

6.2 Variations of the Turing Machine

The Turing machine that we have been considering so far (eq. 6.1) is a single-tape deterministic Turing machine. Now we consider variations of the Turing machine. Our task is to show that all these models are equivalent, that is, that we can simulate each of them using the model of the single-tape deterministic Turing machine.

6.2.1 Multi-Track Turing Machines

6.2.2 Multi-Tape Turing Machines

These are Turing machines with multiple tapes and multiple read-write heads. A rough diagram of such a machine looks like:


Figure 6.3: Figure showing a Turing machine with 3 independent tapes and read-write heads

At each step the finite control reads three different symbols from the three read-write heads of these tapes. Based on this information and the information contained in the states, the Turing machine makes its transitions and performs the necessary operations on the three tapes. Each read-write head can move in any direction, irrespective of the other read-write heads. We now need to simulate this machine using the single-tape Turing machine. We shall reduce the problem of three tapes to the problem of three tapes with a common read-write head; we can then use the result of the previous variation (section: 6.2.1) to show that it trivially reduces to the single-tape Turing machine.
Intuitively, the problem with blindly using a single read-write head in place of three is that when the finite control switches tapes, the information about its position on the previous tape is lost. Therefore, when it again comes back to that tape, it would not know from where on the tape it must resume. If we could preserve this information, then we could have a single read-write head that reads the information from each of these tapes and, when it switches tapes, remembers its last position on the previous tape by storing this position somewhere. Another subtle issue is that the tape is unbounded on one side, so the information about the position of the read-write head cannot be stored in the states, because the number of states is finite. Therefore, we need a new tape. We can now do the following:

1. Introduce a new track on each of the three tapes. This does not increase the power of the Turing machine, as we have shown earlier (section: 6.2.1).

2. On each tape, fill this additional track with 0's, except at one position, which corresponds to the position of the read-write head on its parallel track.

3. Therefore, when the finite control switches tapes, the position of the read-write head on the previous tape is remembered on this "extra" track.

A schematic view of this modified Turing machine looks like:


Figure 6.4: Figure showing a Turing machine with three tapes but only one read-write head. The light lines indicate the position of the read-write head when it left the tape.

Therefore, we have now reduced the case of multiple read-write heads to the single read-write head case. Now we can use the same technique as in (section: 6.2.1) to reduce it to the standard model.

6.2.3 Multi-Dimensional Turing Machines

Figure 6.5: Figure showing a Turing Machine with a Two-Dimensional Tape


6.2.4 Non-Deterministic Turing Machines

The case of a non-deterministic Turing machine (NDTM) is slightly non-trivial. Here the Turing machine makes non-deterministic transitions, just like the transitions of a PDA or an NFA. The transition function is now a relation:

δ ⊆ (Q × Γ) × (Q × Γ × {L, R}) (6.10)

The transition function of this NDTM takes an element (q, a), where q ∈ Q and a ∈ Γ, to a set of moves {(q1, a1, L), (q2, a2, R), (q3, a3, L), . . .}; upon reading a in state q, the machine can go non-deterministically to any element of this set. To convert it to a deterministic model, we must first consider all the transitions.

Table 6.1: Table showing all possible transitions

State        choice 1       choice 2       choice 3       choice 4
δ(q1, a1)    −              (q2, a2, L)    (q5, a5, R)    (q3, a3, R)
δ(q2, a2)    (q1, a1, L)    −              (q4, a4, R)    (q3, a3, R)
δ(q3, a3)    −              (q1, a1, R)    (q5, a5, L)    −
δ(q4, a4)    −              (q5, a5, L)    −              −
δ(q5, a5)    −              (q2, a2, L)    −              (q3, a3, L)

Let the non-deterministic Turing machine be Mn. At every step, the transition function of Mn has at most k choices; let these be labelled 1 . . . k. A particular number then represents a possible transition, so a sequence of numbers from 1 to k represents a possible run of the Turing machine, and taking all possible sequences of numbers from 1 to k gives all possible runs of Mn. Since Mn is non-deterministic, it suffices that at least one of the runs of the Turing machine results in acceptance. Therefore now let Md be a machine with 3 tracks. On one track, it will have
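One standard way to make this search deterministic, sketched here in place of the three-track construction, is a breadth-first search over configurations: the deterministic simulator accepts iff some run of Mn accepts. The toy machine below (which nondeterministically guesses when to accept a string containing an 'a') is invented for illustration.

```python
from collections import deque

BLANK = '_'

# delta maps (state, symbol) to a *set* of moves, as in eq. 6.10.
delta = {
    ('s', 'a'): {('s', 'a', 'R'), ('t', 'a', 'R')},  # guess: keep scanning, or accept
    ('s', 'b'): {('s', 'b', 'R')},
}

def nd_accepts(x, max_steps=100):
    """BFS over configurations: True iff some run reaches the accept state t."""
    start = ('s', tuple(x) + (BLANK,) * max_steps, 0)
    frontier = deque([start])
    for _ in range(max_steps):              # one BFS level = one step of every run
        next_frontier = deque()
        while frontier:
            state, tape, head = frontier.popleft()
            if state == 't':
                return True
            for q, b, move in delta.get((state, tape[head]), ()):
                new_tape = tape[:head] + (b,) + tape[head + 1:]
                new_head = head + (1 if move == 'R' else -1)
                next_frontier.append((q, new_tape, new_head))
        frontier = next_frontier
    return False

assert nd_accepts('bba')
assert not nd_accepts('bbb')
```

The breadth-first order is what makes this correct: exploring one branch to unbounded depth first (depth-first) could loop forever on a non-halting run while an accepting run goes unexplored.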

6.2.5 Enumeration Machines

An enumeration machine is a modification of a Turing machine, and is described as follows:

1. It has two tapes: a read-write tape called the work tape, on which it does its operations, and a write-only output tape, on which it prints its output after doing some work on the work tape.

2. It can only write symbols in Σ on the output tape.

3. It takes no input. It also has no accept or reject states. It starts from a start state, with both tapes blank, and continues to make transitions just like a Turing machine.

4. When it prints some string on the output tape (the string is said to have been enumerated), the machine enters a special state called the Enumeration State. After writing the output, on the next transition the output tape is automatically erased.

5. The machine may never enumerate any string (L(E) = ∅), or it may enumerate infinitely many strings. The strings being enumerated can repeat.

6. Thus, the machine runs forever.

6.2.6 Equivalence of the Turing machines and Enumeration machines

We now need to show that Turing machines and Enumeration machines are equivalent in terms of computational power. For this, we need to show a two-way equivalence:


1. Given an Enumeration machine E, ∃ a Turing machine M such that L(M) = L(E). 1

We can show the construction of this Turing machine M, which accepts exactly the strings enumerated by E. Let M have a tape with three tracks. On the first two tracks we simulate E, making one of them the work tape and the other the output tape; the third track is left for the Turing machine M itself. On input x, M copies the input to the third track and runs E on the other two tracks. E now runs and occasionally outputs strings on the output track. Each time E enumerates a string, M compares it with the string on its third track; if they match, M accepts x. Otherwise, M waits (indefinitely, if need be) until the string is enumerated.
Now we see that if a string is enumerated by E, it will be accepted by M, and M accepts only strings that are enumerated by E. Hence we have shown that ∀E, ∃M such that L(M) = L(E).

2. Given a Turing machine M , ∃ an Enumeration machine E such that L(E) = L(M).We can now show the construction of the enumeration machine E that will enumerate only those stringsthat are accepted by M . Consider an enumeration machine with a 2-track work tape, where, the firsttrack is used to simulate M , and the second is used by E itself. The enumeration machine E will nowenumerate all the strings in Σ∗ on the second track of the work tape.Intuitively we would expect that the Enumeration machine E now copies each of the strings one-by-oneon to the first track of the work tape and simulates M on this string. If M accepts the string, then theenumerator prints the string on its output tape. But, there is a subtle issue here. We failed to considerthe possibility that M is not a Total Turing machine. So, M need not accept or reject some strings, itcould indefinitely keep looping; In which case, E would be stuck simulating M on x forever and wouldnever move on to strings later in the list. (Also, it is impossible to determine whether M halts on x 2).The flaw in this procedure is that we are simulating M on each of the strings, one after the other.To avoid getting stuck in one string, we must do some sort of a time sharing procedure where we runseveral simulations, working a bit of each simulation each time. Consider the following steps:

• Divide the work track of E into several segments, separated by some # ∈ Γ. The length of these segments can be arbitrary. Our goal is now to simulate M on several strings, using the different segments of the tape, doing one step of each simulation at a time.

• In the first segment of the tape, let E run one step of the simulation of M on the first input string. Then let E run one step, each, of the simulation of M on the first and second input strings. Then let E run one step, each, of the simulation of M on the first, second and third input strings. This process can go on forever.

If M halts on some string (say after m steps), then that string will be accepted (or rejected) after m rounds of this simulation. If M does not halt on some string, the procedure does not get stuck on that string; it simply performs one further step of the simulation of M on it in each round. Therefore, strings which are accepted by the Turing machine will be enumerated after a finite number of simulation steps, and on strings on which M loops, E also loops (they are never enumerated).
Thus, we have now constructed an Enumeration machine that enumerates exactly the strings accepted by M.
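The time-sharing (dovetailing) schedule described above can be sketched in Python. This is an illustration only: the generator model and all helper names are ours, not from the text. Each simulation of M on a string is modeled as a generator that yields once per simulation step and returns True when M halts and accepts; a looping run simply yields forever.

```python
def run_of_M(x):
    """Hypothetical stand-in for simulating M on input x: this toy M accepts
    strings of even length after len(x) steps and loops on odd-length ones."""
    if len(x) % 2 == 0:
        for _ in range(len(x)):
            yield            # one simulation step
        return True          # M halts and accepts
    while True:
        yield                # M loops forever on this input

def dovetail(strings, rounds):
    """Round i starts a simulation on the i-th string and then advances every
    active simulation by one step; accepted strings are enumerated."""
    sims, enumerated = [], []
    for i in range(rounds):
        if i < len(strings):
            sims.append((strings[i], run_of_M(strings[i])))
        for s, g in list(sims):
            try:
                next(g)                      # one step of M on s
            except StopIteration as halt:
                sims.remove((s, g))
                if halt.value:               # M accepted s: enumerate it
                    enumerated.append(s)
    return enumerated

print(dovetail(["", "a", "ab", "abc", "abcd"], rounds=20))
```

Strings on which the modeled M loops are simply never enumerated, mirroring the construction: a looping simulation costs one step per round but never blocks the others.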

Therefore, from this two-way equivalence, we can conclude that an Enumeration machine and a Turing machine are equivalent in terms of their computational power.

6.3 Universal Turing machines

Universal Turing machines are Turing machines which are structurally like any other Turing machine, but whose task is to simulate other Turing machines. A universal machine takes as input a Turing machine (rather, a description of a Turing machine M) and an input string x, and simulates what M would do on input x. Its output is the same as M's output on input x. Now, we know that Turing machines can only accept

1In fact, the name Recursively Enumerable comes from the fact that the language can be enumerated by some Enumeration machine.

2This is the famous Halting problem.


strings as inputs. So, in order to give the description of a Turing machine as input, we need to find some encoding of Turing machines as strings. As a matter of convenience, we encode Turing machines using strings over the alphabet {0, 1}.

6.3.1 Encoding Turing machines over {0, 1}∗

To encode the description of a Turing machine as a string, we must find a way of representing the nine elements (eq. 6.1): (Q, Σ, Γ, ⊢, ␣, δ, s, t, r), where ⊢ is the left end-marker and ␣ the blank symbol. Let us first see how the first eight components are encoded.

• Let there be n states labelled 1 . . . n, out of which the sth state is the start state, and the tth and rth states are the accept and reject states respectively. All we need to do to encode the set of states is to represent n using the alphabet {0, 1}. This can be done trivially by putting 0^n. Similarly, we put 0^s, 0^t and 0^r to represent the start, accept and reject states respectively. (The symbol 1 need not even be used here.)

• Let there be m input symbols labelled 1 . . . m and k tape symbols labelled 1 . . . k, out of which the uth and vth tape symbols are the left end-marker and the blank symbol respectively. Similar to the above case, the m input symbols and k tape symbols can be represented as 0^m and 0^k, and we put 0^u and 0^v to represent the left end-marker and the blank symbol respectively.

• The transition function can be encoded using the encodings of states and symbols described above. A transition of the form δ(p, a) = (q, b, L) can be encoded as 0^p 1 0^a 1 0^q 1 0^b 1 0, where the final 0 stands for L (a final 1 would stand for R). Similarly, we can enumerate all the transitions as strings over {0, 1}.

• Now we have encoded all the components of the Turing machine. We just need to put these components together, separating each description by a 1. Hence:

(Q, Σ, Γ, ⊢, ␣, s, t, r) ≡ 0^n 1 0^m 1 0^k 1 0^s 1 0^t 1 0^r 1 0^u 1 0^v (6.11)

Along with the above description, we must append the encodings of all the transitions as strings over {0, 1}.
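The encoding scheme above can be sketched as follows. This is a sketch under stated assumptions: the function names, the dictionary layout of the transition table, and the exact placement of the 1 separating the header from the transition blocks are our own choices, not fixed by the text.

```python
def unary(i):
    """Represent the number i as 0^i."""
    return "0" * i

def encode_tm(n, m, k, s, t, r, u, v, delta):
    """Encode (Q, Σ, Γ, ⊢, ␣, s, t, r) as 0^n 1 0^m 1 0^k 1 0^s 1 0^t 1 0^r
    1 0^u 1 0^v, followed by one block 0^p 1 0^a 1 0^q 1 0^b 1 d per
    transition δ(p, a) = (q, b, d), where d is "0" for L and "1" for R."""
    header = "1".join(unary(i) for i in (n, m, k, s, t, r, u, v))
    moves = "".join(
        unary(p) + "1" + unary(a) + "1" + unary(q) + "1" + unary(b) + "1" + d
        for (p, a), (q, b, d) in sorted(delta.items())
    )
    return header + "1" + moves   # assumed separator before the transitions

# Example: a 3-state machine fragment with one transition δ(1, 2) = (2, 1, R).
code = encode_tm(n=3, m=2, k=4, s=1, t=2, r=3, u=1, v=2,
                 delta={(1, 2): (2, 1, "1")})
print(code)
```

Note that the encoding is not prefix-free as written; a real UTM would additionally fix a convention for locating the boundary between the header and the transition list.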

6.3.2 Working of a Universal Turing machine

A universal Turing machine U takes as input the encoding of a Turing machine M and an input string x. To differentiate between the description and the input string, the two are separated by a # symbol that is part of the tape alphabet of U. Hence the input is represented as M#x. U has a 3-track tape. On the first track, it stores the description of the Turing machine M and the input x. The second track is used as the tape of the Turing machine M; the contents of the simulation are stored on this track. The third track is used to store the current state and the position of the read-write head of M. Depending on the transition function, the contents of the second and third tracks are updated.

6.4 Set operations on Turing machines

Given below are some set operations that can be applied to Turing machines, as they represent the set of R.E. languages; these can also be thought of as set operations on R.E. languages. The given set operations are for two Turing machines; they can, however, be generalized to more than two. In many of these set operations, we need to distinguish between Total Turing machines and general Turing machines (the latter include the non-halting machines as well as the halting ones), which is the same as distinguishing between Recursive and Recursively Enumerable languages.


6.4.1 Union of two Turing machines

Let the two Turing machines whose union needs to be taken be M1 and M2. We now need to find some Turing machine M such that L(M) = L(M1) ∪ L(M2).

1. Consider a Turing machine with a 4-track tape. On the first track, we have the encoding of M1. On the second track, we have the encoding of M2. On the third and fourth tracks, we copy the input x; these last two tracks act as the read-write tapes that M1 and M2 will use. Let M be the new Turing machine formed.

2. Now, we simulate M1 (using the description of M1 on the first track and a universal Turing machine) on the input x, on the third track; similarly, we simulate M2 on the input x on the fourth track. If either of the two simulations results in acceptance of the input string, the string is accepted by M.
There is a subtle issue here. There may be cases where M1 loops indefinitely on input x but M2 accepts x. With the naive procedure of running the two simulations one after the other, we would get stuck in the simulation of M1, never realizing that M2 would actually halt and accept x. To avoid such situations, we follow a time-sharing procedure: we simulate one step of M1 and one step of M2, alternately, on the last two tracks. Now, if one simulation loops, it does not prevent the other from running. Therefore, for a string in L(M), at least one of the machines will halt and accept in finite time, and the string will be accepted. For strings not in L(M), if both machines reject, the new Turing machine also rejects; and if both M1 and M2 loop on some input, M also loops.
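The alternating schedule can be sketched in Python. As an illustration only (not the construction itself), each machine is modeled as a generator that yields once per simulation step and returns its accept/reject verdict when it halts; a looping machine yields forever. All helper names are ours.

```python
from itertools import count

def union_machine(sim1, sim2, max_steps=None):
    """Alternate one step of sim1 and sim2; accept as soon as either accepts,
    reject only when both have halted and rejected. max_steps (illustrative
    only) bounds the demo, since a real run may genuinely loop forever."""
    sims = {"M1": sim1, "M2": sim2}
    for step in count():
        if max_steps is not None and step >= max_steps:
            return None                      # still looping (demo cut-off)
        for name, g in list(sims.items()):
            try:
                next(g)                      # one step of this machine
            except StopIteration as halt:
                del sims[name]
                if halt.value:               # one machine accepted: accept
                    return True
        if not sims:                         # both halted without accepting
            return False

def rejector():      # halts and rejects after 3 steps
    for _ in range(3):
        yield
    return False

def looper():        # never halts
    while True:
        yield

def accepter():      # halts and accepts after 5 steps
    for _ in range(5):
        yield
    return True

print(union_machine(looper(), accepter()))    # M2 accepts despite M1 looping
print(union_machine(rejector(), rejector()))  # both reject
```

The first call returns True even though one simulation never halts, which is exactly why the alternation is needed.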

6.4.2 Intersection of two Turing machines

To take the intersection of two Turing machines M1 and M2, we follow a similar procedure as above, except that no time-sharing technique is required. We can simulate M1 and M2, one after the other, on two of the tracks of M. Only if both simulations result in acceptance is the input accepted by the new Turing machine. Now, if M1 loops on some string, it is true that we will be stuck simulating M1 and never begin simulating M2. But the criterion for acceptance requires both M1 and M2 to accept. So if M1 does not accept and keeps looping, there is no use in simulating M2, and M might as well keep looping.

6.4.3 Complement of a Turing machine

To take the complement of a Total Turing machine, it is enough to make the accept states reject states and vice-versa. The problem with a non-Total Turing machine is that it does not halt on all inputs. For a non-Total Turing machine, "not accepting" and "rejecting" are not synonymous: when it does not accept some input, it need not reject it either; it could keep looping on that particular input. So, if we interchange the accept and the reject states, the new machine will reject strings that the old machine accepted and accept strings that the old machine rejected; but on strings on which the old machine loops, the new machine will also loop. As a result, it will neither accept nor reject some strings. Therefore, the new machine is not even a valid Turing machine for the complement language.
Theorem: If M is a Turing machine such that there is also a Turing machine M̄ with L(M̄) = the complement of L(M), then L(M) is accepted by a Total Turing machine.
Proof:

1. If M and M̄ are Turing machines, then (from section 6.2.6) we can have two enumeration machines E and Ē that enumerate all the strings in L(M) and L(M̄) respectively.

2. If L(M) is recursively enumerable and we show that, given an arbitrary input string x, it is possible to decide whether x ∈ L(M) or x ∉ L(M), then L(M) is recursive.

• Consider a universal Turing machine U that takes as input M#M̄#x.

• U is a 6-track Turing machine.


- The first and fourth tracks are used to store the descriptions of M and x, and of M̄ and x, respectively.

- The second and fifth tracks are used as read-write tapes to store the results of the simulations of M and M̄ on x respectively.

- The third and sixth tracks are used to keep track of the states and the read-write head positions of M and M̄ respectively.

• Now, given an input x, U runs both M and M̄ on x, employing a time-sharing procedure.

• If the simulation of M results in acceptance, then x ∈ L(M). If the simulation of M̄ results in acceptance, then x ∉ L(M). There is no scope for looping here, because M and M̄ together accept every string: M accepts all strings in L(M) and M̄ accepts all strings in its complement.

• Therefore, we have found a method to accurately determine whether an arbitrary string x is in L(M) or not. Therefore, L(M) is a recursive language, i.e., it is accepted by a Total Turing machine.

6.4.4 Concatenation of two Turing machines

Let M1 and M2 be Turing machines accepting L(M1) and L(M2) respectively. We now need to find a Turing machine that accepts the language L(M1) · L(M2). To do this, we make the final states of M1 non-final and put an ε-transition from these states (the old final states of M1) to the start state of M2. Since a Turing machine reads its input from a tape rather than consuming it, the new machine must also nondeterministically guess a split of the input x as x = yz, run M1 on y, and then run M2 on z. The new machine formed will then accept exactly those x with x ∈ L(M1) · L(M2).

6.5 Halting Problem

The halting problem is a famous problem about Turing machines. Informally, it asks whether there exists some universal Turing machine that can take as input a Turing machine M and an arbitrary input x, and determine whether M halts (or loops) on the input x. Formally, the halting problem is: Is the language

HP = {M#x | M halts on x} (6.12)

recursive?
It is obviously Recursively Enumerable, as we have the universal Turing machine, which can determine whether x is accepted by M: we just need the UTM to simulate M on x. The whole problem is to determine whether M will reject x or will loop indefinitely.


Inputs →
T.M.'s ↓ | ε | 0 | 1 | 00 | 01 | 10 | 11 | 000 | 001 | 010 | . . .

M    | H | L | L | H | L | H | H | L | L | H | . . .
M0   | H | H | L | H | L | H | L | L | H | H | . . .
M1   | H | H | L | L | H | L | H | L | L | H | . . .
M00  | L | L | H | H | L | L | H | H | H | L |
M01  | H | L | H | L | L | H | L | L | H | H |
M10  | H | H | L | H | H | L | L | H | L | L |
M11  | H | H | L | L | H | L | H | L | H | L |
M000 | L | L | L | H | H | L | L | L | L | L | . . .
M001 | L | L | H | H | L | L | H | H | H | L | . . .
M010 | L | H | L | H | L | H | L | L | L | H | . . .
⋮

Table 6.2: Table showing a possible output table of the (hypothetical) Total Turing Machine that can solvethe Halting Problem. The (i, j) cell denotes whether the Turing machine Mi will halt on input j. H standsfor Halt and L for Loop.

Figure 6.6: Output table for the Turing machine K. Notice the diagonal: the machine K differs from each Mi in the ith position, thereby contradicting the assumption that K appears somewhere in the list of Turing machines.

6.5.1 Membership Problem

6.6 Decidability and Undecidability

6.7 Quantum Turing Machines


Chapter 7

Computational Complexity

Chapter not yet started. The topics to be covered are:

1. Definition of Complexity classes P and NP. NP-Completeness. Two equivalent definitions of NP languages.

2. Reductions. Closure of P and NP under reductions.

3. Cook-Levin Theorem: Proof.

4. The P = NP problem.

5. Examples of NP-Complete problems: Proof by reduction from other (already proved) NP-Complete languages.

(a) TM-SAT

(b) k-clique

(c) Independent Set

(d) Vertex cover

(e) Hamiltonian Path and Hamiltonian Cycle

6. Definition and examples of languages in the co-NP class.

7. The Decision vs. Search problem and the particular case for the NP-Complete languages.

8. Definition of Space Complexity classes: DSPACE, NSPACE.

9. Time and Space constructible functions

10. Deterministic Space Hierarchy theorem: Proof by diagonalization.

11. Configuration graphs and the method of crossing sequences.

12. Deterministic Time Hierarchy theorem: Proof by diagonalization, using the following ideas:

(a) Simulation of a k-tape Turing machine using a
- 1-tape Turing machine with a quadratic slowdown; bound proved using a crossing-sequences argument.
- 2-tape Turing machine with a logarithmic slowdown; moving the data in cells instead of the read/write head (the Bi operations).

13. Non-Deterministic Time Hierarchy Theorem. Proof by Lazy-diagonalization.

14. Immerman-Szelepcsényi Theorem: NSPACE = coNSPACE.


(a) Proof for: PATH is NL-Complete. Then using this to give an NL algorithm for deciding the complement of PATH (guessing the number of vertices reachable from the start vertex and verifying the guess), thereby proving that the complement of PATH is also NL-Complete. Hence NL = coNL.

15. Proof for: 2-SAT is in P, and 2-SAT is NL-Complete.

16. Various Time and Space complexity classes. The upward and the downward translational lemmas.

17. Logarithmic and sub-logarithmic space bounds: the classes L and NL. Logspace reductions, logspace-constructible functions, logspace transducers.

18. Savitch's Theorem: PSPACE = NPSPACE. Proof using graph reachability (the PATH problem).

19. PSPACE Completeness:

(a) True Quantified Boolean Formulae (TQBF): Proof for: TQBF is PSPACE-Complete.

(b) Generalized Geography (GG): Proof for: GG is PSPACE-Complete.

20. Oracle Turing machines

(a) Definition of an Oracle Machine

(b) Relations between different Complexity classes, imposed due to the oracle machines. E.g. NP^TQBF = P^TQBF.

(c) Baker-Gill-Solovay Theorem: ∃ an oracle A such that P^A = NP^A, and ∃ an oracle B such that P^B ≠ NP^B.

21. Polynomial Time Hierarchy

(a) FILL

(b) Definition of the Complexity classes Σ_2^P and Π_2^P, and their properties.

22. Definition of the Complexity class P/Poly.

(a) Properties and theorems involving the class P/Poly, and their proof:

- SAT ∈ P/Poly ⇒ PH = Σ_2^P
- No subset of 1∗ is NP-Complete unless P = NP.


Part V

INFORMATION THEORY


Chapter 8

Fundamentals of Information Theory

8.1 Introduction

There are two formalisms for this theory. One is due to Shannon, where the information stored in an event is measured using the uncertainty associated with the probability of that event. Another is due to Kolmogorov and Chaitin, where the amount of information stored in an object is proportional to the number of bits needed to describe (compress) that object.
Whenever we talk about the information stored in an "object", we mean the information stored in some random variable: the value that a random variable takes is the information stored in that random variable. We now attempt to quantify this information stored in the random variable, using Shannon's approach.

8.2 Axiomatic Definition of the Shannon Entropy

Before stating the axioms that our measure of uncertainty must satisfy, we need to state some definitionsthat we will use throughout:

1. Let X be a discrete random variable (source of information) that takes the values x1, x2, . . . xn withprobabilities p1, p2, . . . pn respectively. The event X = xi occurs with probability pi.

2. Let us assume that \sum_{i} p_i = 1 and p_i > 0 ∀i.

3. Define the function: h(pi) = uncertainty in the event X = xi.

4. Define another function of n variables: H(p_1, p_2, . . . , p_n) = average uncertainty associated with all the events X = x_i. We have the relation: H(p_1, p_2, . . . , p_n) = \sum_{i} p_i h(p_i). For convenience, let us write H(X) instead of H(p_1, p_2, . . . , p_n). H(X) can be interpreted as an average of the uncertainties associated with the events X = x_i ∀i, with each quantity weighted by its probability.

5. Define a function f(N), the average uncertainty associated with N independent and equally probable events: f(N) = H(1/N, 1/N, . . . , 1/N).

Now that we have the necessary definitions, we state some basic properties that we demand the measure of uncertainty H(X) must satisfy:


1. f(N) must be a monotonically increasing function of N. This is because, intuitively, the uncertainty associated with one event occurring out of N equally likely events should increase as N increases.

2. For any two mutually independent random variables X and Y, where X takes N values and Y takes M values, we must have f(MN) = f(M) + f(N). This comes from the requirement that for mutually independent experiments the joint experiment has MN equally likely outcomes, while the uncertainties should simply add.

3. We now go back to the function H(X). Consider the following experiment: the set of values that X can take is divided into two groups, A = {x_1, . . . , x_r} and B = {x_{r+1}, . . . , x_N}. Obtaining a value of X is now a compound experiment: first a group is chosen, and then a value within that group.

Figure 8.1: Figure showing the compound experiment.

Now the probability of obtaining a value of X, say P{X = x_i}, is P{A is chosen} × P{x_i | A is chosen} + P{B is chosen} × P{x_i | B is chosen}. Therefore, (uncertainty about the value of X obtained) = (uncertainty about which group is chosen) + (probability of each group being chosen) × (uncertainty in finding the value within that group), summed over the groups.

There are only 2 groups, chosen with probabilities \sum_{i=1}^{r} p_i and \sum_{i=r+1}^{N} p_i respectively. The uncertainty associated with this choice is

H\left(\sum_{i=1}^{r} p_i,\; \sum_{i=r+1}^{N} p_i\right).

Similarly, from the diagram we can see the probability of each x_i being chosen given the group. Therefore, the uncertainty associated with finding x_i in group A is

H\left(\frac{p_1}{\sum_{i=1}^{r} p_i}, \frac{p_2}{\sum_{i=1}^{r} p_i}, \ldots, \frac{p_r}{\sum_{i=1}^{r} p_i}\right),

and that of finding x_i in B is

H\left(\frac{p_{r+1}}{\sum_{i=r+1}^{N} p_i}, \frac{p_{r+2}}{\sum_{i=r+1}^{N} p_i}, \ldots, \frac{p_N}{\sum_{i=r+1}^{N} p_i}\right).


Therefore, we have:

H(X) = H(p_1 + p_2 + \cdots + p_r,\; p_{r+1} + p_{r+2} + \cdots + p_N) + (p_1 + p_2 + \cdots + p_r)\, H\left(\frac{p_1}{\sum_{i=1}^{r} p_i}, \frac{p_2}{\sum_{i=1}^{r} p_i}, \ldots, \frac{p_r}{\sum_{i=1}^{r} p_i}\right) + (p_{r+1} + p_{r+2} + \cdots + p_N)\, H\left(\frac{p_{r+1}}{\sum_{i=r+1}^{N} p_i}, \frac{p_{r+2}}{\sum_{i=r+1}^{N} p_i}, \ldots, \frac{p_N}{\sum_{i=r+1}^{N} p_i}\right)

Hence, we now have another requirement that H(X) must satisfy.

4. Finally, we demand that H(p, 1 − p) is a continuous function of p. That is, a small change in the probability of some event must produce only a small change in the uncertainty associated with that event.

With the above requirements, we state the following theorem.
Theorem: The functional form that satisfies all the above axioms is:

H(p_1, p_2, \ldots, p_N) = -C \sum_{i=1}^{N} p_i \log p_i \qquad (8.1)

where p_i stands for P{X = x_i}. The base of the logarithm and C are arbitrary constants. We now need to prove the above theorem by showing explicitly that the above axioms are satisfied by the functional form in (eq. 8.1).

1. From (eq. 8.1), we can see that

f(N) = H\left(\frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N}\right)
     = -C \sum_{i=1}^{N} P\{X = x_i\} \log P\{X = x_i\}
     = -C \left[\frac{1}{N}\log\frac{1}{N} + \frac{1}{N}\log\frac{1}{N} + \cdots + \frac{1}{N}\log\frac{1}{N}\right]
     = -C \log\frac{1}{N}
     = C \log N \qquad (8.2)

Now, we know that C > 0 and N > 0. Since log N is a monotonically increasing function of N, we conclude that f(N) is also a monotonically increasing function of N. Therefore, the first axiom is satisfied by the functional form in (eq. 8.1).

2. The second axiom concerns two mutually independent random variables. Consider X and Y, taking N and M equally likely values respectively. Once again we consider the functions f(N) and f(M), as in (eq. 8.2). Since X and Y are mutually independent, we have P{X = x_i, Y = y_j} = P{X = x_i} × P{Y = y_j}. From (eq. 8.2):

f(NM) = -C \log P\{X = x_i, Y = y_j\}
      = -C \left[\log P\{X = x_i\} + \log P\{Y = y_j\}\right]
      = -C \log\frac{1}{N} - C \log\frac{1}{M}
      = C \log N + C \log M = f(N) + f(M)

Hence, the second axiom is also satisfied by the functional form in (eq. 8.1)

3. We now need to show that the grouping axiom (axiom 3) is satisfied. We can show this by induction on N. Before assuming that the theorem holds for N, we verify the N = 2 case:

H(p_1, p_2) = H(p_1, p_2) + p_1 H(p_1) + p_2 H(p_2) = H(p_1, p_2)


since H(p_1) = H(p_2) = 0 (see footnote 1). Hence the grouping axiom trivially holds for the N = 2 case. Let us now assume that the formula holds for N; for the N + 1 case, we consider the values x_1, x_2, . . . , x_{N+1} to be split into two groups, one consisting of x_1, . . . , x_N and the other containing only x_{N+1}. Therefore, we write:

H(p_1, p_2, \ldots, p_{N+1}) = H(p_1 + \cdots + p_N,\; p_{N+1}) + (p_1 + \cdots + p_N)\, H\left(\frac{p_1}{\sum_{i=1}^{N} p_i}, \ldots, \frac{p_N}{\sum_{i=1}^{N} p_i}\right) + p_{N+1} H(p_{N+1})

We have H(p_{N+1}) = 0 (see footnote 1). From (eq. 8.1), the LHS of the above equation is -C \sum_{i=1}^{N+1} p_i \log p_i. We now need to show that the RHS is the same. Consider the first term of the RHS:

H\left(\sum_{i=1}^{N} p_i,\; p_{N+1}\right) = -C\left[\left(\sum_{i=1}^{N} p_i\right) \log\left(\sum_{i=1}^{N} p_i\right) + p_{N+1} \log p_{N+1}\right]

Since \sum_{i=1}^{N+1} p_i = 1, we have p_{N+1} = 1 - \sum_{i=1}^{N} p_i, so this can also be written as

= -C\left(\sum_{i=1}^{N} p_i\right) \log\left(\sum_{i=1}^{N} p_i\right) - C\left(1 - \sum_{i=1}^{N} p_i\right) \log\left(1 - \sum_{i=1}^{N} p_i\right)

= -C\left(\sum_{i=1}^{N} p_i\right) \log\left(\sum_{i=1}^{N} p_i\right) - C\, p_{N+1} \log p_{N+1} \qquad (8.3)

The second term of the RHS gives:

\left(\sum_{i=1}^{N} p_i\right) H\left(\frac{p_1}{\sum_{i=1}^{N} p_i}, \ldots, \frac{p_N}{\sum_{i=1}^{N} p_i}\right) = \left(\sum_{i=1}^{N} p_i\right)(-C)\left[\frac{p_1}{\sum_{i=1}^{N} p_i} \log\frac{p_1}{\sum_{i=1}^{N} p_i} + \frac{p_2}{\sum_{i=1}^{N} p_i} \log\frac{p_2}{\sum_{i=1}^{N} p_i} + \cdots + \frac{p_N}{\sum_{i=1}^{N} p_i} \log\frac{p_N}{\sum_{i=1}^{N} p_i}\right]

= -C\left[p_1 \log p_1 + p_2 \log p_2 + \cdots + p_N \log p_N - \left(\log \sum_{i=1}^{N} p_i\right)\left(\sum_{i=1}^{N} p_i\right)\right]

= -C \sum_{i=1}^{N} p_i \log p_i + C\left(\sum_{i=1}^{N} p_i\right) \log\left(\sum_{i=1}^{N} p_i\right) \qquad (8.4)

We can now put together the simplified expressions of the two terms in the RHS, and we get fromequations (eq. 8.3) and (eq. 8.4):

RHS = -C\left(\sum_{i=1}^{N} p_i\right) \log\left(\sum_{i=1}^{N} p_i\right) - C\, p_{N+1} \log p_{N+1} - C \sum_{i=1}^{N} p_i \log p_i + C\left(\sum_{i=1}^{N} p_i\right) \log\left(\sum_{i=1}^{N} p_i\right)

= -C \sum_{i=1}^{N} p_i \log p_i - C\, p_{N+1} \log p_{N+1}

= -C \sum_{i=1}^{N+1} p_i \log p_i

= H(p_1, p_2, \ldots, p_{N+1})

Therefore, we have shown that the RHS simplifies to the same expression as the LHS, and hence thegrouping axiom is also satisfied by the functional form in (eq. 8.1).
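The grouping identity verified above can also be spot-checked numerically. A minimal sketch with C = 1 and the natural logarithm (any base works; the helper names are ours):

```python
from math import log

def H(probs):
    """H(p_1, ..., p_n) = -sum_i p_i log p_i, with C = 1."""
    return -sum(p * log(p) for p in probs if p > 0)

ps = [0.1, 0.2, 0.3, 0.4]
r = 2                                   # group A = first r values, B = rest
a, b = sum(ps[:r]), sum(ps[r:])
lhs = H(ps)
rhs = H([a, b]) + a * H([p / a for p in ps[:r]]) + b * H([p / b for p in ps[r:]])
print(abs(lhs - rhs) < 1e-12)
```

The two sides agree up to floating-point error, as the algebra above guarantees.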

1When H is a function of p_1, p_2, \ldots, p_n, we must have the condition that \sum_{i=1}^{N} p_i = 1. Similarly, if H is a function of a single variable p, then we must have p = 1. If p = 1, the corresponding event is certain, and intuitively we require that the uncertainty associated with a certain event is 0.


4. We now need to show that the uncertainty function is a continuous function of its parameters. We show this for the N = 2 case; the cases N > 2 then follow. For N = 2, we have p_1 = p and p_2 = 1 − p, so:

H(p, 1 − p) = -C [p \log p + (1 − p) \log(1 − p)]

We need to show that the above function is continuous for all p ∈ (0, 1). To show this, we show that it is differentiable there. Differentiating H(p, 1 − p), we get:

\frac{d}{dp} H(p, 1 − p) = -C [(\log p + 1) − (\log(1 − p) + 1)] = -C \log\frac{p}{1 − p}

We know that in the interval (0, 1) the functions log p and log(1 − p) are well defined. Therefore, H(p, 1 − p) is differentiable, and hence continuous, for all p ∈ (0, 1). Therefore, we have shown that the functional form in (eq. 8.1) satisfies axiom four.

Hence, we conclude that the definition of the measure of uncertainty proposed in (eq. 8.1) is indeed the rightmeasure of uncertainty according to our axioms. We take the base of the logarithm to be 2 and C = 1, byconvention. Therefore, we have:

H(X) = -\sum_{i} p_i \log p_i \qquad (8.5)

The above measure of uncertainty given by H is Shannon’s measure of uncertainty and the quantity H(X)is called the Shannon Entropy of X, or the Communicational Entropy of X.
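Eq. (8.5) is straightforward to compute. A small sketch (the function name is ours), using the conventional base-2 logarithm and C = 1:

```python
from math import log2

def shannon_entropy(probs):
    """H(X) = -sum_i p_i log2 p_i, assuming probs sums to 1 and p_i > 0."""
    assert abs(sum(probs) - 1.0) < 1e-9 and all(p > 0 for p in probs)
    return -sum(p * log2(p) for p in probs)

print(shannon_entropy([0.5, 0.5]))              # one fair coin: 1 bit
print(shannon_entropy([0.25] * 4))              # four equally likely values: 2 bits
print(round(shannon_entropy([0.95, 0.05]), 3))  # a heavily biased coin
```

With base 2, H(X) is measured in bits, which matches the question-asking interpretation of the next section.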

8.3 Interpretations of the Uncertainty Function

1. From the definition of the Shannon Entropy (eq. 8.5), we have:

H(X) = -\sum_{i} p_i \log p_i = \sum_{i} p_i w_i

where w_i is the value taken by the random variable W(X) = -\log P\{X = x_i\}. Therefore, we have:

H(X) = E(W(X)) \qquad (8.6)

where E stands for the expectation value. Therefore, the uncertainty associated with the random variable X is the expectation value of the random variable W(X) = -\log P\{X = x_i\}.

2. Let us consider the following experiment. A coin is tossed, giving heads (0) with probability p and tails (1) with probability 1 − p. Our task is to find out the outcome of a particular run of the experiment. All that we can do is ask questions about the coin whose answers will be a "yes" or a "no"; we must find the outcome of the coin toss by asking the minimum average number of questions.
Let us introduce a slight technicality: let the coin toss be replaced by a random variable X that takes values 0 and 1 with probabilities p and 1 − p respectively. Consider the trivial case where the coin is tossed once and we need to guess the outcome. The trivial, and only, question to ask would be "is X = 0?" If we receive a "yes", we know that the outcome was 0; if we receive a "no", the outcome was 1. Therefore, by asking one question, we have guessed the outcome of one coin toss, thereby asking an average of one question per coin toss. The question obviously arises: can we do better, and if so, what is the best we can do? To answer the first question, consider another, slightly less trivial, example. In the figure below we have the situation where we wait for two coin tosses and then guess the outcomes of both tosses simultaneously.


Figure 8.2: Experiment involving two coin tosses. The guess is made simultaneously on the results of boththe tosses.

Figure 8.3: Experiment involving three coin tosses. The guess is made simultaneously on the results of allthe tosses.

For the cases of two, three and four coin tosses, if the probability of getting X = 0 is p, then the average number of questions to be asked per coin toss is:


Table 8.1: Average number of questions to be asked to determine the result of one coin toss

tosses | general expression for average questions per coin toss | p = 0.95
2 | (x^2 + 2(1−x)x + 3(1−x)x + 3(1−x)^2)/2 | 0.57
3 | (x^3 + 3((1−x)^2 x + (1−x)^3) + 4(x^2(1−x) + 2x(1−x)^2) + 5(2x^2(1−x)))/3 | 0.51
4 | (x^4 + 4(2x^3(1−x) + (1−x)^4) + 5(2x^3(1−x) + 4x(1−x)^3 + 2x^2(1−x)^2) + 6(4x^2(1−x)^2))/4 | 0.41

(Here x denotes p, the probability of heads.)

Here is a graph showing the coin toss results:

Figure 8.4: Graph showing the average number of questions that need to be asked per coin toss, to determinethe value of the toss.

Indeed, one can apply a similar method for five or more coin tosses. We can clearly see that, by increasing the number of coin tosses, one can determine the result of a toss by asking less than one question per coin toss. The claim is that, beyond a certain point, one cannot do better. This limit depends on p, the probability of obtaining heads, and is the functional value H(p) (from eq. 8.5). Restating our claim: if the probability of heads is p, then H(p) is the minimum average number of questions per toss that one must ask to determine the value of X. This is another interpretation of the uncertainty function.
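The expressions in Table 8.1 and the claimed bound H(p) can be evaluated numerically. A sketch (the helper names are ours; the polynomial expressions are transcribed from the table):

```python
from math import log2

def avg_questions(tosses, x):
    """Average number of questions per coin toss for the question schemes of
    Table 8.1; x is the probability of heads."""
    if tosses == 2:
        total = x**2 + 2*(1-x)*x + 3*(1-x)*x + 3*(1-x)**2
    elif tosses == 3:
        total = (x**3 + 3*((1-x)**2*x + (1-x)**3)
                 + 4*(x**2*(1-x) + 2*x*(1-x)**2) + 5*(2*x**2*(1-x)))
    elif tosses == 4:
        total = (x**4 + 4*(2*x**3*(1-x) + (1-x)**4)
                 + 5*(2*x**3*(1-x) + 4*x*(1-x)**3 + 2*x**2*(1-x)**2)
                 + 6*(4*x**2*(1-x)**2))
    else:
        raise ValueError("only 2, 3 or 4 tosses tabulated")
    return total / tosses

def H(p):
    """Binary entropy, eq. (8.5) with two outcomes."""
    return -(p*log2(p) + (1-p)*log2(1-p))

for n in (2, 3, 4):
    print(n, round(avg_questions(n, 0.95), 2))
print("H(0.95) =", round(H(0.95), 3))   # the claimed lower bound
```

The per-toss averages decrease as the block of tosses grows, while staying above H(0.95) ≈ 0.286, consistent with the claim.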

3. There is another interpretation, similar to the previous one. Consider X to be a discrete random variable that takes values x_1, x_2, . . . , x_n with probabilities p_1, p_2, . . . , p_n respectively. Consider n experiments with X, or equivalently, consider n identically distributed random variables X_1, X_2, . . . , X_n. Let f_i(X_1, X_2, . . . , X_n) be a function that gives the number of


times x_i occurs in the sequence (X_1, X_2, . . . , X_n). The value that f_i(X_1, X_2, . . . , X_n) takes will certainly depend on the values X_1, X_2, . . . , X_n assume in the experiment. We can, however, compute the average number of occurrences of x_i. So, consider all the cases:

• P(x_i occurring once) ≡ P(any one X_j = x_i, and all other X_j ≠ x_i) = {}^{n}C_1 × p_i × (1 − p_i)^{n−1}

• Similarly, P (xi occurring twice) ≡ P(any two of Xj = xi and all other Xj = xi) = (number ofways of choosing 2 Xj ’s from n X’s) × (probability of two of them being xi) × (Probability of therest not being xi) = nC2 × p2

i × (1− pi)n−2

• And so on ... we have: P (xi occurring m times) ≡ (number of ways of choosing m Xj ’s from nX’s) × (probability of m of them being xi) × (Probability of the rest not being xi) = mC2×pm

i ×(1− pi)n−m

• Therefore, the average number of occurrences of xi is:

\begin{align*}
E(f_i) &= {}^{n}C_{1}\, p_i (1-p_i)^{n-1} + 2\, {}^{n}C_{2}\, p_i^2 (1-p_i)^{n-2} + 3\, {}^{n}C_{3}\, p_i^3 (1-p_i)^{n-3} + \cdots + n\, {}^{n}C_{n}\, p_i^n \\
&= \sum_{m=1}^{n} m\, {}^{n}C_{m}\, p_i^m (1-p_i)^{n-m} \\
&= \sum_{m=1}^{n} m \times \frac{n!}{m!(n-m)!} \times p_i \times p_i^{m-1} (1-p_i)^{n-m} \\
&= \sum_{m=1}^{n} np_i \times \frac{(n-1)!}{(m-1)!(n-m)!}\, p_i^{m-1} (1-p_i)^{n-m} \\
&= np_i \sum_{m=1}^{n} {}^{n-1}C_{m-1}\, p_i^{m-1} (1-p_i)^{n-m} \\
&= np_i\, (p_i + (1-p_i))^{n-1} \\
&= np_i
\end{align*}

Therefore, we see that, on average, xi will occur npi times. There will certainly be some cases where the number of xi's is not npi. The number of such sequences can be estimated using the standard deviation of this random variable. The standard deviation σ is defined by: σ² = E(X²) − (E(X))².

\begin{align*}
\sigma^2 &= \sum_{m=1}^{n} m^2\, {}^{n}C_{m}\, p_i^m (1-p_i)^{n-m} - (np_i)^2 \\
&= \sum_{m=1}^{n} m^2\, \frac{n!}{m!(n-m)!}\, p_i^m (1-p_i)^{n-m} - (np_i)^2 \\
&= \sum_{m=1}^{n} m\, \frac{n!}{(m-1)!(n-m)!}\, p_i^m (1-p_i)^{n-m} - (np_i)^2 \\
&= \sum_{m=1}^{n} n(m-1+1)\, \frac{(n-1)!}{(m-1)!(n-m)!}\, p_i^m (1-p_i)^{n-m} - (np_i)^2 \\
&= n\left[ \sum_{m=1}^{n} (m-1)\, \frac{(n-1)!}{(m-1)!(n-m)!}\, p_i^m (1-p_i)^{n-m} + \sum_{m=1}^{n} \frac{(n-1)!}{(m-1)!(n-m)!}\, p_i^m (1-p_i)^{n-m} \right] - (np_i)^2 \\
&= n\left[ \sum_{m=2}^{n} \frac{(n-1)!}{(m-2)!(n-m)!}\, p_i^m (1-p_i)^{n-m} + \sum_{m=1}^{n} \frac{(n-1)!}{(m-1)!(n-m)!}\, p_i^m (1-p_i)^{n-m} \right] - (np_i)^2 \\
&= n\left[ (n-1)p_i^2 \sum_{m=2}^{n} \frac{(n-2)!}{(m-2)!(n-m)!}\, p_i^{m-2} (1-p_i)^{n-m} + p_i \sum_{m=1}^{n} \frac{(n-1)!}{(m-1)!(n-m)!}\, p_i^{m-1} (1-p_i)^{n-m} \right] - (np_i)^2 \\
&= n\left[ (n-1)p_i^2 \sum_{m=2}^{n} {}^{n-2}C_{m-2}\, p_i^{m-2} (1-p_i)^{n-m} + p_i \sum_{m=1}^{n} {}^{n-1}C_{m-1}\, p_i^{m-1} (1-p_i)^{n-m} \right] - (np_i)^2 \\
&= n\left[ (n-1)p_i^2\, (p_i + (1-p_i))^{n-2} + p_i\, (p_i + (1-p_i))^{n-1} \right] - (np_i)^2 \\
&= n\left[ (n-1)p_i^2 + p_i \right] - n^2 p_i^2 \\
&= n^2 p_i^2 - n p_i^2 + n p_i - n^2 p_i^2 \\
\therefore\ \sigma^2 &= n p_i (1-p_i)
\end{align*}
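The two results above, mean npi and variance npi(1 − pi), can be verified by evaluating the binomial sums directly. A minimal sketch (the helper name binom_moments is mine):

```python
from math import comb

def binom_moments(n, p):
    """Mean and variance of f_i, computed directly from the binomial
    sums in the derivation above."""
    mean = sum(m * comb(n, m) * p**m * (1 - p)**(n - m) for m in range(n + 1))
    second = sum(m * m * comb(n, m) * p**m * (1 - p)**(n - m) for m in range(n + 1))
    return mean, second - mean**2

n, p = 20, 0.3
mean, var = binom_moments(n, p)
print(round(mean, 6), round(var, 6))   # 6.0 4.2, i.e. n*p and n*p*(1-p)
```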

Now consider Chebyshev's inequality, which states that P(|X − µ| ≥ ε) ≤ σ²/ε², where µ and σ are the mean and the standard deviation (respectively) of the probability distribution of X. Applying Chebyshev's inequality to this case, we get:

\[P\left(|f_i(X_1, X_2, \dots, X_n) - np_i| \geq \epsilon\right) \leq \frac{np_i(1-p_i)}{\epsilon^2}\]

Let k be some large number and choose \(\epsilon = k\sqrt{np_i(1-p_i)}\). Then we have:

\[P\left(|f_i(X_1, X_2, \dots, X_n) - np_i| \geq k\sqrt{np_i(1-p_i)}\right) \leq \frac{1}{k^2}\tag{8.7}\]
\[P\left(\frac{|f_i(X_1, X_2, \dots, X_n) - np_i|}{\sqrt{np_i(1-p_i)}} \geq k\right) \leq \frac{1}{k^2}\]
\[P\left(\frac{|f_i(X_1, X_2, \dots, X_n) - np_i|}{\sqrt{np_i(1-p_i)}} \leq k\right) \geq 1 - \frac{1}{k^2}\tag{8.8}\]

The above statement (eq. 8.7) says that the probability of the frequency of xi (given by fi(X1, X2, . . . , Xn)) deviating from npi by more than a distance of the order k√n is at most 1/k². In other words, for a large value of k, the probability of the frequency of xi in X1, X2, . . . , Xn being close to npi is large. Of course, there are some sequences for which the frequency of xi in X1, X2, . . . , Xn differs from npi by a number larger than k√n. But these sequences occur with very low probability for a sufficiently large k. Let us now consider the sequences with high probability; from (eq. 8.8), we see that, for these sequences:

\[\frac{|f_i(X_1, X_2, \dots, X_n) - np_i|}{\sqrt{np_i(1-p_i)}} \leq k\tag{8.9}\]

The sequences X1, X2, . . . , Xn for which the above property holds are called Typical Sequences. They are sequences in which the difference between the actual and the expected frequencies of xi is of the order of √n, which is small compared to n, for large n. Taking the condition from (eq. 8.9), we can say:

\[|f_i(X_1, X_2, \dots, X_n) - np_i| \leq k\sqrt{np_i(1-p_i)}\]
\[f_i(X_1, X_2, \dots, X_n) \leq np_i + k\sqrt{np_i(1-p_i)}\]


Bounding the deviation from below in the same way, we obtain both a lower and an upper bound:

\[np_i - k\sqrt{np_i(1-p_i)} \;\leq\; f_i(X_1, X_2, \dots, X_n) \;\leq\; np_i + k\sqrt{np_i(1-p_i)}\tag{8.10}\]

In the above equation, we have found a lower as well as an upper bound for the frequency of xi, ∀i, in a typical sequence. Let us now look at the probability of obtaining a typical sequence: P((X1, X2, . . . , Xn) is a typical sequence) = P(x1 occurring f1(X1, . . . , Xn) times, x2 occurring f2(X1, . . . , Xn) times, . . . , xn occurring fn(X1, . . . , Xn) times). Therefore, if we denote the probability of (X1, X2, . . . , Xn) being a typical sequence as p(X), then:

\begin{align*}
p(X) &= p_1^{f_1(X_1, \dots, X_n)} \times p_2^{f_2(X_1, \dots, X_n)} \times \cdots \times p_n^{f_n(X_1, \dots, X_n)} \\
\log p(X) &= f_1(X_1, \dots, X_n) \log p_1 + f_2(X_1, \dots, X_n) \log p_2 + \cdots + f_n(X_1, \dots, X_n) \log p_n \\
&= \sum_i f_i(X_1, \dots, X_n) \log p_i \\
\therefore\ -\log p(X) &= -\sum_i f_i(X_1, \dots, X_n) \log p_i
\end{align*}

Let us now take the inequality in (eq. 8.10), multiply all of its terms by −log pi (a non-negative quantity, since pi ≤ 1) and sum over all i:

\[\sum_i \left( np_i - k\sqrt{np_i(1-p_i)} \right)(-\log p_i) \;\leq\; -\sum_i f_i(X_1, \dots, X_n) \log p_i \;\leq\; \sum_i \left( np_i + k\sqrt{np_i(1-p_i)} \right)(-\log p_i)\]

The middle term of the inequality is precisely −log p(X), from the expression above. Expanding the first and the third terms:

\[-n\sum_i p_i \log p_i + k\sum_i \sqrt{np_i(1-p_i)}\,\log p_i \;\leq\; -\log p(X) \;\leq\; -n\sum_i p_i \log p_i - k\sum_i \sqrt{np_i(1-p_i)}\,\log p_i\]

Let

\[-k\sum_i \sqrt{np_i(1-p_i)}\,\log p_i = A\sqrt{n}\]

for some positive constant A (each term in the sum is proportional to √n, and −log pi ≥ 0). Now, from (eq. 8.5), we have:

\[nH(X) - A\sqrt{n} \;\leq\; -\log p(X) \;\leq\; nH(X) + A\sqrt{n}\]
\[-nH(X) - A\sqrt{n} \;\leq\; \log p(X) \;\leq\; -nH(X) + A\sqrt{n}\]

where H(X) ≡ H(p1, p2, . . . , pn).2 Now, we can take antilogarithms of all the terms in the above inequality:

\[2^{-nH(X)-A\sqrt{n}} \;\leq\; p(X) \;\leq\; 2^{-nH(X)+A\sqrt{n}}\]

Therefore, we see that the probability of obtaining a typical sequence is indeed related to the Shannon entropy of the random variable. Furthermore, since the typical sequences carry almost all of the probability and each has probability ≈ p(X), the number of typical sequences of length n is ≈ 1/p(X); so we need the upper and lower bounds for 1/p(X). For this, let us take the inverse of all the terms in the above inequality:

\[2^{nH(X)-A\sqrt{n}} \;\leq\; \text{number of typical sequences of length } n \;\leq\; 2^{nH(X)+A\sqrt{n}}\tag{8.11}\]

Hence, if we assume the term A√n is negligible,3 the number of typical sequences of length n ≈ 2^{nH(X)}. Therefore, we see that, for a discrete random variable X (taking n values),

\[H(X) = \frac{\log_2(\text{number of typical sequences of length } n)}{n}\]

This is another interpretation of the Shannon entropy of the random variable X.

2 We have assumed that X1, X2, . . . , Xn are independent, identically distributed random variables. Hence, from the definition of Shannon entropy, H(X1) = H(X2) = · · · = H(Xn). Without any loss of generality, we may therefore call all of them H(X).
3 So small that even the exponential factor 2^{A√n} is negligible compared to 2^{nH(X)}.
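The concentration behind (eq. 8.11) can be seen numerically for a binary i.i.d. source: almost all of the probability mass lies on sequences whose −log₂ p(X) is within A√n of nH(X). The helper name typical_mass is mine; the sketch enumerates sequences by their number of 1s:

```python
from math import comb, log2, sqrt

def typical_mass(n, p, A):
    """Total probability of length-n binary sequences X with
    |-log2 p(X) - n*H(X)| <= A*sqrt(n)  (cf. eq. 8.11)."""
    H = -p * log2(p) - (1 - p) * log2(1 - p)
    mass = 0.0
    for k in range(n + 1):                       # k = number of 1s
        neg_logp = -(k * log2(p) + (n - k) * log2(1 - p))
        if abs(neg_logp - n * H) <= A * sqrt(n):
            mass += comb(n, k) * p**k * (1 - p)**(n - k)
    return mass

# For large n, the typical set carries almost all of the probability.
print(typical_mass(500, 0.3, 2.0) > 0.99)   # True
```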


Part VI

CODING



Chapter 9

Classical Coding Theory

9.1 Introduction

9.1.1 Definitions

To start with, we will need some basic set-theoretic notation that will be used throughout. Suppose A is a set; we define the following quantities:

1. \(A^* = \bigcup_{r=1}^{\infty} A^r\), where \(A^r = (A \times A \times \cdots \times A)\) (r times)

2. \(2^A\) = the set containing all possible subsets of A

3. \(\#A\) = the cardinality of the set A

Having the above notation in mind, let us now look at some basic terminology that will be useful in discussing the concepts and various properties related to codes and coding theory:

1. Alphabet: An alphabet is a set of symbols. Each symbol, called a "letter", represents one unit of information. Ex. the sets {0, 1}, {a, b, c}, etc. are alphabets.

2. String: A string is a finite sequence of symbols (from an alphabet). More formally, a string s over an alphabet Γ is an element of Γ∗: s ∈ Γ∗. The number of symbols (from Γ) in s is called the block length of s. Notice that Γ∗ is nothing but the set of all strings that can be formed using symbols in Γ. Ex. 001010 is a string over the alphabet {0, 1}.

3. Code: A code, denoted by C, is a set of strings (formed over some alphabet, called the code alphabet, denoted by Γ). Similar to the example for a string, we see that {0, 010, 10, 01} is a code over {0, 1}. Each string in C is called a codeword, and the average of the block lengths of the codewords1 is called the average codeword length of C.

4. Encoding: An encoding is a map from an alphabet to a set of strings (or a code, generally not over the same alphabet): f : A → C, where A is called the message alphabet. Ex. A = {x1, x2, x3, x4} and Γ = {0, 1}, with C = {0, 010, 01, 10} ⊂ Γ∗; then an encoding will map (finite) sequences (or strings) of symbols in A to strings over Γ. An encoding, for instance, can be:

x1 → 0
x2 → 010
x3 → 01
x4 → 10

1 That is, \(\sum_{c \in C} (\text{block length of } c) \, / \, \#C\).


Quite evidently, like any other function, the encoding map cannot be one-to-many; otherwise there would be many encoded sequences for the same message sequence, causing an ambiguity as to which one is the right encoding.

5. Decoding: A decoding is also a map, between a code and an alphabet, whose function is to perform the inverse of an encoding map. The decoding, given an encoding (as defined above), is a map from C (the code) to A (the message alphabet). An example of a decoding map is just the reverse of the encoding map. The decoding map thus depends on the code and hence on the encoding. Now, just as we had a very naive condition for the encoding map not to be one-to-many, the same condition is applicable to the decoding map too, since we want the decoding also to be unambiguous (that is, a sequence in Γ∗ must map to only one sequence in A∗). Notice that this condition, in turn, imposes an additional condition on the encoding map: the encoding map must be one-to-one (injective). When such an injective encoding map exists, decoding can be done without any ambiguity.

6. Uniquely Decipherable: Let us now define the extensions of the encoding and decoding maps. Consider the extended encoding map between strings,2 f : A∗ → Γ∗. This will map a message sequence (over A) to a code sequence (over Γ). Analogously, the extended decoding map would map code sequences to message sequences. If f is one-to-one, then certainly our decoding would be unique; the encoding is therefore called uniquely decipherable and the code, a uniquely decipherable code.

7. Irreducible3 codes: A code is called an instantaneous code if no string in the code is a prefix of any other string in the code. More formally, for an instantaneous code C, we have: ∀u, v ∈ C with u ≠ v, there is no w ∈ Γ∗ such that v = uw.

9.1.2 Notations from graphs

Let us take some set V with elements v1, . . . , vn. These elements can be represented as points, and moreover, we can connect these points (geometrically). This connected entity is called a graph. The points and the connections are called vertices and edges respectively. The edges can be directional (or directed). (Ex. suppose there are two points vi & vj; a directed edge can connect vi to vj without connecting vj to vi.) Suppose we now have some sort of comparison between elements of V, and state that a vertex (corresponding to an element) that is "superior" to another is connected to it by a directed edge. This restricted structure is called a tree, with the most superior vertex being the root. Moreover, if we assume that there can be only two edges leaving a vertex, this structure is called a binary tree. For a vertex vi of a tree, we define O(vi) = the number of edges on the path from the root to vi (the depth of vi). We will need this knowledge of trees in our later discussion.

Now that we have an encoding (f : A → C), we would expect it to:

• be uniquely decipherable

• have minimum average code word length

9.1.3 Unique Decipherability

Problem: Given an encoding (mapping) f : A → C and an encoded sequence (an element of Γ∗), is it possible to find a unique message sequence (an element of A∗)? We need to find all the encodings that correspond to "no" instances of this question. Consider the example from (stat. 4) of (sec. 9.1.1). Let us try to find some sequence that would lead to a "no" answer for whether we can uniquely identify a message sequence from an encoded sequence. Take, for example, the encoded sequence "010". This corresponds to 3 different message

2 This extension is made using the fact that the action of the map on a string α in A∗ is just the concatenation of the results of the action of f on each symbol in α.
3 Or instantaneous.


sequences, namely: x2, x3x1 and x1x4. We can verify this by taking:

f(x2) = 010
f(x3x1) = f(x3)f(x1) = (01)(0) = 010
f(x1x4) = f(x1)f(x4) = (0)(10) = 010

By finding these sequences, we have shown that the code C is not uniquely decipherable. So, to classify a code as not uniquely decipherable, we need to find such ambiguous sequences in Γ∗, which correspond to more than one sequence in A∗. In general it isn't easy to find such ambiguous sequences, and we need some systematic way of checking for their existence.

To see this process of decoding, let us look more closely at how one would naively go about decoding some arbitrary sequence from Γ∗.

Aim: Given some general sequence c1c2 . . . cp ∈ Γ∗, find the splitting s1 ∈ C, s2 ∈ C, . . . , sq ∈ C so that, using the reverse of the encoding map (the decoding map), we get the message sequence.

Intuitively, guessing the splitting: given a code sequence c1c2 . . . cp ∈ Γ∗, let S0 be the code C.

• Read characters until they form a word in S0, and see which words in S0 (other than itself) the read sequence might correspond to. These are the initial guesses: the words that have the sequence read so far as a prefix.

• Separate the suffixes of all these words (the suffix symbols are yet to be read) and store the suffixes in S1.

• Repeat steps 1 & 2 (using S1 in place of the second S0 in step 1), constructing sets S2, S3, etc., until Si becomes empty.

• Suppose Sj (for some j) contains an element of S0, say ckj−1+1 . . . ckj:

– Initially we read c1 . . . ck ∈ C, then ck+1 . . . ck1 ∈ C, and one of the initial guesses was a codeword extending past the word just read; its dangling suffix was stored in S1, and so on, constructing S2, S3, etc.

– In the process of forming Sj we have thus read a sequence of codewords c1 . . . ck, ck+1 . . . ck1, ck1+1 . . . ck2, . . . , ckj−1+1 . . . ckj, alongside the guessed codewords that produced the dangling suffixes.

– If Sj has an element, say ckj−1+1 . . . ckj, that is also in C, then the sequence c1 . . . ckj admits two different splittings into codewords (one following the guesses, one following the words actually read), and hence the code is not uniquely decipherable.

The above illustration is summarised by the following theorem. More formally, with S0 = C, define:

\[S_1 = \{\, w \in \Gamma^* \mid w \neq \varepsilon,\ u = vw,\ u \in S_0,\ v \in S_0 \,\}\]

With the above base case, we define the sets Si inductively:

\[S_i = \{\, w \in \Gamma^* \mid u = vw, \text{ where } (u \in S_0 \text{ and } v \in S_{i-1}) \text{ or } (u \in S_{i-1} \text{ and } v \in S_0) \,\}\]

That is, Si is the set of all dangling suffixes obtained by matching the words of Si−1 against the codewords. The sets Si can be constructed for every i; after a particular i, the Si may all be empty, so instead of finding out what that "maximum" value of i is, we just take an infinite union:

\[S_\infty = \bigcup_{i=1}^{\infty} S_i\]

Theorem (Sardinas–Patterson): An encoding with code S0 is uniquely decipherable iff \(S_0 \cap S_\infty = \emptyset\).


9.2 Classifying instantaneous and uniquely decipherable codes

Let us first convince ourselves that an instantaneous code is easier to decode than a general uniquely decipherable code. If we were to follow the naive method of decoding (as described in (sec. 9.1.3)), then, since no codeword is a prefix of another, all the sets Si (for i = 1 . . . ∞) would be empty, thereby assuring that the code is uniquely decipherable. The method of decoding now would simply be to read the sequence symbol by symbol until it forms a codeword, then substitute that codeword with the corresponding message symbol, as given by the encoding map.

Hence we want to see whether we can, without loss of generality, confine ourselves to instantaneous codes. To do this, we must show that if it is possible to have a uniquely decipherable code CUD, with N codewords of lengths n1, n2, . . . , nN respectively, then there always exists an instantaneous code with the same number of codewords and the same codeword lengths.4 This is our main purpose in this section, as it will enable us to analyse the easier class of instantaneous codes while keeping the conclusions applicable to the class of uniquely decipherable codes. We achieve this purpose in 3 steps. Firstly, we arrive at a condition which holds if there exists an instantaneous code with codeword lengths n1, n2, . . . , nN over the alphabet Γ. This condition is an inequality called the Kraft inequality. In the second part, we extend this inequality to show that the existence of a uniquely decipherable code also implies the same condition. The condition is the same as before, but the inequality here goes by the name of the Kraft–MacMillan inequality. Finally, in the third part, we prove the converse of the Kraft inequality, which implies that the inequality condition assures the existence of an instantaneous code. Therefore, at the end we will have shown that if there exists a uniquely decipherable code (with codeword lengths n1, n2, . . . , nN), then a particular condition is satisfied, which in turn implies the existence of an instantaneous code with the same codeword lengths (n1, n2, . . . , nN). After this, we consider only instantaneous codes.

9.2.1 Part 1: Kraft’s Inequality

We now ask whether it is possible to construct an instantaneous encoding f : A → C such that every element ai ∈ A has the property |f(ai)| = ni. So we need to ask whether there exists an instantaneous code in which the ith codeword has block length ni.

Theorem: f is an instantaneous encoding of A to C (over Γ) with codeword lengths ni, i ∈ {1 . . . |A|}, iff

\[\sum_{i=1}^{N} |\Gamma|^{-n_i} \leq 1\tag{9.1}\]

Proof Idea: The proof we present is by constructing an instantaneous code. To start with, we take all sequences in Γ∗ whose block length is at most nN. We arrange these sequences in the form of a tree, with the convention that if some sequence of length l matches another of length l + 1 at all but the last position, then the two vertices are connected, with the vertex corresponding to the longer sequence a child of the vertex corresponding to the shorter one. After this construction, we have a large tree (not necessarily a binary tree) with |Γ|^{nN} vertices at the deepest level. In this tree, each level corresponds to codewords of a successively larger length.

This is the gadget that we employ. Now, to construct N codewords of lengths n1, . . . , nN, it may naively seem that we can simply choose the codewords corresponding to the vertices on a single path from the root to one of the leaves. This method would certainly produce sequences of the desired lengths, but each sequence would be a prefix of the following sequence; a collection of these, by definition, is not an instantaneous code. Hence no two sequences chosen for the instantaneous code should have their corresponding vertices connected by a path. A better method is to pick only one sequence from any path: once we pick a vertex, we delete its subtree. After picking all the codewords that constitute the instantaneous code, we will have deleted a number of vertices. The total number of vertices deleted is certainly no more than |Γ|^{nN} (the count before constructing the code). On imposing this simple inequality, we arrive at the statement of the theorem (eq. 9.1).

4Needless to say, the converse of this assumption is always true.


we pick only one sequence. In other words, once we pick a vertex, we delete its subtree. After finding allsuch codewords that would constitute the instantaneous code, we would have deleted a couple of vertices.Hence the total number of vertices is certainly less than |Γ|nN (what it was before constructing the code).On imposing this simple inequality, we arrive with the statement of the theorem (eq. 9.1).

Proof: Define:

\[\Gamma = \{0, 1, \dots, |\Gamma| - 1\}, \qquad n_1 \leq n_2 \leq n_3 \leq \cdots \leq n_N\]

Consider the following method of generating an instantaneous code. Arrange the sequences in \(\bigcup_{i=1}^{N} \Gamma^{n_i}\) as the vertices Vi of a tree: two vertices, corresponding to sequences wi and wj with |wi| = l + 1 and |wj| = l, are connected by an edge whenever wi and wj are equal in the first l positions. From this graph, we can say that:

• For O(Vj) < O(Vi): if there is a path connecting Vj (corresponding to codeword wj) and Vi (corresponding to codeword wi), then wj is a prefix of wi.

• So, while constructing the instantaneous encoding, if we choose wi to be a code sequence, then we cannot choose any wj for which there is a path connecting Vj and Vi.

• Therefore, if we want a codeword wi of length ni, we need to choose a vertex Vi such that O(Vi) = ni, and delete its subtree.

• The tree has |Γ|^{nN} vertices at the deepest level, and for every codeword wi of length ni, a subtree containing |Γ|^{nN − ni} of these vertices is eliminated. Therefore, after all the codewords are picked, \(\sum_{i=1}^{N} |\Gamma|^{n_N - n_i}\) of these vertices have been deleted.

• Trivially, the number of vertices removed from the tree is at most the total number available:

\[\sum_{i=1}^{N} |\Gamma|^{n_N - n_i} \leq |\Gamma|^{n_N}\]
\[\therefore \sum_{i=1}^{N} |\Gamma|^{-n_i} \leq 1\tag{9.2}\]
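The inequality (eq. 9.1)/(eq. 9.2) is easy to evaluate for candidate length multisets; a minimal sketch (the helper name kraft_sum is mine):

```python
def kraft_sum(lengths, q):
    """LHS of the Kraft inequality (eq. 9.1) for codeword lengths
    n_1..n_N over an alphabet Gamma with |Gamma| = q."""
    return sum(q ** (-n) for n in lengths)

# Lengths of the binary prefix code {0, 10, 110, 111}: the sum is exactly 1.
print(kraft_sum([1, 2, 3, 3], 2))         # 1.0
# No binary instantaneous code can have codeword lengths 1, 1, 2.
print(kraft_sum([1, 1, 2], 2) <= 1)       # False
```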

9.2.2 Part 2: Macmillan’s Inequality

Since every instantaneous code is also uniquely decipherable, Kraft's inequality is a sufficient condition for the existence of a uniquely decipherable code with given codeword lengths. We can show that it is also a necessary condition: every uniquely decipherable code satisfies it.

Proof Idea: The proof is based on the observation that \(K^u\), where \(K = \sum_{i=1}^{N} |\Gamma|^{-n_i}\), grows exponentially in u if K > 1, whereas unique decipherability (via eq. 9.6) forces \(K^u\) to grow at most linearly in u. We boost K by a power u, put in the unique-decipherability bound, and take the limit as u → ∞; this gives K ≤ 1.

Proof: We have:

\[\sum_{i=1}^{N} |\Gamma|^{-n_i} = |\Gamma|^{-n_1} + |\Gamma|^{-n_2} + \cdots + |\Gamma|^{-n_N}\]


Let us now group all the \(|\Gamma|^{-n_i}\) with ni = k (all the codewords of equal length). Let αk be the number of codewords of length k, and let m be the largest codeword length. Then:

\[\sum_{i=1}^{N} |\Gamma|^{-n_i} = \sum_{k=1}^{m} \alpha_k\, |\Gamma|^{-k}\tag{9.3}\]

Therefore, we now have to find an upper bound for \(\sum_{k=1}^{m} \alpha_k |\Gamma|^{-k}\). Consider its u-th power:

\[\left(\sum_{k=1}^{m} \alpha_k\, |\Gamma|^{-k}\right)^u = \left(\alpha_1 |\Gamma|^{-1} + \alpha_2 |\Gamma|^{-2} + \cdots + \alpha_m |\Gamma|^{-m}\right)^u\]

On expanding the RHS of the above equation, each term will be of the form \((\alpha_{i_1}\alpha_{i_2}\dots\alpha_{i_u})\, |\Gamma|^{-(i_1+i_2+\cdots+i_u)}\):

\[\left(\sum_{k=1}^{m} \alpha_k\, |\Gamma|^{-k}\right)^u = \sum_{\substack{i_1, i_2, \dots, i_u \\ 1 \leq i_j \leq m}} (\alpha_{i_1}\alpha_{i_2}\dots\alpha_{i_u})\, |\Gamma|^{-(i_1+i_2+\cdots+i_u)}\]

Since each ij satisfies 1 ≤ ij ≤ m, we see that u ≤ (i1 + i2 + · · · + iu) ≤ mu. We now collect all the tuples (i1, . . . , iu) that give the same sum k:

\[\left(\sum_{k=1}^{m} \alpha_k\, |\Gamma|^{-k}\right)^u = \sum_{k=u}^{mu} \sum_{i_1+i_2+\cdots+i_u=k} (\alpha_{i_1}\alpha_{i_2}\dots\alpha_{i_u})\, |\Gamma|^{-k} = \sum_{k=u}^{mu} |\Gamma|^{-k} \sum_{i_1+i_2+\cdots+i_u=k} (\alpha_{i_1}\alpha_{i_2}\dots\alpha_{i_u})\]

In the RHS of the above equation, we can put:

\[N_k = \sum_{i_1+i_2+\cdots+i_u=k} (\alpha_{i_1}\alpha_{i_2}\dots\alpha_{i_u})\tag{9.4}\]

On substituting the above definition, we have:

\[\left(\sum_{k=1}^{m} \alpha_k\, |\Gamma|^{-k}\right)^u = \sum_{k=u}^{mu} N_k\, |\Gamma|^{-k}\tag{9.5}\]

Let us look at the structure of the term Nk:

• From the definition of αk in (eq. 9.3), αk = the number of codewords of length k.

• Therefore, from the RHS of (eq. 9.4): (αi1 αi2 . . . αiu) = (# of codewords of length i1) × (# of codewords of length i2) × · · · × (# of codewords of length iu). This is the same as the number of code sequences consisting of (a codeword of length i1) followed by (a codeword of length i2), . . . , followed by (a codeword of length iu). Since i1 + i2 + · · · + iu = k, each such code sequence has length k. Hence Nk = the number of code sequences of length k built by concatenating u codewords.

• Every such code sequence is, in particular, a sequence over Γ of length k, and since the code is uniquely decipherable, distinct concatenations of codewords give distinct sequences.5 The total number of sequences over Γ of length k is |Γ|^k (most of which needn't even correspond to any meaningful decoding). Therefore:

\[N_k \leq |\Gamma|^k\tag{9.6}\]

5 A uniquely decipherable sequence corresponds to only one split-up into codewords, so no sequence is counted twice in Nk.

Using the result of (eq. 9.6) in (eq. 9.5), we have:

\[\left(\sum_{k=1}^{m} \alpha_k\, |\Gamma|^{-k}\right)^u \leq \sum_{k=u}^{mu} |\Gamma|^{k}\, |\Gamma|^{-k} = mu - u + 1\]

From the above inequality we can say that:

\[\left(\sum_{k=1}^{m} \alpha_k\, |\Gamma|^{-k}\right)^u \leq um\]
\[\therefore \sum_{k=1}^{m} \alpha_k\, |\Gamma|^{-k} \leq u^{1/u}\, m^{1/u}\]

In the above expression, u is arbitrary; we can remove it by taking the limit as u → ∞. The LHS remains unchanged:

\[\sum_{k=1}^{m} \alpha_k\, |\Gamma|^{-k} \leq \lim_{u\to\infty} u^{1/u}\, m^{1/u}\tag{9.7}\]

The limit \(\lim_{u\to\infty} m^{1/u} = m^{\lim_{u\to\infty}(1/u)} = m^0 = 1\). For the other limit:

\[\lim_{u\to\infty} u^{1/u} = e^{\lim_{u\to\infty}\left(\frac{\log u}{u}\right)}\]

By L'Hôpital's rule, \(\lim_{u\to\infty} \frac{\log u}{u} = \lim_{u\to\infty} \frac{\frac{d}{du}(\log u)}{1} = \lim_{u\to\infty} \frac{1}{u} = 0\). Hence,

\[\lim_{u\to\infty} u^{1/u} = 1\]

Using these limit evaluations in (eq. 9.7), we get:

\[\sum_{k=1}^{m} \alpha_k\, |\Gamma|^{-k} \leq 1\]

Thus, from (eq. 9.3), we have proved the MacMillan inequality.

9.2.3 Part 3: Converse of Kraft’s inequality

We can now prove the converse of Kraft's inequality, which states that there exists a uniquely decipherable code with codewords of lengths n1, n2, . . . , nN over an alphabet Γ if

\[\sum_{i=1}^{N} |\Gamma|^{-n_i} \leq 1\]

Let \(K = \sum_{i=1}^{N} |\Gamma|^{-n_i}\). It suffices to show that, given the condition K ≤ 1, we can construct an instantaneous code6 with codewords of lengths n1, n2, . . . , nN. Let us use the same gadget (the tree structure with the sequences in \(\bigcup_{i=1}^{N} \Gamma^{n_i}\) corresponding to vertices) as in the proof of Kraft's inequality for instantaneous codes. Now if the same procedure works, namely picking a vertex at depth ni (and hence a corresponding codeword of length ni) and deleting its subtree (of size |Γ|^{nN − ni}), then we are done, as we can choose vertices at levels n1, n2, . . . , nN to obtain the various codewords. The only concern is that there might be no vertices left in the tree to pick at some step (which would imply that the corresponding instantaneous code cannot be constructed). Here is where we need the hypothesis of Kraft's inequality (K ≤ 1), to ensure that after every step (after every codeword is added to the instantaneous code), there are still some vertices left in the tree. Notice that after i vertices have been picked (that is, the vertices for the lengths n1, . . . , ni), the total number of remaining vertices at the deepest level of the tree is:

\[|\Gamma|^{n_N} - \sum_{j=1}^{i} |\Gamma|^{n_N - n_j} = |\Gamma|^{n_N}\left(1 - \sum_{j=1}^{i} |\Gamma|^{-n_j}\right)\]

6 We saw that every instantaneous code is also uniquely decipherable.

Now, since \(\sum_{j=1}^{i} |\Gamma|^{-n_j} \leq K \leq 1\), we have

\[1 - \sum_{j=1}^{i} |\Gamma|^{-n_j} \geq 0\]

Putting this into the expression for the remaining vertices, we see that the number of remaining vertices after picking i codewords is ≥ 0 for any i; in fact, while some codeword is yet to be picked, the partial sum is strictly less than K ≤ 1, so some vertices remain. Therefore we can always pick codewords for every length up to nN.

Hence we have shown the converse of the Kraft–MacMillan inequality, thereby finally showing that whenever there exists a uniquely decipherable code, there certainly exists an instantaneous code with the same codeword lengths. Therefore, in the next section, we show a lower bound on the average codeword length, but this time only for instantaneous codes, without any loss of generality.
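The subtree-deletion argument above is constructive. One standard way of realising it (the helper name and the counter-based style are mine; the text's proof works directly on the tree) is to sort the lengths and take codeword i to be the base-q expansion of a counter that skips each deleted subtree. The sketch assumes the lengths satisfy the Kraft inequality:

```python
def prefix_code_from_lengths(lengths, q=2):
    """Converse of Kraft's inequality, as a construction: given lengths
    with sum(q**-n) <= 1, return an instantaneous (prefix-free) code."""
    lengths = sorted(lengths)
    assert sum(q ** (-n) for n in lengths) <= 1, "Kraft inequality violated"
    codewords, c = [], 0
    for i, n in enumerate(lengths):
        if i > 0:
            c = (c + 1) * q ** (n - lengths[i - 1])  # step past the deleted subtree
        digits, x = [], c
        for _ in range(n):                           # n-digit base-q expansion of c
            digits.append(str(x % q))
            x //= q
        codewords.append("".join(reversed(digits)))
    return codewords

print(prefix_code_from_lengths([2, 1, 3, 3]))   # ['0', '10', '110', '111']
```

Note that the returned code is prefix-free, so by the argument of (sec. 9.2) it is also uniquely decipherable.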

9.2.4 Bound on codeword length - Shannon’s Noiseless Coding Theorem[26]

We have already shown that the existence of a uniquely decipherable code (with given codeword lengths) implies the existence of an instantaneous code with the same codeword lengths. Therefore, to show that there exists no uniquely decipherable code with certain properties, it suffices to show the non-existence of an instantaneous code (over an alphabet of the same size) with the same properties. Moreover, as we have seen, the case of instantaneous codes is easier to deal with than that of general uniquely decipherable codes.

In the first few sections, we remarked that a code is expected to have minimum average codeword length. Here we explore a lower bound on the minimum average codeword length. Quite evidently, this bound cannot be as trivial as 0 or 1 or even N (the alphabet size), due to the requirement of unique decipherability. Again, it is this property that we make use of, in the form of the Kraft inequality.

Define:

\[l(f) = \sum_{x \in A} l(f(x))\, \mu(x)\]

the average codeword length of an encoding f, and

\[L(\mu) = \min\left\{\, l(f) \mid f : A \to B^* \text{ is uniquely decipherable} \,\right\} = \min\left\{\, \sum_{x \in A} m(x)\mu(x) \;\middle|\; \sum_{x \in A} b^{-m(x)} \leq 1 \,\right\}\]

where m(x) denotes the codeword length assigned to x; the second form uses the Kraft–MacMillan inequality.

The lower bound on the minimum average codeword length is given by Shannon's Noiseless Coding Theorem:

Theorem: Let (A, µ) be an EIS. Let b = #B, where B is the encoding alphabet. Then:

\[\frac{-\sum_{x \in A} \mu(x) \log \mu(x)}{\log b} \;\leq\; L(\mu) \;\leq\; \frac{-\sum_{x \in A} \mu(x) \log \mu(x)}{\log b} + 1\tag{9.8}\]
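The choice m(x) = ⌈−log_b µ(x)⌉, which is exactly the one made later in (eq. 9.17), already exhibits both bounds of (eq. 9.8). A sketch for the binary case b = 2 (the helper name shannon_lengths is mine):

```python
from math import ceil, log2

def shannon_lengths(probs):
    """Binary code lengths m(x) = ceil(-log2 mu(x)), the choice
    made in (eq. 9.17) for b = 2."""
    return [ceil(-log2(p)) for p in probs]

probs = [0.5, 0.25, 0.125, 0.125]               # a dyadic distribution
lengths = shannon_lengths(probs)
H = -sum(p * log2(p) for p in probs)
avg = sum(p * n for p, n in zip(probs, lengths))
print(lengths)        # [1, 2, 3, 3]
print(H == avg)       # True: for dyadic mu the lower bound of (9.8) is met
```

For non-dyadic distributions the average length lies strictly between the two bounds, as the theorem asserts.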


Proof: Define:

\[\nu(x) = \frac{b^{-m(x)}}{\sum_{y \in A} b^{-m(y)}}\tag{9.9}\]
\[T = \sum_{y \in A} b^{-m(y)}\tag{9.10}\]

Therefore, from (eq. 9.9):

\begin{align*}
\log \nu(x) &= -m(x) \log b - \log T \\
\therefore\ m(x) \log b &= -\log \nu(x) - \log T \\
\Rightarrow m(x) &= \frac{-\log \nu(x) - \log T}{\log b}
\end{align*}

Multiplying by µ(x) and summing over all x ∈ A:

\[\sum_{x \in A} m(x)\mu(x) = \frac{-\sum_{x \in A} \mu(x) \log \nu(x) - \log T \sum_{x \in A} \mu(x)}{\log b}\tag{9.11}\]

We have that \sum_{x\in A}\mu(x) = 1, so (eq. 9.11) reads:

(\log b)\sum_{x\in A} m(x)\mu(x) = -\sum_{x\in A}\mu(x)\log\nu(x) - \log T

Rewriting the first term on the right:

\sum_{x\in A}\mu(x)\log\nu(x) = \sum_{x\in A}\mu(x)\log\left(\mu(x)\,\frac{\nu(x)}{\mu(x)}\right)

\sum_{x\in A}\mu(x)\log\nu(x) = \sum_{x\in A}\mu(x)\log\mu(x) + \sum_{x\in A}\mu(x)\log\frac{\nu(x)}{\mu(x)}

\sum_{x\in A}\mu(x)\log\nu(x) = \sum_{x\in A}\mu(x)\log\mu(x) + \log\prod_{x\in A}\left(\frac{\nu(x)}{\mu(x)}\right)^{\mu(x)} \qquad (9.12)

We have that \prod_{x\in A}\left(\frac{\nu(x)}{\mu(x)}\right)^{\mu(x)} is just the weighted geometric mean of the quantities \frac{\nu(x)}{\mu(x)}, taken with weights \mu(x).

Therefore, using the property that a weighted geometric mean of positive quantities is less than or equal to the corresponding weighted arithmetic mean, we have:

\prod_{x\in A}\left(\frac{\nu(x)}{\mu(x)}\right)^{\mu(x)} \le \sum_{x\in A}\mu(x)\,\frac{\nu(x)}{\mu(x)} = \sum_{x\in A}\nu(x) \qquad (9.13)

From (eq. 9.9) and (eq. 9.10), we have:

\sum_{x\in A}\nu(x) = \frac{\sum_{x\in A} b^{-m(x)}}{T} = 1

Hence the RHS of (eq. 9.13) is \le 1, and so the logarithm of the product in (eq. 9.12) is \le 0. Therefore, (eq. 9.12) now gives:

-\sum_{x\in A}\mu(x)\log\nu(x) \ge -\sum_{x\in A}\mu(x)\log\mu(x) \qquad (9.14)


We see, from the unique decipherability of the code (Kraft's inequality) and from (eq. 9.10), that T \le 1, and therefore \log T \le 0, i.e. -\log T \ge 0. Hence, dropping the -\log T term in (eq. 9.11):

\sum_{x\in A} m(x)\mu(x) \ge \frac{-\sum_{x\in A}\mu(x)\log\nu(x)}{\log b} \qquad (9.15)

Using equations (eq. 9.14) and (eq. 9.15), we have:

\sum_{x\in A} m(x)\mu(x) \ge \frac{-\sum_{x\in A}\mu(x)\log\mu(x)}{\log b} \qquad (9.16)

Therefore, in the above statement, we have proved that the lower bound on the average length of the code is \frac{-\sum_{x\in A}\mu(x)\log\mu(x)}{\log b}. We now have to show the upper bound. For this, let us choose a specific instance of m(x) such that:

m(x) - 1 < \frac{-\log\mu(x)}{\log b} \le m(x), \quad \forall x \in A \qquad (9.17)

This choice can be shown to be valid by observing that unique decipherability (Kraft's inequality) is preserved. From (eq. 9.17), m(x) \ge \frac{-\log\mu(x)}{\log b}, and hence:

\sum_{x\in A} b^{-m(x)} \le \sum_{x\in A} b^{\frac{\log\mu(x)}{\log b}} = \sum_{x\in A} b^{\log_b\mu(x)} = \sum_{x\in A}\mu(x) = 1

Hence, we see that the unique decipherability of the code is preserved. Now, the average codeword length \sum_{x\in A} m(x)\mu(x) can be given an upper bound. From (eq. 9.17): m(x) < 1 - \frac{\log\mu(x)}{\log b}. Therefore, we have:

\sum_{x\in A} m(x)\mu(x) < \sum_{x\in A}\left(1 - \frac{\log\mu(x)}{\log b}\right)\mu(x)

On expanding the RHS and using the property that \sum_{x\in A}\mu(x) = 1, we have:

\sum_{x\in A} m(x)\mu(x) < \frac{-\sum_{x\in A}\mu(x)\log\mu(x)}{\log b} + 1 \qquad (9.18)

Therefore, in the above expression we have given an upper bound for the average codeword length. Now, taking expressions (eq. 9.16) and (eq. 9.18), we obtain the expression in the theorem:

\frac{-\sum_{x\in A}\mu(x)\log\mu(x)}{\log b} \le L(\mu) \le \frac{-\sum_{x\in A}\mu(x)\log\mu(x)}{\log b} + 1

thus proving the theorem.


In the theorem above, if we put the condition that the encoding is done over the binary alphabet B = \{0, 1\} \Rightarrow b = 2, the lower bound statement gives:

-\sum_{x\in A}\mu(x)\log\mu(x) \le L(\mu)

\Rightarrow H(X) \le L(\mu) \qquad (9.19)

where H(X) is the entropy associated with the random variable X, or the entropy of the elementary information source (A, \mu). The above statement (eq. 9.19) is the famous Shannon Noiseless Coding Theorem for binary codes.

Examples: Codes satisfying the Shannon Bound

Consider a random variable X, which takes values x_1, x_2, x_3, x_4, x_5 with corresponding probabilities \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{16}. For the reasons stated earlier, we now use a fixed length (instantaneous) and a variable length (binary^7) code to encode the values of X, which is the information source, and show that in each case the Shannon bound is satisfied.

H(X) = -\sum_i p_i \lg p_i

\Rightarrow H(X) = \frac{1}{2}\lg 2 + \frac{1}{4}\lg 4 + \frac{1}{8}\lg 8 + \frac{1}{16}\lg 16 + \frac{1}{16}\lg 16 = \frac{1}{2} + \frac{1}{2} + \frac{3}{8} + \frac{1}{4} + \frac{1}{4}

\therefore H(X) = 1.875

1. Fixed Length (Instantaneous) Encoding:

x_1 \longrightarrow 000
x_2 \longrightarrow 001
x_3 \longrightarrow 010
x_4 \longrightarrow 011
x_5 \longrightarrow 100

The average codeword length is l(X) = 3\times\frac{1}{16} + 3\times\frac{1}{16} + 3\times\frac{1}{8} + 3\times\frac{1}{4} + 3\times\frac{1}{2} = 3. This is also the minimum possible with fixed length codes, as there are 5 values of the random variable to be encoded and only 4 codewords of length 2. Hence H(X) \le l(X).

2. Variable Length Codes:

x_1 \longrightarrow 00
x_2 \longrightarrow 010
x_3 \longrightarrow 0110
x_4 \longrightarrow 01110
x_5 \longrightarrow 11111

The average codeword length is 5\times\frac{1}{16} + 5\times\frac{1}{16} + 4\times\frac{1}{8} + 3\times\frac{1}{4} + 2\times\frac{1}{2} = 2.875. Therefore, the average codeword length exceeds the Shannon entropy, thereby satisfying the Shannon bound.

^7 One can show the bound for any q-ary code, appropriately using the expression for the q-ary entropy function: H_q(x) = x\log_q(q-1) - x\log_q x - (1-x)\log_q(1-x).
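The worked example above can be checked numerically. The sketch below recomputes the source entropy directly from -\sum_i p_i \lg p_i and the two average codeword lengths (the variable names are ours):

```python
from math import log2

probs = [1/2, 1/4, 1/8, 1/16, 1/16]

# Source entropy H(X) = -sum p_i * lg p_i
H = -sum(p * log2(p) for p in probs)

# Average lengths of the two encodings from the text
fixed = sum(3 * p for p in probs)  # 000, 001, 010, 011, 100
variable = sum(l * p for l, p in zip([2, 3, 4, 5, 5], probs))  # 00, 010, 0110, 01110, 11111

print(H, fixed, variable)  # 1.875 3.0 2.875
assert H <= variable <= fixed  # Shannon bound holds for both encodings
```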


9.3 Error Correcting Codes

We saw that a code is defined as a set of strings, or sequences, over the encoding alphabet. We now identify a subset of this set, which satisfies some additional properties, as an error correcting code.

9.3.1 Definitions

Let C \subseteq \Gamma^* be a code, defined over the code alphabet \Gamma = \{x_1, x_2, \dots, x_n\}. We now need to look at some terminology that will be useful in characterizing the error correcting properties of C. Before moving on to error correcting codes, note that from now on we only look at instantaneous codes; hence all the codewords in the code have equal block length.

1. Hamming distance: the Hamming distance between two sequences^8 s_1 and s_2, denoted by d(s_1, s_2), is the number of positions at which s_1 and s_2 differ. Properties of the Hamming distance:

• Positivity: d(s_1, s_2) > 0, \forall s_1 \neq s_2

• Symmetry: d(s_1, s_2) = d(s_2, s_1)

• Triangle inequality: d(s_1, s_2) \le d(s_1, s_3) + d(s_3, s_2)

2. Code parameters: for the purpose of error correction, codes are labelled using the following four parameters.

• Block length (n): the block length of a code C is the block length of the codewords in C. We defined C \subseteq \Gamma^*; hence \exists n \ge 0 such that C \subseteq \Gamma^n. This n is nothing but the block length of C.

• Alphabet size: q = |\Gamma|. This is just the number of elements in the alphabet. In most cases we consider q = 2: (binary) codes.

• Code dimension (k): this is related to the number of codewords in the code. A code C over \Gamma (with |\Gamma| = q) has q^k codewords. Hence, k = \log_{|\Gamma|}|C|.

• Distance of a code: d(C) is defined as the minimum Hamming distance between two distinct codewords in C. Formally, d(C) = \min_{(s_i \neq s_j) \in C} d(s_i, s_j).

3. Error operation: an error operation is a function that translates a symbol of a q-ary sequence by a number e. A communication channel can be modelled as an error operation. Formally, an error operation \eta: \Gamma^n \to \Gamma^n is such that \eta(c) = c \oplus e, where c \in C, e \in \Gamma and \oplus denotes addition modulo q. Here \eta is such that its action on the codeword translates only a single symbol of the codeword by an amount e. A t-error operation is nothing but the application of \eta t times, in a sequential manner. Hence the final, erroneous sequence can be represented as:

(\underbrace{\eta \circ \eta \circ \cdots \circ \eta}_{t \text{ times}})(c)

4. Error correction operation: intuitively, the process of recovering from an error consists of detecting the error, followed by correcting it. If a codeword c \in C is transmitted across an (error inducing) communication channel, what we receive at the other end, \tilde{c} (formally, this is equal to \eta(c)), need not necessarily be c. As we require the received and transmitted sequences to match, the only remedy is to construct another operation (function) \xi such that \xi(\eta(c)) = c. Such an operation is called an error correction operation. Extending this concept, suppose c is transmitted across t identical channels; then the received sequence would be (\underbrace{\eta \circ \cdots \circ \eta}_{t \text{ times}})(c). Now if \xi is such that \xi\left((\underbrace{\eta \circ \cdots \circ \eta}_{t \text{ times}})(c)\right) = c, then \xi is called a t-error correction operation. Before correcting an error, it is worthwhile to check for the existence of an error.^9

^8 If the two sequences are not of equal length, then the difference in their lengths simply adds to the Hamming distance.
^9 We are making the assumption that if an error has occurred, the resulting sequence is not part of the original code. Hence if c \in C, then \eta(c) = (c + e) \notin C.


Hence, we need some function \chi such that:

\chi(x) = \begin{cases} 0 & \text{if } x \in C \\ 1 & \text{if } x \notin C \end{cases}

where x = \eta(c) is the received sequence. Such a function is called an error detection (or syndrome measuring) function.

5. t-error correcting code: C is t-error correcting if \exists a t-error correction operation \forall c \in C.

6. Error detecting code: C is an error detecting code if \exists an error detection function \forall c \in C.
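The Hamming distance and the distance of a code, as defined above, can be sketched directly; the function names below are illustrative:

```python
from itertools import combinations

def hamming(s1, s2):
    # Number of positions at which two equal-length sequences differ
    assert len(s1) == len(s2)
    return sum(a != b for a, b in zip(s1, s2))

def code_distance(code):
    # d(C): minimum Hamming distance over all pairs of distinct codewords
    return min(hamming(u, v) for u, v in combinations(code, 2))

# Illustrative code: the binary repetition code of block length 3
C = ["000", "111"]
print(code_distance(C))  # 3, so t = 1 error can be corrected (d >= 2t + 1)
```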

Having defined these notions of error detection and correction, we now need to see how they are related to a given code C. Before that, let us note a naive condition that ensures the existence of an error correction operation for any error on a classical code. Notice that every codeword c \in C has sequences associated to it, which are the result of t errors occurring on c. The code must be such that, \forall c_1, c_2 \in C, no error sequence corresponding to c_1 is equal to any error sequence corresponding to c_2.

But as of now it is not clear how a code would "ensure" that the error sequences of no two codewords match. Recall that an error is nothing but a translation of the binary sequence. To ensure that the erroneous sequences do not match, we must ensure that the translated sequences corresponding to c_1 and c_2 do not match. If C is a t-error correcting code, then we must ensure that after t (or fewer) single bit translations of any two codewords, we do not get the same sequence. It then suffices to ensure that no two codewords in C are separated by 2t (or fewer) single bit translations. In other words, we are saying that any two codewords must differ in at least 2t + 1 places for a t-error correcting code. This implies that the minimum (Hamming) distance between two codewords in C must be > 2t. For a more precise treatment, let us consider the next section, containing the distance bound.

9.3.2 Code parameters for a good error correcting code

A code is now denoted by its n, k, d and q values, and is called an (n, k, d)_q code. Note that n is the dimensionality of the vector space spanned by all vectors corresponding to n length sequences over F_q. These include the vectors in the vector space C (of dimensionality k), and those not in C as well. Therefore k \le n, and as a result the rate R = k/n satisfies R \le 1. We see that in order to maximise the rate of information transmission, we must minimise n and maximise k.^10 The former is lower-bounded by the Shannon entropy of the source, and the latter is upper-bounded by n itself. Hence, we require that k be as close to n as possible. At the moment, we do not worry about the value of q (as we consider only q = 2), though it too has some non-trivial optimum value.

9.3.3 Bound on the code distance - The Distance Bound

The next non-trivial bound is on the parameter d. Before getting a bound on d, we first need to look at how messages are decoded, or how an error is corrected. Let C = \{w_i\}, and let w_X \in C be the sequence transmitted across some channel that performs t error operations, resulting in v being received. Hence the distance between w_X and v is \le t. Let us find a naive algorithm for error correction, i.e. for v to be decoded (or corrected) back to w_X, based on the only measure of comparison we know: the Hamming distance (between the received sequence and any of the known codewords). This naive algorithm simply chooses the w_j \in C which is closer to v than all other w_i \in C (i \neq j) to be the original sequence that was transmitted. For this algorithm to give the correct result, w_j = w_X, no other codeword must be as close or closer to v (even if the maximum possible number of errors has occurred) than w_X. For this to occur, we need to maximise the distance between the codewords, since if the distance between w_X and another codeword w' is large, then a large number of errors (with lesser probability) would need to occur in order to make v close to w'. On the other hand, if the distance between the codewords is large, then the number of codewords (or the message length k) reduces accordingly, which is again undesirable, as we would like to maximise k as well. Therefore d could

^10 It is as though n alphabets are given to us to construct a code and we use only k of them; or, equivalently, we are given n basis vectors of which we use only k to construct the code. In both cases, the unused redundancy is what gives the ability to detect and correct errors (like parity bits).


be kept such that it is just sufficient for correcting t errors. (C then becomes a t-error correcting code.) A geometric picture of the same idea can be seen in (fig. 9.1). We now see that there is a lower bound for d, related to the number of errors to be corrected. To make this precise, we have the following theorem:

Theorem: Let C be a t-error correcting code, with distance d(C) = \min_{(\omega \neq \omega') \in C} d(\omega, \omega'). Then d(C) \ge 2t + 1.

Proof: [15][25] We show the necessity and sufficiency of the above condition.

• We start by showing the sufficiency condition.
Claim: Given that C is a t-error correcting code, d(C) \ge 2t + 1.
Proof: Let \omega, \omega' \in C be codewords, transmitted across an information channel (that induces at most t errors), resulting in the sequences v, v' respectively being received. Hence the (Hamming) distance between the members of the pairs (\omega, v) and (\omega', v') is at most t: d(\omega, v) \le t, d(\omega', v') \le t.

For the errors to be correctable, only v is to be deciphered as \omega (using the minimum distance decoding principle). We require that no other received sequence, except v, is as close to \omega as v: d(\omega, v) < d(\omega, v') for all received sequences v' \neq v.^11 Hence, for the errors to be correctable: v \neq v' \Rightarrow d(v, v') > 0. The geometric picture can be seen as:

Figure 9.1: Figure showing the geometric interpretation of the Hamming distances between the codewords and their associated erroneous sequences. The ball centered around a codeword w_i contains all words r_j such that if r_j is received, it is decoded as w_i. On transmission of \omega, the received sequence must lie within the ball centered around \omega: its distance from \omega can be at most t, and its distance from \omega' must be greater than t. As a result, the distance between \omega and \omega' must be greater than 2t.

We now note that, in the worst case, v and v' lie on a shortest path between \omega and \omega', so that:

d(\omega, \omega') = d(\omega, v) + d(v, v') + d(v', \omega')
\Rightarrow d(\omega, \omega') = t + d(v, v') + t

Since d(v, v') \ge 1, this gives:

d(\omega, \omega') > t + t
\therefore d(\omega, \omega') \ge 2t + 1

Hence, we have shown that the minimum distance of a t-error correcting code is at least 2t + 1, thereby proving sufficiency.

^11 There are two ways to see the error correction condition. The other one would be: for an error to be corrected, v is to be deciphered as \omega (using the minimum distance decoding principle). Therefore we require that no other codeword, except \omega, is closer to v than \omega. Hence d(\omega, v) < d(\omega', v) \;\forall (\omega' \neq \omega) \in C. After this, the same analysis follows.


• We now show the necessity of the above condition.
Claim: Given d(C) = 2t + 1, C is a t-error correcting code.
Proof: Let d(C) = 2t + 1, let \omega, \omega' \in C be codewords, and let v be the received sequence on transmission of \omega, so that d(\omega, v) \le t. We give a proof by contradiction against the definition of d(C). Suppose C is not a t-error correcting code. Then minimum distance decoding fails for some such v, i.e. some other codeword is at least as close to v as \omega:

d(\omega', v) \le d(\omega, v) \text{ for some } (\omega' \neq \omega) \in C \qquad (9.20)

By the triangle inequality, we have:

d(\omega, \omega') \le d(\omega, v) + d(v, \omega')
\text{using (eq. 9.20)} \le 2d(\omega, v)
\le 2t
= d(C) - 1

By definition, the distance is the minimum of the set of all Hamming distances between every two distinct codewords of the code. But the above equation says that \exists \omega, \omega' \in C such that d(\omega, \omega') < d(C). This is a clear contradiction, as it exhibits an element of a set that is less than the minimum of that set. Hence the assumption that C is not a t-error correcting code is faulty. Therefore C is indeed a t-error correcting code, thereby proving necessity of the above condition.

Therefore, we have shown the distance bound for classical codes.
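The distance bound can be exercised by brute force. Below is a minimal minimum distance decoder (our own sketch, not from the text) applied to the length 5 repetition code, which has d = 5 = 2t + 1 with t = 2, so every pattern of at most 2 bit flips is corrected:

```python
from itertools import product

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def decode(v, code):
    # Minimum distance decoding: pick the codeword closest to v
    return min(code, key=lambda w: hamming(w, v))

C = ["00000", "11111"]
for w in C:
    for flips in product(range(5), repeat=2):
        corrupted = list(w)
        for i in set(flips):  # flip at most 2 distinct positions
            corrupted[i] = "1" if corrupted[i] == "0" else "0"
        assert decode("".join(corrupted), C) == w
print("all <= 2-bit errors corrected")
```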

9.3.4 Bound on the number of codewords

The Singleton Bound [9]

We can interpret the above theorem as: the distance between any two codewords of C is \ge 2t + 1. Because of this, the number of codewords in C decreases, and hence |C| has an upper bound. We first show that |C| is related to the distance of the code by a very simple bound.

Theorem: If C is an (n, k, d)_2 code, then n - k \ge d - 1.

Proof: Fix any d - 1 coordinate positions i_1, \dots, i_{d-1}, and let \tilde{C} be the code obtained by deleting these positions from every codeword of C (puncturing). Since any two distinct codewords of C differ in at least d positions, they still differ in at least one of the remaining positions; hence no two codewords of \tilde{C} coincide.^12 So |\tilde{C}| = |C| = 2^k. Also, every w \in \tilde{C} has length n - (d - 1). Hence the number of codewords in \tilde{C} cannot exceed the number of binary sequences of length n - d + 1, which is 2^{n-d+1}. Hence, we have:

2^k \le 2^{n-d+1}
k \le n - d + 1
\therefore n - k \ge d - 1

thereby proving the theorem.

The Hamming Bound [25]

Now we consider the case of error correcting codes. The theorem below gives an essentially stronger bound than the previous one. To be more precise, consider the following theorem:

^12 As we have not removed enough positions for any two codewords to become identical, puncturing does not change the number of codewords.


Theorem: If C is an (n, k, d)_2 t-error correcting code, then |C| \le \dfrac{2^n}{\sum_{i=0}^{t}\binom{n}{i}}.

Proof: We need to show the necessity and the sufficiency of the above condition.

• We start by showing the sufficiency of the above condition.

Claim: Given that C is an (n, k, d)_2 t-error correcting code, |C| \le \dfrac{2^n}{\sum_{i=0}^{t}\binom{n}{i}}.

Proof: Consider (fig. 9.1). As each ball represents a distinct codeword of C, there are |C| balls in total. The key assumption made here is the following: the only^13 type of error that can occur to any codeword is an addition modulo q (that is nothing but a bit flip for the binary case), where q = |\Gamma|, and here we consider q = 2. This error does not change the number of symbols in a codeword; hence every erroneous sequence is in \Gamma^n. Since the total number of (binary) sequences of length n is 2^n, the total number of sequences in all the balls put together cannot exceed this number. We see that:

total \# error sequences = (\# error sequences for each codeword) \times (\# codewords) \qquad (9.21)

Notice that:

\# error sequences for a single codeword = \# sequences that differ from a given sequence at \le t positions
= \sum_{m=0}^{t} (\# sequences that differ from a given sequence at exactly m positions) \qquad (9.22)

Suppose the given sequence is u = u_1 u_2 \dots u_n. Any sequence v which differs from u at exactly m locations k_1, \dots, k_m satisfies v_{k_i} \neq u_{k_i} for i = 1, \dots, m, and v_i = u_i elsewhere. Since we are considering only binary codes and the error is only of the bit flip type, once the positions are fixed, so are the values of the flipped symbols (if u_i = 0, then v_i is fixed to be 1; this is not the case^14 for q > 2). The only freedom is in choosing which m of the n positions are flipped, and the number of such choices is \binom{n}{m} (this equals \binom{n}{n-m}, as choosing the m flipped positions is the same as choosing the n - m unflipped ones). Since each choice of positions corresponds to a distinct sequence in \Gamma^n, the \# sequences that differ from u at exactly m locations is \binom{n}{m}. Using (eq. 9.22), we have: total \# error sequences for a single codeword = \sum_{m=0}^{t}\binom{n}{m}, and from (eq. 9.21): total \# error sequences = |C| \times \sum_{m=0}^{t}\binom{n}{m}. This, we said, is \le 2^n. Hence, we have:

|C| \times \sum_{m=0}^{t}\binom{n}{m} \le 2^n

\therefore |C| \le \frac{2^n}{\sum_{i=0}^{t}\binom{n}{i}}

^13 We will have to edit this assumption while taking up this bound for the quantum codes.
^14 NOTE: We later show that a q-ary quantum code is equivalent to a 2q-ary classical code. Therefore this assumption clearly does not work for binary quantum codes. This is also reflected in the fact that there is more than one type of error (bit and phase flip) that can occur on a single binary quantum codeword.


Hence we have shown the upper bound on |C|, thereby proving sufficiency.

• We now need to address the necessity of the above condition.

Claim: Given |C| \le \dfrac{2^n}{\sum_{i=0}^{t}\binom{n}{i}}, C is a t-error correcting code.

Proof: Before going to the proof, we can look at the physical interpretation of the claim. The claim says that if the number of codewords is bounded by a certain number, then the code is a t-error correcting code. But this need not be true, as the codewords can all be close together (closer than the minimum distance required for a t-error correcting code) while C still satisfies the above cardinality bound. Therefore, we see that the Hamming bound by itself does not guarantee that a code is t-error correcting: it is a necessary condition, not a sufficient one.

The Gilbert-Varshamov Bound [16][24]

In the previous theorem (the Hamming bound), we saw that for a t-error correcting code, |C| cannot be arbitrarily close to 2^n. Hence this seems like a bound on the error correcting capacity of a code: the farther |C| is from \frac{2^n}{\sum_{i=0}^{t}\binom{n}{i}}, the lower is the efficiency (more technically, the rate) of the error correcting code. But not all codes can be perfect. We would like to see whether there is some limit on |C| that can make C a "good" error correcting code, and if so, we also need to see whether there are a "large" number of them.

Theorem: If C is an (n, k, d)_q t-error correcting code, then

|C| \ge \frac{q^n}{\sum_{i=0}^{d-1}\binom{n}{i}(q-1)^i}

(for binary codes, |C| \ge \frac{2^n}{\sum_{i=0}^{d-1}\binom{n}{i}}), and it has a rate R given by:

R \ge 1 - H_q(p) - \epsilon

with \epsilon = o(1).

Proof: We will prove the general statement for q-ary codes. Before that, consider some definitions:

\sum_{i=0}^{d-1}\binom{n}{i}(q-1)^i = V_q(n, d-1) \qquad (9.23)

H_q(p) = p\log_q(q-1) - p\log_q p - (1-p)\log_q(1-p) \qquad (9.24)

Stirling's approximation: m! = \sqrt{2\pi m}\left(\frac{m}{e}\right)^m (1 + o(1)) \qquad (9.25)

o(1) denotes a quantity asymptotically smaller than a constant.
\Omega(n) denotes a quantity asymptotically at least proportional to n.

Note that the statement of the Hamming bound says that |C|\sum_{i=0}^{t}\binom{n}{i}(q-1)^i \le q^n, with equality holding only in the case of perfect codes. For a perfect code, extending the sum from t to 2t adds a positive quantity to the LHS, which gives:

|C|\sum_{i=0}^{2t}\binom{n}{i}(q-1)^i \ge q^n


In the case of perfect t-error correcting codes, we have the condition d = 2t + 1. Hence:

|C|\sum_{i=0}^{d-1}\binom{n}{i}(q-1)^i \ge q^n

|C| \ge \frac{q^n}{\sum_{i=0}^{d-1}\binom{n}{i}(q-1)^i}

|C| \ge \frac{q^n}{V_q(n, d-1)}
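The two bounds can be compared numerically on sample parameters. The sketch below computes V_q(n, r) and checks that the G.V. lower bound on |C| does not exceed the Hamming upper bound (the parameter choices are illustrative):

```python
from math import comb, ceil

def V(n, r, q=2):
    # Hamming ball volume V_q(n, r) = sum_{i=0}^{r} C(n,i) (q-1)^i
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

# Illustrative parameters: block length 15, distance 5 (so t = 2), binary
n, d, t, q = 15, 5, 2, 2
gv_lower = ceil(q ** n / V(n, d - 1, q))   # G.V.: |C| >= q^n / V_q(n, d-1)
hamming_upper = q ** n // V(n, t, q)       # Hamming: |C| <= q^n / V_q(n, t)
print(gv_lower, hamming_upper)  # 17 270
assert gv_lower <= hamming_upper
```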

thereby showing the Gilbert-Varshamov bound. Hence we have shown that there "exist" good codes. Certainly this does not seem enough, as we need to show that there are a significant number of "good" codes. For this, we need to see the probability of the above bound being satisfied, which closely depends on the distance of the code, which in turn depends on the number of errors that occur on transmission of a single codeword (of block length n). Firstly, we assume that the channel causing the errors is a binary symmetric channel.^15 We try to find upper and lower bounds on the number of erroneous sequences induced by this channel on a codeword of length n (this is V_q(n, d)), for arbitrarily large n but fixed probability p of an error occurring. The average number of errors is then np, thereby causing codewords to be, on average, a distance np away from each other. In other words, the average distance of the code would be np. Hence, we now try to find upper and lower bounds for V_q(n, np). From the properties of the entropy function as defined in (eq. 9.24), we see that it is monotonic and one-to-one

in the region p \in \left[0, 1 - \frac{1}{q}\right]. Consider the following assumption:

Assumption: for q \ge 2 and p \in \left[0, 1 - \frac{1}{q}\right], we have:

V_q(n, pn) \le q^{H_q(p)n}

Justification: It suffices to show that \frac{V_q(n, pn)}{q^{H_q(p)n}} \le 1. Starting with the LHS, using (eq. 9.24) and (eq. 9.23):

\frac{V_q(n, pn)}{q^{H_q(p)n}} = \frac{\sum_{i=0}^{np}\binom{n}{i}(q-1)^i}{q^{[p\log_q(q-1) - p\log_q p - (1-p)\log_q(1-p)]n}}

= \sum_{i=0}^{np}\binom{n}{i}(q-1)^i (q-1)^{-np}\, p^{np} (1-p)^{n(1-p)}

= \sum_{i=0}^{np}\binom{n}{i}(q-1)^i (1-p)^n \left(\frac{p}{(q-1)(1-p)}\right)^{np}

In the range where H_q(p) is monotonic we have p \le 1 - \frac{1}{q}, and hence 1 - p \ge \frac{1}{q}. Also, \frac{p}{q-1} \le \frac{1}{q}, which combined with the previous inequality gives \frac{p}{q-1} \le 1 - p. Therefore \frac{p}{(q-1)(1-p)} \le 1, and so, \forall i \le np:

\left(\frac{p}{(q-1)(1-p)}\right)^{np} \le \left(\frac{p}{(q-1)(1-p)}\right)^{i}

On substituting this,

^15 This channel causes an error with probability p and does not cause any error with probability 1 - p.


the above expression becomes an inequality, thus giving:

\frac{V_q(n, pn)}{q^{H_q(p)n}} \le \sum_{i=0}^{np}\binom{n}{i}(q-1)^i (1-p)^n \left(\frac{p}{(q-1)(1-p)}\right)^{i}

= \sum_{i=0}^{np}\binom{n}{i} p^i (1-p)^{n-i}

\le (p + (1-p))^n = 1

Hence we have the upper bound:

V_q(n, pn) \le q^{H_q(p)n} \qquad (9.26)
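Inequality (9.26) can be spot-checked numerically; the snippet below is an illustrative sketch (the helper names are ours), using q = 2:

```python
from math import comb, log

def V(n, r, q=2):
    # Hamming ball volume V_q(n, r)
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

def H(p, q=2):
    # q-ary entropy function from (eq. 9.24)
    return p * log(q - 1, q) - p * log(p, q) - (1 - p) * log(1 - p, q)

n, p = 100, 0.3
assert V(n, int(p * n)) <= 2 ** (H(p) * n)  # eq. (9.26) for q = 2
print(V(n, 30) < 2 ** (H(0.3) * 100))  # True
```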

We now find a lower bound on V_q(n, pn). Keeping only the i = pn term of the sum, we have:

V_q(n, pn) \ge \binom{n}{pn}(q-1)^{pn}

Using Stirling's approximation (eq. 9.25):

\binom{n}{pn} = \frac{\sqrt{2\pi n}\,(n/e)^n}{\sqrt{2\pi pn}\,(pn/e)^{pn}\,\sqrt{2\pi(n-pn)}\,((n-pn)/e)^{n-pn}}\,(1 + o(1))

The dominant part of this expression is

\frac{n^n}{(pn)^{pn}\,((1-p)n)^{(1-p)n}} = p^{-pn}(1-p)^{-(1-p)n} = q^{[-p\log_q p - (1-p)\log_q(1-p)]n}

while the square root prefactors are only polynomial in n and can be absorbed into a factor q^{-o(1)n}. Hence:

V_q(n, pn) \ge \binom{n}{pn}(q-1)^{pn} \ge q^{[-p\log_q p - (1-p)\log_q(1-p) + p\log_q(q-1)]n - o(n)} = q^{(H_q(p)-o(1))n}

Therefore, we see that:

q^{H_q(p)n} \ge V_q(n, pn) \ge q^{(H_q(p)-o(1))n} \qquad (9.27)

We now have upper and lower bounds on the number of erroneous sequences that can occur on the transmission of a codeword of length n. We now need to show a lower bound for the rate, as a good error correcting code has a higher rate. Note that from the Gilbert-Varshamov bound and (eq. 9.27), we have q^k \ge \frac{q^n}{q^{H_q(p)n}}, i.e. q^k \ge q^{n - nH_q(p)}. Hence, on taking logarithms (base q) on both sides, we get:

k \ge n - nH_q(p)
R \ge 1 - H_q(p) \qquad (9.28)

A stronger bound can be established^16 as:

R \ge 1 - H_q(p) - \epsilon \qquad (9.29)

^16 See the statement of this theorem in [16]. The version in [24], however, provides the statement of the G.V. bound as the one obtained here.


where \epsilon^{17} is a small quantity. We now show that the above bound is violated with inverse exponential probability, thereby asserting that there exist exponentially many error correcting codes that satisfy the Gilbert-Varshamov bound. More precisely:

Theorem: There exist exponentially many codes that satisfy the Gilbert-Varshamov bound.

Proof: From the upper bound statement (eq. 9.26), putting the distance of the code as pn:

V_q(n, np) \le q^{H_q(p)n} = q^{(H_q(p)-1)n}\, q^n

\frac{V_q(n, np)}{q^n} \le q^{(H_q(p)-1)n}

Notice that the LHS above is nothing but the probability of a uniformly random sequence of block length n lying within a distance pn of a given codeword. Summing this probability over all the q^k codewords, we get the total probability of the code having distance at most pn. (Note that we are still dealing with perfect codes, and hence every pair of codewords would be separated by the minimum distance, which is also the distance of the code.)

Total probability: P \le q^k\, q^{(H_q(p)-1)n}

From (eq. 9.29), taking k = (1 - H_q(p) - \epsilon)n:

P \le q^{(1-H_q(p)-\epsilon)n}\, q^{(H_q(p)-1)n} = q^{-\epsilon n} \le e^{-\Omega(n)}

Hence, the probability of a code having distance at most np is \le e^{-\Omega(n)}. The complementary event is the code having distance at least np, which has probability 1 - P \ge 1 - e^{-\Omega(n)}. Since this probability is very close to 1, we see that there are exponentially many codes that satisfy the Gilbert-Varshamov bound, and hence are good error correcting codes.

exponentially many codes that satisfy the Gilbert Vasharmov Bound, and hence are good error correctingcodes.

Growth Enumerating Function [20]

Define N(p) = R - 1 + H_q(p). N(p) = 0 if the code attains the G.V. bound. The quantity N(p) is called the Growth Enumerating Function. Taking q = 2, R = \frac{1}{2} and H_2(p) = -p\log_2 p - (1-p)\log_2(1-p), N(p) plotted as a function of p gives:

^17 Notice that p is nothing but the probability of an error occurring. One can write p = \frac{t}{n}.


Figure 9.2: Graph showing the growth enumerating function N (p) with R =12. In this case, pGV ∼ 0.1101.

At p = pGV , N (p) = 0 and the code attains the G.V bound, thereby being a perfect code. For p < pGV ,the growth function is negative and hence number of codewords at a distance less np from any codeword,exponentially decreases with n. On the other hand, for p > pGV , N (p) > 0 and hence the number ofcodewords at a distance greater that np from any codeword increases exponentially with n. Hence, taking aHamming space with points as codewords, we see that any sphere, centered at the codeword, with radius lessthan npGV would not contains other points, whereas any sphere centered around the same codeword, with aradius greater that npGV would contain exponentially many other points (codewords).

Figure 9.3: Figure Gilbert Varshamov Bound in a Hamming Space: For radius of the sphere greater thatnpGV , exponentially many points will lie inside the sphere and for radius less than npGV number of pointsinside the sphere would decrease exponentially with n.

If \(N(p) < 0\), then the number of codewords inside a sphere (centered at any codeword) of radius \(np\) decreases exponentially with \(n\); similarly, if \(N(p) > 0\), the number of codewords inside a sphere of radius \(np\) grows exponentially with \(n\). Hence the name Growth Enumerating Function.
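To make the numbers above concrete, here is a short numerical sketch (my own, not from the notes) that evaluates \(N(p) = R - 1 + H_2(p)\) for \(R = \frac{1}{2}\) and locates its root on \((0, \frac{1}{2})\) by bisection; the root lands near 0.110, consistent with the \(p_{GV} \sim 0.1101\) quoted for the figure.

```python
import math

def h2(p):
    # binary entropy H2(p) = -p log2 p - (1-p) log2(1-p)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def growth(p, R=0.5):
    # growth enumerating function N(p) = R - 1 + H2(p)
    return R - 1 + h2(p)

# locate p_GV, the root of N(p) on (0, 1/2), by bisection
lo, hi = 1e-9, 0.5
for _ in range(100):
    mid = (lo + hi) / 2
    if growth(mid) < 0:
        lo = mid
    else:
        hi = mid
p_gv = (lo + hi) / 2

assert 0.109 < p_gv < 0.111          # p_GV is close to 0.110
assert growth(0.05) < 0 < growth(0.2)  # negative below p_GV, positive above
```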


9.3.5 Parity Check codes [25]

Until now, we have concentrated on the error detecting and correcting capacity of codes. Now we shift our attention to the ease of error correction and detection. For this purpose, we introduce the parity check code, which makes the error detection procedure easy.¹⁸ Let \(w_j\) be the codewords transmitted across the channel. This encoding appends an extra bit \(b\), called the Parity bit, to each codeword \(w_j \in C\) such that:

\[
b = \begin{cases} 1 & \text{if the number of 1's in } w_j \text{ is odd} \\ 0 & \text{if the number of 1's in } w_j \text{ is even} \end{cases}
\]

Hence, after this appending, all the codewords will have an even number of 1's. Now, if

any \(t\)-tuple error occurs (with \(t\) odd), then the number of ones becomes odd, i.e. the sum of the digits of the codeword equals \(1 \pmod 2\). The error detection operation reports an error if this sum is 1. To check whether the number of 1's is even, we just need to add up all the digits of the code, checking if the sum \(= 0\). Let us consider all those codewords for which the number of 1's is even. Each of these codewords satisfies a linear equation of the type \(\alpha_1 r_1 + \alpha_2 r_2 + \dots + \alpha_n r_n = 0\), with \(r_i \in \{0, 1\}\) and \(\alpha_i \in \{0, 1\}\). If there are \(m\) such codewords (over the same code alphabet) with an even number of 1's, then we have a system of linear equations, or a matrix equation:

\[
\begin{aligned}
\alpha_{11} r_1 + \alpha_{12} r_2 + \cdots + \alpha_{1n} r_n &= 0 \\
\alpha_{21} r_1 + \alpha_{22} r_2 + \cdots + \alpha_{2n} r_n &= 0 \\
&\ \,\vdots \\
\alpha_{m1} r_1 + \alpha_{m2} r_2 + \cdots + \alpha_{mn} r_n &= 0
\end{aligned}
\]

\[
\begin{pmatrix}
\alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\
\alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{m1} & \alpha_{m2} & \cdots & \alpha_{mn}
\end{pmatrix}
\begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{pmatrix} = 0,
\quad \text{where } \alpha_{ij} \in \{0, 1\}
\;\Rightarrow\; HX = 0
\]

Recall that if there is no error, then the number of 1's is even for all the codewords, and hence we have \(|C|\) linear equations that can be written in the matrix form above, using a \((|C| \times n)\) matrix \(H\), called the parity check matrix. Hence the code digits \(r_i\) in the vector \(X\) satisfy the equation \(HX = 0\). This is now a very simple condition which, if not satisfied, indicates the presence of an error. The advantage of this method over the naive method of error detection is that it takes only polynomial time to multiply two matrices, whereas it takes exponential time to list all the sequences in the error neighbourhood of a codeword and check whether the received sequence is one of them.
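As a toy illustration (mine, not the notes'), the even-parity check \(HX = 0\) can be evaluated in linear time per parity equation; here \(H\) is a single all-ones row, so the syndrome is just the sum of the bits mod 2:

```python
# Toy parity check: H is a single all-ones row, so H x (mod 2)
# is just the bit-sum; a nonzero syndrome flags an odd number of flips.

def syndrome(H, x):
    # matrix-vector product over GF(2)
    return [sum(h * xi for h, xi in zip(row, x)) % 2 for row in H]

n = 8
H = [[1] * n]                         # single even-parity equation
codeword = [0, 1, 1, 0, 1, 1, 0, 0]   # even number of 1's
assert syndrome(H, codeword) == [0]   # no error detected

corrupted = codeword[:]
corrupted[3] ^= 1                     # single bit flip
assert syndrome(H, corrupted) == [1]  # detected (but not located)
```

As the footnote notes, the nonzero syndrome only detects the error; it does not locate it.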

9.3.6 Linear Codes

We know that the set Γn forms a vector space over Γ. However, any subset of Γn (say S), though uniquelydecipherable, need not form a vector space (over Γ). Let us look at some condition(s) on S that would makeit a vector space.

Assumption: Let \(S = \{x_i\} \subseteq \Gamma^n\) be a set such that \(\exists\) a matrix \(H\) over a field \(F\) with \(Hx_i = 0\ \forall x_i \in S\); then \(S\) forms a vector space over \(\Gamma\).
Justification: Let \(x_1, x_2 \in S\). This implies that \(\exists\) a matrix \(H\) such that \(Hx_1 = Hx_2 = 0\). To show that \(S\) is a vector space, we need to show that:

• \(v\,(= x_1 + x_2) \in S\): For this, it suffices to show that \(Hv = 0\). As \(H\) is a linear operator, \(Hv = H(x_1 + x_2) = Hx_1 + Hx_2 = 0\). Hence we see that \(Hv = 0\), thereby justifying the assumption.

• \(v\,(= cx_1) \in S\): We need \(Hv = 0\). We have: \(Hv = H(cx_1) = cHx_1 = 0\), thereby justifying the assumption.

Hence, the set \(S\) forms a vector space over \(\Gamma\) and is called a linear code. A linear code can also be defined as the code corresponding to the null space of a matrix, which here is denoted by \(H\) and is called a Parity Check Matrix for \(S\). Suppose we are given that \(G\) is some linear map, \(U\) forms a vector space, and \(X\) is a set such that \(\forall u \in U\), \(\exists x \in X\) with \(Gu = x\). It is simple to see that \(X\) has the properties of a vector

18However, it cannot be used to correct errors as the exact location of the error cannot be determined.


space, and hence forms a linear code. Such a matrix \(G\) is called a generator matrix of the linear code \(X\). Consider the following relation:

\[
\begin{aligned}
Gu &= x \\
\text{multiplying by } H \text{ on both sides:}\quad HGu &= Hx \\
\text{since } x \in X \text{, which is a linear code:}\quad HGu &= 0 \\
\text{as this holds } \forall u \in U:\quad HG &= 0
\end{aligned}
\]

The above is a very important property of linear codes¹⁹.
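A minimal sketch (using the [3, 1] repetition code as an example of my own choosing, not one from the text) verifying the property \(HG = 0\) over GF(2):

```python
# [3,1] repetition code: generator G maps the 1-bit message u to (u,u,u);
# H checks that adjacent bits agree. All arithmetic is mod 2.

G = [[1], [1], [1]]              # 3x1 generator matrix
H = [[1, 1, 0], [0, 1, 1]]       # 2x3 parity check matrix

def matmul_gf2(A, B):
    # (len(A) x len(B[0])) matrix product over GF(2)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) % 2
             for j in range(len(B[0]))] for i in range(len(A))]

# HG = 0: every codeword Gu automatically satisfies H(Gu) = 0
assert matmul_gf2(H, G) == [[0], [0]]

# and indeed the codeword for u = 1 passes the parity checks
codeword = [row[0] for row in matmul_gf2(G, [[1]])]
assert codeword == [1, 1, 1]
assert matmul_gf2(H, [[b] for b in codeword]) == [[0], [0]]
```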

9.4 Examples

Errors in classical binary codes are only of the bit-flip type, i.e. in a codeword \(c \in \{0, 1\}^*\) the symbol 1 is replaced by 0 or vice-versa. Hence, on passing a binary sequence across an erroneous channel, we expect the bits in that sequence to be flipped.

9.4.1 Repetition Code[24]

Suppose the channel flips one bit with probability \(\wp\) (where \(\wp \leq 1\), and in practice \(\wp \ll 1\)). Considering that each bit flip occurs independently, the probability of multiple bit flips is smaller (it decays as a power of the number of bit flips). As a result, by choosing an error model where \(\wp \ll 1\), we may safely ignore all except single bit-flip errors. By this logic, if many copies of the same bit are passed across the channel, then at most one of them would undergo a bit flip and, by taking the majority of the bits, we can conclude which bit had been flipped. Flipping back this bit would then give us the transmitted sequence. This error correcting code, where a single bit is repeated (the number of repetitions depends on how small \(\wp\) is compared to unity), is called the Repetition Code. Hence the encoding map for the repetition code is given as:

0 → 000 & 1 → 111 (9.30)

Now consider the following procedure for correcting a single bit-flip error on an arbitrary bit. Let 0101 be the sequence to be transmitted. Using the repetition encoding (eq. 9.30), 000111000111 is the transmitted sequence. The channel now flips one of the bits in the sequence. After receiving the erroneous sequence, we examine it block by block and identify the block whose three bits do not all agree. By majority voting within this block, the flipped bit is identified and flipped back to correct the sequence. On decoding, we obtain the transmitted sequence.

\[
0101 \xrightarrow{\text{encode}} 000111000111 \xrightarrow{\text{transmission causing single bit flip}} 010111000111
\]

Received: 010111000111. By majority vote, the 2nd bit has flipped; flip it back to get \(000111000111 \xrightarrow{\text{decode}} 0101\).
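The block-wise majority vote just described can be sketched as follows (a toy implementation of mine, assuming at most one flip per block of three):

```python
def encode(bits):
    # repetition encoding: 0 -> 000, 1 -> 111  (eq. 9.30)
    return [b for bit in bits for b in (bit, bit, bit)]

def decode(received):
    # majority vote inside each block of three corrects one flip per block
    out = []
    for i in range(0, len(received), 3):
        block = received[i:i + 3]
        out.append(1 if sum(block) >= 2 else 0)
    return out

message = [0, 1, 0, 1]
sent = encode(message)           # 000111000111
corrupted = sent[:]
corrupted[1] ^= 1                # channel flips the 2nd bit: 010111000111
assert decode(corrupted) == message
```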

¹⁹Note that the definition of a linear code is sometimes given as \(H^T x = 0,\ \forall x \in X\). In that case we have the condition \(H^T G = 0\).


Chapter 10

Quantum Codes

10.1 Introduction

An \(n\)-qubit Quantum Code is defined as a \(2^k\)-dimensional subspace of the \(2^n\)-dimensional vector space of \(n\) qubits (the space \((\mathbb{C}^2)^{\otimes n}\)). The main difference between classical and quantum codes (which will be made use of in the different bounds) is the fact that the quantum coding space is \(2^k\)-dimensional, as compared to the \(k\)-dimensional classical coding space. A quantum codeword has \(n\) qubits, of which \(k\) qubits are called the information qubits, as in the classical theory, and the rest are redundant qubits used for error correction (the number of redundant qubits is bounded by the quantum singleton bound). The reason for the quantum coding space being \(2^k\)-dimensional is that a code whose codewords each have \(k\) bits would classically have \(2^k\) codewords. Since classical codes, unlike quantum codes, have no superposition states, we see that these \(2^k\) classical codewords can be combined with different amplitudes to form more codewords. Hence the \(2^k\) classical codewords of the \(n\)-bit classical code are taken as the basis vectors of the \(n\)-qubit quantum code. Since the dimensionality of a Hilbert space is the number of basis vectors used to describe it, we see that an \(n\)-qubit quantum code is \(2^k\)-dimensional. To differentiate this notation from the classical case, we use \([[n, k, d]]_q\) to denote a \(2^k\)-dimensional quantum code which is a subspace of the \(2^n\)-dimensional vector space, with codewords separated by a distance \(d\). The subscript \(q\) denotes the alphabet size, which is always taken as two in all further explanations¹.

10.2 Errors in Quantum codes

Just as in the classical case, errors in quantum codes can be bit-flips on the basis vectors of the code. Notice that in the case of quantum codes, we have codewords which are superpositions of the basis vectors (with some amplitudes), and hence there is scope for errors affecting these superpositions by changing the phases of the amplitudes. These are called phase errors. It is easy to see that if the original code is in the computational (\(\sigma_Z\)) basis, then bit-flip errors are caused by \(\sigma_X\) and phase-flip errors by \(\sigma_Z\) operators. However, in the \(\sigma_X\) basis, it is \(\sigma_Z\) that causes bit-flip errors, while \(\sigma_X\) causes phase errors.

We now extend the scope of errors by noticing that any unitary operator acting on \(n\) qubits is a possible error. To build a framework to analyse these errors, we view the quantum computer (the system that performs the computation), with state \(\rho_s\), as a closed system, and the errors as the bath or environment, with state \(\rho_b\), in which the quantum computer is placed. The combined open quantum system is now described by the density matrix \(\rho_s \otimes \rho_b\). The error operation is described as a map, or a time evolution, of this open quantum system. After we have found how the open system evolves, we sum over all degrees of freedom of the bath (by taking a partial trace of \(\rho_s \otimes \rho_b\) over the environment), thereby obtaining the change in the closed system

¹The general case, where the alphabet size \(q\) equals some dimension \(d\), is called a qudit code.


(the quantum computer), incorporating the effects of the bath during the evolution. But as a bare map, not much information can be extracted about the error operation. There are two ways of representing this error: first, we represent the operation in terms of discrete operators; this representation is called the Operator Sum Representation. Then, using the operator sum representation, we express the time evolution of \(\rho_s \otimes \rho_b\) using the master equation.

10.2.1 Operator sum representation [3] [22]

The evolution of an open quantum system can be described using a unitary operator \(U\) (which acts on the tensor product state of the system and the bath):

\[
\rho_{S,B}(t) = U\, \left( \rho_S(0) \otimes \rho_B(0) \right)\, U^{\dagger} \tag{10.1}
\]

Let the states of the system and the bath be given in terms of their bases: \(\rho_S(0) = \sum_{ij} s_{ij}\, |i\rangle\langle j|\) and \(\rho_B(0) = \sum_{\mu\nu} b_{\mu\nu}\, |\mu\rangle\langle\nu|\) respectively.

\[
\rho_{S,B}(t) = U \left( \sum_{ij\mu\nu} s_{ij}\, b_{\mu\nu}\, |i\mu\rangle\langle j\nu| \right) U^{\dagger}
\]

Tracing over the bath, we get \(\rho_S(t)\):

\[
\rho_S(t) = \mathrm{tr}_B \left[ U \left( \sum_{ij\mu\nu} s_{ij}\, b_{\mu\nu}\, |i\mu\rangle\langle j\nu| \right) U^{\dagger} \right]
= \sum_{ij\mu\nu\mu'} \langle\mu'|\, U \left( s_{ij}\, b_{\mu\nu}\, |i\mu\rangle\langle j\nu| \right) U^{\dagger}\, |\mu'\rangle
\]

Expanding the action of \(U\) in the combined basis of system and bath, \(U|i\mu\rangle = \sum_{\alpha\beta} u_{\alpha i \beta\mu}\, |\alpha\beta\rangle\) and \(\langle j\nu|U^{\dagger} = \sum_{\eta\upsilon} u^{*}_{\eta j \upsilon\nu}\, \langle\eta\upsilon|\), we get:

\[
\rho_S(t) = \sum_{ij\mu\nu\mu'} s_{ij}\, b_{\mu\nu}\, \langle\mu'| \left( \sum_{\alpha\beta} u_{\alpha i\beta\mu}\, |\alpha\beta\rangle \right) \left( \sum_{\eta\upsilon} u^{*}_{\eta j\upsilon\nu}\, \langle\eta\upsilon| \right) |\mu'\rangle
= \sum_{ij\mu\nu\mu'\alpha\beta\eta\upsilon} s_{ij}\, b_{\mu\nu}\, u_{\alpha i\beta\mu}\, u^{*}_{\eta j\upsilon\nu}\, \langle\mu'|\alpha\beta\rangle\, \langle\eta\upsilon|\mu'\rangle
\]

Consider the inner product \(\langle\mu'|\alpha\beta\rangle\): since \(\langle\mu'|\) is a state of the bath, it must be contracted only with the ket vector of the bath (the system vector is left unchanged, or equivalently multiplied by the identity operator of the system), which is \(|\beta\rangle\). Due to the orthonormality of the basis vectors, this inner product gives \(\delta_{\mu',\beta}\,|\alpha\rangle\); similarly \(\langle\eta\upsilon|\mu'\rangle\) gives \(\delta_{\upsilon,\mu'}\,\langle\eta|\). Using this on both inner products in the above expression, we get:

\[
\rho_S(t) = \sum_{ij\mu\nu\mu'\alpha\beta\eta\upsilon} s_{ij}\, b_{\mu\nu}\, u_{\alpha i\beta\mu}\, u^{*}_{\eta j\upsilon\nu}\, \delta_{\mu',\beta}\, \delta_{\upsilon,\mu'}\, |\alpha\rangle\langle\eta|
\]

Summing over \(\beta\) and \(\upsilon\), and noticing that the only non-vanishing components come from \(\beta = \mu'\) and \(\upsilon = \mu'\), we have:

\[
\rho_S(t) = \sum_{ij\mu\nu\mu'\alpha\eta} s_{ij}\, b_{\mu\nu}\, u_{\alpha i\mu'\mu}\, u^{*}_{\eta j\mu'\nu}\, |\alpha\rangle\langle\eta|
\]


Summing over \(\alpha\) and \(\eta\): define, for each pair \((\mu', \mu)\), the system operator \(u_{\mu'\mu}\) with matrix elements \(\left( u_{\mu'\mu} \right)_{\alpha i} = u_{\alpha i\mu'\mu}\), so that \(\sum_{\alpha} u_{\alpha i\mu'\mu}\, |\alpha\rangle = u_{\mu'\mu}\, |i\rangle\). This is nothing but contracting a fourth-rank tensor (\(U\), with elements \(u_{\alpha i\mu'\mu}\)) with a vector, yielding a second-rank tensor (an operator) for each fixed pair of bath indices. Substituting:

\[
\rho_S(t) = \sum_{ij\mu\nu\mu'} s_{ij}\, b_{\mu\nu}\, u_{\mu'\mu}\, |i\rangle\langle j|\, u^{\dagger}_{\mu'\nu}
= \sum_{\mu\nu\mu'} b_{\mu\nu}\, u_{\mu'\mu} \left( \sum_{ij} s_{ij}\, |i\rangle\langle j| \right) u^{\dagger}_{\mu'\nu}
= \sum_{\mu\nu\mu'} b_{\mu\nu}\, u_{\mu'\mu}\, \rho_S\, u^{\dagger}_{\mu'\nu}
\]

Writing \(u_{\mu'\mu} = \langle\mu'|U|\mu\rangle\) and \(u^{\dagger}_{\mu'\nu} = \langle\nu|U^{\dagger}|\mu'\rangle\), and assuming the bath starts in a state diagonal in this basis, \(b_{\mu\nu} = b_{\mu}\,\delta_{\mu\nu}\), we may define \(E_{\mu'\mu} = \sqrt{b_{\mu}}\, \langle\mu'|U|\mu\rangle\). Relabelling the pair \((\mu', \mu)\) by a single index \(\mu\), we get:

\[
\rho_S(t) = \sum_{\mu} E_{\mu}\, \rho\, E^{\dagger}_{\mu}
\]

We see that the evolution of the density matrix is given by a noise function \(\varepsilon\): \(\rho_S(t) = \varepsilon(\rho)\). This noise function can be represented in the above summation form:

\[
\varepsilon(\rho) = \sum_{\mu} E_{\mu}\, \rho\, E^{\dagger}_{\mu} \tag{10.2}
\]

The above form is called the Operator Sum Representation.
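To see (eq. 10.2) in action, here is a small numerical sketch (my own example, not from the notes): the bit-flip channel has operation elements \(E_0 = \sqrt{1-p}\, I\), \(E_1 = \sqrt{p}\, X\), which satisfy the completeness relation \(\sum_\mu E^{\dagger}_\mu E_\mu = I\), and \(\varepsilon(\rho)\) mixes the populations of \(|0\rangle\) and \(|1\rangle\).

```python
import numpy as np

# Kraus (operation) elements of the bit-flip channel with flip probability p
p = 0.25
I = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
E = [np.sqrt(1 - p) * I, np.sqrt(p) * X]

# completeness (trace preservation): sum_mu E_mu^dagger E_mu = I
assert np.allclose(sum(Ek.conj().T @ Ek for Ek in E), I)

def channel(rho):
    # operator sum representation: eps(rho) = sum_mu E_mu rho E_mu^dagger
    return sum(Ek @ rho @ Ek.conj().T for Ek in E)

rho0 = np.array([[1.0, 0.0], [0.0, 0.0]])   # |0><0|
rho1 = channel(rho0)
assert np.isclose(np.trace(rho1), 1.0)      # trace preserved
assert np.isclose(rho1[1, 1], p)            # population flipped with prob p
```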

10.2.2 Lindblad form using the Master Equation [24]

We have \(\rho_s(t) = \varepsilon(\rho_s(0))\), which in the operator sum representation becomes \(\rho_s(t) = \sum_k E_k\, \rho_s(0)\, E^{\dagger}_k\). Now expanding \(\rho_s(t)\) in a Taylor series:

\[
\rho_s(t) = \rho_s(0) + \frac{\partial\rho}{\partial t}\,\delta t + O\!\left(\delta t^2\right) \tag{10.3}
\]

\[
\therefore\ \rho_s(t) = \rho_s(0) + O(\delta t) = \sum_k E_k\, \rho_s(0)\, E^{\dagger}_k \tag{10.4}
\]

Expanding the RHS, we get:

\[
\rho_s(0) + O(\delta t) = E_0\, \rho_s(0)\, E^{\dagger}_0 + \sum_{k\geq 1} E_k\, \rho_s(0)\, E^{\dagger}_k
\]

Comparing term by term in the above equation: \(E_0 \sim I_s + O(\delta t)\) and \(E_k \sim O(\sqrt{\delta t})\) for \(k \geq 1\). Using the Hamiltonian of the system, \(H_s\), we can write the operator \(E_0\) (for infinitesimal time translations) in terms of \(H_s\):

\[
E_0 = I_s + \left( K - \frac{i}{\hbar} H_s \right) \delta t
\]

where \(K\) is some hermitian operator. For the operators \(E_k\), \(k \geq 1\), we have the general form:

\[
E_k = \sqrt{\delta t}\, L_k \quad \text{for } k \geq 1
\]


From the trace-preserving condition on the \(E_k\), as established in (eq. ??), we see that:

\[
\begin{aligned}
I_s &= \sum_k E^{\dagger}_k E_k = E^{\dagger}_0 E_0 + \sum_{k\geq 1} E^{\dagger}_k E_k \\
&= \left[ I_s + \left( K + \frac{i}{\hbar}H \right)\delta t \right]\left[ I_s + \left( K - \frac{i}{\hbar}H \right)\delta t \right] + \sum_{k\geq 1} L^{\dagger}_k L_k\, \delta t \\
&= I_s + 2K\,\delta t + \left( K + \frac{i}{\hbar}H \right)\left( K - \frac{i}{\hbar}H \right)(\delta t)^2 + \sum_{k\geq 1} L^{\dagger}_k L_k\, \delta t
\end{aligned}
\]

Ignoring \((\delta t)^2\) terms:

\[
0 = 2K\,\delta t + \sum_{k\geq 1} L^{\dagger}_k L_k\, \delta t
\qquad \therefore\ K = -\frac{1}{2} \sum_{k\geq 1} L^{\dagger}_k L_k \tag{10.5}
\]

Putting the above equation in (eq. 10.4) we get:

\[
\rho_s(t) = \left[ I_s + \left( K - \frac{i}{\hbar}H \right)\delta t \right] \rho_s(0) \left[ I_s + \left( K + \frac{i}{\hbar}H \right)\delta t \right] + \sum_{k\geq 1} L_k\, \rho_s(0)\, L^{\dagger}_k\, \delta t
\]

\[
= \rho_s(0) + \left[ \left( K - \frac{i}{\hbar}H \right)\rho_s(0) + \rho_s(0)\left( K + \frac{i}{\hbar}H \right) + \sum_{k\geq 1} L_k\, \rho_s(0)\, L^{\dagger}_k \right] \delta t + O\!\left(\delta t^2\right)
\]

We can now make the approximation \(\rho(t) = \rho(0) + O(\delta t)\). Putting this in the above equation, we see that the \(O(\delta t)\) factor combines with the \(\delta t\) already present outside the square brackets to give a \((\delta t)^2\) contribution that can be ignored. Therefore, merely replacing \(\rho_s(0) \to \rho_s(t)\) inside the square brackets does not affect the expression to \(O(\delta t)\). Making this change, we get:

\[
\rho_s(t) = \rho_s(0) + \left[ \left( K\rho_s(t) + \rho_s(t)K \right) - \frac{i}{\hbar}\left( H\rho_s(t) - \rho_s(t)H \right) + \sum_{k\geq 1} L_k\, \rho_s(t)\, L^{\dagger}_k \right] \delta t
\]

Using (eq. 10.5) for \(K\):

\[
= \rho_s(0) + \left[ \left\{ \rho_s(t),\, -\frac{1}{2}\sum_{k\geq 1} L^{\dagger}_k L_k \right\} - \frac{i}{\hbar}\left[ H, \rho_s(t) \right] + \sum_{k\geq 1} L_k\, \rho_s(t)\, L^{\dagger}_k \right] \delta t
\]

We can now compare the above equation with (eq. 10.3):

\[
\frac{\partial\rho}{\partial t} = -\frac{i}{\hbar}\left[ H, \rho_s(t) \right] + \sum_{k\geq 1} L_k\, \rho_s(t)\, L^{\dagger}_k - \left\{ \rho_s(t),\, \frac{1}{2}\sum_{k\geq 1} L^{\dagger}_k L_k \right\}
\]

The above is called the master equation for the density matrix of the system, and the \(L_k\) are called Lindblad operators. Notice that this is just the incorporation of the effect of the bath into the usual time evolution equation for the density operator of the closed system, \(\frac{\partial\rho}{\partial t} = -\frac{i}{\hbar}\left[ H, \rho \right]\). We can now rearrange the dissipative term:


\[
\begin{aligned}
\sum_{k\geq 1} L_k\, \rho_s\, L^{\dagger}_k - \left\{ \rho_s,\, \frac{1}{2}\sum_{k\geq 1} L^{\dagger}_k L_k \right\}
&= \sum_{k\geq 1} L_k\, \rho_s\, L^{\dagger}_k - \frac{1}{2}\sum_{k\geq 1} \left( \rho_s\, L^{\dagger}_k L_k + L^{\dagger}_k L_k\, \rho_s \right) \\
&= \frac{1}{2}\sum_{k\geq 1} \left( \left[ L_k,\, \rho_s\, L^{\dagger}_k \right] + \left[ L_k\, \rho_s,\, L^{\dagger}_k \right] \right)
\end{aligned}
\]

We therefore have the Lindblad master equation, where the \(L_k\) are called the Lindblad operators [3]:

\[
\frac{\partial\rho_s(t)}{\partial t} = -\frac{i}{\hbar}\left[ H, \rho_s(t) \right] + \frac{1}{2}\sum_{k\geq 1} \left( \left[ L_k,\, \rho_s(t)\, L^{\dagger}_k \right] + \left[ L_k\, \rho_s(t),\, L^{\dagger}_k \right] \right) \tag{10.6}
\]
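As a numerical check of the master equation (an illustrative sketch of mine, assuming a single Lindblad operator and \(H = 0\)): for amplitude damping, \(L = \sqrt{\gamma}\, |0\rangle\langle 1|\), the dissipator \(L\rho L^{\dagger} - \frac{1}{2}\{L^{\dagger}L, \rho\}\) predicts exponential decay of the excited population, \(\rho_{11}(t) = e^{-\gamma t}\), while preserving the trace.

```python
import numpy as np

gamma = 1.0
L = np.sqrt(gamma) * np.array([[0.0, 1.0], [0.0, 0.0]])  # sqrt(gamma)|0><1|

def lindblad_rhs(rho):
    # drho/dt = L rho L^dag - 1/2 {L^dag L, rho}   (taking H = 0 here)
    LdL = L.conj().T @ L
    return L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL)

# Euler-integrate starting from the excited state |1><1|
rho = np.array([[0.0, 0.0], [0.0, 1.0]])
dt, steps = 1e-4, 10000            # integrate to t = 1
for _ in range(steps):
    rho = rho + dt * lindblad_rhs(rho)

assert np.isclose(np.trace(rho), 1.0, atol=1e-6)          # trace preserved
assert np.isclose(rho[1, 1], np.exp(-gamma), atol=1e-3)   # e^{-gamma t} decay
```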

10.2.3 Error Correction Condition for Quantum Codes [24]

We saw that the classical error correction condition is nothing but a distance bound on the codewords. In the quantum case, unlike the classical case where all known errors are of one type (the bit flip), there can be arbitrary unitary errors. Hence we have a constraint on these errors, if they are to be correctable. When we say that an error, the map \(\varepsilon\), is correctable, we mean that there exists another map or operation² \(R\) such that the composition of \(R\) with \(\varepsilon\) gives back the initial state \(\rho\). The error correction condition says:

Let \(C\) be a code, \(P\) the projector onto the code space, and \(\varepsilon\) a noise (quantum operation) having operation elements \(\{E_i\}\). There exists an error-correction operation \(R\) correcting \(\{E_i\}\) on \(C\) iff:

\[
P E^{\dagger}_i E_j P = \Gamma_{ij}\, P \tag{10.7}
\]

where \(\Gamma_{ij}\) are the entries of a hermitian matrix \(\Gamma\).

Proof Idea: The key idea here is that quantum operations, to be reversible, cannot increase distinguishability. The error operation, if correctable, must be such that if two codewords \(|\psi_i\rangle, |\psi_j\rangle\) are orthogonal (or not orthogonal, i.e. indistinguishable through a measurement), they must remain so even after the error has occurred. Hence, if we have any two error operators \(E_i\) and \(E_j\) acting on \(|\psi_i\rangle\) and \(|\psi_j\rangle\) respectively, such that \(|\psi'_i\rangle = E_i|\psi_i\rangle\) and \(|\psi'_j\rangle = E_j|\psi_j\rangle\), then:

\[
\langle\psi'_i|\psi'_j\rangle \sim \langle\psi_i|\psi_j\rangle
\;\Rightarrow\; \langle\psi_i|E^{\dagger}_i E_j|\psi_j\rangle \sim \langle\psi_i|\psi_j\rangle
\]

The above equation must hold for all pairs \((|\psi_i\rangle, |\psi_j\rangle)\) in the code space. Hence, we can consider the projector \(P\) onto the code space instead of individual vectors. Therefore,

\[
P E^{\dagger}_i E_j P \sim P
\]

The proportionality constant must be some number:

\[
P E^{\dagger}_i E_j P = \lambda P
\]

or, more generally, the constants \(\lambda\) can be the elements of a hermitian matrix.

Formal Proof: To formally prove the theorem, we need to show the necessity and sufficiency of the above condition; in other words, we need to prove the theorem as well as its converse. We start by showing sufficiency.

²The symbol \(R\) is not to be confused with the rate of a code; the contexts of usage are entirely different.


• Claim: \(\exists\) an error correcting operation \(R\) such that \(R(\varepsilon(\rho)) \propto \rho\), given that \(P E^{\dagger}_i E_j P = \Gamma_{ij} P\).

Proof: Let \(F_k = \sum_i u_{ik} E_i\) be a unitarily equivalent set of errors, \(u_{ik}\) being elements of the unitary matrix \(U\) that diagonalises \(\Gamma\). Hence \(D = U^{\dagger}\Gamma U\) is diagonal, with elements \(d_{kl} = d_{kk}\,\delta_{kl}\).

\[
\begin{aligned}
P F^{\dagger}_k F_l P &= \sum_{ij} u^{*}_{ik}\, u_{jl}\, P E^{\dagger}_i E_j P \\
\text{(by assumption)}\quad &= \sum_{ij} u^{*}_{ik}\, \Gamma_{ij}\, u_{jl}\, P = \left( U^{\dagger}\Gamma U \right)_{kl} P \\
&= d_{kl}\, P = \delta_{kl}\, d_{kk}\, P \qquad (10.8)
\end{aligned}
\]

Consider now the polar decomposition of \(F_k P\):

\[
F_k P = U_k \sqrt{P F^{\dagger}_k F_k P} = \sqrt{d_{kk}}\; U_k P \tag{10.9}
\]

\[
\therefore\ F_k P U^{\dagger}_k = \sqrt{d_{kk}}\; U_k P U^{\dagger}_k
\;\Rightarrow\; U_k P U^{\dagger}_k = \frac{F_k P U^{\dagger}_k}{\sqrt{d_{kk}}}
\]

Define projectors onto subspaces, based on the LHS of the above equation:

\[
P_k = U_k P U^{\dagger}_k \tag{10.10}
\]
\[
P_l = U_l P U^{\dagger}_l \tag{10.11}
\]

We now show that these code spaces are orthogonal, i.e. \(P^{\dagger}_k P_l \propto \delta_{kl}\). Using \(P_k = F_k P U^{\dagger}_k / \sqrt{d_{kk}}\) and (eq. 10.8):

\[
P^{\dagger}_k P_l = \frac{U_k \left( P F^{\dagger}_k F_l P \right) U^{\dagger}_l}{\sqrt{d_{kk}\, d_{ll}}}
= \frac{\delta_{kl}\, d_{kk}\, U_k P U^{\dagger}_l}{\sqrt{d_{kk}\, d_{ll}}} \propto \delta_{kl}
\]

What is the significance of the code spaces being orthogonal? It is their orthogonality that allows a (syndrome) measurement to identify which \(F_k\) occurred without disturbing the encoded state. Finally, we show the existence of an error correcting operation \(R\) with \(R(\varepsilon(\rho)) \propto \rho\). \(R\) is also, in some sense, similar to a quantum operation; more precisely, it is the inverse of a noise operation, and it has a similar operator sum representation³: \(R(\sigma) = \sum_i R^{\dagger}_i\, \sigma\, R_i\). Here we take \(R_k = P_k U_k\):

\[
R(\varepsilon(\rho)) = \sum_k U^{\dagger}_k P^{\dagger}_k\, \varepsilon(\rho)\, P_k U_k
= \sum_{kl} U^{\dagger}_k P^{\dagger}_k\, F_l\, \rho\, F^{\dagger}_l\, P_k U_k
= \sum_{kl} \zeta_{kl}\, \zeta^{\dagger}_{kl} \tag{10.12}
\]

where \(\zeta_{kl} = U^{\dagger}_k P^{\dagger}_k F_l \sqrt{\rho}\).

³Notice the position of the daggers, as opposed to the definition of a noise operation in (eq. 10.2). It is, in this sense, the inverse of a noise operation.


From (eq. 10.10), \(P_k = U_k P U^{\dagger}_k\); multiplying both sides of (eq. 10.9) by \(U^{\dagger}_k\) gives \(F_k P U^{\dagger}_k = \sqrt{d_{kk}}\, U_k P U^{\dagger}_k = \sqrt{d_{kk}}\, P_k\), i.e. \(P_k = \frac{F_k P U^{\dagger}_k}{\sqrt{d_{kk}}}\), and taking the adjoint, \(P^{\dagger}_k = \frac{U_k P F^{\dagger}_k}{\sqrt{d_{kk}}}\). Substituting this form of \(P^{\dagger}_k\) into the expression for \(\zeta_{kl}\) (and using that \(\rho\) is supported on the code space, so \(\sqrt{\rho} = P\sqrt{\rho}\)):

\[
\zeta_{kl} = U^{\dagger}_k\, \frac{U_k P F^{\dagger}_k}{\sqrt{d_{kk}}}\, F_l \sqrt{\rho}
= \frac{P F^{\dagger}_k F_l P\, \sqrt{\rho}}{\sqrt{d_{kk}}}
= \frac{\delta_{kl}\, d_{kk}\, \sqrt{\rho}}{\sqrt{d_{kk}}}
= \sqrt{d_{kk}}\, \sqrt{\rho}\; \delta_{kl}
\]

using (eq. 10.8) in the second-to-last step.

Putting the above form of \(\zeta_{kl}\) in (eq. 10.12), we get:

\[
R(\varepsilon(\rho)) = \sum_{kl} d_{kk}\, \rho\; \delta_{kl} = \left( \sum_k d_{kk} \right) \rho
\qquad \therefore\ R(\varepsilon(\rho)) \propto \rho
\]

Hence, we have shown the existence of an error correction operation, thereby proving sufficiency.

• We now need to show the necessity of the error correction condition. Since the error acts on the encoded state in the code space, given by \(P\rho P\), we have:
Claim: \(P E^{\dagger}_i E_j P = \Gamma_{ij} P\) if \(\exists\, R\) such that \(R(\varepsilon(P\rho P)) \propto P\rho P\), \(\forall\rho\)⁴.
Proof: In the operator sum representation,

\[
R(\varepsilon(P\rho P)) = \sum_{ij} R_j E_i\, P\rho P\, E^{\dagger}_i R^{\dagger}_j
\]

By assumption, the RHS of the above equation is \(\propto P\rho P\), say with constant \(c\):

\[
\sum_{ij} R_j E_i\, P\rho P\, E^{\dagger}_i R^{\dagger}_j = c\, P\rho P
\;\Rightarrow\; \sum_{ij} \xi_{ij}\, (P\rho P)\, \xi^{\dagger}_{ij} = c\, P\rho P
\]

where \(\xi_{ij} = R_j E_i P\). In the above equation, we see that the operation with elements \(\{\xi_{ij}\}\), acting on \(P\rho P\), is equivalent to the operation \(\sqrt{c}\, P\); hence each operation element must itself be proportional to \(P\):

\[
R_k E_i P = c_{ki}\, P
\qquad \therefore\ (R_k E_i P)^{\dagger} = P E^{\dagger}_i R^{\dagger}_k = c^{*}_{ki}\, P \tag{10.13}
\]

\[
\text{similarly, for } E_j:\quad R_k E_j P = c_{kj}\, P \tag{10.14}
\]

Multiplying (eq. 10.13) by (eq. 10.14), summing over \(k\), and using the completeness relation \(\sum_k R^{\dagger}_k R_k = I\):

\[
P E^{\dagger}_i \left( \sum_k R^{\dagger}_k R_k \right) E_j P = \Big( \underbrace{\sum_k c^{*}_{ki}\, c_{kj}}_{\Gamma_{ij}} \Big) P
\qquad \therefore\ P E^{\dagger}_i E_j P = \Gamma_{ij}\, P
\]

⁴Note that \(\varepsilon\) acts on the element of the code space corresponding to \(\rho\), namely \(P\rho P\), and not on \(\rho\) itself.


where \(\Gamma_{ij} = \sum_k c^{*}_{ki}\, c_{kj}\) are the elements of a hermitian matrix; hermiticity is manifest here, since \(\Gamma^{*}_{ji} = \sum_k c_{kj}\, c^{*}_{ki} = \Gamma_{ij}\).

Therefore, we have proved both the necessity and sufficiency of the quantum error-correction condition,thereby proving the theorem.

10.3 Distance Bound [1]

In section (sec. 9.3.3) we saw that the distance of a \(t\)-error correcting code must be greater than \(2t\). We have the same bound for quantum codes as well, with a small extra condition. Let us follow this statement for the case of stabilizer codes. Notice that when an error \(E\) occurs on a code with stabilizer \(S\), it either anti-commutes with at least one stabilizer generator, or commutes with all the generators. Consider the former case, which results in a code space orthogonal to the original one after the action of the error. This can be distinguished from the original code space, and hence error recovery can be performed. In this case, irrespective of the distance of the code (even if the distance of the code is 1), the anti-commutation condition separates the erroneous and the original code spaces, so we can detect and also recover from the error. Taking up the latter case, \(E\) commutes with all the generators, i.e. \(E \in Z(S)\). If \(E \in S\), it acts trivially on the code space and we need not correct this error; but if \(E \in Z(S) - S\), the error maps the code space to itself while acting non-trivially, giving a code space that is not orthogonal to the previous one. This corresponds to the classical situation, and hence we must condition the distance of the code on errors in \(Z(S) - S\). With this, we first define the notion of "distance" of a stabilizer code. Note that the distance of a general quantum code cannot simply be defined as the Hamming distance between the basis vectors, for then all vectors (in an \(n\)-qubit code) differing only in relative phases would have the same distance. In the classical case, the distance of a code can be interpreted via the minimum number of bit flips taking one codeword to another; analogously, the distance of a stabilizer code is defined as the minimum weight of an element of \(Z(S) - S\), where the weight \(\mathrm{wt}(E)\) of an operator is the number of non-trivial (non-identity) tensor factors in \(E\). Hence we have:

Theorem: A t-error correcting quantum stabilizer code must have a distance d ≥ 2t + 1.

10.4 The Quantum Singleton Bound [1]

In section (sec. 9.3.4) we discussed the classical case of the singleton bound. The main idea of the singleton bound is as follows: since the code has only \(k\) information bits, it suffices to consider all the codewords as vectors in the \(2^k\)-dimensional vector space. Classically, we could have taken the block length of each codeword to be \(k\) (instead of \(n\)), with codewords then being as close as distance 1 from each other. This was clearly not desirable, as we wanted a distance-\(d\) code, as a result of which we had to introduce⁵ \(d - 1\) redundant bits to separate the codewords by a distance \(d\) from each other. This increases the block length from \(k\) to \(n = k + d - 1\), giving the classical singleton bound: \(n - k \geq d - 1\). In the quantum case, there are both bit flip errors (the classical errors) and phase flip errors (which are bit flip errors in the \(\sigma_X\) basis). Clearly, this larger set of error operators means the classical bound cannot carry over unchanged to the quantum case, as otherwise we would only have corrected the bit flip (\(\sigma_X\)) errors. Now if we assume that, in addition to \(t\) bit flip errors, there are \(t\) phase flip errors as well, then we need to append \(d - 1\) more redundant qubits (or tensor factors in the stabilizer) to account for the phase errors⁶.

Theorem: For an [[n, k]]2 quantum code C, we have: n− k ≥ 2(d− 1).

⁵We need to show how introducing \(d - 1\) redundant bits increases the distance, since it is not obvious.
⁶This needs to be shown, as again it is not very obvious. Note that there is another ambiguity: it seems we have not used a similar assumption for the bit and phase flip errors in the previous section, while discussing the distance bound.


10.5 Quantum Hamming Bound

In (sec. 9.3.4) we discussed the Hamming bound for classical codes⁷. Notice that the only difference between an \((n, k)_q\) classical code and an \([[n, k]]\) quantum code is that the former is \(k\)-dimensional whereas the latter is \(2^k\)-dimensional. There are two ways of arriving at the quantum Hamming bound. One is by considering an equivalent classical code whose dimension is \(2^k\), which works for general \(q\)-ary codes; the other, specifically for binary codes, is based on the fact that there are three error operators in the quantum case, as opposed to only one in the classical case.

Take the former method first. Notice that an \((n, k)_q\) classical code can be seen as describing the state of \(n\) two-state classical systems. Similarly, an \([[n, k]]_q\) code describes the state of \(n\) two-state quantum systems. More explicitly, let us take \(n = 1\) and \(q = 2\). A two-state classical system can be described using just two distinct (orthogonal) vectors, with each vector corresponding to one of the states of this system; the states of the system would be \(\{0, 1\}\). On the other hand, consider a spin-half (two-state quantum) system. This can exist, for instance, in any of the states \(\left\{ |0\rangle,\ |1\rangle,\ \frac{|0\rangle + |1\rangle}{\sqrt{2}},\ \frac{|0\rangle - |1\rangle}{\sqrt{2}} \right\}\). These are four states the system can be in, and they can be described using the quantum code \(\{|0\rangle, |1\rangle\}\). Certainly, the classical code \(\{0, 1\}\) would not be able to describe all these states, whereas the classical code \(\{00, 01, 10, 11\}\), or equivalently \(\{a, b, c, d\}\), would. In order to describe a 2-state quantum system, we require a 4-ary classical code, and similarly, to describe \(n\) 2-state quantum systems⁸, we need \(n\) 4-ary classical codes (as opposed to \(n\) 2-ary quantum codes). Therefore a 2-ary quantum code is equivalent⁹ to a 4-ary classical code.

Let us now go back to the general case of a \(q\)-state system, so as to state the theorem for \(q\)-ary codes. A \(q\)-ary quantum system has states spanning a \(q\)-dimensional Hilbert space, the basis vectors of which form the classical states of the system. By the counting illustrated above, there are \(2q\) such representative states the \(q\)-state system can be in, and we would require a \(2q\)-ary classical code to describe this system. Summarising, we have seen that a \(q\)-ary quantum code is equivalent to a \(2q\)-ary classical code. As a result, we can take the quantum Hamming bound for a \(q\)-ary quantum code to be the same as the classical Hamming bound for a \(2q\)-ary classical code. This gives the quantum Hamming bound (with \(q = 2\)):

Theorem: For an \([[n, k]]_2\) quantum code \(C\) correcting \(t\) errors, we have:

\[
|C| \leq \frac{2^n}{\displaystyle\sum_{m=0}^{t} \binom{n}{m} 3^m} \tag{10.15}
\]
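As a quick numerical illustration (my own, not in the notes): the well-known \([[5, 1]]\) code saturates this bound with \(t = 1\), since \(2^5 / \left( \sum_{m \leq 1} \binom{5}{m} 3^m \right) = 32/16 = 2 = 2^1\), which is why it is a perfect quantum code.

```python
from math import comb

def q_hamming_rhs(n, t):
    # RHS of (10.15): 2^n / sum_{m=0}^{t} C(n, m) 3^m
    return 2**n / sum(comb(n, m) * 3**m for m in range(t + 1))

# the [[5,1]] code saturates the bound for t = 1 errors: |C| = 2^1
assert q_hamming_rhs(5, 1) == 2.0

# a [[7,1]] code (e.g. the Steane code) has slack: 2^1 < RHS
assert q_hamming_rhs(7, 1) > 2.0
```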

Taking now the second method, we note that the only type of error a classical error correcting code has to correct is the bit flip, as opposed to the bit flip, phase flip, and combined bit-phase flip errors a quantum code has to correct. Let us recall the way we derived the Hamming bound in (sec. 9.3.4), observing more closely the footnotes in that section pertaining to quantum codes. While calculating the total number of binary code sequences that differ from a given sequence in (eq. 9.22), we had taken a fixed sequence \(u = u_1 u_2 \ldots u_{k_1} \ldots u_{k_m} \ldots u_n\) and counted the sequences of the form \(v = u_1 u_2 \ldots v_{k_1} \ldots v_{k_m} \ldots u_n\) with \(u_i \neq v_i,\ \forall i \in \{k_1 \ldots k_m\}\). Here we made the key assumption that (for binary classical codes and the assumed error model) if \(u_i\) is fixed, then the only value \(v_i\) can take is the bit-flipped value of \(u_i\); in quantum coding terminology, we assumed that if \(u_i = |0\rangle\) then \(v_i = X u_i = |1\rangle\), where \(X\) is the bit-flip operator (we had also later remarked that it is this assumption that needs to be modified in the quantum case). In the quantum code, the assumed error model consists of both \(X\) and the phase-flip \(Z\) operators. Hence, at each erroneous position, \(v_i\) can now be any one of \(X u_i\), \(Z u_i\) or \(XZ u_i\), corresponding to bit flip (\(X\)), phase flip (\(Z\)) and bit-phase flip (\(XZ\), or \(iY\)) errors respectively, giving three possibilities per position. With

⁷An alternate proof, for the case of binary codes, can be found in [24], which emphasizes the presence of three distinct error operators (Pauli matrices \(X\), \(Y\) and \(Z\)) for quantum codes as opposed to only one (the bit flip) for classical binary codes.

⁸We assume that these \(n\) systems are non-interacting.
⁹That is, we can have a bijection between a 2-ary quantum code and a 4-ary classical code.


this, we see that the quantity in (eq. 9.22) becomes \(\sum_{m=0}^{t} \binom{n}{m} 3^m\). Continuing the same procedure as in (sec. 9.3.4) now gives the Hamming bound for quantum codes, as described in (eq. 10.15).

10.6 The Quantum Gilbert Varshamov Bound [2]

Just as in the classical case (sec. 9.3.4), consider the quantum Hamming bound for perfect codes¹⁰:

\[
2^k \sum_{i=0}^{t} \binom{n}{i} 3^i = 2^n
\]

Extending the sum up to \(2t\) only adds positive terms to the LHS, giving the inequality:

\[
2^k \sum_{i=0}^{2t} \binom{n}{i} 3^i \geq 2^n
\]

As per the notation defined in (eq. 9.23), we see that \(\sum_{i=0}^{2t} \binom{n}{i} 3^i = \sum_{i=0}^{2t} \binom{n}{i} (4-1)^i = V_4(n, d-1)\), where we have used the fact that for perfect codes \(2t + 1 = d\). Hence:

\[
2^k \geq \frac{2^n}{V_4(n, d-1)} \tag{10.16}
\]

Thereby we have shown the quantum Gilbert-Varshamov bound. Similar to the classical case, this bound just states that there exists some (in fact, at least one) perfect quantum code, which is not of much use, as we need a statement assuring that there are a significant number of such quantum codes. We will first show that these codes have a rate that is very close to 1, thereby making them "perfect". Also notice that the quantum Gilbert-Varshamov bound (eq. 10.16) for codes over C² is the same as the classical Gilbert-Varshamov bound for codes over F₄. Hence, using the general q-ary analysis as in (sec. 9.3.4), we can construct the asymptotic bound for the quantum case. Before that, we note a very important point that distinguishes the quantum case from the classical case, based upon the error models in the classical case (where we only deal with bit flip errors) and the quantum case (where we deal with both bit as well as phase flip errors, caused by the Pauli operators X and Z respectively). While showing the asymptotic form in the classical case, we assumed a binary symmetric channel that causes an error with probability p and leaves the bit unchanged with probability 1 − p, making the average number of errors np. In the quantum case, we assume a similar channel which causes a bit flip error with probability p₁ and a phase flip error with probability p₂; moreover, we assume the channel to be symmetric, taking p₁ = p₂ = p. Hence the error probability per qubit becomes 2p, making the average number of errors 2np. Extending (eq. 9.26) to the quantum case (q = 4 and p → 2p), we have:

V_4(n, 2np) ≤ 4^{h_4(2p)n}   (10.17)

Using the above equation along with (eq. 10.16), we obtain:

2^k ≥ 2^n / V_4(n, 2np) ≥ 2^n / 2^{2h_4(2p)n} = 2^{n − 2h_4(2p)n}

∴ k ≥ n − 2h_4(2p)n
k/n ≥ 1 − 2h_4(2p)

Using (eq. 9.24) with q = 4: R ≥ 1 − 2[2p log_4 3 − 2p log_4(2p) − (1 − 2p) log_4(1 − 2p)]
= 1 − 2p log 3 − [−2p log 2p − (1 − 2p) log(1 − 2p)]   (10.18)
Using (eq. 9.24) with q = 2: R ≥ 1 − 2p log 3 − h_2(2p)   (10.19)

¹⁰Given as an exercise in [24]
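As a numerical illustration (not part of the original notes), the bound (eq. 10.19) can be evaluated directly; `h2` and `qgv_rate` are our own helper names, and all logarithms are base 2, as in the text:

```python
import math

def h2(x):
    """Binary entropy: h2(x) = -x log2(x) - (1-x) log2(1-x)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def qgv_rate(p):
    """Quantum Gilbert-Varshamov lower bound (eq. 10.19) on the rate:
    R >= 1 - 2p log2(3) - h2(2p)."""
    return 1.0 - 2 * p * math.log2(3) - h2(2 * p)

for p in (0.0, 0.01, 0.05, 0.09):
    print(f"p = {p:.2f}  ->  R >= {qgv_rate(p):.4f}")
```

At p = 0 the bound gives R ≥ 1, and it decreases as the channel gets noisier, as expected.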


The error probability is nothing but (# errors)/(# total qubits) = t/n; here is where we also use the fact that the underlying quantum error correcting code can correct t errors. Now we are left with the task of showing that this bound is satisfied a significant (more precisely, an exponential) number of times, by showing that the above bound is violated with inverse-exponential probability. For this we can exactly follow the steps in (sec. 9.3.4). Hence we see that there are an exponential number of perfect quantum codes too.

10.7 Examples

10.7.1 Bit Flip code [24]

Just as in the case of a classical channel, a quantum channel too can cause a single bit flip (we ignore multiple bit flips on the same codeword, based on the assumption that they occur with negligible probability). A bit flip error on an [[n, k, d]]₂ quantum codeword |ψ⟩ = \sum_{i=1}^{2^k} α_i |i⟩ affects each of the 2^k basis vectors |i⟩. Each basis vector can be considered as an (n, k, d)₂ classical codeword. Upon transmission of |ψ⟩, each |i⟩ undergoes a bit flip, i.e. one of the n components of |i⟩ is flipped. It suffices to detect and correct errors on each |i⟩ independently. Hence we need to consider a repetition code where each component¹¹ of |i⟩ is repeated (again, the number of repetitions depends upon the probability of a bit flip). Suppose each component is repeated m times; then |i⟩ becomes a vector in a 2^{mn} dimensional space. Consider the encoding map for a particular case, k = 2, where the number of repetitions is m = 3:

|00⟩ → |000000⟩ & |01⟩ → |000111⟩   (10.20a)
|10⟩ → |111000⟩ & |11⟩ → |111111⟩   (10.20b)

The error correction procedure, for some state |ψ⟩ = α|10⟩ + β|11⟩, can be summarised as:

encode: α|10⟩ + β|11⟩ → α|111000⟩ + β|111111⟩
transmission, causing a single bit flip:
Transmitted: α|111000⟩ + β|111111⟩. Received: α|111010⟩ + β|111101⟩.
By majority vote (within each block of three), the 5th bit has flipped.
correct: flip the fifth bit in each term to get α|111000⟩ + β|111111⟩
decode: α|10⟩ + β|11⟩

The encoding circuit is given for a simpler case, where k = 1 (a spin-half system) and m = 3. Note that the minimum value of m is 3 since, by the distance bound, a 1-error-correcting code must have a minimum distance of 3.

¹¹Notice that the key concept of the repetition code is to repeat (create copies of) the entity which is affected entirely by the error. In this case, this entity is a component of the basis vector, and not the basis vector itself, since we are not assuming multiple qubit errors. Unless all the components of the basis vector are affected simultaneously, the basis vector would not be entirely affected.


Figure 10.1: Figure showing the error correction of a single bit flip error on a three-qubit state. The portion of the circuit before the single qubit error (transmission) is the encoding part. The decoding procedure is not explicitly shown; it is done merely by ignoring two of the qubits and keeping only one.
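The majority-vote correction sketched above can be simulated directly. The following is a minimal sketch (our own construction, not the author's circuit): it encodes α|0⟩ + β|1⟩ into three qubits, applies a bit flip, measures the two Z-parities and flips the offending qubit:

```python
import numpy as np

# Single-qubit basis states and the bit-flip operator X.
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
I = np.eye(2)

def kron_all(ops):
    """Tensor product of a list of single-qubit operators/states."""
    out = ops[0]
    for op in ops[1:]:
        out = np.kron(out, op)
    return out

def encode(alpha, beta):
    """alpha|0> + beta|1>  ->  alpha|000> + beta|111>."""
    return alpha * kron_all([ket0] * 3) + beta * kron_all([ket1] * 3)

def apply_flip(state, j):
    """Bit flip on qubit j (0-indexed) of a 3-qubit state."""
    ops = [I, I, I]
    ops[j] = X
    return kron_all(ops) @ state

def correct(state):
    """Measure the parities Z0Z1 and Z1Z2 and flip the offending qubit."""
    Z = np.diag([1.0, -1.0])
    s01 = state @ kron_all([Z, Z, I]) @ state   # <psi| Z0 Z1 |psi>
    s12 = state @ kron_all([I, Z, Z]) @ state   # <psi| Z1 Z2 |psi>
    if s01 < 0 and s12 > 0:
        return apply_flip(state, 0)
    if s01 < 0 and s12 < 0:
        return apply_flip(state, 1)
    if s01 > 0 and s12 < 0:
        return apply_flip(state, 2)
    return state

psi = encode(0.6, 0.8)
for j in range(3):
    print(j, np.allclose(correct(apply_flip(psi, j)), psi))
```

Any single bit flip is recovered exactly, which is the content of the worked example above.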

Error Correction condition for the repetition code

¹² We now establish the error correction condition of (sec. 10.2.3) explicitly for the case of the repetition code, thereby showing that single qubit errors can be corrected by the repetition code. The repetition code is given by |ψ⟩ = γ|0⟩ + η|1⟩ ↦ |ψ_L⟩ = γ|000⟩ + η|111⟩, and hence the projector onto the code subspace, P in (eq. 10.7), would be:

P ∼ |ψ_L⟩⟨ψ_L| = |γ|² |000⟩⟨000| + ηγ̄ |111⟩⟨000| + γη̄ |000⟩⟨111| + |η|² |111⟩⟨111|

The error operators here are single qubit errors, the Pauli operators {σ^{(m)} | m ∈ {X, Y, Z}} (we use the notation σ^{(m)} to denote the Pauli matrix labelled m, and a subscript to denote the qubit on which it acts). Hence the quantity E_j† E_k ≡ σ_j^{(m)} σ_k^{(p)}. We now need to show that (eq. 10.7) is satisfied for the repetition code (with the above substitutions). Hence the error correction condition becomes: |ψ_L⟩⟨ψ_L| σ_j^{(m)} σ_k^{(p)} |ψ_L⟩⟨ψ_L| = γ_{jk} |ψ_L⟩⟨ψ_L|. It now suffices to show that ⟨ψ_L| σ_j^{(m)} σ_k^{(p)} |ψ_L⟩ = γ_{jk} (where γ_{jk} can also be 0). Notice that ⟨ψ_L| σ_j^{(m)} σ_k^{(p)} |ψ_L⟩ would be (ignoring the constants γ and η): (⟨000| + ⟨111|) σ_j^{(m)} σ_k^{(p)} (|000⟩ + |111⟩). Notice that¹³:

¹²Given as an exercise in [24]

σ_j^{(m)} σ_k^{(p)} |000⟩ = (|1⟩_j δ_{m,X} + |1⟩_j δ_{m,Y} + |0⟩_j δ_{m,Z}) ⊗ (|1⟩_k δ_{p,X} + |1⟩_k δ_{p,Y} + |0⟩_k δ_{p,Z})   (suppressing phases ±1, ±i)

∴ ⟨000| σ_j^{(m)} σ_k^{(p)} |000⟩ = δ_{m,Z} δ_{p,Z}
Similarly, ⟨111| σ_j^{(m)} σ_k^{(p)} |000⟩ = (δ_{m,X} + δ_{m,Y}) (δ_{p,X} + δ_{p,Y})

σ_j^{(m)} σ_k^{(p)} |111⟩ = (|0⟩_j δ_{m,X} + |0⟩_j δ_{m,Y} − |1⟩_j δ_{m,Z}) ⊗ (|0⟩_k δ_{p,X} + |0⟩_k δ_{p,Y} − |1⟩_k δ_{p,Z})

∴ ⟨000| σ_j^{(m)} σ_k^{(p)} |111⟩ = (δ_{m,X} + δ_{m,Y}) (δ_{p,X} + δ_{p,Y})
Similarly, ⟨111| σ_j^{(m)} σ_k^{(p)} |111⟩ = δ_{m,Z} δ_{p,Z}

∴ Summarising: ⟨ψ_L| σ_j^{(m)} σ_k^{(p)} |ψ_L⟩ = (|γ|² + |η|²) δ_{m,Z} δ_{p,Z} + (ηγ̄ + γη̄) (δ_{m,X} + δ_{m,Y}) (δ_{p,X} + δ_{p,Y})

Finally, P σ_j^{(m)} σ_k^{(p)} P = [δ_{m,Z} δ_{p,Z} + (ηγ̄ + γη̄) (δ_{m,X} + δ_{m,Y}) (δ_{p,X} + δ_{p,Y})] P

Hence we have shown that the error correction condition holds; moreover, all the constants are real, so they can form the entries of a hermitian matrix, as the error correction condition demands.
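As a numerical cross-check of the error correction condition (eq. 10.7), restricted to the error set the bit flip code is actually designed for, {I, X_0, X_1, X_2}, one can verify that P E_j† E_k P is always a scalar multiple of P. This is a sketch with our own helper names, not the author's derivation:

```python
import numpy as np
from itertools import product

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def op_on(qubit_op, j, n=3):
    """Embed a single-qubit operator at position j of n qubits."""
    ops = [I2] * n
    ops[j] = qubit_op
    out = ops[0]
    for o in ops[1:]:
        out = np.kron(out, o)
    return out

# Code space of the 3-qubit bit flip code: span{|000>, |111>}.
ket000 = np.zeros(8); ket000[0] = 1.0
ket111 = np.zeros(8); ket111[7] = 1.0
P = np.outer(ket000, ket000) + np.outer(ket111, ket111)

# Error set: identity and single bit flips.
errors = [np.eye(8)] + [op_on(X, j) for j in range(3)]

for Ej, Ek in product(errors, repeat=2):
    M = P @ Ej.conj().T @ Ek @ P
    # Condition (10.7): M must equal c * P for some scalar c.
    c = np.trace(M) / np.trace(P)
    assert np.allclose(M, c * P)
print("error correction condition holds for {I, X0, X1, X2}")
```

For j ≠ k the product X_j X_k maps the code space to an orthogonal space, so the constant is 0; for j = k it is 1, consistent with the hermitian matrix of constants demanded by the condition.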

10.7.2 Phase flip Errors [24]

It must be noted that phase errors do not affect each basis vector independently, unlike the bit flip error, and therefore they have no classical counterpart. This error only affects¹⁴ superposition states (which, again, have no classical counterpart). Under this error, |0⟩ + |1⟩ → |0⟩ − |1⟩.

But we have the property of the Hadamard gate H, which can transform a Z (phase flip) error into an X (bit flip) error. We now consider the code in the X eigenbasis, by applying the H operator to each qubit of the codeword, such that phase flip errors turn into bit flip errors, and then use the same error correction procedure as before. Now the encoding circuit would consist of H gates on each qubit just before it is sent for transmission (into the single qubit error channel). The encoding map would therefore be:

|0⟩ → |000⟩ → |+++⟩
|1⟩ → |111⟩ → |−−−⟩
∴ |ψ⟩ = α|0⟩ + β|1⟩ → α|+++⟩ + β|−−−⟩

To bring the code back to the Z eigenbasis (at the receiving end, just before decoding), we need to apply H† (notice that a property of H is H† = H). Summarising the above error correction procedure, we have the following error correction circuit.
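The property of H used here, namely that conjugation by H exchanges Z and X, is easy to check numerically (an illustration added here, not part of the original text):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

# Conjugating by H exchanges bit flips and phase flips:
# H Z H = X  and  H X H = Z  (and H is its own inverse, H H = I).
print(np.allclose(H @ Z @ H, X))
print(np.allclose(H @ X @ H, Z))
print(np.allclose(H @ H, np.eye(2)))
```

So a channel applying Z to a qubit acts as a bit flip on the H-rotated code, which is exactly why the bit flip correction procedure carries over.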

¹³Note: We are now using the convention that |1⟩_t (resp. |0⟩_t) denotes the t-th qubit being in the state |1⟩ (resp. |0⟩), the remaining qubits being left unchanged.

¹⁴Truly speaking, it also affects states that are not in a superposition, e.g. under this error |1⟩ → −|1⟩. But there is just an overall phase difference (which can be ignored, as we consider all states equivalent up to an arbitrary global phase) between the original and the erroneous state.


Figure 10.2: Figure showing a quantum circuit for the correction of a single phase error in a three-qubit quantum code. Note that this circuit is similar to the one for correcting bit flip errors, except in the encoding and decoding portions.

10.7.3 Bit-Phase flip errors - Shor Code [24][23]

We now consider the possibility of both bit and phase flip errors occurring simultaneously¹⁵. Taking the action of such an error on some codeword (k = 1), we get: α|0⟩ + β|1⟩ → α|1⟩ − β|0⟩. We can generalise this to higher dimensional codewords too, keeping in mind that only one qubit will be affected by both these errors. Hence, under this error: (α|0⟩ + β|1⟩)^{⊗l} → (α|0⟩ + β|1⟩)^{⊗(r−1)} ⊗ (α|1⟩ − β|0⟩) ⊗ (α|0⟩ + β|1⟩)^{⊗(l−r)}, where the error has occurred on the r-th qubit.

To correct this error, we must employ both of the above techniques. We first construct a repetition code as in (eq. 10.20). Then we take each qubit in this code to the X basis using the H gate, to correct the phase flip errors¹⁶.

¹⁵Notice that these two errors are independent. Hence, if the probabilities of bit and phase flip errors are (up to linear order) ℘_b and ℘_f respectively, then the probability of both occurring together is ℘_b℘_f, which is of second order (and hence still small).

The encoding map is therefore given by:

|0⟩ −→ |0_L⟩ = ((|000⟩ + |111⟩)/√2) ⊗ ((|000⟩ + |111⟩)/√2) ⊗ ((|000⟩ + |111⟩)/√2)   (10.21a)

⇒ |0_L⟩ = (|000000000⟩ + |000000111⟩ + |000111000⟩ + |000111111⟩ + |111000000⟩ + |111000111⟩ + |111111000⟩ + |111111111⟩)/√8

|1⟩ −→ |1_L⟩ = ((|000⟩ − |111⟩)/√2) ⊗ ((|000⟩ − |111⟩)/√2) ⊗ ((|000⟩ − |111⟩)/√2)   (10.21b)

⇒ |1_L⟩ = (|000000000⟩ − |000000111⟩ − |000111000⟩ + |000111111⟩ − |111000000⟩ + |111000111⟩ + |111111000⟩ − |111111111⟩)/√8

With this, we see that |ψ⟩ = α|0⟩ + β|1⟩ transforms as |ψ⟩ → |ψ_L⟩ = α|0_L⟩ + β|1_L⟩:

|ψ_L⟩ = α (|000000000⟩ + |000000111⟩ + |000111000⟩ + |000111111⟩ + |111000000⟩ + |111000111⟩ + |111111000⟩ + |111111111⟩)/√8
+ β (|000000000⟩ − |000000111⟩ − |000111000⟩ + |000111111⟩ − |111000000⟩ + |111000111⟩ + |111111000⟩ − |111111111⟩)/√8   (10.22)

We now see the effect of phase and bit flip errors on this code. Starting with the bit flip first, notice that if there is a bit flip error on the j-th qubit, then in the block (of 3 qubits) where qubit j is present, the parity of the qubits would differ (this parity check is done using CNOT gates). That is, if qubit j is present in the block with indices l, l + 1, l + 2, then the signs of ⟨ψ|Z_l Z_{l+1}|ψ⟩ (and/or ⟨ψ|Z_{l+1} Z_{l+2}|ψ⟩) would differ from the rest. Depending on which one(s) differ(s), we apply the X operator at the right position¹⁷.

Coming to the phase flip errors: as seen earlier, we apply the H gate to every qubit to get the codeword in terms of the X eigenvectors. Now the same process as for the bit flip error is repeated¹⁸.

Summarising, the encoding circuit for correcting arbitrary phase and bit errors on a single qubit is given by:

¹⁶Each time, we take three repetitions of the basis vector because, by the distance property, to correct a single error the distance of the code must be at least 3. Therefore, by the Singleton bound, we must have n − k at least d − 1 = 2. Before repetition, n = k = 1, and after repetition we satisfy the condition with n = 3 (∴ n − k = 2).

¹⁷To state the actions performed based on the particular sign difference, we have:

Table 10.1: Table showing the various measurement results for bit flip error location.
⟨ψ|Z_l Z_{l+1}|ψ⟩   ⟨ψ|Z_{l+1} Z_{l+2}|ψ⟩   Flipped qubit position   Correction action: apply
+                   +                        No error                 I^⊗n
+                   −                        l + 2                    X_{l+2}
−                   +                        l                        X_l
−                   −                        l + 1                    X_{l+1}

¹⁸We note an interesting property: both |0_L⟩ and |1_L⟩ are eigenvectors of the operators X^⊗6 ⊗ I^⊗3, I^⊗3 ⊗ X^⊗6 and I^⊗3 ⊗ X^⊗3 ⊗ I^⊗3. So, when a phase flip occurs, by measuring the sign differences in the expectation values of the above operators, we can detect the location of the phase flip error.


Figure 10.3: Circuit showing the correction of single qubit bit and phase flip errors. In this case, after transmission across an erroneous channel, the bit and phase flip error is taken to affect the first qubit.


Chapter 11

Stabilizer Codes

11.1 Pauli Group

The Pauli group contains the set of Pauli matrices¹. It is a matrix group acting on C² (a qubit), denoted by (Π, ·), and contains the set of operators:

Π = {±I₂, ±iI₂, ±σ_X, ±iσ_X, ±σ_Y, ±iσ_Y, ±σ_Z, ±iσ_Z}   (11.1)

The Pauli group is the group generated by the Pauli matrices: (Π, ·) = ⟨G⟩, where G = {σ_X, σ_Y, σ_Z}. The commutation relations between the Pauli matrices are²:

[σ_Y, σ_Z] = 2iσ_X   (11.2a)
[σ_Z, σ_X] = 2iσ_Y   (11.2b)
[σ_X, σ_Y] = 2iσ_Z   (11.2c)
In short: [σ_i, σ_j] = 2i ε_{i,j,k} σ_k, for i, j, k ∈ {X, Y, Z}   (11.2d)

where ε_{i,j,k} is the Levi-Civita tensor, such that ε_{i,j,k} = 0 if any two indices are equal, +1 if (i, j, k) is an even permutation of (X, Y, Z), and −1 if it is an odd permutation.

We note some properties of the pauli group:

1. Any two elements of the Pauli group either commute or anticommute³. Hence ∀g_i, g_j ∈ (Π, ·): [g_i, g_j] = 0 or {g_i, g_j} = 0.
Justification: It suffices to verify this for every pair of elements in (Π, ·). Note that, in general, the identity (and hence ±I, ±iI) commutes with every operator, and all operators commute with themselves. Hence we only need to check that [g_i, g_j] = 0 or {g_i, g_j} = 0 for all g_i, g_j ∈ (Π, ·) with g_i ≠ g_j. From the properties of the Pauli matrices we see that this holds. As a consequence, ∀A, B ∈ (Π, ·) we have: A · B = B · A or A · B = −B · A.

2. All elements of the Pauli group square to ±I: g · g = ±I ∀g ∈ (Π, ·).
Justification: We can verify this for the individual elements of the Pauli group, keeping in mind that the Pauli matrices themselves have the property A · A = I ∀A ∈ {σ_X, σ_Y, σ_Z}, while the prefactors ±1, ±i square to +1 or −1.

¹From now on, the Pauli matrices refer to the four matrices:

σ_X = (0 1; 1 0),  σ_Y = (0 −i; i 0),  σ_Z = (1 0; 0 −1),  I₂ = (1 0; 0 1)

Also, we set ℏ = 1.

²The commutator of two matrices is defined as: [X, Y] = X · Y − Y · X, where '·' stands for the matrix multiplication operation.

³The anticommutator of two matrices is defined as: {X, Y} = X · Y + Y · X.


3. All elements of the Pauli group are either hermitian or anti-hermitian, i.e. E† = ±E, ∀E ∈ (Π, ·). This can be verified explicitly for each element of the Pauli group, keeping in mind that the Pauli matrices themselves are hermitian (the prefactors ±i make the corresponding elements anti-hermitian). Since in every case E†E = I, the elements are unitary too.

4. All the above properties of the Pauli group hold iff they hold for the generating set of the Pauli group, which is {σ_X, σ_Y, σ_Z}.
Justification: We can justify the contrapositive by showing that if any property does not hold for the generating set, then it does not hold for the Pauli group either. Notice that any element g ∈ (Π, ·) can be expressed as a finite string of elements of G connected by '·', say g = h₁ · h₂ · · · h_n with h₁, . . . , h_n ∈ G. It suffices to show that the above group properties fail to hold for this string if they fail to hold for G.

• Assumption: If ∃h_i such that h_i · h_i ≠ I, then g · g ≠ I for some g ∈ (Π, ·).
Justification: We have g · g = (h₁ · h₂ · · · h_n) · (h₁ · h₂ · · · h_n). Notice that, since h_i either commutes or anticommutes with each of h₁, h₂, . . . , h_n (this is a property of the Pauli matrices and can be verified easily), we can exchange the positions of h_{i−1} and h_i by introducing a + sign, in case h_i commutes with h_{i−1}, or a − sign otherwise. After this, we can exchange h_{i−2} and h_i, again by introducing either a + or a − sign. Doing this inductively, we can write g = ±h_i · h₁ · h₂ · · · h_n (with h_i removed from the remaining string), and similarly g = ±h₁ · h₂ · · · h_n · h_i. Putting these expressions for g into the product g · g, we get:

g · g = ±(h₁ · h₂ · · · h_n · h_i) · (h_i · h₁ · h₂ · · · h_n)
= ±(h₁ · h₂ · · · h_n) · (h_i · h_i) · (h₁ · h₂ · · · h_n)   (11.3)

Now suppose h_i · h_i = h; we need to show that the only solution to g · g = I is h = I. Notice that h ∈ (Π, ·), since it is generated by the elements of G. Hence h would commute or anticommute with every element of (Π, ·). We can now apply a similar technique to (eq. 11.3) to get:

g · g = ±h · (h₁ · h₂ · · · h_n) · (h₁ · h₂ · · · h_n)

The same exercise of pulling h_i out (as performed in (eq. 11.3)) can be applied to every h_j in the expression for g, starting with h₁. Also, without loss of generality, we may assume that h_j · h_j = I ∀j ≠ i. Hence we will have:

g · g = ±h · (h₂ · h₃ · · · h_n) · (h₁ · h₁) · (h₂ · · · h_n)
= ±h · (h₃ · h₄ · · · h_n) · (h₂ · h₂) · (h₃ · · · h_n)

and so on, until we finally get:

g · g = ±h

Note that if we demand g · g = I, the only solution to this equation forces h = I. That is, only if the generators square to I, so will the elements of the group. Hence we have the justification.

11.2 Motivation for Stabilizer codes

This stabilizer subgroup concept leads to an important class of error correcting codes. We see that errors on a single qubit (with state |ψ⟩ ∈ H₂) are caused by operators acting on this space. Since the Pauli matrices σ_X, σ_Y, σ_Z and I form a basis for the vector space of all (2 × 2) matrices, all single qubit error operators can be expressed as linear combinations of these matrices. Moreover, these matrices generate the Pauli group Π under the matrix multiplication operation. From the closure law for (Π, ·), we see that all single qubit errors can be seen as actions (in the sense of a group action on a set) of (Π, ·) on the space H₂ (which is the qubit). Now we consider a stabilizer subgroup of this Pauli group for a particular qubit q with state |ψ⟩ ∈ H₂, denoted S_{|ψ⟩}(Π, ·). By the definition of the stabilizer subgroup, as in (eq. 3.41), we see that error operators in this subgroup show an important characteristic: they leave the qubit invariant, or equivalently, leave its state unchanged.

σ · |ψ⟩ = |ψ⟩ ∀σ ∈ S_{|ψ⟩}(Π, ·)   (11.4)


In other words, |ψ⟩ is in the common eigenspace (corresponding to eigenvalue 1) of all σ ∈ S_{|ψ⟩}(Π, ·), which can be denoted by C_S. This eigenspace C_S is called a quantum stabilizer code over a qubit.

11.3 Conditions on stabilizer subgroups

11.3.1 Generating set of the stabilizer subgroup

Theorem: An [[n, k]] stabilizer code can be generated using a generating set containing n − k independentcommuting generators.

Proof: Consider the generating set of the stabilizer subgroup S (of the Pauli group (Π, ·)). Notice that an [[n, k]] stabilizer code is a subspace of C^{2^n} having 2^k basis vectors. To generate this code, we need to map the 2^n basis vectors of C^{2^n} to the 2^k basis vectors of C_S. This mapping is done using the stabilizers generated by the generating set.
We now provide a proof of the above theorem by constructing the codespace C_S of the maximum possible dimension, each time adding one more element to the generating set. We stop the construction when the dimension of C_S reaches 2^k. To start with, there is only one generator (which, by the requirement of the group structure, must be) I, and the stabilizer subgroup S₁ has one element: I^⊗n. This element maps each of the 2^n basis vectors to themselves. Hence the codespace formed is 2^n dimensional.

Assumption: Each new addition to the generating set reduces the dimension of C_S by a factor of 2.

Justification: We now provide a justification by induction on the number of generators being added:

• Base case: Suppose the generating set contains only I and we add one more element to it, say σ_{i₁}. The stabilizer subgroup S₂ would now consist of operators of the form {I^⊗(n−m) ⊗ σ_{i₁}^⊗m | 0 ≤ m ≤ n}. Notice that all these elements have the 2^n basis vectors (in C^{2^n}) as their eigenvectors, but only 2^{n−1} of them correspond to eigenvalue +1 (these are the elements of S₂ for which m is even). Hence, after this addition of a generator, we see that |C_{S₂}| = 2^{n−1}.

• Induction step: After q steps, |C_{S_q}| = 2^{n−q}. It now suffices to show that after q + 1 steps, |C_{S_{q+1}}| = 2^{n−q−1}. After q steps, suppose S_q = {S_i} (each S_i is a stabilizer containing n tensor factors). Let the addition to the generating set at the (q + 1)-th step be σ_{i_{q+1}}. Hence S_{q+1} = {S_i^{⊗(n−m)} ⊗ σ_{i_{q+1}}^{⊗m} | 0 ≤ m ≤ n, S_i ∈ S_q}. Notice that all S_i ∈ S_q have eigenvalue +1. Hence operators with eigenvalue −1 in S_{q+1} can only come from σ_{i_{q+1}}^{⊗m}, in other words, only from odd values of m. But we know that exactly half of the values of m are odd. As a result, only half of the set S_q would be used in constructing the valid stabilizers for S_{q+1}. Hence the stabilizer code C_{S_{q+1}} would have only half as many elements as C_{S_q}. As C_{S_q} has 2^{n−q} elements, we see that |C_{S_{q+1}}| = 2^{n−q−1}, thereby justifying our assumption.

As an immediate consequence of the above assumption, we see that if we add n − k generators, then the resulting codespace is such that |C_{S_{n−k}}| = 2^k. Hence n − k independent, commuting⁴ generators are needed for an [[n, k]] stabilizer code, thereby proving the theorem.

⁴We make use of the fact that the generators all commute when we assume that S_{q+1} = {S_i^{⊗(n−m)} ⊗ σ_{i_{q+1}}^{⊗m} | 0 ≤ m ≤ n, S_i ∈ S_q} for any q. That is, we consider all permutations of S_i^{⊗(n−m)} ⊗ σ_{i_{q+1}}^{⊗m} to represent the same element, because each of them can be rearranged (since the factors commute) to obtain the other. The independence of the generating set is used when we assume that a new stabilizer subgroup can be constructed by adding another generator.


11.3.2 Structure of the stabilizer subgroup

We say that a code is trivial⁵ if it contains only the null vector. In general, we want code spaces that are non-trivial. In order to have a non-trivial stabilizer code corresponding to this subgroup, there must be at least one vector |ψ⟩, other than the null vector, that is left invariant by all the elements of S. Hence the subspace common to the eigenspaces of the elements of S must have more than one vector. From (eq. 3.41) we see that the elements of S must commute with each other: ∀g_i, g_j ∈ S, [g_i, g_j] = 0. This means that S is an abelian group. Also, by the property of a general stabilizer group, we see that −I ∉ S, since if −I ∈ S, then from (eq. 11.4) we have −I|ψ⟩ = |ψ⟩, whose only solution for |ψ⟩ is the null vector, implying that the stabilizer code is trivial. Hence we have the following theorem:

Theorem: Any stabilizer subgroup of the pauli group corresponding to a non-trivial stabilizer code, is abelian.

Proof: Let M, N be two operators in the stabilizer subgroup S of the Pauli group (Π, ·), and let C_S be the stabilizer code corresponding to S. As M and N are arbitrary, it suffices to show that [M, N] = 0. We provide a justification by showing that [M, N] ≠ 0 (which implies, from statement (stat. 1) of section (sec. 11.1), that {M, N} = 0) leads to a contradiction with the fact that C_S is non-trivial. Since M and N are elements of a group, by the closure law, so are M · N and N · M. As elements of a stabilizer subgroup (from (eq. 11.4)), they satisfy |ψ⟩ = M · N |ψ⟩. Since M and N anticommute, M · N = −N · M. Hence we see that:

|ψ⟩ = M · N |ψ⟩ = −N · M |ψ⟩ = −|ψ⟩

which gives |ψ⟩ = −|ψ⟩, the only solution to which is |ψ⟩ being the null vector, which in turn implies that C_S is trivial, thereby bringing a contradiction. Hence, for C_S to be non-trivial, we must have [M, N] = 0, and so S must be abelian, thereby proving the theorem.

11.4 Error Correction for Stabilizer codes

11.4.1 Notion of an Error in a Stabilizer code

An error in a code is represented as an operator acting on every code-vector in that code. As every error operator is an element of the Pauli group, it either commutes or anticommutes with every other error operator. Hence, between every error operator and every element of any stabilizer subgroup of the Pauli group, there is either a commutation or an anticommutation relation. We have seen in (stat. 4) of (sec. 11.1) that, in order to verify the commutation relations for the elements of the Pauli group, it suffices to verify the same with the generators. Therefore, every error operator either commutes or anticommutes with every generator. Depending upon whether it commutes or anticommutes, we define the error syndrome β_l of a code as: β_l g_l = E · g_l · E†, where E ∈ Πn and g_l is a generator of S.

11.4.2 Measurement on the stabilizer code

Given any operator A in (Π, ·), it would do one of the following:

• commute with every generator: A ∈ Z(S)

• anticommute with at least one generator: A ∈ Πn − Z(S)

5It is in the same sense as in a trivial subgroup of a group, which is the subgroup containing just the identity. For a vectorspace, the identity element is nothing but the null vector.


Let the generating set of the stabilizer subgroup be {g₁, g₂, . . . , g, . . . , g_{n−k}}. To measure an observable⁶ A ∈ Z(S), we need to consider a projective measurement. Using the spectral decomposition of A, we have A = \sum_m λ_m P_m, where the sum is over all eigenvalues λ_m, with corresponding projection operators P_m. Since any element of the Pauli group has eigenvalues ±1 (let the corresponding projectors be labelled P_±), we see that A = P_+ − P_−. The results of the measurement are also ±1. As the eigenstates of A form a complete set, we have the identity P_+ + P_− = I. Combining these two expressions, we get the projection operators P_± corresponding to the measurement results ±1 in terms of A:

A = P_+ − P_− = I − 2P_− ⇒ P_− = (I − A)/2

and similarly, A = 2P_+ − I ⇒ P_+ = (I + A)/2. Hence measuring A is the projective measurement using P_± = (I ± A)/2. We now find the probability of getting the results ±1. Let |ψ⟩ be an element of the stabilizer code C_S. Then the probabilities of getting the results ±1 are:

p(±1) = tr(P_± |ψ⟩⟨ψ|)   (11.5)

Since |ψ⟩ ∈ C_S, we see that every generator g satisfies g|ψ⟩ = |ψ⟩. Note that, until now, we have not assumed anything about the commutation or anticommutation relations between A and the generators, which is summarised in the two points at the beginning of this section.
Let us consider the latter possibility first, where there exists a generator g such that {g, A} = 0. Calculating the measurement outcome probabilities, we get:

p(+1) = tr(P_+ |ψ⟩⟨ψ|)
= tr((I + A)/2 |ψ⟩⟨ψ|)
= ⟨ψ| (I + A)/2 · g |ψ⟩
= ⟨ψ| g · (I − A)/2 |ψ⟩
= ⟨ψ| (I − A)/2 |ψ⟩
Using (eq. 11.5): = p(−1)

Using the conservation of probability, p(+1) + p(−1) = 1, we immediately conclude: p(+1) = p(−1) = 1/2.
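The conclusion p(+1) = p(−1) = 1/2 can be checked on a small example (our own construction, not from the notes): take |ψ⟩ = (|000⟩ + |111⟩)/√2, which is stabilized by g = ZZI, and measure A = X ⊗ I ⊗ I, which anticommutes with g:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)

# |psi> = (|000> + |111>)/sqrt(2), a +1 eigenvector of g = Z Z I.
psi = np.zeros(8)
psi[0] = psi[7] = 1 / np.sqrt(2)

A = kron3(X, I2, I2)   # observable to measure
g = kron3(Z, Z, I2)    # a generator that anticommutes with A

# Projectors P_+/- = (I +/- A)/2 and the outcome probabilities (eq. 11.5).
P_plus = (np.eye(8) + A) / 2
P_minus = (np.eye(8) - A) / 2
rho = np.outer(psi, psi)
print(np.trace(P_plus @ rho).real, np.trace(P_minus @ rho).real)
```

Both outcomes occur with probability 1/2, exactly as the anticommutation argument predicts.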

Consider now the former possibility (from the list at the beginning of this section): the probabilities of getting +1 and −1 would then be different. Hence one can perform this measurement on an ensemble of states and reveal the state with high probability (by choosing the ensemble to be very large).

Let us now consider the post-measurement states corresponding to the measurement results ±1. The general post-measurement state is given by |ψ_±⟩ = (I ± A)/2 |ψ⟩. Notice that A fixes the state |ψ_+⟩:

|ψ_+⟩ = (I + A)/2 |ψ⟩
A|ψ_+⟩ = (A + I)/2 |ψ⟩ = |ψ_+⟩

⁶In the Pauli group, the elements are unitary, and the hermitian ones can be treated as physical observables.


and moreover with an eigenvalue of +1. Also note that, since A anticommutes with g, g no longer fixes the new state after the measurement, which implies (from the definition of the stabilizer) that g can no longer be a generator. (Note that, irrespective of whether A commutes or anticommutes with g, the elements g₁, g₂, . . . , g_{n−k} are still part of the stabilizer, since they commute with both g and A. As a result, the states fixed by A (and g) are fixed by the other generators as well.) Therefore the new stabilizer code will have P_+|ψ⟩ as the codewords and {g₁, g₂, . . . , A, . . . , g_{n−k}} as the new generating set of the stabilizer subgroup. Consider now the post-measurement state corresponding to the result −1, which is |ψ_−⟩. Notice that A does fix this state, but with an eigenvalue of −1, which means that |ψ_−⟩ cannot be part of the stabilizer code and A cannot be a generator. Since A anticommutes with g, neither A nor g can be in the generating set. Therefore, after the measurement, we have lost a codeword (upon which the measurement was performed) and a generator (which anticommutes with this measurement) of the stabilizer subgroup. This is certainly undesirable, and we need to correct this effect of the measurement. Observe that, since {g, A} = 0, we have:

g · P_− · g = g · (I − A)/2 · g = (I − g · A · g)/2 = (I + A)/2 = P_+

We have seen that A clearly fixes the state P_+|ψ⟩, which, as we see from the above, can be obtained by the conjugation of P_− with g. Hence, if we get a measurement result of −1, we must perform this conjugation action. Summarising: after the measurement of A on the stabilizer code C_S, the generating set⁷ of the stabilizer subgroup becomes {g₁, g₂, . . . , A, . . . , g_{n−k}}.

11.4.3 Error Correction condition for Stabilizer codes

As in the previous subsection (sec. 11.4.2), we see that an error operator E would either:

• commute with every generator: E ∈ Z(S), or

• anticommute with at least one generator: E ∈ Πn − Z(S)

of the stabilizer subgroup. Notice that a stabilizer code is defined to be the common eigenspace of all the generators (of the stabilizer subgroup). Consider the latter case: as the error operator does not commute with some generator, the common eigenspace of the generators together with the error operator is no longer the same as before (in other words, the code is affected by the error). Moreover, we get a new eigenspace (the erroneous code), orthogonal to the old one (the correct code). The key point here is that the resulting code is orthogonal to the old code, which means that the erroneous code can be distinguished from the original correct code. Hence the error can be detected.

Let us now take up the former case: following the same reasoning as before, if the error operator commutes with every element of the stabilizer group, then the erroneous code cannot be distinguished from the original code. This includes two distinct situations: E ∈ S (in which case E stabilizes the code, so the error has no effect on the codewords), and E ∉ S but [E, Si] = 0, ∀Si ∈ S, which implies E ∈ Z(S) − S. This latter case is not so obvious, and is dealt with in the theorem below. More precisely⁸:

Theorem: A set of errors G = {Ei} ⊆ Πn is correctable iff Ek† · Ej ∉ N(S) − S, ∀Ej, Ek ∈ G.

⁷But note that S̃ ≠ S. So after the measurement we no longer have the same stabilizer code.

⁸In some places, the theorem is stated using N(S) in place of Z(S). These two are equivalent.

Assumption: For any stabilizer subgroup S of the Pauli group (Πn, ·), the normalizer and centralizer of that subgroup (defined in (sec. 3.1.4)) are equal, i.e., N(S) = Z(S).


Proof: We already have the error correction condition for quantum codes in general, in (eq. 10.7). Since stabilizer codes are a subset of quantum codes, it suffices to show that the condition in (eq. 10.7) is satisfied in the above case too. Let P be the projector onto the stabilizer code CS. Notice that the non-trivial task is to prove the theorem for errors in Z(S) − S. For the other cases:

• Ei ∈ S: as the error leaves the code invariant,

Ei · P · Ei† = P ∀Ei ∈ S (11.6)

which is equivalent to the condition in (eq. 10.7) with Ek† replaced by I. Hence we have the proof.

• Ei ∈ Πn − Z(S): in this case Ei takes P to an orthogonal space P̃ = Ei · P · Ei†, and hence P · P̃ = 0, which implies P · Ei · P · Ei† = 0. Multiplying on the right by Ei (and using the fact that Ei is unitary, so Ei† · Ei = I), this can be rearranged to give:

P · Ei · P = 0 ∀Ei ∈ Πn − Z(S) (11.7)

which again is equivalent to (eq. 10.7) with Γi,j = 0. Hence we have the proof.

Consider the final case: errors in Z(S) − S cannot be distinguished, as the resulting codespaces are non-orthogonal to the original code. All such errors result in non-orthogonal codespaces. As a result, for any two errors Ej and Ek, the projectors Ej · P · Ej† and Ek · P · Ek† onto the resulting codespaces would be equivalent up to a constant (say λ). Hence Ej · P · Ej† = λ Ek · P · Ek†, ∀Ej, Ek ∈ Πn. Noting that the elements of the Pauli group are unitary:

Ej · P · Ej† = λ Ek · P · Ek†
Ek† · Ej · P · Ej† · Ek = λ P

The above equation has two possible solutions:

• Substituting E = Ek† · Ej and comparing the above equation with (eq. 11.6), we see that Ek† · Ej ∈ S.

• Putting Ek† · Ej · P · Ej† · Ek = 0, the equation is trivially satisfied (with λ = 0). On substituting E = Ek† · Ej and comparing with (eq. 11.7), we see that Ek† · Ej ∈ Πn − Z(S).

Hence we see that Ek† · Ej ∈ S ∪ (Πn − Z(S)). Now notice that Πn = S ∪ (Πn − Z(S)) ∪ (Z(S) − S). Hence we can say that Ek† · Ej ∉ Z(S) − S, ∀ correctable errors. Hence we have proved the theorem.
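The role of anticommutation in error detection can be illustrated on a small example. The sketch below assumes the 3-qubit bit-flip code, with stabilizer generators Z ⊗ Z ⊗ I and I ⊗ Z ⊗ Z (an example not taken from the text), and checks that every single-qubit X error anticommutes with at least one generator and is therefore detectable:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I = np.eye(2, dtype=complex)

def kron(*ops):
    """Tensor product of a list of single-qubit operators."""
    out = np.array([[1]], dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out

generators = [kron(Z, Z, I), kron(I, Z, Z)]   # stabilizer generators of the bit-flip code
errors = {"X1": kron(X, I, I), "X2": kron(I, X, I), "X3": kron(I, I, X)}

for name, E in errors.items():
    # E is detectable iff it anticommutes with at least one generator
    detectable = any(np.allclose(g @ E, -(E @ g)) for g in generators)
    print(name, "detectable:", detectable)
```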

11.5 Fault tolerance

Error operators, measurements and gates are all operators of the same kind. We saw that there are certain methods by which a code can be recovered from an error. This error recovery step in turn uses gates on the

Justification: We can define the normalizer and the centralizer of S as:

N(S) = { E ∈ Πn | E · Si · E† ∈ S, ∀Si ∈ S }
Z(S) = { E ∈ Πn | E · Si · E† = Si, ∀Si ∈ S }

In the conditional statement for the normalizer we have E · Si · E† ∈ S, ∀Si ∈ S, which can be rewritten as: ∃Sj ∈ S such that E · Si · E† = Sj, ∀Si ∈ S. It now suffices to show that Sj = Si, ∀Si ∈ S, since putting Sj = Si in the conditional statement for the normalizer, we obtain N(S) = Z(S). From statement (stat. 1) of (sec. 11.1), any two elements of the Pauli group either commute or anticommute, so E · Si = ±Si · E, which, using the property that all elements of the Pauli group are unitary ((stat. 3) of (sec. 11.1)), simplifies to E · Si · E† = ±Si. Comparing this with the conditional statement for the normalizer, we see that Sj = ±Si, ∀Si ∈ S. We are given that Si, Sj ∈ S, and from (sec. 11.3) we see that −I ∉ S, which implies (from the closure law) −I · Si ∉ S. Hence Sj ≠ −Si, which says that Sj = Si, ∀Si ∈ S. Hence we have the justification.


codewords (which are again nothing but operations, and can themselves cause errors). In general it is not desirable to have a recovery operation that in turn causes errors on the codewords. Hence we demand some restrictions on the recovery operators or gates used in the process of error recovery. If we make the right demands to ensure that these gates do not introduce any error, then along with our error correction implementation, we can make the entire quantum computation process error-resistant, or more precisely, Fault Tolerant.

11.5.1 Unitary gates in the stabilizer formalism

Unitary gates can be viewed as measurement-like operations on the state that do not reveal any information about the state (of the qubit being passed across the gate). In the earlier section, we saw that if a measurement A is performed on a code with stabilizer S = {S1, S2, . . . , Sn}, where {A, S1} = 0 and [A, Si] = 0 ∀i ∈ 2 . . . n, then the stabilizer for the code after the measurement becomes S̃ = {A · S1 · A†, S2, . . . , Sn}. The code is therefore changed by the measurement. Moreover, the new code is orthogonal to the old one, since the unitary gate (measurement) anticommutes with one of the stabilizer generators. This can be compared to an error in the code. But we do not expect unitary gates to induce errors on the code, as these gates are used for error recovery (if the error recovery process itself is erroneous, then the entire recovery operation has no meaning). However, if the stabilizer remains invariant after the conjugation of S1 with A, that is S̃ = S, then we can conclude that the code is not changed by the measurement. Hence we need the condition that A · S1 · A† ∈ S for all measurements A that do not alter the code. Therefore, more precisely, we are looking for all measurements in the set {A | A · Si · A† ∈ S, ∀Si ∈ S}. In other words, from (eq. 3.32), we are looking at all the unitary gates in the normalizer of the stabilizer subgroup in Πn: N(S), where n is the number of qubits⁹.

Notice some important properties of H, Λ(X) and S:

1. Definitions: the Hadamard, phase and CNOT gates are given by:

H = (1/√2) [ 1 1 ; 1 −1 ],   S = [ 1 0 ; 0 i ],   Λ(X) = [ I 0 ; 0 X ] =
[ 1 0 0 0
  0 1 0 0
  0 0 0 1
  0 0 1 0 ]   (11.8)

2. Action of the Hadamard gate H on the single-qubit operators X and Z:

H : Z → H · Z · H† = (1/2) [ 1 1 ; 1 −1 ] [ 1 0 ; 0 −1 ] [ 1 1 ; 1 −1 ] = [ 0 1 ; 1 0 ] = X   (11.9a)

H : X → H · X · H† = (1/2) [ 1 1 ; 1 −1 ] [ 0 1 ; 1 0 ] [ 1 1 ; 1 −1 ] = [ 1 0 ; 0 −1 ] = Z   (11.9b)

3. Action of the CNOT gate Λ(X) on single-qubit operators¹⁰: notice that it reproduces the non-trivial action on one qubit on the other qubit as well.

Λ(X) : X ⊗ I −→ X ⊗ X   (11.10a)
Λ(X) : I ⊗ Z −→ Z ⊗ Z   (11.10b)

The above operations can be expressed using the figure below:

⁹We needn't assume A ∈ N(Πn) in Un (the group of unitary operators on n qubits). We only know that A · Si · A† ∈ S, ∀Si ∈ S, whereas the action of A on elements outside the stabilizer subgroup (in Πn − S) is not constrained (and is also not relevant) by our demand for fault tolerance, and hence can be arbitrary.

¹⁰Single-qubit operators in the sense of operators that act non-trivially on only one qubit; not actual "single"-qubit operators with only one tensor factor.


Figure 11.1: Figure showing the operations of the CNOT, Hadamard and phase gates on stabilizers of qubits. We use each wire in the circuit to represent the stabilizer of the qubit being transmitted across that particular wire.
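The conjugation relations summarised in the figure can be verified directly with matrices. A minimal numpy sketch, using the standard matrix conventions (assumed here), with the first qubit as the CNOT control:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I = np.eye(2, dtype=complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# (eq. 11.9): the Hadamard gate exchanges X and Z under conjugation
assert np.allclose(H @ Z @ H.conj().T, X)
assert np.allclose(H @ X @ H.conj().T, Z)

# (eq. 11.10): CNOT copies X from control to target and Z from target to control
assert np.allclose(CNOT @ np.kron(X, I) @ CNOT.conj().T, np.kron(X, X))
assert np.allclose(CNOT @ np.kron(I, Z) @ CNOT.conj().T, np.kron(Z, Z))
print("conjugation relations (11.9) and (11.10) verified")
```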

With this, we now claim that the Hadamard, phase and CNOT gates can be used to generate all n-qubit operations, in the normalizer of S in Πn as well as in the normalizer of Πn in Un (the group of unitary operators on n qubits).

Theorem: All n-qubit operations in the normalizer of S in Πn, as well as in the normalizer of Πn in Un, can be generated using the H, S and Λ(X) gates.

Proof:

We need to show that we can create every element of the generating set Sn = {g1 ⊗ g2 ⊗ · · · ⊗ gn | gi ∈ {X, Z}, ∀i ∈ 1 . . . n} using H, S and Λ(X). We will now give an inductive proof.

1. Base Case, n = 1: We need to show that all single-qubit operators (those with only one tensor factor) in N(S) can be expressed as a finite composition of H and S. We will make use of the fact that these operators are in N(S) by assuming that they are also in Z(S), from the footnote at (sec. 11.4.3). Notice that all elements (single-qubit operators) of N(S) must then commute with each other. This leaves us with the only option that the generating set of N(S) can only be of the form {I, σ}, where σ denotes one of the Pauli X, Y, Z operators. This is because N(S) cannot have more than one Pauli operator (distinct Pauli operators don't commute, which would imply N(S) ≠ Z(S), thereby contradicting the fact that S is a stabilizer subgroup). Hence, to show that H and S can generate N(S), it suffices to show that H and S can generate the elements of {I, X, Y, Z}.

Assumption: Pauli operators X and Z can be generated using H,S.

Justification: It suffices to show that the operators X and Z can be obtained from a finite combination


of H and S. We have:

H² = (1/2) [ 1 1 ; 1 −1 ] [ 1 1 ; 1 −1 ] = (1/2) [ 2 0 ; 0 2 ] = I   (11.11)

S² = [ 1 0 ; 0 i ] [ 1 0 ; 0 i ] = [ 1 0 ; 0 −1 ] = Z   (11.12)

Using (eq. 11.9a) and (eq. 11.12): H · S² · H = H · Z · H = X   (11.13)

2. Base Case, n = 2: Given¹¹ S1 = {X, Z}, we need to construct S2 = {gi ⊗ gj | gi, gj ∈ {X, Z}}. Let us evaluate this case explicitly, thereby presenting an algorithm to construct each element of S2 = {X ⊗ X, X ⊗ Z, Z ⊗ X, Z ⊗ Z}, starting from the elements of S1, using H and Λ(X). Starting with X, we introduce an ancilla qubit in an arbitrary state (the exact state is not relevant, as its stabilizer is taken to be I). This new state¹² (call it |ψ⟩) is trivially stabilized by X ⊗ I. Passing |ψ⟩ across a CNOT gate, with the first qubit as the control and the second as the target, we get a new state (say |ψ′⟩) whose stabilizer, using (eq. 11.10a), is X ⊗ X. Now applying the gate H to the first qubit and leaving the second, or equivalently, applying H ⊗ I to the two-qubit state |ψ′⟩ (stabilized by X ⊗ X), we get a new state (say |ψ′′⟩ = (H ⊗ I)|ψ′⟩), whose stabilizer is given by (H ⊗ I)(X ⊗ X)(H ⊗ I)† = (H · X · H†) ⊗ X = Z ⊗ X. Similarly, applying the H gate to the second qubit of |ψ′⟩ and leaving the first, we get (I ⊗ H)|ψ′⟩, whose stabilizer is given by (I ⊗ H)(X ⊗ X)(I ⊗ H)† = X ⊗ Z. Similarly, applying the H gate to both qubits of |ψ′⟩, we get (H ⊗ H)|ψ′⟩, which is stabilized by (H ⊗ H)(X ⊗ X)(H ⊗ H)† = Z ⊗ Z. Hence we see that all the elements of S2 have been obtained from elements of S1 with the help of CNOT and Hadamard operations.
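The base case can be sketched numerically by tracking only the stabilizers (not the states) under conjugation; the gate matrices below are the standard ones (an assumption of the sketch):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I = np.eye(2, dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

# ancilla introduction: stabilizer X ⊗ I; CNOT conjugation then gives X ⊗ X
s = CNOT @ np.kron(X, I) @ CNOT.conj().T
assert np.allclose(s, np.kron(X, X))

# Hadamards on the first, the second, or both qubits give the remaining elements
HI, IH, HH = np.kron(H, I), np.kron(I, H), np.kron(H, H)
assert np.allclose(HI @ s @ HI.conj().T, np.kron(Z, X))  # H on the first qubit
assert np.allclose(IH @ s @ IH.conj().T, np.kron(X, Z))  # H on the second qubit
assert np.allclose(HH @ s @ HH.conj().T, np.kron(Z, Z))  # H on both qubits
print("S2 = {XX, XZ, ZX, ZZ} constructed")
```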

3. Induction step: We assume that the generating set Sn of all n-qubit operations, each being a tensor product of the Pauli matrices X, Z, can be constructed from H and S using (stat. 1). With this, we need to show that the generating set of all (n + 1)-qubit operations, Sn+1, can also be constructed, using H and Λ(X). Before this, we look closely into the structure of the elements in Sn and Sn+1. Let Sn = {s1, s2, . . . , s_{2^n}} and Sn+1 = {s1, s2, . . . , s_{2^(n+1)}}.

(a) As Sn consists of all distinct n-qubit operations, each sj ∈ Sn has n tensor factors, each of which is either X or Z. Similarly, each si ∈ Sn+1 has n + 1 tensor factors.

(b) Every string in Sn falls under exactly one of the two categories:

• all its n tensor factors are X.

• at least one of its tensor factors is Z.

(c) Every string in Sn+1 can be reduced to some element in Sn by removing exactly one tensor factor from some position (call this position k). Conversely, every element of Sn+1 can be formed by adding exactly one tensor factor to some element of Sn, at some position k. Moreover, we see that without loss of generality, we may assume the factor is added at the last position.

Assumption: ∀si ∈ Sn+1, ∃sj ∈ Sn such that si = sj ⊗ g, where g ∈ {X, Z}.

Justification: Consider a particular encoding from si ∈ Sn to a string over {0, 1}, where every Z tensor factor in si is mapped to 0 and every X to 1, thereby mapping si to a binary string bi. As every element in Sn is distinct and Sn contains all operators with n tensor factors over {X, Z}, the image of Sn under this encoding consists of all binary strings of length n. Let this set be Bn. As #Sn = #Bn (= 2^n), we see that Sn is in bijection with Bn. It now suffices to show that every element bi ∈ Bn+1 can be obtained by appending 0 or 1 to some bj ∈ Bn. To show this, we use a systematic procedure to generate the strings in Bl, for any l, using a binary tree of height l. Each

¹¹X and Z can generate all possible one-qubit operations (up to an arbitrary phase), as they are the generators of the Pauli group over one qubit.

¹²In the quantum circuit picture, this is equivalent to increasing the number of wires by introducing a new empty wire.


left edge corresponds to 0 and the right to 1, and each node contains the binary string whose letters correspond to the edges on the path from the root to that node. Hence, at height n, each of the 2^n leaves contains a distinct binary sequence of length n (distinct because each string corresponds to the path from the root to a particular leaf, which is different for different leaves; of length n since the height of the tree equals the number of edges in a path from the root to a leaf). As expected, we get all the 2^n binary sequences (of length n) in Bn. Similarly, using a tree of height n + 1, we would get all sequences in Bn+1. It is clearly evident (also from the figure given below) that every string in Bn+1 corresponds to a leaf of the (n + 1)-height binary tree, each of which is obtained by drawing an edge from a leaf of the n-height tree, which corresponds to appending 0 or 1 to a string in Bn.

Figure 11.2: Figure showing the construction of binary sequences of length n + 1, represented by c1, c2, . . . , c_{2^(n+1)} (or equivalently, (n + 1)-qubit operators), by appending 0 or 1 to binary sequences of length n, represented by b1, b2, . . . , b_{2^n} (analogous to adding an X or a Z tensor factor to an n-qubit operation).

Hence we see that every string in Bn+1 can be obtained by appending 0 or 1 to some string in Bn,thereby justifying our assumption.
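The tree construction above amounts to a simple recursion, sketched here directly over the letters Z and X (the string encoding Z → 0, X → 1 is the one defined in the justification):

```python
# The letters Z and X stand for the tensor factors; under the encoding
# Z -> 0, X -> 1 this is exactly the binary-tree enumeration described above.
def pauli_strings(n):
    """All n-letter strings over {Z, X}: the set S_n written as letter strings."""
    if n == 1:
        return ["Z", "X"]
    # append one letter to every string of length n - 1
    return [s + g for s in pauli_strings(n - 1) for g in ("Z", "X")]

S2 = pauli_strings(2)
S3 = pauli_strings(3)
print(S2)                # ['ZZ', 'ZX', 'XZ', 'XX']
print(len(S2), len(S3))  # 4 8
# every element of S_3 is an element of S_2 with one extra factor appended
assert all(s[:-1] in S2 for s in S3)
```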

We now need to show that sj ⊗ X and sj ⊗ Z can be constructed from sj. Consider the former case in (stat. 3b) above, where all tensor factors of sj are X. To append an X tensor factor, we first introduce an ancilla qubit |ψ⟩ (shown as a red line in the figure below), in some state with stabilizer I. Then we perform a CNOT operation with the nth qubit as the control and |ψ⟩ as the target. The resulting two-qubit subsystem, using (eq. 11.10a), is stabilized by X ⊗ X. Hence the n-qubit state stabilized by sj = X ⊗ · · · ⊗ X is changed, by the introduction of the ancilla, to an (n + 1)-qubit state stabilized by X ⊗ · · · ⊗ X ⊗ I, and finally, using the Λ(X) operation on the nth and (n + 1)th qubits, we get the final state stabilized by X ⊗ · · · ⊗ X ⊗ X, which is sj ⊗ X. To append a Z tensor factor instead, we apply the H gate on the (n + 1)th qubit and, using (eq. 11.9b), we get that the final state is stabilized by X ⊗ · · · ⊗ X ⊗ Z, which is sj ⊗ Z.


Figure 11.3: Figure showing the process of creating an (n + 1)-qubit stabilizer from an n-qubit stabilizer along with an ancilla stabilizer I, using CNOT and Hadamard gates, for the case where the particular n-qubit stabilizer consists of all X tensor factors.

Hence we have shown that both sj ⊗ X and sj ⊗ Z can be constructed using H and Λ(X) gates, for all sj of the form described by the former case of (stat. 3b).

We now consider the latter case of (stat. 3b), where we are given the existence of at least one Z tensor factor in the operator sj ∈ Sn. We use this Z along with I (the stabilizer of the ancilla qubit) and (eq. 11.10b) to convert the two-qubit subsystem stabilized by I ⊗ Z to one stabilized by Z ⊗ Z. For this, we introduce an ancilla state |ψ⟩ (or an empty wire in the quantum circuit), which is given as the control qubit of a CNOT gate, the target being the qubit stabilized by Z. From the action of the CNOT gate, we see that this gives a state stabilized by Z ⊗ Z. Hence we have appended a Z tensor factor, thereby creating sj ⊗ Z. To append an X tensor factor to sj instead, we pass the control-qubit output of the CNOT gate (stabilized by Z) across an H gate. Using (eq. 11.9a), we see that the output state is stabilized by X, thereby making the final (added) tensor factor X. Thus we have the new stabilizer generator sj ⊗ X.

Figure 11.4: Figure showing the process of creating an (n + 1)-qubit stabilizer from an n-qubit stabilizer along with an ancilla stabilizer I, using CNOT and Hadamard gates, for the case where the particular n-qubit stabilizer contains at least one Z tensor factor.

Hence we have shown that both X and Z tensor factors can be appended to sj ∈ Sn, ∀sj with structureas described by the latter case of (stat. 3b).

GO TO FIRST PAGE↑ 162

Page 163: INTRODUCTION TO QUANTUM COMPUTING AND QUANTUM INFORMATIONpavithra/pavithranHomepage/NOtes_file… ·  · 2010-12-26QUANTUM COMPUTING AND QUANTUM INFORMATION Pavithran S Iyer,

DRAFT COPY

CHAPTER 11. STABILIZER CODES 11.5. FAULT TOLERANCE

Hence we have shown that X and Z tensor factors can be appended to any element of Sn, thereby obtaining all elements of Sn+1. Moreover, this can be done using only Hadamard and CNOT gates. Hence we see that the generating set Sn can be constructed using H, S and Λ(X), for any n, thereby proving the theorem.


Part VII

References


Bibliography

[1] V. Arvind, K. R. Parthasarathy, A Family of Quantum Stabilizer Codes Based on the Weyl Commutation Relations over a Finite Field, http://arxiv.org/abs/quant-ph/0206174v1

[2] Alexei Ashikhmin and Simon Litsyn, Upper Bounds on the Size of Quantum Codes, IEEE Transactions on Information Theory, May 1999, Volume 45, Issue 4, DOI: 10.1109/18.761270, http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=761270. Also available at http://www.physics.princeton.edu/~mcdonald/examples/QM/ashikhmin_ieeetit_45_1206_99.pdf

[3] Daniel A. Lidar, (University of Toronto), K. Birgitta Whaley, (University of California, Berkeley),Decoherence-Free Subspaces and Subsystems http://arxiv.org/abs/quant-ph/0301032v1

[4] David P. DiVincenzo (IBM), Peter W. Shor (AT & T), Fault-Tolerant Error Correction with EfficientQuantum Codes http://lanl.arxiv.org/abs/quant-ph/9605031v2

[5] Peter W. Shor (AT & T Research), Fault-tolerant quantum computation http://arxiv.org/abs/quant-ph/9605011v2

[6] John Preskill, Fault-tolerant quantum computation http://lanl.arxiv.org/abs/quant-ph/9712048v1

[7] Daniel Gottesman, A Theory of Fault-Tolerant Quantum Computation, http://lanl.arxiv.org/abs/quant-ph/9702029v2

[8] Daniel Gottesman, Stabilizer Codes and Quantum Error Correction, Ph.D. thesis, California Institute of Technology, Pasadena, California (submitted May 21, 1997), http://arxiv.org/pdf/quant-ph/9705052v1

[9] Web Article: Bounds on the parameters of a code, http://www.usna.edu/Users/math/wdj/book/node125

[10] Lecture notes for Chapter 7: Quantum Error Correction, Prof. John Preskill (Physics 219/Computer Science 219, Quantum Computation, formerly Physics 229), http://www.theory.caltech.edu/people/preskill/ph229/notes/chap7.pdf

[11] Group representation theory and quantum physics, http://chaos.swarthmore.edu/courses/Physics093_2009/Lectures/GroupTheoryC0.pdf

[12] Lecture Notes: Lecture 14: Error-Correcting Codes, Prof. Salil Vadhan (based on scribe notes by Sasha Schwartz and Adi Akavia, April 3, 2007), http://people.seas.harvard.edu/~salil/cs225/lecnotes/lec14.pdf


[13] Lecture Notes: Lecture 1: Introduction, Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007), August 27, 2007, Lecturer: Atri Rudra, Scribe: Atri Rudra, http://www.cse.buffalo.edu/~atri/courses/coding-theory/lectures/lect1.pdf

[14] Lecture Notes: Lecture 2: Error Correction and Channel Noise, Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007), August 29, 2007, Lecturer: Atri Rudra, Scribes: Yang Wang & Atri Rudra, http://www.cse.buffalo.edu/~atri/courses/coding-theory/lectures/lect2.pdf

[15] Lecture Notes: Lecture 3: Error Correction and Distance, Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007), August 31, 2007, Lecturer: Atri Rudra, Scribes: Michael Pfetsch & Atri Rudra, http://www.cse.buffalo.edu/~atri/courses/coding-theory/lectures/lect3.pdf

[16] Lecture Notes: Notes 2: Gilbert-Varshamov bound, Introduction to Coding Theory CMU: Spring 2010,January 2010, Lecturer: Venkatesan Guruswami, Scribe: Venkatesan Guruswami, http://www.cs.cmu.edu/~venkatg/teaching/codingtheory/notes/notes2.pdf

[17] Lecture Notes: Notes 4: Elementary bounds on codes, Introduction to Coding Theory CMU: Spring2010, January 2010, Lecturer: Venkatesan Guruswami, Scribe: Venkatesan Guruswami, http://www.cs.cmu.edu/~venkatg/teaching/codingtheory/notes/notes4.pdf

[18] Lecture Notes: The Orbit-Stabilizer Theorem, Rahbar Virk, Department of Mathematics, University of Wisconsin, Madison, WI 53706, [email protected], http://www.math.wisc.edu/~virk/notes/pre08/pdf/orbit-stabilizer_thm.pdf

[19] A. M. Steane, Error Correcting Codes in Quantum Theory, Clarendon Laboratory, Parks Road, Oxford, OX1 3PU, England, Physical Review Letters, Volume 77, Number 5, 29 July 1996 (received 4 October 1995), http://prl.aps.org/abstract/PRL/v77/i5/p793_1

[20] BOOK: Information, Physics, and Computation (Oxford Graduate Texts), Marc Mézard and Andrea Montanari, Oxford University Press, 2009, 569 pages, ISBN: 019857083X, 9780198570837, http://www.stanford.edu/~montanar/BOOK/book.html

[21] Web Resource: Group Theory definitions and concepts from Wikipedia Wikipedia List of Group TheoryTopics

[22] Web Resource: The Lindblad Master Equation, Andrew Fisher, Department of Physics and Astronomy,University College London, http://www.cmmp.ucl.ac.uk/~ajf/course_notes/node36.html

[23] Presentation on the Shor Code, Marek Andrzej Perkowski, ECE 510 - Quantum Computing, School of Engineering and Computer Science, Portland State University, Course Schedule, Spring 2005, www.ee.pdx.edu/~mperkows/CLASS_FUTURE/2005-quantum/2005-q-0018.error-models-9-bit-Shor.ppt

[24] BOOK: Quantum Computation and Quantum Information, Michael A. Nielsen, Isaac L. Chuang, Cambridge University Press, 2000, Science, 676 pages, http://books.google.com/books?id=65FqEKQOfP8C&dq=nielsen+and+chuang&source=gbs_navlinks_s

[25] BOOK: Information Theory, Robert B. Ash, Courier Dover Publications, 1990, Computers, 339 pages, http://books.google.com/books?id=ngZhvUfF0UIC&dq=Information+Theory+Robert+B+Ash&source=gbs_navlinks_s


[26] BOOK: Coding Theorems of Classical and Quantum Information Theory, K. R. Parthasarathy, Hindustan Book Agency, 2007, 158 pages, http://books.google.com/books?id=miu8PAAACAAJ&dq=Coding+Theorems+Of+Quantum+and+Classical&hl=en

[27] BOOK: Quantum computing: from linear algebra to physical realizations, Mikio Nakahara, Tetsuo Ohmi,CRC Press, 2008 - Computers - 421 pages, http://books.google.com/books?id=VdHJsTdoyAMC&dq=Quantum+Computing+Nakahara&source=gbs_navlinks_s

[28] Collected Work (1989): Fundamental Theories of Physics: Gravitation, Gauge Theories and The Early Universe, B. R. Iyer, N. Mukunda and C. V. Vishveshwara

Table 11.1: List of References

S/No  Text                                         Author                    Other Details
1     Information Theory                           Robert B. Ash             Dover Publications
2     Elements of Information Theory               Cover and Thomas          McGraw Hill
3     Quantum Computing                            Jozef Gruska              McGraw Hill
4     Introduction to Coding Theory                Van Lint                  –
5     Introduction to Quantum Mechanics            David J. Griffiths        –
6     Modern Quantum Mechanics                     J. J. Sakurai             –
7     Principles of Quantum Mechanics              R. Shankar                –
8     Quantum Computation and Quantum Information  Nielsen and Chuang        –
9     Quantum Computing                            Mikio Nakahara and Ohmi   –
10    Preskill's Lecture Notes                     John Preskill             –
11    Nilanjana Datta's Lecture Notes              Nilanjana Datta           –
12    Introduction to the Theory of Computation    Michael Sipser            –
13    Automata Theory                              Dexter C. Kozen           –
14    Automata and Languages                       Hopcroft and Ullman       –


Appendix A

Solutions to Exercises

Solutions to exercises in Chapter 2 of the book Quantum Computation and Quantum Information, by Michael A. Nielsen and Isaac L. Chuang.

2.13 Given two vectors |v⟩ and |w⟩, to show: (|w⟩⟨v|)† = |v⟩⟨w|.

LHS: (|w⟩⟨v|)† = (⟨v|)† (|w⟩)† = |v⟩⟨w|.

Hence, proved.

2.14 To show: (Σi ai Ai)† = Σi ai* Ai†.

LHS: (Σi ai Ai)† = Σi (ai Ai)† = Σi ai* Ai†.

Hence, proved.

2.15 To show: (A†)† = A.

For any vector |a⟩, the bra ⟨a|A† is the dual of the ket A|a⟩:

⟨a|A† = (A|a⟩)†   (A.1)

Taking the adjoint of both sides:

(A†)†|a⟩ = (⟨a|A†)† = A|a⟩   (A.2)

On comparing equations (eq. A.1) and (eq. A.2), since this holds for every |a⟩, we obtain the operator relation A = (A†)†.

Hence, proved.

2.16 To show that any projection operator P satisfies the relation P² = P. This can be obtained from the generalized form Pⁿ = P, proved by considering Pk = |ek⟩⟨ek|. So,

(Pk)ⁿ = |ek⟩⟨ek|ek⟩⟨ek|ek⟩⟨ek| . . . |ek⟩⟨ek|.


Now we can simplify this expression by taking inner products. Using the fact that ⟨ek|ek⟩ = 1, every inner factor becomes 1, and only the terminal terms |ek⟩ and ⟨ek| remain. ∴ the RHS is |ek⟩⟨ek| = Pk. Hence, proved.
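A quick numerical check of Pⁿ = P for a rank-one projector (the unit vector below is an assumed example):

```python
import numpy as np

# Check P^n = P for a rank-one projector P = |e⟩⟨e|
# (the unit vector |e⟩ below is an assumed example).
e = np.array([[1], [1]], dtype=complex) / np.sqrt(2)
P = e @ e.conj().T

assert np.allclose(P @ P, P)
assert np.allclose(np.linalg.matrix_power(P, 5), P)
print("P^n = P verified")
```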

2.17 To show that a normal matrix is Hermitian if and only if it has real eigenvalues.

Let A be a Hermitian operator with eigenvalue a and eigenvector |a⟩:

A|a⟩ = a|a⟩ ⇒ ⟨a|A|a⟩ = a⟨a|a⟩   (A.3)

Also: ⟨a|A† = a*⟨a| ⇒ ⟨a|A†|a⟩ = a*⟨a|a⟩   (A.4)

Subtracting equation (eq. A.3) from equation (eq. A.4):

⟨a|(A† − A)|a⟩ = (a* − a)⟨a|a⟩   (A.5)

Since A† = A, we get a* = a, and therefore a is real.   (A.6)

Conversely, a normal matrix has a spectral decomposition over its eigenvalues; if these are all real, the decomposition is unchanged by taking the adjoint, so the matrix is Hermitian.

Hence, proved.

2.18 To show that all the eigenvalues of a unitary operator are of the form e^{iθ} for some θ.

Let U be a unitary operator with eigenvalue a and eigenvector |a⟩:

U|a⟩ = a|a⟩   (A.7)
⟨a|U† = a*⟨a|   (A.8)

On taking the inner product of equation (eq. A.7) with equation (eq. A.8), we get:

⟨a|U†U|a⟩ = |a|²⟨a|a⟩

Since U is unitary, U†U = I; also the eigenvectors are normalized.

∴ 1 = |a|² ⇒ a ≡ e^{iθ}

Hence, proved.
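This can be checked numerically; the Hadamard matrix serves as an assumed example of a unitary operator:

```python
import numpy as np

# All eigenvalues of a unitary operator should lie on the unit circle;
# the Hadamard matrix is used as an assumed example.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

eigvals = np.linalg.eigvals(H)
assert np.allclose(np.abs(eigvals), 1.0)
print("all eigenvalues satisfy |a| = 1")
```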

2.19 To show that the Pauli matrices are unitary and Hermitian.

We have X = |0⟩⟨1| + |1⟩⟨0|.
∴ X† = |1⟩⟨0| + |0⟩⟨1| = X. Since X† = X, X is Hermitian.
X†X = (|1⟩⟨0| + |0⟩⟨1|)(|0⟩⟨1| + |1⟩⟨0|) = |1⟩⟨1| + |0⟩⟨0| = I. Therefore, X is unitary.

Similarly, Y = i(|1⟩⟨0| − |0⟩⟨1|) and Z = |0⟩⟨0| − |1⟩⟨1| can be checked to be Hermitian and unitary in the same way.

2.20 Given two orthonormal bases {|vi⟩} and {|wi⟩}, let A′ and A″ be the matrix representations of the operator A in the two bases respectively: A′ij = ⟨vi|A|vj⟩, A″ij = ⟨wi|A|wj⟩. To find: the relation between A′ and A″.

Define a unitary operator U such that U|vi⟩ = |wi⟩.
∴ ⟨wi| = ⟨vi|U† and |wj⟩ = U|vj⟩
A″ij = ⟨wi|A|wj⟩ = ⟨vi|U†AU|vj⟩, which is the matrix element of U†AU in the {|vi⟩} basis.
∴ A″ = U†A′U.


2.22 To show that eigenvectors of a Hermitian operator with different eigenvalues are orthogonal.

Let A be a Hermitian operator with eigenvalues a and a′ and corresponding eigenvectors |a⟩ and |a′⟩:

A|a⟩ = a|a⟩ ⇒ ⟨a′|A|a⟩ = a⟨a′|a⟩   (A.9)
A|a′⟩ = a′|a′⟩ ⇒ ⟨a|A|a′⟩ = a′⟨a|a′⟩; taking the adjoint, ⟨a′|A†|a⟩ = (a′)*⟨a′|a⟩   (A.10)

On subtracting equation (eq. A.9) from equation (eq. A.10):

⟨a′|(A† − A)|a⟩ = ((a′)* − a)⟨a′|a⟩   (A.11)

Since A is Hermitian, the left side vanishes and (a′)* = a′, so from equation (eq. A.11), either a′ − a = 0 or ⟨a′|a⟩ = 0.
Since it is given that a and a′ are different, we have ⟨a′|a⟩ = 0.

Hence, proved.

2.23 To show that all the eigenvalues of a projection operator are either 1 or 0.

Let P be a projection operator with eigenvector |a⟩ and eigenvalue a:

P|a⟩ = a|a⟩   (A.12)
∴ P²|a⟩ = a²|a⟩   (A.13)

On subtracting (eq. A.12) from (eq. A.13):

(P² − P)|a⟩ = (a² − a)|a⟩

Since P² = P, the LHS is 0. Since |a⟩ ≠ 0, we have:

(a² − a) = 0 ⇒ a(a − 1) = 0. ∴ a = 0 or a = 1.   (A.14)

Hence, proved.

2.24 To show that a positive operator is always Hermitian.

Let A be a positive operator with eigenvalue $a$ and eigenvector $|a\rangle$.
$$A|a\rangle = a|a\rangle \Rightarrow \langle a|A|a\rangle = a\langle a|a\rangle \quad (A.15)$$
also,
$$(\langle a|A|a\rangle)^* = a^*\langle a|a\rangle \Rightarrow \langle a|A^\dagger|a\rangle = a^*\langle a|a\rangle \quad (A.16)$$
Since A is a positive operator, $\langle a|A|a\rangle$ is real and non-negative, so $a = a^*$.
$$\therefore \langle a|A^\dagger|a\rangle = \langle a|A|a\rangle \Rightarrow A = A^\dagger,$$
i.e. A is Hermitian. (Strictly, positivity says $\langle\psi|A|\psi\rangle$ is real for *every* vector $|\psi\rangle$, and on a complex vector space this alone forces $A = A^\dagger$.)

2.25 To show that for any operator A, $A^\dagger A$ is positive.

Let $|\psi\rangle$ be any vector, and define $|\varphi\rangle = A|\psi\rangle$. Then
$$\langle\psi|A^\dagger A|\psi\rangle = \langle\varphi|\varphi\rangle \geq 0 \quad (A.17)$$
Since equation (A.17) holds for every vector $|\psi\rangle$ (not just eigenvectors of A), $A^\dagger A$ is a positive operator. The same argument with $|\varphi\rangle = A^\dagger|\psi\rangle$ shows that $AA^\dagger$ is positive as well.
Hence, proved.

GO TO FIRST PAGE↑ 173

Page 174: INTRODUCTION TO QUANTUM COMPUTING AND QUANTUM INFORMATIONpavithra/pavithranHomepage/NOtes_file… ·  · 2010-12-26QUANTUM COMPUTING AND QUANTUM INFORMATION Pavithran S Iyer,

DRAFT COPY

APPENDIX A. SOLUTIONS TO EXERCISES

2.26 Given $|\psi\rangle = \dfrac{|0\rangle + |1\rangle}{\sqrt{2}}$. To find $|\psi\rangle^{\otimes 2}$ and $|\psi\rangle^{\otimes 3}$, both in ket (outer product) notation and in Kronecker product form.

Ket notation:
$$|\psi\rangle^{\otimes 2} = |\psi\rangle \otimes |\psi\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}} \otimes \frac{|0\rangle + |1\rangle}{\sqrt{2}}$$
$$\therefore |\psi\rangle^{\otimes 2} = \frac{|00\rangle + |01\rangle + |10\rangle + |11\rangle}{2} \quad (A.19)$$
Similarly:
$$|\psi\rangle^{\otimes 3} = |\psi\rangle^{\otimes 2} \otimes |\psi\rangle = \frac{|00\rangle + |01\rangle + |10\rangle + |11\rangle}{2} \otimes \frac{|0\rangle + |1\rangle}{\sqrt{2}}$$
$$\therefore |\psi\rangle^{\otimes 3} = \frac{|000\rangle + |001\rangle + |010\rangle + |011\rangle + |100\rangle + |101\rangle + |110\rangle + |111\rangle}{2\sqrt{2}} \quad (A.20)$$

Kronecker product form: with $|0\rangle = \begin{pmatrix}1\\0\end{pmatrix}$ and $|1\rangle = \begin{pmatrix}0\\1\end{pmatrix}$,
$$|\psi\rangle = \frac{1}{\sqrt{2}}\left[\begin{pmatrix}1\\0\end{pmatrix} + \begin{pmatrix}0\\1\end{pmatrix}\right] = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix}$$
$$\therefore |\psi\rangle^{\otimes 2} = \frac{1}{2}\begin{pmatrix}1\\1\end{pmatrix} \otimes \begin{pmatrix}1\\1\end{pmatrix} = \frac{1}{2}\begin{pmatrix}1\\1\\1\\1\end{pmatrix} \quad (A.22)$$
Similarly, $|\psi\rangle^{\otimes 3} = |\psi\rangle^{\otimes 2} \otimes |\psi\rangle$:
$$|\psi\rangle^{\otimes 3} = \frac{1}{2}\begin{pmatrix}1\\1\\1\\1\end{pmatrix} \otimes \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix} = \frac{1}{2\sqrt{2}}\begin{pmatrix}1\\1\\1\\1\\1\\1\\1\\1\end{pmatrix} \quad (A.23)$$
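The two Kronecker product results can be reproduced directly with NumPy's `kron`:

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
psi = (ket0 + ket1) / np.sqrt(2)

psi2 = np.kron(psi, psi)    # |psi>^{⊗2}: four entries, each 1/2
psi3 = np.kron(psi2, psi)   # |psi>^{⊗3}: eight entries, each 1/(2*sqrt(2))

print(psi2)   # [0.5 0.5 0.5 0.5]
print(psi3)   # eight entries ≈ 0.35355339
```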


2.27 Given the Pauli matrices, compute (a) $X \otimes Z$, (b) $I \otimes X$, (c) $X \otimes I$.

$$X = \begin{pmatrix}0&1\\1&0\end{pmatrix} \quad Y = \begin{pmatrix}0&-i\\i&0\end{pmatrix} \quad Z = \begin{pmatrix}1&0\\0&-1\end{pmatrix} \quad I = \begin{pmatrix}1&0\\0&1\end{pmatrix} \quad (A.24)$$

(a)
$$X \otimes Z = \begin{pmatrix}0&0&1&0\\0&0&0&-1\\1&0&0&0\\0&-1&0&0\end{pmatrix} \quad (A.25)$$
Similarly,
$$Z \otimes X = \begin{pmatrix}0&1&0&0\\1&0&0&0\\0&0&0&-1\\0&0&-1&0\end{pmatrix} \quad (A.26)$$
Since $X \otimes Z \neq Z \otimes X$, the tensor product is not commutative.

(b)
$$I \otimes X = \begin{pmatrix}0&1&0&0\\1&0&0&0\\0&0&0&1\\0&0&1&0\end{pmatrix} \quad (A.27)$$

(c)
$$X \otimes I = \begin{pmatrix}0&0&1&0\\0&0&0&1\\1&0&0&0\\0&1&0&0\end{pmatrix}$$
Note that $I \otimes X \neq X \otimes I$: the order of the factors matters.
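All four products above, including the non-commutativity of the Kronecker product, can be checked with NumPy:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
I = np.eye(2, dtype=int)

print(np.kron(X, Z))   # (A.25)
print(np.kron(Z, X))   # (A.26)
print(np.kron(I, X))   # (A.27)
print(np.kron(X, I))   # part (c)

# X ⊗ Z and Z ⊗ X differ, so the Kronecker product is not commutative.
assert not np.array_equal(np.kron(X, Z), np.kron(Z, X))
```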


2.28 To show that (a) $(A \otimes B)^* = A^* \otimes B^*$, (b) $(A \otimes B)^T = A^T \otimes B^T$, (c) $(A \otimes B)^\dagger = A^\dagger \otimes B^\dagger$.

Let A be an $m \times n$ matrix with entries $A_{ij}$, and let B be any matrix. In block form,
$$A \otimes B = \begin{pmatrix} A_{11}B & A_{12}B & \dots & A_{1n}B \\ A_{21}B & A_{22}B & \dots & A_{2n}B \\ \vdots & \vdots & & \vdots \\ A_{m1}B & A_{m2}B & \dots & A_{mn}B \end{pmatrix}$$

(a) Conjugating entry by entry conjugates each coefficient and each block:
$$(A \otimes B)^* = \begin{pmatrix} A_{11}^*B^* & A_{12}^*B^* & \dots & A_{1n}^*B^* \\ \vdots & \vdots & & \vdots \\ A_{m1}^*B^* & A_{m2}^*B^* & \dots & A_{mn}^*B^* \end{pmatrix}$$
which is exactly the block form of $A^* \otimes B^*$. $\therefore (A \otimes B)^* = A^* \otimes B^*$.

(b) Transposing swaps the blocks across the diagonal and transposes each block:
$$(A \otimes B)^T = \begin{pmatrix} A_{11}B^T & A_{21}B^T & \dots & A_{m1}B^T \\ \vdots & \vdots & & \vdots \\ A_{1n}B^T & A_{2n}B^T & \dots & A_{mn}B^T \end{pmatrix} = A^T \otimes B^T$$

(c) Combining (a) and (b): $(A \otimes B)^\dagger = \left((A \otimes B)^*\right)^T = (A^* \otimes B^*)^T = A^\dagger \otimes B^\dagger$.
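The three identities hold for rectangular matrices as well; a quick NumPy check with random complex matrices of different shapes:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))
B = rng.normal(size=(4, 2)) + 1j * rng.normal(size=(4, 2))

AB = np.kron(A, B)
assert np.allclose(AB.conj(), np.kron(A.conj(), B.conj()))        # (A⊗B)* = A*⊗B*
assert np.allclose(AB.T, np.kron(A.T, B.T))                        # (A⊗B)^T = A^T⊗B^T
assert np.allclose(AB.conj().T, np.kron(A.conj().T, B.conj().T))   # (A⊗B)† = A†⊗B†
print("all three identities hold")
```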

2.33 To show that the Hadamard operator, which has the form $H = \dfrac{1}{\sqrt{2}}\left[(|0\rangle + |1\rangle)\langle 0| + (|0\rangle - |1\rangle)\langle 1|\right]$ for a single qubit, can be generalized to an n-qubit system as
$$H^{\otimes n} = \frac{1}{\sqrt{2^n}} \sum_{x,y} (-1)^{x \cdot y} |x\rangle\langle y|,$$
where the sum runs over all n-bit strings x and y, and $x \cdot y$ denotes the bitwise dot product.

We prove this result by induction.
For $n = 1$:
$$H = \frac{1}{\sqrt{2}}\left(|0\rangle\langle 0| + |0\rangle\langle 1| + |1\rangle\langle 0| - |1\rangle\langle 1|\right) = \frac{1}{\sqrt{2}} \sum_{x,y \in \{0,1\}} (-1)^{xy} |x\rangle\langle y|,$$
since the sign is negative only for the single term with $x = y = 1$.
Assume the statement holds for $n = k$:
$$H^{\otimes k} = \frac{1}{\sqrt{2^k}} \sum_{x,y} (-1)^{x \cdot y} |x\rangle\langle y|$$
To prove it for $k + 1$, take the tensor product of the $n = k$ expression with the $n = 1$ expression:
$$H^{\otimes(k+1)} = H^{\otimes k} \otimes H = \left(\frac{1}{\sqrt{2^k}} \sum_{x,y} (-1)^{x \cdot y} |x\rangle\langle y|\right) \otimes \left(\frac{1}{\sqrt{2}} \sum_{a,b \in \{0,1\}} (-1)^{ab} |a\rangle\langle b|\right)$$
$$= \frac{1}{\sqrt{2^{k+1}}} \sum_{x,y} \sum_{a,b} (-1)^{x \cdot y + ab}\, |xa\rangle\langle yb|$$
Writing $x' = xa$ and $y' = yb$ for the $(k+1)$-bit strings obtained by appending the bit $a$ to $x$ and $b$ to $y$, we have $x' \cdot y' = x \cdot y + ab$, and the double sum over $(x, a)$ and $(y, b)$ is exactly a sum over all $(k+1)$-bit strings $x'$ and $y'$:
$$\therefore H^{\otimes(k+1)} = \frac{1}{\sqrt{2^{k+1}}} \sum_{x',y'} (-1)^{x' \cdot y'} |x'\rangle\langle y'|$$
which is the claimed form for $n = k + 1$. Hence, proved.
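The closed form can be checked against an explicitly built $H^{\otimes n}$; the sketch below does this for $n = 3$ by summing $(-1)^{x \cdot y}|x\rangle\langle y|$ over all 3-bit strings:

```python
import numpy as np
from itertools import product

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

n = 3
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)        # H^{⊗3} built directly

# Build (1/sqrt(2^n)) * sum_{x,y} (-1)^{x.y} |x><y| over n-bit strings.
S = np.zeros((2 ** n, 2 ** n))
for x in product([0, 1], repeat=n):
    for y in product([0, 1], repeat=n):
        i = int("".join(map(str, x)), 2)            # row index of |x>
        j = int("".join(map(str, y)), 2)            # column index of <y|
        S[i, j] = (-1) ** sum(a * b for a, b in zip(x, y))
S /= np.sqrt(2.0 ** n)

assert np.allclose(Hn, S)
print("H^{⊗3} matches the closed form")
```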

2.34 To find the logarithm and the square root of the matrix $M = \begin{pmatrix}4&3\\3&4\end{pmatrix}$.

Let the eigenvalues be $\lambda_1$ and $\lambda_2$, with corresponding eigenvectors $|\lambda_1\rangle$ and $|\lambda_2\rangle$. To apply a function to the matrix, we first write the operator as a linear combination of its projection operators, which requires the eigenvectors. We start by solving the secular equation:
$$\det(M - \lambda I) = 0 \Rightarrow (4 - \lambda)^2 - 9 = 0 \Rightarrow (4 - \lambda - 3)(4 - \lambda + 3) = 0$$
The solutions are the eigenvalues: $\lambda_1 = 1$, $\lambda_2 = 7$.


Eigenvectors:
To find $|\lambda_1\rangle$:
$$\begin{pmatrix}4&3\\3&4\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = 1\begin{pmatrix}x\\y\end{pmatrix} \Rightarrow 4x + 3y = x \text{ and } 3x + 4y = y \;\therefore y = -x \Rightarrow |\lambda_1\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\-1\end{pmatrix}$$
To find $|\lambda_2\rangle$:
$$\begin{pmatrix}4&3\\3&4\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = 7\begin{pmatrix}x\\y\end{pmatrix} \Rightarrow 4x + 3y = 7x \text{ and } 3x + 4y = 7y \;\therefore x = y \Rightarrow |\lambda_2\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix}$$
(the eigenvectors are normalized so that the projectors below sum to the identity). The projection operators are:
$$|\lambda_1\rangle\langle\lambda_1| = \frac{1}{2}\begin{pmatrix}1&-1\\-1&1\end{pmatrix} \qquad |\lambda_2\rangle\langle\lambda_2| = \frac{1}{2}\begin{pmatrix}1&1\\1&1\end{pmatrix}$$
From the spectral decomposition, $M = \lambda_1|\lambda_1\rangle\langle\lambda_1| + \lambda_2|\lambda_2\rangle\langle\lambda_2|$:
$$M = 1 \cdot \frac{1}{2}\begin{pmatrix}1&-1\\-1&1\end{pmatrix} + 7 \cdot \frac{1}{2}\begin{pmatrix}1&1\\1&1\end{pmatrix}$$
$$\therefore \sqrt{M} = \pm 1 \cdot \frac{1}{2}\begin{pmatrix}1&-1\\-1&1\end{pmatrix} + \sqrt{7} \cdot \frac{1}{2}\begin{pmatrix}1&1\\1&1\end{pmatrix},$$
the principal root being $\sqrt{M} = \dfrac{1}{2}\begin{pmatrix}1+\sqrt{7}&\sqrt{7}-1\\\sqrt{7}-1&1+\sqrt{7}\end{pmatrix}$. Also:
$$\log M = \log 1 \cdot \frac{1}{2}\begin{pmatrix}1&-1\\-1&1\end{pmatrix} + \log 7 \cdot \frac{1}{2}\begin{pmatrix}1&1\\1&1\end{pmatrix} = \frac{\log 7}{2}\begin{pmatrix}1&1\\1&1\end{pmatrix}$$
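The spectral decomposition and the resulting square root and logarithm can be confirmed with SciPy's matrix functions:

```python
import numpy as np
from scipy.linalg import sqrtm, logm

M = np.array([[4.0, 3.0], [3.0, 4.0]])

P1 = 0.5 * np.array([[1, -1], [-1, 1]])   # projector onto eigenvalue 1
P2 = 0.5 * np.array([[1, 1], [1, 1]])     # projector onto eigenvalue 7

assert np.allclose(M, 1 * P1 + 7 * P2)                 # spectral decomposition
assert np.allclose(sqrtm(M), P1 + np.sqrt(7) * P2)     # principal square root
assert np.allclose(logm(M), np.log(7) * P2)            # log(1)·P1 vanishes
print("spectral decomposition confirmed")
```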

2.35 Given: $\vec{v}$ is a real three-dimensional unit vector and $\theta$ a real number. To show that
$$e^{i\theta\,\vec{v}\cdot\vec{\sigma}} = (\cos\theta)I + i(\sin\theta)\,\vec{v}\cdot\vec{\sigma}, \quad \text{where } \vec{v}\cdot\vec{\sigma} = \sum_{i=1}^{3} v_i\sigma_i$$
and the $\sigma_i$ denote the Pauli matrices.

$$\vec{v}\cdot\vec{\sigma} = v_1\sigma_1 + v_2\sigma_2 + v_3\sigma_3 = \begin{pmatrix}v_3 & v_1 - iv_2\\ v_1 + iv_2 & -v_3\end{pmatrix}$$
$$\therefore (\vec{v}\cdot\vec{\sigma})^2 = \begin{pmatrix}v_3 & v_1 - iv_2\\ v_1 + iv_2 & -v_3\end{pmatrix}\begin{pmatrix}v_3 & v_1 - iv_2\\ v_1 + iv_2 & -v_3\end{pmatrix} = \begin{pmatrix}v_1^2 + v_2^2 + v_3^2 & 0\\ 0 & v_1^2 + v_2^2 + v_3^2\end{pmatrix},$$
the off-diagonal entries, e.g. $v_3(v_1 - iv_2) + (v_1 - iv_2)(-v_3)$, cancelling. Since $\vec{v}$ is a unit vector, $v_1^2 + v_2^2 + v_3^2 = 1$:
$$\therefore (\vec{v}\cdot\vec{\sigma})^2 = I \quad \Rightarrow \quad (\vec{v}\cdot\vec{\sigma})^n = \left[(\vec{v}\cdot\vec{\sigma})^2\right]^{n/2} = I \;\text{ for } n \in \{2, 4, 6, \dots\} \quad (A.28)$$
From the series expansion of the exponential:
$$e^{i\theta\,\vec{v}\cdot\vec{\sigma}} = I + \frac{i\theta\,\vec{v}\cdot\vec{\sigma}}{1!} - \frac{(\theta\,\vec{v}\cdot\vec{\sigma})^2}{2!} - \frac{i(\theta\,\vec{v}\cdot\vec{\sigma})^3}{3!} + \frac{(\theta\,\vec{v}\cdot\vec{\sigma})^4}{4!} + \dots$$
Using equation (A.28):
$$e^{i\theta\,\vec{v}\cdot\vec{\sigma}} = I + \frac{i\theta\,\vec{v}\cdot\vec{\sigma}}{1!} - \frac{\theta^2}{2!}I - \frac{i\theta^3\,\vec{v}\cdot\vec{\sigma}}{3!} + \frac{\theta^4}{4!}I + \dots$$
On rearranging the terms:
$$e^{i\theta\,\vec{v}\cdot\vec{\sigma}} = \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \dots\right)I + i\,\vec{v}\cdot\vec{\sigma}\left(\theta - \frac{\theta^3}{3!} + \dots\right)$$
$$\therefore e^{i\theta\,\vec{v}\cdot\vec{\sigma}} = (\cos\theta)\,I + i\,(\sin\theta)\,\vec{v}\cdot\vec{\sigma}$$
Hence, proved.
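The identity can be verified numerically with the matrix exponential; the unit vector and angle below are arbitrary test values:

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

v = np.array([1.0, 2.0, 2.0]) / 3.0      # a unit vector: (1/9)(1+4+4) = 1
theta = 0.7
vs = v[0] * X + v[1] * Y + v[2] * Z      # v · σ

lhs = expm(1j * theta * vs)
rhs = np.cos(theta) * np.eye(2) + 1j * np.sin(theta) * vs
assert np.allclose(lhs, rhs)
print("exp(iθ v·σ) = cosθ I + i sinθ v·σ verified")
```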

2.36 To show that all Pauli matrices except I have trace zero.
$$X = \begin{pmatrix}0&1\\1&0\end{pmatrix} \Rightarrow \mathrm{tr}(X) = 0 + 0 = 0$$
$$Y = \begin{pmatrix}0&-i\\i&0\end{pmatrix} \Rightarrow \mathrm{tr}(Y) = 0 + 0 = 0$$
$$Z = \begin{pmatrix}1&0\\0&-1\end{pmatrix} \Rightarrow \mathrm{tr}(Z) = 1 + (-1) = 0$$
$$I = \begin{pmatrix}1&0\\0&1\end{pmatrix} \Rightarrow \mathrm{tr}(I) = 1 + 1 = 2$$
Therefore, the Pauli matrices X, Y and Z have trace zero, while I has trace two.

2.37 To show the cyclic property of the trace, that is, $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.
The trace is defined as:
$$\mathrm{tr}(A) = \sum_i \langle i|A|i\rangle \quad (A.29)$$
$$\therefore \mathrm{tr}(AB) = \sum_i \langle i|AB|i\rangle$$
Inserting a complete set of states:
$$\mathrm{tr}(AB) = \sum_i \langle i|A\left(\sum_j |j\rangle\langle j|\right)B|i\rangle = \sum_i \sum_j \langle i|A|j\rangle\langle j|B|i\rangle$$
The two factors inside the summation are numbers, so we can interchange them:
$$\mathrm{tr}(AB) = \sum_j \sum_i \langle j|B|i\rangle\langle i|A|j\rangle$$
Using the completeness relation to carry out the sum over i:
$$\mathrm{tr}(AB) = \sum_j \langle j|BA|j\rangle = \mathrm{tr}(BA)$$
Hence, proved. Note: in $\mathrm{tr}(A) = \sum_i \langle i|A|i\rangle$, the index $i$ is just a "dummy" index; it may as well be replaced by some other index $j$, still preserving the value of the trace. Also, the cyclic property shows the trace is independent of the basis used to express the matrix, since $\mathrm{tr}(U^\dagger A U) = \mathrm{tr}(AUU^\dagger) = \mathrm{tr}(A)$ for any unitary change of basis U.

2.38 To show the linearity of the trace, i.e. $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$, and the property $\mathrm{tr}(zA) = z\,\mathrm{tr}(A)$.
We take the definition of the trace from equation (A.29):
$$\mathrm{tr}(A + B) = \sum_i \langle i|(A + B)|i\rangle = \sum_i \langle i|A|i\rangle + \sum_i \langle i|B|i\rangle$$
$$\therefore \mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$$
Hence, proved. The other statement can also be proved. Again, from definition (A.29):
$$\mathrm{tr}(zA) = \sum_i \langle i|zA|i\rangle$$
Since z is a number, it can be taken out of the summation: $\mathrm{tr}(zA) = z\,\mathrm{tr}(A)$.
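Both trace properties from 2.37 and 2.38 are easy to spot-check on random complex matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
z = 2.5 - 1.0j

assert np.isclose(np.trace(A @ B), np.trace(B @ A))            # cyclic (2.37)
assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))  # linearity (2.38)
assert np.isclose(np.trace(z * A), z * np.trace(A))            # scalars factor out
print("trace identities hold")
```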


2.39 To verify the Hilbert-Schmidt inner product on operators. The set of linear operators on a Hilbert space also forms a vector space over the field of complex numbers, as it satisfies the criteria of closure under operator addition and under scalar multiplication. The inner product on this vector space defined by $(A, B) = \mathrm{tr}(A^\dagger B)$ is called the Hilbert-Schmidt inner product. To show that $(\cdot, \cdot)$ so defined is indeed an inner product, one checks that it is linear in the second argument (by linearity of the trace, exercise 2.38), that $(A, B)^* = \mathrm{tr}(B^\dagger A) = (B, A)$, and that $(A, A) = \mathrm{tr}(A^\dagger A) = \sum_{ij}|A_{ij}|^2 \geq 0$, with equality only for $A = 0$.

2.40 To verify the commutation relations for the Pauli matrices.
With the Pauli matrices as defined in equation (A.24), we have:
$$[X, Y] = XY - YX = \begin{pmatrix}0&1\\1&0\end{pmatrix}\begin{pmatrix}0&-i\\i&0\end{pmatrix} - \begin{pmatrix}0&-i\\i&0\end{pmatrix}\begin{pmatrix}0&1\\1&0\end{pmatrix} = \begin{pmatrix}2i&0\\0&-2i\end{pmatrix} = 2iZ$$
$$[Y, Z] = YZ - ZY = \begin{pmatrix}0&-i\\i&0\end{pmatrix}\begin{pmatrix}1&0\\0&-1\end{pmatrix} - \begin{pmatrix}1&0\\0&-1\end{pmatrix}\begin{pmatrix}0&-i\\i&0\end{pmatrix} = \begin{pmatrix}0&2i\\2i&0\end{pmatrix} = 2iX$$
$$[Z, X] = ZX - XZ = \begin{pmatrix}1&0\\0&-1\end{pmatrix}\begin{pmatrix}0&1\\1&0\end{pmatrix} - \begin{pmatrix}0&1\\1&0\end{pmatrix}\begin{pmatrix}1&0\\0&-1\end{pmatrix} = \begin{pmatrix}0&2\\-2&0\end{pmatrix} = 2iY$$
Hence, proved.

2.41 To verify the anti-commutation relations for the Pauli matrices.
We take the definition of the Pauli matrices from equation (A.24). The anti-commutator is defined as $\{A, B\} = AB + BA$:
$$\{X, Y\} = \begin{pmatrix}0&1\\1&0\end{pmatrix}\begin{pmatrix}0&-i\\i&0\end{pmatrix} + \begin{pmatrix}0&-i\\i&0\end{pmatrix}\begin{pmatrix}0&1\\1&0\end{pmatrix} = \begin{pmatrix}i&0\\0&-i\end{pmatrix} + \begin{pmatrix}-i&0\\0&i\end{pmatrix} = 0$$
$$\{Y, Z\} = YZ + ZY = \begin{pmatrix}0&i\\i&0\end{pmatrix} + \begin{pmatrix}0&-i\\-i&0\end{pmatrix} = 0$$
$$\{Z, X\} = ZX + XZ = \begin{pmatrix}0&1\\-1&0\end{pmatrix} + \begin{pmatrix}0&-1\\1&0\end{pmatrix} = 0$$
We can also verify that $\sigma_i^2 = I$ for each Pauli matrix.
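The commutation relations of 2.40, the anti-commutation relations above, and $\sigma_i^2 = I$ can all be checked in a few lines of NumPy:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

comm = lambda A, B: A @ B - B @ A    # commutator [A, B]
anti = lambda A, B: A @ B + B @ A    # anti-commutator {A, B}

assert np.allclose(comm(X, Y), 2j * Z)
assert np.allclose(comm(Y, Z), 2j * X)
assert np.allclose(comm(Z, X), 2j * Y)
assert np.allclose(anti(X, Y), np.zeros((2, 2)))
assert np.allclose(anti(Y, Z), np.zeros((2, 2)))
assert np.allclose(anti(Z, X), np.zeros((2, 2)))
for P in (X, Y, Z):
    assert np.allclose(P @ P, np.eye(2))     # σ_i² = I
print("commutation and anti-commutation relations verified")
```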

2.42 To verify that $AB = \dfrac{[A,B] + \{A,B\}}{2}$.
Taking the RHS and using the definitions of the commutator and the anti-commutator:
$$\frac{[A,B] + \{A,B\}}{2} = \frac{AB - BA + AB + BA}{2} = AB$$
LHS = RHS. Hence, verified.

2.43 To show that $\sigma_i\sigma_j = \delta_{ij}I + i\displaystyle\sum_{k=1}^{3}\epsilon_{ijk}\sigma_k$, where the $\sigma$'s denote the Pauli matrices.
We can make use of the statement of the previous problem:
$$\sigma_i\sigma_j = \frac{[\sigma_i, \sigma_j] + \{\sigma_i, \sigma_j\}}{2}$$
From exercises 2.40 and 2.41, $[\sigma_i, \sigma_j] = 2i\displaystyle\sum_{k=1}^{3}\epsilon_{ijk}\sigma_k$ and $\{\sigma_i, \sigma_j\} = 2\delta_{ij}I$.
On plugging in these relations, we get:
$$\sigma_i\sigma_j = \frac{2\delta_{ij}I + 2i\displaystyle\sum_{k=1}^{3}\epsilon_{ijk}\sigma_k}{2} = \delta_{ij}I + i\sum_{k=1}^{3}\epsilon_{ijk}\sigma_k$$
Hence, proved.
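This identity can be verified exhaustively over all nine index pairs; the `levi_civita` helper below is a standard closed form for $\epsilon_{ijk}$ on indices 0–2:

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),      # σ1 = X
         np.array([[0, -1j], [1j, 0]]),                   # σ2 = Y
         np.array([[1, 0], [0, -1]], dtype=complex)]      # σ3 = Z
I2 = np.eye(2)

def levi_civita(i, j, k):
    # Sign of the permutation (i, j, k) of (0, 1, 2); 0 if any index repeats.
    return (i - j) * (j - k) * (k - i) / 2

for i in range(3):
    for j in range(3):
        rhs = (i == j) * I2 + 1j * sum(levi_civita(i, j, k) * sigma[k]
                                       for k in range(3))
        assert np.allclose(sigma[i] @ sigma[j], rhs)
print("σi σj = δij I + i Σk εijk σk verified for all i, j")
```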


2.44 Given $[A,B] = 0$ and $\{A,B\} = 0$ with A an invertible operator, to show that $B = 0$.
From problem [2.42] we have
$$AB = \frac{[A,B] + \{A,B\}}{2} = 0.$$
Since A is invertible, multiplying on the left by $A^{-1}$ gives $B = A^{-1}(AB) = 0$. Hence, proved.

2.45 To show that $[A,B]^\dagger = [B^\dagger, A^\dagger]$.
$$[A,B]^\dagger = (AB - BA)^\dagger = (AB)^\dagger - (BA)^\dagger = B^\dagger A^\dagger - A^\dagger B^\dagger = [B^\dagger, A^\dagger]$$
Hence, proved.

2.46 To show that $[A,B] = -[B,A]$.
$$[A,B] = AB - BA = -(BA - AB) = -[B,A]$$
Hence, proved.

2.47 Given A and B are Hermitian, to show that $i[A,B]$ is also Hermitian.
We must show that:
$$(i[A,B])^\dagger = i[A,B] \quad (A.30)$$
From problem [2.45], we have $[A,B]^\dagger = [B^\dagger, A^\dagger]$. Given that A and B are Hermitian:
$$[A,B]^\dagger = [B, A] = -[A,B] \quad (A.31)$$
From the LHS of equation (A.30), $(i[A,B])^\dagger = -i\,([A,B])^\dagger$. Using equation (A.31):
$$(i[A,B])^\dagger = -i\,(-[A,B]) = i[A,B],$$
so $i[A,B]$ is Hermitian. Hence, proved.

2.48 To find the polar decomposition of (a) a positive matrix P, (b) a Hermitian matrix H, (c) a unitary matrix U.
The polar decomposition of a matrix A consists of the positive matrices $J = \sqrt{A^\dagger A}$ and $K = \sqrt{AA^\dagger}$ together with a unitary T such that $A = TJ = KT$.
(a) Since a positive matrix is Hermitian with non-negative eigenvalues, $P^\dagger = P$ and $J = K = \sqrt{P^2} = P$. We may take $T = I$, so the polar decomposition is simply $P = IP = PI$.
(b) Here $J = K = \sqrt{H^2} = |H|$, the operator obtained by replacing each eigenvalue $h_i$ of H by $|h_i|$. The unitary $T = \sum_i \mathrm{sign}(h_i)\,|h_i\rangle\langle h_i|$ (taking $\mathrm{sign}(0) = 1$) then gives $H = T|H| = |H|T$. If H is positive, this reduces to case (a) with $T = I$.
(c) Since U is unitary, $J = \sqrt{U^\dagger U} = \sqrt{I} = I$ and likewise $K = I$, so the polar decomposition of a unitary matrix is $U = U \cdot I = I \cdot U$, with $T = U$ itself.


2.49 To express the polar decomposition of a normal matrix in outer product form.
Let A be a normal operator with spectral decomposition
$$A = \sum_i \lambda_i|\lambda_i\rangle\langle\lambda_i| \quad (A.32)$$
For the polar decomposition, we need $J = \sqrt{A^\dagger A}$ and $K = \sqrt{AA^\dagger}$. Since A is normal, $A^\dagger A = AA^\dagger = \sum_i |\lambda_i|^2\,|\lambda_i\rangle\langle\lambda_i|$, and taking the square root eigenvalue by eigenvalue:
$$J = K = \sum_i |\lambda_i|\,|\lambda_i\rangle\langle\lambda_i| \quad (A.33)$$
Writing each eigenvalue as $\lambda_i = |\lambda_i|e^{i\theta_i}$, the unitary $T = \sum_i e^{i\theta_i}|\lambda_i\rangle\langle\lambda_i|$ gives $A = TJ = KT$, which is the polar decomposition in outer product form.

2.50 To find the left and right polar decompositions of the matrix $A = \begin{pmatrix}1&0\\1&1\end{pmatrix}$.
Let J and K be the positive matrices $J = \sqrt{A^\dagger A}$ and $K = \sqrt{AA^\dagger}$.
$$A = \begin{pmatrix}1&0\\1&1\end{pmatrix} \Rightarrow A^\dagger = \begin{pmatrix}1&1\\0&1\end{pmatrix}$$
$$\therefore A^\dagger A = \begin{pmatrix}1&1\\0&1\end{pmatrix}\begin{pmatrix}1&0\\1&1\end{pmatrix} = \begin{pmatrix}2&1\\1&1\end{pmatrix} \qquad AA^\dagger = \begin{pmatrix}1&0\\1&1\end{pmatrix}\begin{pmatrix}1&1\\0&1\end{pmatrix} = \begin{pmatrix}1&1\\1&2\end{pmatrix}$$
To take the square root of $A^\dagger A$, we need its eigenvalues, found from the secular equation:
$$\det(A^\dagger A - \lambda I) = (2 - \lambda)(1 - \lambda) - 1 = \lambda^2 - 3\lambda + 1 = 0 \Rightarrow \lambda_\pm = \frac{3 \pm \sqrt{5}}{2}$$
($AA^\dagger$ has the same characteristic polynomial and hence the same eigenvalues.) The square roots of the eigenvalues are
$$\sqrt{\lambda_\pm} = \frac{\sqrt{5} \pm 1}{2}, \quad \text{since } \left(\frac{\sqrt{5} \pm 1}{2}\right)^2 = \frac{6 \pm 2\sqrt{5}}{4} = \frac{3 \pm \sqrt{5}}{2}.$$
Rather than constructing the projectors explicitly, note that $\sqrt{\lambda_\pm} = \dfrac{1}{\sqrt{5}}(1 + \lambda_\pm)$, as is easily checked, so the same linear relation holds for the matrices themselves:
$$J = \sqrt{A^\dagger A} = \frac{1}{\sqrt{5}}\left(I + A^\dagger A\right) = \frac{1}{\sqrt{5}}\begin{pmatrix}3&1\\1&2\end{pmatrix}$$
$$K = \sqrt{AA^\dagger} = \frac{1}{\sqrt{5}}\left(I + AA^\dagger\right) = \frac{1}{\sqrt{5}}\begin{pmatrix}2&1\\1&3\end{pmatrix}$$
(One verifies directly that $J^2 = A^\dagger A$ and $K^2 = AA^\dagger$.) Since A is invertible, the unitary part is $T = AJ^{-1}$:
$$J^{-1} = \frac{1}{\sqrt{5}}\begin{pmatrix}2&-1\\-1&3\end{pmatrix} \quad \Rightarrow \quad T = \frac{1}{\sqrt{5}}\begin{pmatrix}1&0\\1&1\end{pmatrix}\begin{pmatrix}2&-1\\-1&3\end{pmatrix} = \frac{1}{\sqrt{5}}\begin{pmatrix}2&-1\\1&2\end{pmatrix}$$
$$\therefore A = TJ = KT,$$
which are the left and right polar decompositions.
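SciPy computes the polar decomposition directly, which gives an independent check of J, K and the unitary part T found above:

```python
import numpy as np
from scipy.linalg import polar

A = np.array([[1.0, 0.0], [1.0, 1.0]])

T, J = polar(A)            # right-sided factorization: A = T J, J = sqrt(A†A)
K = T @ J @ T.conj().T     # then A = K T, with K = sqrt(A A†)

assert np.allclose(A, T @ J)
assert np.allclose(A, K @ T)
assert np.allclose(J, np.array([[3, 1], [1, 2]]) / np.sqrt(5))
assert np.allclose(K, np.array([[2, 1], [1, 3]]) / np.sqrt(5))
assert np.allclose(T, np.array([[2, -1], [1, 2]]) / np.sqrt(5))
print("polar decomposition of [[1,0],[1,1]] confirmed")
```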
