
arXiv:0907.0138v1 [cs.DM] 1 Jul 2009

A closest vector problem arising in

radiation therapy planning∗

Céline Engelbeen†, Samuel Fiorini‡, Antje Kiesel§

February 23, 2019

Abstract

In this paper we consider the problem of finding a vector that can be written as a nonnegative, integer and linear combination of given 0-1 vectors, the generators, such that the ℓ1-distance between this vector and a given target vector is minimized. We prove that this closest vector problem is NP-hard to approximate within an additive error of (ln 2 − ε)d ≈ (0.693 − ε)d for all ε > 0, where d is the dimension of the ambient vector space. We show that the problem can be approximated within an additive error of (e/4 + (ln 2)/2) d^{3/2} ≈ 1.026 d^{3/2} in polynomial time, by rounding an optimal solution of a natural LP relaxation for the problem. We also give a proof that in the particular case where the vectors satisfy the consecutive ones property, the problem can be formulated as a min-cost flow problem, hence can be solved in polynomial time.

The closest vector problem arises in the elaboration of radiation therapy plans. In this context, the target is a nonnegative integer matrix and the generators are certain binary matrices whose rows satisfy the consecutive ones property. Here we mainly consider the version of the problem in which the set of generators comprises all those matrices that have on each nonzero row a number of ones that is at least a certain constant. This set of generators encodes the so-called minimum separation constraint.

Keywords: closest vector problem, decomposition of integer matrices, consecutive ones property, minimum separation constraint, radiation therapy.

∗This research is supported by an “Actions de Recherche Concertées” (ARC) project of the “Communauté française de Belgique”. Céline Engelbeen is a research fellow of the “Fonds pour la Formation à la Recherche dans l’Industrie et dans l’Agriculture” (FRIA) and Antje Kiesel is a research fellow of the German National Academic Foundation.

†Department of Mathematics, Université Libre de Bruxelles CP 216, Boulevard du Triomphe, 1050 Brussels, Belgium. E-mail: [email protected].

‡Department of Mathematics, Université Libre de Bruxelles CP 216, Boulevard du Triomphe, 1050 Brussels, Belgium. E-mail: [email protected].

§Institute for Mathematics, University of Rostock, 18051 Rostock, Germany. E-mail: [email protected].


1 Introduction

Figure 1: A complete treatment unit at Saint-Luc Hospital (Brussels, Belgium).

Nowadays, radiation therapy is one of the most widely used methods for cancer treatment. The aim is to destroy the cancerous tumor by exposing it to radiation while trying to preserve the normal tissues and the healthy organs located in the radiation field. Radiation is commonly delivered by a linear accelerator (see Figure 1) whose arm is capable of doing a complete circle around the patient in order to allow different directions of radiation. In intensity modulated radiation therapy (IMRT) a multileaf collimator (MLC, see Figure 2) is used to modulate the radiation beam. This has the effect that radiation can be delivered in a more precise way by forming differently shaped fields and hence improves the quality of the treatment.

Figure 2: The multileaf collimator (MLC).



Figure 3: Leaf positions and irradiation times determining an exact decomposition of a 4 × 4 intensity matrix.

After the physician has diagnosed the cancer and has located the tumor as well as the organs located in the radiation field, the planning of the radiation therapy sessions is determined in three steps.

In the first step, a number of appropriate beam directions from which radiation will be delivered is chosen [13].

Secondly, the intensity function for each direction is determined. This function is encoded as an integer matrix in which each entry represents an elementary part of the radiation beam (called a bixel). The value of each entry is the intensity of the radiation that we want to send throughout the corresponding bixel.

Finally, the intensity matrix is segmented, since the linear accelerator can only send uniform radiation. This segmentation step mathematically consists in decomposing an m × n intensity matrix A into a nonnegative integer linear combination of certain binary matrices S = (s_{ij}) that satisfy the consecutive ones property in each row, that is, s_{iℓ} = 1 and s_{ir} = 1 for ℓ ≤ r implies s_{ij} = 1 for all ℓ ≤ j ≤ r. Such a binary matrix is called a segment.

In this paper, we focus on the case where the MLC is used in the so-called step-and-shoot mode, in which the patient is only irradiated when the leaves are not moving. Actually, segments are generated by the MLC and the segmentation step amounts to finding a sequence of MLC positions (see Figure 3). The generated intensity modulated field is just a superposition of homogeneous fields shaped by the MLC.

Throughout the paper, [k] denotes the set {1, 2, . . . , k} for an integer k, and [ℓ, r] denotes the set {ℓ, ℓ + 1, . . . , r} for integers ℓ and r with ℓ ≤ r. We also allow ℓ = r + 1, in which case [ℓ, r] = ∅. Thus, an m × n matrix S = (s_{ij}) is a segment iff there are integral intervals [ℓ_i, r_i] for i ∈ [m] such that

\[ s_{ij} = \begin{cases} 1 & \text{if } j \in [\ell_i, r_i], \\ 0 & \text{otherwise}. \end{cases} \]

A segmentation of the intensity matrix A is a decomposition

\[ A = \sum_{j=1}^{k} u_j S_j, \]

where u_j ∈ Z_+ and S_j is a segment for j ∈ [k]. The coefficients are required to be integers because in practice the radiation can only be delivered for times that are multiples of a given unit time, called a monitor unit.
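To make the segment notation concrete, here is a minimal sketch in Python (with hypothetical leaf positions and coefficients, not the data of Figure 3) that builds segments from their leaf intervals and superposes them into an intensity matrix:

import numpy as np

def segment(m, n, intervals):
    # intervals[i] = (l_i, r_i), 1-based and inclusive; (l, l - 1) encodes a closed row
    S = np.zeros((m, n), dtype=int)
    for i, (l, r) in enumerate(intervals):
        if r >= l:
            S[i, l - 1:r] = 1
    return S

# Hypothetical example: a 2 x 4 intensity matrix written as A = 2*S1 + 1*S2.
S1 = segment(2, 4, [(1, 3), (2, 4)])
S2 = segment(2, 4, [(2, 2), (3, 3)])
A = 2 * S1 + 1 * S2
print(A)   # rows: [2 3 2 0] and [0 2 3 2]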

In clinical applications a lot of constraints may arise that reduce the number of deliverable segments. For some technical or dosimetric reasons we might look for decompositions where only a subset of all segments is allowed. Those subsets might be explicitly given or defined by constraints like the interleaf collision constraint (denoted by ICC, also called interleaf motion constraint or interdigitation constraint, see [3], [7], [18] and [20]), the interleaf distance constraint (IDC, see [14]), the tongue-and-groove constraint (TGC, see [8], [21], [22], [23], [24], [25] and [31]), the minimum field size constraint (MFC, see [26]) or the minimum separation constraint (MSC, see [22]).

For some of those constraints, it is still possible to decompose A exactly using the set of feasible segments (as for the ICC, the IDC and the TGC); for others an exact decomposition might be impossible. In this last case, if S := {S_1, . . . , S_k} is our set of feasible segments, our aim is to find an approximation B that is decomposable with the segments in S and that minimizes

\[ \|A - B\|_1 := \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij} - b_{ij}|. \]

Later on in this paper, we will focus on the minimum separation constraint, which imposes a minimum leaf opening λ ∈ [n] in each open row of the irradiation field. More formally, a segment S given by its leaf positions ([ℓ_1, r_1], . . . , [ℓ_m, r_m]) satisfies the minimum separation constraint iff r_i ≥ ℓ_i implies r_i − ℓ_i ≥ λ − 1 for all i ∈ [m]. This constraint was first introduced by Kamath, Sahni, Li, Palta and Ranka in [22], where the problem of determining whether it is possible to decompose A under this constraint was solved. Here we show that the approximation problem under the minimum separation constraint can be solved in polynomial time with a minimum cost flow algorithm.

The approximation problem described above motivates the definition of the following Closest Vector Problem (CVP). Recall that the ℓ1-norm of a vector x ∈ R^d is defined by ‖x‖_1 := Σ_{i=1}^{d} |x_i|. We say that x is binary if x_i ∈ {0, 1} for all i ∈ [d]. The CVP is stated as follows:

Input: A set G = {g_1, . . . , g_k} of binary vectors in R^d (the generators) and a vector a in R^d (the target vector).

Goal: Find coefficients u_j ∈ Z_+ for j ∈ [k] such that ‖a − b‖_1 is minimum, where b := Σ_{j=1}^{k} u_j g_j.

Measure: The total change TC := ‖a − b‖_1.
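For intuition (and for testing on very small instances), the CVP can be solved by brute force; the following sketch, an illustrative helper only, enumerates all coefficient vectors up to a chosen bound:

import itertools
import numpy as np

def cvp_bruteforce(a, generators, u_max):
    # a: target vector of length d; generators: list of 0-1 vectors; u_max: bound on each u_j
    a = np.asarray(a)
    G = np.asarray(generators)
    best_u, best_tc = None, float("inf")
    for u in itertools.product(range(u_max + 1), repeat=len(G)):
        b = np.asarray(u) @ G                 # b = sum_j u_j g_j
        tc = np.abs(a - b).sum()              # total change ||a - b||_1
        if tc < best_tc:
            best_u, best_tc = u, tc
    return best_u, best_tc

# Example: target (2, 3, 1) with generators (1,1,0), (0,1,1), (1,0,0).
print(cvp_bruteforce([2, 3, 1], [[1, 1, 0], [0, 1, 1], [1, 0, 0]], u_max=3))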

The CVP that is the focus of the present paper differs significantly from the well-studied CVP on a lattice that is used in cryptography (see, e.g., the recent survey by Micciancio and Regev [30]). In that context, G is any set of k independent vectors in R^d (in particular, k ≤ d), and one seeks a vector of the lattice

\[ L := \Big\{ \sum_{j=1}^{k} u_j g_j \;\Big|\; u_j \in \mathbb{Z} \ \forall j \in [k] \Big\} \]


that is closest to the target vector a. In our case, the coefficients u_j are assumed to be nonnegative integers, and the vectors of G are not assumed to be independent (thus k can be a lot bigger than d).

In order to cope with the NP-hardness of the CVP, we design (polynomial-time) approximation algorithms. For the version of the CVP studied here it is natural to consider approximation algorithms with additive approximation guarantees (whereas approximation algorithms for the CVP on lattices have multiplicative guarantees).

We say that a polynomial-time algorithm A approximates the CVP within an additive error of ∆ if it outputs a feasible solution whose cost is at most OPT + ∆, where OPT is the cost of an optimal solution.

The rest of the paper is organized as follows: Section 2 is devoted to general results on the CVP. We start by proving that the particular case where the generators have the consecutive ones property is solvable in polynomial time. We afterwards show that, for a general collection G of generators, the CVP is NP-hard to approximate within an additive error of (ln 2 − ε) d ≈ (0.693 − ε) d, for all ε > 0 (which in particular implies that the CVP is NP-hard). We conclude this section with an analysis of a natural approximation algorithm for the problem based on randomized rounding. We prove that the approximation b returned by the algorithm has a TC that exceeds the optimum by an additive error of at most (e/4 + (ln 2)/2) d^{3/2} ≈ 1.026 d^{3/2}.

In Section 3, we focus on the particular instances of the CVP arising in IMRT described

above. We first show, using results of Section 2, that the problem can be solved in polynomial time when the generators encode the minimum separation constraint. We conclude the section with a further hardness of approximation result in case A is a 2 × n matrix: it is NP-hard to approximate the problem within an additive error of ε n, for some ε > 0 (in particular, the corresponding restriction of the CVP is NP-hard).

In Section 4, we generalize our results to the case where one does not only want to minimize the total change, but a combination of the total change and the sum of the coefficients u_j for j ∈ [k]. In the IMRT context, this sum represents the total time during which the patient is irradiated, called the beam-on time. It is desirable to minimize the beam-on time, e.g., in order to reduce the side effects caused by diffusion of the radiation as much as possible.

Finally, in Section 5, we conclude the paper with some open problems and point to connections between the CVP and discrepancy theory.

2 The closest vector problem

In this section we consider the CVP in its most general form. We first consider the particular case where the generators are 0-1 vectors satisfying the consecutive ones property and prove that the CVP is polynomial in this case. We afterwards prove that, for all ε > 0, there exists no polynomial-time approximation algorithm for the general case with an additive approximation guarantee of (ln 2 − ε) d unless P = NP, and conclude the section by providing an approximation algorithm with an additive approximation guarantee of (e/4 + (ln 2)/2) d^{3/2}.


2.1 Polynomial case

In this subsection we study the CVP under the assumption that the set G of generators is formed by k binary vectors in R^d satisfying the consecutive ones property. Generators will be denoted by g_{ℓ,r} where [ℓ, r] is the interval of ones of this generator, i.e., g = g_{ℓ,r} iff g_i = 1 for i ∈ [ℓ, r] and g_i = 0 otherwise. Let I be the set of intervals such that G = {g_{ℓ,r} | [ℓ, r] ∈ I}. We assume that there is no generator with an empty interval of ones (i.e., ℓ ≤ r always holds).

We solve such an instance of the CVP by formulating it as a minimum cost flow problem in an appropriate network. We show that the minimum cost of a flow in this network equals the minimum cost of a solution to the CVP.

Let a_0 = a_{d+1} := 0 and let the network D = (V, A) be defined as follows: The set of nodes is V := [d + 1] = {1, 2, . . . , d + 1} and the set of arcs is

\[ A := \{ (j, j+1) \mid j \in [d] \} \cup \{ (j+1, j) \mid j \in [d] \} \cup \{ (\ell, r+1) \mid [\ell, r] \in I \}. \]

Let us notice that parallel arcs can appear when the interval of a generator only contains one element. In such a case we keep both arcs: the one representing the generator and the other one.

The demand d(j) of each node j ∈ V is given by

\[ d(j) := a_{j-1} - a_j. \]

The capacities of the arcs are not restricted, and the costs are 1 for all arcs of type (j, j+1) and (j+1, j) and 0 for the other arcs. An example of the network is shown in Figure 4.

Figure 4: The network for an instance with d = 6 and k = 9.

Throughout this subsection we denote by δ^+(j) the set of arcs of D that leave node j and by δ^−(j) the set of arcs of D that enter node j. The amount of flow on the arcs of a set B ⊆ A is denoted by φ(B). So, mathematically:

\[ \delta^+(j) := \{ a \in A \mid \exists k \in V : a = (j, k) \}, \qquad \delta^-(j) := \{ a \in A \mid \exists i \in V : a = (i, j) \}, \qquad \phi(B) := \sum_{a \in B} \phi(a). \]
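To illustrate the construction (a sketch only, assuming the networkx library; its min_cost_flow routine uses a node attribute demand with the same convention as d(j) above, and for simplicity we additionally assume every generator interval has length at least two so that no parallel arcs arise and a plain DiGraph suffices):

import networkx as nx

def cvp_consecutive_ones(a, intervals):
    # a: target vector (a_1, ..., a_d) as a list; intervals: the set I of pairs (l, r), 1-based.
    d = len(a)
    ext = [0] + list(a) + [0]                      # a_0 = a_{d+1} := 0
    D = nx.DiGraph()
    for j in range(1, d + 2):
        D.add_node(j, demand=ext[j - 1] - ext[j])  # d(j) = a_{j-1} - a_j
    for j in range(1, d + 1):
        D.add_edge(j, j + 1, weight=1)             # deviation arcs, cost 1 each
        D.add_edge(j + 1, j, weight=1)
    for (l, r) in intervals:
        D.add_edge(l, r + 1, weight=0)             # generator arc for g_{l,r}, cost 0
    flow = nx.min_cost_flow(D)                     # integral, since all data are integral
    u = {(l, r): flow[l][r + 1] for (l, r) in intervals}
    return u, nx.cost_of_flow(D, flow)             # the cost equals min ||a - b||_1 by Theorem 1

# Example with d = 4: target (2, 3, 3, 1) and generator intervals [1,3], [2,4], [1,4].
print(cvp_consecutive_ones([2, 3, 3, 1], [(1, 3), (2, 4), (1, 4)]))

A generator whose interval has a single element would create an arc parallel to a deviation arc; handling that case requires a multigraph and is omitted from this sketch.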


Theorem 1. Let G and D = (V, A) be as above, let a ∈ R^d, and let OPT denote the optimal value of the corresponding CVP instance. Then OPT equals the minimum cost of a flow in D.

Proof. Let first φ : A → R_+ be a minimum cost flow in D. Since all capacities and demands in D are integral we may assume that φ is an integer flow. Let b be the corresponding approximation vector defined by its decomposition b = Σ_{[ℓ,r]∈I} u_{ℓ,r} g_{ℓ,r}, where

\[ u_{\ell,r} := \phi(\ell, r+1) \quad \text{for all } [\ell, r] \in I. \]

Because φ is a minimum cost flow, we have that

\[ \phi(j, j+1) = 0 \quad \text{or} \quad \phi(j+1, j) = 0, \qquad (1) \]

for all j ∈ [d]. Otherwise both flow values can be reduced by 1 and this reduces the cost, in contradiction to the optimality of φ.

In the decomposition of b, the j-th component receives a contribution from every generator whose interval contains the (j−1)-th component, except those whose interval ends in position j−1; in addition, it receives a contribution from the generators whose interval of ones starts in position j. Hence

\[ b_j - b_{j-1} = \sum_{[j,r] \in I} u_{j,r} - \sum_{[\ell,j-1] \in I} u_{\ell,j-1}. \qquad (2) \]

We will prove by induction on the vertices of D that

\[ \phi(j, j+1) = (a_j - b_j)^+ \quad \text{and} \quad \phi(j+1, j) = (b_j - a_j)^+, \qquad (3) \]

where x^+ := max{x, 0} for all x ∈ R. Notice that x − y = (x − y)^+ − (y − x)^+ holds for all x, y ∈ R. This fact is used repeatedly below.

For j = 1, we have d(1) = a_0 − a_1 = −a_1 and

\[ a_1 = \phi(\delta^+(1)) - \phi(\delta^-(1)) = \sum_{[1,r] \in I} u_{1,r} + \phi(1,2) - \phi(2,1) = b_1 + \phi(1,2) - \phi(2,1), \]

and thus φ(1, 2) = (a_1 − b_1)^+ and φ(2, 1) = (b_1 − a_1)^+. For j > 1, we have

\[ \begin{aligned} a_j - a_{j-1} &= \phi(\delta^+(j)) - \phi(\delta^-(j)) \\ &= \sum_{[j,r] \in I} u_{j,r} + \phi(j, j-1) + \phi(j, j+1) - \sum_{[\ell,j-1] \in I} u_{\ell,j-1} - \phi(j-1, j) - \phi(j+1, j) \\ &= b_j - b_{j-1} + (b_{j-1} - a_{j-1})^+ - (a_{j-1} - b_{j-1})^+ + \phi(j, j+1) - \phi(j+1, j) \\ &= b_j - b_{j-1} + b_{j-1} - a_{j-1} + \phi(j, j+1) - \phi(j+1, j), \end{aligned} \]


where the third equality holds by (2) and induction. Then we get

\[ a_j = b_j + \phi(j, j+1) - \phi(j+1, j) \]

and (3) holds for j. Hence, in particular, φ(j, j+1) + φ(j+1, j) = |a_j − b_j| for all j ∈ [d] by (1), and the cost of φ equals ‖a − b‖_1. Therefore a minimum cost flow in D leads to a solution of the CVP with the same cost.

Conversely, let an optimal decomposition b = Σ_{[ℓ,r]∈I} u_{ℓ,r} g_{ℓ,r} for the CVP instance be given. We define φ by

\[ \begin{aligned} \phi(\ell, r+1) &= u_{\ell,r} && \text{for all } [\ell, r] \in I, \\ \phi(j, j+1) &= (a_j - b_j)^+ && \text{for all } j \in [d], \\ \phi(j+1, j) &= (b_j - a_j)^+ && \text{for all } j \in [d]. \end{aligned} \]

It is easy to verify, using (2), that φ defines a flow in D. Furthermore, the cost of φ is the same as that of b.

Corollary 2. The restriction of the CVP to instances where the set G of generators is formed by vectors satisfying the consecutive ones property can be solved in polynomial time.

Proof. The proof of Theorem 1 gives a polynomial reduction from the considered restriction of the CVP to the minimum cost flow problem, which implies that the former is polynomial.

2.2 Hardness

In this subsection we prove that the CVP is in general hard to approximate within an additive error of (ln 2 − ε)d, for all ε > 0. To prove this, we consider the particular case where a is the all-one vector, i.e., a = 1. The given set G is formed of k binary vectors g_1, . . . , g_k in R^d. Because a is binary, the associated coefficients u_j for j ∈ [k] can be assumed to be binary as well.

Theorem 3. For all ε > 0, there exists no polynomial-time approximation algorithm for the CVP with an additive approximation guarantee of (ln 2 − ε) d, unless P = NP.

Proof. We define a gap-introducing reduction from 3SAT to the CVP (see Vazirani [33] for definitions). By combining the PCP theorem (see Arora, Lund, Motwani, Sudan and Szegedy [2]) and a reduction due to Feige [15] (see also Feige, Lovász and Tetali [16] and Cardinal, Fiorini and Joret [9]), we obtain a reduction from 3SAT to CVP with the following properties. For any given constants c > 0 and ξ > 0, it is possible to set the values of the parameters of the reduction in such a way that:

• The generators of G all have the same number d/t of ones, where t can be assumed to be larger than any given constant.


• If the 3SAT formula φ is satisfiable, then a can be exactly decomposed as a sum of t generators of G.

• If the 3SAT formula φ is not satisfiable, then the support of any linear combination of x generators chosen from G is of size at most d (1 − (1 − 1/t)^x + ξ), for 1 ≤ x ≤ ct.

Let u be an optimal solution of the instance of the CVP corresponding to the 3SAT formula φ and let x be the number of components of u equal to 1. Let us recall that in this proof the target vector a is the all-one vector, hence all components of u can be assumed to be binary. Let b = Σ_{j=1}^{k} u_j g_j and let OPT be the optimal TC of this instance, i.e., OPT = ‖a − b‖_1.

From what precedes, if φ is satisfiable then OPT = 0. We claim that if φ is not satisfiable, then OPT ≥ d (ln 2 − ε), provided t is large enough and ξ is small enough. Clearly, the result follows from this claim. We distinguish three cases.

• Case 1: x = 0.

In this case OPT = d (we have b = 0 and ‖a − b‖_1 = d).

• Case 2: 1 ≤ x ≤ ct.

Let ρ denote the number of components b_i of b that are nonzero. Thus d − ρ is the number of b_i equal to 0. The total change of b includes one unit for each component of b that is zero and a certain number of units caused by components of b larger than one. More precisely, we have:

\[ \begin{aligned} \mathrm{OPT} &= d - \rho + x\frac{d}{t} - \rho \\ &\geq d\Big(\frac{x}{t} + 1\Big) - 2d\Big(1 - \Big(1 - \frac{1}{t}\Big)^x + \xi\Big) \\ &= d\Big(\frac{x}{t} + 2\Big(1 - \frac{1}{t}\Big)^x - 1 - 2\xi\Big) \\ &= d\,\big((1 - \beta)x + 2\beta^x - 1 - 2\xi\big), \end{aligned} \]

where β := 1 − 1/t. Note that β < 1 and that taking t large corresponds to taking β close to 1. In order to derive the desired lower bound on OPT we now study the function f(x) := (1 − β)x + 2β^x. The first derivative of f is

\[ f'(x) = (1 - \beta) + 2 \ln \beta \cdot \beta^x. \]

It is easy to verify (since the second derivative of f is always positive) that f is convex and attains its minimum at

\[ x_{\min} = \frac{1}{\ln \beta} \cdot \ln\Big( \frac{\beta - 1}{2 \ln \beta} \Big). \]


Hence we have, for all x > 0,

\[ \begin{aligned} f(x) &\geq f(x_{\min}) = (1 - \beta)x_{\min} + 2\beta^{x_{\min}} \\ &= (1 - \beta) \cdot \frac{1}{\ln \beta} \cdot \ln\Big( \frac{\beta - 1}{2 \ln \beta} \Big) + \frac{\beta - 1}{\ln \beta} \\ &= \frac{\beta - 1}{\ln \beta} \Big( \ln\Big( \frac{2 \ln \beta}{\beta - 1} \Big) + 1 \Big). \end{aligned} \]

By l’Hospital’s rule,

\[ \lim_{\beta \to 1} \frac{\beta - 1}{\ln \beta} = 1, \]

hence we have

\[ f(x) \geq \ln 2 + 1 + 2\xi - \varepsilon \]

for t sufficiently large and ξ sufficiently small, which implies

\[ \mathrm{OPT} \geq d (\ln 2 - \varepsilon). \]

• Case 3: x > ct.

Let again ρ be the number of components b_i of b that are nonzero. The first ct generators used by the solution have some common nonzero entries. By taking into account the penalties caused by components of b larger than one, we have:

\[ \begin{aligned} \mathrm{OPT} &\geq ct \cdot \frac{d}{t} - d\Big(1 - \Big(1 - \frac{1}{t}\Big)^{ct} + \xi\Big) \\ &= d\Big(c - 1 + \Big(1 - \frac{1}{t}\Big)^{ct} - \xi\Big) \\ &\geq d\big(c - 1 + e^{-c} - \varepsilon\big) \\ &\geq d(\ln 2 - \varepsilon). \end{aligned} \]

The next-to-last inequality holds for t sufficiently large and ξ sufficiently small, and the last one holds for c large enough.

This concludes the proof of the claim. The theorem follows.

2.3 Approximation algorithm

In this subsection we give a polynomial-time approximation algorithm that returns a vector b such that ‖a − b‖_1 ≤ OPT + (e/4 + (ln 2)/2) d^{3/2}, where OPT denotes the cost of an optimal

solution of the CVP. This algorithm rounds an optimal solution of the following natural linear relaxation of the CVP:

\[ \begin{aligned} \text{(LP)} \quad \min \ & \sum_{i=1}^{d} \gamma_i \\ \text{s.t. } \ & \gamma_i \geq a_i - b_i && \forall i \in [d] \qquad (4) \\ & \gamma_i \geq b_i - a_i && \forall i \in [d] \qquad (5) \\ & b_i = \sum_{j=1}^{k} u_j g_{ij} && \forall i \in [d] \qquad (6) \\ & u_j \geq 0 && \forall j \in [k]. \qquad (7) \end{aligned} \]

In this formulation u_j is the coefficient of the generator g_j, b_i is the i-th component of the approximation b of the target vector a, and γ_i is (essentially) the deviation between a_i and b_i. Let LP be the value of an optimal solution of (LP). Obviously, we have OPT ≥ LP.

Note that for each vertex of the feasible region of (LP), there are at most d components of u that are nonzero. This is the case because, if we assume that ℓ > d nonzero coefficients exist, then only k − ℓ inequalities of type (7) are satisfied with equality. As we have 2d + k variables, we need at least 2d + k independent equalities to define a vertex. Thus, since the d equations of type (6) are always tight, there must be (2d + k) − (k − ℓ) − d = d + ℓ > 2d inequalities of type (4) and (5) that are satisfied with equality. This is a contradiction, as there are only 2d inequalities of type (4) and (5). Thus, for any extremal optimal solution of the linear program, at most d of the coefficients u_j are nonzero.

Algorithm 1

Input: The target vector a ∈ R^d and generators g_1, . . . , g_k ∈ {0, 1}^d.
Output: An approximation b of a.

Compute an extremal optimal solution (b*, γ*, u*) of (LP).
for all j ∈ [k] do
  if u*_j is integer, set u_j := u*_j; otherwise set
    u_j := ⌈u*_j⌉ with probability u*_j − ⌊u*_j⌋, and u_j := ⌊u*_j⌋ with probability ⌈u*_j⌉ − u*_j.
end for
b := Σ_{j=1}^{k} u_j g_j.
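As a concrete, hedged rendering of Algorithm 1, the sketch below solves (LP) with b = Gu substituted out, using scipy's linprog with the dual simplex backend so that an extreme point is returned, and then performs the randomized rounding; it is an illustration, not the authors' implementation:

import numpy as np
from scipy.optimize import linprog

def algorithm1(a, G, rng=None):
    # a: target in R^d; G: d x k 0-1 matrix whose columns are the generators g_j.
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a, dtype=float)
    G = np.asarray(G, dtype=float)
    d, k = G.shape
    # Variables (u_1..u_k, gamma_1..gamma_d); constraints (4)-(7) with b = G u.
    c = np.concatenate([np.zeros(k), np.ones(d)])
    I = np.eye(d)
    A_ub = np.block([[-G, -I],      # a - G u <= gamma
                     [ G, -I]])     # G u - a <= gamma
    b_ub = np.concatenate([-a, a])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs-ds")
    u_star = res.x[:k]              # extremal optimal solution u*
    frac = u_star - np.floor(u_star)
    u = np.floor(u_star) + (rng.random(k) < frac)   # round up with prob. equal to the fractional part
    b = G @ u
    return u, np.abs(a - b).sum()

G = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 1, 1]])
print(algorithm1([2, 3, 1], G))

Derandomizing the rounding step by the method of conditional expectations, as in Corollary 5 below, would replace the coin flips by a deterministic scan over the fractional coordinates.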

In the analysis of Algorithm 1 we use the following “Chernoff-Hoeffding type” estimate, which should be known although we could not find it in the literature.

Lemma 1. Let k be a positive integer and let X_1, X_2, . . . , X_k be k independent random variables such that, for all j ∈ [k], P[X_j = 1 − p_j] = p_j and P[X_j = −p_j] = 1 − p_j. Let X := X_1 + X_2 + · · · + X_k. Moreover, let σ_j² := p_j(1 − p_j) denote the variance of X_j and σ² := σ_1² + σ_2² + · · · + σ_k². Then

\[ \mathbb{E}[|X|] \leq \frac{1}{2}\lambda e^{\lambda} \sigma^2 + \frac{\ln 2}{\lambda} \]

for all λ > 0.


This lemma is proved in the appendix.
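As a quick numerical sanity check of Lemma 1 (an illustrative simulation, not part of the paper's argument), one can estimate E[|X|] by sampling and compare it with the bound for a few values of λ:

import numpy as np

rng = np.random.default_rng(1)
p = rng.random(50)                                 # arbitrary probabilities p_1, ..., p_k
sigma2 = (p * (1 - p)).sum()                       # sigma^2 = sum_j p_j (1 - p_j)

# X_j equals 1 - p_j with probability p_j and -p_j otherwise; X = sum_j X_j.
samples = ((rng.random((100_000, p.size)) < p) - p).sum(axis=1)
estimate = np.abs(samples).mean()

for lam in (0.5, 1.0, 2.0):
    bound = 0.5 * lam * np.exp(lam) * sigma2 + np.log(2) / lam
    print(f"lambda={lam}: estimated E|X| = {estimate:.3f}, bound = {bound:.3f}")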

We are now ready to prove the main result of this section.

Theorem 4. Algorithm 1 is a randomized polynomial-time algorithm that approximates the CVP within an expected additive error of (e/4 + (ln 2)/2) d^{3/2} ≈ 1.026 d^{3/2}.

Proof. We have:

\[ \begin{aligned} \mathbb{E}\big[\|a - b\|_1\big] &\leq \mathbb{E}\big[\|a - b^*\|_1\big] + \mathbb{E}\big[\|b^* - b\|_1\big] \\ &= \mathrm{LP} + \mathbb{E}\Big[\Big\| \sum_{j=1}^{k} u_j^* g_j - \sum_{j=1}^{k} u_j g_j \Big\|_1\Big] \\ &= \mathrm{LP} + \mathbb{E}\Big[ \sum_{i=1}^{d} \Big| \sum_{j=1}^{k} g_{ij} (u_j^* - u_j) \Big| \Big]. \end{aligned} \]

Let X_{ij} := g_{ij}(u_j^* − u_j) for all i ∈ [d] and j ∈ [k]. For each fixed i ∈ [d], X_{i1}, . . . , X_{ik} are independent random variables satisfying X_{ij} = 0 if g_{ij} = 0 or u_j^* ∈ Z, and otherwise

\[ X_{ij} = \begin{cases} u_j^* - \lceil u_j^* \rceil & \text{with probability } u_j^* - \lfloor u_j^* \rfloor, \\ u_j^* - \lfloor u_j^* \rfloor & \text{with probability } \lceil u_j^* \rceil - u_j^*. \end{cases} \]

We study the expectation of these random variables. For X_{ij} satisfying g_{ij} = 1 and u_j^* ∉ Z we have

\[ \mathbb{E}[X_{ij}] = \big(u_j^* - \lceil u_j^* \rceil\big)\big(u_j^* - \lfloor u_j^* \rfloor\big) + \big(u_j^* - \lfloor u_j^* \rfloor\big)\big(\lceil u_j^* \rceil - u_j^*\big) = 0. \]

If g_{ij} = 0 or u_j^* ∈ Z, the expectation is still zero. Let σ_{ij}² denote the variance of X_{ij}. In case g_{ij} = 0 or u_j^* ∈ Z, we have σ_{ij}² = 0. Otherwise, we have σ_{ij}² = p_j(1 − p_j) where p_j := u_j^* − ⌊u_j^*⌋. Let X_i := Σ_{j=1}^{k} X_{ij} for all i ∈ [d]. We get

\[ \begin{aligned} \mathbb{E}\big[\|a - b\|_1\big] &\leq \mathrm{LP} + \mathbb{E}\Big[ \sum_{i=1}^{d} \Big| \sum_{j=1}^{k} X_{ij} \Big| \Big] = \mathrm{LP} + \sum_{i=1}^{d} \mathbb{E}[|X_i|] \\ &\leq \mathrm{LP} + \sum_{i=1}^{d} \Big( \frac{1}{2}\lambda e^{\lambda} \sigma_i^2 + \frac{\ln 2}{\lambda} \Big) \end{aligned} \]

for all λ > 0, where σ_i² := σ_{i1}² + · · · + σ_{ik}². The last inequality holds by Lemma 1. If σ_i ≤ 1, we may take λ constant, e.g., λ = 1. In this case, we find

\[ \mathbb{E}[|X_i|] \leq \frac{1}{2}e + \ln 2 \leq \Big( \frac{e}{4} + \frac{\ln 2}{2} \Big) d^{1/2} \]


assuming d ≥ 4. If σ_i > 1, we take λ = 1/σ_i, so that λ < 1, and

\[ \mathbb{E}[|X_i|] \leq \frac{1}{2}e \cdot \sigma_i + \ln 2 \cdot \sigma_i. \]

Since p_j(1 − p_j) is maximum when p_j = 1/2 and, as noted above, at most d of the u_j^* are nonzero, we get σ_i ≤ √(d/4) = ½√d and thus

\[ \mathbb{E}\big[\|b - a\|_1\big] \leq \mathrm{LP} + d \Big( \frac{e}{4} + \frac{\ln 2}{2} \Big) d^{1/2} \leq \mathrm{OPT} + \Big( \frac{e}{4} + \frac{\ln 2}{2} \Big) d^{3/2}. \]

By derandomizing Algorithm 1 with the method of conditional probabilities and expectations (see Alon and Spencer [1]), we obtain the following result:

Corollary 5. The CVP can be approximated within an additive error of (e/4 + (ln 2)/2) d^{3/2} in polynomial time.

3 Application to IMRT

In this section we consider a target matrix A and the set of generators S formed by segments (that is, binary matrices whose rows satisfy the consecutive ones property). In the first part of this section we consider that S is formed by all the segments that satisfy the minimum separation constraint. In the last part we consider any set of segments S. We show that in this last case the problem is hard to approximate, even if the matrix has only two rows.

3.1 The minimum separation constraint

In this subsection we consider the CVP under the constraint that the set S of generators is formed by all segments that satisfy the minimum separation constraint. Given λ ∈ [n], this constraint requires that the rows which are not totally closed have a leaf opening of at least λ. Mathematically, the leaf positions of open rows i ∈ [m] have to satisfy r_i − ℓ_i ≥ λ − 1. Not every matrix A can be decomposed under this constraint. Indeed, the following single-row matrix cannot be decomposed for λ = 3:

\[ A = \begin{pmatrix} 1 & 1 & 4 & 1 & 1 \end{pmatrix}. \]

The problem of determining whether it is possible to decompose matrix A under this constraint was proved to be polynomial by Kamath et al. [22].


Obviously, the minimum separation constraint is a restriction on the leaf openings in each single row, but it does not affect the combination of leaf openings in different rows. More formally, the set of allowed leaf openings in one row i is

\[ S_i = \{ [\ell_i, r_i] \mid r_i - \ell_i \geq \lambda - 1 \ \text{or} \ r_i = \ell_i - 1 \}, \]

and it does not depend on i. If we denote a segment by the set of its leaf positions ([ℓ_1, r_1], . . . , [ℓ_m, r_m]), then the set of feasible segments S for the minimum separation constraint is simply S = S_1 × · · · × S_m. Thus, in order to solve the CVP under the minimum separation constraint, it is sufficient to focus on single rows.

Indeed, whenever the set of feasible segments has a product structure S = S_1 × · · · × S_m, meaning that single-row solutions can be combined arbitrarily and always yield a feasible segment, solving the single-row problem is sufficient.
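Concretely, for a single row of length n the allowed intervals under the minimum separation constraint are all [ℓ, r] with r − ℓ ≥ λ − 1, and each row can be handed directly to the min-cost flow sketch of Section 2.1 (this reuses the hypothetical cvp_consecutive_ones helper defined there; with λ ≥ 2 its no-parallel-arc assumption is satisfied):

def msc_intervals(n, lam):
    # All leaf openings [l, r] of length at least lambda in a row of n bixels (1-based).
    return [(l, r) for l in range(1, n + 1) for r in range(l + lam - 1, n + 1)]

# The single-row matrix A = (1 1 4 1 1) with lambda = 3 admits no exact decomposition,
# so the flow formulation returns a closest approximation and its total change.
row = [1, 1, 4, 1, 1]
u, total_change = cvp_consecutive_ones(row, msc_intervals(len(row), 3))
print(total_change, {iv: c for iv, c in u.items() if c > 0})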

From Corollary 2, we then infer our next result.

Corollary 6. The restriction of the CVP where vectors are m × n matrices and the set of generators is the set of all segments satisfying the minimum separation constraint can be solved in polynomial time.

3.2 Further hardness results

As we have seen in Subsection 2.1, the CVP with generators satisfying the consecutive ones property is polynomial (see Theorem 1). Moreover, we have proved in Theorem 3 that the CVP is hard to approximate within an additive error of (ln 2 − ε) d for a general set of generators. We now prove that, surprisingly, the case where generators contain at most two blocks of ones, which in the IMRT context corresponds to having a 2 × n intensity matrix A and a set of generators formed by 2 × n segments, is hard to approximate within an additive error of ε n, for some ε > 0.

Theorem 7. There exists some ε > 0 such that it is NP-hard to approximate the CVP, restricted to 2 × n matrices and generators with their ones consecutive on each row, with an additive error of at most ε n.

Proof. We prove this by reduction from the MAX-3SAT6 problem. Let us recall the problem:

Instance: A CNF formula in which each clause contains three literals, and each variable appears non-negated in three clauses and negated in three other clauses.

Goal: Find truth values for the variables such that the number of satisfied clauses is maximized.

Measure: The number of satisfied clauses.

Let a MAX-3SAT6 instance Φ in the variables x_1, . . . , x_s be given. Let c_1, . . . , c_t denote the clauses of Φ. We say that a variable x_i and a clause c_j are incident if c_j involves x_i or its negation x̄_i. By double-counting the incidences between variables and clauses, we find 6s = 3t, that is, t = 2s.

We build an instance of the restricted CVP as follows: A is a matrix with two rows. In the first row of A, we put s consecutive intervals, each containing 5 consecutive ones, followed by 5s consecutive zeroes. In the second row of A, we put t = 2s consecutive intervals, each containing 5 consecutive ones. Thus A has 5s ones followed by 5s zeroes in its first row, and 10s ones in its second row. The size of A is 2 × n, where n = 10s = 5t.

To each variable x_i there corresponds an interval I(x_i) := [5i − 4, 5i] of 5 ones in the first row of A (in this proof, for the sake of simplicity, we identify intervals of the form [ℓ, r] with the 1 × n row vector they represent). We also let I(x̄_i) := I(x_i).

For each interval I(x_i) = I(x̄_i) = [5i − 4, 5i] we consider two decompositions into three sub-intervals that correspond to setting the variable x_i true or false. The decomposition corresponding to setting x_i true is I(x_i) = I_1(x_i) + I_2(x_i) + I_3(x_i), where I_1(x_i) := [5i − 4, 5i − 4], I_2(x_i) := [5i − 3, 5i − 2] and I_3(x_i) := [5i − 1, 5i]. The decomposition corresponding to setting x_i false is I(x̄_i) = I_1(x̄_i) + I_2(x̄_i) + I_3(x̄_i), where I_1(x̄_i) := [5i − 4, 5i − 3], I_2(x̄_i) := [5i − 2, 5i − 1] and I_3(x̄_i) := [5i, 5i]. An illustration is given in Figure 5.


Figure 5: The sub-intervals used for the variable gadgets.

Similarly, to each clause c_j there corresponds an interval I(c_j) := [5j − 4, 5j] of 5 ones in the second row of A. We define ten sub-intervals that can be combined in several ways to decompose I(c_j). We let I_1(c_j) := [5j − 4, 5j − 4], I_2(c_j) := [5j − 2, 5j − 2], I_3(c_j) := [5j, 5j], I_4(c_j) := [5j − 3, 5j − 3], I_5(c_j) := [5j − 1, 5j − 1], I_6(c_j) := [5j − 4, 5j − 3], I_7(c_j) := [5j − 1, 5j], I_8(c_j) := [5j − 3, 5j − 1], I_9(c_j) := [5j − 4, 5j − 1] and I_10(c_j) := [5j − 3, 5j]. An illustration is given in Figure 6.

The first three sub-intervals I_α(c_j) (α ∈ {1, 2, 3}) correspond to the three literals of the clause c_j. The last seven sub-intervals I_α(c_j) (α ∈ {4, . . . , 10}) alone are not sufficient to decompose I(c_j) exactly. In fact, if we prescribe any subset of the first three sub-intervals in a decomposition, we can complete it using some of the last seven intervals to an exact decomposition of I(c_j) in all cases but one: if none of the first three sub-intervals is part of the decomposition, the best we can do is to approximate I(c_j) by, e.g., I_9(c_j), resulting in a total change of 1 for the interval.

The CVP instance has k = 20s = 10t allowed segments. The first 3t segments correspond to pairs (ℓ_i, c_j) where ℓ_i ∈ {x_i, x̄_i} is a literal and c_j is a clause involving ℓ_i. Consider such a pair (ℓ_i, c_j) and assume that ℓ_i is the α-th literal of c_j, and c_j is the β-th clause containing ℓ_i (thus α, β ∈ {1, 2, 3}). The segment associated to the pair (ℓ_i, c_j) is S(ℓ_i, c_j) := (I_β(ℓ_i), I_α(c_j)). The last 7t segments are of the form S_γ(c_j) := (∅, I_γ(c_j))



Figure 6: The sub-intervals used for the clause gadgets.

where c_j is a clause and γ ∈ {4, . . . , 10}. We denote the resulting set of segments by S = {S_1, . . . , S_k}. This concludes the description of the reduction. Note that the reduction is clearly polynomial.

Now suppose we have a truth assignment that satisfies σ of the clauses of Φ. Then we can find an approximate decomposition of A with a total change of t − σ by summing all 3s segments of the form S(ℓ_i, c_j), where ℓ_i = x_i if x_i is set to true and ℓ_i = x̄_i if x_i is set to false, together with a suitable subset of the segments S_γ(c_j).

Conversely, assume that we have an approximate decomposition B := Σ_{j=1}^{k} u_j S_j of A with a total change of TC = ‖A − B‖_1. Because A is binary, we may assume that u is also binary. We say that a variable x_i is coherent (w.r.t. B) if the deviation TC(x_i) for the interval I(x_i) is zero. The variable is said to be incoherent otherwise. Similarly, we say that a clause c_j is satisfied (again, w.r.t. B) if the deviation TC(c_j) for the interval I(c_j) is zero, and unsatisfied otherwise.

We can modify the approximation B = Σ_{j=1}^{k} u_j S_j in such a way as to make all variables coherent, without increasing TC, for the following reasons.

First, while there exist indices α, i, j and j′ such that S(x_i, c_j) and S(x̄_i, c_{j′}) are both used in the decomposition and c_j, c_{j′} are the α-th clauses containing x_i and x̄_i, respectively, we can remove one of these two segments from the decomposition and change the segments of the form S_γ(c_j) or S_γ(c_{j′}) that are used, in such a way that TC does not increase. More precisely, TC(x_i) decreases by at least one, and TC(c_j) or TC(c_{j′}), but not both, increases by at most one. Thus we can assume that, for all indices α, i, j and j′ such that c_j is the α-th clause containing x_i and c_{j′} is the α-th clause containing x̄_i, at most one of the two segments S(x_i, c_j) and S(x̄_i, c_{j′}) is used in the decomposition. Similarly, we can assume that, for every such i, j and j′, exactly one of the two segments S(x_i, c_j) and S(x̄_i, c_{j′}) is used in the decomposition.

Second, while some variable x_i remains incoherent, we can replace a segment of the form S(ℓ_i, c_j), where ℓ_i ∈ {x_i, x̄_i}, with a segment of the form S(ℓ̄_i, c_{j′}) and change some segments of the form S_γ(c_j) or S_γ(c_{j′}), ensuring that TC does not increase (again, TC(x_i) decreases by at least one and either TC(c_j) or TC(c_{j′}) increases by at most one). Therefore, we can assume that all variables are coherent.

Now, by interpreting the relevant part of the decomposition of B as a truth assignment,


we obtain truth values for the variables of Φ satisfying at least t − TC of its clauses.

Letting OPT(Φ) and OPT(A, S) denote, respectively, the maximum number of clauses of Φ that can be simultaneously satisfied and the optimal total change of an approximation of A using the segments in S, we have OPT(Φ) = t − OPT(A, S).

There exists some δ ∈ (0, 1) such that it is NP-hard to distinguish, among all MAX-3SAT6 instances Φ with t clauses, instances such that OPT(Φ) = t from instances such that OPT(Φ) < δ t. (This is one of the many consequences of the PCP theorem; see, e.g., [16].) Using the above reduction, we conclude the following. It is NP-hard to distinguish, among all instances (A, S) of the restricted CVP, instances such that OPT(A, S) = 0 from instances such that OPT(A, S) > ((1 − δ)/5) n. The result follows (we may take, e.g., ε = (1 − δ)/5).

4 Incorporating the beam-on time in the objective function

In this section we generalize the results of this paper to the case where we do not only want to minimize the total change, but a combination of the total change and the sum of the coefficients u_j for j ∈ [k], that is,

\[ \min\Big\{ \alpha \cdot \sum_{i=1}^{d} |a_i - b_i| + \beta \cdot \sum_{j=1}^{k} u_j \ \Big|\ u \in \mathbb{Z}_+^k \Big\}, \]

where b = Σ_{j=1}^{k} u_j g_j and α and β are arbitrary nonnegative importance factors. Throughout this section, we study the CVP under this objective function. The resulting problem is denoted CVP-BOT.

Let us recall that in the IMRT context the generators from G represent segments that can be generated by the MLC. The coefficient u_j associated to the segment g_j for j ∈ [k] gives the total time during which the patient is irradiated with the leaves of the MLC in a certain position. Hence, the sum of the coefficients exactly corresponds to the total time during which the patient is irradiated (the beam-on time).

In order to avoid, as much as possible, overdosage in the healthy tissues due to unavoidable diffusion effects, it is desirable to take the beam-on time into account in the objective function. We observe here that the main results of the previous sections still hold with the new objective function. For hardness results, this is obvious because taking α = 1 and β = 0 gives back the original objective function.

Corollary 8. Let G and D = (V, A) be as described in Section 2.1, let a ∈ R^d, and let OPT denote the optimal value of the corresponding CVP-BOT instance. If we redefine the costs of the arcs of D by letting the cost of arcs of the form (j, j + 1) or (j + 1, j) (for j ∈ [d]) be α, and the costs of the other arcs be β, then OPT equals the minimum cost of a flow in D.

Proof. The result follows easily from the proof of Theorem 1.


We can also find an approximation algorithm with an additive error of at most α · (e/4 + (ln 2)/2) d^{3/2}. The algorithm is identical to Algorithm 1, except that it uses the following linear relaxation:

\[ \begin{aligned} \text{(LP-BOT)} \quad \min \ & \alpha \cdot \sum_{i=1}^{d} \gamma_i + \beta \cdot \sum_{j=1}^{k} u_j \\ \text{s.t. } \ & \gamma_i \geq a_i - b_i && \forall i \in [d] \\ & \gamma_i \geq b_i - a_i && \forall i \in [d] \\ & b_i = \sum_{j=1}^{k} u_j g_{ij} && \forall i \in [d] \\ & u_j \geq 0 && \forall j \in [k]. \end{aligned} \]
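In the LP-based sketch given after Algorithm 1, only the objective changes; a hedged variant for CVP-BOT (with α and β as chosen importance factors) looks as follows:

import numpy as np
from scipy.optimize import linprog

def algorithm1_bot(a, G, alpha, beta, rng=None):
    # Same rounding as Algorithm 1, but the LP objective is alpha*sum(gamma) + beta*sum(u).
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a, dtype=float)
    G = np.asarray(G, dtype=float)
    d, k = G.shape
    c = np.concatenate([beta * np.ones(k), alpha * np.ones(d)])
    I = np.eye(d)
    A_ub = np.block([[-G, -I], [G, -I]])
    b_ub = np.concatenate([-a, a])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs-ds")
    u_star = res.x[:k]
    u = np.floor(u_star) + (rng.random(k) < u_star - np.floor(u_star))
    return u, np.abs(a - G @ u).sum(), u.sum()     # coefficients, total change, beam-on time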

We refer to this algorithm as Algorithm 1′. Adapting the proof of Theorem 4 yields the following result.

Corollary 9. Algorithm 1′ is a randomized polynomial-time algorithm that approximates CVP-BOT within an expected additive error of α · (e/4 + (ln 2)/2) d^{3/2} ≈ 1.026 α · d^{3/2}.

Proof. By linearity of expectation, we have:

\[ \begin{aligned} \mathbb{E}\Big[ \alpha \cdot \|a - b\|_1 + \beta \cdot \sum_{j=1}^{k} u_j \Big] &= \alpha \cdot \mathbb{E}\big[\|a - b\|_1\big] + \beta \cdot \mathbb{E}\Big[\sum_{j=1}^{k} u_j\Big] \\ &\leq \alpha \cdot \mathbb{E}\big[\|a - b^*\|_1\big] + \beta \cdot \sum_{j=1}^{k} u_j^* + \alpha \cdot \mathbb{E}\big[\|b^* - b\|_1\big] \\ &\leq \text{LP-BOT} + \alpha \cdot \Big( \frac{e}{4} + \frac{\ln 2}{2} \Big) d^{3/2}. \end{aligned} \]

Finally, by derandomizing Algorithm 1’, we obtain the following result.

Corollary 10. There exists a polynomial-time algorithm that approximates CVP-BOT within an additive error of α · (e/4 + (ln 2)/2) d^{3/2}.

5 Relation to discrepancy theory and open questions

We begin this section by relating our results to discrepancy theory and conclude with the open questions. For background on discrepancy theory and its applications see, e.g., Beck and Chen [4], Alon and Spencer [1], Beck and Sós [6], Matoušek [28] and Chazelle [10].

In this section, A is again an m × n matrix, but no longer an intensity matrix. Rather, we assume that A is an m × n binary matrix whose rows may be interpreted as the characteristic vectors of the sets of a set system of size m on n points. This assumption keeps us within the realm of combinatorial discrepancy. Note that most of the notions and results are nevertheless valid for general m × n matrices A.


The linear discrepancy of A with respect to a vector w ∈ [0, 1]^n is defined by

\[ \mathrm{lindisc}(A, w) := \min_{z \in \{0,1\}^n} \|A(w - z)\|_\infty, \qquad (8) \]

and the linear discrepancy of A by

\[ \mathrm{lindisc}(A) := \max_{w \in [0,1]^n} \mathrm{lindisc}(A, w). \qquad (9) \]

For each vector w ∈ [0, 1]^n, we may regard each z ∈ {0, 1}^n as being obtained by rounding the coordinates of w in some way, and the linear discrepancy of A with respect to w as a measure of how close the rounded vector z is to the original (fractional) vector w.

The linear discrepancy of A is closely related to the discrepancy of A and its submatrices. The discrepancy of A is defined as disc(A) := min_{χ ∈ {−1,1}^n} ‖Aχ‖_∞ = 2 lindisc(A, ½·1), and the hereditary discrepancy herdisc(A) as the maximum discrepancy of a submatrix of A. We have ½ disc(A) ≤ lindisc(A) ≤ herdisc(A), where the first inequality is trivial and the second is due to Lovász, Spencer and Vesztergombi [27].

For p ∈ [1, ∞) and w ∈ [0, 1]^n, we may also define the ℓ_p linear discrepancy of A (with respect to w) by replacing the ℓ_∞-norm in (8) and (9) by the ℓ_p-norm and scaling down by a factor of m^{−1/p}. These two quantities are respectively denoted by lindisc_p(A, w) and lindisc_p(A). The definition of ℓ_p linear discrepancy above is compatible with the existing definition of ℓ_p discrepancy; see Matoušek [29].
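For intuition, these quantities can be evaluated by brute force on tiny matrices; the sketch below (illustrative only) computes lindisc_1(A, w) directly from definition (8) with the ℓ_1-norm and the scaling factor 1/m:

import itertools
import numpy as np

def lindisc1(A, w):
    # l1 linear discrepancy of A w.r.t. w: min over roundings z in {0,1}^n of (1/m)*||A(w - z)||_1
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    w = np.asarray(w, dtype=float)
    return min(np.abs(A @ (w - np.array(z))).sum() / m
               for z in itertools.product((0, 1), repeat=n))

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 1, 1]])
print(lindisc1(A, [0.5, 0.5, 0.5]))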

In order to relate the CVP to discrepancy theory, consider the d × k matrix G := (g_1, . . . , g_k) with the k generators of G as columns, and let (b*, γ*, u*) be an optimal solution of (LP). Thus u* minimizes ‖a − Gu‖_1 among all nonnegative vectors u ∈ R^k_+. Now write u* = ⌊u*⌋ + {u*}, where the i-th component of ⌊u*⌋ and {u*} is the floor and the fractional part of u*_i, respectively.

From a high-level point of view, our strategy to approximate the CVP can be described as “project and round”. After projecting the target vector a to the point b* = Gu* of the cone generated by G that is closest to a, we round b* = Gu* to a feasible solution b = Gu with u = ⌊u*⌋ + v and v ∈ {0, 1}^k.

fact that no feasible solution b can be closer to a than b∗, to get:

‖a−Gu‖1 6 ‖a−Gu∗‖1 + ‖Gu∗ −Gu‖16 OPT + ‖G ({u∗} − v)‖1 .

The second term in the above inequalities determines the approximation guarantee of ouralgorithm. We showed that the rounding vector v ∈ {0, 1}k found by the randomizedrounding achieves ‖G ({u∗} − v)‖1 6 ( e

4+ ln 2

2) d3/2, on average. Obtaining the best pos-

sible rounding vector v amounts to determining minv∈{0,1}k ‖G ({u∗} − v)‖1, a quantitythat equals d times the ℓ1 linear discrepancy of G with respect to {u∗}.

Assume now that (b*, γ*, u*) is an extremal optimal solution of (LP). As shown above, this implies that at most d of the components of u* are nonzero. This means that we can restrict the matrix G to a d × d matrix, and forget all but d components of the vectors {u*} and v. From now on, we assume G ∈ {0, 1}^{d×d}, {u*} ∈ [0, 1]^d and v ∈ {0, 1}^d.


It follows from a result of Spencer [32] that herdisc(G) < 6√d. By Lovász et al.'s inequality stated above and the easy inequality d^{−1}‖x‖_1 ≤ ‖x‖_∞ for a vector x ∈ R^d, we then have:

\[ \mathrm{lindisc}_1(G) \leq \mathrm{lindisc}(G) \leq \mathrm{herdisc}(G) < 6\sqrt{d}. \]

This implies that there always exists v ∈ {0, 1}^d with ‖G({u*} − v)‖_1 < 6 d^{3/2}. Thus a result similar to Theorem 4 readily follows from two standard results in discrepancy theory. However, we remark that our proof of Theorem 4 is elementary and that our constant in front of the d^{3/2} is smaller (roughly 1.026 instead of 6).

Other results from discrepancy theory have implications for our work. For instance, Ghouila-Houri [17] has proved that an m × n matrix A is totally unimodular iff herdisc(A) ≤ 1. Hence, if G is totally unimodular then we get lindisc_1(G) ≤ 1. By a result of Doerr [11], this can be sharpened to lindisc_1(G) ≤ 1 − 1/(d+1). Therefore, it is possible to approximate the CVP within an additive error of d − d/(d+1) in the case where G is totally unimodular.

Here are some questions we left open for future work:

• The first type of question concerns the tightness of the (in)approximability results developed here. We proved that there is no polynomial-time approximation algorithm with an (additive) approximation guarantee of (ln 2 − ε) d, unless P = NP. On the other hand, we gave an approximation algorithm with an approximation guarantee of (e/4 + (ln 2)/2) d^{3/2}. What is the true approximability threshold of the CVP? We believe that Θ(d^{3/2}) is closer to the truth than Θ(d).

Should that question be too difficult, determining the integrality gap of the LP formulation of the CVP used in Section 2.3 would already be interesting, in our opinion. Here, the integrality gap equals the worst difference OPT − LP over instances of the CVP. As the above discussion indicates, this is essentially a question in discrepancy theory.

• The second type of question addresses further ways in which our results could be extended. For instance, it would be interesting to understand how our results generalize when the ℓ_1-norm is replaced by an ℓ_p-norm with p ∈ (1, ∞].

• The third type of question is more algorithmic and concerns the application of the CVP to IMRT. There exists a simple, direct algorithm for checking whether an intensity matrix A can be decomposed exactly under the minimum separation constraint or not [22, 19]. Is there a simple, direct algorithm for approximately decomposing an intensity matrix under this constraint as well?

6 Acknowledgments

We thank Cigdem Guler and Horst Hamacher from the University of Kaiserslautern for several discussions in the early stage of this research project. We also thank Maude Gathy and Yvik Swan from the Université Libre de Bruxelles for their help with the probabilistic aspects of our work.


References

[1] N. Alon and J. Spencer, The Probabilistic Method, Wiley-Interscience Series in Discrete Mathematics and Optimization, New York, 1992.

[2] S. Arora, C. Lund, R. Motwani, M. Sudan and M. Szegedy, Proof verification and the hardness of approximation problems, Journal of the ACM, 45(3), 501–555, 1998.

[3] D. Baatar, H.W. Hamacher, M. Ehrgott and G.J. Woeginger, Decomposition of integer matrices and multileaf collimator sequencing, Discrete Appl. Math., 152, 6–34, 2005.

[4] J. Beck and W. Chen, Irregularities of distribution, Cambridge University Press, Cambridge, 1987.

[5] J. Beck and T. Fiala, Integer-making theorems, Discrete Appl. Math., 3, 1–8, 1981.

[6] J. Beck and V. Sós, Discrepancy theory, in: Handbook of Combinatorics, North-Holland, Amsterdam, 1995, pages 1405–1446.

[7] N. Boland, H.W. Hamacher and F. Lenzen, Minimizing beam-on time in cancer radiation treatment using multileaf collimators, Networks, 43(4), 226–240, 2004.

[8] T.R. Bortfeld, D.L. Kahler, T.J. Waldron and A.L. Boyer, X-ray field compensation with multileaf collimators, Int. J. Radiat. Oncol. Biol. Phys., 28, 723–730, 1994.

[9] J. Cardinal, S. Fiorini and G. Joret, Tight results on minimum entropy set cover, Algorithmica, 51(1), 2008.

[10] B. Chazelle, The discrepancy method: Randomness and complexity, Cambridge University Press, Cambridge, 2000.

[11] B. Doerr, Linear discrepancy of totally unimodular matrices, Linear Algebra and Its Applications, 420, 663–666, 2007.

[12] K. Engel, A new algorithm for optimal multileaf collimator field segmentation, Discrete Appl. Math., 152(1-3), 35–51, 2005.

[13] K. Engel and T. Gauer, A dose optimization method for electron radiotherapy using randomized aperture beams, submitted to Phys. Med. Biol., 2009.

[14] C. Engelbeen and S. Fiorini, Constrained decompositions of integer matrices and their applications to intensity modulated radiation therapy, Networks, 2009.

[15] U. Feige, A threshold of ln n for approximating set cover, J. ACM, 45(4), 634–652, 1998.

[16] U. Feige, L. Lovász and P. Tetali, Approximating min sum set cover, Algorithmica, 40(4), 219–234, 2004.


[17] A. Ghouila-Houri, Caractérisation des matrices totalement unimodulaires, C. R. Acad. Sci. Paris, 254, 1192–1194, 1962.

[18] T. Kalinowski, A duality based algorithm for multileaf collimator field segmentation with interleaf collision constraint, Discrete Appl. Math., 152, 52–88, 2005.

[19] T. Kalinowski, Realization of intensity modulated radiation fields using multileaf collimators, in: Information Transfer and Combinatorics (R. Ahlswede et al., eds.), LNCS vol. 4123, Springer-Verlag, 1010–1055, 2006.

[20] T. Kalinowski, Multileaf collimator shape matrix decomposition, in: Optimization in Medicine and Biology (G.J. Lim, ed.), Auerbach Publishers Inc., pp. 253–286, 2008.

[21] T. Kalinowski, Reducing the tongue-and-groove underdosage in MLC shape matrix decomposition, Algorithmic Operations Research, 3(2), 2008.

[22] S. Kamath, S. Sahni, J. Li, J. Palta and S. Ranka, Leaf sequencing algorithms for segmented multileaf collimation, Phys. Med. Biol., 48(3), 307–324, 2003.

[23] S. Kamath, S. Sartaj, J. Palta, S. Ranka and J. Li, Optimal leaf sequencing with elimination of tongue-and-groove underdosage, Phys. Med. Biol., 49, N7–N19, 2004.

[24] S. Kamath, S. Sartaj, S. Ranka, J. Li and J. Palta, A comparison of step-and-shoot leaf sequencing algorithms that eliminate tongue-and-groove effects, Phys. Med. Biol., 49, 3137–3143, 2004.

[25] A. Kiesel and T. Kalinowski, MLC shape matrix decomposition without tongue-and-groove underdosage, submitted to Networks.

[26] A. Kiesel and T. Gauer, Approximated segmentation considering technical and dosimetric constraints in intensity-modulated radiation therapy, submitted to JORS.

[27] L. Lovász, J. Spencer and K. Vesztergombi, Discrepancy of set systems and matrices, Eur. J. Comb., 7, 151–160, 1986.

[28] J. Matoušek, Geometric Discrepancy: An Illustrated Guide, Algorithms and Combinatorics 18, Springer, Berlin, 1999.

[29] J. Matoušek, An Lp version of the Beck-Fiala conjecture, Europ. J. Combinatorics, 19, 175–182, 1998.

[30] D. Micciancio and O. Regev, Lattice-based cryptography, in: D.J. Bernstein and J. Buchmann (eds.), Post-quantum Cryptography, Springer, 2008.

[31] W. Que, J. Kung and J. Dai, ‘Tongue-and-groove’ effect in intensity modulated radiotherapy with static multileaf collimator fields, Phys. Med. Biol., 49, 399–405, 2004.


[32] J. Spencer, Six standard deviations suffice, Trans. Am. Math. Soc., 289, 679–706, 1985.

[33] V. Vazirani, Approximation Algorithms, Springer, 2002.

[34] V. M. Zolotarev, Modern theory of summation of random variables, Modern Proba-bility and Statistics, VSP, Utrecht, The Netherlands, 1997.


Appendix

Lemma. Let k be a positive integer and let X_1, X_2, . . . , X_k be k independent random variables such that, for all j ∈ [k], P[X_j = 1 − p_j] = p_j and P[X_j = −p_j] = 1 − p_j. Let X := X_1 + X_2 + · · · + X_k. Moreover, let σ_j² := p_j(1 − p_j) denote the variance of X_j and σ² := σ_1² + σ_2² + · · · + σ_k². Then

\[ \mathbb{E}[|X|] \leq \frac{1}{2}\lambda e^{\lambda} \sigma^2 + \frac{\ln 2}{\lambda} \]

for all λ > 0.

Proof. For each i, we have

\[ \begin{aligned} e^{\lambda X_i} &= 1 + \lambda X_i + \frac{1}{2}\lambda^2 X_i^2 + \sum_{m=3}^{\infty} \frac{1}{m!}\lambda^m X_i^m \\ &= 1 + \lambda X_i + \frac{1}{2}\lambda^2 X_i^2 + \lambda^2 X_i^2 \sum_{m=3}^{\infty} \frac{1}{m!}\lambda^{m-2} X_i^{m-2} \\ &\leq 1 + \lambda X_i + \frac{1}{2}\lambda^2 X_i^2 + \lambda^2 X_i^2 \sum_{m=3}^{\infty} \frac{1}{m!}\lambda^{m-2} \\ &\leq 1 + \lambda X_i + \frac{1}{2}\lambda^2 X_i^2 + \frac{1}{2}\lambda^2 X_i^2 \sum_{m=3}^{\infty} \frac{1}{(m-2)!}\lambda^{m-2} \\ &= 1 + \lambda X_i + \frac{1}{2}\lambda^2 X_i^2 \Big( 1 + \sum_{m=3}^{\infty} \frac{1}{(m-2)!}\lambda^{m-2} \Big) \\ &= 1 + \lambda X_i + \frac{1}{2}\lambda^2 X_i^2 e^{\lambda}. \end{aligned} \]

In the third line we have used |X_i| ≤ 1. It follows that

\[ \mathbb{E}[e^{\lambda X_i}] \leq 1 + \lambda \mathbb{E}[X_i] + \frac{1}{2}\lambda^2 e^{\lambda} \mathbb{E}[X_i^2] = 1 + 0 + \frac{1}{2}\lambda^2 e^{\lambda} \sigma_i^2 \leq e^{\frac{1}{2}\lambda^2 e^{\lambda} \sigma_i^2}, \]

where the last inequality follows from 1 + x ≤ e^x.

Because the random variables X_1, X_2, . . . , X_k are independent, we get

\[ \mathbb{E}[e^{\lambda X}] = \prod_{i=1}^{k} \mathbb{E}[e^{\lambda X_i}] \leq \prod_{i=1}^{k} e^{\frac{1}{2}\lambda^2 e^{\lambda} \sigma_i^2} = e^{\frac{1}{2}\lambda^2 e^{\lambda} \sigma^2}. \]


It follows that

\[ e^{\mathbb{E}[\lambda |X|]} \leq \mathbb{E}[e^{\lambda |X|}] \leq \mathbb{E}[e^{\lambda X}] + \mathbb{E}[e^{-\lambda X}] \leq e^{\frac{1}{2}\lambda^2 e^{\lambda} \sigma^2} + e^{\frac{1}{2}\lambda^2 e^{\lambda} \sigma^2} = e^{\frac{1}{2}\lambda^2 e^{\lambda} \sigma^2 + \ln 2}, \]

where the first inequality follows from Jensen's inequality (and the fact that the exponential is convex), the second inequality follows from the easy inequality e^{|x|} ≤ e^x + e^{−x}, and the third inequality follows from symmetry. Taking logarithms and dividing by λ > 0, we get

\[ \mathbb{E}[|X|] \leq \frac{1}{2}\lambda e^{\lambda} \sigma^2 + \frac{\ln 2}{\lambda}. \]
