distributed computing 9. sorting - a lower bound on bit complexity shmuel zaks...
Michael C. Loui , The Complexity of Sorting on Distributed Systems, 1984
The problem
Given $N$ processors on a ring network.
Each processor $p$ has an initial value $IV(p)$ and a final value $FV(p)$.
$L > N$, and $\forall p:\ 0 \le IV(p) \le L$
$IV(i) \ne IV(j)$ if $i \ne j$
$\{IV(p)\} = \{FV(p)\}$
Sort the $IV(p)$'s s.t. $\exists$ base $b$ s.t.
$FV(b) < FV(b+1) < FV(b+2) < \dots < FV(b+N-1)$ (indices mod $N$)
Model
Asynchronous network
N identical processors on a ring
Known topology
No failures
Bounded memory O(logN)
Bounded message size O(logN)
Example: N = 8, L = 100 (ring p0, ..., p7)
Initial values: IV(0) = 45, IV(1) = 29, IV(2) = 34, IV(3) = 90, IV(4) = 8, IV(5) = 28, IV(6) = 16, IV(7) = 4
Final values: FV(0) = 34, FV(1) = 45, FV(2) = 90, FV(3) = 4, FV(4) = 8, FV(5) = 16, FV(6) = 28, FV(7) = 29
Here the base is b = 3.
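The problem conditions can be checked mechanically. The following is a minimal sketch, not part of the original slides; the helper name `find_base` is hypothetical. It looks for a base b such that the final values increase around the ring starting at b.

```python
def find_base(iv, fv):
    """Return a base b for which fv is sorted around the ring, or None."""
    n = len(iv)
    if sorted(iv) != sorted(fv):          # {IV(p)} must equal {FV(p)}
        return None
    for b in range(n):
        ring = [fv[(b + k) % n] for k in range(n)]
        if all(ring[k] < ring[k + 1] for k in range(n - 1)):
            return b
    return None

# Values of the N = 8, L = 100 example.
iv = [45, 29, 34, 90, 8, 28, 16, 4]
fv = [34, 45, 90, 4, 8, 16, 28, 29]
print(find_base(iv, fv))  # 3
```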
Upper bound
$O(N^2 \log \frac{L}{N})$ bits
Simple algorithm
Program for i:
Phase 1: Select a leader.
Phase 2: if leader then send Sorted({IV(i)})
  else [ receive Sorted({IV(leader), IV(leader+1), …, IV(i-1)});
         send Sorted({IV(leader), IV(leader+1), …, IV(i)}) ]
Phase 3: receive Sorted(S); FV(i) = min(S); send Sorted(S \ {FV(i)})
Phase 1: find a leader. This will be the base b=0.
Phase 2: b sends its value.
Each processor i, at its turn, receives the sorted
values IV(0),…,IV(i-1), and sends the sorted values
IV(0),…,IV(i).
Phase 3: each i (starting at i=0) receives the suffix
of length N-i of the sorted list, and sends the
suffix of length N-(i+1) of the sorted list.
At the end each processor i holds the (i+1)-st smallest value.
Phase 1: $O(N \log N)$ messages of size $O(\log L)$ each
Phase 2: i sends i+1 messages of size $O(\log L)$
Phase 3: i sends N-(i+1) messages of size $O(\log L)$
Hence, in phases 2 and 3 together, i sends N messages of size $O(\log L)$.
Messages: $O(N \log N) + O(N^2) = O(N^2)$
Bits: $O(N^2 \log L)$
Time: $O(N)$
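Phases 2 and 3 can be simulated sequentially to confirm the message count. This is a rough sketch under stated assumptions: leader election (phase 1) is skipped and processor 0 is simply assumed to be the leader, and the name `simple_ring_sort` is hypothetical.

```python
import bisect

def simple_ring_sort(iv):
    """Simulate phases 2 and 3 with processor 0 as the (assumed) leader.

    Returns (fv, sent), where sent[i] counts the values processor i forwards.
    """
    n = len(iv)
    sent = [0] * n
    # Phase 2: the sorted list grows by one value at each hop.
    s = []
    for i in range(n):
        bisect.insort(s, iv[i])
        sent[i] += len(s)            # i sends i+1 values to its successor
    # Phase 3: each processor keeps the minimum and forwards the rest.
    fv = [None] * n
    for i in range(n):
        fv[i] = s[0]
        s = s[1:]
        sent[i] += len(s)            # i sends N-(i+1) values
    return fv, sent

fv, sent = simple_ring_sort([45, 29, 34, 90, 8, 28, 16, 4])
print(fv)    # [4, 8, 16, 28, 29, 34, 45, 90]
print(sent)  # [8, 8, 8, 8, 8, 8, 8, 8]
```

Every processor forwards exactly N values, which gives the N^2 total for phases 2 and 3.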
Optimal algorithm
Encode the sorted sequence more efficiently.
Encode a sequence $S = (a_1, \dots, a_k)$ by its differences $(a_1 - a_0, a_2 - a_1, \dots, a_k - a_{k-1})$
(assume $a_0 = 0$).
Write each $a_i - a_{i-1}$ in binary, and then replace
0 by 00, 1 by 10,
"," by 01, "(" and ")" by 11.
E(S) is the resulting binary string.
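A minimal sketch of this encoding and its inverse follows. The function names `encode` and `decode` are hypothetical, and the exact textual layout "(d1,d2,...,dk)" before the bit substitution is an assumption about how the commas and parentheses are placed.

```python
CODE = {"0": "00", "1": "10", ",": "01", "(": "11", ")": "11"}

def encode(seq):
    """E(S) for a strictly increasing sequence of non-negative integers."""
    diffs, prev = [], 0                      # a0 = 0
    for a in seq:
        diffs.append(format(a - prev, "b"))  # difference written in binary
        prev = a
    text = "(" + ",".join(diffs) + ")"
    return "".join(CODE[ch] for ch in text)

def decode(bits):
    """Inverse of encode: recover the sequence from E(S)."""
    pairs = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    body = pairs[1:-1]                       # strip the two parentheses
    seq, total, cur = [], 0, ""
    for p in body + ["01"]:                  # sentinel comma closes the last number
        if p == "01":
            if cur:
                total += int(cur, 2)
                seq.append(total)
                cur = ""
        else:
            cur += "1" if p == "10" else "0"
    return seq

print(encode([5]))  # 1110001011
```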
Length of E(S)
$|E(S)| = \sum_{j=1}^{k} \bigl(2\log(a_j - a_{j-1}) + O(1)\bigr) = 2k\Bigl(\frac{1}{k}\sum_{j=1}^{k}\log(a_j - a_{j-1})\Bigr) + O(k)$
$\le 2k\log\Bigl(\frac{1}{k}\sum_{j=1}^{k}(a_j - a_{j-1})\Bigr) + O(k) = 2k\log(a_k/k) + O(k)$
$\le 2k\log(L/k) + O(k)$
(the first inequality uses the concavity of log)
If $k \le N$ then $k\log(L/k) \le N\log(L/N) + O(N)$.
Hence $|E(S)| = O(N\log(L/N))$
(better than $O(N\log L)$)
(prove)
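The bound on the length of E(S) can be checked numerically. This sketch assumes the textual layout "(d1,...,dk)" with two output bits per symbol, and it assumes a_1 >= 1 so that every difference has a logarithm; the explicit constant 4k + 2 is derived for this layout only and is not taken from the slides. The names `encoded_length` and `length_bound` are hypothetical.

```python
import math
import random

def encoded_length(seq):
    """Number of bits in E(S): two bits per symbol of '(d1,d2,...,dk)'."""
    symbols, prev = 2, 0                     # the two parentheses
    for a in seq:
        symbols += (a - prev).bit_length()   # binary digits of the difference
        prev = a
    symbols += len(seq) - 1                  # commas
    return 2 * symbols

def length_bound(k, L):
    """2k log(L/k) + O(k), with the O(k) term made explicit as 4k + 2."""
    return 2 * k * math.log2(L / k) + 4 * k + 2

random.seed(0)
L = 1000
for _ in range(200):
    k = random.randint(1, 50)
    seq = sorted(random.sample(range(1, L + 1), k))
    assert encoded_length(seq) <= length_bound(k, L)
```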
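insert and delete_min on E(S)
insert b, where $a_i < b < a_{i+1}$:
  $a_1, \dots, a_i - a_{i-1}, a_{i+1} - a_i, \dots, a_n - a_{n-1}$
  becomes $a_1, \dots, a_i - a_{i-1}, b - a_i, a_{i+1} - b, \dots, a_n - a_{n-1}$
delete_min:
  $a_1, a_2 - a_1, \dots, a_n - a_{n-1}$
  becomes $a_2, \dots, a_n - a_{n-1}$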
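Both operations touch only one or two neighbouring differences. A minimal sketch on a plain list of differences (before the bit substitution) is given below; the function names are hypothetical, and b is assumed to differ from every value already present.

```python
def insert(diffs, b):
    """Insert value b into a sorted sequence stored as differences (a0 = 0)."""
    total = 0
    for i, d in enumerate(diffs):
        if total + d > b:                       # a_i < b < a_{i+1}
            # Split the difference a_{i+1} - a_i into b - a_i and a_{i+1} - b.
            return diffs[:i] + [b - total, total + d - b] + diffs[i + 1:]
        total += d
    return diffs + [b - total]                  # b exceeds every element

def delete_min(diffs):
    """Remove a1 and return (a1, remaining differences)."""
    if len(diffs) == 1:
        return diffs[0], []
    # The new first entry is a2 = a1 + (a2 - a1).
    return diffs[0], [diffs[0] + diffs[1]] + diffs[2:]

print(insert([4, 4, 8], 10))   # [4, 4, 2, 6], i.e. the values 4, 8, 10, 16
print(delete_min([4, 4, 8]))   # (4, [8, 8]), i.e. the values 8, 16 remain
```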
Optimal algorithm
Program for i:
Phase 1: Select a leader.
Phase 2: if leader then send SortedEncoding({IV(i)})
  else [ receive SortedEncoding({IV(leader), IV(leader+1), …, IV(i-1)});
         send SortedEncoding({IV(leader), IV(leader+1), …, IV(i)}) ]
Phase 3: receive SortedEncoding(S); FV(i) = min(S); send SortedEncoding(S \ {FV(i)})
Complexity
In each of phases 2 and 3 every processor sends an encoding of a sequence of length at most N, hence the bit complexity is
$O(N \log N \cdot \log N) + O(N^2 \log\frac{L}{N}) + O(N^2 \log\frac{L}{N}) = O(N^2 \log\frac{L}{N})$
In the model the memory is bounded by $O(\log N)$, hence $\log L = O(\log N)$, and the message complexity is
$O(N^2 \log\frac{L}{N} / \log L) = O(N^2 \log\frac{L}{N} / \log N)$
Lower bound
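$\Omega(N^2 \log \frac{L}{N})$ bits
Sketch: find many distributions of initial values such that N/4 values will have to travel a distance of at least N/16.
[Figure: ring with an arc of N/4 processors whose values must travel a distance of at least N/16.]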
SBS – set of finite sequences of binary strings where each component is non-empty
Lemma 1: Fewer than $4^{b+1}/6$ sequences in SBS have at most b bits.
Proof: Let n(b) be the number of sequences with exactly b bits.
$n(1) = 2$
$n(b) = 4\,n(b-1)$
$\Rightarrow n(b) = 4^b/2$
$N(b) = n(1) + \dots + n(b) = \frac{1}{2}(4 + \dots + 4^b) = \frac{1}{2}\cdot\frac{4^{b+1} - 4}{4 - 1} < 4^{b+1}/6$
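The count in Lemma 1 can be confirmed by brute force for small b. The generator below is an illustrative sketch (the name `sequences` is hypothetical): it enumerates every sequence of non-empty binary strings with exactly b bits in total.

```python
from itertools import product

def sequences(b):
    """Yield every sequence of non-empty binary strings with exactly b bits."""
    if b == 0:
        yield ()
        return
    for first_len in range(1, b + 1):
        for bits in product("01", repeat=first_len):
            for rest in sequences(b - first_len):
                yield ("".join(bits),) + rest

cumulative = 0
for b in range(1, 6):
    n_b = sum(1 for _ in sequences(b))
    cumulative += n_b
    assert n_b == 4 ** b // 2                 # n(b) = 4^b / 2
    assert cumulative < 4 ** (b + 1) / 6      # N(b) < 4^(b+1) / 6
```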
Lemma 2: Let S be a set of s different sequences in SBS. The total number of bits among the sequences in S is at least $\frac{1}{6} s \log s$.
Proof: Let $b = 1 + \lfloor \frac{1}{2}\log_2 s \rfloor$, so that $4^b/6 \le \frac{2}{3}s$.
By Lemma 1 (fewer than $4^b/6$ sequences have at most $b-1$ bits), at least 1/3 of the sequences in S have at least b bits each.
The total number of bits is at least $\frac{1}{3} b s \ge \frac{1}{6} s \log s$.
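As a sanity check of Lemma 2, the cheapest possible set of s distinct sequences takes the shortest sequences first. The sketch below (the name `min_total_bits` is hypothetical) uses n(b) = 4^b / 2 from Lemma 1 and compares the resulting minimum with (1/6) s log s, with the logarithm taken base 2.

```python
import math

def min_total_bits(s):
    """Fewest total bits over any s distinct sequences in SBS (shortest first)."""
    total, b = 0, 1
    while s > 0:
        take = min(s, 4 ** b // 2)   # n(b) = 4^b / 2 sequences have exactly b bits
        total += take * b
        s -= take
        b += 1
    return total

for s in range(1, 3000):
    assert min_total_bits(s) >= s * math.log2(s) / 6
```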
[Figure: ring of processors p0, ..., p8 with initial values IV(0), ..., IV(8), partitioned into P1 and P2.]
distribution $d$: $p \to IV(p)$
P set of processors; distribution for P induced by $d$
$d_1$ and $d_2$ agree on P
destination of p: the processor where IV(p) resides at the end
partition P into $P_1$ and $P_2$
cut induced by the partition: all links connecting $P_1$ and $P_2$
A: distributed algorithm, c a cut, d a distribution
signature of A for d on c: the sequence of messages sent on links in c during the execution
Example signature (link, message): (c1, m1), (c1, m2), (c2, m3), (c1, m4)
Lemma 3: Let c be a cut induced by a partition $P_1$, $P_2$.
Let D be a collection of distributions that agree on all
processors in $P_2$.
If algorithm A had fewer than |D| different signatures
on c for the distributions in D, then for two different
distributions of D, algorithm A produces the same set of
final values in $P_2$.
Lemma 4: In a signature with b bits on cut c the
number of message bits is at least $\frac{b}{1 + \lceil\log|c|\rceil}$.
Proof: Let b' be the number of message bits that were
sent and let m be the number of messages.
Since $m \le b'$, we have
$b = b' + m\lceil\log|c|\rceil \le (1 + \lceil\log|c|\rceil)\, b'$
Theorem: On a bidirectional ring of N processors
with initial values in {0,...,L}, every sorting
algorithm has bit complexity of $\Omega(N^2 \log(L/N))$.
Let A be an algorithm that solves the
sorting problem.
Let R=L/ N.
Assume R is integer, N-1 divisible by 16.
There are $R^N$ distributions of initial values satisfying:
$\frac{p}{2} R \le IV(p) < \bigl(\frac{p}{2} + 1\bigr) R$ for even p
$\frac{N+p}{2} R \le IV(p) < \bigl(\frac{N+p}{2} + 1\bigr) R$ for odd p
Example:
N = 17, L = 51 (R = 3). Possible initial values at each processor:
p0: 0,1,2       p1: 27,28,29
p2: 3,4,5       p3: 30,31,32
p4: 6,7,8       p5: 33,34,35
p6: 9,10,11     p7: 36,37,38
p8: 12,13,14    p9: 39,40,41
p10: 15,16,17   p11: 42,43,44
p12: 18,19,20   p13: 45,46,47
p14: 21,22,23   p15: 48,49,50
p16: 24,25,26
If b is chosen as the base, then the destination of IV(p) is
$b + \frac{p}{2}$ for even p
$b + \frac{N+p}{2}$ for odd p
(positions taken mod N)
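The value ranges and destinations can be computed directly from these formulas. The following is a small sketch with hypothetical helper names (`rank_block`, `value_range`, `destination`); it assumes N is odd, as in the proof, so that N + p is even for odd p.

```python
def rank_block(p, N):
    """Index of the block of R consecutive values assigned to processor p."""
    return p // 2 if p % 2 == 0 else (N + p) // 2

def value_range(p, N, R):
    """Allowed initial values of p: block*R <= IV(p) < (block + 1)*R."""
    k = rank_block(p, N)
    return range(k * R, (k + 1) * R)

def destination(p, b, N):
    """Processor where IV(p) ends up when b is chosen as the base."""
    return (b + rank_block(p, N)) % N

N, R = 17, 3
print(list(value_range(0, N, R)))   # [0, 1, 2]
print(list(value_range(1, N, R)))   # [27, 28, 29]
print(list(value_range(15, N, R)))  # [48, 49, 50]
```

Because the blocks of different processors are disjoint and ordered, the rank of IV(p) is fixed by p alone, which is why the destination depends only on p and b.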
Pigeonhole principle
If you put n pigeons in k holes, then at least one hole will contain at least n/k pigeons.
If the sum of k numbers is N, then at least one of them is at least N/k.
There are only N different bases.
Therefore, there is a processor b s.t. in at least $R^N / N$ distributions
b will be chosen as base.
Denote this set of distributions by D.
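Let (all positions mod N):
$q = 2b + 6\cdot\frac{N-1}{16} + 1$
$r = 2b + 10\cdot\frac{N-1}{16}$
$s = 2b + 11\cdot\frac{N-1}{16} + 1$
$t = 2b + 5\cdot\frac{N-1}{16}$
Let $P_1$ be the set of $\frac{N-1}{4}$ processors q, q+1, ..., r.
Let $P_2$ be the set of $\frac{5(N-1)}{8} + 1$ processors s, s+1, ..., t.
[Figure: ring with the arcs $P_1$ (from q to r) and $P_2$ (from s to t); labels N/4 and N/16.]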
Let $p \in P_1$.
The destination of IV(p) is
between $s = b + \frac{N+q}{2}$ and $t = b + \frac{r}{2}$.
Hence each initial value in $P_1$ has
to travel a distance of at least $\frac{N-1}{16} + 1$
to reach its destination in $P_2$.
Example (N = 17): q = 15, r = 1, s = 3, t = 13, so $P_1$ = {p15, p16, p0, p1} and $P_2$ = {p3, ..., p13}.
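The example values q = 15, r = 1, s = 3, t = 13 can be reproduced in code. The formulas for q, r, s, t used below are an assumption: they are reconstructed to be consistent with the example and with the stated sizes of P1 and P2, and for N = 17 they give the example values when b = 4. All function names are hypothetical.

```python
def arc_endpoints(b, N):
    """q, r, s, t for base b, assuming 16 divides N - 1 (assumed formulas)."""
    u = (N - 1) // 16
    q = (2 * b + 6 * u + 1) % N
    r = (2 * b + 10 * u) % N
    s = (2 * b + 11 * u + 1) % N
    t = (2 * b + 5 * u) % N
    return q, r, s, t

def arc(start, end, N):
    """Processors start, start+1, ..., end, walking around the ring."""
    out = [start]
    while out[-1] != end:
        out.append((out[-1] + 1) % N)
    return out

def destination(p, b, N):
    """Final position of IV(p) when b is the base."""
    k = p // 2 if p % 2 == 0 else (N + p) // 2
    return (b + k) % N

def ring_distance(x, y, N):
    return min((x - y) % N, (y - x) % N)

N, b = 17, 4
q, r, s, t = arc_endpoints(b, N)
P1, P2 = arc(q, r, N), arc(s, t, N)
print(q, r, s, t)   # 15 1 3 13
print(P1)           # [15, 16, 0, 1]
```

Under these assumptions every value that starts in P1 ends in P2 and travels at least (N-1)/16 + 1 links.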
Let $P_1'$ be the set of the $\frac{3N+1}{4}$ processors not in $P_1$.
There are $R^{(3N+1)/4}$ distributions of $P_1'$ corresponding
to our conditions.
Therefore there is at least one distribution of $P_1'$
so that at least
$\frac{R^N / N}{R^{(3N+1)/4}} = \frac{R^{(N-1)/4}}{N}$
distributions in D will agree with it on $P_1'$.
We will choose exactly $s = \frac{R^{(N-1)/4}}{N}$
such distributions and denote them by D'.
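Let C be a set of $\frac{N-1}{16} + 1$ disjoint cuts that separate $P_1$ and $P_2$:
$\{(q-1, q), (q, q-1), (r, r+1), (r+1, r)\}$,
$\{(q-2, q-1), (q-1, q-2), (r+1, r+2), (r+2, r+1)\}$,
...,
$\{(q - \frac{N-1}{16}, t), (t, q - \frac{N-1}{16}), (r + \frac{N-1}{16}, s), (s, r + \frac{N-1}{16})\}$
[Figure: the cuts between $P_1$ and $P_2$ on the ring.]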
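For the distributions in D':
the $IV(P_1)$ are distinct, thus the $FV(P_2)$ are distinct;
the $IV(P_2)$ are identical.
By Lemma 3: at least $s = |D'|$ distinct signatures
on each cut separating $P_1$ and $P_2$.
By Lemma 2: in s signatures
there are at least
$b = \frac{1}{6} s \log s$ bits of signature.
By Lemma 4: at least $\frac{b}{1 + \lceil\log|c|\rceil}$ of them are message bits.
Let n(c,d) be the number of message bits
sent on cut c for initial distribution d. Then
$\sum_{d \in D'} n(c,d) \ge \frac{s \log s}{6(1 + \lceil\log|c|\rceil)} \ge \frac{1}{18} s \log s$
Summing over all cuts:
$\sum_{c \in C}\sum_{d \in D'} n(c,d) \ge \bigl(\frac{N-1}{16} + 1\bigr)\frac{1}{18} s \log s$
Hence there exists a distribution d* such that
$\sum_{c \in C} n(c,d^*) \ge \frac{1}{s}\sum_{c \in C}\sum_{d \in D'} n(c,d) \ge \frac{1}{18}\bigl(\frac{N-1}{16} + 1\bigr)\log s$
$= \frac{1}{18}\cdot\frac{N+15}{16}\bigl(\frac{N-1}{4}\log R - \log N\bigr) = \Omega(N^2 \log R) = \Omega(N^2 \log\frac{L}{N})$
Namely, there exists a distribution s.t. the total number of bits sent over the cuts in C is
$\Omega(N^2 \log\frac{L}{N})$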
Notes:
- The proof applies also to synchronous systems.
- The memory at each processor can be unbounded, or even infinite.
- By the model each message is bounded by $O(\log N)$, thus the lower bound on the number of messages is $\Omega(N^2 \log\frac{L}{N} / \log N)$.