
Evaluation of Placement Techniques for DNA Probe Array Layout

Andrew B. Kahng (1), Ion I. Mandoiu (2), Sherief Reda (1), Xu Xu (1), Alex Zelikovsky (3)

(1) CSE Department, University of California at San Diego

(2) CSE Department, University of Connecticut

(3) CS Department, Georgia State University

Outline

Introduction to DNA microarrays and border minimization challenges

Previous probe placement algorithms

Partitioning-based probe placement

Quantified sub-optimality of placement

Comparison of probe placement heuristics

Conclusions and future research directions

Introduction to DNA Probe Arrays

DNA arrays are composed of probes, where each probe is a sequence of 25 nucleotides.

[Figure: array readout; tagged fragments are flushed over the array, followed by laser activation and optical scanning. Images courtesy of Affymetrix.]

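Probe Synthesis

Array probes (a 3 X 3 array):

CG   AC   G
AC   ACG  AG
CG   AG   C

Nucleotide deposition sequence: A C G

Mask 1 (A): the five sites whose probes begin with A (AC, AC, ACG, AG, AG) are exposed, and A is deposited on each of them.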

Probe Synthesis (continued)

Mask 2 (C): the six sites whose probes use C (CG, AC, AC, ACG, CG, C) are exposed, and C is deposited on each of them.

Probe Synthesis (continued)

Mask 3 (G): the six sites whose probes end in G (CG, G, ACG, AG, CG, AG) are exposed, and G is deposited on each of them, completing all nine probes.

A nucleotide deposition sequence defines the order in which nucleotides are deposited.

A probe embedding specifies which steps of the deposition sequence the probe uses during its synthesis.
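The three masks can be reproduced with a short simulation. This is a minimal Python sketch, assuming each probe uses its leftmost embedding in the deposition sequence (with the sequence A, C, G every nucleotide appears once, so the embedding is forced here). The helpers `leftmost_embedding`, `masks`, and `synthesize` are hypothetical names chosen for illustration.

```python
# Simulate mask-by-mask synthesis of the 3 x 3 example array.
# Assumption: each probe uses its leftmost embedding in the deposition sequence.
PROBES = [["CG", "AC", "G"],
          ["AC", "ACG", "AG"],
          ["CG", "AG", "C"]]
DEPOSITION = "ACG"  # one mask per deposition step


def leftmost_embedding(probe, deposition):
    """0/1 list with a 1 at every deposition step the probe uses."""
    bits = [0] * len(deposition)
    i = 0  # index of the next nucleotide of the probe to synthesize
    for step, nucleotide in enumerate(deposition):
        if i < len(probe) and probe[i] == nucleotide:
            bits[step] = 1
            i += 1
    if i != len(probe):
        raise ValueError(f"{probe!r} is not a subsequence of {deposition!r}")
    return bits


def masks(probes, deposition):
    """masks[k][r][c] == 1 when site (r, c) is exposed in step k."""
    emb = [[leftmost_embedding(p, deposition) for p in row] for row in probes]
    return [[[bits[k] for bits in row] for row in emb]
            for k in range(len(deposition))]


def synthesize(mask_list, deposition):
    """Replay the masks: append the step's nucleotide at every exposed site."""
    rows, cols = len(mask_list[0]), len(mask_list[0][0])
    grown = [[""] * cols for _ in range(rows)]
    for nucleotide, mask in zip(deposition, mask_list):
        for r in range(rows):
            for c in range(cols):
                if mask[r][c]:
                    grown[r][c] += nucleotide
    return grown


if __name__ == "__main__":
    m = masks(PROBES, DEPOSITION)
    for k, mask in enumerate(m):
        exposed = sum(map(sum, mask))
        print(f"Mask {k + 1} ({DEPOSITION[k]}): {exposed} exposed sites")
    print("All probes rebuilt:", synthesize(m, DEPOSITION) == PROBES)
```

Running it reports 5, 6, and 6 exposed sites for masks 1 to 3 and confirms that replaying the masks rebuilds all nine probes.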

Border Minimization Challenges

[Figure: a lamp illuminates the array through a mask; labels mark the intentionally exposed sites, the unwanted illumination, and the border.]

Problem: diffraction, internal reflection, scattering, internal illumination. These effects occur at sites near the intentionally exposed sites.

Reducing the border increases yield and reduces cost.

Design objective: minimize the border.
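The border of a single mask can be counted directly. The sketch below assumes the usual border-length definition: the number of horizontally or vertically adjacent site pairs in which one site is exposed and the other is not. Summing this count over all masks gives the total border of a design. The function name `mask_border` is illustrative.

```python
def mask_border(mask):
    """Border of one mask: adjacent (horizontal or vertical) site pairs in
    which exactly one site is exposed. `mask` is a list of rows of 0/1."""
    rows, cols = len(mask), len(mask[0])
    border = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and mask[r][c] != mask[r][c + 1]:
                border += 1
            if r + 1 < rows and mask[r][c] != mask[r + 1][c]:
                border += 1
    return border


center = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]  # a single exposed site: all 4 of its sides are border
top_row = [[1, 1, 1],
           [0, 0, 0],
           [0, 0, 0]]  # an exposed row: border only along its lower edge
print(mask_border(center), mask_border(top_row))  # 4 3
```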

Border Reduction with Probe Placement

Probe placement: similar probes should be placed close together.

[Figure: the same short probes (e.g., CT and TA) under two placements over the same deposition sequence; Border = 8 before optimization and Border = 4 after similar probes are placed next to each other.]
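Summed over all masks, the border between two adjacent sites equals the number of deposition steps in which exactly one of them is exposed, i.e., the Hamming distance between their embedding vectors. A minimal sketch of the resulting placement cost follows. The embeddings of CT and TA assume the deposition sequence ACGTACGT with synchronous embedding, and the two 2 X 2 layouts are illustrative, not the exact placements in the figure.

```python
def hamming(u, v):
    """Number of deposition steps at which two embeddings differ."""
    return sum(a != b for a, b in zip(u, v))


def placement_border(grid):
    """Total border of a placement: Hamming distances between the embeddings
    of all horizontally and vertically adjacent sites, summed."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:
                total += hamming(grid[r][c], grid[r + 1][c])
    return total


# Assumed deposition sequence ACGTACGT; 1 marks a step the probe uses.
CT = "01000001"  # C in the first ACGT group, T in the second
TA = "00011000"  # T in the first group, A in the second

mixed = [[CT, TA],
         [TA, CT]]    # dissimilar probes side by side
grouped = [[CT, CT],
           [TA, TA]]  # similar probes placed close together
print(placement_border(mixed), placement_border(grouped))  # 16 8
```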

Border Reduction in Probe Embedding

Probe embedding:

Synchronous embedding: deposit one nucleotide in each group of “ACGT”.

Asynchronous embedding: no restriction.

[Figure: the probes CT and TA embedded in the same deposition sequence; Border = 4 with synchronous embedding and Border = 2 with asynchronous embedding.]
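The difference between the two embedding styles can be checked on CT and TA. This sketch assumes the deposition sequence is ACGT repeated twice and uses the leftmost (greedy) embedding as one example of an asynchronous embedding; it is not presented as the embedding algorithm used in this work. Under these assumptions the embeddings differ in 4 steps synchronously and in 2 steps asynchronously.

```python
DEPOSITION = "ACGT" * 2  # assumed deposition sequence: two groups of ACGT
PERIOD = "ACGT"


def synchronous_embedding(probe, deposition=DEPOSITION):
    """Nucleotide i of the probe is deposited in group i of ACGT."""
    if len(probe) * len(PERIOD) > len(deposition):
        raise ValueError("deposition sequence too short for synchronous embedding")
    bits = [0] * len(deposition)
    for i, nucleotide in enumerate(probe):
        bits[i * len(PERIOD) + PERIOD.index(nucleotide)] = 1
    return bits


def leftmost_embedding(probe, deposition=DEPOSITION):
    """One asynchronous embedding: always take the earliest matching step."""
    bits = [0] * len(deposition)
    i = 0
    for step, nucleotide in enumerate(deposition):
        if i < len(probe) and probe[i] == nucleotide:
            bits[step] = 1
            i += 1
    if i != len(probe):
        raise ValueError(f"{probe!r} cannot be embedded in {deposition!r}")
    return bits


def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))


sync = hamming(synchronous_embedding("CT"), synchronous_embedding("TA"))
asyn = hamming(leftmost_embedding("CT"), leftmost_embedding("TA"))
print(sync, asyn)  # 4 2
```

The asynchronous embedding lets CT finish inside the first ACGT group, so it shares the T step with TA.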

Basic DNA Array Design Flow

DNA array: Probe Selection → Design of Test Probes → Probe Placement → Probe Embedding → Lithography → DNA Array

VLSI chip: Logic Synthesis → BIST and DFT → Placement → Routing → Lithography → VLSI Chip

Analogy: probe selection corresponds to logic synthesis, design of test probes to BIST and DFT, and probe placement and probe embedding to the physical design steps of placement and routing.

DNA Microarrays Physical Design Problem

Given: n² probes

Find: a placement of the probes in n x n sites and an embedding of each probe

Minimize: the total border cost
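One way to state the objective formally, using notation introduced here rather than taken from the slides: K is the length of the deposition sequence, e_p ∈ {0,1}^K is the embedding of probe p (e_{p,k} = 1 when p uses step k), π assigns probes to sites, E is the set of horizontally or vertically adjacent site pairs, and d_H(p, q) is the Hamming distance between the nucleotide sequences of p and q. The second line records that, under synchronous embedding, each nucleotide mismatch between neighbors costs two border units.

```latex
\begin{aligned}
\min_{\pi,\,e}\; B(\pi, e) &= \sum_{\{s,t\} \in E} H\bigl(e_{\pi(s)}, e_{\pi(t)}\bigr),
  \qquad H(u, v) = \sum_{k=1}^{K} \lvert u_k - v_k \rvert \\
H(e_p, e_q) &= 2\, d_H(p, q) \quad \text{(synchronous embedding)}
\end{aligned}
```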







Previous Work

Border minimization was first introduced by Feldman and Pevzner, “Gray Code masks for sequencing by hybridization,” Genomics, 1994, pp. 233-235.

Hannenhalli et al. gave heuristics for the placement problem using a TSP formulation.

Kahng et al., “Border length minimization in DNA Array Design,” WABI 2002, suggested constructive methods for placement and embedding.

Kahng et al., “Engineering a Scalable Placement Heuristic for DNA Probe Arrays,” RECOMB 2003, suggested scalable placement improvement and embedding techniques.

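1-D Probe Placement (TSP)

How should the 1-D ordering of probes be placed onto the 2-D chip?

Probe 1 = ACGACG, Probe 2 = CTTTTC, Probe 3 = ACGATC, Probe 4 = CCTATC

[Figure: a TSP tour over Hamming distances orders the probes as Probe 1 (ACGACG), Probe 3 (ACGATC), Probe 4 (CCTATC), Probe 2 (CTTTTC).]

Hamming distance (P1, P2) = the number of positions at which the nucleotides of P1 and P2 differ. For example, the Hamming distance between ACGACG and CCTATC is 4.

Under synchronous embedding, the border between two adjacent probes is twice their Hamming distance, so minimizing Hamming distance along the tour minimizes border along the tour.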
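The tour order can be checked by brute force on the four example probes. This sketch enumerates every ordering (Hamiltonian path) and keeps the one with the smallest total Hamming distance; exhaustive search only works for a handful of probes and stands in here for the TSP heuristics used in practice. `best_ordering` is a hypothetical helper.

```python
from itertools import permutations

PROBES = ["ACGACG", "CTTTTC", "ACGATC", "CCTATC"]  # Probes 1-4 from the slide


def hamming(p, q):
    """Number of positions at which two equal-length probes differ."""
    return sum(a != b for a, b in zip(p, q))


def best_ordering(probes):
    """Exhaustively find the 1-D ordering with minimum total Hamming distance
    between consecutive probes. Exponential: tiny inputs only."""
    best = None
    for order in permutations(range(len(probes))):
        cost = sum(hamming(probes[a], probes[b]) for a, b in zip(order, order[1:]))
        if best is None or cost < best[0]:
            best = (cost, order)
    return best


cost, order = best_ordering(PROBES)
print([f"Probe {i + 1}" for i in order], cost)
```

On these probes it returns Probe 1, Probe 3, Probe 4, Probe 2 with total Hamming distance 6, three consecutive steps of distance 2.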

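Placement by Threading

Thread the 1-D ordering onto the chip: positions 1, 2, 3, 4 receive ACGACG, ACGATC, CCTATC, and CTTTTC, so consecutive probes in the tour occupy adjacent sites.

Edges along the thread are optimized by the TSP tour; edges between adjacent sites that are not consecutive on the thread are not optimized.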
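Threading can be sketched as laying the tour along a snake-shaped (boustrophedon) path, which keeps consecutive probes adjacent. The snake path and the helpers `snake_positions`, `thread`, and `edge_costs` are assumptions for illustration; the thread in the original figure may be drawn differently. `edge_costs` separates the Hamming cost of thread edges, which the tour optimized, from the remaining edges, which it did not.

```python
TOUR = ["ACGACG", "ACGATC", "CCTATC", "CTTTTC"]  # 1-D order from the TSP step


def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))


def snake_positions(n):
    """(row, col) of each thread index: left to right on even rows,
    right to left on odd rows, so consecutive indices stay adjacent."""
    path = []
    for r in range(n):
        cols = range(n) if r % 2 == 0 else range(n - 1, -1, -1)
        path.extend((r, c) for c in cols)
    return path


def thread(order, n):
    """Place the k-th probe of the ordering at the k-th site of the snake path."""
    grid = [[None] * n for _ in range(n)]
    for probe, (r, c) in zip(order, snake_positions(n)):
        grid[r][c] = probe
    return grid


def edge_costs(order, n):
    """Split the total Hamming cost into thread edges and all other edges."""
    index = {site: k for k, site in enumerate(snake_positions(n))}
    grid = thread(order, n)
    on_thread = off_thread = 0
    for (r, c), k in index.items():
        for nr, nc in ((r, c + 1), (r + 1, c)):
            if nr < n and nc < n:
                d = hamming(grid[r][c], grid[nr][nc])
                if abs(index[(nr, nc)] - k) == 1:
                    on_thread += d   # consecutive in the tour: optimized edge
                else:
                    off_thread += d  # adjacent on chip only: not optimized
    return on_thread, off_thread


print(thread(TOUR, 2))      # [['ACGACG', 'ACGATC'], ['CTTTTC', 'CCTATC']]
print(edge_costs(TOUR, 2))  # (6, 6)
```

In this 2 X 2 case the three thread edges cost 6 together, while the single non-thread edge (ACGACG next to CTTTTC) also costs 6, which shows how much border threading can leave unoptimized.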

Row-Epitaxial Placement

For each site position (i, j), in row-major order:

Find the best probe, i.e., the one that minimizes the border with the already-placed neighbors of (i, j).

Switch the best probe into (i, j) and lock it in this position.
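A compact sketch of one row-epitaxial pass, scoring candidates by sequence Hamming distance (proportional to border under synchronous embedding). For simplicity it scans every unlocked probe at each site, which costs O(N²) distance evaluations for N probes; a scalable implementation would presumably restrict the candidate set. `row_epitaxial` and `total_hamming` are illustrative names, not the authors' code.

```python
def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))


def row_epitaxial(grid):
    """One greedy row-epitaxial pass (sketch).

    Sites are visited in row-major order. For each site, every probe that is
    not yet locked (the probe at this site or at any later site) is scored by
    its Hamming distance to the locked left and top neighbors; the best one
    is switched into the site, which is then locked."""
    g = [row[:] for row in grid]
    sites = [(r, c) for r in range(len(g)) for c in range(len(g[0]))]
    for k, (r, c) in enumerate(sites):
        neighbors = []
        if c > 0:
            neighbors.append(g[r][c - 1])
        if r > 0:
            neighbors.append(g[r - 1][c])

        def score(site):
            probe = g[site[0]][site[1]]
            return sum(hamming(probe, q) for q in neighbors)

        br, bc = min(sites[k:], key=score)  # ties keep the probe already here
        g[r][c], g[br][bc] = g[br][bc], g[r][c]  # switch; (r, c) is now locked
    return g


def total_hamming(grid):
    """Sum of Hamming distances over all adjacent site pairs."""
    rows, cols = len(grid), len(grid[0])
    return sum(hamming(grid[r][c], grid[nr][nc])
               for r in range(rows) for c in range(cols)
               for nr, nc in ((r, c + 1), (r + 1, c))
               if nr < rows and nc < cols)


start = [["A", "C"],
         ["C", "A"]]
placed = row_epitaxial(start)
print(placed, total_hamming(start), total_hamming(placed))
# [['A', 'A'], ['C', 'C']] 4 2
```

Starting from a 2 X 2 checkerboard of A and C, the pass produces rows A A and C C, halving the total Hamming distance.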







Basic DNA Array Design Flow

[Figure: the DNA array and VLSI design flows repeated, with partitioning highlighted as a technique used for VLSI placement.]

Question: shall we use partitioning in probe placement?

Single Nucleotide Placement

[Figure: an 8 X 8 chip holding 16 copies each of the single-nucleotide probes A, C, G, and T. Row-Epitaxial Placement arranges them in horizontal stripes, two rows per nucleotide (Border = 48). Partitioning Based Placement gives each nucleotide a 4 X 4 quadrant (Border = 32).]

Can partitioning-based placement also reduce the border for 25-nucleotide probes?
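Both borders in the figure can be recomputed. Assuming the deposition sequence ACGT, each single-nucleotide probe has a one-hot embedding, so every pair of different neighbors contributes border 2. The quadrant assignment below (A top-left, C top-right, G bottom-left, T bottom-right) is an assumption; any assignment of one nucleotide per quadrant gives the same border.

```python
# Single-nucleotide probes with deposition sequence ACGT: one-hot embeddings.
EMBED = {"A": "1000", "C": "0100", "G": "0010", "T": "0001"}


def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))


def border(grid):
    """Sum of embedding Hamming distances over all adjacent site pairs."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < rows and nc < cols:
                    total += hamming(EMBED[grid[r][c]], EMBED[grid[nr][nc]])
    return total


# Row-epitaxial result: horizontal stripes, two rows per nucleotide.
stripes = [[x] * 8 for x in "AACCGGTT"]
# Partitioning result: one 4 x 4 quadrant per nucleotide.
quadrants = ([["A"] * 4 + ["C"] * 4 for _ in range(4)]
             + [["G"] * 4 + ["T"] * 4 for _ in range(4)])
print(border(stripes), border(quadrants))  # 48 32
```

The stripes have three full-width boundaries (3 × 8 pairs × 2), whereas the quadrants have two half-length cuts in each direction (16 pairs × 2), which is the intuition behind trying partitioning.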


Randomly choose a probe as seed 1.

Choose as seed 2 the probe with the largest Hamming distance to seed 1.

Choose as seed 3 the probe with the largest total Hamming distance to seeds 1 and 2.

Choose as seed 4 the probe with the largest total Hamming distance to seeds 1, 2, and 3.
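The seed selection steps translate directly into code. In this sketch ties are broken by the lowest probe index, and the optional `first` argument (a hypothetical addition) fixes seed 1 so results are reproducible. How the remaining probes are then assigned to partitions is not specified on the slide and is not shown.

```python
import random


def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))


def select_seeds(probes, k=4, rng=None, first=None):
    """Return the indices of k mutually distant seed probes.

    Seed 1 is random (or `first`, if given); each later seed maximizes the
    total Hamming distance to the seeds chosen so far."""
    rng = rng or random.Random()
    seeds = [first if first is not None else rng.randrange(len(probes))]
    while len(seeds) < k:
        candidates = [i for i in range(len(probes)) if i not in seeds]
        seeds.append(max(candidates,
                         key=lambda i: sum(hamming(probes[i], probes[s])
                                           for s in seeds)))
    return seeds


probes = ["AAAA", "AAAC", "CCCC", "GGGG", "TTTT"]
print(select_seeds(probes, first=0))  # [0, 2, 3, 4]
```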


[Figure: level 1 partition and level 2 partition of the chip; row-epitaxial placement is then applied one by one (“Border aware”).]







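2-D Gray code Placement

For synchronous embedding, Border = 2 for any two neighboring probes.

[Figure: 2-D Gray code placements for n = 2, with rows C G and A T, and for n = 4, containing all 16 dinucleotides; any two neighboring probes differ in exactly one nucleotide.]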
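One way to build such a placement for n = 2^k, sketched below, applies the n = 2 pattern (C G / A T) once per level, indexed by the binary-reflected Gray codes of the row and column. Moving to a neighboring site flips one Gray-code bit, so exactly one nucleotide changes. This construction is an assumption and may order the n = 4 probes differently from the figure. Because all n² probes are distinct, every neighboring pair costs at least 2 under synchronous embedding, so the resulting border 4n(n − 1) is optimal; this is what makes such instances usable as benchmarks with known optimal cost.

```python
BASE = [["C", "G"], ["A", "T"]]  # the n = 2 pattern shown on the slide


def gray(x):
    """Binary-reflected Gray code: consecutive integers differ in one bit."""
    return x ^ (x >> 1)


def gray_code_placement(k):
    """n x n placement (n = 2**k) of all 4**k probes of length k in which
    horizontally or vertically adjacent probes differ in exactly one nucleotide."""
    n = 1 << k
    return [["".join(BASE[(gray(i) >> b) & 1][(gray(j) >> b) & 1]
                     for b in reversed(range(k)))
             for j in range(n)]
            for i in range(n)]


def synchronous_border(grid):
    """Synchronous border: 2 x Hamming distance for each adjacent pair."""
    n = len(grid)
    total = 0
    for r in range(n):
        for c in range(n):
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < n and nc < n:
                    total += 2 * sum(a != b for a, b in zip(grid[r][c], grid[nr][nc]))
    return total


g = gray_code_placement(2)
for row in g:
    print(" ".join(row))
print(synchronous_border(g))  # 48 = 4 * n * (n - 1) with n = 4
```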

Scaling Construction

[Figure: an n x n real chip and a larger chip built from four isomorphic copies of it with the same border.]

Ratio = new border / (4 × old border). A ratio below 1 means the solution quality scales well.







Experiments Setup

Chip size range: between 100 x 100 and 500 x 500

Type of instances: randomly generated; 2-D Gray code; scaled / suboptimality test cases

Embedding methods: synchronous; asynchronous

Quality measures: total border cost; gap from lower bound; normalized cost; CPU

All tests are run on a Xeon 2.4 GHz CPU.

Comparison of Synchronous Placement Results

[Figure: four panels plotting Borders, CPU, Gap from lower bound, and Normalized cost against chip size (100, 200, 300, 500) for TSP + Threading, Row Epitaxial, and Partitioning Based (Level=2).]

Compared with row-epitaxial, the new method reduces the border cost by 3.7% and is 3 times faster.

Results on 2-D Gray code Test cases

[Figure: Borders and Gap from optimal solution plotted against chip size (16, 32, 64, 128, 256, 512) for TSP + Threading, Row Epitaxial, and Recursive Partitioning; annotated value: 5.6%.]

Suboptimality Experiments Results

[Figure: Borders and Scaling ratio plotted against chip size (100, 200, 300, 400, 500) for Row Epitaxial and Partitioning Based (Level=2); annotated value: 2.5%.]

Placement Polishing Using Re-Embedding

Use a polishing algorithm to re-embed each probe with respect to its neighbors.

Perform polishing one probe at a time.

[Figure: re-embedding the probes with respect to their neighbors over the same deposition sequence reduces the border from 8 to 4.]
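The slides do not spell out the polishing algorithm. One natural way to re-embed a single probe optimally with respect to fixed neighbor embeddings is a dynamic program over prefixes of the probe and of the deposition sequence, sketched below; `reembed` is a hypothetical name, and calling it for each site in turn would be one way to polish "one by one".

```python
def reembed(probe, deposition, neighbors):
    """Optimally re-embed one probe against fixed neighbor embeddings.

    D[i][j] is the least border achievable with the first i nucleotides of
    the probe embedded in the first j deposition steps. At step j the probe
    either skips the step (bit 0, paying for each neighbor exposed there) or,
    if the nucleotide matches, uses it (bit 1, paying for each neighbor that
    is not exposed). Returns (border with the neighbors, embedding bits)."""
    m, K = len(probe), len(deposition)
    exposed = [sum(nb[j] for nb in neighbors) for j in range(K)]
    skip_cost = exposed
    use_cost = [len(neighbors) - e for e in exposed]
    INF = float("inf")
    D = [[INF] * (K + 1) for _ in range(m + 1)]
    D[0][0] = 0
    for j in range(1, K + 1):
        for i in range(min(j, m) + 1):
            best = D[i][j - 1] + skip_cost[j - 1]
            if i > 0 and probe[i - 1] == deposition[j - 1]:
                best = min(best, D[i - 1][j - 1] + use_cost[j - 1])
            D[i][j] = best
    if D[m][K] == INF:
        raise ValueError(f"{probe!r} cannot be embedded in {deposition!r}")
    # Walk back from (m, K), taking "use" whenever it is consistent with D.
    bits, i = [0] * K, m
    for j in range(K, 0, -1):
        if (i > 0 and probe[i - 1] == deposition[j - 1]
                and D[i][j] == D[i - 1][j - 1] + use_cost[j - 1]):
            bits[j - 1] = 1
            i -= 1
    return D[m][K], bits


# Deposition ACGTACGT; neighbor TA embedded at the 4th and 5th steps.
neighbor_ta = [0, 0, 0, 1, 1, 0, 0, 0]
print(reembed("CT", "ACGTACGT", [neighbor_ta]))  # (2, [0, 1, 0, 1, 0, 0, 0, 0])
```

Here CT is re-embedded at the 2nd and 4th steps, sharing the T step with its neighbor, for a border of 2; the table has (m + 1)(K + 1) entries, so each re-embedding is cheap.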

Comparison of Asynchronous Placement Results

[Figure: four panels plotting Borders, CPU, Gap from lower bound, and Normalized cost against chip size (100, 200, 300, 500) for TSP + Threading, Row Epitaxial, and Partitioning Based (Level=2).]

Compared with row-epitaxial, the new method reduces the border cost by 4% and is 2.65 times faster.







Conclusions

We draw a fertile analogy between DNA array design and VLSI design automation.

We propose a new recursive partitioning-based placement algorithm and a new embedding algorithm, which achieve a 4% improvement in border cost.

We study and quantify the performance of existing and newly proposed algorithms on benchmarks with known optimal cost, as well as in scaling suboptimality experiments.

Open Research Directions

Stronger placement operators leading to further reduction in the border cost.

Future work also covers next-generation chips of 10k × 10k.

Add flow-awareness to each optimization step and introduce feedback loops.

Incorporate the pools of candidate probes produced by the probe selection tool.
