
Research Collection

Doctoral Thesis

DNA-encoded chemical libraries

Author(s): Mannocci, Luca

Publication Date: 2009

Permanent Link: https://doi.org/10.3929/ethz-a-005783014

Rights / License: In Copyright - Non-Commercial Use Permitted

This page was generated automatically upon download from the ETH Zurich Research Collection. For more information please consult the Terms of use.

ETH Library



DISS. ETH NO. 18153

DNA-Encoded Chemical Libraries

A dissertation submitted to the

ETH Zurich

For the degree of

Doctor of Sciences

Presented by

Luca Mannocci

Dott. Chim. Università degli Studi di Pisa

Born September 7, 1979

Citizen of Pisa (Italy)

Accepted on the recommendation of

Prof. Dr. Dario Neri, examiner

Prof. Dr. Karl-Heinz Altmann, co-examiner

Zurich, 2009

“I believe in intuition and inspiration. Imagination is more important than knowledge. Knowledge is limited. Imagination embraces the entire world, stimulating progress, giving birth to evolution. It is, strictly speaking, a real factor in scientific research.”

Albert Einstein

To my family

TABLE OF CONTENTS

1. SUMMARY
RIASSUNTO
List of abbreviations
2. INTRODUCTION
2.1 DNA-Encoded Chemical Libraries
2.1.1 Libraries of DNA displaying one covalently linked chemical entity
2.1.1.1 DNA-encoded “Split-&-Pool”
2.1.1.2 DNA-assisted “Split-&-Pool”
2.1.1.3 DNA-templated synthesis
2.1.1.4 Stepwise coupling of coding DNA fragments to nascent organic molecules
2.1.2 DNA libraries displaying multiple covalently linked chemical entities: ESAC libraries
2.2 The decoding of DNA-encoded chemical libraries
2.2.1 Microarray-based decoding
2.2.2 Decoding by high-throughput sequencing
2.2.2.1 “454” technology
2.2.2.2 Solexa technology
2.2.2.3 SOLiD technology
2.2.2.4 Single Molecule DNA Sequencing – Helicos technology
3. RESULTS
3.1 DNA-Encoded Library “DEL4000”
3.1.1 Library design and synthesis
3.1.2 Model Compounds
3.1.3 Oligonucleotides
3.1.4 Compounds
3.1.5 HPLC Purification
3.1.6 Mass Spectrometry
3.1.7 Oligonucleotide concentration determination
3.1.8 Klenow polymerase encoding
3.1.9 Summary
3.2 Selections using the DEL4000 library
3.2.1 Streptavidin selection
3.2.1.1 Identification of streptavidin binding molecules
3.2.1.2 Characterization of streptavidin binding molecules
3.2.2 Polyclonal human IgG selection
3.2.2.1 Identification of polyclonal IgG binding molecules
3.2.2.2 Characterization of polyclonal IgG binding molecules by affinity chromatography resins
3.2.3 Matrix metalloproteinase 3 (MMP3) selection
3.2.3.2 Characterization of MMP3 binding molecules
3.2.4 Computational simulation of DEL4000 selections
3.3 General strategies for the stepwise construction of very large DNA-encoded chemical libraries
3.3.1 Selective deprotection and reaction of di-amine derivatives
3.3.1.1 Orthogonal protective group and selective deprotection
3.3.1.2 Core scaffolds design and synthesis strategy
3.3.1.3 Model compounds for N-Fmoc, N’-Nvoc di-amino carboxylic acid core scaffold based library
3.3.2 Stepwise DNA-encoding
3.3.2 Encoding by ligation
3.3.2.1 Encoding by a combination of Klenow polymerase and ligation
3.3.2.2 Encoding by Klenow polymerase
3.3.3 Summary
4. DISCUSSION
5. MATERIAL AND METHODS
5.1 Reagents and general remarks
5.2 Synthesis of DEL4000 DNA Encoded Library
5.2.1 Synthesis of library model compounds oligonucleotide conjugate
5.2.2 Coupling reactions of 20 Fmoc-protected amino acids
5.2.3 Coupling reactions of 200 carboxylic acids
5.2.4 Klenow polymerase encoding of the 200 carboxylic acid reactions
5.2.5 Preparation of D-desthiobiotin oligonucleotide conjugate (positive control)
5.3 Library DEL4000 selections
5.3.1 Streptavidin selection
5.3.1.1 Identification of binding molecules
5.3.1.2 Synthesis of the binding molecules as fluorescein conjugates
5.3.2 Affinity measurements
5.3.3 Polyclonal human IgG selection
5.3.3.1 Polyclonal human IgG coating of sepharose beads
5.3.3.2 Identification of human IgG binding molecules
5.3.3.3 Synthesis of affinity chromatography resin containing the compound 02-40 or 16-40
5.3.3.4 Polyclonal human IgG Cy5 labeling
5.3.3.5 Biotinylated polyclonal human IgG
5.3.3.6 Affinity chromatography of CHO cell supernatant spiked with Cy5-labeled or biotinylated human IgG on IgG binding resin
5.3.4 Human MMP3 selection
5.3.4.1 Human MMP3 coating of sepharose beads
5.3.4.2 Identification of human MMP3 binding molecules
5.3.4.3 Synthesis of the MMP3 binding molecules as fluorescein conjugates
5.3.5 Computational simulation
5.4 Stepwise coupling by selective deprotection and reaction of di-amine derivatives
5.4.1 DNA-compatible cleavage of different amino protective groups
5.4.1.1 Synthesis of 2-pent-4-enamido-cis-cyclopentanecarboxylic acid (1c)
5.4.1.2 Synthesis of N-Bpoc cis-2-aminocyclopentanecarboxylic acid (1d)
5.4.1.3 Synthesis of N-Nvoc cis-2-aminocyclopentanecarboxylic acid (1b)
5.4.1.4 Synthesis of 4-pentenoic acid N-hydroxysuccinimide ester (1e)
5.4.1.5 Synthesis of Nα-Fmoc-Nε-Nvoc-lysine (2)
5.4.1.6 Oligonucleotide conjugation of N-protected cis-2-aminocyclopentanecarboxylic acid derivatives and Nα-Fmoc-Nε-Nvoc-lysine
5.4.1.7 Cleavage of 2-pent-4-enamido-cis-cyclopentanecarboxylic acid oligonucleotide conjugate
5.4.1.8 Cleavage of N-Bpoc cis-2-aminocyclopentanecarboxylic acid oligonucleotide conjugate
5.4.1.9 Cleavage of N-Nvoc cis-2-aminocyclopentanecarboxylic acid and N-Fmoc-N’-Nvoc-lysine oligonucleotide conjugates
5.4.2 Synthesis of model scaffolds for Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid derivative based library
5.4.2.1 Synthesis of (1R,3R,4R)-methyl 3-azido-4-Boc-amino-cyclopentanecarboxylate (4)
5.4.2.2 Synthesis of (1S,3R,4R)-methyl 3-amino-4-Boc-amino-cyclopentanecarboxylate (5)
5.4.2.3 Synthesis of (1S,3R,4R)-methyl 3-Fmoc-amino-4-Boc-amino-cyclopentanecarboxylate (6)
5.4.2.4 Synthesis of (1S,3R,4R)-methyl 3-Fmoc-amino-4-Nvoc-amino-cyclopentanecarboxylate (8)
5.5 Stepwise encoding
5.5.1 Stepwise encoding by ligation
5.5.2 Stepwise encoding by a combination of Klenow polymerase and ligation
5.5.3 Stepwise encoding by Klenow polymerase
5.5.4 Stepwise coupling and encoding of model compound for Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid derivative based library
5.5.5 Bacterial cloning and sequencing
6. REFERENCES
7. CURRICULUM VITAE
8. ACKNOWLEDGMENTS
9. APPENDIX
9.1 Model compounds oligonucleotide conjugate
9.2 Library synthesis overview


1. SUMMARY

The isolation of small organic molecules capable of specific binding to biological targets is a central problem in chemistry, biology and pharmaceutical sciences. Consequently, there is considerable interest in the development of powerful and convenient technologies for the construction of large sets (“libraries”) of chemical compounds and of novel screening methodologies for the identification of binding molecules. DNA-encoded chemical libraries represent an innovative approach to the construction and screening of libraries of unprecedented size and quality. Such libraries consist of a collection of chemical compounds, each individually coupled to a distinctive DNA fragment that serves as an identification bar code. DNA-encoded chemical libraries can be "panned" on a target protein immobilized on a solid support. Typically, high-throughput sequencing reveals the differences in library composition before and after panning, thus allowing the identification of molecules that bind to the target protein of interest. In this respect, DNA-encoded chemical libraries bear a logical similarity to phage display libraries of proteins and peptides, in which the polypeptide displayed on the tip of the phage (“phenotype”) is physically linked to the gene encoding it (“genotype”).
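The decoding logic described above, comparing how often each DNA code is observed before and after panning, can be sketched in a few lines. This is a minimal illustration that assumes sequencing yields a simple count per code; the function name `enrichment_factors`, the pseudocount and the example counts are hypothetical and are not data or methods from this thesis.

```python
from collections import Counter


def enrichment_factors(before: Counter, after: Counter, pseudocount: float = 1.0) -> dict:
    """Return, for every code, its relative frequency after panning divided by
    its relative frequency before panning.

    A pseudocount is added to every code so that codes missing from one sample
    do not cause a division by zero (an illustrative choice, not the thesis method).
    """
    codes = set(before) | set(after)
    total_before = sum(before.values()) + pseudocount * len(codes)
    total_after = sum(after.values()) + pseudocount * len(codes)
    factors = {}
    for code in codes:
        freq_before = (before[code] + pseudocount) / total_before
        freq_after = (after[code] + pseudocount) / total_after
        factors[code] = freq_after / freq_before
    return factors


# Hypothetical sequence counts for three codes before and after panning.
before = Counter({"code_A": 1000, "code_B": 1000, "code_C": 1000})
after = Counter({"code_A": 10, "code_B": 500, "code_C": 12})

ranked = sorted(enrichment_factors(before, after).items(), key=lambda kv: kv[1], reverse=True)
print(ranked[0][0])  # code_B
```

Ranking codes by this ratio rather than by raw counts normalizes for the different sequencing depths of the two samples.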

In the first part of this thesis, I present a general strategy for the stepwise coupling of coding DNA fragments to nascent organic molecules following individual reaction steps, as well as the implementation of high-throughput sequencing for the identification and relative quantification of library members. The methodology was exemplified by the construction of a DNA-encoded chemical library containing 4’000 compounds (DEL4000) covalently attached to unique DNA fragments serving as amplifiable identification bar codes. We also assessed the relative composition of the new library and its functionality by performing selection experiments on sepharose resin coated with streptavidin. This study led to the identification of novel chemical compounds with submicromolar dissociation constants towards streptavidin. Moreover, we found that selections can conveniently be decoded using a recently described high-throughput DNA sequencing technology (termed “454 technology”), originally developed for genome sequencing.


In a second selection experiment, molecules binding to polyclonal human IgG were identified. I could show that, upon coupling to a resin, these compounds can be used for the affinity purification of human IgG from culture supernatants.

Furthermore, we carried out a selection against the catalytic domain of human matrix metalloproteinase 3 (MMP3). Matrix metalloproteinases (MMPs) are zinc-dependent proteases involved in tissue remodelling in a variety of physiological and pathological processes. The selection led to the identification of a binding compound with a dissociation constant in the low μM range.

Encouraged by these results, we investigated methodologies for the construction of very large DNA-encoded chemical libraries, featuring the stepwise addition of at least three independent sets of chemical moieties onto an initial scaffold using suitable orthogonal chemical reactions and/or protecting-group strategies, followed by the sequential addition of the corresponding DNA codes. Our experiments have shown that it should be possible to construct DNA-encoded libraries containing over one million individual chemical compounds. The construction of such libraries is currently in progress.
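The combinatorial arithmetic behind these library sizes is simple: the number of members is the product of the sizes of the building-block sets, and each member's bar code can be formed by concatenating one code per set. The sketch below uses the set sizes named in the table of contents (20 Fmoc-protected amino acids and 200 carboxylic acids), whose product matches the 4’000 members of DEL4000; the three sets of 100, the toy code sequences and the helper names are hypothetical, chosen only to show how three sets can exceed one million compounds.

```python
from itertools import product
from math import prod


def library_size(set_sizes):
    """Number of members obtained by combining one building block from each set."""
    return prod(set_sizes)


def composite_codes(code_sets):
    """Yield one concatenated bar code per combination of building blocks."""
    for combo in product(*code_sets):
        yield "".join(combo)


print(library_size([20, 200]))        # 4000, as in DEL4000
print(library_size([100, 100, 100]))  # 1000000 with three hypothetical sets of 100

# Toy code sets (hypothetical sequences) showing how bar codes concatenate.
codes_a = ["AAC", "AGT"]
codes_b = ["CCA", "CTG", "CGA"]
print(list(composite_codes([codes_a, codes_b]))[:2])  # ['AACCCA', 'AACCTG']
```

Keeping all codes within a set the same length makes the concatenated bar code unambiguous to split back into its per-set segments.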


RIASSUNTO

The isolation of organic substances capable of interacting specifically with biological targets is a crucial problem in chemistry, biology and the pharmaceutical field. Consequently, there is growing interest in developing new, rapid and efficient technologies for the construction and screening of large collections (“libraries”) of organic compounds. An innovative and elegant solution to this problem is represented by “DNA-encoded” chemical libraries. Essentially, these techniques involve the construction of libraries of organic compounds in which each member is covalently conjugated to a specific DNA fragment that unambiguously “encodes” its identity. Therefore, the selection of compounds of interest with specific biological activities (“screening”) using “DNA-encoded” libraries can easily be performed, for example by incubating the library with the appropriate biological target immobilized on a solid support. After the non-binding compounds have been removed by appropriate washing of the support, modern “high-throughput sequencing” techniques make it possible to sequence the specific DNA codes, to determine the composition of the library before and after selection and, consequently, to identify the compounds that actually interact with the biological target of interest. From this point of view, “DNA-encoded” chemical libraries bear an intrinsic analogy to the phage libraries used in “phage display”, in which each protein or peptide (“phenotype”) is physically associated with the corresponding coding gene (“genotype”).

The first part of this thesis describes a general strategy for the construction of “DNA-encoded” chemical libraries and the implementation of “high-throughput sequencing” techniques for the identification and relative quantification of library members before and after selection. The methodology is exemplified here by the construction of a “DNA-encoded” chemical library containing 4’000 compounds (DEL4000), each uniquely identified by specific, covalently conjugated DNA oligonucleotides. The relative composition of the library and its functionality were subsequently determined by performing selection experiments with streptavidin immobilized on sepharose resin. These studies led to the identification of new chemical compounds with sub-micromolar dissociation constants towards streptavidin, and also demonstrated that “high-throughput sequencing” techniques (termed “454 technology”), originally developed for genome sequencing, can be used effectively in the decoding of selections.

In a second selection using DEL4000, compounds specific for polyclonal human IgG were identified. It was then shown that these compounds, after immobilization on a chromatographic resin, can be used for the affinity purification of human IgG from cell culture supernatants.

Finally, a selection was performed to identify new compounds specific for the catalytic domain of human matrix metalloproteinase 3 (MMP3). Matrix metalloproteinases (MMPs) are a family of zinc-dependent proteases involved in tissue remodelling in a variety of physiological and pathological processes. The selection allowed the identification of a compound with a micromolar dissociation constant.

Encouraged by these results, we decided to pursue research towards the construction of a larger “DNA-encoded” chemical library, featuring the sequential coupling of at least three independent sets of compounds using orthogonal chemical reactions and/or protection/deprotection strategies for functional groups, followed by the introduction of the corresponding DNA codes. The feasibility of constructing a “DNA-encoded” chemical library containing over one million compounds was thus demonstrated. The construction of this library (DEL10e6) is currently in progress.


List of abbreviations

aq. Aqueous
ATP Adenosine-5'-triphosphate
bp Base pair
CAII Carbonic Anhydrase II
CHO Chinese Hamster Ovary
CNBr Cyanogen bromide
Cy3 2-((1E,3E)-3-(1-(5-(2,5-dioxopyrrolidin-1-yloxy)-5-oxopentyl)-3,3-dimethylindolin-2-ylidene)prop-1-enyl)-3,3-dimethyl-1-propyl-3H-indolium
Cy5 2-((1E,3E,5E)-5-(1-(5-(2,5-dioxopyrrolidin-1-yloxy)-5-oxopentyl)-3,3-dimethylindolin-2-ylidene)penta-1,3-dienyl)-1,3,3-trimethyl-3H-indolium
DCM Dichloromethane
DEL DNA-Encoded Library
DIEA N,N-Diisopropylethylamine
DMBAA Dimethylbutylammonium acetate
DMF N,N-Dimethylformamide
DMSO Dimethyl sulfoxide
DMT Dimethoxytrityl
DNA Deoxyribonucleic acid
dNTPs Deoxyribonucleoside triphosphates
DTT Dithiothreitol
ECM Extracellular Matrix
EDC N-ethyl-N'-(3-dimethylaminopropyl)-carbodiimide
EDTA Ethylenediaminetetraacetic acid
equiv. Equivalent
ESAC Encoded Self-Assembling Chemical library
ESI Electrospray ionization
FG Functional Group
FITC Fluorescein isothiocyanate
Fmoc 9-Fluorenylmethoxycarbonyl
HATU O-(7-Azabenzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate
HBTU 2-(1H-benzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate
HFIP 1,1,1,3,3,3-Hexafluoroisopropanol
HOBt 1-Hydroxybenzotriazole
HPLC High Performance Liquid Chromatography
HSA Human Serum Albumin
HTS High Throughput Screening/Sequencing
IgG Immunoglobulin G
Kd Dissociation constant
LC Liquid Chromatography
MMP3 Human Matrix Metalloproteinase 3
MS Mass Spectrometry
ND NanoDrop
NHS N-Hydroxysuccinimide
NMR Nuclear magnetic resonance
Nvoc 4,5-Dimethoxy-2-nitrobenzyloxycarbonyl
PAGE Polyacrylamide gel electrophoresis
PBS Phosphate buffered saline
PCR Polymerase Chain Reaction
Prep Preparative
RNA Ribonucleic acid
RP Reverse Phase
SDS Sodium dodecyl sulfate
SNP Single Nucleotide Polymorphism
SOLiD Sequencing by Oligonucleotide Ligation and Detection
sst Single strand
TBE Tris-borate-EDTA
TEAA Triethylammonium acetate
THF Tetrahydrofuran
TFA Trifluoroacetic acid
TFE Trifluoroethanol
Tris 2-Amino-2-hydroxymethyl-propane-1,3-diol
tSMS True Single Molecule Sequencing
Tween 20 Polyoxyethylene (20) sorbitan monolaurate
UV Ultraviolet


2. INTRODUCTION

The discovery of molecules binding to macromolecular targets is a formidable task in chemistry, biology and pharmaceutical sciences. Following the sequencing of the human genome1,2 and advances in proteome research3,4 and transcriptomics5, a multitude of biological targets associated with relevant processes in healthy and diseased cells have been discovered. With an aging population and an increased understanding of the mechanisms of disease at the molecular level, biomedical scientists are facing the demand for more and better drugs. Additionally, elucidating the biological function of proteins will, in many cases, require access to specific ligands (an approach often termed ‘Chemical Genetics’4). Specific binding to a biological target is not per se sufficient to turn a binding molecule into a drug, as it is widely recognized that other molecular properties (such as pharmacokinetic behaviour and stability) contribute to the performance of a drug. Nevertheless, the isolation of specific binders against a relevant biological target typically represents the starting point in the process that leads to a new drug.6

Techniques for the general, fast and inexpensive isolation of small organic binding compounds are lacking at present. Currently, hundreds of thousands of molecules typically have to be screened in order to find a suitable candidate.6 High-throughput screening (HTS) in certain cases allows the screening of some 100,000 compounds per day. However, HTS is cumbersome both in terms of costs (for robotic equipment and material consumption) and of technical development (set-up of sophisticated bioassays, storage and handling of the chemical archives). Similarly, the preparation, storage and screening of very large synthetic libraries of organic molecules can be very demanding, not only from the synthetic point of view but also in terms of logistics. Although combinatorial synthetic approaches such as the intriguing “split-&-pool”7,8,9 methods and solid-phase synthesis10,11,12 have facilitated the construction of pools of chemical compounds, the difficulty of identifying the specific binding molecules inevitably grows with the size of the chemical library to be screened, while the relative concentration of each individual library member decreases. Consequently, chemical libraries screened as pools of compounds are often limited in size by the sensitivity of biochemical assays and of the analytical methods used for structural characterization.
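The dilution argument can be made concrete: if a pooled library is screened at a fixed total compound concentration, each of N members is present at roughly the total divided by N. The sketch assumes equimolar members and an illustrative total of 1 µM; neither value is taken from this thesis, and the function name is hypothetical.

```python
def per_member_concentration(total_molar: float, n_members: int) -> float:
    """Concentration of each member of an equimolar pool held at a fixed total concentration."""
    if n_members <= 0:
        raise ValueError("n_members must be positive")
    return total_molar / n_members


TOTAL_MOLAR = 1e-6  # illustrative assumption: 1 µM total compound concentration

for n in (1_000, 100_000, 1_000_000):
    c = per_member_concentration(TOTAL_MOLAR, n)
    print(f"{n:>9,} members -> {c:.0e} M per member")
```

Under these assumptions a million-member pool leaves each compound at about 1 pM, which illustrates why assay and analytical sensitivity cap the size of libraries screened as pools.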

Over the last decade, interest in the development of powerful and convenient technologies for chemical library construction and screening has increased dramatically. Techniques such as phage display13,14, yeast display15, ribosome display16 and covalent display17 link each displayed polypeptide to the genetic information that encodes it. In this light, it would be useful to devise strategies for the identification of small organic molecules capable of binding to target proteins with high affinity and specificity, based on the association of individual chemical compounds with unique DNA fragments serving as identification bar codes.


2.1 DNA-Encoded Chemical Libraries

The concept of DNA-encoding was first described in a theoretical paper by Brenner and Lerner in 1992, who anticipated a “split-&-pool”-based combinatorial synthesis in which monomeric chemical compounds and coding oligonucleotide tags would be attached to beads in an alternating fashion (Figure 2-1).18 Shortly afterwards, the first practical implementation of this approach was presented by S. Brenner and K. Janda19 and, similarly, by the group of M.A. Gallop.20 Brenner and Janda proposed generating individual encoded library members by alternating parallel combinatorial synthesis of the heteropolymeric chemical compound and of the appropriate oligonucleotide sequence on the same bead in a “split-&-pool” fashion, using the solid support as a structural linker between the nascent chemical entity and its corresponding oligonucleotide label. As a test system, they developed the synthesis of a functionally active leucine-enkephalin pentapeptide, with the aim of testing the feasibility of alternating peptide and oligonucleotide synthesis on the bead. In total, they accomplished five alternating rounds of peptide and oligonucleotide synthesis.19

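[Figure 2-1 schematic: m rounds of split-&-pool, in which each pool couples one amino acid (aa1, aa2, aa3 … aan) together with its DNA tag (tag1, tag2, tag3 … tagn); after release from the beads, each peptide aax…aa3aa2aa1 is linked to its code tag1tag2tag3…tagm, giving n^m compounds]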

Figure 2-1: Schematic representation of the DNA-encoding of peptides on beads. Alternating the coupling of amino acids to a growing peptide chain with the stepwise synthesis of a DNA bar code leads to DNA-encoded beads displaying peptides, which can be probed for binding to a target protein of interest. ‘aa’ represents the different amino acids, while ‘tag’ refers to a DNA sequence encoding the corresponding amino acid added in the split-&-pool procedure.
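The split-&-pool encoding scheme of Figure 2-1 can be simulated to show why m rounds with n building blocks give n^m compounds, each carrying a tag string that records its synthetic history. This is a minimal sketch under simplifying assumptions: beads are modelled as (compound, code) tuples, every pool receives every distinct bead type, and the building-block and tag names are placeholders rather than real amino acids or sequences.

```python
def split_and_pool(building_blocks, tags, rounds):
    """Simulate split-&-pool synthesis with a parallel tag added in each pool.

    In each round every distinct bead type is split into one pool per building
    block; the pool appends its building block to the compound and the matching
    tag to the code, then all pools are recombined. Returns the set of
    (compound, code) pairs present after the final round.
    """
    if len(building_blocks) != len(tags):
        raise ValueError("each building block needs exactly one tag")
    beads = {((), ())}  # unmodified beads: empty compound, empty code
    for _ in range(rounds):
        beads = {
            (compound + (bb,), code + (tag,))
            for compound, code in beads
            for bb, tag in zip(building_blocks, tags)
        }
    return beads


def decode(code, building_blocks, tags):
    """Recover the compound from its tag sequence via a tag -> building block lookup."""
    lookup = dict(zip(tags, building_blocks))
    return tuple(lookup[t] for t in code)


blocks = ["aa1", "aa2", "aa3"]
tags = ["tag1", "tag2", "tag3"]
library = split_and_pool(blocks, tags, rounds=2)
print(len(library))  # 9 = 3**2
```

Because every pool adds exactly one building block and its tag, reading the code back in order reconstructs the compound, which is the property that makes the bead-bound DNA usable as a bar code.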


Controlled-pore glass was used as the solid matrix to allow efficient oligonucleotide synthesis. The solid support was derivatized with a succinyl aminohexanol-sarcosine appendage that allowed easy detachment of the oligonucleotide-encoded peptide after synthesis (Figure 2-2a). In order to fulfil orthogonality requirements, O-DMT-protected serine and N-Fmoc-protected lysine scaffolds were used for the attachment of the emerging oligonucleotide and peptide sequences, respectively (Figure 2-2a). The oligonucleotide-tagged peptides were released from the beads and Edman-sequenced. The leucine-enkephalin pentapeptide (YGGFL) constructed in this fashion (Figure 2-2b) was shown to bind to the anti-leucine-enkephalin antibody 3-E7 as efficiently as the reference peptide21 (Kd = 7.1 nM). Remarkably, the codes of the released oligonucleotide-tagged peptides could be amplified by standard polymerase chain reaction (PCR).

[Figure 2-2 structures: a) derivatized support with labelled sites for the nascent peptide (Fmoc-protected amine), for the oligonucleotide code (DMT-protected hydroxyl) and for cleavage; b) 5’-AGCTACTTCCCAAGG GAG CTG CTG CTA GTC GGGCCCTATTCTTAG-3’ attached through a linker to the peptide, with the two flanking segments labelled as PCR priming sites and the central segment labelled as the peptide sequence]

Figure 2-2: A derivatized solid support allows the oligonucleotide encoding of a nascent peptide sequence. a) Schematic representation of the derivatized support with the cleavable succinyl aminohexanol-sarcosine appendage. The cleavable linker enables easy detachment of the oligonucleotide-encoded peptide after synthesis, while O-DMT-protected serine and N-Fmoc-protected lysine allow the bidirectional synthesis of oligonucleotide and peptide sequences. Recent approaches to DNA-encoded chemical libraries tend to omit the beads and link the compounds directly to DNA. b) Leucine-enkephalin pentapeptide (YGGFL) oligonucleotide conjugate after release from the beads. The codes of the released oligonucleotide-tagged peptides could be amplified by standard polymerase chain reaction (PCR). The leucine-enkephalin pentapeptide was shown to bind to the anti-leucine-enkephalin antibody 3-E7 as efficiently as the reference peptide (Kd = 7.1 nM).
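Figure 2-2b shows a coding region sandwiched between two PCR priming sites, so decoding such a tag amounts to locating the flanking sequences and reading what lies between them. The sketch below assumes, from the labels and spacing in the figure, that the 15-nucleotide segments at each end are the priming sites and that the central segment is read in trinucleotide units; the helper name `extract_code` is hypothetical, and because the text does not state which trinucleotide encodes which amino acid, no such assignment is attempted.

```python
def extract_code(read: str, fwd_site: str, rev_site: str, unit: int = 3) -> list:
    """Return the region between two flanking sites, split into fixed-length units."""
    start = read.find(fwd_site)
    end = read.find(rev_site, start + len(fwd_site)) if start != -1 else -1
    if start == -1 or end == -1:
        raise ValueError("flanking sites not found")
    region = read[start + len(fwd_site):end]
    if len(region) % unit:
        raise ValueError("coding region length is not a multiple of the unit length")
    return [region[i:i + unit] for i in range(0, len(region), unit)]


# Oligonucleotide from Figure 2-2b, written 5'->3' without spaces.
oligo = "AGCTACTTCCCAAGG" "GAGCTGCTGCTAGTC" "GGGCCCTATTCTTAG"
print(extract_code(oligo, "AGCTACTTCCCAAGG", "GGGCCCTATTCTTAG"))
# ['GAG', 'CTG', 'CTG', 'CTA', 'GTC']
```

Five trinucleotide units are recovered, matching the five residues of the encoded pentapeptide.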


In the same year, Gallop and co-workers constructed an 823,543-member DNA-encoded heptapeptide library by performing seven alternating split-&-pool synthesis cycles on spherical beads with seven different D- and L-amino acid building blocks.20 The beads were functionalized with a mixture of two different linkers: one carried a DMT-protected hydroxyl group for stepwise nucleotide addition, while the other, present in ca. 20-fold excess over the first, carried an Fmoc-protected amine for building up the polypeptide. After removal of the Fmoc group, the beads were split uniformly into seven pools, each of which was reacted with one of the seven amino acid building blocks. A dinucleotide coding tag was then synthesized on the beads of each individual pool, and this process was repeated until the heptapeptide had been obtained. An additional oligonucleotide sequence was attached to all beads to allow PCR-based decoding. Because the final treatment with trifluoroacetic acid would cause depurination of deoxyguanosine and deoxyadenosine, these nucleotides were deliberately excluded from the oligonucleotide. The final library was subjected to on-bead screening with the fluorescently labelled monoclonal antibody D32.39, which specifically binds the heptapeptide RQFKVVT. The corresponding oligonucleotide sequence could be revealed after FACS-based sorting and PCR.
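The 823,543 figure follows directly from split-&-pool arithmetic: every cycle multiplies the diversity by the number of building blocks, so seven cycles with seven building blocks give 7⁷ sequences. A minimal Python sketch, assuming ideal couplings and one sequence per bead; the function names and the one-letter building-block labels are hypothetical:

```python
from itertools import product


def split_and_pool_size(n_blocks: int, n_cycles: int) -> int:
    """Number of distinct sequences from ideal split-&-pool synthesis."""
    # each cycle splits the beads into n_blocks pools, so the diversity
    # is multiplied by n_blocks once per cycle
    return n_blocks ** n_cycles


def enumerate_library(blocks: str, n_cycles: int) -> list:
    """List every sequence explicitly (only sensible for small cases)."""
    return ["".join(seq) for seq in product(blocks, repeat=n_cycles)]


# Gallop-type heptapeptide library: 7 building blocks, 7 cycles
print(f"{split_and_pool_size(7, 7):,}")  # 823,543

# small check with three hypothetical building blocks and two cycles
print(enumerate_library("ABC", 2))  # ['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC']
```

The explicit enumeration is only there to confirm the closed-form count on a case small enough to inspect by eye.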

Since unprotected DNA tolerates only a narrow window of conventional reaction conditions, a number of alternative chemical and physical encoding strategies were devised until the end of the 1990s (e.g., MS-based compound tagging, peptide encoding, haloaromatic tagging, encoding by secondary amines and semiconductor devices),22 mainly to avoid inconvenient solid-phase DNA synthesis and to create combinatorial libraries that could easily be screened in a high-throughput fashion.

There is considerable evidence that the isolation of binding polypeptides (e.g., antibodies) requires libraries comprising at least 10⁷–10⁸ members.23 By analogy, it appears reasonable to assume that large libraries will facilitate the isolation of small organic binders to proteins of interest. However, using conventional methods, even the largest pharmaceutical companies cannot screen more than a few hundred thousand compounds in an HTS campaign. The selective amplifiability of DNA greatly facilitates library screening and becomes indispensable for encoding organic compound libraries of this unprecedented size. Consequently, at the beginning of the 2000s, DNA-encoded combinatorial chemistry experienced a revival.


Around 2002, several groups realized that omitting beads and attaching chemical compounds directly to oligonucleotides or DNA fragments could conveniently lead to very large DNA-encoded chemical libraries. The set-up of DNA-encoded chemical libraries (DEL) was pursued along completely novel avenues. The resulting libraries can be grouped into DNA-encoded libraries presenting either a single oligonucleotide or multiple oligonucleotides, each displaying one covalently linked putative binding molecule (Figure 2-3).

[Figure 2-3: a) multiple pharmacophore format; b) single pharmacophore format; 5′ and 3′ ends of the oligonucleotides are indicated]

Figure 2-3: Schematic representation of DNA-encoded libraries displaying chemical compounds directly attached to oligonucleotides. a) DNA-encoded library presenting multiple pairing oligonucleotides, each displaying a covalently linked binding molecule. b) DNA-encoded library presenting a single oligonucleotide covalently linked to a putative binding molecule.


2.1.1 Libraries of DNA displaying one covalently linked chemical entity

2.1.1.1 DNA-encoded “Split-&-Pool”

An alternative strategy for constructing DNA-encoded libraries, in full analogy with the encoded “split-&-pool” technique described by Brenner and Lerner,18 features the synthesis of chemical compounds directly on the oligonucleotide, omitting the solid support (i.e., beads) (Figure 2-4). Initially, a set of unique oligonucleotides, each containing a specific sequence, is chemically conjugated to a corresponding set of small organic molecules carrying a suitable reactive group. Typically, a carboxylic acid is coupled to an amino-modified oligonucleotide. Subsequently, the oligonucleotide-compound conjugates are mixed and divided into a number of groups.

[Figure 2-4: starting oligonucleotides carrying a reactive site; building blocks bb1 … bbn encoded by tags tag1 … tagn; Split 1 / Pool 1; m rounds of split-&-pool give nᵐ compounds]

Figure 2-4: Schematic representation of hypothetical DNA-encoded libraries of linear peptides constructed in a split-&-pool fashion without bead support. An initial building block is conjugated to an oligonucleotide and encoded with a further oligonucleotide, either by ligase or by polymerase. Subsequently, the oligonucleotide-compound conjugates are mixed, divided into a number of groups and reacted with an additional building block. Following encoding, these steps are repeated a given number of times. ‘bb’ denotes the different building blocks, while ‘tag’ refers to the DNA sequence encoding the corresponding amino acid added in the split-&-pool procedure.

Under appropriate conditions, a second set of building blocks is coupled to the first one, and a further oligonucleotide coding for the second modification is hybridized to the initial oligonucleotide and attached enzymatically, either by ligase or by polymerase. These steps are then repeated in a “split-&-pool” fashion. In 2002, the Danish company Nuevolution and the US company Praecis filed patent applications for proprietary enzymatic ligation strategies for DNA code assembly, enabling sequential chemical synthesis and DNA-tagging steps.24,25,26,27 To date, neither company has described a practical library application in the literature.
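The bead-free procedure can be modelled as a loop in which every split adds one building block and every encoding step appends the matching DNA code, so that each final tag spells out the synthetic history of its compound. A minimal sketch, assuming an idealized library in which each conjugate is present in many copies (so every pool aliquot contains every compound) and every coupling and ligation goes to completion; the code sequences and building-block names are hypothetical:

```python
# Hypothetical 4-nt codes, one dictionary per split-&-pool round
ROUND_CODES = [
    {"bb1": "ACGT", "bb2": "TGCA", "bb3": "GATC"},  # round 1: initial conjugation
    {"bb1": "CCAA", "bb2": "GGTT", "bb3": "ATAT"},  # round 2
]


def split_and_pool(round_codes):
    """Return {dna_tag: compound} for an idealized encoded split-&-pool library."""
    library = {"": ()}  # before round 1: no building block, empty tag
    for codes in round_codes:
        next_library = {}
        for tag, compound in library.items():  # pooled material ...
            for bb, code in codes.items():      # ... split into one vessel per building block
                # couple bb, then record the step by appending its code (ligation or fill-in)
                next_library[tag + code] = compound + (bb,)
        library = next_library  # pool again
    return library


library = split_and_pool(ROUND_CODES)
print(len(library))         # 9 = 3 building blocks ** 2 rounds
print(library["ACGTGGTT"])  # ('bb1', 'bb2')
```

Because the tag is built in the same order as the compound, decoding a selected sequence is just a lookup of the concatenated code.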

2.1.1.2 DNA-assisted “Split-&-Pool”

In 2004, D.R. Halpin and P.B. Harbury presented an intriguing new method for the construction of DNA-encoded libraries.28 For the first time, the DNA templates served both to encode the library components and to program their “split-&-pool” synthesis. The design of Halpin and Harbury enabled alternating rounds of selection, amplification and diversification with small organic molecules, in complete analogy to phage-display technology.

In a further milestone paper on DNA-encoded chemical libraries, Halpin and Harbury demonstrated the efficiency of a unique DNA-routing machinery consisting of series-connected columns bearing resin-bound anticodons, which could separate a population of DNA templates sequence-specifically into spatially distinct locations by hybridization (termed DNA routing; Figure 2-5).28 A 340-mer oligonucleotide template combinatorial library was constructed in two steps by PCR assembly of overlapping complementary 40-mer oligonucleotides, each containing a 20-base coding region and an adjacent 20-base non-coding constant region. The resulting 10⁸-member 340-mer DNA-duplex template library was then converted into single-stranded DNA by reverse transcription and sodium hydroxide hydrolysis of the RNA strand. These templates were used to investigate the feasibility of sequence-specific gene routing. A number of anticodon columns were produced in which the sequences complementary to the template genes were covalently coupled to sepharose resin. Under high-salt conditions, the template genes hybridized sequence-specifically to the corresponding anticodon columns connected in series.

The individual sequence-specific columns were then joined in series with weak anion-exchange (DEAE) columns. When the conditions were changed from high salt to low salt and 50% DMF, the oligonucleotides were eluted from the anticodon columns and bound to the DEAE columns, where the chemical reactions could take place. Following elution from the DEAE columns under high-salt conditions, the combined DEAE column eluates were split again by the sequence-specific columns, thus entering a new cycle of “split-&-pool” synthesis. Using a radioactively labelled 340-mer template, the authors showed that the routing was indeed both sequence-specific and efficient (>95% for transfer from anticodon to DEAE column and >90% from DEAE to anticodon column), resulting in an overall yield of about 0.85ⁿ for n hybridization rounds. Furthermore, the anticodon columns proved to be reusable for at least 30 rounds of hybridization and elution.
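The overall routing yield is simply the product of the two transfer efficiencies (0.95 × 0.90 ≈ 0.855 per round), compounded over n rounds. A small sketch, assuming the efficiencies are constant and independent from round to round; since the reported values are lower bounds, the result is a conservative estimate:

```python
def routing_yield(n_rounds: int, to_deae: float = 0.95, to_anticodon: float = 0.90) -> float:
    """Fraction of template recovered after n complete routing rounds."""
    per_round = to_deae * to_anticodon  # anticodon -> DEAE, then DEAE -> anticodon
    return per_round ** n_rounds


for n in (1, 3, 6):
    print(n, round(routing_yield(n), 3))
```

Under these assumptions, roughly 39% of the template would remain after six rounds, the number of rounds used for a hexapeptide library.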

[Figure 2-5: template with constant regions z1–z7 and coding regions (a–j)1–(a–j)6; anticodon resins a*1, b*1 … j*1 split the templates (Split), each subpool is reacted with its building block (Coupling) and recombined (Pool); 6 rounds of split-&-pool]

Figure 2-5: Synthesis of a DNA-encoded chemical library by ‘DNA routing’. The initial oligonucleotide template contains six coding regions for ten different amino acids [(a-j)1-6] as well as seven constant domains (z1-7). The library of coding oligonucleotides, comprising all possible combinations of the different coding regions, was split by affinity chromatography using specific complementary oligonucleotides bound to resin [(a*-j*)1-6]. Following separation, each oligonucleotide template was conjugated to the corresponding amino acid, and the templates were subsequently pooled. The whole cycle was repeated six times in total, yielding a library of DNA-encoded hexapeptides.

According to this split-&-pool protocol (Figure 2-5), a combinatorial library composed of 10⁶ N-acylated pentapeptides conjugated to 340-mer oligonucleotides was generated.29 Ten different amino acid building blocks were used for the first positions and nine carboxylic acids for the N-acylation step. The library included acylated leucine-enkephalin pentapeptides as a positive control. After conversion into a DNA-duplex form, the library was subjected to affinity-based selection against the monoclonal antibody 3-E7, which was known to bind the leucine-enkephalin pentapeptide YGGFL with 7.1 nM affinity30 (the same selection system was used by Brenner and Janda in 1993).19 Two iterative cycles of panning were performed: the DNA eluted in the first round was PCR-amplified and used as input for the following round of synthesis and selection. Sequencing of both the input DNA and the DNA eluted after two rounds of panning demonstrated a strong round-to-round enrichment of the leucine-enkephalin pentapeptide DNA conjugates, leading to a consensus sequence matching leucine-enkephalin.

To confirm that the coding sequences did not bias the synthesis of the leucine-enkephalin DNA conjugates, an analogous DNA-pentapeptide library differing only in its coding sequences was constructed. Selections performed with this library also showed a 10⁵-fold enrichment of the leucine-enkephalin-encoding compound.
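Enrichment factors of this kind are usually computed as the ratio of a sequence's frequency after selection to its frequency before selection. A minimal sketch with hypothetical read counts (these are not the numbers of the cited study):

```python
def enrichment_factor(count_after: int, total_after: int,
                      count_before: int, total_before: int) -> float:
    """Frequency of one library member after selection divided by its frequency before."""
    if count_before == 0 or total_after == 0 or total_before == 0:
        raise ValueError("frequencies before and after selection must be defined and non-zero")
    freq_after = count_after / total_after
    freq_before = count_before / total_before
    return freq_after / freq_before


# hypothetical: 1 copy per million before selection, 1 read in 100 afterwards
ef = enrichment_factor(count_after=1, total_after=100, count_before=1, total_before=1_000_000)
print(f"{ef:.0f}-fold")  # 10000-fold
```

In practice a sequence that is absent from the sequenced input would need a pseudocount or a theoretical input frequency (1/library size); the sketch simply refuses that case.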

This novel embodiment of “split-&-pool” library construction, together with the possibility of chemical translation and diversification, holds promise for the construction of large DNA-encoded chemical libraries. While setting up the routing technology may seem tedious at first glance, exponentially larger libraries can be constructed with only a linear increase in work. Yet, the chemistry has so far been limited to peptide synthesis. In an additional publication, Harbury and co-workers described the feasibility and efficiency of solid-phase peptide synthesis on unprotected DNA.31 Yields above 90% per individual coupling step could be achieved, which might be sufficient for the construction of large libraries. Future selection experiments will reveal whether failure sequences accumulating from step to step encumber the identification of the best binders. From a drug discovery point of view, the linear peptides produced so far by this approach may not represent the drug-like structures the pharmaceutical industry is interested in.32 Nonetheless, the potential of this technology can probably be increased by enlarging the repertoire of building blocks and expanding the range of chemical reactions.
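The trade-off between library size, workload and step yield can be made concrete with a little arithmetic. The sketch below assumes, for illustration, that the work scales with the number of separate coupling reactions (one per codon and position) and that every position adds one coupling with a constant yield; the function name and parameter values are hypothetical:

```python
def routing_budget(codons_per_position: int, positions: int, coupling_yield: float = 0.9):
    """Return (library size, number of coupling reactions, full-length fraction)."""
    library_size = codons_per_position ** positions  # grows exponentially
    couplings = codons_per_position * positions      # grows linearly
    full_length = coupling_yield ** positions        # product carrying all building blocks
    return library_size, couplings, full_length


for positions in (3, 6, 9):
    size, work, full = routing_budget(10, positions)
    print(f"{positions} positions: {size:.0e} members, {work} couplings, {full:.0%} full-length")
```

With ten codons per position, going from six to nine positions multiplies the library size by a thousand but the number of couplings only by 1.5, while at 90% per step the full-length fraction drops from about 53% to about 39%.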


2.1.1.3 DNA-templated synthesis

In 2001, David Liu and co-workers showed that complementary DNA oligonucleotides can be used to assist certain synthetic reactions that do not proceed efficiently in solution at low concentration.33,34 At the same time, Summerer and Marx demonstrated that bringing reagents into close spatial proximity may enhance reaction rates.35 Indeed, a DNA heteroduplex can be used to accelerate the reaction between chemical moieties displayed at the extremities of the two DNA strands.33,34 D.R. Liu and co-workers were the first to show an efficient series of solution-phase, DNA sequence-programmed chemical reactions. In these reactions, oligonucleotides carrying one reactive group are hybridized to complementary oligonucleotide derivatives carrying a different reactive group (Figure 2-6).36 The close proximity conferred by DNA hybridization drastically increases the effective molarity of the reactants attached to the oligonucleotides, enabling the desired reaction to occur in aqueous solution at concentrations several orders of magnitude lower than those needed for the corresponding non-templated reaction.36 A variety of oligonucleotide derivatives can be paired in this way and used to discover novel chemical reactions.36,37
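The effect of effective molarity (EM) can be illustrated by comparing half-lives. For a non-templated reaction A + B with equal starting concentrations c, t½ = 1/(k₂c); once hybridization holds the reactants together, the reaction becomes pseudo-first-order with rate constant k₂·EM and t½ = ln 2/(k₂·EM). The rate constant, concentration and effective molarity below are hypothetical round numbers, not values from the cited work:

```python
import math


def half_life_bimolecular(k2: float, conc: float) -> float:
    """Half-life (s) of A + B -> P with [A]0 = [B]0 = conc (M); k2 in M^-1 s^-1."""
    return 1.0 / (k2 * conc)


def half_life_templated(k2: float, effective_molarity: float) -> float:
    """Half-life (s) when hybridization makes the reaction pseudo-first-order (k = k2 * EM)."""
    return math.log(2) / (k2 * effective_molarity)


K2 = 1.0     # hypothetical second-order rate constant, M^-1 s^-1
CONC = 1e-7  # hypothetical 100 nM oligonucleotide concentration
EM = 1e-3    # hypothetical 1 mM effective molarity

t_free = half_life_bimolecular(K2, CONC)
t_templated = half_life_templated(K2, EM)
print(f"untemplated: {t_free / 86400:.0f} days, templated: {t_templated / 60:.0f} min")
```

Under these assumptions the templated reaction is faster by a factor of EM/(c·ln 2), roughly four orders of magnitude, which illustrates how proximity can compensate for nanomolar reactant concentrations.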

Figure 2-6: DNA sequence-programmed chemical reactions: schematic overview of the reactions

compatible with the ‘DNA-templated synthesis’ approach. The close proximity conferred by the DNA

hybridization drastically increases the effective molarity of the reactants attached to the

oligonucleotides, enabling the desired reaction to occur. (Adapted from Li, X. and Liu, D.R.36)


To a certain extent, this proximity effect, which accelerates bimolecular reactions, is distance-independent (at least within a distance of 30 nucleotides), allowing variable DNA coding regions to be introduced at different positions on the oligonucleotide template. These DNA-templated reactions can be performed in multiple consecutive steps38 and in a step-programmed fashion.39 Crucially, by linking chemical compounds directly to DNA, a linkage between phenotype and genotype can be established, in full analogy to protein display methodologies. The information content can then be amplified by PCR after affinity capture. In a later step, sequence-programmed synthesis of DNA conjugates may facilitate library amplification after selection. The selection efficiency achievable with DNA-encoded binding molecules and affinity capture was investigated by performing selections on glutathione S-transferase with suitable inhibitors, revealing enrichment factors of up to 10,000-fold for the cognate DNA derivatives.40 Recently, Liu and co-workers described the DNA-templated set-up of a small library of macrocycles, which they subjected to in vitro selection (Figure 2-7).41 For this purpose, a DNA-template library was used comprising 48-mer oligonucleotides that carried an amino group at the 5’ end and contained three consecutive coding regions. A lysine was coupled to the primary amino group at the oligonucleotide extremity by amide bond formation. The lysine was ε-protected by acylation with a compound containing a vicinal diol, which can be cleaved to an aldehyde that serves for the final ring-closing step through a Wittig olefination. Initially, a code-1-complementary 10-mer oligonucleotide, carrying both a biotin at its 5′ end and an amino acid N-protected with a base-labile cleavable linker at its 3′ terminus, was hybridized to the template. The free carboxylic acid moiety of the protected amino acid was activated as a sulfo-N-hydroxysuccinimidyl ester and reacted covalently with the free amino group displayed on the 48-mer template oligonucleotide to form an amide bond. The resulting covalent conjugate was purified by capture on avidin-coated beads, which retained all biotin-containing fragments, so that residual 48-mer template oligonucleotide that had not been covalently conjugated could be washed away.


[Figure 2-7: library of n DNA templates; reagent libraries 1–3, each followed by annealing and DNA-templated reaction; ring closure; selection with the target protein; PCR amplification using primers; DNA sequencing and binder synthesis; enriched conjugates enter the next round after reconstitution of the enriched library members]

Figure 2-7: Schematic representation of a DNA-encoded library constructed by ‘DNA-templated synthesis’. A library of oligonucleotides (here, 64 different oligonucleotides) containing three coding regions was hybridized to a library of reagent compound-oligonucleotide conjugates (here, four reagent oligonucleotide conjugates) capable of pairing with the first coding domain of the template oligonucleotide. After transfer of the compounds onto the corresponding oligonucleotide templates, the synthesis cycle was repeated the desired number of times with further sets of carrier compound-oligonucleotide conjugates (here, two rounds with four carrier compound-oligonucleotide conjugates per round). Subsequently, functional selection was performed and the sequence of the binding template was amplified by PCR, so that DNA sequencing allowed the identification of the binding molecule. In the construction of the 65-member library, a 65th template, which served as a positive control, was also subjected to the DNA-templated synthesis scheme.

By increasing the pH, the base-labile linker could be cleaved and the reaction product (i.e., the α-amino-acylated 48-mer DNA fragment) could be eluted. This procedure was repeated with a code-2-specific 12-mer reagent oligonucleotide and a code-3-specific 12-mer reagent oligonucleotide. In the last coupling step, the amino acid building block was connected to the reagent oligonucleotide not by a base-labile linker, but by a linker containing a phosphonium group. After the third conjugation step and purification on avidin-coated resin, the vicinal diol linker on the lysine of the 48-mer template was cleaved with periodate, and the resulting aldehyde could undergo a Wittig olefination to form a fumaramide, closing the ring to a macrocycle. Since the Wittig reaction breaks the P–C bond between the reagent oligonucleotide and the template oligonucleotide, the desired macrocycle-template conjugate self-eluted from the avidin beads. The authors generated a 65-member library of fumaramide macrocycles, starting from four building blocks for each of the three synthetic steps plus one additional aryl sulfonamide building block in the first step, which was known to bind carbonic anhydrase with nanomolar affinity. The DNA template of this positive control included an NlaIII restriction site, which facilitated monitoring of its enrichment after selection by polyacrylamide gel electrophoresis (PAGE) following PCR amplification and NlaIII digestion. 100 fmol of the DNA-conjugated macrocycle library were subjected to an in vitro selection against immobilized carbonic anhydrase. In a further pseudo-round of selection, the eluted DNA was again loaded onto a carbonic anhydrase column. To decode the positive control binder, the DNA was PCR-amplified and NlaIII-digested before selection and after each elution. Liu and co-workers demonstrated that a significant enrichment of the positive control oligonucleotide-macrocycle conjugate was detectable after the second elution. However, the decoding method described in the paper41 was quite rudimentary and not directly applicable to larger libraries. Furthermore, the possibility of re-synthesizing the unbiased library after selection was not demonstrated.
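The size of the macrocycle library follows from the combinatorics of the three coding regions: four building blocks at each of three steps give 4³ = 64 templates, and the separately designed positive-control template brings the total to 65. A short sketch with hypothetical building-block labels:

```python
from itertools import product


def template_library(step_blocks, control_templates=()):
    """List the coding combinations of a DNA-templated library: one building block
    per coding region, plus any separately designed control templates."""
    combos = ["-".join(c) for c in product(*step_blocks)]
    return combos + list(control_templates)


STEP_BLOCKS = [
    ["A1", "A2", "A3", "A4"],  # code-1 building blocks (hypothetical labels)
    ["B1", "B2", "B3", "B4"],  # code-2
    ["C1", "C2", "C3", "C4"],  # code-3
]
library = template_library(STEP_BLOCKS, control_templates=["SULF-control"])
print(len(library))  # 65
```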

Assisting oligonucleotide strands and proximity-based chemical reactions may represent an alternative to “split-&-pool” strategies for the construction of large libraries in solution. While amide bond-forming reactions have so far been used for library construction, it is expected that different chemistries may be used to generate non-peptidic structures. The group of Liu considered a variety of other possible reactions that may occur in the presence of DNA (Figure 2-6).36 Additionally, even though the overall yields for the multi-step synthesis of DNA-encoded compounds were not excellent (approx. 5% over three steps), the use of avidin resins for product purification contributed to the purity of the library compounds. Nevertheless, quality control of library synthesis may become more difficult for larger libraries. In this light, constructing libraries with complexities of pharmaceutical interest by DNA-templated synthesis, as described by D.R. Liu and co-workers, remains a formidable challenge at present.
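An overall yield of about 5% over three steps corresponds to an average per-step yield of only about 37%, as the geometric mean shows. A minimal sketch, assuming equal yields for every step:

```python
def mean_step_yield(overall_yield: float, n_steps: int) -> float:
    """Geometric-mean per-step yield that would produce the given overall yield."""
    return overall_yield ** (1.0 / n_steps)


per_step = mean_step_yield(0.05, 3)
print(f"{per_step:.2f}")  # 0.37
```

Extrapolated under the same assumption, a five-step sequence would deliver less than 1% of full-length product, which illustrates why quality control becomes harder as libraries grow in synthetic depth.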


2.1.1.4 Stepwise coupling of coding DNA fragments to nascent organic molecules

A promising strategy for the construction of DNA-encoded libraries is the use of multifunctional building blocks covalently conjugated to an oligonucleotide that serves as a “core structure” for library synthesis. In a ‘split-&-pool’ fashion, a set of multifunctional scaffolds could undergo orthogonal reactions with series of suitable reactive partners. Following each reaction step, the identity of the modification could be encoded by the enzymatic addition of a DNA segment to the original DNA “core structure” (e.g., by ligation, Figure 2-8). This strategy was first implemented by our group.42,43 Initially, we envisaged using either a variety of N-protected amino acids or diene carboxylic acid derivatives. N-protected amino acids covalently attached to a DNA fragment allow, after a suitable deprotection step, a further peptide bond formation with a series of carboxylic acids or a reductive amination with aldehydes. Similarly, diene carboxylic acids used as scaffolds for library construction at the 5’ end of an amino-modified oligonucleotide could be subjected to a Diels-Alder reaction with a variety of maleimide derivatives.

[Figure 2-8: multifunctional building blocks bearing orthogonal functional groups FG1 and FG2 undergo repeated cycles of split / reaction, encoding and pool]

Figure 2-8: Schematic representation of a DNA-encoded library constructed by stepwise coupling of coding DNA fragments to nascent organic molecules. An initial set of multifunctional building blocks (FGn represents the different orthogonal functional groups) is covalently conjugated to corresponding encoding oligonucleotides and reacted in a split-&-pool fashion at a specific functional group (FG1, in red) with a suitable collection of reagents. Following enzymatic encoding, a further round of split-&-pool is initiated. At this stage, the second functional group (FG2, in blue) undergoes an additional reaction step with a different set of suitable reagents. The identity of the final modification is again recorded by enzymatic DNA encoding, by means of a further oligonucleotide carrying a specific coding region.

After completion of the desired reaction step, the identity of the chemical moiety added to the oligonucleotide could be recorded by annealing a partially complementary oligonucleotide followed by Klenow fill-in DNA polymerization, yielding a double-stranded DNA fragment. The synthetic and encoding strategies described above enable the facile construction of DNA-encoded libraries of up to 10⁴ members carrying two sets of “building blocks”. However, the stepwise addition of at least three independent sets of chemical moieties to a trifunctional core building block for the construction and encoding of a very large DNA-encoded library (comprising up to 10⁶ compounds) can also be envisaged (see Chapter 3.3).
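The fill-in encoding step can be modelled at the sequence level: two oligonucleotides anneal through complementary 3'-terminal segments, and the polymerase extends each recessed 3' end by copying the opposite strand. A minimal sketch, assuming perfect pairing over a known overlap length and full extension; all sequences are hypothetical and the strand design of the actual libraries may differ:

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def klenow_fill_in(top: str, bottom: str, overlap: int) -> Tuple[str, str]:
    """Both strands (5'->3') of the duplex formed after annealing and fill-in.

    The last `overlap` bases of `top` must pair with the last `overlap` bases of `bottom`.
    """
    if overlap <= 0 or overlap > min(len(top), len(bottom)):
        raise ValueError("overlap must be positive and no longer than either strand")
    if top[-overlap:] != revcomp(bottom[-overlap:]):
        raise ValueError("3'-terminal segments are not complementary")
    # each 3' end is extended by copying the 5' overhang of the partner strand
    full_top = top + revcomp(bottom[:-overlap])
    full_bottom = bottom + revcomp(top[:-overlap])
    return full_top, full_bottom


# hypothetical: compound-carrying strand ending in GGC-3', coding strand whose
# 5' region (TTTT, a hypothetical code) is copied onto the compound-carrying strand
top, bottom = klenow_fill_in("AAAAGGC", "TTTTGCC", overlap=3)
print(top, bottom)  # AAAAGGCAAAA TTTTGCCTTTT
```

The point of the model is that, after fill-in, the code becomes a covalent part of the strand that carries the compound, so it is preserved through later steps and PCR.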

Importantly, we have found that selections of DNA-encoded chemical libraries can conveniently be decoded after PCR amplification of the DNA tags using recently described high-throughput DNA sequencing technologies (such as “454 technology”), which had originally been developed for genome sequencing (see Chapter 2.2.2).44 Recent advances in ultra-high-throughput DNA sequencing allow the sequencing of over one million sequence tags per sequencing run44,45 and may thus allow the decoding of DNA-encoded libraries containing millions of chemical compounds.
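At its simplest, sequencing-based decoding reduces to extracting the code region from each read and counting how often each code occurs. A minimal sketch, assuming the code sits at a fixed offset in every read and tolerating no sequencing errors (a real pipeline would locate the constant primer regions and allow mismatches); the reads, offsets and code book are hypothetical:

```python
from collections import Counter


def decode_reads(reads, code_start: int, code_len: int, code_book: dict) -> Counter:
    """Count reads per library member; codes missing from the code book go to 'unassigned'."""
    counts = Counter()
    for read in reads:
        code = read[code_start:code_start + code_len]
        counts[code_book.get(code, "unassigned")] += 1
    return counts


CODE_BOOK = {"ACGT": "compound 1", "TTGC": "compound 2"}  # hypothetical
reads = ["GGACGTCC", "GGACGTCC", "GGTTGCCC", "GGAAAACC"]
print(decode_reads(reads, code_start=2, code_len=4, code_book=CODE_BOOK))
```

Comparing such counts for the selected and unselected library gives the enrichment of each member.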


2.1.2 DNA libraries displaying multiple covalently linked chemical entities: ESAC libraries

Watson-Crick and Hoogsteen46 base pairing allow the sequence-specific assembly of oligonucleotides into stable heterodimers and heterotrimers, respectively. Our laboratory has exploited this feature for the combinatorial self-assembly of oligonucleotide-chemical compound conjugates.47 In principle, the self-assembly of two sublibraries of only 10³ members each, containing a constant complementary hybridization domain, can yield after hybridization a combinatorial DNA-duplex library of 10⁶ uniformly represented members (Figure 2-9).
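Because every member of one sublibrary can pair with every member of the other, the sizes multiply: two 1,000-member sublibraries give a million duplexes, and the same arithmetic gives a billion assemblies for three such sublibraries in triplex format. A minimal sketch, assuming all members are present at equal concentration and hybridize with equal efficiency (the premise behind a uniformly represented library); the sublibrary labels are hypothetical:

```python
import math
from itertools import product


def esac_size(*sublibrary_sizes: int) -> int:
    """Number of distinct assemblies when each strand is drawn from its own sublibrary."""
    return math.prod(sublibrary_sizes)


def esac_assemblies(*sublibraries):
    """Enumerate assemblies explicitly (only sensible for small sublibraries)."""
    return list(product(*sublibraries))


print(esac_size(1000, 1000))        # 1000000 (duplex)
print(esac_size(1000, 1000, 1000))  # 1000000000 (triplex)
print(esac_assemblies(["A1", "A2"], ["B1", "B2", "B3"]))
```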

[Figure 2-9: panels a) single pharmacophore, b) affinity maturation with a known binder, c) duplex library, d) triplex library, e) selection and read-out steps I–VI on the target; labels: compound 1, hybridization domain, code 1]

Figure 2-9: ESAC library technology overview. Small organic molecules are coupled to 5’-amino-modified oligonucleotides containing a hybridization domain and a unique coding sequence, which encodes the identity of the coupled molecule. The ESAC library can be used in single pharmacophore format (a), in affinity maturation of known binders (b), or in de novo selections of binding molecules by self-assembly of sublibraries in DNA double-strand format (c) as well as in DNA triplexes (d). The ESAC library in the selected format is used in a selection and read-out procedure (e). Following incubation of the library (i) with the target protein of choice (ii) and washing away of unbound molecules (iii), the oligonucleotide codes of the binding compounds are PCR-amplified and compared with the unselected library on oligonucleotide microarrays (iv, v). Identified binders or binding pairs are validated after conjugation (if appropriate) to suitable scaffolds (vi).


A third strand can be added by exploiting Hoogsteen base pairing.46 Hoogsteen and reversed-Hoogsteen48,49 base pairing mediate the interaction of a third cognate oligonucleotide with a Watson-Crick DNA double helix. Using a triplex DNA format, three 10³-member sublibraries could yield a 10⁹-member library (Figure 2-9). Each sublibrary member would consist of an oligonucleotide containing a variable coding region flanked by a constant DNA sequence and carrying a suitable chemical modification at the oligonucleotide extremity (Figure 2-9). This approach has been termed ESAC (for Encoded Self-Assembling Chemical libraries). In contrast to the library formats described in the previous section (see Chapter 2.1.1), in which only one oligonucleotide in the DNA heteroduplex carries a chemical group, the ESAC method allows one, two or three oligonucleotides to display different pharmacophores. Moreover, each sublibrary member can be individually produced and purified by HPLC in nanomole quantities, thus enabling reliable analytics and quality control. These sublibraries can be used in at least four different embodiments. In a first example, a sublibrary can be paired with a complementary oligonucleotide and used as a DNA-encoded library displaying a single covalently linked compound for affinity-based selection experiments (Figure 2-9a). Alternatively, a sublibrary can be paired with an oligonucleotide displaying a known binder to the target, thus enabling affinity maturation strategies (Figure 2-9b). In a third embodiment, two individual sublibraries can be assembled combinatorially and used for the de novo identification of bidentate binding molecules (Figure 2-9c). Finally, three different sublibraries can be assembled to form a combinatorial triplex library (Figure 2-9d). These multiple pharmacophore display approaches may lead to high binding affinities by virtue of the simultaneous engagement of adjacent binding sites, thus exploiting the chelate effect in analogy to fragment-based drug discovery.50

The conjugation of two pharmacophores to the two strands of a DNA double helix introduces a spacing of roughly 10–15 Å, with some flexibility between the binding moieties and the core DNA structure. Preferential binders isolated in an affinity-based selection can be PCR-amplified and decoded on complementary oligonucleotide microarrays51,52 (Figure 2-9e) or by concatenation of the codes, subcloning and sequencing.53 The individual building blocks can eventually be conjugated through suitable linkers to yield a drug-like high-affinity compound. The characteristics of the linker (e.g., length, flexibility, geometry, chemical nature and solubility) influence the binding affinity and the chemical properties of the resulting binder.

A first 138-member ESAC library (termed ‘elib1’) consisted of carboxylic acids covalently linked to 5′-amino-modified 48-mer oligonucleotides and contained a biotin-oligonucleotide conjugate as a positive control. The library was hybridized with an oligonucleotide conjugated to a cyanine dye (irrelevant for binding) and subjected to affinity-based selection on streptavidin. A significant enhancement of the biotin-oligonucleotide conjugate signal was observed after selection and microarray-based decoding.47

In a second proof of principle, the 137-member ESAC library was employed in

affinity maturation experiments. A dansylamide and a benzoyl sulfonamide

conjugated at the 3’ extremity of an oligonucleotide were used as lead binders to

human serum albumin (HSA) and bovine carbonic anhydrase II (CAII), respectively.

The oligonucleotide derivatives were hybridized with the 137-member library and

subjected to selection using immobilized HSA and CAII. Following microarray-based

decoding, the enriched binding molecules were linked to the lead-binder with a set of

bifunctional linkers of different length and the affinities of the respective conjugates

towards the target protein were determined. The simultaneous engagement of the

lead-binder and the selected compound led to a 10–40-fold increase in affinity.47

Encouraged by these results, the ‘elib1’ ESAC library was extended from 137 compounds to over 600 members and termed ‘elib2’. With this library, a further series of biopanning experiments on streptavidin and HSA was performed, leading, after microarray-based read-out, to the identification of novel target-specific binding molecules with dissociation constants spanning the mM to fM range.54,55 Notably, screening of the ‘elib2’ ESAC library against HSA allowed the isolation of the 4-(p-iodophenyl)butanoic acid moiety. This compound, discovered by our group, represents the core structure of a series of portable albumin-binding molecules and of Albufluor™, a recently developed fluorescein angiographic contrast agent currently under clinical evaluation.55


Recently, ESAC technology has been used by our group for the isolation of potent inhibitors of bovine trypsin56 and for the identification of novel inhibitors of stromelysin-1 (MMP-3),57 a matrix metalloproteinase involved in both physiological and pathological tissue remodeling processes. Benzamidine, a trypsin inhibitor with an IC50 value in the 100 μM range, was used as lead in an ESAC-based affinity maturation procedure. 5-(4-Carbamimidoylbenzylamino)-5-oxopentanoic acid was conjugated to the 3’ end of an amino-modified oligonucleotide and hybridized with a 620-member ESAC sublibrary. After selection on immobilized trypsin and microarray-based decoding, a number of bidentate binders were identified and synthesized with different linkers connecting the benzamidine moiety to the other pharmacophore identified in the ESAC procedure. The most active inhibitor exhibited an IC50 value of 98 nM, and various other bidentate ligands also showed dramatically improved affinity compared with a set of parental benzamidine derivatives, whose IC50 values were in the 11–220 μM range. Similarly, for the identification of novel inhibitors of stromelysin-1 (MMP-3), an ESAC library of 550 DNA-encoded chemical compounds was used. After selection on immobilized MMP-3 and microarray-based decoding, the best candidate was conjugated to the amino-modified 3′ extremity of a 24-mer oligonucleotide capable of pairing with the initial 550-member ESAC sublibrary and used as lead for affinity maturation selections. After a second round of selection, enrichment of one synergistic binding moiety was observed. The newly discovered pharmacophores were used for the synthesis of low-molecular-weight bidentate MMP-3 inhibitors with a series of diamino linkers. The bidentate binder was superior to DNA conjugates displaying either of the individual pharmacophores or no pharmacophore at all. In inhibition assays against MMP-3, the best binder exhibited an IC50 value of 9.9 µM.

In most cases, the binding affinity of the bidentate ligand depends dramatically on the spatial arrangement and flexibility of the linker that conjugates the two pharmacophores identified after ESAC library selection. Identifying optimal linkers can therefore be a tedious procedure. Furthermore, de novo identification of binding molecules requires ESAC libraries in a multiple DNA-stranded format comprising over 10⁴ compounds (Figure 2-9c, 2-9d). A microarray-based approach cannot decode such libraries efficiently, because of suboptimal read-out quality and physical spotting limitations. In principle, high-throughput sequencing techniques could be considered for decoding selections performed with ESAC libraries (see Chapter 2.2.2).58

2.2 The decoding of DNA-encoded chemical libraries

Identifying specific binding compounds from DNA-encoded chemical libraries requires affinity-based selection strategies and suitable decoding techniques. Generally, selections are performed by capturing binding compounds on a target protein immobilized on a solid support. The stringency of both the capture and the washing steps crucially influences the outcome of affinity selections.19,20,29,41,47 The decoding strategy also contributes greatly to the successful use of DNA-encoded chemical libraries. So far, most groups active in DNA-encoded library research have used rudimentary decoding techniques. Their main aim was to demonstrate the feasibility of the DNA-encoding principle rather than to analyze the decoding aspect of the selection exhaustively.19,20,29,41 Many authors implicitly envisaged a traditional Sanger-sequencing-based decoding (for an overview of Sanger sequencing see Ref 65). However, the number of codes that must be sequenced, given the complexity of the library, makes this an unrealistic task for a traditional Sanger-sequencing approach. Assume a library complexity of 10⁶ and an enrichment factor of 100 for good binders versus non-binders in one round of selection. Statistically, 10⁵ sequences are then required to identify preferential binding compounds with suitable confidence. Furthermore, the number of sequences to be read grows with library size.

Nevertheless, our laboratory described a first implementation of Sanger sequencing for decoding DNA-encoded chemical libraries in a high-throughput fashion.47 After selection and PCR amplification of the DNA tags of the library compounds, concatemers containing multiple coding sequences were generated and ligated into an EcoRI-digested pUC19 vector. Sequencing a representative number of the resulting colonies revealed the frequencies of the codes present in the ESAC DNA sample before and after selection. Besides Sanger-sequencing-based decoding, our group investigated microarray-based decoding.47 Very recently, we also implemented novel, robust high-throughput sequencing techniques for the efficient decoding of DNA-encoded libraries.42
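The sampling argument above (10⁶ compounds, 100-fold enrichment, 10⁵ sequences) can be made concrete with a small Python sketch. It rests on deliberate simplifications that are not taken from the text. There is a single binder, all non-binders are equally abundant, and read counts follow Poisson statistics. The helper names and the 5-read threshold are hypothetical.

```python
import math

def expected_counts(library_size, enrichment, n_reads):
    """Expected reads for one enriched binder and for one typical non-binder.

    Assumes a single binder enriched `enrichment`-fold over library_size - 1
    equally abundant non-binders (a deliberate simplification).
    """
    total_weight = (library_size - 1) + enrichment
    binder = n_reads * enrichment / total_weight
    non_binder = n_reads / total_weight
    return binder, non_binder

def poisson_at_least(k, lam):
    """P(X >= k) for a Poisson-distributed read count X with mean lam."""
    return 1.0 - sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))

binder, non_binder = expected_counts(10**6, 100, 10**5)
print(f"binder ~{binder:.1f} reads, non-binder ~{non_binder:.2f} reads")

# Chance that one non-binder, or the binder, reaches an (arbitrary) 5-read threshold
threshold = 5
p_false = poisson_at_least(threshold, non_binder)
p_true = poisson_at_least(threshold, binder)
print(f"P(non-binder >= {threshold}) = {p_false:.1e}; "
      f"expected false hits in library ~{p_false * (10**6 - 1):.2f}")
print(f"P(binder >= {threshold}) = {p_true:.2f}")
```

Under these assumptions, 10⁵ reads give the binder about 10 reads against about 0.1 for a typical non-binder. That margin lets a simple count threshold separate the binder from the background.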


2.2.1 Microarray-based decoding

A DNA microarray is a device for high-throughput investigations widely used in molecular biology and medicine.59 It consists of an arrayed series of microscopic spots (‘features’ or ‘locations’). Each spot contains a few picomoles of oligonucleotides with a specific DNA sequence (Figure 2-10), which can be a short section of a gene or another DNA element. These oligonucleotides are used as probes to hybridize a DNA or RNA sample under suitable conditions. Probe-target hybridization is usually detected and quantified by fluorescence-based detection of fluorophore-labeled targets, which determines the relative abundance of the target nucleic acid sequences. In standard microarrays, the probes are covalently attached to a chemical matrix on a solid surface (via epoxy-silane, amino-silane, lysine, polyacrylamide or others). The solid surface can be glass or a silicon chip; in the latter case the arrays are commonly known as gene chips (Affy-chips when an Affymetrix chip is used, Figure 2-10). Other microarray platforms, such as those from Illumina, use microscopic beads instead of a large solid support.

[Figure 2-10 labels: ‘feature’; millions of DNA-probe strands built up on each ‘feature’; probe oligonucleotide; current size of the latest generation of GeneChip®: 1.28 cm × 1.28 cm]

Figure 2-10: Schematic representation of an Affymetrix microarray chip. Microscopic spots (‘features’ or ‘locations’) on the solid support each carry several million immobilized single-stranded DNA probes. After a fluorescently labelled DNA or RNA sample is hybridized to the chip, detection and quantification are carried out by fluorescence-based analysis. (Adapted from http://www.affymetrix.com)


Microarray technology was originally derived from the Southern blotting60 technique, in which DNA fragments are probed with a labelled oligonucleotide complementary to the DNA segment of interest. The use of a library of distinct DNAs in array format for expression profiling was first described in 1987, when the arrayed DNAs were used to identify genes whose expression is modulated by interferon.61 These first gene arrays were prepared by spotting cDNAs onto filter paper with a pin-spotting device. The use of miniaturized microarrays was first reported in 1995,62 and a complete eukaryotic genome (Saccharomyces cerevisiae) on a microarray was published in 1997.63

So far, DNA microarrays have found many applications (gene expression profiling, SNP detection, comparative genomic hybridization, alternative splicing determination) and have dramatically accelerated many types of investigation.59 Over the last few years, our laboratory has used DNA microarrays to decode DNA-encoded chemical libraries.47 Each compound's code is represented by a 19-mer, 5'-amino-tagged oligonucleotide carrying the corresponding specific sequence. These oligonucleotides are spotted in quintuplicate onto 25 × 75 mm polyethylene glycol-coated, epoxy-activated microarray slides using a BioChip Arrayer robot. The slides are then incubated in a humid chamber overnight at 25 °C. Next, the oligonucleotide tags of the binding compounds isolated in the affinity-based selection are PCR-amplified with a fluorescent primer and hybridized onto the DNA microarray slide. The microarrays are then analyzed with a laser scanner, and spot intensities are detected and quantified. The enrichment of preferential binding compounds is revealed by comparing the spot intensities on DNA microarray slides before and after selection.
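The before/after comparison of spot intensities described above can be sketched as follows. This is a hypothetical illustration, not the laboratory's actual analysis pipeline. It assumes the intensities are already background-corrected and positive, takes the median of the five replicate spots, and normalizes each slide to its total signal. The code names and intensity values are invented.

```python
from statistics import median

def spot_enrichment(before, after):
    """Per-code enrichment from replicate spot intensities on two slides.

    before, after: dict mapping each code to its replicate spot intensities
    (e.g. five spots per code). Intensities are assumed to be positive and
    already background-corrected.
    """
    # The median of the replicate spots is robust to a single defective spot
    med_before = {code: median(v) for code, v in before.items()}
    med_after = {code: median(v) for code, v in after.items()}
    # Normalize each slide to its total signal so the two slides are comparable
    total_before = sum(med_before.values())
    total_after = sum(med_after.values())
    return {
        code: (med_after[code] / total_after) / (med_before[code] / total_before)
        for code in med_before
    }

before = {
    "code_01": [100, 102, 98, 101, 99],
    "code_02": [100, 100, 100, 100, 100],
    "code_03": [100, 97, 103, 100, 100],
}
after = {
    "code_01": [800, 790, 810, 805, 795],
    "code_02": [100, 95, 105, 100, 100],
    "code_03": [100, 100, 100, 100, 9],  # one defective spot
}
for code, ratio in sorted(spot_enrichment(before, after).items(), key=lambda kv: -kv[1]):
    print(code, round(ratio, 2))
```

Normalizing to total signal is only one possible choice. Because the enriched code dominates the total after selection, non-binders end up with ratios below 1. The ranking of codes is the informative output.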

DNA microarrays have provided a powerful approach to decoding DNA-encoded chemical libraries and to rapidly interrogating biological systems at a genomic level. Nevertheless, several limitations restrict their range of application. Even with the latest generation of high-density microarray chips (up to 7 × 10⁶ features), spotting and hybridizing DNA-encoded libraries is quite demanding. Additionally, because microarrays rely fundamentally on nucleic-acid hybridization, cross-hybridization makes their analysis of highly related sequences “low-fidelity”. This problem is crucial in the decoding of DNA-encoded chemical libraries. Since the oligonucleotide tags of distinct compounds can differ only slightly, cross-hybridization may lead to false-positive identifications. It is also difficult to confidently detect and quantify low-abundance species by microarray-based decoding, even when the enrichment after selection is substantial. Moreover, the results of microarray decoding are currently difficult to reproduce and depend strongly on the specific platform. For instance, the “analog” rather than “digital” nature of the quantification limits the dynamic range and the comparison between samples. Last but not least, the technology is costly (DNA probes and robotic equipment). However, since 2004, massively parallel DNA sequencing technologies have become available. They offer dramatically lower per-base costs64 and promise to overcome the limitations of microarrays. Millions of independently derived sequence tags can nowadays be investigated simultaneously in a single experiment at a cost below 1000 Sfr.


2.2.2 Decoding by high throughput sequencing

DNA-encoded chemical libraries typically contain between 10³ and 10⁶ members. At this complexity, conventional Sanger-sequencing-based decoding is unlikely to be usable in practice, because of both the high cost per base65 and the tedious procedure involved.65 However, nearly three decades have passed since the invention of electrophoretic methods for DNA sequencing. Various novel sequencing technologies have recently been developed, each aiming to reduce costs to the point at which individual human genomes could be sequenced as part of routine health care. Large-scale sequencing projects, including whole-genome sequencing, have usually relied on the Sanger method.65 This method uses fluorescent chain-terminating nucleotide analogues66 and either slab-gel or capillary electrophoresis. Recent estimates of the cost of sequencing a human genome with standard technologies are between $10 million and $25 million. Alternative sequencing methods have been described.67,68,69,70,71 Nonetheless, all these strategies essentially relied on bacterial vectors and Sanger sequencing as the final generators of sequence information, and consequently did not deliver ultra-low-cost, massively parallel sequencing. Very recently, new methods have parallelized the sequencing process, displacing capillary electrophoresis and producing thousands or millions of sequences at once.

Detection methods are often not sensitive enough to sequence a single DNA molecule, so most of the novel strategies use an in vitro amplification step. Typically, emulsion PCR is used to isolate individual DNA molecules together with primer-coated beads in aqueous droplets within an oil phase. A polymerase chain reaction (PCR) then coats each bead with many clonal copies of the isolated library DNA molecule (a so-called “polony”, i.e. polymerase colony).72 This strategy is employed in three settings. The first is the methods commercialized by 454 Life Sciences (acquired by Roche). The second is “polony sequencing”.73 The third is SOLiD sequencing, developed by Agencourt and acquired by Applied Biosystems.74 Each bead is then immobilized on a support for sequencing. An alternative in vitro amplification method is “bridge PCR”, in which fragments are amplified on primers anchored to a solid surface; this system was developed by Solexa (now acquired by Illumina).75 Both approaches produce many physically isolated locations, each containing many copies of a single DNA fragment. In 2006, Stephen Quake's laboratory, whose technology was later commercialized by Helicos, described the first second-generation ultra-high-throughput sequencing method based on single-molecule sequencing. This method skips the amplification step by attaching DNA molecules directly to a surface.76
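Clonality depends on how many template molecules end up on each bead, which emulsion PCR must keep to at most one. A common idealization, assumed here rather than stated in the text, is that templates distribute at random, so occupancy follows Poisson statistics. The sketch below shows why dilute loading (a low mean number of templates per bead) is used. Most beads stay empty, but nearly all loaded beads are clonal. The mean values are illustrative, not protocol values.

```python
import math

def occupancy(mean_templates):
    """Fractions of compartments (beads/droplets) holding 0, 1 or >=2 templates,
    assuming templates are distributed at random (Poisson statistics)."""
    p0 = math.exp(-mean_templates)
    p1 = mean_templates * p0
    return p0, p1, 1.0 - p0 - p1

for lam in (0.1, 0.3, 1.0):
    p0, p1, p2 = occupancy(lam)
    clonal = p1 / (p1 + p2)  # share of template-carrying beads that are clonal
    print(f"mean {lam}: empty {p0:.2f}, single {p1:.2f}, mixed {p2:.3f}, clonal {clonal:.2f}")
```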

Once each DNA sequence is physically localized at a separate position on a support, various sequencing strategies can be applied to determine the sequences in parallel. “Sequencing by synthesis”, in full analogy with the dye-terminator electrophoretic sequencing of the Sanger method, exploits DNA synthesis by a DNA polymerase to identify the bases present in the complementary DNA molecule.72 Pyrosequencing (used in “454” technology) also relies on DNA polymerization to add nucleotides. It detects and quantifies the number of nucleotides added at a given location through the light emitted upon release of pyrophosphate.72,77 Alternatively, Illumina and Helicos use “reversible terminator methods”.75,76 Nucleotides are added one at a time and the fluorescence at each position is detected. Incorporation of the next nucleotide becomes possible only after a blocking group is removed. “Sequencing by ligation” is another enzymatic sequencing method. It was pioneered by the laboratory of G.M. Church and is employed in “polony sequencing” and in the SOLiD technology offered by Applied Biosystems. Here, a DNA ligase rather than a polymerase is used, together with a pool of all possible oligonucleotides of a fixed length labelled according to the sequenced position. The oligonucleotides are annealed and ligated,73,74,78 and ligation of a matching oligonucleotide yields a signal corresponding to the complementary sequence at that position.

In this light, advances in high-throughput DNA sequencing technologies are likely to

revolutionize the strategies for the accurate decoding of DNA-encoded chemical

libraries of unprecedented size.


2.2.2.1 “454” technology

The “454” technology of the Genome Sequencer FLX System (GS FLX) was developed by 454 Life Sciences, which was recently (2007) acquired by Roche. The GS FLX is a next-generation high-throughput DNA sequencing system featuring long reads, high accuracy and ultra-high throughput.72,79 Currently, the GS FLX is one of the most versatile high-throughput sequencing platforms available and supports high-profile studies in a wide range of fields.72,79

Figure 2-11 schematically depicts the workflow of the “454” technology. Initially, large DNA samples, such as genomic material, are broken by nebulization into smaller fragments (between 300 and 800 base pairs). The DNA sample is then denatured to single-stranded DNA (sstDNA). Next, specific short adaptors (called A and B) are added to each fragment using standard molecular biology techniques. An excess of Sepharose beads is then added to the DNA library. The beads carry oligonucleotides complementary to, e.g., the A-adaptor sequence of the library fragments, and the excess ensures that each bead hybridizes to a unique single-stranded DNA fragment. The bead-bound library is emulsified with the amplification reagents in a water-in-oil mixture. An emulsion PCR then produces many clonal copies of a specific DNA fragment immobilized on each bead (ca. 10 million identical DNA molecules per bead). Afterwards, the emulsion is broken while the amplified fragments remain bound to their beads. The beads carrying clonally amplified fragments are enriched and loaded onto a “PicoTiterPlate” sequencing device (70 × 75 mm, containing 1.6 million wells). The well diameter (44 μm) allows for only one bead (ca. 30 μm) per well. A DNA bead incubation mix containing DNA polymerase, sulfurylase and luciferase is then added. The fluidics subsystem of the Genome Sequencer FLX instrument flows individual nucleotides in a fixed order across the wells, each containing one bead. Incorporation of one (or more) nucleotide(s) complementary to the template strand yields a chemiluminescent signal recorded by a CCD camera.
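Because several identical nucleotides can be incorporated in a single flow, the light intensity of each flow encodes a homopolymer length. The resulting series of intensities is a flowgram, and the sketch below converts one into bases by rounding each intensity. This is a deliberately naive illustration. It assumes signals are normalized so that one incorporation gives about 1.0, and it assumes a cyclic flow order of T, A, C, G. Real base callers model noise and signal decay, and the intensity values here are invented.

```python
def call_flowgram(signals, flow_order="TACG"):
    """Translate a pyrosequencing flowgram into bases.

    signals: one light intensity per nucleotide flow, normalized so that a
    single incorporated nucleotide gives about 1.0. Each signal is rounded to
    the nearest homopolymer length (0 = no incorporation in that flow).
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]  # flows cycle in a fixed order
        bases.append(nucleotide * max(0, round(signal)))
    return "".join(bases)

# Two flow cycles (T, A, C, G, T, A, C, G) from one hypothetical well
print(call_flowgram([1.02, 0.05, 2.1, 0.9, 0.0, 1.1, 0.02, 0.97]))  # TCCGAG
```

Note that Python's `round` rounds exact halves to the nearest even integer, which is one reason real base callers use probabilistic models instead.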


[Figure 2-11 panels: (a) sstDNA library; (b) sstDNA annealed to an excess of capture beads, emulsification of beads and PCR reagents (em-PCR); (c) monoclonal amplification, break emulsion; (d) partitioning, one bead per well (1 well = 1 bead = 1 clonal amplification); (e) add enzymes, sequencing by synthesis with chemiluminescent signal upon nucleotide incorporation and pyrophosphate release; (f) amplicon sequences]

Figure 2-11: Workflow of “454” high-throughput sequencing technology. Adaptors (A and B), specific for the 3' and 5' ends, are added to each sample fragment and are used for the purification, amplification and sequencing steps. Single-stranded fragments with A and B adaptors make up the sample library used in the subsequent workflow steps (a). The single-stranded DNA library is immobilized onto specifically designed DNA Capture Beads, each bead carrying a unique single-stranded library fragment. The bead-bound library is emulsified with amplification reagents in a water-in-oil mixture, giving microreactors that each contain just one bead with one unique sample-library fragment (b). Emulsion PCR (em-PCR) then amplifies each fragment to several million copies per bead. Subsequently, the emulsion is broken while the amplified fragments remain bound to their beads (c). The enriched beads are loaded onto a PicoTiterPlate sequencing device, whose well diameter allows for only one bead per well (d). After addition of the sequencing enzymes, nucleotides are flowed in a fixed order across the wells, each containing one bead. Incorporation of one (or more) nucleotide(s) complementary to the template strand results in a chemiluminescent signal recorded by the CCD camera (e). The software determines the sequence from the combination of signal intensity and positional information (f). (Adapted from http://www.454.com)

The nucleotide flow described above enables the parallel sequencing of hundreds of thousands of beads, each carrying millions of copies of a unique single-stranded DNA molecule. Typically, 400,000 individual reads are obtained simultaneously in a 7.5-hour instrument run. Different bioinformatics tools are available for sequencing-data analysis, supporting applications such as de novo assembly, resequencing and amplicon variant detection by comparison with a known reference sequence. Currently, the 454 Genome Sequencer FLX instrument delivers read accuracies of >99.5% over the first 250 bases and 200 Mb of sequence information per day.72,79

In this Thesis we describe a novel, convenient implementation of “454” high-throughput sequencing technology for the decoding of DNA-encoded chemical libraries.
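The implementation developed in this Thesis is described in later chapters. As a generic and hypothetical illustration of the counting step that any sequencing-based decoding needs, the sketch below proceeds in three steps. It extracts the code found between two constant flanking regions of each read. It then counts code occurrences before and after selection. Finally, it reports the ratio of relative frequencies, with a pseudocount to avoid division by zero. The flanking sequences, code length and reads are invented placeholders, not the library design used in this work.

```python
from collections import Counter

# Hypothetical constant regions flanking the code in each read (placeholders only)
LEFT_FLANK = "GGAGCTTGTG"
RIGHT_FLANK = "CACTGCAGCA"

def extract_code(read, left=LEFT_FLANK, right=RIGHT_FLANK):
    """Return the sequence between the two constant regions, or None if absent."""
    i = read.find(left)
    if i < 0:
        return None
    start = i + len(left)
    j = read.find(right, start)
    if j < 0:
        return None
    return read[start:j]

def count_codes(reads, code_length):
    """Count codes of the expected length; malformed reads are skipped."""
    counts = Counter()
    for read in reads:
        code = extract_code(read)
        if code is not None and len(code) == code_length:
            counts[code] += 1
    return counts

def enrichment(before, after, pseudocount=1):
    """Relative frequency after selection divided by relative frequency before."""
    n_before, n_after = sum(before.values()), sum(after.values())
    return {
        code: ((after[code] + pseudocount) / n_after)
        / ((before[code] + pseudocount) / n_before)
        for code in set(before) | set(after)
    }

read_a = "AA" + LEFT_FLANK + "ACGTAC" + RIGHT_FLANK + "TT"
read_b = "A" + LEFT_FLANK + "TTTTTT" + RIGHT_FLANK
before = count_codes([read_a, read_b], code_length=6)
after = count_codes([read_a, read_a, read_a, read_b, "GATTACA"], code_length=6)
print(enrichment(before, after))
```

Unlike microarray intensities, these are “digital” counts, so low-abundance codes remain quantifiable as long as enough reads are collected.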

2.2.2.2 Solexa technology

Solexa sequencing technology, acquired by Illumina in 2007, is based on massively parallel sequencing with reversible-terminator sequencing chemistry.75 Figure 2-12 schematically describes the Solexa process. As in the “454” technology (see Chapter 2.2.2.1), the double-stranded genomic DNA is fragmented and adapters are ligated to both ends. The randomly fragmented genomic DNA is then denatured to single-stranded DNA (sstDNA). It is hybridized to complementary adapter sequences attached to a planar, optically transparent surface. Unlabelled nucleotides and DNA polymerase are then added, and the attached fragments are extended and “bridge”-amplified. The result is an ultra-high-density sequencing flow cell with ≥50 million clusters, each containing ~1,000 copies of the same template. These templates are sequenced using a four-color DNA “sequencing-by-synthesis” technology that employs reversible terminators with removable fluorescent dyes. The four fluorescently labelled nucleotides are added simultaneously at the beginning of every chemistry cycle. After the unincorporated reagents are washed away, laser excitation is applied and the fluorescence emission from each cluster on the flow cell is recorded. The corresponding base is then called. Afterwards, the fluorescent dyes at the 3' terminus are removed and the next chemistry cycle is initiated. Repeating the sequencing cycle a number of times determines the entire template sequence of each cluster. Furthermore, after completion of the first read, the templates can be regenerated in situ, enabling a second read from the opposite end of the fragments.75
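The per-cycle base call described above reduces, in its simplest form, to choosing the brightest of four fluorescence channels for each cluster. The sketch below is a simplified illustration under stated assumptions. It ignores the channel cross-talk and phasing corrections applied by real pipelines. It reports an ambiguous call as 'N' when the brightest channel carries less than a hypothetical fraction (0.6) of the total signal. The intensities are invented.

```python
def call_bases(cycles, channels="ACGT", min_fraction=0.6):
    """Call one base per cycle from four-channel cluster intensities.

    cycles: one (I_A, I_C, I_G, I_T) tuple per sequencing cycle.
    The brightest channel is called; if it carries less than `min_fraction`
    of the total signal the call is ambiguous and reported as 'N'.
    """
    sequence = []
    for intensities in cycles:
        total = sum(intensities)
        best = max(range(len(channels)), key=lambda k: intensities[k])
        fraction = intensities[best] / total if total > 0 else 0.0
        sequence.append(channels[best] if fraction >= min_fraction else "N")
    return "".join(sequence)

# Three cycles of one hypothetical cluster
print(call_bases([(900, 30, 20, 50), (10, 20, 15, 800), (300, 280, 20, 10)]))  # ATN
```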


[Figure 2-12 panels: (a) adapter ligation to the DNA sample; (b) attach DNA to surface (attached and free termini); (c, d, e) bridge PCR, denaturation, further bridge PCR cycles, clusters; (f, g) sequencing-by-synthesis cycles (1st cycle, 2nd cycle, ..., n cycle) with laser imaging; (h) sequence determination]

Figure 2-12: Schematic description of the Solexa sequencing workflow. Adapters are first ligated to the DNA samples (a), which are hybridized to complementary adapter sequences on the slide support (b). After nucleotides and DNA polymerase are added, “bridge” PCR produces an ultra-high-density sequencing flow cell with ≥50 million clusters, each containing ~1,000 copies of the same template (c, d, e). “Sequencing by synthesis” employs reversible terminators with removable fluorescent dyes. After incorporation of the fluorescent nucleotides and washing away of the unincorporated reagents, laser imaging captures the fluorescence emitted by each cluster. The fluorescent dyes at the 3' terminus are then removed and the next chemistry cycle is initiated (f, g). Repeating the sequencing cycles determines the sequence of each cluster (h).

Currently, the applications of the Solexa technology include gene expression, small RNA discovery and protein-nucleic acid interactions. So far, the main limitation of the Solexa system, especially for decoding DNA-encoded chemical libraries, is its short maximum read length per DNA fragment. Reads currently reach up to 50 base pairs (36 base pairs as standard). “Double reading” from both adaptor ends extends this to 100 base pairs (72 base pairs on average). On the other hand, the Solexa system can generate up to 600 Mb of sequence information per day, three times more than the “454” Genome Sequencer FLX, with comparable accuracy (>98.5%).75


2.2.2.3 SOLiD technology

SOLiD (Sequencing by Oligonucleotide Ligation and Detection) technology builds on the sequencing-by-ligation approach first described by the group of G.M. Church in 2005.73 The technology has recently been purchased by Applied Biosystems. The methodology is based on sequential ligation of dye-labelled oligonucleotides.73 The SOLiD system offers ultra-high throughput, high accuracy and a broad range of possible applications. Together, these features place it at the vanguard of next-generation high-throughput sequencing technologies.

As in the “454” technology (see Chapter 2.2.2.1), a suitable DNA fragment library with specific adapters at both ends is first prepared. The SOLiD methodology then employs emulsion PCR (em-PCR) to generate clonal bead populations (Figure 2-13a). After em-PCR, the templates are denatured and the beads carrying extended templates are separated from the undesired beads. A suitable 3'-end modification allows the selected beads to be covalently attached to the sequencing glass slide (Figure 2-13a). The sequencing process then starts. Typically, the probe library set used for sequence determination contains 1024 different single-stranded, 5'-fluorescently labelled synthetic DNA 8-mer oligonucleotides (Figure 2-13b). As depicted in Figure 2-13b, each probe comprises a fully randomized sequence of five bases, a cleavage site for removing the 5'-fluorescent dye and an additional constant domain of three bases. Importantly, only four different dyes are used to label the entire probe library set (1,024 probes, 256 probes per dye). Each dye therefore does not identify a single base. Instead, it identifies a group of four di-base combinations at positions 4 and 5 of the corresponding probe (four colours coding the 16 possible di-base combinations, Figure 2-13b).
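The text does not specify which four di-bases share each dye. The mapping below is therefore an assumption, taken from the commonly documented SOLiD 2-base encoding. That encoding can be written compactly as an XOR of 2-bit base codes (A = 0, C = 1, G = 2, T = 3). The sketch prints the 4 × 4 colour matrix, in which each of the four colours covers exactly four of the 16 di-bases.

```python
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}

def dibase_colour(first, second):
    """Colour index (0-3) assigned to a di-base, via XOR of 2-bit base codes."""
    return BASE_BITS[first] ^ BASE_BITS[second]

print("   A  C  G  T   <- 2nd base")
for b1 in "ACGT":
    print(b1, " ".join(f"{dibase_colour(b1, b2):2d}" for b2 in "ACGT"))
```

With this encoding, identical di-bases (AA, CC, GG, TT) share one colour. A colour alone never reveals a base; it reveals only the relationship between two adjacent bases.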


[Figure 2-13 panels: a) sstDNA sample, (i) adapter ligation, (ii) bead hybridization, (iii) em-PCR and emulsion break, (iv) random covalent bead deposition on glass slide; b) probe structure (3' to 5') with ligation site, degenerate bases (n), cleavage site, universal bases (z) and fluorescent dye, 1,024 octamer probes (4⁵), and a 1st base versus 2nd base colour matrix: 4 dyes, 4 di-nucleotides per dye, 1,024 probes / 4 dyes = 256 probes per dye]

Figure 2-13: Sample preparation for SOLiD sequencing and schematic representation of the probe system enabling sequence identification. a) Single-stranded DNA sample fragments (sstDNA) are ligated to specific adapters at the 5' and 3' termini (i). The fragments are hybridized to capture beads carrying the complementary adapter sequence (ii), and emulsion PCR with suitable primers is performed (iii). Finally, the emulsion is broken and the amplified beads are covalently attached to the sequencing glass slide via their 3' end (iv). b) Starting from the 3' end, each probe of the probe library comprises a random sequence of five bases, a cleavage site and an additional constant domain of three bases. Four different dyes are used to label the entire probe library set (1024 probes, 256 probes per dye). Each dye represents a group of four of the 16 possible di-base combinations at positions 4 and 5 of the corresponding probe.

The sequencing process starts by hybridizing an n-base-long universal sequencing primer to the adapter attached to the bead. Next, the probe set, labelled with four cleavable 5'-fluorescent dyes, is flowed over the slide together with DNA ligase. The di-base probes, of fixed length, compete for ligation to the sequencing primer (Figure 2-14). After laser excitation, the fluorescence emission from each bead on the slide reveals the identity of the ligated di-base probe. The probe is then cleaved at a specific position to remove the fluorescent dye, and the ligation process is repeated (Figure 2-14). Each cycle thus interrogates a precise “di-base position” of each template fragment. After a series of ligation cycles, the extension product is removed. The template is then reset with a primer complementary to the n-1 position for a full second round of ligation cycles (Figure 2-14). After multiple rounds of reset (typically five) and ligation, every base of the template sequence has been “double interrogated” by different probes (Figure 2-14). Therefore, starting from a known base (e.g. the last base of the initial adapter), the entire colour sequence can be translated unambiguously into the corresponding base sequence (Figure 2-14).
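The translation from colour sequence to base sequence, anchored on one known base, can be sketched as follows. The sketch again assumes the XOR form of the SOLiD 2-base encoding (A = 0, C = 1, G = 2, T = 3), which is not specified in the text. Because XOR is its own inverse, each base follows from the previous base and the colour between them. The template is the example sequence shown in Figure 2-14.

```python
BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def to_colours(seq):
    """Base sequence -> colour sequence (one colour per overlapping di-base)."""
    return [BITS[a] ^ BITS[b] for a, b in zip(seq, seq[1:])]

def to_bases(known_first_base, colours):
    """Colour sequence -> base sequence, anchored on one known base."""
    seq = [known_first_base]
    for c in colours:
        # XOR is its own inverse: next base = previous base XOR colour
        seq.append(BASES[BITS[seq[-1]] ^ c])
    return "".join(seq)

template = "ACGTCGCATTCAC"  # example sequence shown in Figure 2-14
colours = to_colours(template)
print(colours)  # [1, 3, 1, 2, 3, 3, 1, 3, 0, 2, 1, 1]
print(to_bases("A", colours))  # ACGTCGCATTCAC
```

Without the known first base, the same colour sequence would be consistent with four different base sequences, one for each possible starting base.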

[Figure 2-14 panels (a to j): universal primer n hybridization; 1st ligation cycle with di-base calling at positions 4,5 and 9,10; fluorescent dye cleavage; 1st ligation cycle base calling complete; 2nd ligation cycle with universal primer n-1 (positions 3,4; 8,9; 13); further cycles with primers n-2 (2,3; 7,8; 12,13), n-3 (1,2; 6,7; 11,12) and n-4 (0,1; 5,6; 10,11); after five cycles each base is “double interrogated” by two different probes, giving a ‘colour’ sequence that is converted into the base sequence starting from a known base (example template: A C G T C G C A T T C A C)]

Figure 2-14: Sequencing-by-oligonucleotide-ligation workflow. An n-base-long universal sequencing primer is hybridized to the adapter attached to the bead (a). The 5'-fluorescently labelled probe library is then flowed over the slide together with DNA ligase, and the probes compete for ligation to the sequencing primer (b). The fluorescence emission from each bead on the slide reveals the identity of the ligated di-base probe (b). After cleavage of the fluorescent dye, the ligation process is repeated until the terminal adapter (in green) is reached (c, d). The first cycle of ligation and “di-base” calling is then complete and the system is reset (e). An n-1-base-long universal sequencing primer is then hybridized, and a second cycle of ligation and “di-base” calling is performed (f, g). After a number of ligation and “di-base” calling cycles (typically 5), each base of the template has been “double interrogated” by different probes, generating a ‘colour’ sequence (h, i). Starting from a known base (e.g. the last base of the initial adapter), the entire ‘colour’ sequence is converted into the corresponding base sequence (j).

Although SOLiD double-base interrogation might appear cumbersome, it makes it easier to distinguish system errors from true polymorphisms. Compared with the colour sequence of the reference template, a true single nucleotide polymorphism (SNP) changes two consecutive colours. A sequencing error changes only a single colour (Figure 2-15). Double-base interrogation thus enables very high base-calling accuracy (>99.94%). Additionally, the SOLiD system can generate 600 Mb of sequence information per day and up to 6 Gb in a single experiment.74


[Figure 2-15 panels: left, SNP (two-colour change); right, sequencing error (single colour change); expected sequence A C G T C G G A T T C A C, observed sequence A C G T C G G T A A G T G]

Figure 2-15: SOLiD discrimination between true polymorphisms (SNPs) and sequencing errors. Comparing the reference template with the observed ‘colour’ sequence, a true single nucleotide polymorphism changes two consecutive colours (left panel). A sequencing error changes only a single colour (right panel).
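The SNP-versus-error argument can be checked with a short Python sketch. As before, it assumes the XOR form of the SOLiD 2-base encoding, which the text does not specify. A single substituted base (here G to T at position 6, chosen for illustration) alters two adjacent colours. A single miscalled colour alters only one. The error position and substituted value are hypothetical. With them, naive decoding reproduces the ‘observed’ sequence of Figure 2-15. This shows why an isolated colour change should be flagged as an error rather than translated into base changes.

```python
BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def to_colours(seq):
    return [BITS[a] ^ BITS[b] for a, b in zip(seq, seq[1:])]

def to_bases(known_first_base, colours):
    seq = [known_first_base]
    for c in colours:
        seq.append(BASES[BITS[seq[-1]] ^ c])
    return "".join(seq)

def changed_positions(a, b):
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

reference = "ACGTCGGATTCAC"  # 'expected' sequence of Figure 2-15
ref_colours = to_colours(reference)

# Case 1: a true SNP (base 6, G -> T) alters two adjacent colours
snp = reference[:6] + "T" + reference[7:]
print("SNP colour changes at", changed_positions(ref_colours, to_colours(snp)))

# Case 2: one miscalled colour (hypothetical error at colour 6)
observed = list(ref_colours)
observed[6] ^= 3
print("error colour changes at", changed_positions(ref_colours, observed))
# Naive decoding propagates the single error to every downstream base
print(to_bases("A", observed))  # ACGTCGGTAAGTG
```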

As with the Solexa system (see Chapter 2.2.2.2), the main drawback of SOLiD technology, particularly for decoding DNA-encoded chemical libraries, is the short maximum read length, currently 35 base pairs for standard applications. However, the double-base interrogation feature of the SOLiD approach is very attractive for high-fidelity decoding of large DNA-encoded libraries. In such libraries, a single miscalled base might be crucial for the proper identification of the binding structures.


2.2.2.4 Single Molecule DNA Sequencing – Helicos technology

Stephen Quake's laboratory very recently presented an alternative, ambitious solution to the issues of cost, speed and sensitivity of conventional sequencing technologies. The same approach also addresses the exponentially increasing demand for DNA and RNA sequence information. It uses DNA polymerase and fluorescence microscopy to obtain sequence information from single DNA molecules.76 Furthermore, single-molecule sensitivity might permit direct sequencing of mRNA from rare cell populations or perhaps even individual cells.

The technology was commercialized in 2006 as Helicos True Single Molecule Sequencing (tSMS). First, the DNA sample is fragmented into pieces of up to 55 base pairs. Each DNA library fragment is then denatured, ligated to an adaptor sequence at its 3’-terminus and captured on the flow cell by hybridization to complementary adaptor sequences attached to the surface (Figure 2-16). In a sequencing-by-synthesis approach, reversible, fluorescently labelled nucleotides are added sequentially to the nucleic acid templates (Figure 2-16). The polymerase catalyzes the template-directed incorporation of the fluorescent nucleotides into the nascent complementary strands on all templates. After a washing step, which removes all unincorporated nucleotides, the incorporated nucleotides are imaged and their positions recorded (Figure 2-16). Ångström-level spatial resolution is not necessary, since the hybridized templates are sufficiently far apart (in the 0.1 micrometer range) and the nucleotides are added one at a time; only sufficient temporal resolution to discriminate successive incorporations is required. After removal of the fluorescent group, the process continues with flows of each of the other three bases (Figure 2-16). Multiple four-base cycles thus result in the parallel determination of billions of template sequences and the generation of up to 900 Mb of sequence information per day (Figure 2-16).80 Unlike amplification-based sequencing technologies, in tSMS every strand is unique and sequenced independently. As a result, the tSMS process is not subject to “dephasing” errors, which occur when amplified DNA clusters fall out of step.80,81
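The cyclic flow scheme can be illustrated with an idealized simulation. This Python sketch is not a model of the real Helicos chemistry: it assumes error-free incorporation, at most one labelled nucleotide added per strand per flow, and the flow order A, T, G, C drawn in Figure 2-16; templates are written in the order the polymerase reads them, and the function name is hypothetical.

```python
# Idealized simulation of cyclic single-molecule sequencing-by-synthesis.
# Assumptions: error-free chemistry, at most one labelled nucleotide incorporated
# per strand per flow, and the flow order A, T, G, C drawn in Figure 2-16.
FLOW_ORDER = "ATGC"
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sequence_by_synthesis(templates, n_cycles):
    """Return the nascent strand synthesized on each template after n four-base cycles.

    Each template is written in the order in which the polymerase reads it.
    Every strand is tracked independently, as in tSMS (no clusters, no dephasing).
    """
    reads = ["" for _ in templates]
    positions = [0] * len(templates)
    for _ in range(n_cycles):
        for base in FLOW_ORDER:
            # Flow one labelled nucleotide: strands whose next template base pairs with it extend by one.
            for i, template in enumerate(templates):
                pos = positions[i]
                if pos < len(template) and COMPLEMENT[template[pos]] == base:
                    reads[i] += base
                    positions[i] += 1
            # Imaging and fluorophore cleavage would happen here, before the next flow.
    return reads


print(sequence_by_synthesis(["TACG", "GGG"], 3))  # ['ATGC', 'CCC']
```

In this simplified model a homopolymer run needs one full cycle per base, while a template whose complement happens to follow the flow order can be read several bases per cycle; all strands progress independently of one another.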


[Figure 2-16 panels: (a-c) denaturation of the DNA sample, 3’-end adaptor ligation and hybridization of DNA to the surface; (d-g) sequencing-by-synthesis with flows of A, T, G and C, each followed by image capture and cleavage; (h) next cycle and resulting sequences.]

Figure 2-16: True Single Molecule Sequencing (tSMS) workflow. The DNA library fragment is denatured, ligated to an adaptor sequence at the 3’-terminus and hybridized on the flow cell (a, b, c). Sequencing-by-synthesis is initiated by sequentially adding reversible, fluorescently labelled nucleotides, and the polymerase catalyzes their template-directed incorporation into the strands complementary to the templates. After a washing step, the incorporated nucleotides are imaged and their positions recorded; the fluorescent group is then removed and the next fluorescent nucleotide is flowed in (d-g). Multiple sequencing-by-synthesis cycles result in the parallel determination of the template sequences (h).

Although the Helicos methodology is very promising and displays an accuracy of over 99%,80 no research applications have yet been reported in the literature. For the decoding of DNA-encoded chemical libraries, the read length is at present very limited (55 base pairs). However, future technological improvements may permit the use of True Single Molecule Sequencing for chemical library decoding.


3. RESULTS

3.1 DNA-Encoded Library “DEL4000”

DNA-encoding facilitates the construction and screening of large chemical libraries. Here, we describe general strategies for the stepwise coupling of coding DNA fragments to nascent organic molecules at each reaction step. The methodology was exemplified by the construction of a DNA-encoded chemical library containing 4’000 compounds, named “DEL4000” (DNA-Encoded Library 4000). The library was synthesized by a split-and-pool procedure comprising the following sequential steps: (i) conjugation of different N-Fmoc-amino acids to distinct amino-modified synthetic oligonucleotides; (ii) deprotection of the amino group; (iii) pooling and splitting; (iv) amide bond formation with selected carboxylic acids; (v) encoding of the carboxylic acid used in the previous step by hybridization of partially complementary oligonucleotides followed by Klenow-mediated DNA polymerization, yielding the final compounds in a double-stranded DNA format. The purity of the intermediates was extensively investigated by HPLC and mass spectrometry.


3.1.1 Library design and synthesis

Figure 3-1 describes the strategy for the construction of a DNA-encoded chemical

library consisting of 20 x 200 modules (i.e., 4’000 compounds), joined together by the

formation of an amide bond.

[Figure 3-1 scheme: 20 Fmoc-protected amino acids coupled to 20 5’-(C12)NH2 oligonucleotides (1) sulfo-NHS, EDC, DMSO, 30 °C, 15 min; 2) TEA/HCl pH 10, 30 °C, overnight; 3) piperidine 500 mM, 4 °C, 1 h; 4) HPLC); pooling; split into 200 vessels; amide bond formation with 200 carboxylic acids; EtOH precipitation; encoding by annealing and Klenow fill-in; ion-exchange cartridge purification; pooling to give the 4000-member library.]

Figure 3-1: Schematic representation of the strategy used for the synthesis and encoding of the DEL4000 library. Initially, 20 different Fmoc-protected amino acids were coupled to unique oligonucleotide derivatives carrying a primary amino group at the 5’ end. After deprotection and HPLC purification, these derivatives were pooled and coupled to 200 carboxylic acids in parallel reactions. The identity of each carboxylic acid was encoded by a Klenow polymerization step, using a set of partially complementary oligonucleotides. This procedure resulted in a 4000-member library (DEL4000), in which each chemical compound was covalently attached to a double-stranded DNA fragment containing two coding domains that unambiguously identify the compound’s structure (i.e., the two chemical moieties used for its synthesis).

Initially, 20 Fmoc-protected amino acids (for the structures see Appendix 9.2) were chemically coupled to 20 individual amino-tagged oligonucleotides. After deprotection and HPLC purification, the 20 resulting DNA-encoded primary amines were coupled to 200 carboxylic acids (for the structures see Appendix 9.2), generating a library of 4’000 members. To ensure that each library member carried a different DNA code, a split-and-pool strategy was chosen, which also minimizes the number of oligonucleotides needed for library construction. As indicated in Figure 3-1, the 20 primary amines, each covalently linked to an individual single-stranded oligonucleotide, were mixed and aliquoted into 200 reaction vessels prior to coupling with the 200 different carboxylic acids (one per vessel). After the reaction, the oligonucleotides in each vessel were precipitated as sodium phosphate adducts by addition of an AcOH/AcONa solution (pH 4.7) and three volumes of ethanol. The identities of the carboxylic acids used for the coupling reactions were encoded by an annealing step with individual oligonucleotides partially complementary to the first oligonucleotide carrying the chemical modification. A subsequent Klenow fill-in DNA polymerization step yielded double-stranded DNA fragments, each containing two identification codes (one corresponding to the initial 20 compounds and one to the 200 carboxylic acids, see Figure 3-1). The 200 reaction mixtures were then purified on an anion-exchange cartridge and pooled. Model reactions performed prior to library construction had shown that the yields of the amide bond forming reaction ranged between 51% and 98% (see Chapter 3.1.2, Table 3-1). The resulting DNA-encoded chemical library, containing 4’000 compounds, was aliquoted at a total DNA concentration of 300 nM and stored frozen prior to further use.
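The economy of the split-and-pool design is simple combinatorics: every member is a (code 1, code 2) pair, so 20 + 200 = 220 coding oligonucleotides suffice for 20 × 200 = 4’000 distinctly encoded compounds. In this short Python sketch, the comparison with one tag oligonucleotide per compound is a hypothetical reference point for illustration only.

```python
from itertools import product

N_AMINES = 20   # Fmoc-amino acids, each on its own 42mer (code 1)
N_ACIDS = 200   # carboxylic acids, each encoded by its own 44mer (code 2)

# Every library member corresponds to one (code 1, code 2) combination.
library = list(product(range(1, N_AMINES + 1), range(1, N_ACIDS + 1)))

oligos_split_and_pool = N_AMINES + N_ACIDS   # one oligonucleotide per building block
oligos_one_per_compound = len(library)        # hypothetical: a separate tag for every compound

print(len(library), oligos_split_and_pool, oligos_one_per_compound)  # 4000 220 4000
```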


3.1.2 Model Compounds

A high-quality library is crucial for reliable and reproducible selection experiments. Unreacted oligonucleotides and side products may lead to erroneous decoding and, consequently, to incorrect binder identification. Since library quality depends essentially on the yields of the reactions used to produce each member, model oligonucleotide conjugates were synthesized in order to validate reaction conditions, yields and product recovery. Three 42mer oligonucleotide conjugates of 5’-Fmoc-deprotected model amino acids, carrying a primary amino group, were individually coupled to four different carboxylic acids using a solution of N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide (EDC) and N-hydroxysulfosuccinimide, after which the pH was buffered by adding aqueous triethylamine hydrochloride (pH 9.0). After overnight stirring and quenching by addition of Tris-Cl buffer, the reactions were analysed by HPLC and the masses of the reacted oligonucleotides were determined by LC-ESI-MS. HPLC coupling yields and recoveries ranged between 51% and 98% (Table 3-1, see also Appendix 9.1).

Table 3-1: HPLC coupling yields and recoveries after amide bond formation between three selected 5’-Fmoc-deprotected amino acid-oligonucleotide conjugates and four model carboxylic acids (see also Appendix 9.1). *) Evaluated by measuring the absorbance at 260 nm with a NanoDrop instrument (ND-1000 UV-Vis spectrophotometer) after HPLC purification (see Chapter 3.1.7).

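(The structures of the three amine conjugates and the four carboxylic acids are drawn in the original table; see Appendix 9.1.)

Carboxylic acid | Conjugate 1: Yield % | Recovery*) % | Conjugate 2: Yield % | Recovery*) % | Conjugate 3: Yield % | Recovery*) %
Acid 1 | 98 | 90 | 83 | 68 | 65 | 60
Acid 2 | 70 | 60 | 72 | 60 | 76 | 65
Acid 3 | >70 | 70 | >64 | 64 | >57 | 57
Acid 4 | >52 | 52 | >55 | 55 | >51 | 51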
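The text does not state how the HPLC yields in Table 3-1 were computed from the chromatograms. One common approach, assumed here purely for illustration, is to integrate the 260 nm peaks of the product and of the unreacted oligonucleotide and report the product fraction. This relies on both species having essentially the same extinction coefficient, since the DNA dominates absorbance at 260 nm. The function name and peak areas are hypothetical.

```python
def hplc_yield_percent(product_area, starting_area):
    """Percentage conversion from 260 nm peak areas (equal extinction coefficients assumed)."""
    total = product_area + starting_area
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * product_area / total


# Hypothetical integrated peak areas (arbitrary units).
print(round(hplc_yield_percent(product_area=830.0, starting_area=170.0)))  # 83
```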


3.1.3 Oligonucleotides

Figure 3-2 schematically depicts the two distinct sets of oligonucleotides used for the unambiguous encoding of DEL4000 compounds. The first set (Figure 3-2a) consisted of 20 unique 42mer single-stranded DNA oligonucleotides comprising three domains: an 18-nucleotide primer region (including an EcoRI restriction site) for PCR amplification at the 5’-terminus, a six-base region serving as code (each code differing from the others by at least three bases, see Appendix 9.2) and an 18-nucleotide hybridization domain at the 3’-end. For the conjugation of the initial 20 Fmoc-protected amino acids, an NH2-(CH2)12- modification was attached to the 5’-terminal phosphate group. The general sequence was 5’-NH2-(CH2)12PO4-GGA GCT TGT GAA TTC TGG XXXXXX GGA CGT GTG TGA ATT GTC-3’ (a list of all 20 codes used for library construction can be found in Appendix 9.2). A second set of oligonucleotides (Figure 3-2b), used to encode the 200 carboxylic acids in the second step of the DEL4000 synthesis (see Chapter 3.1.1), consisted of 200 distinct 44mer single-stranded DNA oligonucleotides with the general sequence 5’-GTA GTC GGA TCC GAC CAC XXXXXXXX GAC AAT TCA CAC ACG TCC-3’. This sequence contains an 18-nucleotide primer region (including a BamHI restriction site) for PCR amplification at the 5’-terminus, a specific eight-nucleotide coding region (each code differing from the others by at least four bases, see Appendix 9.2) and an 18-base hybridization domain complementary to the hybridization domain of the first set of oligonucleotides (all 200 codes are listed in Appendix 9.2).
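The minimum-distance rules quoted above (at least three differences among the 6-nt codes, at least four among the 8-nt codes) can be checked mechanically. In this Python sketch the example codes are made up (the real sets are listed in Appendix 9.2), and the error-tolerance helper simply applies the standard coding-theory bounds: a code set with minimum Hamming distance d can in principle correct floor((d-1)/2) substitutions, or detect up to d-1. Whether the decoding software exploited this is not stated in the text.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(codes):
    """Smallest pairwise Hamming distance within a code set."""
    return min(hamming(a, b) for a, b in combinations(codes, 2))


def error_tolerance(d):
    """Substitutions that can be corrected, or alternatively detected, at minimum distance d."""
    return {"correctable": (d - 1) // 2, "detectable": d - 1}


# Hypothetical 6-nt codes, not the Appendix 9.2 set.
codes6 = ["AAAAAA", "AAACCC", "CCCAAA", "GGGTTT"]
d = min_distance(codes6)
print(d, error_tolerance(d))  # 3 {'correctable': 1, 'detectable': 2}
```

With d = 4 for the 8-nt codes, the same bound still allows one substitution to be corrected, but up to three to be detected.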


[Figure 3-2 scheme: a) 5’-NH2-(CH2)12- GGA GCT TGT GAA TTC TGG XXX XXX GGA CGT GTG TGA ATT GTC-3’ (42 nt, ×20): 18 nt PCR primer domain with EcoRI restriction site, 6 nt code, 18 nt hybridization domain; b) 5’-GTA GTC GGA TCC GAC CAC XXXXXXXX GAC AAT TCA CAC ACG TCC-3’ (44 nt, ×200): 18 nt PCR primer domain with BamHI restriction site, 8 nt code, 18 nt complementary hybridization domain.]

Figure 3-2: Schematic representation of the oligonucleotide sets employed for the encoding of the DEL4000 library. a) 20 unique 42mer single-stranded 5’-NH2-(CH2)12PO4- DNA oligonucleotides. The sequences contain three domains: an 18-nucleotide primer region (including an EcoRI restriction site) for PCR amplification, a six-base coding region (each code differing from the others by at least three bases, see Appendix 9.2) and an 18-nucleotide hybridization domain at the 3’-end. The amino modification serves as the reactive group for the conjugation of the initial 20 Fmoc-protected amino acids. b) A second set of 200 unique 44mer single-stranded DNA oligonucleotides served as identification barcodes for the 200 carboxylic acids used in the synthesis of DEL4000. From the 5’-terminus, the sequences contain an 18-nucleotide primer region (including a BamHI restriction site) for PCR amplification, an eight-nucleotide coding region (each code differing from the others by at least four bases, see Appendix 9.2) and a complementary 18-base hybridization domain.

3.1.4 Compounds

Various considerations were taken into account in selecting the 20 Fmoc-protected amino acids and the 200 carboxylic acids used to build the library. The compounds had to be commercially available and suitable for conjugation to an amino-modified oligonucleotide through a stable amide bond. Amide bond formation on amino-tagged oligonucleotides had worked very well for the construction of DNA-encoded ESAC libraries in our laboratory.47 We mainly used the amide bond forming reaction to conjugate activated alkyl carboxylic acids to primary amine moieties. The selected molecules were further restricted to molecular weights between 100 and 300 Da (excluding removable protecting groups in the case of the Fmoc-protected amino acids). We sought compounds bearing a range of functional groups, with both hydrophobic and hydrophilic properties. A complete list of the structures of the 20 Fmoc-protected amino acids and the 200 carboxylic acids is given in Appendix 9.2.
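The 100-300 Da size window can be applied programmatically. This Python sketch assumes simple molecular formulas without parentheses and average atomic masses; the helper names are hypothetical. For Fmoc-protected amino acids it subtracts the C15H10O2 increment that the Fmoc group adds when it replaces an N-H hydrogen, so the window is applied to the deprotected building block, as the text specifies.

```python
import re

# Average atomic masses (g/mol) for elements common in building blocks.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
               "S": 32.06, "F": 18.998, "Cl": 35.45, "Br": 79.904, "I": 126.904}
FMOC_INCREMENT = "C15H10O2"  # net mass added when Fmoc replaces an N-H hydrogen


def molecular_weight(formula):
    """Average molecular weight of a simple formula such as 'C24H21NO4' (no parentheses)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


def in_size_window(formula, fmoc_protected=False, low=100.0, high=300.0):
    """Apply the 100-300 Da window, ignoring the removable Fmoc group if present."""
    mw = molecular_weight(formula)
    if fmoc_protected:
        mw -= molecular_weight(FMOC_INCREMENT)
    return low <= mw <= high, round(mw, 2)


print(in_size_window("C24H21NO4", fmoc_protected=True))  # Fmoc-Phe-OH -> (True, 165.19)
```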

The protocol for the amide bond formation reaction was set up by testing several

compounds and analyzing them by HPLC and MS (see Chapter 3.1.2). A typical

reaction procedure is schematically depicted in Figure 3-3.

[Figure 3-3 scheme: a) Fmoc-amino acid + 5’-(C12)NH2 42mer oligonucleotide (18mer primer, 6mer code, 18mer): 1. EDC 1 eq., S-NHS 4 eq., DMSO, rt, 30 min; 2. TEAA-HCl pH 10, rt, overnight (amide bond formation); then piperidine 50 eq., 4 °C, 2 h (Fmoc removal). b) Coupling of the free amine with a carboxylic acid under the same amide bond formation conditions.]

Figure 3-3: Reaction scheme of library synthesis. a) Coupling of Fmoc-amino acids to the initial 5’-amino oligonucleotides and Fmoc removal. b) Amide bond formation enabling the final coupling with 200 different carboxylic acids. The right panel schematically depicts the structure of the oligonucleotide.

3.1.5 HPLC Purification

A challenging task in library construction was the separation of the oligonucleotide conjugate from the unconjugated oligonucleotide precursor. Purifications after the first step of library synthesis were typically performed by reversed-phase HPLC using an ion-pairing reagent. To avoid introducing contaminants, a volatile buffer was employed and removed under vacuum after the chromatographic step. The best purification profiles were obtained using a C18 column with increased pH stability (Figure 3-4a). Dimethylbutylammonium acetate (DMBAA, 100 mM, pH 7) was used for those oligonucleotides not sufficiently resolved with the standard triethylammonium acetate (TEAA) buffer.

In order to distinguish oligonucleotides and oligonucleotide conjugates from starting

compounds and side-products, absorption was monitored at 260 nm and 280 nm. The

oligonucleotide absorption ratio 260 nm: 280 nm is typically 1.8 : 1.

3.1.6 Mass Spectrometry

Electrospray ionization mass spectrometry (ESI-MS) was employed to characterize the reaction products after oligonucleotide conjugation in the first step of library construction. Removal of sodium and potassium adducts (desalting) is crucial for the ESI-MS analysis of oligonucleotides: multiple sodium and potassium adducts on the phosphate backbone dramatically decrease sensitivity and complicate the interpretation of the spectra. To avoid manual desalting (e.g. with ZipTips), desalting was performed on-flow before each mass spectrometric analysis. While several combinations of column packing materials and buffer systems have been reported to efficiently desalt oligonucleotides on-flow before mass spectrometry,82 the only system that worked successfully in our hands was 1,1,1,3,3,3-hexafluoroisopropanol (HFIP) as the volatile acid component and triethylamine (TEA) as the ion-pairing reagent on a C18 column. Since TEA strongly suppresses ionization, its concentration was kept at 5 mM, which allowed sufficient ion formation and desalting. This protocol enabled ESI-MS of oligonucleotides of various sizes as multiply charged ions (Figure 3-4b), with a sensitivity down to 5 pmol.
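Multiply deprotonated oligonucleotide ions [M - zH]^z- appear at m/z = (M - z·1.007276)/z, so two neighbouring charge states are enough to recover both the charge z and the neutral mass M. This is a generic charge-state deconvolution sketch, not the software used in this work. It ignores Na+/K+ adducts, which the on-flow desalting is meant to suppress, and the 13 kDa conjugate mass in the example is hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def mz_for_charge(neutral_mass, z):
    """m/z of the [M - zH]^z- ion of an oligonucleotide with neutral mass M."""
    return (neutral_mass - z * PROTON_MASS) / z


def charge_from_adjacent_peaks(mz_a, mz_b):
    """Given neighbouring peaks with mz_a > mz_b (charges z and z+1), return z.

    From M = z(mz_a + p) = (z + 1)(mz_b + p) it follows that z = (mz_b + p) / (mz_a - mz_b).
    """
    return round((mz_b + PROTON_MASS) / (mz_a - mz_b))


def neutral_mass(mz, z):
    """Neutral mass M recovered from one [M - zH]^z- peak."""
    return z * (mz + PROTON_MASS)


# Hypothetical 13,000 Da conjugate observed at charge states 10- and 11-.
peak_10 = mz_for_charge(13000.0, 10)
peak_11 = mz_for_charge(13000.0, 11)
z = charge_from_adjacent_peaks(peak_10, peak_11)
print(z, round(neutral_mass(peak_10, z), 2))  # 10 13000.0
```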


[Figure 3-4 panels: a) HPLC chromatogram with peaks for the initial oligonucleotide, the compound-oligonucleotide conjugate and the carboxylic acid compound; b) ESI-MS spectrum showing charge states 7- to 14-.]

Figure 3-4: Example of oligonucleotide HPLC purification and mass spectrometric characterization. a) HPLC purification of a typical coupling reaction between an Fmoc-amino acid and a 5’-amino oligonucleotide after Fmoc removal. The green line indicates the absorbance at 260 nm, the red line at 280 nm. The chromatogram was recorded on a C18 column using TEAA (100 mM, pH 7) as buffer system. b) ESI-MS of a compound-oligonucleotide conjugate as multiply negatively charged ions. Peaks corresponding to charge states 7- to 14- are shown.

3.1.7 Oligonucleotide concentration determination

After HPLC purification and solvent removal under vacuum, the oligonucleotide fractions were dissolved in water. Concentrations were determined by measuring the absorbance at 260 nm with a NanoDrop instrument (ND-1000 UV-Vis spectrophotometer). The extinction coefficient of each oligonucleotide was calculated from its sequence, assuming the following per-nucleotide molar extinction coefficients: εT = 8400 cm-1 M-1; εA = 15200 cm-1 M-1; εC = 7050 cm-1 M-1; εG = 12010 cm-1 M-1. The ratio of absorbance at 260 nm and 280 nm was used to estimate the purity of the DNA with respect to contaminants that absorb strongly at or near 280 nm; a 260/280 ratio of ~1.8 is generally accepted as “pure” for DNA. The ratio of absorbance at 260 nm and 230 nm was used as a secondary measure of nucleic acid purity with respect to organic contaminants that absorb at or near 230 nm. Expected 260/230 values for “pure” DNA are commonly in the range of 2.0-2.2.
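The concentration calculation described above combines an additive extinction coefficient with the Beer-Lambert law, c = A260/(ε·l). This minimal Python sketch uses the per-nucleotide coefficients quoted in the text. It assumes the absorbance has already been normalized to a 1 cm path length and, like the text, ignores nearest-neighbour (base-stacking) corrections. The function names and the example code sequence are hypothetical.

```python
# Per-nucleotide molar extinction coefficients at 260 nm (M^-1 cm^-1), as given in the text.
EPSILON_260 = {"A": 15200, "C": 7050, "G": 12010, "T": 8400}


def extinction_coefficient(seq):
    """Additive approximation: sum of the per-nucleotide coefficients."""
    return sum(EPSILON_260[base] for base in seq.upper())


def concentration_molar(a260, seq, path_cm=1.0):
    """Beer-Lambert law, c = A / (epsilon * l)."""
    return a260 / (extinction_coefficient(seq) * path_cm)


def purity_ratios(a230, a260, a280):
    """Return the 260/280 (~1.8 expected) and 260/230 (2.0-2.2 expected) ratios."""
    return a260 / a280, a260 / a230


# 42mer of the first oligonucleotide set with a hypothetical 6-nt code.
oligo = "GGAGCTTGTGAATTCTGG" + "ACGTAC" + "GGACGTGTGTGAATTGTC"
print(extinction_coefficient(oligo), concentration_molar(1.0, oligo) * 1e6, "uM")
```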

3.1.8 Polymerase Klenow encoding

After coupling with the 200 different carboxylic acids in the second step of library construction (see Chapter 3.1.1, Figure 3-1), an annealing step was performed with individual 44mer oligonucleotides (see Chapter 3.1.3) partially complementary to the 42mer oligonucleotides carrying the chemical modification. A subsequent Klenow fill-in DNA polymerization step at 37 °C yielded double-stranded DNA fragments, each containing both identification codes (see Chapter 3.1.1, Figure 3-1). The Klenow fragment83 is a large protein fragment produced when DNA polymerase I from E. coli is enzymatically cleaved by the protease subtilisin. It shows optimal activity at 37 °C and retains the 5’ → 3’ polymerase activity and the 3’ → 5’ exonuclease activity (used for proofreading, i.e. removal of misincorporated nucleotides), but lacks the 5' → 3' exonuclease activity. The Klenow fragment is therefore well suited for fill-in reactions of partially complementary DNA strands at mild temperature. By contrast, fill-in with conventional Taq DNA polymerase requires a higher polymerization temperature (75-80 °C), which may compromise the stability of the conjugated compounds.

The products of the Klenow fill-in encoding were analysed by electrophoresis on 20% Tris-borate-EDTA (TBE) and 15% Tris-borate-EDTA-urea (TBU) polyacrylamide gels. Polymerization was complete in all reactions.
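The outcome of annealing and fill-in can be modelled on sequence strings. This Python sketch uses the constant domains from Chapter 3.1.3 with hypothetical codes. It checks that the 3’-terminal 18 nt of the 42mer and the 44mer are complementary, then extends each 3’ end across the partner’s 5’ overhang, as a polymerase lacking 5’ → 3’ exonuclease activity would. The amino-linked compound at the 5’ end of the 42mer is not represented, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

PRIMER_1 = "GGAGCTTGTGAATTCTGG"   # 5' domain of the 42mer (EcoRI site)
HYBRID_1 = "GGACGTGTGTGAATTGTC"   # 3' hybridization domain of the 42mer
PRIMER_2 = "GTAGTCGGATCCGACCAC"   # 5' domain of the 44mer (BamHI site)
HYBRID_2 = "GACAATTCACACACGTCC"   # 3' hybridization domain of the 44mer


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def klenow_fill_in(oligo_42, oligo_44, overlap=18):
    """Extend both 3' ends across the partner's 5' overhang; return both strands 5'->3'."""
    if oligo_42[-overlap:] != reverse_complement(oligo_44[-overlap:]):
        raise ValueError("3' domains are not complementary; no annealing")
    top = oligo_42 + reverse_complement(oligo_44[:-overlap])
    bottom = oligo_44 + reverse_complement(oligo_42[:-overlap])
    return top, bottom


code1, code2 = "ACGTAC", "TTGACCAG"  # hypothetical codes
top, bottom = klenow_fill_in(PRIMER_1 + code1 + HYBRID_1, PRIMER_2 + code2 + HYBRID_2)
print(len(top), top)
```

The resulting 68-nt top strand reproduces the domain layout of Figure 3-5, with CODE2 appearing on the coding strand as the reverse complement of the 8-nt segment of the 44mer.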

3.1.9 Summary

The individual steps described above were used to construct the DEL4000 library as shown in Chapter 3.1.1, Figure 3-1. After dissolution of each of the 20 Fmoc-protected amino acids and of its specific 5’-amino-modified oligonucleotide tag, an amide (peptide) bond formation reaction was performed. After Fmoc removal, the reaction products were purified by HPLC, and the appropriate fractions were dried under vacuum, dissolved in water and analyzed by mass spectrometry. HPLC yields for this first step were above 43% (65% on average). In a second step, the 20 compound-oligonucleotide conjugates were mixed in equimolar amounts (4 nmol each) to generate a first DNA-encoded sub-library of 20 amino-tagged compounds. The pool was then split equally into 200 vessels, and each vessel underwent a second amide bond formation with a different carboxylic acid. After precipitation of the oligonucleotides of each reaction as phosphate adducts, the modification was enzymatically encoded by Klenow-assisted polymerization using a further DNA oligonucleotide fragment. The encoding simultaneously generated the desired double-stranded DNA format of the final DEL4000 library. After purification of the DNA over ion-exchange cartridges, the 200 reaction mixtures were pooled to produce the final 4’000-member DNA-Encoded Library (DEL4000). The library was aliquoted at a total DNA concentration of 300 nM and stored frozen prior to further use. Figure 3-5 schematically represents the general structure of a typical compound in the library.
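The amounts in the procedure above can be followed through with simple bookkeeping. This Python sketch assumes an even split into the 200 vessels and no losses during coupling, precipitation, fill-in or cartridge purification, so the numbers are idealized. It also shows that the stored library at 300 nM contains on average about 75 pM of each member, consistent with the statement in Chapter 3.2 that individual members are present below 1 nM.

```python
N_CONJUGATES = 20
NMOL_PER_CONJUGATE = 4.0   # equimolar pooling (from the text)
N_VESSELS = 200
LIBRARY_SIZE = N_CONJUGATES * N_VESSELS
STOCK_TOTAL_NM = 300.0     # total DNA concentration of the stored library (from the text)

pool_nmol = N_CONJUGATES * NMOL_PER_CONJUGATE                            # DNA in the first pool
per_vessel_nmol = pool_nmol / N_VESSELS                                  # after an even split
per_conjugate_per_vessel_pmol = 1000 * per_vessel_nmol / N_CONJUGATES    # each amine per vessel
per_member_pM = 1000 * STOCK_TOTAL_NM / LIBRARY_SIZE                     # average member in stock

print(pool_nmol, per_vessel_nmol, per_conjugate_per_vessel_pmol, per_member_pM)
```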

[Figure 3-5 scheme: the pharmacophore compound (1st and 2nd building blocks joined by an amide bond) is attached to the 5’-end of the coding strand. Domains: 1st constant domain (18 bp), CODE1 (1st building block, 6 bp), 2nd constant domain (18 bp), CODE2 (2nd building block, 8 bp), 3rd constant domain (18 bp); total 68 bp.
5’-GGAGCTTGTGAATTCTGGXXXXXXGGACGTGTGTGAATTGTCYYYYYYYYGTGGTCGGATCCGACTAC-3’
3’-CCTCGAACACTTAAGACCXXXXXXCCTGCACACACTTAACAGYYYYYYYYCACCAGCCTAGGCTGATG-5’]

Figure 3-5: Schematic representation of the general structure of a typical compound in the DEL4000 library. Each pharmacophore compound was assembled from two different building blocks (in green and red) in a split-and-pool fashion and was encoded by two corresponding DNA domains (green X and red Y) of six and eight base pairs, respectively. The two coding regions are flanked by two constant 18 bp PCR priming domains and separated by a constant 18 bp spacer domain.


3.2 Selections using the DEL4000 library

To investigate the functionality of the newly synthesized DEL4000 library and to validate the reliability of the selection and of the high-throughput sequencing read-out procedure, DEL4000 was biopanned against three target proteins (streptavidin, matrix metalloproteinase 3 and polyclonal human IgG) immobilized on a sepharose support, in three independent selection experiments. Although the concentration of an individual library member is below 1 nM, binding compounds can be efficiently recovered by selection with biotinylated target protein in solution at concentrations above the dissociation constant Kd, followed by streptavidin capture. Similarly, the selection can be performed with the protein of interest immobilized at high surface density on a solid support (e.g., CNBr-activated sepharose), in full analogy to the procedures commonly used for the selection of antibodies from phage display libraries.84 Selections were therefore performed by incubating the DEL4000 library with the target protein immobilized on sepharose resin (Figure 3-6a).54 The resin, containing the retained DNA-encoded binding molecules, was washed four times with 400 µL PBS and finally resuspended in 100 µL water for a subsequent PCR amplification step followed by high-throughput sequencing (Figure 3-6a). After the sequences obtained by high-throughput sequencing had been analysed with an in-house program written in C++, the frequency of each code corresponding to an individual pharmacophore was plotted in a 3D graph, in which the xy plane represents the 4000 different sequences (compounds) of the library and the z axis reports the number of sequence counts for each compound (Figure 3-6b).
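The requirement that the target be present above Kd follows from the 1:1 binding equilibrium. For a library member at picomolar concentration, far below the protein, the bound fraction is approximately [P]/([P] + Kd). This Python sketch evaluates that expression for the desthiobiotin Kd of 47 nM reported in Chapter 3.2.1, at hypothetical effective protein concentrations. It ignores losses during the washing steps, which depend on dissociation kinetics rather than on the equilibrium alone.

```python
def fraction_bound(protein_molar, kd_molar):
    """Fraction of a trace ligand bound at equilibrium for 1:1 binding.

    Valid when the ligand (a library member at pM) is far below the protein,
    so that free protein is approximately equal to total protein.
    """
    return protein_molar / (protein_molar + kd_molar)


KD_DESTHIOBIOTIN = 47e-9  # M, value quoted in Chapter 3.2.1
for protein in (10e-9, 1e-6, 10e-6):  # hypothetical effective protein concentrations (M)
    print(f"{protein:.0e} M protein: {fraction_bound(protein, KD_DESTHIOBIOTIN):.2f} bound")
```

The same expression shows why millimolar binders are captured far less efficiently than nanomolar ones at any achievable protein concentration.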


[Figure 3-6 panels: a) selection workflow: DEL4000 library incubated with immobilized target protein (I), washing (II), PCR amplification (III) and high-throughput sequencing (IV); b) excerpt of raw sequence output with CODE1 and CODE2 highlighted (left) and 3D plot of sequence counts over Code 1 (1-20) and Code 2 (1-200) (right).]

Figure 3-6: Selection and high-throughput sequencing workflow. a) DEL4000 was incubated with the target protein immobilized on a sepharose resin (I). The resin, containing the retained DNA-encoded binding molecules, was washed several times (II) and used as template in a polymerase chain reaction (PCR) amplification (III) prior to high-throughput sequencing decoding (IV). b) An in-house C++ program processes several thousand raw DNA sequences obtained by high-throughput sequencing (left panel). Code 1 and Code 2 are identified in every sequence and the results are plotted in a 3D graph (right panel). The xy plane represents all 4000 possible compounds as combinations of Code 1 and Code 2, while the z axis reports the number of counts for each combination (compound).
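The C++ decoding program is not described in detail. The following Python sketch illustrates one way such a program could work, assuming that reads cover the coding strand shown in Figure 3-5 and that both codes are located by exact matches to the flanking constant domains. Real reads contain substitutions and indels that a production tool would need to tolerate, and reverse-strand reads are ignored here. The lookup tables are hypothetical; the real code lists are in Appendix 9.2.

```python
import re
from collections import Counter

# Constant domains of the coding strand (Figure 3-5); the codes are captured between them.
READ_PATTERN = re.compile(
    "GGAGCTTGTGAATTCTGG"   # 1st constant domain (18 nt)
    "([ACGT]{6})"          # CODE1
    "GGACGTGTGTGAATTGTC"   # 2nd constant domain (18 nt)
    "([ACGT]{8})"          # CODE2
    "GTGGTCGGATCCGAC"      # start of the 3rd constant domain
)


def decode_reads(reads, code1_index, code2_index):
    """Count (building block 1, building block 2) pairs in raw reads.

    code1_index / code2_index map code sequences, as they appear on the coding
    strand, to building-block numbers. Reads with errors in the constant domains
    or with unknown codes are simply skipped in this sketch.
    """
    counts = Counter()
    for read in reads:
        match = READ_PATTERN.search(read)
        if not match:
            continue
        c1 = code1_index.get(match.group(1))
        c2 = code2_index.get(match.group(2))
        if c1 is not None and c2 is not None:
            counts[(c1, c2)] += 1
    return counts


def to_matrix(counts, n1=20, n2=200):
    """20 x 200 count matrix (rows: code 1, columns: code 2), i.e. the z values of the 3D plot."""
    matrix = [[0] * n2 for _ in range(n1)]
    for (c1, c2), n in counts.items():
        matrix[c1 - 1][c2 - 1] = n
    return matrix


code1_index = {"CAAGCT": 1}      # hypothetical lookup tables
code2_index = {"GACTTCCC": 1}
read = "GGAGCTTGTGAATTCTGG" + "CAAGCT" + "GGACGTGTGTGAATTGTC" + "GACTTCCC" + "GTGGTCGGATCCGACTAC"
counts = decode_reads([read, read, "NNNN"], code1_index, code2_index)
print(counts[(1, 1)], to_matrix(counts)[0][0])  # 2 2
```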

3.2.1 Streptavidin selection

We initially assessed the relative composition and the functionality of the new library by performing selection experiments on sepharose resin coated with streptavidin. Since a variety of streptavidin ligands with dissociation constants ranging from the millimolar to the femtomolar range were known,54 the challenge was to investigate whether binders with different affinities could be isolated from a library containing 4’000 members. D-desthiobiotin was chosen as a positive control binder for streptavidin (Kd = 47 nM),54 and a D-desthiobiotin-oligonucleotide conjugate was synthesized, unambiguously encoded and added at a final concentration of 1 pM to the library of 4000 compounds (20 nM total DNA concentration). The spiked library was then added either to streptavidin-sepharose slurry or to a similar amount of sepharose slurry without streptavidin. Both resins were preincubated with herring sperm DNA to prevent non-specific binding. After incubation for 1 h at 25 °C, the beads were washed four times with PBS buffer and used as template for PCR amplification of the selected codes.
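The spiking ratio in this experiment can be put into perspective with back-of-the-envelope arithmetic. Assuming that all 4’000 members are equally represented (which the pre-selection sequencing in Chapter 3.2.1.1 roughly supports), each member is present at about 5 pM in the 20 nM library. The 1 pM desthiobiotin control is therefore about five-fold below an average member and accounts for one molecule in 20’000, so without enrichment a sample of 12’000 reads would be expected to contain it less than once on average.

```python
TOTAL_DNA_M = 20e-9       # total library DNA during selection (from the text)
LIBRARY_SIZE = 4000
SPIKE_M = 1e-12           # desthiobiotin-oligonucleotide conjugate (from the text)
READS_PER_SAMPLE = 12000  # upper end of reads per sample (Chapter 3.2.1.1)

average_member_M = TOTAL_DNA_M / LIBRARY_SIZE    # mean concentration of one member
spike_vs_member = SPIKE_M / average_member_M     # control relative to an average member
spike_fraction = SPIKE_M / TOTAL_DNA_M           # share of all DNA molecules
expected_reads_unenriched = READS_PER_SAMPLE * spike_fraction

print(f"{average_member_M * 1e12:.1f} pM per member on average")
print(f"spike = {spike_vs_member:.1f} x an average member, 1 in {1 / spike_fraction:,.0f} molecules")
print(f"expected reads without enrichment: {expected_reads_unenriched:.1f}")
```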


3.2.1.1 Identification of streptavidin binding molecules

Figure 3-7 shows the results of the high-throughput sequencing analysis performed

on the library before selection, after selection on unmodified sepharose resin used as

negative control, and after selection on streptavidin-coated sepharose.

Figure 3-7: Plots representing the frequency (i.e., sequence counts) of the 4000 library members before selection, after selection on empty resin and after selection on streptavidin resin, as revealed by high-throughput 454 sequencing. The chemical structures of some of the most relevant streptavidin binders are indicated. The building blocks used in the two synthetic steps are shown in green and red, respectively, together with their identification numbers. A known streptavidin binder (desthiobiotin) had been mixed with the library at low concentration prior to selection to serve as a positive control.


High-throughput sequencing of the library containing 4000 DNA-encoded compounds yielded up to 12’000 sequences per sample. The counts for individual library codes (z axis of the 20 x 200 matrices in Figure 3-7) indicate the abundance of the corresponding oligonucleotide-compound conjugates. As expected, compounds were represented in comparable amounts in the library before selection: the average count (± standard deviation) for the 4’000 compounds was 1.72 ± 1.42 when 7’336 individual codes from the library before selection were analyzed. Similarly, no striking enrichment was observed for selections on unmodified resin. By contrast, decoding of the streptavidin selection revealed a preferential enrichment of certain classes of structurally related compounds (Figure 3-7). In addition to desthiobiotin, a biotin analogue with nanomolar affinity to streptavidin,54 which had been spiked into the library as a positive control prior to selection (see Chapter 3.2.1), we observed an enrichment of derivatives of the thioester moiety 78 and of the ester moiety 49, as well as of other pharmacophores (e.g., 175). Fluorescent amide derivatives of compounds 49 and 78 had previously been found to bind streptavidin with dissociation constants in the millimolar range, as assessed by fluorescence polarization assays,54 while others (e.g., 175) had not previously been reported as streptavidin binders.
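The pre-selection statistics quoted above (mean ± standard deviation of counts per compound) and the relative enrichment of individual compounds can be computed from a count matrix. This Python sketch is not the in-house C++ analysis: it assumes that zero counts are included in the mean and uses the population standard deviation, neither of which the text specifies, and the fold-enrichment helper (counts after selection divided by the mean count before selection) is an illustrative metric. The 2 x 3 matrices in the usage example are hypothetical toys standing in for the 20 x 200 matrices.

```python
from statistics import mean, pstdev


def count_summary(matrix):
    """Mean and population standard deviation of counts over all compounds (zeros included)."""
    flat = [n for row in matrix for n in row]
    return mean(flat), pstdev(flat)


def fold_enrichment(selected, reference_mean, min_counts=1):
    """Counts after selection divided by the mean count per compound before selection."""
    hits = {}
    for i, row in enumerate(selected, start=1):
        for j, n in enumerate(row, start=1):
            if n >= min_counts:
                hits[(i, j)] = n / reference_mean
    # Most enriched compounds first.
    return dict(sorted(hits.items(), key=lambda item: item[1], reverse=True))


before = [[2, 1, 2], [1, 2, 2]]   # hypothetical pre-selection counts
after = [[0, 9, 1], [0, 0, 2]]    # hypothetical post-selection counts
m, sd = count_summary(before)
print(f"before selection: {m:.2f} +/- {sd:.2f}")
enr = fold_enrichment(after, m)
print(enr)
```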

3.2.1.2 Characterization of streptavidin binding molecules

In order to evaluate whether the extensions of pharmacophores 49 and 78 within the new 4’000-membered chemical library (building blocks 02, 07, 11, 15, 16 and 17, depicted in green in Figure 3-7, Chapter 3.2.1.1) contribute to an increased affinity towards streptavidin, we measured the dissociation constants of the most enriched compounds by fluorescence polarization at 25 °C, following conjugation to fluorescein (Figure 3-8a; see also Chapter 3.2.4). Additionally, to assess the specificity of the preferentially enriched compounds, we determined their binding affinities towards two unrelated proteins serving as negative controls (bovine carbonic anhydrase II and hen egg lysozyme; Figure 3-8b,c), and we included four non-enriched compounds (15-117, 02-107, 13-40 and 15-78) in the analysis.
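The text does not state how Kd values were extracted from the polarization curves. A minimal sketch, assuming a single-site binding model with the protein in large excess over the fluorescein conjugate (ligand depletion neglected), fits FP = FP_free + ΔFP·[P]/(Kd + [P]). As a simple stand-in for the nonlinear regression software a real analysis would use, the hypothetical helper `fit_kd` scans log10(Kd) on a grid and, for each trial Kd, solves FP_free and ΔFP by linear least squares.

```python
def fp_model(p, fp_free, delta_fp, kd):
    """Single-site binding, protein in excess over tracer:
    FP = FP_free + dFP * [P] / (Kd + [P])."""
    return fp_free + delta_fp * p / (kd + p)


def fit_kd(conc, fp, log10_kd_min=-9.0, log10_kd_max=-2.0, steps=701):
    """Grid search over log10(Kd). For a fixed Kd the model is linear in
    FP_free (intercept) and dFP (slope), so both are solved in closed form."""
    best = None
    for i in range(steps):
        kd = 10 ** (log10_kd_min + i * (log10_kd_max - log10_kd_min) / (steps - 1))
        x = [c / (kd + c) for c in conc]          # fraction of tracer bound
        n = len(x)
        mx, my = sum(x) / n, sum(fp) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0:
            continue
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, fp))
        slope = sxy / sxx                         # dFP
        intercept = my - slope * mx               # FP_free
        sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, fp))
        if best is None or sse < best[0]:
            best = (sse, kd, intercept, slope)
    _, kd, fp_free, delta_fp = best
    return kd, fp_free, delta_fp


# Synthetic, noise-free titration (10^-8 to 10^-3 M) generated with Kd = 350 nM
conc = [10 ** (-8 + 0.5 * i) for i in range(11)]
fp = [fp_model(c, 50.0, 250.0, 3.5e-7) for c in conc]
kd, fp_free, delta_fp = fit_kd(conc, fp)
print(f"Kd = {kd * 1e9:.0f} nM")
```

On noise-free data the scan recovers Kd to within the 0.01 log-unit grid spacing; real measurements would also call for replicates and an error estimate on Kd.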

[Figure 3-8 panels: fluorescence polarization [mP] versus protein concentration [M] (10⁻⁸ to 10⁻³ M) for a) streptavidin, b) carbonic anhydrase II and c) lysozyme. Compounds tested, with sequence counts after selection in brackets: 02-78 (108), 07-78 (73), 17-78 (70), 16-78 (55), 17-49 (32), 11-78 (48), 02-49 (41), 02-107 (0), 13-40 (2), 15-117 (0), 15-78 (7).]

Figure 3-8: Dissociation constants of the selected compounds determined by fluorescence polarization. Individual compounds identified in the streptavidin selection experiments were synthesized as fluorescein conjugates and incubated with different concentrations of the target proteins (streptavidin, bovine carbonic anhydrase II and hen egg lysozyme). a) The top streptavidin-binding molecules (shades of blue), identified with at least 30 counts (see Chapter 3.2.4), exhibited preferential binding to streptavidin, with Kd values ranging between 350 nM and 11 μM. By contrast, non-enriched compounds (shades of red) did not exhibit appreciable binding to streptavidin (Kd > 50 μM). b, c) Neither the streptavidin binders nor the non-enriched compounds exhibited appreciable binding to carbonic anhydrase II or to lysozyme. The structures of the 11 compounds can be found in Figure 3-7.

The dissociation constants of the most enriched compounds towards streptavidin ranged between 350 nM and 11 μM [Kd (17-49) = 350 nM; Kd (02-78) = 385 nM; Kd (17-78) = 374 nM; Kd (02-49) = 804 nM; Kd (16-78) = 1.1 μM; Kd (11-78) = 3.5 μM; Kd (07-78) = 11 μM; Figure 3-8a]. These compounds, each represented at least 30 times in the high-throughput sequencing results, were found at least ten times more frequently after selection on streptavidin than in the unselected library or than predicted by a random statistical distribution (for a simulation, see Chapter 3.2.4). By contrast, four randomly chosen negative-control compounds, found at most seven times after sequencing, exhibited Kd values towards streptavidin of 50 μM or higher (Figure 3-8a). Importantly, none of the compounds exhibited appreciable binding (Kd > 200 μM; Figure 3-8b,c) to lysozyme or carbonic anhydrase, which served as negative control proteins, thus confirming the specificity of the streptavidin selection. Table 3-2 summarizes the dissociation constants of the tested compounds towards the different targets.

Fluorescent   Counts after        Streptavidin   Lysozyme    Carbonic anhydrase
compound      DEL4000 selection   Kd (μM)        Kd (μM)     Kd (μM)
13-40         2                   54             384         703
11-78         48                  3.5            753         781
17-49         32                  0.35           1.9e3       225
17-78         70                  0.37           834         264
16-78         55                  1.1            5.2e3       1e5
15-117        0                   99             1.3e7       1.58e8
02-49         41                  0.80           1.9e4       452
02-78         108                 0.38           3.9e3       5.4e7
07-78         73                  11             9.8e8       448
15-78         7                   79             1.9e7       6.9e6
02-107        0                   50             694         1.4e8

Table 3-2: Dissociation constants of the tested fluorescein conjugates towards the different target proteins, as determined by fluorescence polarization measurements.


3.2.2 Polyclonal human IgG selection

Immunoglobulin G (IgG) consists of two heavy chains γ (H) and two light chains (L) linked to each other by disulfide bonds, with a total molecular weight of approximately 150 kDa.85 As in the other immunoglobulins, the variable domains (V domains) of the heavy and light chains (VH and VL, respectively) confer on IgG the ability to bind a specific antigen, whereas the constant domains (C domains; CH and CL, respectively) determine the isotype and therefore the functional properties of the antibody.85 IgG is the most abundant immunoglobulin: its four isotypes in humans (IgG1, IgG2, IgG3 and IgG4) represent about 75% of serum immunoglobulins.85 IgG molecules are synthesized and secreted by plasma B cells and are predominantly involved in the secondary antibody response.85 The two antigen-binding sites of IgG allow it to bind a variety of pathogens (viruses, bacteria and fungi), protecting the body against them by agglutination, immobilization, complement activation, opsonization for phagocytosis and neutralization of their toxins.85 IgG plays a fundamental role in the immune defence against pathogens, and certain monoclonal antibodies can be used for pharmaceutical applications. Consequently, the production and engineering of therapeutic antibodies has attracted the interest of numerous pharmaceutical companies.86 For this reason, in a second selection of the DEL4000 library we aimed to identify small organic molecules that bind polyclonal human IgG immobilized on CNBr-activated sepharose and that could be useful for the affinity purification of human IgG in industrial manufacturing.

3.2.2.1 Identification of polyclonal IgG binding molecules

After selection of the DEL4000 library on polyclonal human IgG-sepharose resin and a PCR amplification step, the library was decoded by high-throughput sequencing. A total of 39’092 sequence tags were identified. Figure 3-9 graphically summarizes the high-throughput sequencing results, revealing a marked enrichment after selection of the derivatives of compound 40 (927 overall combination counts) and of the thiophene moiety 69 (927 overall combination counts). For example, the bromide 02-40 was identified 96 times out of a total of 39’092 sequence tags, while > 50% of library members were detected with 1 to 10 counts and approximately 10% of the compounds were identified with more than 20 counts (see also Chapter 3.2.4).
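As a rough check of what 96 counts means, the observed count of a compound can be compared with the count expected if all 4’000 members were sequenced with equal probability. A minimal sketch follows; the helper `fold_enrichment` is hypothetical, and the uniform expectation ignores any bias already present in the unselected library.

```python
def fold_enrichment(observed_count, total_sequences, library_size):
    """Ratio of observed sequence counts to the count expected if every
    library member were sampled with equal probability."""
    expected = total_sequences / library_size
    return observed_count / expected


# Compound 02-40 after selection on polyclonal human IgG (values from the text)
expected = 39092 / 4000
print(round(expected, 2))                          # 9.77 counts per member
print(round(fold_enrichment(96, 39092, 4000), 1))  # 9.8-fold over uniform
```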

[Figure 3-9 plot: sequence counts of DEL4000 library members after selection on polyclonal human IgG, with the chemical structures of selected enriched compounds.]

Figure 3-9: Plot representing the frequency (i.e., sequence counts) of DEL4000 library members after

selection on polyclonal human IgG resin, as revealed by high-throughput 454 sequencing. The

chemical structures of some of the most relevant compounds enriched are indicated. The building

blocks used in the two synthetic steps are indicated in green and red colour respectively, together with

an identification number.


3.2.2.2 Characterization of polyclonal IgG binding molecules by affinity

chromatography resins

Using the diamino linker O-bis-(aminoethyl) ethylene glycol, compounds 02-40 and 16-40 were coupled to CNBr-activated sepharose, and the resulting resins were evaluated for their performance in the affinity capture of polyclonal human IgG, labelled either with the fluorescent dye Cy5 or with biotin, spiked into Chinese hamster ovary (CHO) cell supernatant. After loading 100 μL of labelled polyclonal human IgG (4 μM) onto 70 mg of either 02-40-sepharose or 16-40-sepharose resin, the affinity chromatography columns were washed with 5 mL PBS, then with 5 mL 500 mM NaCl, 0.5 mM EDTA, and finally with 5 mL 100 mM NaCl, 0.1% Tween 20, 0.5 mM EDTA, and were eluted three times with 200 μL of 100 mM triethylamine. All fractions were collected, concentrated to a final volume of 100 µL by centrifugation and subsequently analyzed by gel electrophoresis. Figure 3-10 shows that IgG labelled with either the fluorophore Cy5 or biotin could be completely and selectively captured from the supernatant and eluted using a 100 mM aqueous triethylamine solution.
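The amount of antibody applied to each column follows from the loading volume, the concentration and the approximate IgG molecular weight of 150 kDa given in Chapter 3.2.2. A short worked calculation (the helper name `loaded_protein` is ours, and 150 kDa is treated as an average value for polyclonal IgG):

```python
def loaded_protein(volume_l, conc_m, mw_g_per_mol):
    """Return (moles, grams) of protein in a loaded sample."""
    moles = volume_l * conc_m
    return moles, moles * mw_g_per_mol


# 100 uL of 4 uM IgG, approximately 150 kDa
mol, grams = loaded_protein(100e-6, 4e-6, 150_000)
print(f"{mol * 1e12:.0f} pmol, {grams * 1e6:.0f} μg")  # 400 pmol, 60 μg

# Three 200 uL eluates concentrated back to 100 uL: 6-fold volume reduction
print(3 * 200 / 100)  # 6.0
```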

[Figure 3-10 panels: SDS-PAGE of fractions M, In, W, E and (+) for Cy5-labeled IgG (Coomassie Blue staining; Cy5 detection) and biotinylated IgG (Coomassie Blue staining; streptavidin-based blot), using resin carrying compound 02-40 or 16-40; molecular weight marker bands labelled 38 to 225.]

Figure 3-10: Affinity chromatography of CHO cell supernatant spiked with human IgG labeled either with Cy5 or with biotin, on resin carrying compound 02-40 or 16-40. For antibody purifications, relevant fractions were analyzed by SDS-PAGE both with Coomassie Blue staining and with a specific detection method (Cy5 fluorescence and a streptavidin-horseradish peroxidase-based blot, respectively). M = molecular weight marker; In = input fraction for the chromatographic process; W = pooled wash fractions; E = pooled eluted fractions. Lane (+) corresponds to Cy5- or biotin-labeled polyclonal human IgG. In, W and E fractions were normalized to the same volume prior to SDS-PAGE analysis.


3.2.3 Matrix metalloproteinase 3 (MMP3) selection

General catabolism of tissue structures by tumour-cell proteases provides access to the vascular and lymphatic systems, thereby facilitating metastasis and cancer dissemination.87 Proteolytic enzymes, through their capacity to degrade extracellular matrix (ECM) proteins, are important components of this process. Among these proteolytic enzymes, the matrix metalloproteinases (MMPs) are a group of 24 zinc-dependent enzymes capable of degrading the ECM and the basement membrane and of processing bioactive mediators.88 For this reason, MMPs have been the focus of much anticancer research, with inhibitors investigated in clinical trials. The establishment of causal relationships between MMP overexpression and tumour progression initially encouraged the development of MMP inhibitors (MMPIs) as cancer therapeutics.89 In addition to their connective-tissue-remodelling functions, MMPs precisely regulate the function of bioactive molecules by proteolytic processing. For example, MMPs mediate cell-surface-receptor cleavage and release, cytokine and chemokine activation and inactivation, and the release of apoptotic ligands.90 These processes are involved in cell proliferation, adhesion and dispersion, migration, differentiation, angiogenesis, apoptosis and host defence evasion, which are characteristic of the early stages of tumour growth before metastasis occurs.89,90 MMPIs may therefore be suitable for blocking cancer progression. At the same time, inhibitors with insufficient specificity may suppress normal tissue function or host defence processes. Very recently, our group used a 550-member ESAC library (for the technology, see Chapter 2.1.2) in a two-step selection procedure for the identification of novel inhibitors of stromelysin-1 (MMP-3), a matrix metalloproteinase involved in both physiological and pathological tissue remodeling processes, yielding inhibitors with micromolar potency suitable for subsequent medicinal chemistry optimization.57 Encouraged by these promising results, we decided to perform an MMP3 selection with the larger DEL4000 library.

3.2.3.1 Identification of MMP3 binding molecules

Figure 3-11a shows the relative abundance of the individual compounds as obtained from high-throughput sequencing. A fingerprint different from those of the streptavidin and IgG selections was observed. Among the compounds that displayed the highest enrichment, four (02-118, 13-17, 18-96 and 17-104) were selected and tested for MMP3 binding and inhibition.

3.2.3.2 Characterization of MMP3 binding molecules

The MMP3 dissociation constants of compounds 02-118, 13-17, 18-96 and 17-104 were determined by fluorescence polarization at 25 °C, following conjugation to fluorescein using the diamino linker O-bis-(aminoethyl) ethylene glycol (Figure 3-11b). Compound 02-118 exhibited the best dissociation constant (Kd = 11 μM), while the other selected compounds did not reveal appreciable binding to MMP3 (Figure 3-11b). Inhibition assays were performed by incubating MMP3 (500 nM) with a dilution series of each compound (02-118, 13-17, 18-96 and 17-104), using Mca-Pro-Leu-Gly-Leu-Dpa-Ala-Arg-NH2 as fluorogenic substrate. No substantial inhibition was observed for any of the compounds tested. These observations suggest that compound 02-118 binds MMP3 at a site outside the catalytic pocket.
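The analysis of the inhibition data is not described here. One common approach, sketched below under that assumption, estimates the initial cleavage rate of the fluorogenic substrate as the slope of fluorescence versus time and expresses each rate relative to an uninhibited control; residual activities close to 1 across the dilution series correspond to the absence of inhibition reported above. The helper names and the example traces are hypothetical.

```python
def initial_rate(times, signal):
    """Least-squares slope of fluorescence vs time over the initial linear phase."""
    n = len(times)
    mean_t = sum(times) / n
    mean_s = sum(signal) / n
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, signal))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx


def residual_activity(inhibited_rates, control_rate):
    """Fraction of uninhibited enzyme activity left at each compound concentration."""
    return [v / control_rate for v in inhibited_rates]


# Hypothetical traces: arbitrary fluorescence units, time in seconds
times = [0, 60, 120, 180, 240]
control = [100 + 2.0 * t for t in times]
with_compound = {c: [100 + 1.9 * t for t in times] for c in (1e-6, 1e-5, 1e-4)}

v0 = initial_rate(times, control)
activities = residual_activity([initial_rate(times, s) for s in with_compound.values()], v0)
print([round(a, 2) for a in activities])  # [0.95, 0.95, 0.95]
```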

[Figure 3-11 panels: a) sequence counts of DEL4000 library members after selection on MMP3, with the chemical structures of compounds 02-118, 13-17, 18-96 and 17-104; b) fluorescence polarization versus [MMP3] (M), 10⁻⁷ to 10⁻⁴ M, for compounds 02-118, 13-17, 15-117, 17-104 and 18-96.]

Figure 3-11: DEL4000 library selection with human MMP3. a) Frequency (i.e., sequence counts) of DEL4000 library members after selection on human MMP3 resin, as revealed by high-throughput 454 sequencing. The chemical structures of some of the most relevant enriched compounds are indicated. The building blocks used in the two synthetic steps are indicated in green and red, respectively, together with an identification number. b) Determination of the MMP3 dissociation constants of compounds 02-118, 13-17, 18-96 and 17-104 by fluorescence polarization. Compound 02-118 exhibited the best dissociation constant (Kd = 11 μM), while the other selected compounds did not reveal appreciable binding to MMP3.


3.2.4 Computational simulation of DEL4000 selections

In order to assess whether the enrichment of a compound in high-throughput sequencing procedures is statistically significant, we simulated the stochastic distribution of sequence counts using software written in-house in C++ (Dr. Y. Zhang). The program generated a pool of 4000 equally likely numerical codes representing the 4000 members of the DEL4000 library. For each experiment, the software then randomly picked from this pool as many codes as the number of sequences obtained by high-throughput sequencing. The simulation was repeated 100 times. The average simulated distribution was plotted as the number of codes (i.e., DNA-encoded compounds in the library) that would be observed with a given number of counts (i.e., number of sequences) in an ideal library before selection, in which all library members are equally represented. This simulated distribution was compared with the experimental distribution of the sequence counts observed for the library members after 454-assisted sequencing of the PCR product before selection (Figure 3-12a), after selection on Tris-quenched resin (Figure 3-12b) and on streptavidin resin (Figure 3-12c), as well as on resin coated with human matrix metalloproteinase 3 (Figure 3-12d) and with polyclonal human IgG (Figure 3-12e).
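The in-house C++ program is not reproduced here. The following Python sketch re-implements the procedure as described, under the assumption that each sequence is drawn independently and uniformly from the 4’000 codes (sampling with replacement); the function name and the fixed random seed are illustrative choices.

```python
import random
from collections import Counter


def simulate_count_distribution(n_codes, n_sequences, n_repeats=100, seed=0):
    """Average number of codes observed with k counts in an ideal library
    where all n_codes members are equally likely to be sequenced.
    Returns a dict {k: mean number of codes with exactly k counts}."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_repeats):
        counts = [0] * n_codes
        for _ in range(n_sequences):             # one randomly picked code per sequence
            counts[rng.randrange(n_codes)] += 1
        totals.update(Counter(counts))           # k -> number of codes seen k times
    return {k: totals[k] / n_repeats for k in sorted(totals)}


# Library before selection: 7'336 sequences over 4'000 codes
dist = simulate_count_distribution(4000, 7336)
for k in range(6):
    print(k, round(dist.get(k, 0.0)))
```

Overlaying this averaged histogram on the experimental one for a given selection shows which codes lie far outside the range expected by chance.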


Figure 3-12: Simulated and experimental distributions of sequence counts observed for members of the library before selection (a) and after selection on Tris-quenched resin (b), on streptavidin resin (c), and on resin coated with human matrix metalloproteinase 3 (d) or with polyclonal human IgG (e). The plots display the number of codes (i.e., DNA-encoded compounds in the library) observed with a given number of counts (i.e., number of sequences), either in the experimental 454-assisted sequencing of the PCR product (before or after selection) or in a computer-assisted simulation. While experimental findings and simulation are in excellent agreement for the library before selection, in the selection experiments certain compounds are enriched much more than would be predicted from the statistical distribution of sequence counts in an equimolar mixture of compounds. The compounds in plot (c) marked with an asterisk were found more than 30 times; these compounds were then chosen for the experimental affinity determination (Figure 3-8). The maximum of the simulated curve differs between plots because of differences in the overall number of experimental sequences (e.g., 7’336 overall sequence counts for the library before selection; 39’032 overall sequence counts for the IgG selection).

While experimental findings and simulation are in excellent agreement for the library before selection, in the selection experiments certain compounds are enriched much more than would be predicted from the stochastic distribution of sequence counts in an equimolar mixture of compounds. The compounds in Figure 3-12c identified more than 30 times (indicated with an asterisk) were then chosen for the experimental affinity determination (see Chapter 3.2.1.1, Figure 3-7). Notably, the maximum of the simulated curve differs between the individual plots because of differences in the overall number of experimental sequences (e.g., 7’336 overall sequence counts for the library before selection; 39’032 overall sequence counts for the IgG selection).
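The shift of the simulated maximum can also be anticipated analytically. For uniform sampling, the number of codes observed k times is approximately N·Poisson(k; λ), with λ = (number of sequences)/(number of codes), and the most probable count is ⌊λ⌋ when λ is not an integer. A sketch using this approximation (an assumption of ours; the thesis relies on the Monte Carlo simulation):

```python
import math


def expected_codes_with_k_counts(k, n_codes, n_sequences):
    """Poisson approximation: expected number of library members
    observed exactly k times when n_sequences are read uniformly."""
    lam = n_sequences / n_codes
    return n_codes * math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def simulated_peak(n_codes, n_sequences):
    """Count value k at which the expected distribution is maximal."""
    lam = n_sequences / n_codes
    ks = range(0, int(3 * lam) + 10)
    return max(ks, key=lambda k: expected_codes_with_k_counts(k, n_codes, n_sequences))


print(simulated_peak(4000, 7336))   # 1  (before selection, mean about 1.8)
print(simulated_peak(4000, 39032))  # 9  (IgG selection, mean about 9.8)
```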


3.3 General strategies for the stepwise construction of very large DNA-encoded chemical libraries

The demonstration that high-quality DNA-encoded chemical libraries could be synthesized and decoded using 454 high-throughput sequencing technology encouraged us to investigate methodologies for the construction of DNA-encoded chemical libraries of unprecedented size (potentially comprising more than 10⁶ compounds), featuring the stepwise addition of at least three independent sets of chemical moieties and identification oligonucleotide tags. We therefore investigated a three-round split-&-pool library synthesis based on the selective deprotection and reaction of di-amino carboxylic acid core scaffolds, as well as three different encoding strategies featuring the stepwise insertion of three independent oligonucleotide codes, using experimental procedures based on the sticky-end ligation of DNA fragments and/or the annealing of partially complementary oligonucleotides followed by Klenow-assisted polymerization.

3.3.1 Selective deprotection and reaction of di-amine derivatives

The general strategy for the construction of a DNA-encoded chemical library consisting of N x M x K modules (e.g., 10 x 200 x 200 compounds) joined together by amide bond formation in a split-&-pool procedure is given in Figure 3-13 (see also Chapter 2.1.1.4, Figure 2-8). Initially, a set of N (e.g., N = 10) di-amino carboxylic acids, carrying two differently protected amino groups, is conjugated to distinct amino-modified synthetic oligonucleotides. Cleavage of the first amino protective group (PG1) of each core scaffold, followed by split-&-pool amide bond formation with a set of carboxylic acids (e.g., M = 200) and subsequent enzymatic encoding, leads to a first sub-library pool of N x M members. After removal of the second amino protective group (PG2) and splitting, the N x M pool can undergo an additional amide bond formation with a further set of carboxylic acids (e.g., K = 200). Encoding of this last set of carboxylic acids through enzymatic elongation of the oligonucleotide tags, followed by pooling of the reactions, leads to the final library mixture of N x M x K compounds (e.g., 400’000).
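The combinatorics of this scheme can be sketched in a few lines: each library member is represented by the tuple of building-block indices used in successive rounds, and because every building block carries its own DNA code, that tuple also identifies the member's composite code. The function below is an illustrative sketch of the bookkeeping only, not of the laboratory procedure.

```python
import math


def split_and_pool(n_scaffolds, *round_sizes):
    """Enumerate members of a split-&-pool library as index tuples
    (scaffold, building block of round 1, building block of round 2, ...)."""
    pool = [(s,) for s in range(n_scaffolds)]
    for size in round_sizes:
        pool = [member + (bb,)       # react with building block bb and append its code
                for bb in range(size)  # split: one aliquot per building block
                for member in pool]    # pool: concatenate all aliquots
    return pool


print(len(split_and_pool(10, 200)))  # 2000 members in the first sub-library (N x M)
print(math.prod((10, 200, 200)))     # 400000 members after the second round (N x M x K)
```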

[Figure 3-13 scheme: N = 10 scaffolds protected with PG1 and PG2; split/reaction, encoding and pooling with M = 200 carboxylic acids; second split/reaction, encoding and pooling with K = 200 carboxylic acids.]

Figure 3-13: General strategy for the construction of a DNA-encoded chemical library consisting of N x M x K modules (e.g., 10 x 200 x 200 compounds). The amino protective groups (PG1) of a set (e.g., N = 10) of di-amino protected carboxylic acids conjugated to unique oligonucleotide tags are removed. Subsequently, amide bond formation is performed in a split-&-pool fashion with a set of carboxylic acids (e.g., M = 200). After enzymatic encoding, cleavage of the second amine protective group (PG2) allows an additional split-&-pool amide bond formation with carboxylic acids (e.g., K = 200). DNA encoding of the final modifications leads to the final DNA-encoded library of N x M x K compounds (e.g., 400’000).

3.3.1.1 Orthogonal protective group and selective deprotection

The choice of appropriate orthogonal protective groups and of convenient di-amino carboxylic acid core scaffolds is crucial for the construction of a DNA-encoded library as described in the previous section (Figure 3-13). A list of useful protective groups for amino moieties, with removal conditions compatible with DNA, is shown in Table 3-3. The DNA compatibility of the cleavage conditions (for Fmoc cleavage, see Chapter 3.1.2) was assessed by coupling a specific N-protected cis-2-aminocyclopentanecarboxylic acid to a 5’-amino-modified oligonucleotide. Following purification by HPLC, the protective group was cleaved from the amino moiety and the reaction was analyzed by HPLC and mass spectrometry (Table 3-3).

Compounds 1a-d: N-protected cis-2-aminocyclopentanecarboxylic acid

Compound  Protective group (PG)                          Type               Cleavage conditions                                    HPLC yield
1a        9H-fluoren-9-ylmethyl carbamate (Fmoc)         Base labile        500 mM piperidine, water/DMSO, 4 °C, 1 h               Quantitative
1b        4,5-dimethoxy-2-nitrobenzyl carbamate (Nvoc)   Photocleavable     366 nm, 1 mM AcOH/AcONa pH 4.7, Pyrex, 4 °C, 30 min    Quantitative
1c        pent-4-enamide                                 Iodolactonization  I2, THF/water, 1 h                                     80 %
1d        2-(biphenyl-4-yl)propan-2-yl carbamate (Bpoc)  Acid labile        AcOH/AcONa, water, pH 3-4, 35 °C, 1 h                  90 %

Table 3-3: Amino protective groups compatible with DNA. Cis-2-aminocyclopentanecarboxylic acid was protected on the amino moiety with the selected protective group and coupled to a 5’-amino-modified oligonucleotide. Following conjugation, the protective group was removed and the cleavage yield was assessed by HPLC.

Prior to scaffold preparation, we investigated the selectivity of the orthogonal removal of a combination of two amino protective groups in the presence of DNA. We explored the combination of Fmoc (base labile) and Nvoc (photocleavable) amino protective groups. After coupling of Nα-Fmoc, Nε-Nvoc lysine (2) to an amino-modified oligonucleotide, Fmoc was removed by addition of piperidine and the completeness of the reaction was assessed by HPLC (Figure 3-14). ESI-MS confirmed the expected N-Nvoc amino acid oligonucleotide conjugate as the only product of the reaction.

[Figure 3-14 scheme: (i) NvocCl (1 eq.), Na2CO3 (2 eq.), water/dioxane, 2 h, giving Nα-Fmoc, Nε-Nvoc lysine (2); (ii) 1) Sulfo-NHS, EDC, DMSO, 30 °C, 15 min; 2) 5’-amino-modified (C12) oligonucleotide, TEA/HCl pH 10, 30 °C, overnight; 3) 500 mM piperidine, 4 °C, 1 h.]

Figure 3-14: Selective removal of the Fmoc protective group from a model Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid oligonucleotide conjugate. Initially, the terminal amino moiety of 2-N-Fmoc lysine was protected by means of NvocCl (i). Following coupling to a 5’-amino-modified oligonucleotide, piperidine was added and Fmoc was removed (ii). After HPLC, ESI-MS revealed the N-Nvoc amino acid oligonucleotide conjugate as the only product of the reaction.

3.3.1.2 Core scaffolds design and synthesis strategy

The confirmation that an Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid conjugated to an amino-modified oligonucleotide allows the selective cleavage of the Fmoc moiety, quantitatively yielding the corresponding N-Nvoc protected di-amino acid oligonucleotide conjugate, led us to prepare a variety of Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid core scaffolds in order to investigate the feasibility of the library synthesis pathway. Figure 3-15 depicts four convenient strategies for the preparation of Nα-Fmoc, Nε-Nvoc di-amino carboxylic acids.

[Figure 3-15 schemes: a) chiral scaffolds from Boc-protected hydroxy amino acid methyl esters (stereocenters marked *); b) diamino carboxylic acid dihydrochloride treated with 1. NvocCl (1 eq.), DIEA, DMF/water; 2. FmocCl; c) Suzuki cross-coupling of boronic acids with aryl halides (1. Pd(PPh3)2Cl2, K2CO3, DME/EtOH/water, μW; 2. FmocCl; X = Br, I; R = substituent with a primary amino moiety); d) Fmoc-protected amino acid carrying a free amino group treated with NvocCl (1 eq.), DIEA, DMF/water.]

Figure 3-15: Scheme summarizing convenient strategies for the preparation of Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid scaffolds. a) Synthesis of chiral Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid scaffolds. All eight initial diastereomers are commercially available. The synthetic strategy for the preparation of the final Nα-Fmoc, Nε-Nvoc product is shown in Figure 3-16. b) Preparation of a symmetric aromatic Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid scaffold. c) Synthesis of biphenyl Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid core scaffolds by microwave-assisted Suzuki cross-coupling, using either amino boronic acids and suitable aromatic halides or amino carboxylic boronic acids and suitable amino halides. d) Synthesis of short alkyl linkers as Nα-Fmoc, Nε-Nvoc di-amino carboxylic acids.

Notably, the strategy in Figure 3-15a allows the straightforward synthesis of a large variety of stereoisomeric core scaffolds starting from chiral precursor compounds. The strategies in Figure 3-15b, c and d, in turn, describe convenient pathways for the preparation of aromatic, biphenyl and alkyl Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid building blocks, respectively.


3.3.1.3 Model compounds for N-Fmoc, N’-Nvoc di-amino carboxylic acid core

scaffold based library.

In order to demonstrate that a DNA-encoded library can be constructed from Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid core scaffolds in a three-round split-&-pool approach, we initially synthesized two building blocks (Nα-Fmoc, Nε-Nvoc-lysine and (1S,3R,4R)-3-Nvoc-amino-4-Fmoc-amino-cyclopentanecarboxylic acid) according to the reaction schemes given previously (see Chapter 3.3.1.1, Figure 3-14) and in Figure 3-16a.

[Figure 3-16 schemes (compounds numbered 3 to 8): a) from the Boc-protected hydroxy methyl ester: 1. MeSO2Cl, TEA, CH2Cl2; 2. NaN3, DMF; H2 (1 atm), Pd/C, MeOH; FmocCl, DIEA, DMF; 2 N HCl, water/dioxane; NvocCl, Na2CO3, water/dioxane. b) Coupling I of biotin or 3-p-tolylpropanoic acid to the Nvoc-protected (C12) oligonucleotide conjugates (Sulfo-NHS, EDC, DMSO, 30 °C, 15 min); Nvoc removal (366 nm, Pyrex, 1 mM AcOH/AcONa pH 4.7, 30 min); Coupling II with desthiobiotin (Sulfo-NHS, EDC, DMSO, 30 °C, 15 min).]

Figure 3-16: Preparation of DNA-encoded model compounds for an Nα-Fmoc, Nε-Nvoc protected di-amino carboxylic acid based library. a) Reaction scheme for the synthesis of an Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid chiral scaffold. The synthesis was accomplished with an overall yield of 70%. b) The N-Nvoc di-amino acid oligonucleotide conjugates were coupled by amide bond formation to biotin or to 3-p-tolylpropanoic acid. Irradiation at 366 nm at 4 °C for 30 min in 1 mM AcOH/AcONa (pH 4.7) enables Nvoc removal. In a final step, the two resulting compounds were coupled to a further carboxylic acid (desthiobiotin). HPLC and ESI-MS analysis confirmed the identity of the expected products.


Following coupling of the Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid derivatives to a 5’-amino-modified oligonucleotide and Fmoc deprotection, the products were purified by HPLC. Biotin or 3-p-tolylpropanoic acid, used as model carboxylic acids, was coupled by amide bond formation to the N-Nvoc di-amino acid oligonucleotide conjugates. The reaction products were then freed from small organic contaminants by precipitation as sodium phosphate adducts (Figure 3-16b). Following Nvoc removal by irradiation at 366 nm at 4 °C for 30 min in 1 mM AcOH/AcONa (pH 4.7), the oligonucleotide conjugates were precipitated once more as sodium phosphate complexes (Figure 3-16b). In a final step, the two resulting oligonucleotide-linked compounds were coupled to a further model carboxylic acid (desthiobiotin). HPLC and mass spectrometry analysis revealed overall conversions of 35% and 38%, respectively, into the desired disubstituted oligonucleotide conjugates.


3.3.2 Stepwise DNA-encoding

Encouraged by the preliminary results on the selective deprotection and split-&-pool reaction of the di-amine derivatives (see Chapter 3.3.1), the next challenge prior to library construction was to investigate methodologies for the stepwise addition of at least three independent sets of oligonucleotide tags as identification codes for the chemical building blocks. Three experimental procedures were explored to add the oligonucleotide fragments in a stepwise fashion, either by enzymatic ligation of sticky-end oligonucleotides or by annealing of partially complementary DNA fragments followed by Klenow-assisted polymerization. The feasibility of the three alternative strategies was demonstrated by gel-electrophoretic analysis and by DNA sequencing of the final construct.

3.3.2 Encoding by ligation

Figure 3-17 features the stepwise addition of groups of chemical moieties onto an initial scaffold, followed by the sequential addition of the corresponding DNA codes by an iterative ligation procedure. This scheme (Figure 3-17a) is conceptually simple and can be implemented experimentally, but requires, for each building block, two oligonucleotides forming a double-stranded DNA fragment with sticky ends (i.e., 200 + 200 + 200 oligonucleotides for a library containing 100 x 100 x 100 chemical groups). Native PAGE analysis on a 20% TBE gel revealed the identity and purity of the DNA fragments used in the encoding procedure (Figure 3-17b).

[Figure 3-17 panels: a) Code A is extended by ligation of the Code B fragment after the first reaction and of the Code C fragment after the second reaction; b) native PAGE gel, lanes M and a1 to a5.]

Figure 3-17: Stepwise encoding by ligation. a) Encoding strategy based on the sequential ligation of double-stranded DNA fragments. b) Native PAGE analysis on a 20% TBE gel revealed the identity and purity of the DNA fragments used in the encoding procedure. M: marker; a1) single-stranded 28mer DNA fragment; a2) single-stranded 32mer DNA fragment; a3) hybridization product of the 28mer (a1) and 32mer (a2) DNA fragments; a4) double-stranded 50mer product of the first ligation step; a5) double-stranded 78mer product of the second ligation step. *) This band is the hybridized oligonucleotide pair (a 28mer and a 24mer) carrying Code C, which was used in excess.


3.3.2.1 Encoding by a combination of Klenow polymerase and ligation

An encoding strategy combining Klenow-assisted encoding (see Chapter 3.1.8) with encoding by ligation (see Chapter 3.3.2.1) is depicted in Figure 3-18a. A double-stranded DNA fragment generated by Klenow fill-in using a biotinylated template is digested with a restriction enzyme that recognizes a non-palindromic site (here, BssSI), followed by streptavidin capture of the biotinylated residual fragments (Figure 3-18a). Subsequently, a ligation step with a complementary double-stranded DNA fragment carrying the third code is performed (Figure 3-18a). Denaturing PAGE analysis on a 15% TBE-urea gel revealed the purity and identity of the DNA fragments generated in the encoding steps (Figure 3-18b).
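"Non-palindromic" here means that the recognition sequence differs from its own reverse complement, so the two ends of a cut site are not equivalent. A minimal sketch of that sequence check follows; we assume the recognition sequence CACGAG commonly listed for BssSI (verify against the supplier documentation), and use the EcoRI site GAATTC as a palindromic counterexample.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """A recognition site is palindromic if it equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


print(is_palindromic("GAATTC"))  # True  (EcoRI site)
print(is_palindromic("CACGAG"))  # False (site assumed for BssSI)
```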

[Figure 3-18 image. a) Scheme: Klenow reaction on a biotinylated template introducing Code B, BssSI digestion at a non-palindromic site, streptavidin capture, annealing and ligation of the fragment carrying Code C. b) Denaturing PAGE gel, lanes M and a1 to a6; marker bands 40, 70 and 100.]

Figure 3-18: Stepwise encoding by a combination of Klenow polymerase and ligation. a) Encoding strategy based on the formation of a double-stranded DNA fragment by a Klenow-assisted polymerization step, followed by the ligation of a DNA fragment carrying the third code. b) Denaturing PAGE analysis using a 15% TBE-Urea gel revealed the purity and identity of the DNA fragments generated in the encoding steps. M = marker; a1) single-stranded 42mer DNA fragment; a2) partially complementary 5’-biotinylated single-stranded 44mer DNA fragment; a3) hybridized 27mer and 23mer DNA ligation fragments; a4) 68mer product of the Klenow-assisted polymerization; a5) BssSI digestion product (54mer); a6) full-length (81mer) DNA fragment. *) This band is an artefact resulting from incomplete denaturation. When excised, extracted and loaded on a gel, it migrates at the expected height of a double-stranded 81-base DNA fragment.


3.3.2.2 Encoding by Klenow polymerase

The synthetic and encoding strategy depicted in Figure 3-19a represents a natural extension of the encoding strategy used in the assembly of the DEL4000 library (see Chapter 3.1.8) and would require the lowest number of oligonucleotides for library encoding (100 + 100 + 100 oligonucleotides for a library containing 100 x 100 x 100 chemical groups). The feasibility of the experimental procedure was demonstrated by denaturing gel-electrophoretic analysis on a 15% TBE-Urea gel (Figure 3-19b), which monitored the stepwise assembly of DNA fragments of suitable size.
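The oligonucleotide counts quoted for the encoding strategies follow from simple arithmetic: encoding by ligation consumes a double-stranded fragment (two oligonucleotides) per building block at each step, whereas Klenow fill-in consumes a single coding oligonucleotide. The hypothetical Python sketch below reproduces the 200 + 200 + 200 and 100 + 100 + 100 figures for a 100 x 100 x 100 library. The per-code factors are read off the text; a mixed route such as the Klenow plus ligation strategy of Figure 3-18 would simply use a different factor at each step.

```python
from math import prod

# Oligonucleotides consumed per building block in one encoding step (from the text):
# ligation needs a double-stranded fragment (two strands), Klenow fill-in one template.
OLIGOS_PER_CODE = {"ligation": 2, "klenow": 1}


def oligos_per_step(blocks_per_step: list[int], strategies: list[str]) -> list[int]:
    """Number of coding oligonucleotides needed at each encoding step."""
    if len(blocks_per_step) != len(strategies):
        raise ValueError("one strategy per encoding step is required")
    return [n * OLIGOS_PER_CODE[s] for n, s in zip(blocks_per_step, strategies)]


def library_size(blocks_per_step: list[int]) -> int:
    """Combinatorial library size: product of the building blocks per step."""
    return prod(blocks_per_step)


if __name__ == "__main__":
    blocks = [100, 100, 100]
    print(library_size(blocks))                       # 1000000
    print(oligos_per_step(blocks, ["ligation"] * 3))  # [200, 200, 200]
    print(oligos_per_step(blocks, ["klenow"] * 3))    # [100, 100, 100]
```

The same function gives 20 x 200 = 4000 members for the two-step DEL4000 design.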

[Figure 3-19 image. a) Scheme: Klenow reaction introducing Code B on a biotinylated partner strand, streptavidin capture, then annealing and a second Klenow reaction introducing Code C. b) Denaturing PAGE gel, lanes M and a1 to a5; marker bands 40, 70 and 100.]

Figure 3-19: Stepwise encoding by Klenow polymerization. a) Encoding strategy based on the formation of a double-stranded DNA fragment by the sequential use of two Klenow-assisted polymerization steps, starting from partially complementary oligonucleotides. b) Denaturing PAGE analysis performed using a 15% TBE-Urea gel revealed the purity and identity of the DNA fragments generated in the Klenow-mediated encoding steps. M = marker; a1) single-stranded 5’-amino-modified 42mer DNA fragment; a2) partially complementary 3’-biotinylated single-stranded DNA fragment; a3) 42mer single-stranded DNA fragment partially complementary to the product of the first Klenow step; a4) single-stranded 66mer DNA product after the first Klenow polymerization step and purification; a5) full-length (90mer) DNA fragment after purification.


3.3.3 Summary

Based on the promising results achieved with the selective deprotection and reaction of di-amine derivatives (see Chapter 3.3.1 and Chapter 3.3.2), we investigated the feasibility of a step-by-step encoding of a model library member comprising three building blocks. The N-Fmoc, N’-Nvoc protected di-amino carboxylic acid depicted in Figure 3-20a was coupled to a 5’-amino-modified oligonucleotide. Following Fmoc removal, the oligonucleotide conjugate was coupled to a further model carboxylic acid (3-p-tolylpropanoic acid) by amide bond formation, and the product was precipitated as sodium phosphate adducts (Figure 3-20a). A subsequent Nvoc removal step (see Chapter 3.3.1.3) released a reactive primary amino group, allowing further modification of the resulting oligonucleotide derivative. In order to generate a DNA fragment carrying three codes that unambiguously identify the building blocks used for library construction, three strategies were devised and demonstrated experimentally. One of them, depicted in Figure 3-20, uses a biotinylated oligonucleotide in the Klenow-assisted fill-in reaction to introduce the second code. A third encoding step, the ligation of a Cy3-labeled double-stranded DNA fragment, allowed the encoding procedure to be monitored not only by ethidium bromide (EtBr) DNA staining, but also by fluorescence imaging of the gels (Figure 3-20b). DNA sequencing confirmed the identity and purity of the DNA constructs (Figure 3-20c).


[Figure 3-20 image. a) Reaction scheme: coupling of the N-Fmoc, N’-Nvoc di-amino carboxylic acid to the 5’-amino-C12 oligonucleotide, Fmoc removal and HPLC; coupling of the second carboxylic acid and ethanol precipitation; Nvoc removal and Klenow encoding with a biotinylated oligonucleotide carrying a BssSI site; Cy5 coupling, ethanol precipitation and BssSI digestion; streptavidin capture and ligation encoding with a Cy3-labelled fragment. b) Gels imaged by ethidium bromide, Cy5 and Cy3 fluorescence; lanes M and a1 to a4; marker bands 60, 70, 80 and 100. c) Sequences of eight bacterial colonies, all reading 5’-GGAGCTTGTGAATTCTGGGTTAGTGGACGTGTGTGAATTGTCGATTACCAGTACTCGTGAAATTTGCTAGGATCCATATTG-3’, with the EcoRI site (positions 10-15), Code A (GTTAGT, 19-24), Code B (GATTACCA, 43-50), the BssSI site (54-59), Code C (TTTGCT, 63-68) and the BamHI site (70-75).]

Figure 3-20: Step-by-step synthesis and encoding of a model member of an N-Fmoc, N’-Nvoc di-amino carboxylic acid-based library. a) The N-Fmoc, N’-Nvoc di-amino carboxylic acid was conjugated to a 5’-amino-modified oligonucleotide (42mer), followed by Fmoc removal and HPLC purification (i). A coupling reaction with 3-p-tolylpropanoic acid was then performed (ii). Subsequently, Nvoc was removed by irradiation at 366 nm, and Klenow-assisted encoding was completed with a partially complementary 5’-biotinylated oligonucleotide (44mer) carrying a BssSI restriction site (iii). The extended DNA product was labelled with Cy5-N-hydroxysuccinimide ester and digested with BssSI (iv). Incubation with streptavidin sepharose beads allowed the removal of the small DNA restriction products (v). A Cy3-labelled oligonucleotide (23mer) carrying the third code was ligated (vi). b) Gel-electrophoretic analysis with specific detection methods (ethidium bromide, Cy5 and Cy3 fluorescence) monitored the stepwise assembly of DNA fragments of suitable size. a1) 68mer product of the Klenow-assisted polymerization; a2) Klenow-assisted 68mer product after Cy5 coupling; a3) BssSI digestion product (54mer); a4) full-length (81mer) DNA fragment after ligation with the 23mer Cy3-labelled DNA fragment. *) This band is an artefact resulting from incomplete denaturation. When excised, extracted and loaded on a gel, it migrates at the expected height of a double-stranded 81-base DNA fragment. c) Bacterial cloning and Sanger sequencing of eight different bacterial colonies revealed the identity of the DNA constructs.


4. DISCUSSION

We have constructed a high-quality DNA-encoded chemical library containing 4000 compounds (DEL4000). This library was used in selections for the identification of novel streptavidin, MMP3 and IgG binders. High-throughput sequencing of the library before and after selection revealed the preferential enrichment of binding molecules. In the case of the newly discovered streptavidin binders, we observed that both building blocks used for the stepwise synthesis of compounds in the library may contribute to the resulting binding affinity. For example, we observed a >100-fold difference in binding affinity between compounds 02-78 and 15-78, with Kd values of 385 nM and 78 μM, respectively, in line with their different recovery rates after streptavidin selection (see Chapter 3.2.1.1 and 3.2.1.2, Figure 3-7 and Figure 3-8). We have also shown that the encoding strategy followed for the construction of the DEL4000 library can be extended, for example by incorporating a third set of chemical groups and corresponding DNA-coding fragments (see Chapter 3.3.2).
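To see how a Kd difference of this size translates into binding, the quadratic binding model used for the fluorescence polarization fits (Section 5.3.2) can be evaluated directly. The Python sketch below is illustrative only: the 1 µM protein concentration is an arbitrary choice, and the proportionality constants a and b of the polarization signal are placeholders rather than fitted values from this work.

```python
import math


def complex_concentration(p_total: float, fc0: float, kd: float) -> float:
    """Concentration of the protein-ligand complex [C] (molar).

    Solves [C]^2 - ([P] + [FC]0 + Kd)[C] + [P][FC]0 = 0 and keeps the smaller
    (minus) root, written in the algebraically equivalent form
    2[P][FC]0 / (s + sqrt(s^2 - 4[P][FC]0)) to avoid cancellation when Kd is large.
    """
    s = p_total + fc0 + kd
    disc = s * s - 4.0 * p_total * fc0  # always >= 0 for non-negative inputs
    return 2.0 * p_total * fc0 / (s + math.sqrt(disc))


def fp_signal(p_total: float, fc0: float, kd: float, a: float, b: float) -> float:
    """Polarization signal FP = a[FC]0 + (b - a)[C] for proportionality constants a, b."""
    return a * fc0 + (b - a) * complex_concentration(p_total, fc0, kd)


if __name__ == "__main__":
    fc0 = 500e-9  # fluorescein conjugate concentration used in the FP assay (500 nM)
    p = 1e-6      # illustrative protein concentration (1 uM)
    for name, kd in [("02-78", 385e-9), ("15-78", 78e-6)]:
        bound = complex_concentration(p, fc0, kd) / fc0
        print(f"{name}: Kd = {kd:.3g} M, fraction of conjugate bound = {bound:.3f}")
```

The rewritten root avoids subtracting two nearly equal numbers when Kd is much larger than the other concentrations, which is the regime of the weaker binder.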

Recent advances in ultra-high-throughput DNA sequencing with 454 technology indicate that it should be possible to sequence over one million sequence tags per sequencing run.44 Therefore, provided that two orthogonal synthetic procedures are used which feature high coupling yields and preserve the integrity of the DNA molecule, it should be possible to construct, perform selections with and decode DNA-encoded libraries containing millions of chemical compounds.
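A back-of-the-envelope calculation shows why sequencing depth matters at this scale. Assuming, for illustration only, that every library member is equally represented and that reads are sampled independently, the number of reads per compound is approximately Poisson-distributed with a mean equal to the total number of reads divided by the library size. The Python sketch below uses the one-million-read figure quoted above; it ignores PCR bias and sequencing errors, and after a selection the read distribution is deliberately skewed by enrichment.

```python
import math


def mean_reads_per_compound(total_reads: int, library_size: int) -> float:
    """Average sequencing coverage per library member."""
    return total_reads / library_size


def fraction_observed(total_reads: int, library_size: int) -> float:
    """Expected fraction of members seen at least once, under a Poisson model."""
    lam = mean_reads_per_compound(total_reads, library_size)
    return 1.0 - math.exp(-lam)


if __name__ == "__main__":
    reads = 1_000_000
    for size in (4_000, 100_000, 1_000_000, 10_000_000):
        print(f"{size:>10,} members: {mean_reads_per_compound(reads, size):8.2f} reads each, "
              f"{fraction_observed(reads, size):.3f} observed at least once")
```

Under these assumptions a 4000-member library is sampled about 250 times per member, while a one-million-member library is sampled on average once per member and roughly a third of its members would not be seen at all.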

The potential of using DNA tagging for the identification of binding compounds (e.g., in panning experiments) has long been recognized. However, in the last few years research in the field of DNA-encoded chemical libraries has been advanced by the development of novel methodologies for library construction and decoding. The recent interest in DNA-encoded chemical libraries is mainly related to the possibility of constructing libraries of unprecedented size, which can still be screened at low concentrations for protein binding, thanks to ultra-sensitive DNA detection procedures such as the polymerase chain reaction (PCR) and high-throughput DNA sequencing. In full analogy to antibody phage libraries, DNA-encoded chemical libraries do not rely on biological assays for the identification of binding molecules, but rather on the physical separation of binders from non-binders. Therefore, affinity selections with DNA-encoded chemical libraries, as shown in this work, can be performed in a single reaction tube with standard laboratory equipment, even with target proteins for which screening assays are not yet available.

While the work presented in this Thesis clearly illustrates the potential of DNA-encoded chemical libraries, further improvement of this methodology will require better synthetic procedures, encoding strategies and read-out methodologies (i.e., high-throughput sequencing). The relatively narrow choice of reactions for the conjugation of chemical moieties to DNA oligonucleotides still represents a limitation that deserves to be addressed in the future.

At present, large pharmaceutical companies typically screen a few hundred thousand compounds in their high-throughput screening campaigns and face enormous challenges in the preparation, storage and screening of very large libraries of organic molecules, not only from the synthetic point of view, but also in terms of logistics and analysis. Furthermore, the costs associated with the identification of specific binding molecules from a pool of candidates grow exponentially with the size of the chemical library to be screened. Thus, the combination of large repertoires of organic molecules with ingenious screening methodologies is recognized as an important approach for isolating desired binding molecules. For this reason, selections of DNA-encoded chemical libraries such as the one described in this Thesis may facilitate the identification of binding molecules (“hits”) for pharmaceutical applications.

Among the selections described in this work, the identification of binders to polyclonal human IgG appears to have the most direct application. At present, monoclonal antibodies for therapeutic applications represent the fastest growing sector of pharmaceutical biotechnology.86 Protein A sepharose, which is used in virtually all industrial purification procedures for monoclonal antibodies, represents the largest cost factor in the manufacture of therapeutic antibodies. Because of these substantial costs, the resins are typically regenerated and re-used, which complicates certain aspects of good manufacturing practice. It is conceivable that protein A-based affinity supports could be replaced by affinity purification supports based on IgG-binding molecules, such as the ones described in this work.


5. Material and Methods

5.1 Reagents and general remarks

Unless otherwise noted, chemical compounds and proteins were from Sigma-Aldrich-Fluka (Buchs, Switzerland), resin for solid-phase synthesis from Novabiochem (Laufelfingen, Switzerland), enzymes from New England Biolabs (Ipswich, MA, USA), and HPLC-grade lyophilized oligonucleotides from IBA GmbH (Göttingen, Germany). SpinX columns were purchased from Corning Costar Incorporated (Acton, MA, USA), and ion-exchange cartridges for DNA purification (PCR purification, cat.no 28104; nucleotide removal, cat.no 28306) from Qiagen (Hilden, Germany); the cartridges were used according to the protocol provided by the supplier. NMR spectra were recorded on a Bruker 400 MHz spectrometer with TMS as the internal standard. All reactions involving air- and water-sensitive materials were performed in flame-dried glassware under argon using standard syringe, cannula and septum techniques. Precoated Merck 60 F254 alumina silica gel sheets were used for TLC.

5.2 Synthesis of DEL4000 DNA Encoded Library

The individual organic compounds to be coupled to the 5’-amino-modified 42mer oligonucleotides were dissolved in DMSO as 100 mM stock solutions, occasionally with the further addition of water or dilute hydrochloric acid. All HPLC purifications were performed on an XTerra Prep RP18 column (5 µm, 10 x 150 mm) using a linear gradient from 10% to 40% MeCN in 100 mM TEAA, pH 7. LC-ESI-MS analyses were performed on an XTerra RP18 column (5 µm, 4.6 x 20 mm) using a linear gradient from 0% to 50% MeOH over 1 min in 400 mM HFIP/5 mM TEA. Mass spectra were recorded from m/z 900 to 2000 on a Waters Quattro Micro instrument (Waters, Milford, MA, USA). Oligonucleotides were quantified by measuring the absorbance at 260 nm with a NanoDrop instrument (ND-1000 UV-Vis spectrophotometer).


5.2.1 Synthesis of library model compound oligonucleotide conjugates.

To a reaction volume of 310 µL containing 70% (v/v) DMSO/water, compounds were added to the following final concentrations, in this order: Fmoc-protected amino acid (A, see Appendix) in DMSO, 4 mM; N-hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide in DMSO, 4 mM; aqueous triethylamine hydrochloride solution, pH 9.0, 80 mM; aqueous oligonucleotide solution, 50 µM (5’-amino-C12-GGA GCT TGT GAA TTC TGG ATC TTA GGA CGT GTG TGA ATT GTC-3’). The reaction was stirred overnight at 25 °C; residual activated species were then quenched and the Fmoc group simultaneously removed by addition of piperidine (500 mM in DMSO). Following HPLC purification, the coupling yield was estimated (see Appendix), and the desired fractions were dried under reduced pressure and redissolved in 50 µL of water. The recovery was determined by measuring the absorbance at 260 nm with a NanoDrop instrument, and the masses of the reacted oligonucleotides were determined by LC-ESI-MS (see Appendix). Subsequently, to a reaction volume of 310 µL containing 70% (v/v) DMSO/water, compounds were added to the following final concentrations, in this order: amino acid (B, see Appendix) in DMSO, 4 mM; N-hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide in DMSO, 4 mM; aqueous solution of the resulting compound-oligonucleotide conjugate, 15 µM; aqueous triethylamine hydrochloride solution, pH 9.0, 80 mM. After overnight stirring at 25 °C, residual activated species were quenched by addition of 50 µL Tris-Cl buffer, 500 mM, pH 9.0. The DNA was quantitatively precipitated by sequential addition of 25 µL of 1 M acetic acid, 12.5 µL of 3 M sodium acetate buffer, pH 4.7, and 500 µL ethanol, followed by 2 h incubation at -23 °C. The DNA was centrifuged, and the resulting oligonucleotide pellet was washed with ice-cold 90% (v/v) ethanol and dissolved in 100 µL water. Following HPLC, the coupling yield of this reaction step was determined (see Appendix). The desired fractions were dried under reduced pressure and redissolved in 50 µL of water. The recovery was determined by measuring the absorbance at 260 nm with a NanoDrop instrument, and the masses of the reacted oligonucleotides were determined by LC-ESI-MS (see Appendix).
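Setting up these couplings amounts to choosing the volume of each stock that gives the stated final concentration in 310 µL (C_stock x V_stock = C_final x V_final). The Python sketch below applies this to the 100 mM DMSO stocks of the building blocks described in Section 5.2. The concentrations of the other stocks (N-hydroxysulfosuccinimide, carbodiimide, triethylamine hydrochloride and oligonucleotide) are not given in the text, so the values used here are hypothetical and serve only to show the calculation.

```python
# Stock concentrations marked "hypothetical" are illustrative assumptions only.
REACTION_VOLUME_UL = 310.0


def stock_volume_ul(final_conc: float, stock_conc: float,
                    final_volume_ul: float = REACTION_VOLUME_UL) -> float:
    """Volume of stock (uL) such that C_stock * V_stock = C_final * V_final.

    final_conc and stock_conc must be given in the same unit (e.g. mM).
    """
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be more concentrated than the final solution")
    return final_conc * final_volume_ul / stock_conc


if __name__ == "__main__":
    # (reagent, final concentration in mM, stock concentration in mM)
    recipe = [
        ("building block in DMSO (100 mM stock, Section 5.2)", 4.0, 100.0),
        ("N-hydroxysulfosuccinimide in DMSO (hypothetical 200 mM stock)", 10.0, 200.0),
        ("carbodiimide in DMSO (hypothetical 200 mM stock)", 4.0, 200.0),
        ("triethylamine hydrochloride, pH 9.0 (hypothetical 1 M stock)", 80.0, 1000.0),
        ("oligonucleotide (hypothetical 1 mM stock)", 0.050, 1.0),
    ]
    total = 0.0
    for name, final, stock in recipe:
        volume = stock_volume_ul(final, stock)
        total += volume
        print(f"{name}: {volume:.1f} uL")
    print(f"sum of additions: {total:.1f} uL of {REACTION_VOLUME_UL:.0f} uL")
```

The remainder of the 310 µL is made up with DMSO and water so that the final solvent composition is 70% (v/v) DMSO.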


5.2.2 Coupling reactions of 20 Fmoc-protected amino acids.

To a reaction volume of 310 µL containing 70% (v/v) DMSO/water, compounds were added to the following final concentrations, in this order: Fmoc-protected amino acid in DMSO, 4 mM; N-hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide in DMSO, 4 mM; aqueous triethylamine hydrochloride solution, pH 9.0, 80 mM; aqueous oligonucleotide solution, 50 µM (DEL_O_1_n, 1 ≤ n ≤ 20: 5’-amino-C12-GGA GCT TGT GAA TTC TGG XXX XXX GGA CGT GTG TGA ATT GTC-3’, where XXX XXX unambiguously identifies the individual Fmoc-protected amino acid). All coupling reactions were stirred overnight at 25 °C; residual activated species were then quenched and the Fmoc group simultaneously removed by addition of piperidine (500 mM in DMSO). Prior to HPLC purification, 500 µL of 100 mM TEAA, pH 7.0, was added to the reaction mixture. The reactions were then purified by HPLC, and the desired fractions were dried under reduced pressure, redissolved in 100 µL of water and analyzed by LC-ESI-MS. The samples showed the expected Fmoc-deprotected products. Typical overall coupling yields were >51%. 4.0 nmol of each DNA-compound conjugate were pooled to generate a 20-member DNA-encoded sub-library.

5.2.3 Coupling reactions of 200 carboxylic acids.

To a reaction volume of 310 µL containing 70% (v/v) DMSO/water, compounds were added to the following final concentrations: carboxylic acid in DMSO, 4 mM; N-hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide in DMSO, 4 mM; aqueous triethylamine hydrochloride, pH 9.0, 80 mM; DNA-oligonucleotide sub-library pool, 1.5 µM. All coupling reactions were stirred overnight at 25 °C; residual activated species were then quenched by addition of 50 µL Tris-Cl buffer, 500 mM, pH 9.0. The DNA was quantitatively precipitated by sequential addition of 25 µL of 1 M acetic acid, 12.5 µL of 3 M sodium acetate buffer, pH 4.7, and 500 µL ethanol, followed by 2 h incubation at -23 °C. The DNA was centrifuged, and the resulting oligonucleotide pellet was washed with ice-cold 90% (v/v) ethanol and dissolved in 100 µL water. Test coupling reactions were also performed under the conditions described above (see Chapter 3.1.2, Table 3-1), using model 42mer 5’-Fmoc-deprotected amino acid oligonucleotide conjugates and model carboxylic acids. The reactions were analysed by HPLC, and the masses of the reacted oligonucleotides were determined by LC-ESI-MS. Coupling yields (by HPLC) and recoveries in this step were always >51%.

5.2.4 Klenow polymerase encoding of the 200 carboxylic acid coupling reactions.

To a reaction volume of 50 µL, reagents were added to the following final concentrations: aqueous solution of the pool of 20 oligonucleotide conjugates coupled with the specific carboxylic acid (see Chapter 5.2.3), 320 nM; 44mer coding oligonucleotide (DEL_O_2_n, 1 ≤ n ≤ 200: 5'-GTA GTC GGA TCC GAC CAC XXXX XXXX GAC AAT TCA CAC ACG TCC-3', where XXXX XXXX unambiguously identifies the individual carboxylic acid; IBA), 600 nM; Klenow buffer (NEB, cat.no B7002S); dNTPs (Roche, cat.no 11969064001), 0.5 mM; Klenow polymerase (NEB, cat.no M0210L), 5 units. The Klenow polymerization reactions were incubated at 37 ºC for 1 h and then purified on ion-exchange cartridges (Qiagen, cat.no 28306). The 200 purified reactions were each dissolved in 50 µL of water and pooled to generate the 4000-member library (DEL4000) at a final total oligonucleotide concentration of 300 nM.

5.2.5 Preparation of D-desthiobiotin oligonucleotide conjugate (positive control)

The D-desthiobiotin-oligonucleotide conjugate was synthesized (DEL_O_1_21: 5’-amino-C12-GGA GCT TGT GAA TTC TGG ATC GAG GGA CGT GTG TGA ATT GTC-3’; coding sequence ATC GAG) and unambiguously encoded (DEL_O_2_201: 5'-GTA GTC GGA TCC GAC CAC TTCA CACA GAC AAT TCA CAC ACG TCC-3'; coding sequence TTCA CACA) as described above. ESI-MS of the DEL_O_1_21 D-desthiobiotin conjugate: expected 13572; measured 13573.


5.3 Library DEL4000 selections

5.3.1 Streptavidin selection.

The library DEL4000 (total oligonucleotide conjugate concentration 300 nM) was diluted 1:15 in PBS (20 nM final concentration) and spiked with the D-desthiobiotin oligonucleotide conjugate (final concentration 1 pM). 50 µL of the 20 nM library was added either to 50 µL streptavidin-sepharose slurry (GE Healthcare, cat.no 17-5113-01) or to 50 µL sepharose slurry without streptavidin. Both resins were preincubated with PBS, 0.3 mg/mL herring sperm DNA. After incubation for 1 h at 25 ºC, the mixture was transferred to a SpinX column, the supernatant was removed, and the resin was washed 4x with 400 µL PBS. After washing, the resin was resuspended in 100 µL water.
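It is instructive to translate these concentrations into numbers of molecules. Assuming, for illustration, that the 4000 members of DEL4000 are equimolar, the 20 nM selection input corresponds to about 5 pM per member, and the number of copies in 50 µL follows from n = c x V x N_A. The Python sketch below performs this conversion for an average library member and for the 1 pM D-desthiobiotin positive control; equimolarity is an assumption, not a measured property of the library.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies(concentration_molar: float, volume_litre: float) -> float:
    """Number of molecules in a given volume at a given molar concentration."""
    return concentration_molar * volume_litre * AVOGADRO


def per_member_concentration(total_molar: float, n_members: int) -> float:
    """Concentration of each member, assuming an equimolar library."""
    return total_molar / n_members


if __name__ == "__main__":
    stock = 300e-9              # DEL4000 total conjugate concentration (M)
    selection = stock / 15      # 1:15 dilution in PBS gives 20 nM
    volume = 50e-6              # 50 uL of library used per selection
    member = per_member_concentration(selection, 4000)
    print(f"selection input: {selection * 1e9:.0f} nM total, {member * 1e12:.1f} pM per member")
    print(f"copies per average member: {copies(member, volume):.2e}")
    print(f"copies of the 1 pM positive control: {copies(1e-12, volume):.2e}")
```

Under this assumption each member is present at roughly 1.5 x 10^8 copies, and the spiked positive control at roughly 3 x 10^7 copies, i.e. about five-fold below an average library member.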

5.3.1.1 Identification of binding molecules.

The codes of the oligonucleotide-compound conjugates were amplified by PCR (total volume 50 µL; 25 cycles of 1 min at 94 ºC, 1 min at 55 ºC and 40 s at 72 ºC), using as template either 5 µL of 100 fM DEL4000 library before selection or 5 µL of each resuspended resin after selection. The PCR primers DEL_P1_A (5’-GCC TCC CTC GCG CCA TCA GGG AGC TTG TGA ATT CTG G-3’) and DEL_P2_B (5’-GCC TTG CCA GCC CGC TCA GGT AGT CGG ATC CGA CCA C-3’) additionally contain at their 5’ ends a 19 bp domain (GCC TCC CTC GCG CCA TCA G and GCC TTG CCA GCC CGC TCA G, respectively) required for high-throughput sequencing with the 454 Genome Sequencer system. The PCR products were purified on ion-exchange cartridges. Subsequent high-throughput sequencing was performed on a 454 Life Sciences-Roche GS 20 Sequencer platform (sequencing service by Eurofins MWG GmbH, Ebersberg, Germany). The codes obtained by high-throughput sequencing were analysed with an in-house program written in C++, and the frequency of each code was assigned to the corresponding individual pharmacophore.
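The decoding step can be sketched as follows. This is a hypothetical Python stand-in, not the in-house C++ program: it locates Code A and Code B in each read using the constant regions of the DEL_O_1_n and DEL_O_2_n oligonucleotides (Sections 5.2.2 and 5.2.4), tries both read orientations, and counts code pairs. On the Klenow-extended strand, Code B appears as the reverse complement of the XXXX XXXX stretch of DEL_O_2_n, so it is reverse-complemented back before counting. Exact matching is assumed; a practical pipeline would also need to tolerate sequencing errors, to which 454 reads are prone in homopolymer stretches. Enrichment is expressed here as the ratio of a code pair's relative frequency after selection to that before selection, with a pseudocount to avoid division by zero; this definition is an illustrative choice.

```python
import re
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Constant regions of DEL_O_1_n (...GAA TTC TGG XXX XXX GGA CGT GTG TGA ATT GTC-3'),
# followed by the Klenow-extended copy of DEL_O_2_n (reverse complement of its 5' part).
READ_PATTERN = re.compile(
    r"GAATTCTGG([ACGT]{6})GGACGTGTGTGAATTGTC([ACGT]{8})GTGGTCGGATCC"
)


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def decode_read(read: str):
    """Return (code_a, code_b) as written in DEL_O_1_n / DEL_O_2_n, or None."""
    read = read.upper()
    for candidate in (read, revcomp(read)):  # reads may come from either strand
        match = READ_PATTERN.search(candidate)
        if match:
            return match.group(1), revcomp(match.group(2))
    return None


def count_codes(reads) -> Counter:
    """Count decoded (code_a, code_b) pairs; undecodable reads are skipped."""
    counts = Counter()
    for read in reads:
        pair = decode_read(read)
        if pair is not None:
            counts[pair] += 1
    return counts


def enrichment(after: Counter, before: Counter, pair, pseudocount: float = 1.0) -> float:
    """Relative frequency after selection divided by relative frequency before."""
    f_after = (after[pair] + pseudocount) / max(sum(after.values()), 1)
    f_before = (before[pair] + pseudocount) / max(sum(before.values()), 1)
    return f_after / f_before


if __name__ == "__main__":
    # Hypothetical read built from the positive-control codes of Section 5.2.5
    # (Code A = ATC GAG, Code B = TTCA CACA).
    read = ("GGAGCTTGTGAATTCTGG" + "ATCGAG" + "GGACGTGTGTGAATTGTC"
            + revcomp("TTCACACA") + "GTGGTCGGATCCGACTAC")
    print(count_codes([read, revcomp(read)]))
```

Because the primer-binding regions flanking the codes are identical for all library members, a fixed pattern of this kind is sufficient to extract both codes from any correctly sequenced read.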

5.3.1.2 Synthesis of the binding molecules as fluorescein conjugates.

In a polypropylene syringe, 50 mg (46 µmol) of O-bis-(aminoethyl)ethylene glycol

trityl resin (Novabiochem, cat.no 01-64-0235) was suspended in a mixture of the

appropriate Fmoc-protected amino acid (100 µmol, 1 mL), HBTU (Aldrich, 200

µmol, 1 mL), and DIEA (Fluka, 400 µmol, 0.5 mL) in dry DMF. After overnight


incubation at 25 °C, the resin was washed 6x with 2 mL dry DMF and the Fmoc moiety was removed by addition of 1 mL piperidine (50% in dry DMF) for 1 h at 25 °C. After washing 6x with 2 mL dry DMF, the corresponding carboxylic acid (100 µmol, 1 mL DMF) was added and a further amide bond formation reaction was performed as described above. The resulting product was cleaved by treating the resin 10x with 2 mL TFA (1% in CH2Cl2). The dichloromethane fractions were quenched in 5 mL saturated aqueous NaHCO3, and the aqueous phase was back-extracted 2x with 5 mL CH2Cl2. The pooled organic phases were washed 3x with water, dried over Na2SO4 and concentrated in vacuo. The crude product was reacted with 2 equivalents of fluorescein isothiocyanate (in 800 µL DMF) and 200 µL saturated aqueous NaHCO3 in the dark overnight at 25 °C. Following HPLC purification on an XTerra Prep RP18 column (5 µm, 10 x 150 mm) using a linear gradient from 10% to 100% MeCN with 0.1% TFA, the desired fractions were collected and lyophilized. 2 mg of each fluorescein conjugate was dissolved in DMSO as a 5 mM stock solution. ESI-MS analysis confirmed the masses of the expected FITC conjugate products: 02-78 (C45H49BrN4O10S2) m/z expected: 949.93, measured: 951.31 [M+H]+; 07-78 (C50H59N5O12S2) m/z expected: 985.36, measured: 986.37 [M+H]+; 15-78 (C44H49N5O12S3) m/z expected: 935.25, measured: 936.25 [M+H]+; 02-107 (C47H47BrN4O9S) m/z expected: 923.87, measured: 925.12 [M+H]+; 13-40 (C46H49N5O14S) m/z expected: 927.30, measured: 928.42 [M+H]+; 11-78 (C49H58N4O13S2) m/z expected: 974.34, measured: 975.41; 17-49 (C45H49IN4O11S) m/z expected: 980.22, measured: 981.29 [M+H]+; 17-78 (C45H49IN4O10S2) m/z expected: 996.19, measured: 997.26; 16-78 (C43H48N4O10S3) m/z expected: 876.25, measured: 877.33; 15-117 (C45H45N5O12S2) m/z expected: 911.25, measured: 912.33; 02-49 (C45H49BrN4O11S) m/z expected: 932.23, measured: 933.32 [M+H]+.
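Expected m/z values of this kind can be checked against the molecular formulas with a few lines of code. The Python sketch below is a hypothetical helper, not used in this work: it sums monoisotopic masses of the most abundant isotopes (1H, 12C, 14N, 16O, 32S, 79Br, 127I; rounded standard values) and adds the proton mass for [M+H]+. Most expected values listed above agree with monoisotopic masses; for 02-78 and 02-107 the listed expected values lie closer to average molecular weights, so a monoisotopic calculation is not directly comparable for those two entries.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element used above.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207100,
    "Br": 78.91833710,
    "I": 126.90447300,
}
PROTON = 1.00727646  # mass added for an [M+H]+ ion


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a simple formula such as 'C45H49IN4O11S' into element counts."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Sum of monoisotopic element masses for the given formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def mh_plus(formula: str) -> float:
    """m/z of the singly protonated ion [M+H]+."""
    return monoisotopic_mass(formula) + PROTON


if __name__ == "__main__":
    for name, formula in [("07-78", "C50H59N5O12S2"), ("15-78", "C44H49N5O12S3"),
                          ("13-40", "C46H49N5O14S"), ("17-49", "C45H49IN4O11S")]:
        print(f"{name} {formula}: M = {monoisotopic_mass(formula):.2f}, "
              f"[M+H]+ = {mh_plus(formula):.2f}")
```

The same helper applies to the affinity-resin ligands of Section 5.3.3.3, for example C29H41BrN4O9 for 02-40.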

5.3.2 Affinity measurements.

In a total volume of 60 µL, fluorescein-compound conjugates (500 nM) were incubated with increasing amounts of streptavidin (from 10 nM to 200 µM, BIOSPA, cat.no S002-6) or MMP3 (from 33 nM to 40 µM) in PBS, 5% DMSO, for 1 h at 25 ºC. Fluorescence polarization was determined with a TECAN Polarion instrument by excitation at 485 nm and measurement of emission at 535 nm (ε = 72000 M-1cm-1). All curves were fitted with a formula derived as follows. Mass balance gives [FC] = [FC]0 - [C], and the dissociation constant is Kd = [FC]*([P] - [C])/[C]. Substituting and solving for [C] gives

$$[C]^2 - ([P] + [FC]_0 + K_d)\,[C] + [P]\,[FC]_0 = 0$$

The solutions of this quadratic equation are

$$[C] = \frac{([P] + [FC]_0 + K_d) \pm \sqrt{([P] + [FC]_0 + K_d)^2 - 4\,[P]\,[FC]_0}}{2}$$

Only the minus sign gives a physically meaningful solution. Since FP = a*[FC] + b*[C] = a*[FC]0 + (b - a)*[C], the solution of the quadratic equation can be expressed in terms of FP and used in the fitting to determine the dissociation constant:

$$FP = a\,[FC]_0 + \frac{b - a}{2}\left[([P] + [FC]_0 + K_d) - \sqrt{([P] + [FC]_0 + K_d)^2 - 4\,[P]\,[FC]_0}\right]$$

[FC] = free fluorescein-compound conjugate molar concentration; [FC]0 = total (initial) fluorescein-compound conjugate molar concentration (500 nM in the experiment); [C] = molar concentration of the complex; [P] = total protein molar concentration; FP = fluorescence polarization; a, b = proportionality constants; Kd = dissociation constant.

5.3.3 Polyclonal human IgG selection.

The library DEL4000 (total oligonucleotide conjugate concentration 300 nM) was

diluted 1:15 in PBS (20 nM final concentration). 50 µL of the 20 nM library was added to 50 µL IgG-sepharose slurry. The resin was preincubated with PBS, 0.3 mg/mL herring sperm DNA (Sigma). After incubation for 1 h at 25 °C, the mixture was transferred to a SpinX column (Corning Costar Incorporated), the supernatant was removed, and the resin was washed four times with 400 µL PBS. After washing, the resin was resuspended in 100 µL water.

5.3.3.1 Polyclonal human IgG coating of sepharose beads.

100 mg CNBr-activated sepharose (GE Healthcare, Piscataway, NJ) was swollen in 500 µL of 1 mM HCl, washed (10 times with 500 µL 1 mM HCl, 3 times with 500 µL 0.1 M NaHCO3aq), and mixed with 2.5 mg/mL polyclonal human IgG (Sigma-Aldrich-Fluka, Buchs, Switzerland) dissolved in 1.2 mL 0.1 M NaHCO3aq. After 4 h incubation at 4 °C, the slurry was repeatedly and alternately washed with 0.1 M NaHCO3aq, 0.1 M Tris-Cl, 0.5 M NaCl, pH 8.3, and 0.1 M NaOAc, 0.5 M NaCl, pH 4, and then stored in 1 mL PBS at 4 °C.

5.3.3.2 Identification of human IgG binding molecules.











was performed on a 454 Life Sciences-Roche GS 20 Sequencer platform. The codes obtained by high-throughput sequencing were analysed with an in-house program written in C++, and the frequency of each code was assigned to the corresponding individual pharmacophore.

5.3.3.3 Synthesis of affinity chromatography resin containing the compound 02-40

or 16-40.




µmol, 1mL), and DIEA (Fluka, 720 µmol, 0.5mL) in dry DMF. After overnight

incubation at 25°C, the resin was washed 6x with 2 mL dry DMF and the Fmoc


°C. After washing 6x with 2 mL dry DMF, 4-(4-(1-hydroxyethyl)-2-methoxy-5-nitrophenoxy)butanoic acid (40, 54 mg, 180 µmol, 1 mL DMF) was added and a further amide bond formation reaction was performed as described above. The resulting product was cleaved by treating the resin 10x with 2 mL TFA (1% in CH2Cl2). The dichloromethane fractions were quenched in 5 mL saturated aqueous NaHCO3, and the aqueous phase was back-extracted 2x with 5 mL CH2Cl2. The pooled organic phases were washed 3 times with water, dried over Na2SO4 and concentrated in vacuo. Following HPLC purification on an XTerra Prep RP18 column (5 µm, 10 x 150 mm) using a linear gradient from 10% to 100% MeCN with 0.1% TFA, the desired fractions were collected and lyophilized. ESI-MS analysis confirmed the masses of the expected products: 02-40 (C29H41BrN4O9) m/z expected: 668.21, measured: 669.37 [M+H]+; 16-40 (C27H40N4O9S) m/z expected: 596.25, measured: 597.12 [M+H]+. 200 mg CNBr-activated sepharose (GE Healthcare, Piscataway, NJ) was


swollen in 1 mM HCl, washed, and mixed in separate tubes with 15 µmol of each compound dissolved in 2 mL 0.1 M NaHCO3aq, 10% DMF. After 4 h incubation at 25 °C, the slurry was repeatedly and alternately washed with 0.1 M NaHCO3aq, 0.1 M Tris-Cl, 0.5 M NaCl, pH 8.3, and 0.1 M NaOAc, 0.5 M NaCl, pH 4, and then stored in PBS at 4 °C.

5.3.3.4 Polyclonal human IgG Cy5 labeling.

Polyclonal human IgG (Sigma-Aldrich-Fluka, Buchs, Switzerland) was labelled with

Cy5 Mono-reactive kit (Amersham, cat.no PA25001) according to the protocol of the

provider and purified over a PD10 column (GE Healthcare, cat.no 17-0851-01) as

described by the supplier.

5.3.3.5 Biotinylated polyclonal human IgG.

Polyclonal human IgG (Sigma-Aldrich-Fluka, Buchs, Switzerland) was labelled with

NHS-LC-Biotin reagent (Pierce, cat.no 21336) according to the protocol of the

provider and purified over a PD10 column (GE Healthcare, cat.no 17-0851-01) as

described by the supplier.

5.3.3.6 Affinity chromatography of CHO cell supernatant spiked with Cy5-labeled or biotinylated human IgG on IgG-binding resin.

70 mg of the resin containing compound 02-40 or 16-40 was loaded on a chromatography cartridge (Glen Research, cat.no 20-0030-00) and washed 3 times with 1 mL PBS before loading CHO cell supernatant (60 µL) spiked with Cy5-labeled human IgG (40 µL, 9.68 µM) or with biotinylated human IgG (30 µL, 17.2 µM). The flow-through, the washing fractions (washing once with 10 mL PBS, once with 10 mL 500 mM NaCl, 0.5 mM EDTA, and once with 10 mL 100 mM NaCl, 0.1% Tween 20, 0.5 mM EDTA) and the eluate (elution 3 times with 200 µL aqueous triethylamine, 100 mM) were collected and concentrated to a final volume of 100 µL by centrifugation in a Vivaspin 500 tube (Vivascience, cat.no VS0101, molecular weight cut-off 10,000). The samples were then analyzed by gel electrophoresis on a NuPAGE 4-12% Bis-Tris gel (Invitrogen, cat.no NP0321) using MOPS SDS running buffer and stained with Coomassie Blue. Cy5 fluorescence was detected with a Diana III Chemiluminescence Detection System (Raytest) by excitation at 675 nm and measurement of emission at 694 nm (ε = 250,000 M-1cm-1). Western blot analysis was performed by transferring the proteins to a nitrocellulose (NC) membrane (Millipore, Billerica, MA, USA) with the XCell II blot module (Invitrogen) using standard procedures. The membrane was quickly rinsed with water before being soaked twice in methanol. The membrane was then dried at room temperature for 15 min and incubated for 1 h with a 1:500 dilution of streptavidin-horseradish peroxidase conjugate (HRP-Streptavidin, Amersham Biosciences, Little Chalfont, Buckinghamshire, UK, cat.no RPN1231V) in PBS containing 4% defatted milk. For detection of immunoreactive bands, the membrane was washed three times for 5 min with PBS, soaked in chemiluminescent reagent (ECL Plus Western Blotting Detection System, Amersham Biosciences) for 5 s and exposed to BioMax films (Kodak, Hemel, UK) in an autoradiographic cassette.

5.3.4 Human MMP3 selection.

The library DEL4000 (total oligonucleotide conjugate concentration 300 nM) was diluted 1:15 in PBS (20 nM final concentration). 50 µL of the 20 nM library solution was added to 50 µL of MMP3-sepharose slurry that had been preincubated with PBS containing 0.3 mg/mL herring sperm DNA (Sigma). After incubation for 1 h at 25 °C, the mixture was transferred to a SpinX column (Corning Costar Incorporated), the supernatant was removed, and the resin was washed 4x with 400 µL PBS. After washing, the resin was resuspended in 100 µL water.
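The dilution above also fixes how many copies of each library member enter the selection. A minimal Python sketch of this arithmetic, assuming (as in the simulation described in Section 5.3.5) that all 4000 members of DEL4000 are present in equimolar amounts; the function names are illustrative only.

```python
AVOGADRO = 6.02214076e23  # mol^-1


def diluted_conc(conc_nM, dilution_factor):
    """Concentration after a 1:dilution_factor dilution."""
    return conc_nM / dilution_factor


def molecules_per_member(conc_nM, volume_uL, n_members):
    """Average copies of each library member, assuming all members are equimolar."""
    total_mol = conc_nM * 1e-9 * volume_uL * 1e-6
    return total_mol / n_members * AVOGADRO


if __name__ == "__main__":
    c = diluted_conc(300, 15)                   # 300 nM diluted 1:15 -> 20 nM
    copies = molecules_per_member(c, 50, 4000)  # 50 uL of DEL4000
    print(f"{c:.0f} nM, {copies:.2e} copies per compound")
```

Under this assumption, 50 µL of the 20 nM solution contains 1 pmol of conjugate in total, i.e. roughly 1.5 × 10⁸ copies of each compound.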

5.3.4.1 Human MMP3 coating of sepharose beads.

100 mg CNBr-activated sepharose (GE Healthcare, Piscataway, NJ) was swollen in 1 mM HCl, washed, and mixed in separate tubes with 1 mg/mL polyclonal human IgG (Sigma-Aldrich-Fluka, Buchs, Switzerland) dissolved in. After a 4 h incubation at 4 °C, the slurry was washed repeatedly, alternating between 0.1 M aqueous NaHCO3, 0.1 M Tris-Cl, 0.5 M NaCl, pH 8.3 and 0.1 M NaOAc, 0.5 M NaCl, pH 4, and then stored in PBS at 4 °C.

5.3.4.2 Identification of human MMP3 binding molecules.










was performed on a 454 Life Sciences-Roche GS 20 sequencer platform. The codes obtained by high-throughput sequencing were analyzed with an in-house program written in C++, and the frequency (count) of each code was assigned to the corresponding individual pharmacophore.
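The in-house C++ decoding program is not described further here. Purely to illustrate the counting step, the following Python sketch assumes a hypothetical read layout in which a 6-nt code sits between the constant flanks ...GAATTCTGG and GGACGTG... (modelled on the coding oligonucleotides used in Sections 5.4.1.6 and 5.5.4), searches both read orientations, and maps codes to compounds through a hypothetical lookup table. The actual DEL4000 code layout may differ.

```python
import re
from collections import Counter

# Hypothetical layout: 6-nt code between constant flanks taken from the model
# oligonucleotides (...GAATTCTGG[code]GGACGTG...). The real DEL4000 layout may differ.
CODE_PATTERN = re.compile(r"GAATTCTGG([ACGT]{6})GGACGTG")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def extract_code(read):
    """Return the 6-nt code from a read in either orientation, or None."""
    for strand in (read, reverse_complement(read)):
        match = CODE_PATTERN.search(strand)
        if match:
            return match.group(1)
    return None


def count_codes(reads, code_to_compound):
    """Number of reads per compound; reads without a known code are skipped."""
    counts = Counter()
    for read in reads:
        code = extract_code(read)
        if code in code_to_compound:
            counts[code_to_compound[code]] += 1
    return counts


if __name__ == "__main__":
    table = {"ATCTTA": "compound_A", "GTTAGT": "compound_B"}  # hypothetical lookup table
    reads = [
        "GGAGCTTGTGAATTCTGGATCTTAGGACGTGTGTGAATTGTC",
        "GGAGCTTGTGAATTCTGGGTTAGTGGACGTGTGTGAATTGTC",
        reverse_complement("GGAGCTTGTGAATTCTGGATCTTAGGACGTGTGTGAATTGTC"),
    ]
    print(count_codes(reads, table))  # Counter({'compound_A': 2, 'compound_B': 1})
```

Searching the reverse complement as well matters because sequencing reads can originate from either strand of the amplified library.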

5.3.4.3 Synthesis of the MMP3 binding molecules as fluorescein conjugates.




µmol, 1 mL), and DIEA (Fluka, 400 µmol, 0.5 mL) in dry DMF. After overnight

incubation at 25 °C, the resin was washed 6x with 2 mL dry DMF and the Fmoc


°C. After washing 6 times with 2 mL dry DMF, the corresponding carboxylic acid (100 µmol, 1 mL DMF) was added and a further amide bond formation was performed as described above. The resulting product was cleaved by treating the resin 10 times with 2 mL TFA (1% in CH2Cl2). The dichloromethane fractions were quenched in 5 mL saturated aqueous NaHCO3 and the aqueous phase was back-extracted twice with 5 mL CH2Cl2. The pooled organic phases were washed 3 times with water, dried over Na2SO4 and concentrated in vacuo. The crude product was reacted with 2 equivalents of fluorescein isothiocyanate in 800 µL DMF and 200 µL saturated aqueous NaHCO3 overnight at 25 °C in the dark. Following HPLC purification on an XTerra Prep RP18 column (5 µm, 10x150 mm) using a linear gradient from 10% to 100% MeCN (0.1% TFA), the desired fractions were collected and lyophilized. The fluorescein conjugates (2 mg) were dissolved in DMSO as 5 mM stock solutions. ESI-MS analysis confirmed the mass of the expected FITC conjugates: 02-118 (C55H63BrN4O10S) m/z expected: 1052.08, measured: 1053.30 [M+H]+; 13-17 (C49H55N5O11S) m/z expected: 921.36, measured: 922.42 [M+H]+; 15-117 (C45H45N5O12S2) m/z expected: 911.25, measured: 912.33 [M+H]+; 17-104 (C46H43I2N5O10S) m/z expected: 1111.08, measured: 1112.19 [M+H]+; 18-96 (C49H45BrN4O9S) m/z expected: 945.87, measured: 947.10 [M+H]+.
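The expected values above can be cross-checked from the molecular formulas. A minimal Python sketch computing monoisotopic masses and [M+H]+ m/z values from standard monoisotopic element masses. For the two bromine-containing conjugates (02-118 and 18-96) the listed expected values appear closer to average molecular weights than to monoisotopic masses, so the example is applied only to the other three compounds.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146,
    "S": 31.9720707, "Br": 78.9183371, "I": 126.904473,
}
PROTON = 1.00727646688  # proton mass, u


def parse_formula(formula):
    """Parse a simple formula such as 'C49H55N5O11S' into {element: count}."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def monoisotopic_mass(formula):
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def mz_protonated(formula):
    """m/z of the singly protonated ion [M+H]+."""
    return monoisotopic_mass(formula) + PROTON


if __name__ == "__main__":
    for name, formula in [("13-17", "C49H55N5O11S"), ("15-117", "C45H45N5O12S2"),
                          ("17-104", "C46H43I2N5O10S")]:
        print(name, f"M = {monoisotopic_mass(formula):.2f}",
              f"[M+H]+ = {mz_protonated(formula):.2f}")
```

The parser handles only flat formulas without brackets or hydrates, which is sufficient for the compounds listed here.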

5.3.5 Computational simulation.

The simulated distribution of the number of codes observed with a given count, which reflects the probability of finding that count experimentally in an unbiased mixture of equimolar compounds, was computed using home-written software. The simulation relies on the computer-assisted random generation of numbers, each corresponding to one of the 4000 compounds in the library. Repeating the simulation several times allows fractional values to be computed for the number of codes associated with a given count value. For example, a number-of-codes value of 0.1 corresponds to observing a given count value in only 1 out of 10 simulations, each performed with a total number of counts equal to the total number of experimental sequences in a given experiment.
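The home-written simulation software is not reproduced here. The following Python sketch shows one way to implement the procedure described above, under the same assumption of an unbiased, equimolar library: each run draws one random compound index per sequence read, tallies how many codes were seen 0, 1, 2, ... times, and the tallies are averaged over runs, which is what produces fractional values such as 0.1. For comparison, the sketch also gives the closed-form expectation, since under these assumptions the count of each code follows a binomial distribution with N trials and p = 1/4000. The total of 8000 sequences in the usage example is hypothetical.

```python
import random
from collections import Counter
from math import comb


def simulate_count_distribution(n_compounds, n_sequences, n_runs, seed=None):
    """Mean number of codes observed with each count value, averaged over n_runs."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_runs):
        # One uniformly random compound per sequence read (no selection bias).
        reads = Counter(rng.randrange(n_compounds) for _ in range(n_sequences))
        per_count = Counter(reads.values())         # count value -> number of codes
        per_count[0] += n_compounds - len(reads)    # codes never sequenced
        totals.update(per_count)
    return {k: v / n_runs for k, v in sorted(totals.items())}


def expected_codes_with_count(n_compounds, n_sequences, k):
    """Closed form: n_compounds * Binomial(n_sequences, 1/n_compounds) pmf at k."""
    p = 1.0 / n_compounds
    return n_compounds * comb(n_sequences, k) * p**k * (1 - p) ** (n_sequences - k)


if __name__ == "__main__":
    sim = simulate_count_distribution(4000, 8000, n_runs=10, seed=1)
    for k in range(6):
        print(k, sim.get(k, 0.0), round(expected_codes_with_count(4000, 8000, k), 1))
```

Counts in a real selection that lie far above both columns indicate enrichment rather than sampling noise.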

5.4 Stepwise coupling by selective deprotection and reaction of diamine derivatives.

5.4.1 DNA-compatible cleavage of different amino protective groups.

5.4.1.1 Synthesis of 2-pent-4-enamido-cis-cyclopentanecarboxylic acid (1c).

To 4 mL of a dioxane/water solution (1:1) of cis-2-aminocyclopentanecarboxylic acid hydrochloride (0.1 mmol, 17 mg) were added 4-pentenoic N-hydroxysuccinimide ester (0.1 mmol, 1 eq.; 1e, for the synthesis see Section 5.4.1.4) and NaHCO3 (0.4 mmol, 34 mg). The mixture was stirred at room temperature for 3 h, then poured into aqueous 0.1 N HCl (20 mL) and extracted with ethyl acetate (5 times, 5 mL). The combined organic phases were washed with 10 mL brine and dried over Na2SO4. After removal of the solvent under vacuum, the crude product was dissolved in 1 mL dry DMSO and used as such for the coupling to the oligonucleotide. 1H NMR (400 MHz, CDCl3) δ = 1.73 (m, 2H), 1.82 (m, 1H), 2.10 (m, 3H), 2.32 (m, 2H), 2.41 (m, 2H), 3.15 (m, 1H), 4.55 (m, 1H), 5.10 (m, 2H), 5.82 (m, 1H), 6.37 (br, 1H) ppm. 13C NMR (100 MHz, CDCl3): δ = 22.1, 28.1, 29.6, 31.7, 35.7, 46.3, 52.2, 115.8, 136.7, 173.1, 178.0 ppm. ESI-MS 2-pent-4-enamido-cis-cyclopentanecarboxylic acid (C11H17NO3) m/z expected: 211.12, measured: 212.07 [M+H]+.
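The quantities in this procedure follow from n = m/M. A minimal Python sketch reproducing the masses given above; the molar masses (about 165.62 g/mol for cis-2-aminocyclopentanecarboxylic acid hydrochloride and 84.01 g/mol for NaHCO3) are standard values and are not stated in the original text.

```python
# Standard average molar masses in g/mol (not given in the original text).
MOLAR_MASS = {
    "cis-2-aminocyclopentanecarboxylic acid HCl": 165.62,  # C6H11NO2.HCl
    "NaHCO3": 84.01,
}


def mass_mg(amount_mmol, molar_mass):
    """mg = mmol x g/mol."""
    return amount_mmol * molar_mass


def equivalents(amount_mmol, reference_mmol):
    """Molar equivalents relative to the limiting reagent."""
    return amount_mmol / reference_mmol


if __name__ == "__main__":
    amine = 0.1  # mmol, limiting reagent
    print(round(mass_mg(amine, MOLAR_MASS["cis-2-aminocyclopentanecarboxylic acid HCl"]), 1))
    print(round(mass_mg(0.4, MOLAR_MASS["NaHCO3"]), 1), equivalents(0.4, amine))
```

The same two helpers cover the other small-scale acylations in this section, which use the same 0.1 mmol scale.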

5.4.1.2 Synthesis of N-Bpoc cis-2-aminocyclopentanecarboxylic acid (1d).

cis-2-Aminocyclopentanecarboxylic acid hydrochloride (17 mg, 0.1 mmol) was dissolved in 1 mL of water and Triton B was added (0.2 mmol, 0.1 mL of a 40% solution in MeOH, d = 0.920 g/mL). Following evaporation of MeOH under reduced pressure, DMF (2 mL) was added to the residue and the suspension was evaporated under high vacuum at 50 °C. This procedure was repeated 3 times, and 5 mL DMF was then added to the residue, followed by methyl 4-((2-(biphenyl-4-yl)propan-2-yloxy)carbonyloxy)benzoate (Bpoc carbonate reagent, 0.1 mmol, 1 eq., 39 mg). The suspension was stirred at 50 °C for 5 h, during which time the solids dissolved. The DMF was then removed at 50 °C under reduced pressure and the residue was partitioned between water (10 mL) and ether (5 mL). To facilitate phase separation, the aqueous phase was acidified with citric acid to pH 4 and then extracted 5 times with 5 mL of ether. The combined ether phases were washed twice with 10 mL citric acid/citrate buffer (pH 4) and twice with 5 mL water, and dried (Na2SO4). After removal of the solvent under vacuum, the crude product was dissolved in 1 mL dry DMSO and used as such for the coupling to the oligonucleotide. 1H NMR (400 MHz, MeOD) δ = 1.55-1.80 (m, 12H), 2.73 (m, 1H), 3.95 (m, 1H), 7.33 (t, J = 8 Hz, 1H), 7.44 (m, 4H), 7.58 (m, 4H) ppm. 13C NMR (100 MHz, MeOD): δ = 23.2, 24.1, 29.5 (2C), 32.3, 50.8, 54.2, 81.7, 125.9, 127.8, 127.9, 128.3, 129.8, 140.8, 142.1, 147.3, 157.2, 181.7 ppm. ESI-MS N-Bpoc-cis-2-aminocyclopentanecarboxylic acid (C22H25NO4) m/z expected: 367.18, measured: 390.04 [M+Na]+.

5.4.1.3 Synthesis of N-Nvoc cis-2-aminocyclopentanecarboxylic acid (1b).

To 4 mL of a dioxane/water solution (1:1) of cis-2-aminocyclopentanecarboxylic acid hydrochloride (0.1 mmol, 17 mg) were added NvocCl (0.1 mmol, 1 eq.) and NaHCO3 (0.4 mmol, 34 mg). The mixture was stirred at room temperature for 3 h, then poured into aqueous 0.1 N HCl (20 mL) and extracted with ethyl acetate (5 times, 5 mL). The combined organic phases were washed with 10 mL brine and dried over Na2SO4. After removal of the solvent under vacuum, the crude product was dissolved in 1 mL dry DMSO and used as such for the coupling to the oligonucleotide. 1H NMR (400 MHz, MeOD) δ = 1.61-1.92 (m, 6H), 2.82 (m, 1H), 3.85 (s, 3H), 3.90 (s, 3H), 3.93 (m, 1H), 5.44 (s, 2H), 7.11 (s, 1H), 7.71 (s, 1H), 7.89 (s, 1H) ppm. 13C NMR (100 MHz, MeOD): δ = 21.1, 25.3, 32.6, 46.1, 52.1, 55.8, 61.3, 113.5, 128.9, 132.2, 143.3, 145.2, 159.6, 163.1, 173.8 ppm. ESI-MS confirmed the mass of the expected product: N-Nvoc-cis-2-aminocyclopentanecarboxylic acid (C16H20N2O8) m/z expected: 368.12, measured: 369.41 [M+H]+.

5.4.1.4 Synthesis of 4-pentenoic N-hydroxy succinimide ester (1e).

N-Hydroxysuccinimide (14 mmol, 1.61 g) was suspended in CH2Cl2 (10 mL) with diisopropylethylamine (14 mmol). A solution of pent-4-enoyl chloride (13 mmol, d = 1.074 g/mL) in CH2Cl2 (10 mL) was added dropwise over 1 h. The mixture, which turned into a yellowish solution during this time, was stirred for a further 4 h, then poured into water (80 mL), and the aqueous phase was extracted with CH2Cl2 (5 times, 5 mL). The organic phase was washed twice with 10 mL water, dried (Na2SO4) and concentrated under reduced pressure.

The crude product, a white solid, was used as such in the next reaction. 1H NMR (400 MHz, CDCl3) δ = 2.44 (m, 2H), 2.63 (t, J = 7.9 Hz, 2H), 2.7-2.8 (m, 2H), 5.02 (dd, J1 = 20 Hz, J2 = 2 Hz, 1H), 5.64 (dd, J1 = 24 Hz, J2 = 2 Hz, 1H), 5.77 (m, 1H) ppm. 13C NMR (100 MHz, CDCl3): δ = 25.6, 28.3, 30.3, 99.1, 116.6, 135.2, 168.0, 169.1 ppm. ESI-MS 4-pentenoic N-hydroxysuccinimide ester (C9H11NO4): m/z


5.4.1.5 Synthesis of Nα-Fmoc-Nε-Nvoc-lysine (2).

To 4 mL of a dioxane/water solution (1:1) of Nα-Fmoc-lysine (0.1 mmol) were added NvocCl (0.1 mmol) and NaHCO3 (0.4 mmol, 34 mg). The mixture was stirred at room temperature for 3 h, then poured into aqueous 0.1 N HCl (20 mL) and extracted with ethyl acetate (5 times, 5 mL). The combined organic phases were washed with 10 mL brine and dried over Na2SO4. After removal of the solvent under vacuum, the crude product was dissolved in 1 mL dry DMSO and used as such for the coupling to the oligonucleotide. 1H NMR (400 MHz, DMSO-d6) δ = 1.4-1.75 (m, 6H), 3.1 (m, 2H), 3.85 (s, 3H), 3.89 (s, 3H), 3.93 (m, 1H), 4.20-4.29 (m, 3H), 5.44 (s, 2H), 7.17 (s, 1H), 7.33 (t, J = 8.9 Hz, 2H), 7.43 (t, J = 8.9 Hz, 2H), 7.70 (s, 1H), 7.75 (d, J = 8.8 Hz, 2H), 7.88 (d, J = 8.9 Hz, 2H) ppm. 13C NMR (100 MHz, DMSO-d6): δ = 23.6, 29.8, 32.3, 37.9, 46.6, 53.2, 57.2 (2C), 64.3, 66.6, 108.3, 120.1, 125.2, 126.8, 128.1, 129.0, 139.2, 141.3, 144.3, 147.6, 154.4, 156.7, 157.3, 174.5 ppm. ESI-MS Nα-Fmoc-Nε-Nvoc-lysine (C31H33N3O10) m/z expected: 607.22, measured: 608.34 [M+H]+.

5.4.1.6 Oligonucleotide conjugation of Bpoc- or Nvoc-N-protected cis-2-aminocyclopentanecarboxylic acid derivatives and Nα-Fmoc-Nε-Nvoc-lysine.

To a reaction volume of 300 µL containing 70% (v/v) DMSO/water were added 5 µL of either the crude N-protected (Bpoc or Nvoc) cis-2-aminocyclopentanecarboxylic acid derivative in DMSO or the crude Nα-Fmoc-Nε-Nvoc-lysine in DMSO, followed, in this order, by these compounds at the indicated final concentrations: N-hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)carbodiimide in DMSO, 4 mM; aqueous triethylamine hydrochloride solution, pH 9.0, 80 mM; aqueous oligonucleotide solution, 50 µM (5’-amino-C12-GGA GCT TGT GAA TTC TGG ATC TTA GGA CGT GTG TGA ATT GTC-3’) and 1. All coupling reactions were stirred overnight at 25 °C; residual activated species were then quenched by addition of 50 µL Tris-Cl buffer, 500 mM, pH 9.0. Prior to HPLC purification, 500 µL of 100 mM TEAA, pH 7.0, was added to the reaction mixture. The reactions were then purified by HPLC, the desired fractions were dried under reduced pressure and redissolved in 100 µL water, and an aliquot (ca. 1 nmol) was analyzed by LC-ESI-MS. ESI-MS N-Nvoc-cis-2-aminocyclopentanecarboxylic acid oligonucleotide conjugate: expected: 13676, measured: 13678; ESI-MS N-Bpoc-cis-2-aminocyclopentanecarboxylic acid oligonucleotide conjugate: expected: 13675, measured: 13674; ESI-MS Nα-Fmoc-Nε-Nvoc-lysine oligonucleotide conjugate: expected: 13915, measured: 13916.
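The reaction composition is specified as final concentrations in a 300 µL volume; the volume of each stock to pipette follows from C1·V1 = C2·V2. A minimal Python sketch; the stock concentrations below are hypothetical placeholders because the text does not state them, and EDC stands for N-ethyl-N’-(3-dimethylaminopropyl)carbodiimide.

```python
def stock_volume_uL(final_conc, stock_conc, total_volume_uL):
    """Volume of a stock needed to reach final_conc in total_volume_uL (C1*V1 = C2*V2).

    final_conc and stock_conc must be given in the same units.
    """
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_conc * total_volume_uL / stock_conc


if __name__ == "__main__":
    total = 300.0  # uL reaction volume, as in the text
    # name: (final concentration from the text, HYPOTHETICAL stock concentration), both in mM
    components = {
        "N-hydroxysulfosuccinimide": (10, 100),
        "EDC": (4, 100),
        "triethylamine hydrochloride pH 9.0": (80, 1000),
        "amino-C12 oligonucleotide": (0.05, 1.0),
    }
    for name, (final, stock) in components.items():
        print(f"{name}: {stock_volume_uL(final, stock, total):.1f} uL")
```

With real stock concentrations, the remaining volume is made up with DMSO and water so that the 70% (v/v) DMSO content is kept.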

5.4.1.7 Cleavage of 2-pent-4-enamido-cis-cyclopentanecarboxylic acid oligonucleotide conjugate.

150 µL of a 27 µM aqueous solution of the oligonucleotide conjugate (concentration determined by absorbance at 260 nm using a NanoDrop instrument) was added to 150 µL of a 200 mM I2 solution in THF. After 1 h at room temperature, the reaction was quenched with 100 µL of aqueous 1 M sodium thiosulfate and purified by HPLC (yield 80%). The desired fractions were dried under reduced pressure and analyzed by LC-ESI-MS, revealing the expected product. ESI-MS cis-2-aminocyclopentanecarboxylic acid oligonucleotide conjugate: expected: 13437, measured: 13437.

5.4.1.8 Cleavage of N-Bpoc cis-2-aminocyclopentanecarboxylic acid oligonucleotide conjugate.

150 µL of a 27 µM aqueous solution of the oligonucleotide conjugate (according to


µL of aqueous AcOH/AcONa pH 3-4 and heated at 35 °C for 1 h. The mixture was then injected directly onto the HPLC (yield 90%). The desired fractions were dried under reduced pressure and analyzed by LC-ESI-MS, revealing the expected product. ESI-MS cis-2-aminocyclopentanecarboxylic acid oligonucleotide conjugate: expected: 13437, measured: 13438.

5.4.1.9 Cleavage of N-Nvoc cis-2-aminocyclopentanecarboxylic acid and Nα-Fmoc-Nε-Nvoc-lysine oligonucleotide conjugates.

150 µL of a 27 µM aqueous solution of the oligonucleotide conjugate (according to


µL of aqueous 2 mM AcOH/AcONa pH 4.7 in a Pyrex glass vial and irradiated at 366 nm at 4 °C for 30 min. The mixture was then injected directly onto the HPLC (quantitative conversion). The desired fractions were dried under reduced pressure and analyzed by LC-ESI-MS, revealing the expected product. Notably, no Fmoc cleavage was observed with the Nα-Fmoc-Nε-Nvoc-lysine oligonucleotide conjugate. ESI-MS cis-2-aminocyclopentanecarboxylic acid oligonucleotide conjugate: expected: 13437, measured: 13439. ESI-MS Nα-Fmoc-lysine oligonucleotide conjugate: expected: 13676, measured: 13679.
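The ESI-MS values of this and the preceding sections can be related through simple mass increments: introducing an N-protecting group adds the group's formula minus the amine hydrogen it replaces. A minimal Python sketch using average atomic weights and taking the reported deprotected conjugate (13437) as the reference point; the increment formulas (Nvoc C10H9NO6, Bpoc C16H14O2, Fmoc C15H10O2, pent-4-enoyl C5H6O) are derived from the structures of these groups and are not given in the original text.

```python
import re

AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}  # g/mol

# Net mass change when the group replaces one N-H hydrogen (group formula minus H).
INCREMENT_FORMULA = {
    "Nvoc": "C10H9NO6",
    "Bpoc": "C16H14O2",
    "Fmoc": "C15H10O2",
    "pent-4-enoyl": "C5H6O",
}


def average_mass(formula):
    total = 0.0
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += AVERAGE[element] * (int(n) if n else 1)
    return total


def protected_mass(deprotected_mass, group):
    return deprotected_mass + average_mass(INCREMENT_FORMULA[group])


if __name__ == "__main__":
    free_amine = 13437  # reported cis-2-aminocyclopentanecarboxylic acid conjugate
    for group in ("Nvoc", "Bpoc", "pent-4-enoyl"):
        print(group, round(protected_mass(free_amine, group)))
    lys = 13915  # reported Nα-Fmoc-Nε-Nvoc-lysine conjugate
    print("minus Nvoc:", round(lys - average_mass(INCREMENT_FORMULA["Nvoc"])))
    print("minus Fmoc:", round(lys - average_mass(INCREMENT_FORMULA["Fmoc"])))
```

With these increments the computed Nvoc and Bpoc conjugates round to 13676 and 13675, matching the expected values in Section 5.4.1.6, and removing Nvoc from the 13915 lysine conjugate gives 13676, whereas removing Fmoc would give about 13693.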


5.4.2 Synthesis of model scaffolds for an Nα-Fmoc, Nε-Nvoc diamino carboxylic acid derivative-based library.

5.4.2.1 Synthesis of (1R,3R,4R)-methyl 3-azido-4-Boc-aminocyclopentanecarboxylate (4).

To a solution of the alcohol 3 (1 mmol, 259 mg) in CH2Cl2 (20 mL) were added triethylamine (3 mmol, 0.45 mL) and methanesulfonyl chloride (1.6 mmol). The solution was stirred for 45 min and then treated with water (100 mL). The aqueous phase was extracted with CH2Cl2 (5 times, 25 mL), and the organic extract was washed with brine (2 × 25 mL), dried (Na2SO4) and concentrated under reduced pressure. The crude product was dissolved in 20 mL DMF, and a solution of sodium azide (20 mL DMF) was added. The suspension was heated at 70 °C for 8 h, then quenched in water (100 mL) and extracted with ethyl acetate (5 times, 25 mL). The organic extract was washed with brine (2 × 25 mL), dried (Na2SO4) and concentrated under reduced pressure prior to use in the next reaction. 1H NMR (400 MHz, MeOD) δ = 1.33 (s, 9H), 1.46 (m, 1H), 1.61 (m, 2H), 2.11 (m, 2H), 2.75 (m, 1H), 3.37 (m, 1H), 3.77 (s, 3H) ppm. ESI-MS C12H20N4O4 m/z expected: 284.15, measured: 306.96 [M+Na]+.

5.4.2.2 Synthesis of (1S,3R,4R)-methyl 3-amino-4-Boc-amino-


A suspension of the crude 4 (ca. 1 mmol) and Pd/C (102 mg, 10% Pd) in 5 mL MeOH was stirred under an overpressure of H2 for 3 h. The catalyst was then filtered off and the MeOH removed under reduced pressure. The crude product was used as such in the next reaction. 1H NMR (400 MHz, MeOD) δ = 1.38 (s, 9H), 1.47 (m, 1H), 1.59 (m, 2H), 2.26 (m, 2H), 3.10 (m, 1H), 3.57 (m, 1H), 3.77 (s, 3H) ppm. ESI-MS C12H22N2O4 m/z expected: 258.16, measured: 259.00 [M+H]+.

5.4.2.3 Synthesis of (1S,3R,4R)-methyl 3-Fmoc-amino-4-Boc-amino-


FmocCl (1.25 mmol, 323 mg) and diisopropylethylamine (2.5 mmol, 0.44 mL) were added to a solution of the crude 5 (ca. 1 mmol) in 25 mL DMF. After stirring for 3 h, the mixture was treated with water (100 mL), the aqueous phase was extracted with ethyl acetate (5 times, 25 mL), and the organic extract was washed with brine (2 × 25 mL), dried (Na2SO4) and concentrated under reduced pressure. The crude product was used directly in the next reaction. 1H NMR (400 MHz, CDCl3) δ = 1.41 (s, 9H), 1.46 (m, 1H), 1.55 (m, 2H), 2.32 (m, 2H), 3.77 (s, 3H), 3.96 (m, 1H), 4.12 (m, 1H), 4.77 (m, 1H), 5.02

(m, 2H), 7.29-7.41 (m, 4H), 7.53-7.89 (m, 4H). [M+H+].ESI-MS C27H32N2O6 m/z


5.4.2.4 Synthesis of (1S,3R,4R)-methyl 3-Fmoc-amino-4-Nvoc-amino-


To a suspension of the crude 6 (ca. 1 mmol) in water/dioxane 1:1 (20 mL) was added 10 mL of 6 N aqueous HCl, and the mixture was stirred at 50 °C for 5 h, during which time the solids dissolved. The solvent was then removed and the residue redissolved in water/dioxane 1:1 (10 mL). After adjusting the pH to 9 with Na2CO3, NvocCl (1.5 mmol, 414 mg) was added and the reaction was stirred for 3 h at room temperature. The mixture was then treated with water (50 mL), the aqueous phase was extracted with ethyl acetate (5 times, 10 mL), and the organic extract was washed with brine (2 × 15 mL), dried (Na2SO4) and concentrated under reduced pressure. Preparative HPLC was performed on an XTerra Prep RP18 column (5 µm, 10x150 mm) using a linear gradient from 10% to 100% MeCN (0.1% TFA); the desired fractions were collected and lyophilized (yellowish solid, 73 mg). 1H NMR (400 MHz, MeOD) δ = 1.21 (m, 1H), 1.71 (m, 2H), 2.23 (m, 2H), 2.85 (m, 1H), 2.89 (m, 1H), 3.73 (s, 3H), 3.81 (s, 3H), 3.90-3.92 (m, 2H), 4.3 (m, 1H), 5.14 (d, J = 20 Hz, 1H), 5.41 (d, J = 20 Hz, 1H), 7.13-7.63 (m, 10H) ppm. ESI-MS C31H31N3O10 m/z expected: 605.20, measured: 606.02 [M+H]+.

5.5 Stepwise encoding.

Gel electrophoresis was performed using either 15% Tris-Borate-EDTA-urea denaturing polyacrylamide gels (TBE-Urea, Invitrogen, cat.no EC68852) or 20% Tris-Borate-EDTA native polyacrylamide gels (TBE, Invitrogen, cat.no EC63152), stained with SYBR Green II.

Ethanol precipitation of DNA was performed by adding 1/10 volume of 3 M AcOH/AcONa buffer, pH 4.7, and 3 volumes of ethanol relative to the volume of the DNA sample. After 2 h incubation at -23 °C, the mixture was centrifuged in a table-top centrifuge for 40 min (16,000 g) at 4 °C, the supernatant was removed and the pellet was washed with 300 µL ice-cold 90% ethanol. After a further 20 min centrifugation (16,000 g) at 4 °C, the pellet was dried and redissolved in water.
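The precipitation recipe above scales with the sample volume. A minimal Python sketch computing the additions and the resulting final acetate and ethanol content, assuming absolute ethanol and ignoring volume contraction on mixing:

```python
def ethanol_precipitation(sample_uL, salt_stock_M=3.0):
    """Additions for the precipitation above: 1/10 volume of 3 M AcOH/AcONa (pH 4.7)
    and 3 volumes of ethanol, both relative to the DNA sample volume."""
    salt_uL = sample_uL / 10
    ethanol_uL = 3 * sample_uL
    total_uL = sample_uL + salt_uL + ethanol_uL
    return {
        "salt_uL": salt_uL,
        "ethanol_uL": ethanol_uL,
        "final_acetate_mM": salt_stock_M * 1000 * salt_uL / total_uL,
        "final_ethanol_percent": 100 * ethanol_uL / total_uL,  # assumes absolute ethanol
    }


if __name__ == "__main__":
    for key, value in ethanol_precipitation(100).items():
        print(key, round(value, 1))
```

For a 100 µL sample this gives 10 µL of buffer and 300 µL of ethanol, i.e. about 73 mM acetate and 73% (v/v) ethanol in the final mixture.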

5.5.1 Stepwise encoding by Ligation.

Hybridization of 3 pairs (A, B, C) of oligonucleotides (A: 5’-CAT GGA ATT CGC TCA CTC CGA CTA GAG G-3’ and 5’-(phosphate)-CGT ACC TCT AGT CGG AGT GAG CGA ATT CCA TG-3’; B: 5’-(phosphate)-TAC GTG AGC TTG ACC TGG TGA G-3’ and 5’-(phosphate)-GCT TCT CAC CAG GTC AAG CTC A-3’; C: 5’-(phosphate)-AAG CAC GTT CGC TGG ATC CTC AAC TGT G-3’ and 5’-CAC AGT TGA GGA TCC AGC GAA CGT-3’; underlined sequences represent coding sequences) was carried out by mixing the oligonucleotides at 1.25 µM each in 1x ligase buffer (40 mM Tris-HCl, 10 mM MgCl2, 10 mM DTT, 0.5 mM ATP, pH 7.8) and incubating the mixtures for 10 min at 50 °C. The ligations were then performed by mixing 10 µL of hybridized oligonucleotide pairs A and B with 10 µL of 1x ligase buffer and 1 µL of T4 ligase (Roche Applied Science, Basel, Switzerland) and incubating at 25 °C for 2 h. The ligation product was purified using a Qiagen Nucleotide Removal Kit and eluted with 50 µL of 10 mM Tris-HCl, pH 8.0. 18 µL of the eluate was mixed with 10 µL of hybridized oligonucleotide pair C (present in excess), 2 µL of 10x ligase buffer and 1 µL of T4 ligase, and incubated for 2 h at 25 °C. Aliquots of the two starting oligonucleotides and of the different ligation products were analyzed by electrophoresis on a 20% TBE gel.
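The ligation scheme relies on the three duplexes carrying complementary 4-nt 5' overhangs (CGTA/TACG between A and B, GCTT/AAGC between B and C). A short Python check, using the sequences listed above without spaces, that ligating the pairs in the order A-B-C yields two fully complementary 78-nt strands:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


# Oligonucleotide pairs from Section 5.5.1 as (top, bottom), both written 5'->3'.
PAIRS = {
    "A": ("CATGGAATTCGCTCACTCCGACTAGAGG", "CGTACCTCTAGTCGGAGTGAGCGAATTCCATG"),
    "B": ("TACGTGAGCTTGACCTGGTGAG", "GCTTCTCACCAGGTCAAGCTCA"),
    "C": ("AAGCACGTTCGCTGGATCCTCAACTGTG", "CACAGTTGAGGATCCAGCGAACGT"),
}


def ligated_strands(pairs, order):
    """Join top strands left to right and bottom strands right to left (both 5'->3')."""
    top = "".join(pairs[name][0] for name in order)
    bottom = "".join(pairs[name][1] for name in reversed(order))
    return top, bottom


if __name__ == "__main__":
    top, bottom = ligated_strands(PAIRS, ["A", "B", "C"])
    print(len(top), len(bottom), top == reverse_complement(bottom))  # 78 78 True
```

The check is purely sequence-based; it confirms the design of the overhangs, not the efficiency of the ligation.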

5.5.2 Stepwise encoding by a combination of Klenow polymerase and Ligation.

To a reaction volume of 50 µL, the following reagents were added at the indicated final concentrations: a 42mer 5’-amino-C12-DNA oligonucleotide (5’-GGA GCT TGT GAA TTC TGG ATC TTA GGA CGT GTG TGA ATT GTC-3’), 2 µM; a 42mer 5’-C6-biotinylated oligonucleotide containing the non-palindromic BssSI restriction site (in boldface type) (5’-GTA GTC GGA CAC GAG TAC TGG TAA TCG ACA ATT CAC ACA CGT CC-3’; underlined sequences represent coding sequences), 3 µM; Klenow buffer; dNTPs (Roche, cat.no 11969064001), 0.5 mM; and Klenow polymerase, 5 units. After incubation at 37 °C for 1 h, the reaction mixture was purified on an ion-exchange cartridge and eluted in 25 µL water. 8 units of BssSI were added to the purified Klenow product in 50 µL of BssSI restriction buffer, and the digestion was carried out at 37 °C for 1.5 h. 50 µL of streptavidin-sepharose slurry (GE Healthcare, cat.no 17-5113-01) was added and the slurry was incubated for 30 min at 4 °C. After SpinX column centrifugation, the supernatant was collected, purified on an ion-exchange cartridge and eluted in 25 µL water. The following reagents were then added to a final volume of 50 µL: a preincubated 1:1 mixture of hybridized oligonucleotides (27mer 5’-phosphate-TCG TGA AAT TTG CTA GGA TCC ATA TTG-3’ and 23mer 5’-CAA TAT GGA TCC TAG CAA ATT TC-3’), 3 µM; T4 ligase buffer (Roche Applied Science, Basel, Switzerland); and T4 ligase (Roche Applied Science, Basel, Switzerland), 4 units. The ligation was performed overnight at 16 °C and the product was purified on an ion-exchange cartridge. Aliquots of the starting oligonucleotides and of the Klenow, restriction and ligation products were analyzed on a 15% TBE-Urea gel. Sequencing of the excised band after the three stepwise encoding steps confirmed the identity of the expected product.
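The Klenow step above extends two oligonucleotides that anneal through their 3' ends, so each strand is filled in using the other as template. A simplified Python sketch of the resulting product, assuming perfect Watson-Crick pairing, complete extension and the BssSI recognition sequence 5'-CACGAG-3'; it ignores enzymology (cut position, mismatch tolerance) and only tracks sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_overlap(a, b, min_overlap=8):
    """Longest stretch at the 3' end of a that pairs with the 3' end of b (0 if none)."""
    rc_b = reverse_complement(b)
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == rc_b[:k]:
            return k
    return 0


def klenow_fill_in(a, b):
    """Return (top strand of the fully extended duplex, overlap length), 5'->3'."""
    k = three_prime_overlap(a, b)
    if k == 0:
        raise ValueError("oligonucleotides do not anneal through their 3' ends")
    return a + reverse_complement(b)[k:], k


def has_site(top_strand, site):
    """True if the recognition site occurs on either strand of the duplex."""
    return site in top_strand or site in reverse_complement(top_strand)


AMINO_OLIGO = "GGAGCTTGTGAATTCTGGATCTTAGGACGTGTGTGAATTGTC"
BIOTIN_OLIGO = "GTAGTCGGACACGAGTACTGGTAATCGACAATTCACACACGTCC"
BSSSI_SITE = "CACGAG"  # assumed BssSI recognition sequence

if __name__ == "__main__":
    product, overlap = klenow_fill_in(AMINO_OLIGO, BIOTIN_OLIGO)
    print(overlap, len(product), has_site(product, BSSSI_SITE))
```

For the two oligonucleotides of this section, the sketch finds an 18-nt 3' overlap and confirms that the BssSI site, absent from the amino-modified oligonucleotide, is carried into the double-stranded product.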

5.5.3 Stepwise encoding by Klenow Polymerase.

To a reaction volume of 50 µL, the following reagents were added at the indicated final concentrations: a 42mer 5’-amino-C12-DNA oligonucleotide (5’-GGA GCT TGT GAA TTC TGG ATC TTA GGA CGT GTG TGA ATT GTC-3’), 2 µM; a 42mer 3’-C6-biotinylated oligonucleotide (5’-GTA GTC GGA TCC GAC CAC GTT CCT GAC AAT TCA CAC ACG TCC-3’; underlined sequences represent coding sequences), 3 µM; Klenow buffer; dNTPs (Roche, cat.no 11969064001), 0.5 mM; and Klenow polymerase, 5 units. The Klenow polymerization reaction was incubated at 37 °C for 1 h, purified on an ion-exchange cartridge and eluted in 100 µL of 4 M urea. After incubation at 94 °C for 2 min, 50 µL of streptavidin-sepharose slurry (GE Healthcare, cat.no 17-5113-01) was added and the slurry was incubated for 1 h at 4 °C. The streptavidin-sepharose resin and the supernatant were separated by centrifugation in a SpinX column, and the DNA in the supernatant was ethanol-precipitated as described above. The resulting single-stranded oligonucleotide was mixed with a 42mer unmodified DNA oligonucleotide (5’-GTC GTA TCG CCA TGG TCC AAC ATC GTA GTC GGA GAG GAC CAC-3’), and a Klenow polymerization reaction was performed as described above. Aliquots of the three starting oligonucleotides and of the different Klenow products were analyzed on a 15% TBE-Urea gel.

5.5.4 Stepwise coupling and encoding of a model compound for the Nα-Fmoc, Nε-Nvoc diamino carboxylic acid derivative-based library.

To a reaction volume of 310 µL containing 70% (v/v) DMSO/water, the following compounds were added in this order at the indicated final concentrations: Nα-Fmoc-Nε-Nvoc-lysine (2) in DMSO, 4 mM; N-hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)carbodiimide in DMSO, 4 mM; aqueous triethylamine hydrochloride solution, pH 9.0, 80 mM; aqueous oligonucleotide solution, 100 µM (5’-amino-C12-GGA GCT TGT GAA TTC TGG GTT AGT GGA CGT GTG TGA ATT GTC-3’; the underlined sequence represents the coding sequence). The reaction was stirred overnight at 25 °C; residual activated species were then quenched, with simultaneous Fmoc deprotection, by addition of piperidine (500 mM in DMSO).

Prior to HPLC purification, 500 µL of 100 mM TEAA, pH 7.0, was added to the reaction mixture. The reaction was then purified by HPLC, and the desired fractions were dried under reduced pressure, redissolved in 100 µL water and analyzed by LC-ESI-MS. The sample showed the expected Fmoc-deprotected product, the Nε-Nvoc-lysine DNA-oligonucleotide conjugate.

A further amide bond-forming step was then performed. To a final volume of 310 µL containing 70% (v/v) DMSO/water, the following compounds were added at the indicated final concentrations: 3-p-tolylpropanoic acid in DMSO, 4 mM; N-hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)carbodiimide in DMSO, 4 mM; aqueous triethylamine hydrochloride, pH 9.0, 80 mM; Nε-Nvoc-lysine DNA-oligonucleotide conjugate, 50 µM. The reaction was stirred overnight at 25 °C; residual activated species were then quenched by addition of 50 µL Tris-Cl buffer, 500 mM, pH 9.0. The DNA was quantitatively precipitated by sequential addition of 25 µL of 1 M acetic acid, 12.5 µL of 3 M sodium acetate buffer, pH 4.7, and 500 µL ethanol, followed by 2 h incubation at -23 °C. After centrifugation, the oligonucleotide pellet was washed with ice-cold 90% (v/v) ethanol and then dissolved in 300 µL of aqueous 1 mM AcOH/AcONa, pH 4.7, in a Pyrex glass vial. Following irradiation at 366 nm at 4 °C for 30 min, an aliquot of the mixture was injected onto the HPLC and the desired fractions were analyzed by LC-ESI-MS, revealing the expected product, the (3-p-tolylpropanoyl)-lysine oligonucleotide conjugate.

The 3-p-tolylpropanoyl moiety was encoded by adding the following reagents to a final volume of 50 µL at the indicated final concentrations: (3-p-tolylpropanoyl)-lysine oligonucleotide conjugate (5’-GGA GCT TGT GAA TTC TGG GTT AGT GGA CGT GTG TGA ATT GTC-3’; the underlined sequence represents the coding sequence), 2 µM; a 42mer 5’-C6-biotinylated oligonucleotide containing the non-palindromic BssSI restriction site (in boldface type) (5’-GTA GTC GGA CAC GAG TAC TGG TAA TCG ACA ATT CAC ACA CGT CC-3’; underlined sequences represent coding sequences), 3 µM; Klenow buffer; dNTPs (Roche, cat.no 11969064001), 0.5 mM; and Klenow polymerase, 5 units. After incubation at 37 °C for 1 h, the reaction mixture was purified on an ion-exchange cartridge and eluted in 50 µL water. 40 µL of the encoded (3-p-tolylpropanoyl)-lysine oligonucleotide conjugate was then coupled to Cy5 by addition of Cy5-NHS ester (Amersham, cat.no PA25001) and aqueous triethylamine hydrochloride solution, pH 9.0, to final concentrations of 4 mM and 80 mM, respectively. The reaction was stirred overnight at 25 °C. The DNA was then quantitatively precipitated by sequential addition of 25 µL of 1 M acetic acid, 12.5 µL of 3 M sodium acetate buffer, pH 4.7, and 500 µL ethanol, followed by 2 h incubation at -23 °C. After centrifugation, the oligonucleotide pellet was washed with ice-cold 90% (v/v) ethanol.

Finally, the Cy5 moiety was encoded. 8 units of BssSI were added to the Nε-Cy5-Nα-(3-p-tolylpropanoyl)-lysine oligonucleotide conjugate in 50 µL of BssSI restriction buffer, and the digestion was carried out at 37 °C for 1.5 h. 50 µL of streptavidin-sepharose slurry (GE Healthcare, cat.no 17-5113-01) was added and the slurry was incubated for 30 min at 4 °C. After SpinX column centrifugation, the supernatant was collected, purified on an ion-exchange cartridge and eluted in 25 µL water. The following reagents were then added to a final volume of 50 µL: a preincubated 1:1 mixture of hybridized oligonucleotides (27mer 5’-phosphate-TCG TGA AAT TTG CTA GGA TCC ATA TTG-3’ and 23mer 5’-CAA TAT GGA TCC TAG CAA ATT TC-3’; underlined sequences represent coding sequences), 3 µM; T4 ligase buffer (Roche Applied Science, Basel, Switzerland); and T4 ligase (Roche Applied Science, Basel, Switzerland), 4 units. The ligation was performed overnight at 16 °C and the product was purified on an ion-exchange cartridge. Aliquots of the starting oligonucleotides and of the Klenow, restriction and ligation products were analyzed on a 15% TBE-Urea gel. Sequencing of the excised band after the three stepwise encoding steps confirmed the identity of the expected product.


5.5.5 Bacterial cloning and sequencing.

Following polyacrylamide gel electrophoresis, the band of interest was excised, extracted in 10 mM aqueous Tris-Cl and PCR-amplified using the following oligonucleotides as primers: DEL_P1 (5’-GGA GCT TGT GAA TTC TGG-3’, underlined EcoRI restriction site) and DEL_P2 (5’-GTA GTC GGA TCC GAC CAC-3’, underlined BamHI restriction site). The PCR products were purified on ion-exchange cartridges, cloned into the pUC19 vector using the EcoRI and BamHI restriction sites, and electroporated into TG1 bacteria. The vector from a number of colonies was sequenced using an ABI PRISM 3130 Genetic Analyzer (Applied Biosystems).
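A quick sequence-level check of the primer design: each primer should match the 5' end of the corresponding library strand and carry its intended restriction site. A minimal Python sketch using the primer sequences above (without spaces), the standard EcoRI (GAATTC) and BamHI (GGATCC) recognition sequences, and, as templates, the 5’-amino-C12 oligonucleotide and the 3’-biotinylated oligonucleotide from Section 5.5.3.

```python
PRIMERS = {
    "DEL_P1": "GGAGCTTGTGAATTCTGG",
    "DEL_P2": "GTAGTCGGATCCGACCAC",
}
# Library strands each primer is meant to match at the 5' end (from Section 5.5.3).
TEMPLATES = {
    "DEL_P1": "GGAGCTTGTGAATTCTGGATCTTAGGACGTGTGTGAATTGTC",
    "DEL_P2": "GTAGTCGGATCCGACCACGTTCCTGACAATTCACACACGTCC",
}
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def site_positions(seq, sites=SITES):
    """0-based position of each recognition site in seq (-1 if absent)."""
    return {enzyme: seq.find(site) for enzyme, site in sites.items()}


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, TEMPLATES[name].startswith(primer), site_positions(primer))
```

Each primer carries exactly one of the two sites, which is what allows directional cloning into pUC19.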


6. REFERENCES

1 E.S. Lander, L.M. Linton, B. Birren, C. Nusbaum, M.C. Zody, J. Baldwin, K. Devon, K. Dewar and M. Doyle et al., Initial sequencing and analysis of the human genome, Nature 409 (2001), 860–921.

2 J.C. Venter, M.D. Adams, E.W. Myers, P.W. Li, R.J. Mural, G.G. Sutton, H.O. Smith, M. Yandell and C.A. Evans et al., The sequence of the human genome, Science 291 (2001), 1304–1351.

3 S.D. Patterson and R.H. Aebersold, Proteomics: the first decade and beyond, Nat. Genet. 33 (2003) (Suppl.), 311–323.

4 R.L. Strausberg and S.L. Schreiber, From knowing to controlling: a path from genomics to drugs using small molecule probes, Science 300 (2003), 294–295.

5 R.B. Stoughton, Applications of DNA microarrays in biology, Annu. Rev. Biochem. 74 (2005), 53–82.

6 J. Drews, Drug discovery: a historical perspective, Science 287 (2000), 1960–1964.

7 A. Furka, F. Sebestyen, M. Asgedom and G. Dibo, General method for rapid synthesis of multicomponent peptide mixtures, Int. J. Pept. Protein Res. 37 (1991), 487–493.

8 R.A. Houghten, C. Pinilla, S.E. Blondelle, J.R. Appel, C.T. Dooley and J.H. Cuervo, Generation and use of synthetic peptide combinatorial libraries for basic research and drug discovery, Nature 354 (1991), 84–86.

9 K.S. Lam, S.E. Salmon, E.M. Hersh, V.J. Hruby, W.M. Kazmierski and R.J. Knapp, A new type of synthetic peptide library for identifying ligand-binding activity, Nature 354 (1991), 82–84.


10 R.B. Merrifield, Solid phase peptide synthesis. I. The synthesis of a tetrapeptide, J. Am. Chem. Soc. 85 (1963), 2149–2154.

11 R. Frank, W. Heikens, G. Heisterberg-Moutsis and H. Blocker, A new general approach for the simultaneous chemical synthesis of large numbers of oligonucleotides: segmental solid supports, Nucl. Acids Res. 11 (1983), 4365–4377.

12 R.A. Houghten, General method for the rapid solid-phase synthesis of large numbers of peptides: specificity of antigen-antibody interaction at the level of individual amino acids, Proc. Natl. Acad. Sci. U. S. A. 82 (1985), 5131–5135.

13 G.P. Smith, Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface, Science 228 (1985), 1315–1317.

14 T. Clackson, H.R. Hoogenboom, A.D. Griffiths and G. Winter, Making antibody fragments using phage display libraries, Nature 352 (1991), 624–628.

15 E.T. Boder and K.D. Wittrup, Yeast surface display for screening combinatorial polypeptide libraries, Nat. Biotechnol. 15 (1997), 553–557.

16 J. Hanes and A. Pluckthun, In vitro selection and evolution of functional proteins by using ribosome display, Proc. Natl. Acad. Sci. U. S. A. 94 (1997), 4937–4942.

17 J. Bertschinger, D. Grabulovski and D. Neri, Selection of single domain binding proteins by covalent DNA display, Protein Eng. Des. Sel. 20 (2007), 57–68.

18 S. Brenner and R.A. Lerner, Encoded combinatorial chemistry, Proc. Natl. Acad. Sci. U. S. A. 89 (1992), 5381–5383.


19 J. Nielsen, S. Brenner and K.D. Janda, Synthetic methods for the implementation of encoded combinatorial chemistry, J. Am. Chem. Soc. 115 (1993), 9812–9813.

20 M.C. Needels, D.G. Jones, E.H. Tate, G.L. Heinkel, L.M. Kochersperger, W.J. Dower, R.W. Barrett and M.A. Gallop, Generation and screening of an oligonucleotide-encoded synthetic peptide library, Proc. Natl. Acad. Sci. U. S. A. 90 (1993), 10700–10704.

21 T. Meo, C. Gramsch, R. Inan, V. Hollt, E. Weber, A. Herz and G. Riethmuller, Monoclonal antibody to the message sequence Tyr-Gly-Gly-Phe of opioid peptides exhibits the specificity requirements of mammalian opioid receptors, Proc. Natl. Acad. Sci. U. S. A. 80 (1983), 4084–4088.

22 M.S. Chorghade, Drug Discovery and Development: Combinatorial Chemistry in the Drug Discovery Process, John Wiley & Sons, Inc., 2006, 129–167, ISBN 9780471398486.

23 K. FitzGerald, In vitro display technologies: new tools for drug discovery, Drug Discov. Today 5 (2000), 253–258.

24 H. Pedersen, A.H. Gouliaev, K.C. Sams, F.A. Slok, P.-O. Freskgard, A. Holtmann, E. Kampmann Olsen, N. Husemoen Gitte, J. Felding et al., Methods for template-directed synthesis of and modification of polymers and screening for desired activity, WO02103008 (2002).

25 H. Pedersen, A. Holtmann, T. Franch, A.H. Gouliaev and J. Felding, Methods for template-directed synthesis of and modification of polymeric libraries and their use in screening for biological activity, WO03078625 (2003).

26 P.-O. Freskgard, T. Franch, A.H. Gouliaev, M.D. Lundorf, J. Felding, E.K. Olsen, A. Holtmann, S.N. Jakobsen, C. Sams et al., Bifunctional substances and their use in preparation and enzyme-based encoding of combinatorial libraries, WO2004039825 (2004).


27 Morgan, B., Hale, S., Arico-Muendel, C.C., Clark, M., Wagner, R., Israel, D.I., Gefter, M.L., Benjamin, D., Hansen, N.J.V., et al., 2004. Methods and building blocks for synthesis of combinatorial libraries of molecules comprising functional moieties operatively linked to encoding oligonucleotides. WO2005058479.

28 D.R. Halpin and P.B. Harbury, DNA display I. Sequence-encoded routing of DNA populations, PLoS Biol. 2 (2004), 1015–1021.

29 D.R. Halpin and P.B. Harbury, DNA display II. Genetic manipulation of combinatorial chemistry libraries for small-molecule evolution, PLoS Biol. 2 (2004), 1022–1030.

30 T. Meo, C. Gramsch, R. Inan, V. Hollt, E. Weber, A. Herz and G. Riethmuller, Monoclonal antibody to the message sequence Tyr-Gly-Gly-Phe of opioid peptides exhibits the specificity requirements of mammalian opioid receptors, Proc. Natl. Acad. Sci. U. S. A. 80 (1983), 4084–4088.

31 D.R. Halpin, J.A. Lee, S.J. Wrenn and P.B. Harbury, DNA display III. Solid-phase organic synthesis on unprotected DNA, PLoS Biol. 2 (2004), 1031–1038.

32 C.A. Lipinski, F. Lombardo, B.W. Dominy and P.J. Feeney, Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Adv. Drug Deliv. Rev. 46 (2001), 3–26.

33 Z.J. Gartner and D.R. Liu, The generality of DNA-templated synthesis as a basis for evolving non-natural small molecules, J. Am. Chem. Soc. 123 (2001), 6961–6963.

34 C.T. Calderone, J.W. Puckett, Z.J. Gartner and D.R. Liu, Directing otherwise incompatible reactions in a single solution by using DNA-templated organic synthesis, Angew. Chem., Int. Ed. Engl. 41 (2002), 4104–4108.


35 D. Summerer and A. Marx, DNA-templated synthesis: more versatile than expected, Angew. Chem., Int. Ed. Engl. 41 (2002), 89–90.

36 X. Li and D.R. Liu, DNA-templated organic synthesis: nature's strategy for controlling chemical reactivity applied to synthetic molecules, Angew. Chem., Int. Ed. Engl. 43 (2004), 4848–4870.

37 M.W. Kanan, M.M. Rozenman, K. Sakurai, T.M. Snyder and D.R. Liu, Reaction discovery enabled by DNA-templated synthesis and in vitro selection, Nature 431 (2004), 545–549.

38 Z.J. Gartner, M.W. Kanan and D.R. Liu, Multistep small-molecule synthesis programmed by DNA templates, J. Am. Chem. Soc. 124 (2002), 10304–10306.

39 T.M. Snyder and D.R. Liu, Ordered multistep synthesis in a single solution directed by DNA templates, Angew. Chem., Int. Ed. Engl. 44 (2005), 7379–7382.

40 J.B. Doyon, T.M. Snyder and D.R. Liu, Highly sensitive in vitro selections for DNA-linked synthetic small molecules with protein binding affinity and specificity, J. Am. Chem. Soc. 125 (2003), 12372–12373.

41 Z.J. Gartner, B.N. Tse, R. Grubina, J.B. Doyon, T.M. Snyder and D.R. Liu, DNA-templated organic synthesis and selection of a library of macrocycles, Science 305 (2004), 1601–1605.

42 L. Mannocci, Y. Zhang, J. Scheuermann, M. Leimbacher, G. De Bellis, E. Rizzi, C.E. Dumelin, S. Melkko and D. Neri, High-throughput sequencing allows the identification of binding molecules isolated from DNA-encoded chemical libraries, Proc. Natl. Acad. Sci. U. S. A. 105(46) (2008), 17670–17675.

43 F. Buller, L. Mannocci, Y. Zhang, C.E. Dumelin, J. Scheuermann and D. Neri, Design and synthesis of a novel DNA-encoded chemical library using Diels-Alder cycloadditions, Bioorg. Med. Chem. Lett. 18(22) (2008), 5926–5931.


44 M. Margulies et al., Genome sequencing in microfabricated high-density picolitre reactors, Nature 437(7057) (2005), 376–380.

45 S.C. Schuster, Nat. Methods 5(1) (2008), 16–18.

46 K. Hoogsteen, The crystal and molecular structure of a hydrogen-bonded complex between 1-methylthymine and 9-methyladenine, Acta Crystallographica 16 (1963), 907–916.

47 S. Melkko, J. Scheuermann, C.E. Dumelin and D. Neri, Encoded self-assembling chemical libraries, Nat. Biotechnol. 22 (2004), 568–574.

48 Y.K. Cheng and B.M. Pettitt, Stabilities of double- and triple-strand helical nucleic acids, Prog. Biophys. Mol. Biol. 58(3) (1992), 225–257.

49 P. Aich, S. Ritchie, K. Bonham and J.S. Lee, Thermodynamic and kinetic studies of the formation of triple helices between purine-rich deoxyribo-oligonucleotides and the promoter region of the human c-src proto-oncogene, Nucleic Acids Res. 26(18) (1998), 4173–4177.

50 S. Melkko, C.E. Dumelin, J. Scheuermann and D. Neri, On the magnitude of the chelate effect for the recognition of proteins by pharmacophores scaffolded by self-assembling oligonucleotides, Chem. Biol. 13(2) (2006), 225–231.

51 M. Lovrinovic and C.M. Niemeyer, DNA microarrays as decoding tools in combinatorial chemistry and chemical biology, Angew. Chem., Int. Ed. Engl. 44 (2005), 3179–3183.

52 M. Uttamchandani, D.P. Walsh, S.Q. Yao and Y.T. Chang, Small molecule microarrays: recent advances and applications, Curr. Opin. Chem. Biol. 9 (2005), 4–13.

53 S. Melkko, J. Sobek, G. Guarda, J. Scheuermann, C.E. Dumelin and D. Neri, Encoded self-assembling chemical libraries, Chimia 59 (2005), 798–802.


54 C.E. Dumelin, J. Scheuermann, S. Melkko and D. Neri, Selection of streptavidin binders from a DNA-encoded chemical library, Bioconjug. Chem. 17(2) (2006), 366–370.

55 C.E. Dumelin, S. Trüssel, F. Buller, E. Trachsel, F. Bootz, Y. Zhang, L. Mannocci, S.C. Beck, M. Drumea-Mirancea, M.W. Seeliger, C. Baltes, T. Müggler, F. Kranz, M. Rudin, S. Melkko, J. Scheuermann and D. Neri, A portable albumin binder from a DNA-encoded chemical library, Angew. Chem., Int. Ed. Engl. 47(17) (2008), 3196–3201.

56 S. Melkko, Y. Zhang, C.E. Dumelin, J. Scheuermann and D. Neri, Isolation of high-affinity trypsin inhibitors from a DNA-encoded chemical library, Angew. Chem., Int. Ed. Engl. 46(25) (2007), 4671–4674.

57 J. Scheuermann, C.E. Dumelin, S. Melkko, Y. Zhang, L. Mannocci, M. Jaggi, J. Sobek and D. Neri, DNA-encoded chemical libraries for the discovery of MMP-3 inhibitors, Bioconjug. Chem. 19(3) (2008), 778–785.

58 Melkko, S., Neri, D., 2002. Encoded self-assembling chemical libraries. WO/2003/076943.

59 M.J. Heller, DNA-microarray technology: devices, systems, and applications, Annu. Rev. Biomed. Eng. 4 (2002), 129–153.

60 E.M. Southern, Detection of specific sequences among DNA fragments separated by gel electrophoresis, J. Mol. Biol. 98 (1975), 503–517.

61 D.A. Kulesh, D.R. Clive, D.S. Zarlenga and J.J. Greene, Identification of interferon-modulated proliferation-related cDNA sequences, Proc. Natl. Acad. Sci. U. S. A. 84 (1987), 8453–8457.

62 M. Schena, D. Shalon, R.W. Davis and P.O. Brown, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 270 (1995), 467–470.


63 D.A. Lashkari, J.L. DeRisi, J.H. McCusker, A.F. Namath, C. Gentile, S.Y. Hwang, P.O. Brown and R.W. Davis, Yeast microarrays for genome wide parallel genetic and gene expression analysis, Proc. Natl. Acad. Sci. U. S. A. 94 (1997), 13057–13062.

64 J. Shendure, R.D. Mitra, C. Varma and G.M. Church, Advanced sequencing technologies: methods and goals, Nat. Rev. Genet. 5 (2004), 335–344.

65 F. Sanger, S. Nicklen and A.R. Coulson, DNA sequencing with chain-terminating inhibitors, Proc. Natl. Acad. Sci. U. S. A. 74 (1977), 5463–5467.

66 J.M. Prober et al., A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides, Science 238 (1987), 336–341.

67 P. Nyren, B. Pettersson and M. Uhlen, Solid phase DNA minisequencing by an enzymatic luminometric inorganic pyrophosphate detection assay, Anal. Biochem. 208 (1993), 171–175.

68 M. Ronaghi et al., Real-time DNA sequencing using detection of pyrophosphate release, Anal. Biochem. 242 (1996), 84–89.

69 K.B. Jacobson et al., Applications of mass spectrometry to DNA sequencing, GATA 8 (1991), 223–229.

70 W. Bains and G.C. Smith, A novel method for nucleic acid sequence determination, J. Theor. Biol. 135 (1988), 303–307.

71 J.H. Jett et al., High-speed DNA sequencing: an approach based upon fluorescence detection of single molecules, J. Biomol. Struct. Dyn. 7 (1989), 301–309.

72 M. Margulies, M. Egholm, W.E. Altman, S. Attiya, J.S. Bader, L.A. Bemben, J. Berka, M.S. Braverman, Y.J. Chen et al., Genome sequencing in microfabricated high-density picolitre reactors, Nature 437 (2005), 376–380.


73 J. Shendure, G.J. Porreca, N.B. Reppas, X. Lin, J.P. McCutcheon, A.M. Rosenbaum, M.D. Wang, K. Zhang, R.D. Mitra and G.M. Church, Accurate multiplex polony sequencing of an evolved bacterial genome, Science 309(5741) (2005), 1728–1732.

74 http://solid.appliedbiosystems.com/ (Applied Biosystems' SOLiD technology).

75 http://www.illumina.com/

76 I. Braslavsky, H. Hebert, E. Kartalov and S.R. Quake, Sequence information can be obtained from single DNA molecules, Proc. Natl. Acad. Sci. U. S. A. 100 (2003), 3960–3964.

77 M. Ronaghi, S. Karamohamed, B. Pettersson, M. Uhlen and P. Nyren, Real-time DNA sequencing using detection of pyrophosphate release, Anal. Biochem. 242 (1996), 84–89.

78 S.C. Macevicz, US Patent 5750341, filed 1995.

79 M. Droege, The Genome Sequencer FLX™ System – Longer reads, More applications, Straight forward bioinformatics & More Complete Data Sets, J. Biotechnol. (2007), in press.

80 http://www.helicosbio.com/

81 R.D. Mitra, J. Shendure, J. Olejnik, E.K. Olejnik and G.M. Church, Fluorescent in situ sequencing on polymerase colonies, Anal. Biochem. 320(1) (2003), 55–65.

82 C.G. Huber and H. Oberacher, Analysis of nucleic acids by on-line liquid chromatography-mass spectrometry, Mass Spectrom. Rev. 20(5) (2001), 310–343.


83 H. Klenow and I. Henningsen, Selective elimination of the exonuclease activity of the deoxyribonucleic acid polymerase from Escherichia coli B by limited proteolysis, Proc. Natl. Acad. Sci. U. S. A. 65 (1970), 168–175.

84 M. Silacci, S. Brack, G. Schirru, J. Mårlind, A. Ettorre, A. Merlo, F. Viti and D. Neri, Design, construction, and characterization of a large synthetic human antibody phage display library, Proteomics 5(9) (2005), 2340–2350.

85 Janeway, Travers, Walport and Shlomchik, Immunobiology, 6th Ed., Churchill Livingstone (2005).

86 G. Walsh, Biopharmaceutical benchmarks 2006, Nat. Biotechnol. 24(7) (2006), 769–776.

87 L.A. Liotta et al., Metastatic potential correlates with enzymatic degradation of basement membrane collagen, Nature 284 (1980), 67–68.

88 C.E. Brinckerhoff and L.M. Matrisian, Matrix metalloproteinases: a tail of a frog that became a prince, Nat. Rev. Mol. Cell Biol. 3 (2002), 207–214.

89 L.M. Coussens, B. Fingleton and L.M. Matrisian, Matrix metalloproteinase inhibitors and cancer: trials and tribulations, Science 295 (2002), 2387–2392.

90 M. Egeblad and Z. Werb, New functions for the matrix metalloproteinases in cancer progression, Nat. Rev. Cancer 2 (2002), 163–175.


7. Curriculum Vitae

Luca Mannocci

Wolfgang-Pauli-Strasse 10

ETH Zürich, HCI G398

CH-8093 Zürich

Switzerland

Tel.: +41 44 63 37 453

Fax.: +41 44 63 31 358


Personal Details

Name: Luca Mannocci

Date of birth: 7 September 1979

Citizen: Zürich

Nationality: Italian

Marital status: Unmarried

Address: Kolbenacker 34

8052 Zürich

Switzerland

Tel.: +41 43 44 39 900

Mobile: +41 76 43 76 485


Education

2005 – 2008 ETH Swiss Federal Institute of Technology, Zürich, Switzerland. Ph.D. in Sciences.

2004 Italian State Examination qualifying for the practice of the chemistry profession.

1998 – 2004 Università degli Studi di Pisa, Pisa, Italy. Degree in Chemistry (Mark: 110/110 cum laude).

1998 Liceo Scientifico Statale “Ulisse Dini” (Scientific Lyceum), Pisa, Italy. High school diploma (Mark: 60/60).


Research Experience

SWISS FEDERAL INSTITUTE OF TECHNOLOGY (ETH)

2005-2008 Doctoral student, Institute of Pharmaceutical Sciences.

Ph.D. Thesis “DNA-encoded Chemical Libraries”

Advisor: Prof. Dr. Dario Neri

UNIVERSITY OF PISA

2004-2005 Internship, Organic Chemistry Division of the Department of Chemistry at the University of Pisa

Silver(I)-catalysed protiodesilylation

Advisors: Prof. Adriano Carpita and Prof. Renzo Rossi

Collaboration on the publication of a book on natural products: Prof. R. Rossi, Prof. A. Carpita, Prof. F. Bellina, “Sostanze organiche naturali e loro derivati da analoghi strutturali con proprietà antineoplastiche” (“Naturally occurring organic substances, their derivatives and structural analogues with antineoplastic properties”), Ed. Plus (2005).


2002-2004 Training in the Organic Chemistry Division of the Department of Chemistry at the University of Pisa

Diploma Thesis: “La prima sintesi totale del (-)-nitidone, una sostanza naturale con proprietà antitumorali, e del suo enantiomero” (“The first total synthesis of (-)-nitidone, a natural substance with antitumour properties, and of its enantiomer”).

Advisor: Prof. Adriano Carpita

Languages: Italian (native speaker), English (fluent), German (basic knowledge), Hungarian (basic knowledge)


Publications and Patents

• Mannocci L., Zhang Y., Scheuermann J., Leimbacher M., De Bellis G., Rizzi E., Dumelin C.E., Melkko S., Neri D., “High-throughput sequencing allows the identification of binding molecules isolated from DNA-encoded chemical libraries”. PNAS, 2008, 105(46), 17670-17675.

• Mannocci L., Neri D., Melkko S., “DNA-Encoded Chemical Libraries”. SCREENING - Trends in Drug Discovery, 2009, 10, 16-18.

• Mannocci L., Melkko S., Neri D., “DNA-encoded chemical libraries”. US Patent Application 2008 No. 61/008,249.

• Scheuermann J., Dumelin C.E., Melkko S., Zhang Y., Mannocci L., Jaggi M., Sobek J., Neri D., “DNA-encoded chemical libraries for the discovery of MMP-3 inhibitors”. Bioconjug Chem., 2008, 19(3), 778-785.

• Dumelin C.E., Trüssel S., Buller F., Trachsel E., Bootz F., Zhang Y., Mannocci L., Beck S.C., Drumea-Mirancea M., Seeliger M.W., Baltes C., Müggler T., Kranz F., Rudin M., Melkko S., Scheuermann J., Neri D., “A portable albumin binder from a DNA-encoded chemical library”. Angew Chem Int Ed Engl., 2008, 47(17), 3196-3201.

• Buller F., Mannocci L., Zhang Y., Dumelin C.E., Scheuermann J., Neri D., “Design and synthesis of a novel DNA-encoded chemical library using Diels-Alder cycloadditions”. Bioorg Med Chem Lett., 2008, 18(22), 5926-5931.

• Carpita A., Mannocci L., Rossi R., “Silver(I)-catalysed protiodesilylation of 1-(trimethylsilyl)-1-alkynes”. Eur. J. Org. Chem., 2005, 12, 1367-1377.

• Bellina F., Carpita A., Mannocci L., Rossi R., “First total synthesis of naturally-occurring (-)-nitidon and its enantiomer”. Eur. J. Org. Chem., 2004, 12, 2610-2619.


Poster Presentations

• L. Mannocci, Y. Zhang, J. Scheuermann, M. Leimbacher, G. De Bellis, E. Rizzi, C. Dumelin, S. Melkko, D. Neri, “Novel Strategies for the Synthesis, Selection and Decoding of DNA-Encoded Chemical Libraries” - Molecular Medicine Tri-Conference, San Francisco, USA, 25th - 28th March 2008.

• A. Carpita, L. Mannocci, R. Rossi, XVII Convegno Nazionale della Divisione di Chimica Farmaceutica della Società Chimica Italiana, Pisa (PI), Italy, 6th – 10th September 2004.

• A. Carpita, L. Mannocci, poster and flash communication, XXIX Convegno Nazionale della Divisione Chimica Organica, Potenza (PZ), Italy, 31st August – 4th September 2004.

• M. Biagetti, A. Carpita, L. Mannocci, R. Rossi, “Studi sulla sintesi del (-)-nitidone” (“Studies on the synthesis of (-)-nitidone”), communication, VI Convegno Nazionale “Giornate di Chimica delle Sostanze Naturali”, Vietri sul Mare (SA), Italy, 29th September – 1st October 2003. Proceedings, p. 3.


8. ACKNOWLEDGMENTS

First of all, I would like to express my sincere gratitude to my PhD advisor Professor Dr. Dario Neri for giving me the privilege of pursuing my doctoral studies in his laboratory. In every discussion, I perceived a brilliant intellect behind all his answers, as well as great wisdom in his questions. I especially appreciated his success-focused attitude towards research, and I was impressed by his excellent and wide-ranging scientific knowledge, an inexhaustible source of creativity and curiosity.

I would like to thank Dr. Yixin Zhang, Dr. Jörg Scheuermann and Dr. Samu Melkko. Their constant scientific and personal support inspired me in my day-to-day experiments and was absolutely essential for the accomplishment of this work. Dr. Yixin Zhang was an invaluable help in the design and setup of all the bioinformatic tools. His expertise and precious advice were often crucial for the synthesis and assembly of the library. Dr. Jörg Scheuermann introduced me to the field of DNA-encoded chemistry and to laboratory life. I constantly benefited from his priceless critical input and exciting discussions. Dr. Samu Melkko provided brilliant support for the selection procedures and was a great help with the gene cloning and the radioactive selections.

Further thanks go to the “chemistry team”: Christoph Dumelin, Fabian Buller, Jean-Paul Gapian, Sabrina Trüssel, Madalina Jaggi and Ilona Molnàr, for their priceless support and for the numerous daily, open, controversial and stimulating scientific (and non-scientific) discussions, which created the extraordinarily enjoyable atmosphere typical of room G398. Special thanks go to Markus Leimbacher for helping me with the assembly of the library during his Master's thesis and for his heroic efforts in setting up the encoding strategies and the selection procedures.

I gratefully acknowledge my co-examiner Prof. Karl-Heinz Altmann for thoroughly proofreading my thesis and for all his valuable input.

Additionally, I am most grateful to Gianluca De Bellis and Ermanno Rizzi from the Institute for Biomedical Technologies in Milan, who enabled us to use “454” high-throughput sequencing technology by providing the platform.


Finally, an enormous hug goes to all present and former members (and friends) of Professor Dario Neri’s group, who contributed to the pleasant atmosphere and gave me advice, friendship, vitality and, for many years, a home away from home. Without you, the winters here would certainly have been much colder and the clouds less bright. A smile lasts only for a while, but in memory it can last forever: you will always have a special place in the treasure chest of my heart.

I owe a debt not only of gratitude but above all of affection and esteem to Prof. Adriano Carpita. His sincere friendship has accompanied and supported me throughout all these years. With the art that is now my profession, he taught me that a pleasant discovery is often more the fruit of tenacious, ingenious patience than of any other talent.

Thanks and a hug must certainly go to my friend Dr. Dario Lombardi, companion in grand dreams and adventurous journeys. With wine, words and good cheer he always helped me swallow the bitterest pills and leave my mistakes at the bottom of the glass. I wish you with all my heart to reap soon all the success that you have sown with your talent over these years.

I also wish to express my most sincere gratitude to all my friends, “old” and “new”, for their support and for their closeness across the distance. And since naming only some of them seems to me as unfair as naming none, I list them all, one hundred thousand of them: Alessio Catarsi, Luca Mantilli, Mirko Sardelli, Enrico Marsili, Roberto Scamuzzi, Sandro Orsini, Luca Reggiani, Giorgio La Corte, Francesco Attuali, Andrea Scarpellini, Silvia Anthoine Dietrich, Giulio Casi, Stefania Capone, Andrea Chicca, Cesare Borgia and all those who do not find room here but certainly have it among my memories. I raise a glass to you, friends of today, friends of yesterday and, I hope, friends forever!

Special recognition goes to my father, my mother, my grandparents and all my loved ones who are here today, and to all those who unfortunately cannot be here with me to celebrate this milestone. This work is dedicated to you who, throughout this story of mine, supported and encouraged me day by day, with love and patience, to chase my dreams and to see kites where there were only clouds. A place of honour will always be yours in the treasure chest of my heart.

Finally, I owe an immense hug and special thanks to Anita. She was my supply of sunshine when the winter seemed endless and my most powerful antidote against the many different vapours of this laboratory.

Perhaps it is true that life and dreams are pages of the same book: reading them in order is living, leafing through them at random is dreaming, but either way this book is about you. You are everything I know about happiness. Thank you!


9. APPENDIX

9.1 Model compound oligonucleotide conjugates

Fmoc-amino acids (A) conjugated to DEL_O_1 (5’-amino-C12-GGA GCT TGT GAA TTC TGG ATC TTA GGA CGT GTG TGA ATT GTC-3’)

Fmoc-amino acid (A) | Yield*) (%) | Recovery†) (%) | ESI-MS found (expected) (Da)
1 (structure not reproduced) | 97 | 53 | 13474 (13473)
2 (structure not reproduced) | 90 | 65 | 13439 (13437)
3 (structure not reproduced) | 73 | 55 | 13457 (13459)

Table 9-1: HPLC coupling yields and recoveries after peptide bond formation between three selected Fmoc-amino acids (A) and a model 5’-amino-oligonucleotide (DEL_O_1). *) Determined by HPLC after Fmoc deprotection of the oligonucleotide-conjugated compound. †) Determined by measuring the absorbance at 260 nm with a NanoDrop ND-1000 UV-Vis spectrophotometer after HPLC purification (see Chapter 3.1.6).
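Footnote †) states that recoveries were evaluated from the absorbance at 260 nm. A minimal sketch of that arithmetic follows, assuming the Beer-Lambert law (c = A / (ε·l)) and a crude ε for DEL_O_1 obtained by summing commonly quoted approximate per-nucleotide coefficients. This ignores base stacking (no nearest-neighbour correction), the C12 amino linker and any absorbance from the conjugated building block, so it is not necessarily how the tabulated recoveries were calculated; the function names and the example numbers are hypothetical.

```python
# Approximate molar extinction coefficients at 260 nm (M^-1 cm^-1) of single
# deoxynucleotides. Summing them ignores base stacking (no nearest-neighbour
# correction), so the result is only a rough estimate (assumption).
EPSILON_260 = {"A": 15400, "C": 7400, "G": 11500, "T": 8700}

# DNA part of DEL_O_1 from Table 9-1; the 5'-amino-C12 linker is not modelled.
DEL_O_1 = "GGAGCTTGTGAATTCTGGATCTTAGGACGTGTGTGAATTGTC"


def extinction_coefficient(seq: str) -> int:
    """Sum of the per-nucleotide coefficients, in M^-1 cm^-1."""
    return sum(EPSILON_260[base] for base in seq.upper())


def concentration_uM(a260: float, seq: str, path_cm: float = 1.0) -> float:
    """Beer-Lambert law, c = A / (epsilon * l), converted from M to uM."""
    return a260 / (extinction_coefficient(seq) * path_cm) * 1e6


def recovery_percent(c_out_uM: float, v_out_uL: float,
                     c_in_uM: float, v_in_uL: float) -> float:
    """Amount recovered after purification as a percentage of the amount used."""
    return 100.0 * (c_out_uM * v_out_uL) / (c_in_uM * v_in_uL)


if __name__ == "__main__":
    # Hypothetical readings, not taken from the thesis.
    c_out = concentration_uM(0.45, DEL_O_1)
    print(f"epsilon = {extinction_coefficient(DEL_O_1)} M^-1 cm^-1")
    print(f"c = {c_out:.2f} uM, recovery = {recovery_percent(c_out, 100, 2.0, 100):.0f} %")
```

Recovery is expressed here as recovered amount (concentration × volume) over input amount, which is one straightforward reading of "recovery"; a nearest-neighbour ε would give a more accurate absolute concentration but cancels out of the ratio only if the same ε is used for input and output.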


Table 9-2: HPLC coupling yields and recoveries after peptide bond formation between three selected oligonucleotide-conjugated, Fmoc-deprotected amino acids (A1 to A3) and four different model carboxylic acids (B1 to B4). **) Structures of the oligonucleotide-conjugated, Fmoc-deprotected amino acids (A) and of the model carboxylic acids (B), not reproduced here. *) Determined by HPLC after Fmoc deprotection of the oligonucleotide-conjugated compound. †) Determined by measuring the absorbance at 260 nm with a NanoDrop ND-1000 UV-Vis spectrophotometer after HPLC purification (see Chapter 3.1.6). ‡) The calculated mass of the oligonucleotide-conjugated product is given in brackets.

Carboxylic acid (B)**) | Amino acid (A)**) | Yield*) (%) | Recovery†) (%) | ESI-MS‡) (Da)
B1 | A1 | 98 | 90 | 13670 (13699)
B1 | A2 | 83 | 68 | 13663 (13663)
B1 | A3 | 65 | 60 | 13685 (13685)
B2 | A1 | 70 | 60 | 13732 (13733)
B2 | A2 | 72 | 60 | 13694 (13697)
B2 | A3 | 76 | 65 | 13721 (13719)
B3 | A1 | >70 | 70 | 13647 (13644)
B3 | A2 | >64 | 64 | 13609 (13608)
B3 | A3 | >57 | 57 | 13632 (13630)
B4 | A1 | >52 | 52 | 13670 (13670)
B4 | A2 | >55 | 55 | 136332 (13634)
B4 | A3 | >51 | 51 | 13657 (13656)


9.2 Library synthesis overview

List of the 20 Fmoc-amino acids and the oligonucleotide codes used as first building blocks for the construction of DEL4000:

DEL_Cn | Name | MW | Coding sequence | HPLC yield*) (%) | ESI-MS†) (Da)
[Structures not reproduced]
1 | (S)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(pyridin-4-yl)propanoic acid | 388.42 | ATCTTA | 97 | 13474 (13474)
2 | 3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4-(4-bromophenyl)butanoic acid | 480.35 | GCTGCG | 70 | 13610 (13608)
3 | (1R,2S)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)cyclopentanecarboxylic acid | 351.4 | AGAACG | 86 | 13497 (13496)
4 | 3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(pyridin-2-yl)propanoic acid | 433.52 | GACATC | 53 | 13528 (13529)
5 | (S)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(3-fluorophenyl)propanoic acid | 405.43 | ATTACT | 64 | 13491 (13491)
6 | (1S,4R)-4-(((9H-fluoren-9-yl)methoxy)carbonylamino)cyclopent-2-enecarboxylic acid | 349.38 | ACGGCA | 72 | 13467 (13470)
7 | (R)-3-(4-((((9H-fluoren-9-yl)methoxy)carbonylamino)methyl)phenyl)-2-(tert-butoxycarbonylamino)propanoic acid | 516.58 | AGAGAA | 60 | 13586 (13685)
8 | Acetic acid, [[5-[[(9H-fluoren-9-ylmethoxy)carbonyl]amino]-10,11-dihydro-5H-dibenzo[a,d]cyclohepten-2-yl]oxy] | 505.56 | TCCAAA | 87 | 13588 (13585)
9 | (S)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(thiazol-4-yl)propanoic acid | 394.44 | TCGATC | 75 | 13482 (13481)
10 | (S)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(1-benzyl-1H-imidazol-4-yl)propanoic acid | 467.52 | TCCGGC | 60 | 13553 (13555)
11 | 5-(4-((((9H-fluoren-9-yl)methoxy)carbonylamino)methyl)-3,5-dimethoxyphenoxy)pentanoic acid | 505.56 | CGTGCA | 53 | 13618 (13617)
12 | (R)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(4-chlorophenyl)propanoic acid | 421.88 | GGGTAA | 80 | 13598 (13598)
13 | (R)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)hex-5-ynoic acid | 349.38 | CCCTCC | 98 | 13358 (13357)
14 | (S)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4,4-diphenylbutanoic acid | 477.55 | TCTCCA | 70 | 13521 (13524)
15 | (S)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-2-(phenylsulfonamido)propanoic acid | 466.51 | CAAGCT | 80 | 13564 (13562)
16 | (S)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4-(thiophen-3-yl)butanoic acid | 407.48 | GCACTG | 43 | 13520 (13519)
17 | (S)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4-(4-iodophenyl)butanoic acid | 527.35 | ACGAAT | 64 | 13648 (13647)
18 | (R)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4-(naphthalen-2-yl)butanoic acid | 451.51 | TATCAG | 80 | 13562 (13562)
19 | (R)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4-(naphthalen-1-yl)butanoic acid | 451.51 | TGAAAT | 62 | 13587 (13586)
20 | (S)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(4-hydroxyphenyl)propanoic acid | 403.43 | GTTAGT | 56 | 13543 (13545)

Table 9-3: The 20 Fmoc-amino acids and the oligonucleotide codes used as first building blocks for the construction of DEL4000. *) Determined by HPLC after Fmoc deprotection of the oligonucleotide-conjugated compound. †) The calculated mass of the Fmoc-deprotected, oligonucleotide-conjugated product is given in brackets.
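DEL4000 combines the 20 first building blocks of Table 9-3 (6-nt codes) with the 200 carboxylic acids listed next (8-nt codes), i.e. 20 × 200 = 4000 code combinations. Purely as an illustration, decoding then reduces to looking up both codes and counting how often each combination is observed. The sketch below assumes that the two code segments have already been cut out of each sequencing read (where they sit in the DNA construct is not given in this appendix), uses exact matching without any error tolerance, and includes only acids 1 to 4 as an excerpt of the second code set; `build_lookup` and `count_compounds` are hypothetical names, not the software used in this work.

```python
from collections import Counter

# 6-nt codes of the 20 first building blocks, keyed by DEL_Cn (Table 9-3).
FIRST_CODES = {
    1: "ATCTTA", 2: "GCTGCG", 3: "AGAACG", 4: "GACATC", 5: "ATTACT",
    6: "ACGGCA", 7: "AGAGAA", 8: "TCCAAA", 9: "TCGATC", 10: "TCCGGC",
    11: "CGTGCA", 12: "GGGTAA", 13: "CCCTCC", 14: "TCTCCA", 15: "CAAGCT",
    16: "GCACTG", 17: "ACGAAT", 18: "TATCAG", 19: "TGAAAT", 20: "GTTAGT",
}

# Excerpt only: 8-nt codes of carboxylic acids 1-4 from the second list.
SECOND_CODES = {1: "TTTTTTTT", 2: "GGGGTTTT", 3: "CCCCTTTT", 4: "AAAATTTT"}


def build_lookup(codes: dict[int, str]) -> dict[str, int]:
    """Invert number -> code into code -> number, rejecting duplicate codes."""
    lookup = {seq: num for num, seq in codes.items()}
    if len(lookup) != len(codes):
        raise ValueError("two building blocks share the same code")
    return lookup


FIRST_LOOKUP = build_lookup(FIRST_CODES)
SECOND_LOOKUP = build_lookup(SECOND_CODES)


def count_compounds(code_pairs) -> Counter:
    """Tally (first, second) building-block pairs; pairs with an unknown code are skipped."""
    counts: Counter = Counter()
    for first_code, second_code in code_pairs:
        first = FIRST_LOOKUP.get(first_code)
        second = SECOND_LOOKUP.get(second_code)
        if first is not None and second is not None:
            counts[(first, second)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical, already-segmented reads: (first code, second code).
    reads = [("ATCTTA", "TTTTTTTT"), ("ATCTTA", "TTTTTTTT"),
             ("GTTAGT", "CCCCTTTT"), ("NNNNNN", "TTTTTTTT")]
    for (first, second), n in count_compounds(reads).most_common():
        print(f"DEL_C{first} + acid {second}: {n} reads")
    print("library size:", len(FIRST_CODES) * 200)
```

Exact matching is the simplest possible choice; tolerating sequencing errors would need an additional matching step that this sketch does not attempt.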

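Each carboxylic acid in the list that follows is given with a molecular formula and MW, so the MW column can be recomputed from the formula with a small parser. The atomic weights below are an assumption: they are an older set of standard values (for example C = 12.01115, H = 1.00797) which, for the salt-free entries checked here (for example 1, 8, 12, 59 and 82), reproduce the tabulated MW to five decimals, whereas current IUPAC values give slightly different results. The source does not state which atomic-weight table was used, and entries given as hydrochloride or sodium salts may not match; `molecular_weight` is a hypothetical helper.

```python
import re

# Assumed atomic weights: an older set of standard values that reproduces the
# MW column for the salt-free entries checked (not the current IUPAC values).
ATOMIC_WEIGHTS = {
    "C": 12.01115, "H": 1.00797, "N": 14.0067, "O": 15.9994,
    "S": 32.064, "F": 18.9984, "Cl": 35.453, "Br": 79.904, "I": 126.9044,
}

# One element symbol (capital letter plus optional lower-case letter)
# followed by an optional atom count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_weight(formula: str) -> float:
    """Average molecular weight of a neutral formula such as 'C13H8N2O4S2'."""
    tokens = TOKEN.findall(formula)
    # Reject strings containing characters the element/count grammar cannot explain.
    if "".join(symbol + count for symbol, count in tokens) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in tokens:
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"no atomic weight for {symbol!r}")
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return round(total, 5)


if __name__ == "__main__":
    # (entry number, formula, MW as listed in the table)
    for num, formula, listed in [(1, "C13H8N2O4S2", 320.34771),
                                 (8, "C6H6O2S", 142.17752),
                                 (82, "C9H6F4O2", 222.14057)]:
        print(num, formula, molecular_weight(formula), listed)
```

Unknown elements (for instance Na in the sodium salts) raise an error instead of being silently ignored, so a mismatch with the table cannot hide behind a skipped atom.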

List of the 200 carboxylic acids and the oligonucleotide codes used as second building blocks for the construction of DEL4000:

Num. | Formula | MW | Coding sequence | Supplier
[Chemical structures not reproduced]

1 | C13H8N2O4S2 | 320.34771 | TTTTTTTT | SALOR
2 | C12H9NO5S | 279.27323 | GGGGTTTT | SALOR
3 | C7H7NO3S | 185.20274 | CCCCTTTT | SALOR
4 | C10H11NO4 | 209.20347 | AAAATTTT | SALOR
5 | C6H7N3O4 | 185.14039 | ACGTGTTT | SALOR
6 | C6H6ClN3O4 | 219.58542 | CATGGTTT | SALOR

7 | C8H15NO8 | 253.21065 | GTACGTTT | SALOR
8 | C6H6O2S | 142.17752 | TGCAGTTT | ALDRICH
9 | C7H8O2S | 156.20461 | GACTCTTT | ALDRICH
10 | C8H10O2S | 170.2317 | TCAGCTTT | ALDRICH
11 | C16H29NO5 | 315.41323 | AGTCCTTT | SIGMA
12 | C13H25NO5 | 275.3479 | CTGACTTT | FLUKA
13 | C13H25NO5 | 275.3479 | CGATATTT | FLUKA

14 | C14H25NO6 | 303.35845 | ATCGATTT | FLUKA
15 | C14H19NO4 | 265.31183 | TAGCATTT | FLUKA
16 | C15H21NO4 | 279.33892 | GCTAATTT | FLUKA
17 | C16H23NO4 | 293.36601 | CAGTTGTT | FLUKA
18 | C21H25NO4 | 355.4377 | ACTGTGTT | FLUKA
19 | C15H20ClNO4 | 313.78395 | TGACTGTT | FLUKA
20 | C15H20ClNO4 | 313.78395 | GTCATGTT | FLUKA

21 | C19H23NO4 | 329.39946 | GGTTGGTT | FLUKA
22 | C16H20N2O4 | 304.3488 | TTGGGGTT | FLUKA
23 | C16H20N2O4 | 304.3488 | AACCGGTT | FLUKA
24 | C16H20F3NO4 | 347.3373 | CCAAGGTT | FLUKA
25 | C16H20F3NO4 | 347.3373 | ATATCGTT | FLUKA
26 | C16H20F3NO4 | 347.3373 | CGCGCGTT | FLUKA
27 | C16H20F3NO4 | 347.3373 | GCGCCGTT | FLUKA

28 | C14H20N2O4 | 280.3265 | TATACGTT | FLUKA
29 | C16H20N2O4 | 304.3488 | TCCTAGTT | FLUKA
30 | C21H25NO4 | 355.4377 | GAAGAGTT | FLUKA
31 | C7H16ClNO2 | 146.21107 | CTTCAGTT | ALDRICH
32 | C7H16ClNO3 | 162.21047 | AGGAAGTT | FLUKA
33 | C13H28ClNO4 | 260.354 | AGCTTCTT | SIGMA
34 | C17H36ClNO4 | 316.462 | CTAGTCTT | SIGMA

35 | C6H11NO4 | 161.15887 | GATCTCTT | ALDRICH
36 | C8H7NO5 | 197.14869 | TCGATCTT | ALDRICH
37 | C10H11NO4 | 209.20347 | TAATGCTT | FLUKA
38 | C9H8N2O5 | 224.17451 | GCCGGCTT | ALDRICH
39 | C10H10N2O5 | 238.2016 | CGGCGCTT | ALDRICH
40 | C13H17NO7 | 299.28294 | ATTAGCTT | FLUKA
41 | C4H7NaO3 | 104.10739 | CCTTCCTT | FLUKA

42 | C4H7NaO3 | 104.10739 | AAGGCCTT | FLUKA
43 | C5H10O3 | 118.13365 | TTCCCCTT | FLUKA
44 | C7H13NO4 | 175.18596 | GGAACCTT | SALOR
45 | C9H10O4 | 182.17765 | GTGTACTT | FLUKA
46 | C10H12O5 | 212.20414 | TGTGACTT | FLUKA
47 | C12H16O5 | 240.25832 | ACACACTT | FLUKA
48 | C12H20O3 | 212.2914 | CACAACTT | ALDRICH

49 | C8H14O4 | 174.19838 | GCATTATT | ALDRICH
50 | C18H22O6 | 334.37244 | TACGTATT | SALOR
51 | C11H14N4O4 | 266.25863 | ATGCTATT | SIGMA
52 | C14H11NO2 | 225.24927 | CGTATATT | FLUKA
53 | C15H13NO2 | 239.27636 | CTCTGATT | FLUKA
54 | C11H14N4O2S | 266.32383 | AGAGGATT | SALOR
55 | C18H17NO2 | 279.34169 | TCTCGATT | ALDRICH

56 | C3H7N3O2 | 117.10814 | GAGAGATT | ALDRICH
57 | C5H11N3O2 | 145.16232 | TGGTCATT | FLUKA
58 | C4H10ClNO2 | 139.5828 | GTTGCATT | ALDRICH
59 | C5H10O2 | 102.13425 | CAACCATT | ALDRICH
60 | C10H18O2 | 170.25376 | ACCACATT | ALDRICH
61 | C16H22O3 | 262.35194 | AATTAATT | SALOR
62 | C5H8O3 | 116.11771 | CCGGAATT | FLUKA

63 | C7H10O3 | 142.15595 | GGCCAATT | ALDRICH
64 | C8H15NO3 | 173.21365 | TTAAAATT | ALDRICH
65 | C5H6N2O4 | 158.11457 | GGTTTTGG | ALDRICH
66 | C13H14N2O4 | 262.26753 | TTGGTTGG | ALDRICH
67 | C13H17N5O5 | 323.31094 | AACCTTGG | SIGMA
68 | C6H6N2O2S | 170.19092 | CCAATTGG | ALDRICH
69 | C16H14O2S | 270.35278 | CAGTGTGG | SALOR

70 C9H10O2S 182.24285 ACTGGTGG ALDRICH
71 C9H8O2S2 212.29091 TGACGTGG ALDRICH
72 C9H10O2S 182.24285 GTCAGTGG ALDRICH
73 C9H10O2S 182.24285 TCCTCTGG ALDRICH
74 C9H10O2S 182.24285 GAAGCTGG ALDRICH
75 C8H7ClO2S 202.66079 CTTCCTGG ALDRICH
76 C12H10O2S 218.2763 AGGACTGG FLUKA


77 C3H6O2S 106.14407 ATATATGG ALDRICH
78 C8H14O3S 190.26298 CGCGATGG ALDRICH
79 C9H9FO2 168.16928 GCGCATGG ALDRICH
80 C10H9FO3 196.17983 TATAATGG SIGMA
81 C9H8F2O2 186.15971 ACGTTGGG ALDRICH
82 C9H6F4O2 222.14057 CATGTGGG ALDRICH
83 C10H9F3O2 218.17723 GTACTGGG ALDRICH


84 C11H8F6O2 286.17561 TGCATGGG ALDRICH
85 C11H8F6O2 286.17561 TTTTGGGG ALDRICH
86 C9H7F3O3 220.14954 GGGGGGGG ALDRICH
87 C8H7ClO2 170.59679 CCCCGGGG ALDRICH
88 C9H9ClO2 184.62388 AAAAGGGG ALDRICH
89 C10H9ClO3 212.63443 CGATCGGG ALDRICH
90 C9H9ClO3 200.62328 ATCGCGGG ALDRICH


91 C8H6Cl2O3 221.04122 TAGCCGGG ALDRICH
92 C13H12Cl2O4 303.14419 GCTACGGG SIGMA
93 C19H16ClNO4 357.79667 GACTAGGG SIGMA
94 C12H9ClO5 268.65553 TCAGAGGG ALDRICH
95 C8H7BrO2 215.04779 AGTCAGGG FLUKA
96 C8H7BrO2 215.04779 CTGAAGGG FLUKA
97 C8H7BrO2 215.04779 CTCTTCGG FLUKA


98 C9H9BrO2 229.07488 AGAGTCGG ALDRICH
99 C8H7BrO3 231.04719 TCTCTCGG ALDRICH
100 C10H9BrO3 257.08543 GAGATCGG ALDRICH
101 C8H7IO2 262.04819 GCATGCGG ALDRICH
102 C10H11IO2 290.10237 TACGGCGG SIGMA
103 C8H7IO3 278.04759 ATGCGCGG ALDRICH
104 C9H8INO3 305.07341 CGTAGCGG FLUKA


105 C6H7NO3 141.12759 AATTCCGG ALDRICH
106 C3H4N4O2 128.09093 CCGGCCGG ALDRICH
107 C10H12O2 164.20594 GGCCCCGG ALDRICH
108 C11H14O2 178.23303 TTAACCGG ALDRICH
109 C10H12O2 164.20594 TGGTACGG FLUKA
110 C10H12O2 164.20594 GTTGACGG FLUKA
111 C15H12O2 224.26169 CAACACGG ALDRICH


112 C7H8ClNO2 173.60031 ACCAACGG FLUKA
113 C10H10O3 178.1894 TAATTAGG ALDRICH
114 C11H12O3 192.21649 GCCGTAGG ALDRICH
115 C11H12O3 192.21649 CGGCTAGG ALDRICH
116 C8H8O3 152.15116 ATTATAGG ALDRICH
117 C9H10O3 166.17825 AGCTGAGG ALDRICH
118 C18H28O3 292.42206 GATCGAGG ALDRICH


119 C9H10O3 166.17825 TCGAGAGG ALDRICH
120 C9H10O3 166.17825 GTGTCAGG FLUKA
121 C9H10O3 166.17825 TGTGCAGG FLUKA
122 C9H10O3 166.17825 ACACCAGG FLUKA
123 C10H12O4 196.20474 CACACAGG ALDRICH
124 C11H14O5 226.23123 AAGGAAGG FLUKA
125 C11H14O5 226.23123 TTCCAAGG FLUKA


126 C10H12O3 180.20534 GGAAAAGG ALDRICH
127 C10H12O3 180.20534 CCTTTTCC FLUKA
128 C10H12O3 180.20534 AAGGTTCC FLUKA
129 C11H14O4 210.23183 TTCCTTCC ALDRICH
130 C11H14O4 210.23183 GTGTGTCC ALDRICH
131 C12H16O5 240.25832 TGTGGTCC FLUKA
132 C16H16O4 272.30352 ACACGTCC SALOR


133 C12H12O4 220.22704 CACAGTCC ALDRICH
134 C12H10O5 234.2105 AGCTCTCC ALDRICH
135 C16H14O3 254.28818 CTAGCTCC ALDRICH
136 C9H9NO3 179.17698 GATCCTCC ALDRICH
137 C10H11NO3 193.20407 TCGACTCC ALDRICH
138 C10H11NO3 193.20407 TAATATCC ALDRICH
139 C10H11NO3 193.20407 GCCGATCC ALDRICH


140 C10H7NO4 205.17159 CGGCATCC FLUKA
141 C9H8NNaO4 217.15821 ATTAATCC ALDRICH
142 C10H11NO3 193.20407 TGGTTGCC SIGMA
143 C11H13NO3 207.23116 GTTGTGCC FLUKA
144 C12H15NO3 221.25825 CAACTGCC ALDRICH
145 C12H15NO3 221.25825 ACCATGCC ALDRICH
146 C16H17NO3 271.31879 AATTGGCC ALDRICH


147 C16H17NO3 271.31879 CCGGGGCC ALDRICH
148 C11H11NO4 221.21462 GGCCGGCC ALDRICH
149 C20H21N3O6 399.40687 GCATCGCC SIGMA
150 C5H7ClN2O2 162.57674 TACGCGCC FLUKA
151 C7H8N2O4 184.15281 ATGCCGCC ALDRICH
152 C7H7NO4 169.13814 CGTACGCC ALDRICH
153 C9H9NO4 195.17638 CTCTAGCC ALDRICH


154 C3H5N4NaO2S 184.1527 AGAGAGCC ALDRICH
155 C9H10O2 150.17885 TCTCAGCC ALDRICH
156 C9H10O2 150.17885 GAGAAGCC ALDRICH
157 C11H14O2 178.23303 GACTTCCC SALOR
158 C9H10O2 150.17885 TCAGTCCC ALDRICH
159 C9H10O2 150.17885 AGTCTCCC ALDRICH
160 C10H12O2 164.20594 CTGATCCC ALDRICH


161 C10H12O2 164.20594 CGATGCCC ALDRICH
162 C10H10O2 162.19 ATCGGCCC ALDRICH
163 C12H10O2 186.2123 TAGCGCCC FLUKA
164 C12H10O2 186.2123 GCTAGCCC ALDRICH
165 C14H12O2 212.25054 TTTTCCCC ALDRICH
166 C15H14O2 226.27763 GGGGCCCC FLUKA
167 253.256 CCCCCCCC


168 154.139 AAAACCCC
169 202.208 ACGTACCC
170 170.138 CATGACCC
171 204.244 GTACACCC
172 202.208 TGCAACCC
173 C10H8N2O3 204.18686 ATATTACC SALOR
174 C11H10N2O4 234.21335 CGCGTACC SALOR
175 C10H12N4O4 252.23154 GCGCTACC SALOR


176 C12H16O2 192.26012 TATATACC SALOR
177 C8H8O3S 184.214 TCCTGACC
178 C16H15ClN4O4S 394.83935 GAAGGACC SALOR
179 C16H12N2O3 280.28564 CTTCGACC SALOR
180 C6H7N3O4S 217.20439 AGGAGACC SALOR
181 C11H10N2O3 218.21395 CAGTCACC SALOR
182 C15H14O5 274.27583 ACTGCACC SALOR


183 C11H14O2 178.23303 TGACCACC SALOR
184 C12H14N2O2 218.25758 GTCACACC SALOR
185 C10H8N2O3 204.18686 GGTTAACC SALOR
186 C9H8ClNO3 213.62201 TTGGAACC SALOR
187 C10H8F3NO3 247.17536 AACCAACC SALOR
188 C18H13ClO6 360.75371 CCAAAACC SALOR
189 C11H14N4O2S 266.32383 AATTTTAA SALOR


190 C10H3Cl4NO4 342.95171 CCGGTTAA SALOR
191 C10H9NO3S 223.25213 TTAATTAA SALOR
192 C9H8O2 148.16 TGGTGTAA ALDRICH
193 C10H8N2O3 204.184 GTTGGTAA
194 C10H12O2 164.21 CAACGTAA FLUKA
195 C11H14O2 178.23 ACCAGTAA Alfa Aesar
196 C9H9IO2 276.08 CTCTCTAA Trans World Chemicals
197 C8H13NO3 171.195 AGAGCTAA


198 C9H11NO4S 229.255 TCTCCTAA
199 C10H13NO2 179.2 GAGACTAA ALDRICH
200 C7H13NO2S2 207.31516 GCATATAA SALOR

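Because each row pairs a molecular formula with a molecular weight and an 8-nucleotide code, the table can be cross-checked mechanically. The following is a minimal Python sketch, not the procedure used in this work. It assumes standard IUPAC atomic weights rounded to three decimals and rows laid out as formula, MW, DNA code and an optional supplier. The names `molecular_weight` and `check_row` and the 0.05 g/mol tolerance are illustrative choices. The tolerance absorbs the small differences between these atomic weights and those used by the software that generated the MW column.

```python
import re

# Standard atomic weights (IUPAC values rounded to three decimals) for the
# elements that occur in the building-block formulas of this table.
ATOMIC_WEIGHTS = {
    "C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06,
    "F": 18.998, "Cl": 35.45, "Br": 79.904, "I": 126.904, "Na": 22.990,
}


def molecular_weight(formula: str) -> float:
    """Average molecular weight (g/mol) of a formula such as 'C9H8INO3'."""
    total = 0.0
    # Each match is an element symbol (e.g. 'C', 'Cl', 'Na') and an optional count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


def check_row(row: str, tol: float = 0.05) -> bool:
    """Check a row 'formula MW code [supplier]' for internal consistency.

    The listed MW must agree with the formula within `tol`, and the code
    must be an 8-mer over the alphabet A, C, G, T.
    """
    formula, mw, code = row.split()[:3]
    mw_ok = abs(molecular_weight(formula) - float(mw)) < tol
    code_ok = len(code) == 8 and set(code) <= set("ACGT")
    return mw_ok and code_ok
```

For example, `check_row("C7H13NO4 175.18596 GGAACCTT SALOR")` (entry 44) returns `True`. Applied to the row for entry 42, the check returns `False`: the listed 104.10739 corresponds to the free acid C4H8O3 rather than to the sodium salt C4H7NaO3 (about 126.09). Rows that lack a formula, such as entries 167 to 172, have to be skipped before calling the function.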
