Research Collection
Doctoral Thesis
DNA-encoded chemical libraries
Author(s): Mannocci, Luca
Publication Date: 2009
Permanent Link: https://doi.org/10.3929/ethz-a-005783014
Rights / License: In Copyright - Non-Commercial Use Permitted
DISS. ETH NO. 18153
DNA-Encoded Chemical Libraries
A dissertation submitted to the
ETH Zurich
For the degree of
Doctor of Sciences
Presented by
Luca Mannocci
Dott. Chim. Università degli Studi di Pisa
Born September 7, 1979
Citizen of Pisa (Italy)
Accepted on the recommendation of
Prof. Dr. Dario Neri, examiner
Prof. Dr. Karl-Heinz Altmann, co-examiner
Zurich, 2009
“I believe in intuition and inspiration. Imagination is more important
than knowledge. Knowledge is limited. Imagination embraces the
entire world, stimulating progress, giving birth to evolution.
It is, strictly speaking, a real factor in scientific research.”
Albert Einstein
To my family
TABLE OF CONTENTS
1. SUMMARY
RIASSUNTO
List of abbreviations
2. INTRODUCTION
2.1 DNA-Encoded Chemical Libraries
2.1.1 Libraries of DNA displaying one covalently linked chemical entity
2.1.1.1 DNA-encoded “Split-&-Pool”
2.1.1.2 DNA-assisted “Split-&-Pool”
2.1.1.3 DNA-templated synthesis
2.1.1.4 Stepwise coupling of coding DNA fragments to nascent organic molecules
2.1.2 DNA libraries displaying multiple covalently linked chemical entities: ESAC libraries
2.2 The decoding of DNA-encoded chemical libraries
2.2.1 Microarray-based decoding
2.2.2 Decoding by high throughput sequencing
2.2.2.1 “454” technology
2.2.2.2 Solexa technology
2.2.2.3 SOLiD technology
2.2.2.4 Single Molecule DNA Sequencing – Helicos technology
3. RESULTS
3.1 DNA-Encoded Library “DEL4000”
3.1.1 Library design and synthesis
3.1.2 Model Compounds
3.1.3 Oligonucleotides
3.1.4 Compounds
3.1.5 HPLC Purification
3.1.6 Mass Spectrometry
3.1.7 Oligonucleotide concentration determination
3.1.8 Polymerase Klenow encoding
3.1.9 Summary
3.2 Selections using the DEL4000 library
3.2.1 Streptavidin selection
3.2.1.1 Identification of streptavidin binding molecules
3.2.1.2 Characterization of streptavidin binding molecules
3.2.2 Polyclonal human IgG selection
3.2.2.1 Identification of polyclonal IgG binding molecules
3.2.2.2 Characterization of polyclonal IgG binding molecules by affinity chromatography resins
3.2.3 Matrix metalloproteinase 3 (MMP3) selection
3.2.3.2 Characterization of MMP3 binding molecules
3.2.4 Computational simulation of DEL4000 selections
3.3 General strategies for the stepwise construction of very large DNA encoded chemical libraries
3.3.1 Selective deprotection and reaction of di-amine derivatives
3.3.1.1 Orthogonal protective group and selective deprotection
3.3.1.2 Core scaffolds design and synthesis strategy
3.3.1.3 Model compounds for N-Fmoc, N’-Nvoc di-amino carboxylic acid core scaffold based library
3.3.2 Stepwise DNA-encoding
3.3.2 Encoding by ligation
3.3.2.1 Encoding by a combination of Klenow polymerase and ligation
3.3.2.2 Encoding by Klenow polymerase
3.3.3 Summary
4. DISCUSSION
5. MATERIAL AND METHODS
5.1 Reagents and general remarks
5.2 Synthesis of DEL4000 DNA Encoded Library
5.2.1 Synthesis of library model compounds oligonucleotide conjugate
5.2.2 Coupling reactions of 20 Fmoc-protected amino acids
5.2.3 Coupling reactions of 200 carboxylic acids
5.2.4 Polymerase Klenow encoding of 200 carboxylic acids reactions
5.2.5 Preparation of D-desthiobiotin oligonucleotide-conjugate (positive control)
5.3 Library DEL4000 selections
5.3.1 Streptavidin selection
5.3.1.1 Identification of binding molecules
5.3.1.2 Synthesis of the binding molecules as fluorescein conjugates
5.3.2 Affinity measurements
5.3.3 Polyclonal human IgG selection
5.3.3.1 Polyclonal human IgG coating of sepharose beads
5.3.3.2 Identification of human IgG binding molecules
5.3.3.3 Synthesis of affinity chromatography resin containing the compound 02-40 or 16-40
5.3.3.4 Polyclonal human IgG Cy5 labeling
5.3.3.5 Biotinylated polyclonal human IgG
5.3.3.6 Affinity chromatography of CHO cells supernatant spiked with human IgG Cy5 labeled or biotinylated human IgG on IgG binding resin
5.3.4 Human MMP3 selection
5.3.4.1 Human MMP3 coating of sepharose beads
5.3.4.2 Identification of human MMP3 binding molecules
5.3.4.3 Synthesis of the MMP3 binding molecules as fluorescein conjugates
5.3.5 Computational simulation
5.4 Stepwise coupling by selective deprotection and reaction of di-amine derivatives
5.4.1 DNA-compatible cleavage of different amino protective groups
5.4.1.1 Synthesis of 2-pent-4-enamido-cis-cyclopentanecarboxylic acid (1c)
5.4.1.2 Synthesis of N-Bpoc cis-2-aminocyclopentanecarboxylic acid (1d)
5.4.1.3 Synthesis of N-Nvoc cis-2-aminocyclopentanecarboxylic acid (1b)
5.4.1.4 Synthesis of 4-pentenoic N-hydroxy succinimide ester (1e)
5.4.1.5 Synthesis of Nα-Fmoc-Nε-Nvoc-lysine (2)
5.4.1.6 Oligonucleotide conjugation of N-protected cis-2-aminocyclopentanecarboxylic acid derivatives and Nα-Fmoc-Nε-Nvoc-lysine
5.4.1.7 Cleavage of 2-pent-4-enamido-cis-cyclopentanecarboxylic acid oligonucleotide conjugate
5.4.1.8 Cleavage of N-Bpoc cis-2-aminocyclopentanecarboxylic acid oligonucleotide conjugate
5.4.1.9 Cleavage of N-Nvoc cis-2-aminocyclopentanecarboxylic acid and N-Fmoc-N’-Nvoc-lysine oligonucleotide conjugate
5.4.2 Synthesis of model scaffolds for Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid derivative based library
5.4.2.1 Synthesis of (1R,3R,4R)-methyl 3-azido-4-Boc-amino-cyclopentanecarboxylate (4)
5.4.2.2 Synthesis of (1S,3R,4R)-methyl 3-amino-4-Boc-amino-cyclopentanecarboxylate (5)
5.4.2.3 Synthesis of (1S,3R,4R)-methyl 3-Fmoc-amino-4-Boc-amino-cyclopentanecarboxylate (6)
5.4.2.4 Synthesis of (1S,3R,4R)-methyl 3-Fmoc-amino-4-Nvoc-amino-cyclopentanecarboxylate (8)
5.5 Stepwise encoding
5.5.1 Stepwise encoding by Ligation
5.5.2 Stepwise encoding by a combination of Klenow polymerase and Ligation
5.5.3 Stepwise encoding by Klenow Polymerase
5.5.4 Stepwise coupling and encoding of model compound for Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid derivative based library
5.5.5 Bacterial cloning and sequencing
6. REFERENCES
7. CURRICULUM VITAE
8. ACKNOWLEDGMENTS
9. APPENDIX
9.1 Model compounds oligonucleotide conjugate
9.2 Library synthesis overview
1. SUMMARY
The isolation of small organic molecules capable of specific binding to biological
targets is a central problem in chemistry, biology and pharmaceutical sciences.
Consequently, there is a considerable interest in the development of powerful and
convenient technologies for the construction of large sets (“libraries”) of chemical
compounds and of novel screening methodologies for the identification of binding
molecules. DNA-encoded chemical libraries represent an innovative approach to the
construction and screening of libraries of unprecedented dimension and quality. Such
libraries consist of a collection of chemical compounds, each individually coupled to
a distinctive DNA fragment which serves as an identification bar code. DNA-encoded
chemical libraries can be "panned" on a target protein immobilized on a solid support.
Typically, high-throughput sequencing reveals the different composition of the library
before and after panning, thus allowing the identification of binding molecules to the
target protein of interest. In this respect, DNA-encoded chemical libraries bear a
logical similarity to phage display libraries of proteins and peptides, in which the
binding specificity displayed on the tip of the phage surface (“phenotype”) is
physically linked to the gene coding for the polypeptide (“genotype”).
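The comparison of library composition before and after panning can be illustrated with a minimal Python sketch. It assumes that sequencing has already been reduced to one code identifier per read. The code names, the read counts and the pseudocount are hypothetical and are not taken from the experiments described in this thesis.

```python
from collections import Counter

def enrichment(before_reads, after_reads, pseudocount=1.0):
    """Return {code: fold change in relative frequency after panning}.

    The pseudocount is an assumption of this sketch: it keeps the ratio
    finite for codes that are absent from one of the two read sets.
    """
    before = Counter(before_reads)
    after = Counter(after_reads)
    codes = set(before) | set(after)
    n_before = len(before_reads) + pseudocount * len(codes)
    n_after = len(after_reads) + pseudocount * len(codes)
    result = {}
    for code in codes:
        f_before = (before[code] + pseudocount) / n_before
        f_after = (after[code] + pseudocount) / n_after
        result[code] = f_after / f_before
    return result

# Hypothetical reads: code "B" becomes more frequent after panning.
before_reads = ["A"] * 10 + ["B"] * 10 + ["C"] * 10
after_reads = ["A"] * 2 + ["B"] * 26 + ["C"] * 2
fold = enrichment(before_reads, after_reads)
top = max(fold, key=fold.get)
print(top, round(fold[top], 2))
```

Codes whose fold change is well above 1 are candidates for binding molecules. A real analysis would also have to account for sequencing depth and sampling noise, which this sketch ignores.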
In the first part of this thesis, I present a general strategy for the stepwise coupling
of coding DNA fragments to nascent organic molecules following individual reaction
steps, as well as the implementation of high-throughput sequencing for the
identification and relative quantification of library members. The methodology was
exemplified in the construction of a DNA-encoded chemical library containing 4’000
compounds (DEL4000) covalently attached to unique DNA-fragments serving as
amplifiable identification bar-codes. We have also assessed the relative composition
of the new library and its functionality by performing selection experiments on
sepharose resin coated with streptavidin. This study has led to the identification of
novel chemical compounds with submicromolar dissociation constants towards
streptavidin. Moreover, we have found that selections can conveniently be decoded
using a recently described high-throughput DNA sequencing technology (termed “454
technology”), originally developed for genome sequencing.
In a second selection experiment, binding molecules to polyclonal human IgG were
identified. I could show that, upon coupling to resin, these compounds could be used
for the affinity purification of human IgG from culture supernatants.
Furthermore, we carried out a selection against the catalytic domain of human
matrix metalloproteinase 3 (MMP3). Matrix metalloproteinases (MMPs) are
zinc-dependent proteases which are involved in tissue remodelling in a variety of
physiological and pathological processes. The selection facilitated the identification of
a binding compound with a dissociation constant in the low μM range.
Encouraged by these results, we investigated methodologies for the construction of
very large DNA-encoded chemical libraries, featuring the stepwise addition of at least
three independent sets of chemical moieties onto an initial scaffold, using suitable
orthogonal chemical reactions and/or protecting strategies, followed by the sequential
addition of the corresponding DNA codes. Our experiments have shown that it should
be possible to construct DNA-encoded libraries containing over one million
individual chemical compounds. The construction of such libraries is currently in
progress.
RIASSUNTO (Italian summary, translated)
The isolation of organic substances able to interact specifically with biological
targets is a crucial problem in chemistry, biology and the pharmaceutical field.
Consequently, there is growing interest in developing new, fast and efficient
technologies for the construction and screening of large collections (“libraries”) of
organic compounds. DNA-encoded chemical libraries represent an innovative and
elegant solution to this problem. In essence, these techniques involve the
construction of libraries of organic compounds in which each member is covalently
conjugated to a specific DNA fragment that unambiguously “encodes” its identity.
The selection of compounds of interest with specific biological activities
(“screening”) from DNA-encoded libraries can therefore be carried out easily, for
example by incubating the library with the appropriate biological target immobilized
on a solid support. After the non-binding compounds have been removed by
appropriate washing of the support, modern high-throughput sequencing techniques
make it possible to sequence the specific DNA codes, to determine the composition of
the library before and after selection and, consequently, to identify the compounds
that actually interact with the biological target of interest. From this point of view,
DNA-encoded chemical libraries bear an intrinsic analogy to the phage libraries used
in phage display, in which each protein or peptide (“phenotype”) is physically
associated with the corresponding coding gene (“genotype”).
The first part of this thesis describes a general strategy for the construction of
DNA-encoded chemical libraries and the implementation of high-throughput
sequencing techniques for the identification and relative quantification of library
members before and after selection. The methodology is exemplified here by the
construction of a DNA-encoded chemical library containing 4’000 compounds
(DEL4000), each uniquely identified by a specific, covalently conjugated DNA
oligonucleotide. The relative composition of the library and its functionality were then
determined by performing selection experiments with streptavidin immobilized on
sepharose resin. These studies led to the identification of new chemical compounds
with submicromolar dissociation constants towards streptavidin. They also showed
that high-throughput sequencing techniques (termed “454 technology”), originally
developed for genome sequencing, can be used effectively to decode selections.
In a second selection using DEL4000, compounds specific for polyclonal human IgG
were identified. It was then shown that these compounds, once immobilized on a
chromatographic resin, can be used for the affinity purification of human IgG from
cell culture supernatants.
Finally, a selection was carried out to identify new compounds specific for the
catalytic domain of human matrix metalloproteinase 3 (MMP3). Matrix
metalloproteinases (MMPs) are a family of zinc-dependent proteases involved in
tissue remodelling in a variety of physiological and pathological processes. The
selection allowed the identification of a compound with a micromolar dissociation
constant.
Encouraged by these results, we decided to investigate further the construction of a
larger DNA-encoded chemical library, involving the sequential joining of at least three
independent sets of compounds by means of orthogonal chemical reactions and/or
protection/deprotection strategies for functional groups, followed by the introduction
of the corresponding DNA codes. The feasibility of constructing a DNA-encoded
chemical library containing over one million compounds was thus demonstrated. The
construction of this library (DEL10e6) is currently in progress.
List of abbreviations
aq. aqueous
ATP Adenosine-5'-triphosphate
bp base pair
CAII Carbonic Anhydrase II
CHO Chinese Hamster Ovary
CNBr Cyanogen bromide
Cy3 2-((1E,3E)-3-(1-(5-(2,5-dioxopyrrolidin-1-yloxy)-5-oxopentyl)-3,3-dimethylindolin-2-ylidene)prop-1-enyl)-3,3-dimethyl-1-propyl-3H-indolium
Cy5 2-((1E,3E,5E)-5-(1-(5-(2,5-dioxopyrrolidin-1-yloxy)-5-oxopentyl)-3,3-dimethylindolin-2-ylidene)penta-1,3-dienyl)-1,3,3-trimethyl-3H-indolium
DCM Dichloromethane
DEL DNA Encoded Library
DIEA N,N-Diisopropylethylamine
DMBAA Dimethylbutylammonium acetate
DMF N,N-Dimethylformamide
DMSO Dimethyl sulfoxide
DMT dimethoxytrityl
DNA Deoxyribonucleic acid
dNTPs deoxyribonucleotides
DTT Dithiothreitol
ECM Extracellular Matrix
EDC N-ethyl-N'-(3-dimethylaminopropyl)-carbodiimide
EDTA Ethylenediaminetetraacetic acid
equiv. equivalent
ESAC Encoded Self-Assembling Chemical library
ESI Electrospray ionization
FG Functional Group
FITC Fluorescein isothiocyanate
Fmoc (9-fluorenylmethoxycarbonyl)
HATU O-(7-Azabenzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate
HBTU 2-(1H-benzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate
HFIP 1,1,1,3,3,3-hexafluoroisopropanol
HOBt N-hydroxybenzotriazole
HPLC High Performance Liquid Chromatography
HSA Human Serum Albumin
HTS High Throughput Screening/Sequencing
IgG Immunoglobulin G
Kd Dissociation constant
LC Liquid Chromatography
MMP3 human Matrix MetalloProteinase 3
MS Mass Spectrometry
ND NanoDrop
NHS N-hydroxysuccinimide
NMR Nuclear magnetic resonance
Nvoc 4,5-dimethoxy-2-nitrobenzyloxycarbonyl
PAGE Polyacrylamide gel electrophoresis
PBS Phosphate buffered saline
PCR Polymerase Chain Reaction
Prep Preparative
RNA Ribonucleic acid
RP Reverse Phase
SDS sodium dodecyl sulfate
SNP Single Nucleotide Polymorphism
SOLiD Sequencing by Oligonucleotide Ligation and Detection
sst single strand
TBE Tris-borate-EDTA
TEAA Triethylammonium acetate
THF Tetrahydrofuran
TFA Trifluoroacetic acid
TFE Trifluoroethanol
Tris 2-Amino-2-hydroxymethyl-propane-1,3-diol
tSMS True Single Molecule Sequencing
Tween 20 Polyoxyethylene (20) sorbitan monolaurate
UV Ultraviolet
2. INTRODUCTION
The discovery of molecules binding to macromolecular targets is a formidable task in
chemistry, biology and pharmaceutical sciences. Following the sequencing of the
human genome [1,2] and the advances in proteome research [3,4] and
transcriptomics [5], a multitude of biological targets associated with relevant
processes in healthy and diseased cells have been discovered. With an aging
population and an increased understanding of the mechanisms of disease at a
molecular level, biomedical scientists are facing the demand for more and better
drugs. Additionally, elucidation of the biological function of proteins will, in many
cases, require access to specific ligands (an approach that is often termed ‘Chemical
Genetics’ [4]). Specific binding to the biological target is not per se sufficient to turn
a binding molecule into a drug, as it is widely recognized that other molecular
properties (such as pharmacokinetic behaviour and stability) contribute to the
performance of a drug. Nevertheless, the isolation of specific binders against a
relevant biological target typically represents the starting point in the process that
leads to a new drug [6].
Techniques for the general, fast and inexpensive isolation of small organic binding
compounds are lacking at present. Currently, hundreds of thousands of molecules
typically have to be screened in order to find a suitable candidate [6].
High-throughput screening (HTS) in certain cases allows the screening of some
100,000 compounds per day. However, HTS is cumbersome both in terms of costs
(for robotic equipment and material consumption) and technical development (set-up
of sophisticated bioassays, storage and handling of the chemical archives). Similarly,
the preparation, storage and screening of very large synthetic libraries of organic
molecules can be very demanding, not only from the synthetic point of view, but also
in terms of logistics. Although combinatorial synthetic approaches such as the
“split-&-pool” methods [7,8,9] and solid phase synthesis [10,11,12] have facilitated
the construction of chemical pools of compounds, the difficulty of identifying the
specific binding molecules inevitably grows with the size of the chemical library to be
screened, while the relative concentration of each individual member in the library
decreases. Consequently, chemical libraries screened as pools of compounds are often
limited in size by the sensitivity limits of biochemical assays and of the chemical
analytical methods used for structural characterization.
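The inverse relation between pool size and the concentration of each member can be made concrete with a short Python sketch. It assumes an equimolar pool at a fixed total compound concentration. The 1 mM total and the three library sizes are hypothetical values chosen only for illustration.

```python
def per_member_concentration(total_molar, library_size):
    """Concentration of each member in an equimolar pool (mol/L)."""
    if library_size < 1:
        raise ValueError("library_size must be at least 1")
    return total_molar / library_size

TOTAL = 1e-3  # hypothetical total compound concentration, 1 mM
for size in (10**3, 10**5, 10**7):
    conc = per_member_concentration(TOTAL, size)
    print(f"{size:>10,d} members -> {conc:.1e} M each")
```

Under these assumptions every hundredfold increase in library size lowers the concentration of each member a hundredfold, which is why assay sensitivity limits the size of a pooled library.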
Over the last decade, the interest in the development of powerful and convenient
technologies for chemical library construction and screening has increased
dramatically. Techniques such as phage display [13,14], yeast display [15], ribosome
display [16] and covalent display [17] physically link each displayed polypeptide to
the nucleic acid that encodes it. In this light, it would be useful to devise strategies for
the identification of small organic molecules capable of binding to target proteins
with high affinity and specificity, based on the association of individual chemical
compounds with unique DNA fragments serving as identification bar-codes.
2.1 DNA-Encoded Chemical Libraries
The concept of DNA-encoding was first described in a theoretical paper by Brenner
and Lerner in 1992, who anticipated a “split-&-pool”-based combinatorial synthesis in
which monomeric chemical compounds and coding oligonucleotide tags would be
attached to beads in an alternating fashion (Figure 2-1) [18]. Shortly afterwards, the
first practical implementation of this approach was presented by S. Brenner and
K. Janda [19] and similarly by the group of M.A. Gallop [20]. Brenner and Janda
proposed generating individual encoded library members by alternating parallel
combinatorial synthesis of the heteropolymeric chemical compound and the
appropriate oligonucleotide sequence on the same bead in a “split-&-pool” fashion,
using the solid support as a structural linker between the nascent chemical entity and
its corresponding oligonucleotide label. As a test system, they developed the synthesis
of a functionally active leucine-enkephalin pentapeptide, with the aim of testing the
feasibility of alternating peptide and oligonucleotide synthesis on the bead. In total,
they accomplished five alternating rounds of peptide and oligonucleotide
synthesis [19].
[Figure 2-1, graphic not reproduced. Panel labels: Split / Pool; amino acids aa1, aa2, aa3 ... aan, each paired with a coding tag tag1, tag2, tag3 ... tagn; 1. m rounds of split-&-pool; 2. release from beads; n^m compounds.]
Figure 2-1: Schematic representation of the DNA-encoding of peptides on beads. The coupling of
amino acids by peptide-forming reactions to a growing peptidic chain, alternated with the stepwise
synthesis of a DNA bar-code, leads to DNA-encoded beads displaying peptides, which can be probed for
binding to a selected target protein of interest. ‘aa’ represents the different amino acids, while ‘tag’ refers
to a DNA sequence encoding the corresponding amino acid added in the split-&-pool procedure.
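The bookkeeping behind the scheme in Figure 2-1 can be sketched in a few lines of Python. The sketch enumerates every compound that m split-&-pool rounds with n building blocks can produce, together with the concatenated code that records its synthesis history. The building-block names and the two-letter tags are hypothetical and do not correspond to any tags used in the cited work.

```python
from itertools import product

def split_and_pool(building_blocks, rounds):
    """Enumerate every (compound, code) pair produced by `rounds` cycles.

    building_blocks maps a building-block name to its coding tag.
    In each cycle every pool receives one building block and its tag,
    so the final pool holds all len(building_blocks) ** rounds combinations.
    """
    library = []
    for combo in product(building_blocks, repeat=rounds):
        compound = "-".join(combo)
        code = "".join(building_blocks[name] for name in combo)
        library.append((compound, code))
    return library

# Hypothetical building blocks and tags, chosen only for illustration.
blocks = {"aa1": "AC", "aa2": "GT", "aa3": "TG"}
library = split_and_pool(blocks, rounds=2)
print(len(library))      # number of distinct compounds
print(library[:3])
```

Because the tag is appended in the same pool as the building block, reading the code from left to right recovers the order in which the building blocks were added.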
Controlled-pore glass was used as a solid matrix to facilitate an efficient
oligonucleotide synthesis. The solid support was derivatized with a succinyl
aminohexanol-sarcosine appendage that allowed the easy detachment of the
oligonucleotide-encoded peptide after synthesis (Figure 2-2a). In order to fulfill
orthogonality requirements, O-DMT-protected serine and N-Fmoc-protected lysine
scaffolds were used for the attachment of the emerging oligonucleotide and peptide
sequences, respectively (Figure 2-2a). The oligonucleotide-tagged peptides were
released from the beads and Edman-sequenced. The leucine-enkephalin pentapeptide
(YGGFL) constructed in this fashion (Figure 2-2b) was shown to bind to the
anti-leucine-enkephalin antibody 3-E7 as efficiently as the reference peptide [21]
(Kd = 7.1 nM). Remarkably, the codes of released oligonucleotide-tagged peptides
could be amplified by standard polymerase chain reaction (PCR).
[Figure 2-2, graphic not reproduced. a) Structure of the derivatized support, with the labels: site for the nascent peptide, site for the oligonucleotide code, cleavage site. b) 5’-AGCTACTTCCCAAGG GAG CTG CTG CTA GTC GGGCCCTATTCTTAG-3’ joined through a linker to the peptide, with the labels: PCR priming site, peptide sequence.]
Figure 2-2: Derivatized solid support allows the oligonucleotide encoding of a nascent peptide
sequence. a) Schematic representation of the derivatized support with succinyl aminohexanol-sarcosine
cleavable appendage. The cleavable linker enables the easy detachment of the oligonucleotide-encoded
peptide after synthesis, while O-DMT-protected serine and N-Fmoc-protected lysine allow the
bidirectional synthesis of oligonucleotide and peptide sequences. Recent approaches to DNA-encoded
chemical libraries prefer to omit the beads and link the compounds directly to DNA.
b) Leucine-enkephalin pentapeptide (YGGFL) oligonucleotide conjugate after release from beads. The codes of
released oligonucleotide-tagged peptides could be amplified by standard polymerase chain reaction
(PCR). Leucine-enkephalin pentapeptide was shown to bind to the anti-leucine-enkephalin antibody
3-E7 as efficiently as the reference peptide (Kd = 7.1 nM).
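How such a code could be read after PCR amplification can be sketched in Python using the sequence shown in Figure 2-2b. The sketch assumes that the two flanking 15-nucleotide segments are the PCR priming sites and that the region between them is read in triplets. Both points are inferred from the spacing and labels of the figure, and the function name is hypothetical. The assignment of triplets to amino acids is not given here, so the sketch does not attempt one.

```python
def split_code(sequence, left_primer, right_primer, codon_length=3):
    """Strip the two priming sites and cut the coding region into codons."""
    if not (sequence.startswith(left_primer) and sequence.endswith(right_primer)):
        raise ValueError("priming sites not found at the ends of the sequence")
    coding = sequence[len(left_primer):len(sequence) - len(right_primer)]
    if len(coding) % codon_length:
        raise ValueError("coding region is not a whole number of codons")
    return [coding[i:i + codon_length] for i in range(0, len(coding), codon_length)]

# Segments as printed in Figure 2-2b; their roles are an assumption of this sketch.
LEFT = "AGCTACTTCCCAAGG"
RIGHT = "GGGCCCTATTCTTAG"
SEQUENCE = LEFT + "GAGCTGCTGCTAGTC" + RIGHT
codons = split_code(SEQUENCE, LEFT, RIGHT)
print(codons)
```

The coding region splits into five triplets, one per residue of the pentapeptide. The second and third triplets are identical, which would be consistent with the two adjacent glycines of YGGFL, although the codon assignment itself is not stated in the text.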
In the same year, Gallop and co-workers constructed an 823’543-member
DNA-encoded heptapeptide library by performing seven alternating split-&-pool
synthesis cycles on spherical beads using seven different D- and L-amino acid
building blocks [20]. Beads were conjugated with a mixture of two different linkers:
one, bearing a DMT-protected hydroxyl group, served for the stepwise nucleotide
addition, while the other, bearing an Fmoc-protected amine and present in ca. 20-fold
excess over the first, was used for building up the polypeptide. After removal of the
Fmoc group, the beads were uniformly split into seven pools and each pool was
reacted with one of the seven amino acid building blocks. A dinucleotide coding tag
was then synthesized on the beads of each pool, and this process was repeated until
the heptapeptide had been obtained. An additional oligonucleotide sequence was
attached to all beads to allow PCR-based decoding. Because the final oligonucleotide
cleavage in trifluoroacetic acid would lead to the depurination of deoxyguanosine and
deoxyadenosine, these nucleosides were deliberately excluded from the
oligonucleotide. The final library was subjected to on-bead screening against the
fluorescent monoclonal antibody D32.39, which specifically binds the heptapeptide
RQFKVVT. The corresponding oligonucleotide sequence could be revealed after
FACS-based sorting and PCR.
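The library size quoted for this heptapeptide library follows directly from the split-&-pool design, as the short calculation below shows. The symbols n (building blocks per cycle), m (number of cycles), \ell (nucleotides per tag) and L_code (length of the coding region) are introduced here only for illustration.

```latex
N = n^{m} = 7^{7} = 823\,543,
\qquad
L_{\text{code}} = m \cdot \ell = 7 \cdot 2 = 14 \ \text{nucleotides}
```

The 14-nucleotide figure counts only the dinucleotide tags. The additional sequence attached for PCR-based decoding is not included, because its length is not given here.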
Since unprotected DNA is restricted to a narrow window of conventional reaction
conditions, a number of alternative chemical and physical encoding strategies were
envisaged until the end of the 1990s (e.g. MS-based compound tagging, peptide
encoding, haloaromatic tagging, encoding by secondary amines, semiconductor
devices) [22], mainly to avoid inconvenient solid phase DNA synthesis and to create
combinatorial libraries that are easily screenable in a high-throughput fashion.
There is considerable evidence that the isolation of binding polypeptides (e.g.
antibodies) requires libraries comprising at least 10⁷-10⁸ members23. In full analogy,
it appears reasonable to assume that large libraries will facilitate the isolation of small
organic binders to a protein of interest. However, using conventional methods, even the
largest pharmaceutical companies cannot screen more than a few hundred thousand
compounds in an HTS campaign. The selective amplifiability of DNA greatly facilitates
library screening and becomes indispensable for the encoding of organic
compound libraries of this unprecedented size. Consequently, at the beginning of the
2000s DNA-encoded combinatorial chemistry experienced a revival.
Around 2002 several groups realized that omitting beads and attaching chemical
compounds directly to oligonucleotides or DNA fragments could conveniently lead to
very large DNA-encoded chemical libraries (DEL), and the set-up of such libraries was
pursued along completely novel avenues. The resulting libraries can be grouped into
those presenting multiple paired oligonucleotides, each displaying a covalently linked
putative binding molecule, and those presenting a single oligonucleotide covalently
linked to a putative binding molecule (Figure 2-3).
[Figure 2-3, panel labels: a) multiple pharmacophore format; b) single pharmacophore format]
Figure 2-3: Schematic representation of DNA-encoded library displaying chemical compounds
directly attached to oligonucleotides. a) DNA-encoded library presenting multiple pairing
oligonucleotides each displaying a covalently linked binding molecule. b) DNA-encoded library
presenting a single oligonucleotide covalently linked to a putative binding molecule.
2.1.1 Libraries of DNA displaying one covalently linked chemical entity
2.1.1.1 DNA-encoded “Split-&-Pool”
An alternative strategy to construct DNA-encoded libraries, in full analogy with the
encoded “split-&-pool” technique described by Brenner and Lerner18, features the
synthesis of chemical compounds directly on the oligonucleotide, omitting the use of
the solid support (i.e., beads) (Figure 2-4). Initially a set of unique oligonucleotides,
each containing a specific sequence, is chemically conjugated to a corresponding set of
small organic molecules carrying a suitable reactive group. Typically a carboxylic
acid is coupled to an amino-modified oligonucleotide. Subsequently the
oligonucleotide-compound conjugates are mixed and divided into a number of groups.
[Figure 2-4 scheme: oligonucleotides carrying a reactive site (x) are split (Split 1), coupled to building blocks bb1 ... bbn and encoded with tag1 ... tagn, then pooled (Pool 1); m rounds of split-&-pool yield nᵐ compounds]
Figure 2-4: Schematic representation of hypothetical DNA-encoded libraries of linear peptides
constructed in a split-&-pool fashion omitting bead support. An initial building block is conjugated to
an oligonucleotide and encoded with a further set of oligonucleotides, either by ligases or by polymerases.
Subsequently the oligonucleotide-compound conjugates are mixed, divided into a number of groups
and reacted with an additional building block. Following encoding, these steps are repeated a
given number of times. ‘bb’ represents the different building blocks, while ‘tag’ refers to a DNA
sequence encoding the corresponding amino acid added in the split-&-pool procedure.
Under appropriate conditions a second set of building blocks is coupled to the first one,
and a further oligonucleotide coding for the second modification is
hybridized to the initial oligonucleotide and enzymatically encoded, either by ligases or
by polymerases. In a “split-&-pool” fashion these steps are then repeated. In 2002 the
Danish company Nuevolution and the US company Praecis filed patent applications
for proprietary enzymatic ligation strategies for DNA code assembly enabling
sequential chemical synthesis and DNA-tagging steps.24,25,26,27 Thus far, neither
company has described a practical library application in the literature.
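The bead-free split-&-pool procedure described in this section can be sketched as a small simulation. This is a simplified illustration, not the protocol of any of the cited groups: the function name, the building block names and the four-base tags are hypothetical, and the coupling and encoding of each round are collapsed into one step.

```python
def split_and_pool(building_blocks, tags, rounds):
    """Simulate bead-free split-&-pool synthesis with DNA encoding.

    Each library member is a (compound, code) pair. In every round the pool
    is split into one portion per building block, each portion receives its
    building block and the matching tag, and all portions are pooled again.
    """
    pool = [((), "")]  # start: bare oligonucleotide, no building block yet
    for _ in range(rounds):
        next_pool = []
        for bb, tag in zip(building_blocks, tags):                # split
            for compound, code in pool:
                next_pool.append((compound + (bb,), code + tag))  # react and encode
        pool = next_pool                                          # pool
    return pool


# Hypothetical building blocks and 4-base tags (not taken from the thesis).
bbs = ["bb1", "bb2", "bb3"]
tags = ["ACGT", "TGCA", "GATC"]
library = split_and_pool(bbs, tags, rounds=2)

print(len(library))                       # 9, i.e. 3**2 compounds

# Decoding: the concatenated tags identify the synthetic history.
decode = {code: compound for compound, code in library}
print(decode["TGCAGATC"])                 # ('bb2', 'bb3')
```

With n building blocks and m rounds the pool contains nᵐ members, each carrying a unique concatenated code, which is the relationship summarized in Figure 2-4.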
2.1.1.2 DNA-assisted “Split-&-Pool”
In 2004, D.R. Halpin and P.B. Harbury presented a novel intriguing method for the
construction of DNA-encoded libraries.28 For the first time the DNA-conjugate
templates served for both encoding and programming the infrastructure of the “split-
&-pool” synthesis of the library components. The design of Halpin and Harbury
enabled alternating rounds of selection, amplification and diversification with small
organic molecules, in complete analogy to phage-display technology.
In a further milestone paper on DNA-encoded chemical libraries, Halpin and Harbury
demonstrated the efficiency of unique DNA-routing machinery, consisting of series-
connected columns bearing resin-bound anticodons, which could sequence-
specifically separate a population of DNA-templates into spatially distinct locations
by hybridization (termed DNA-routing; Figure 2-5).28 A combinatorial library of 340-mer
oligonucleotide templates was constructed in two steps by PCR assembly of
overlapping complementary 40-mer oligonucleotides, each containing a 20-base
coding region and an adjacent 20-base non-coding constant region. A 10⁸-member
340-mer DNA-duplex template library was thereby obtained, which was further converted into
single-stranded DNA format by reverse-transcription and sodium hydroxide
hydrolysis of the RNA strand. These templates were used for investigating the
feasibility of sequence-specific gene routing. A number of anticodon columns were
produced in which the anticodon sequences to the template genes were covalently
coupled to sepharose resin. In high salt conditions, the template genes hybridized
sequence-specifically to the corresponding anticodon columns connected in series.
The individual sequence-specific columns were then joined in series with weak anion-
exchange (DEAE) columns. When changing the conditions from high salt to low salt
and 50% DMF, the oligonucleotides were eluted from the anticodon columns and
could bind to the DEAE columns, where chemical reactions can take place. Following
elution from the DEAE columns in high salt conditions, the combined DEAE column
eluates were split again by sequence-specific columns, thus entering a new cycle of
“split-&-pool” synthesis. Using a radioactively labelled 340-mer template the authors
showed that the routing was indeed both sequence-specific and efficient (>95% for
anticodon to DEAE column and >90% for DEAE to anticodon column), resulting in
an overall yield of 0.85ⁿ for n hybridization rounds. Furthermore, the anticodon
columns proved to be reusable for at least 30 rounds of hybridization and elution.
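The overall routing yield quoted above is the product of the two transfer efficiencies, compounded over the number of rounds. The short sketch below reproduces this arithmetic; it treats the reported lower bounds (95% and 90%) as point values, which is an assumption, and the function name is illustrative.

```python
def routing_yield(rounds, anticodon_to_deae=0.95, deae_to_anticodon=0.90):
    """Cumulative fraction of template recovered after `rounds` routing cycles."""
    # One cycle = anticodon column -> DEAE column -> anticodon column.
    per_round = anticodon_to_deae * deae_to_anticodon   # 0.855, quoted as 0.85
    return per_round ** rounds


for n in (1, 3, 6):
    print(n, f"{routing_yield(n):.3f}")
```

Under these assumptions roughly 40% of the template would survive six rounds of routing.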
[Figure 2-5 scheme: template library with coding regions (a-j)1 to (a-j)6 separated by constant domains z1 to z7 and carrying a free amino group (NH2); Split on anticodon resins a*1, b*1 ... j*1; Coupling; Pool; 6 rounds of split-&-pool]
Figure 2-5: Synthesis of a DNA-encoded chemical library by ‘DNA-routing’. The initial
oligonucleotide template contains six coding regions for ten different amino acids [(a-j)1-6] as well as
seven constant domains (z1-7). The library of coding oligonucleotides, comprising all the possible
combinations of the different coding regions was split by affinity chromatography using specific
complementary oligonucleotides bound on resin [(a*-j*)1-6]. Following separation, each
oligonucleotide template was conjugated to the corresponding amino acid and subsequently pooled
together. The whole cycle was repeated six times in total, yielding a library of DNA-encoded
hexapeptides.
According to this split-and-pool protocol (Figure 2-5), a combinatorial library
composed of 10⁶ N-acylated pentapeptides conjugated to 340-mer oligonucleotides was
generated.29 Ten different amino acid building blocks were used for the peptide positions
and nine carboxylic acids for the N-acylation step. The library included acylated
leucine-enkephalin pentapeptides as positive control. After conversion into a DNA
duplex form, the library was subjected to an affinity-based selection against the
monoclonal antibody 3-E7, which was known to bind the leucine-enkephalin
pentapeptide YGGFL with 7.1 nM affinity30 (the same selection system was used by
Brenner and Janda in 1993)19. Two iterated cycles of panning were performed. The
DNA eluted from the first round was PCR-amplified and used as input for the
following round of synthesis and selection. Sequencing of both the input DNA and the
DNA eluted after two rounds of panning demonstrated a strong round-to-round
enrichment of leucine-enkephalin pentapeptide DNA conjugates, leading to a
consensus sequence matching leucine-enkephalin.
To confirm that the coding sequences did not bias the synthesis of leucine-enkephalin
DNA-conjugates, an analogous DNA-pentapeptide library was constructed, differing
only in the coding sequences. Selections performed with this library also showed a
10⁵-fold enrichment of the leucine-enkephalin encoded compound.
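To put an enrichment of this magnitude into perspective, the enrichment factor can be expressed as the ratio of the relative frequency of a code after selection to its relative frequency before selection. The sketch below assumes this common definition, which is not spelled out in the text, and uses purely hypothetical counts.

```python
def enrichment_factor(count_before, total_before, count_after, total_after):
    """Fold change in the relative frequency of one code upon selection."""
    freq_before = count_before / total_before
    freq_after = count_after / total_after
    return freq_after / freq_before


# Hypothetical numbers: one member of a 10**6-member library that makes up
# 10 out of 100 codes recovered after selection.
fold = enrichment_factor(1, 10**6, 10, 100)
print(f"{fold:.1e}")
```

In this illustration a compound initially present at one part in a million would account for about one tenth of the recovered codes after a 10⁵-fold enrichment.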
This novel embodiment of “split-&-pool” library construction, together with the
possibility of chemical translation and diversification, holds promise for the
construction of large DNA-encoded chemical libraries. While the set-up of the routing
technology seems tedious at first glance, exponentially larger libraries can be
constructed with only a linear increase of work. Yet, chemistry has so far been limited
to peptide synthesis. In an additional publication, Harbury and co-workers describe
the feasibility and efficiency of solid phase peptide synthesis on unprotected DNA.31
Yields over 90% per individual coupling step could be achieved, which might be
sufficient for the construction of large libraries. Future selection experiments will reveal
whether the accumulation of synthesis failure sequences from step to
step encumbers the identification of the best binders. From a drug discovery
point of view, the linear peptides which so far have been produced by this approach
may not represent the drug-like structures the pharmaceutical industry is interested in.32
Nonetheless the potential of this technology can probably be increased by enlarging
the repertoire of building blocks and expanding the range of chemical reactions.
2.1.1.3 DNA-templated synthesis
In 2001 David Liu and co-workers showed that complementary DNA
oligonucleotides can be used to assist certain synthetic reactions, which do not
efficiently take place in solution at low concentration.33,34 At the same time,
Summerer and Marx demonstrated that the use of reagents in close spatial proximity
may lead to an enhancement of reaction rates.35 Indeed, a DNA-heteroduplex can be
used to accelerate the reaction between chemical moieties displayed at the extremities
of the two DNA strands.33,34 D.R. Liu and coworkers were the first to show an
efficient series of solution-phase DNA sequence-programmed chemical reactions. In
these reactions, oligonucleotides carrying one chemical reactant group are hybridized
to complementary oligonucleotide derivatives carrying a different reactive chemical
group (Figure 2-6).36 The close proximity conferred by the DNA hybridization
drastically increases the effective molarity of the reaction reagents attached to the
oligonucleotides, enabling the desired reaction to occur even in an aqueous
environment at concentrations which are several orders of magnitude lower than those
needed for the corresponding conventional (non-DNA-templated) organic reaction.36 A
variety of oligonucleotide-derivatives can be paired and can be used to discover novel
chemical reactions.36,37
Figure 2-6: DNA sequence-programmed chemical reactions: schematic overview of the reactions
compatible with the ‘DNA-templated synthesis’ approach. The close proximity conferred by the DNA
hybridization drastically increases the effective molarity of the reaction reagents attached to the
oligonucleotides, enabling the desired reaction to occur. (Adapted from Li, X. and Liu, D.R.36)
To a certain extent, this proximity effect, which accelerates bimolecular reactions, is
distance-independent (at least within a distance of 30 nucleotides), allowing the
introduction of variable DNA coding regions at different positions on the
oligonucleotide template. These DNA-templated reactions can be performed in multiple
consecutive steps38 and in step-programmed fashion39. Crucially, by linking chemical
compounds directly to DNA, a linkage of phenotype and genotype may be
established, in full analogy to protein display methodologies. Subsequently the
information content can be amplified by PCR after affinity capture. In a later step,
sequence-programmed synthesis of DNA-conjugates may facilitate library
amplification after selection. The selection efficiency which could be achieved with
DNA-encoded binding molecules and affinity captures, was investigated by
performing selections on glutathione S-transferase with suitable inhibitors, revealing
enrichment factors of the cognate DNA derivatives up to 10,000-fold.40 Recently, Liu
and co-workers described the DNA-templated set-up of a small library of macrocycles
which they subjected to in vitro selection (Figure 2-7).41 For this purpose, a
DNA-template library of 48-mer oligonucleotides carrying an amino group
at the 5’ end and containing three consecutive coding regions was used. A lysine was
coupled to the primary amino group at the oligonucleotide extremity by amide bond
formation. The lysine was ε-protected by acylation with a compound
containing a vicinal diol, which can later be cleaved to an aldehyde that serves for
the final ring-closing step through a Wittig olefination. Initially a code-1
complementary 10-mer oligonucleotide, carrying both a biotin at its 5′ end and an
amino acid N-protected with a base-labile cleavable linker at its 3′ terminus, was
hybridized to the template. The free carboxylic acid moiety of the protected amino
acid was activated as a sulfo-N-hydroxysuccinimidyl ester and reacted with
the free amino group displayed on the 48-mer template oligonucleotide to form an
amide bond. The resulting covalent conjugate was purified by
capture on avidin-coated beads, which retained all biotin-containing fragments and
allowed residual, non-conjugated 48-mer template oligonucleotide to be washed away.
[Figure 2-7 scheme: library of n DNA templates; annealing and DNA-templated reactions 1 to 3 with reagent libraries 1 to 3; ring closure; selection with target protein; PCR using primers; DNA sequencing and binder synthesis, or reconstitution of the enriched library members to enter the next round]
Figure 2-7: Schematic representation of a DNA-encoded library by ‘DNA-templated synthesis’. A
library of oligonucleotides (i.e., 64 different oligonucleotides) containing three coding regions was
hybridized to a library of reagent compound-oligonucleotide conjugates (i.e., 4 reagent oligonucleotide
conjugates), capable of pairing with the initial coding domain of the template oligonucleotide. After
transfer of the compounds onto the corresponding oligonucleotide template, the synthesis cycle was
repeated the desired number of times with further sets of carrier compound-oligonucleotide conjugates
(i.e., two rounds with four carrier compound-oligonucleotide conjugates per round). Subsequently
functional selection was performed and the sequence of the binding template amplified by PCR. Thus,
DNA-sequencing allowed the identification of the binding molecule. In the construction of the
65-member library, the 65th template, which served as positive control, was also subjected to the
DNA-templated synthesis scheme.
By increasing the pH, the base-labile linker could be cleaved and the reaction product
(i.e., the α-amino acylated 48-mer DNA fragment) could be eluted. This procedure
was repeated with an additional code-2 specific 12-mer reagent
oligonucleotide and a code-3 specific 12-mer reagent oligonucleotide. In the last
coupling step, the reagent amino acid building block was connected to the
oligonucleotide not by a base-labile linker, but with a linker containing a
phosphonium group. After the third conjugation step and avidin-coated resin
purification, the vicinal diol linker on the lysine residue of the 48-mer template was
cleaved by periodate and the resulting aldehyde could undergo a Wittig olefination to
form a fumaramide, leading to ring-closure to a macrocycle. As in the course of the
Wittig reaction the P–C bond between reagent oligonucleotide and template
oligonucleotide was broken, the desired macrocycle-template conjugate self-eluted
from the avidin beads. The authors generated a library of 65 fumaramide macrocycles,
using four building blocks in each of the three synthetic steps plus one
additional aryl sulfonamide building block in the first step, which was known to bind
to carbonic anhydrase with nanomolar affinity. The DNA-template of the positive
control included a NlaIII restriction site, which facilitated the monitoring of the
enrichment after the selection by polyacrylamide gel electrophoresis (PAGE)
following PCR amplification and NlaIII digestion. 100 fmol of the DNA-conjugate
macrocycle library were subjected to an in vitro selection against immobilized
carbonic anhydrase. In a further pseudo-round of selection the eluted DNA was again
loaded onto a carbonic anhydrase column. To decode the positive
control binder, the DNA was PCR-amplified and NlaIII-digested before selection and
after each elution. Liu and coworkers demonstrated that a significant enrichment of
the positive control oligonucleotide-macrocycle conjugate was detectable after the
second elution. However, the decoding method described in the paper41 was quite
rudimentary and not directly applicable to libraries of larger size. Furthermore, the
possibility to re-synthesize the unbiased library after selection was not demonstrated.
Assisting oligonucleotide strands and proximity-based chemical reactions may
represent an alternative to “split-&-pool” strategies for the construction of large
libraries in solution. While amide bond forming reactions have so far been used for
library construction, it is expected that different chemistries may be used in order to
generate non-peptidic structures. The group of Liu considered a variety of other
possible reactions which may occur in the presence of DNA (Figure 2-6).36
Additionally, even though the overall yields for the multi-step synthesis of DNA-encoded
compounds were not excellent (approx. 5% over three steps), the use of
avidin resins for product purification contributed to the purity of library compounds.
Nevertheless, quality controls of library synthesis may become more difficult for
libraries of larger size. In this light, constructing libraries with complexities of
pharmaceutical interest by a DNA-templated synthesis method such as the one
described by D.R. Liu and co-workers remains at present a formidable challenge.
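The overall yield quoted above can be translated into an average yield per step by taking the geometric mean. This is only an illustrative calculation: it assumes that the three steps contribute equally, which the text does not state.

```python
overall_yield = 0.05   # approx. 5% over the whole sequence, as quoted above
steps = 3              # three DNA-templated coupling steps

# Geometric mean: the single per-step yield that would give the same overall yield.
per_step = overall_yield ** (1 / steps)
print(f"average per-step yield: {per_step:.0%}")
```

Under this assumption each step would proceed with a yield of somewhat below 40%, which illustrates why the purification on avidin resin after each step matters for the purity of the final library.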
2.1.1.4 Stepwise coupling of coding DNA fragments to nascent organic molecules
A promising strategy for the construction of DNA-encoded libraries is represented by
the use of multifunctional building blocks covalently conjugated to an oligonucleotide
serving as a “core structure” for library synthesis. In a “split-&-pool” fashion a set of
multifunctional scaffolds could undergo orthogonal reactions with series of suitable
reactive partners. Following each reaction step, the identity of the modification could
be encoded by an enzymatic addition of a DNA segment to the original DNA “core
structure” (e.g., by ligation, Figure 2-8). This feature has been exploited for the first
time by our group.42,43 Initially we envisaged the use either of a variety of N-protected
amino acids or of diene carboxylic acid derivatives. The use of N-protected amino
acids covalently attached to a DNA fragment allows, after a suitable deprotection step,
a further peptide bond formation with a series of carboxylic acids or a reductive
amination with aldehydes. Similarly, diene carboxylic acids, used as scaffolds for
library construction at the 5’-end of an amino-modified oligonucleotide, could be
subjected to a Diels-Alder reaction with a variety of maleimide derivatives.
[Figure 2-8 scheme: oligonucleotide-conjugated scaffolds bearing functional groups FG1 and FG2 pass through alternating Split / Reaction, Encoding and Pool steps]
Figure 2-8: Schematic representation of a DNA-encoded library by stepwise coupling of coding DNA
fragments to nascent organic molecules. An initial set of multifunctional building blocks (FGn
represents the different orthogonal functional groups) is covalently conjugated to a corresponding
encoding oligonucleotide and reacted in a split-&-pool fashion on a specific functional group (FG1 in
red) with a suitable collection of reagents. Following enzymatic encoding, a further round of
split-&-pool is initiated. At this stage the second functional group (FG2 in blue) undergoes an additional
reaction step with a different set of suitable reagents. The identity of the final modification could
again be secured by enzymatic DNA encoding by means of a further oligonucleotide carrying a
specific coding region.
After completion of the desired reaction step, the identity of the chemical moiety
added to the oligonucleotide could be established by the annealing of a partially
complementary oligonucleotide and by a subsequent Klenow fill-in DNA-polymerization,
yielding a double stranded DNA fragment. The synthetic and
encoding strategies described above enable the facile construction of DNA-encoded
libraries of a size up to 10⁴ member compounds carrying two sets of “building
blocks”. However, the stepwise addition of at least three independent sets of chemical
moieties to a tri-functional core building block for the construction and encoding of a
very large DNA-encoded library (comprising up to 10⁶ compounds) (see Chapter 3.3)
can also be envisaged.
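The encoding step by annealing and Klenow fill-in can be illustrated with a small string-based model: two oligonucleotides that pair through their 3' ends are each extended along the other strand, giving a blunt-ended duplex that contains the code. The function names and all sequences below are hypothetical and serve only to illustrate the principle.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def klenow_fill_in(top, bottom, min_overlap=6):
    """Extend two oligonucleotides that anneal through their 3' ends.

    Both strands are written 5'->3'. Returns the two strands of the
    resulting blunt-ended duplex, or raises ValueError if they do not anneal.
    """
    template = reverse_complement(bottom)   # bottom strand read in top-strand sense
    for k in range(min(len(top), len(template)), min_overlap - 1, -1):
        if top[-k:] == template[:k]:        # 3' end of top pairs with 3' end of bottom
            full_top = top + template[k:]   # polymerase extends the top strand
            return full_top, reverse_complement(full_top)
    raise ValueError("no annealing region of sufficient length")


# Hypothetical sequences, chosen only for illustration.
core = "GGAGCTTCTGAATTCTGTGTGCTG"           # strand carrying the compound
code = "ACGTAC"                              # identifies the added building block
encoder = reverse_complement("GTGTGCTG" + code + "CGAGTCCCATGGCGC")

top, bottom = klenow_fill_in(core, encoder)
print(top)
print(bottom)
```

After fill-in the code is present on both strands of the duplex and can be read out by PCR amplification and sequencing.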
Importantly we have found that selections of DNA-encoded chemical libraries can
conveniently be decoded after PCR amplification of the DNA-tags using recently
described high-throughput DNA sequencing technologies (such as “454 technology”),
which had originally been developed for genome sequencing (see Chapter 2.2.2).44
Recent advances in ultra high-throughput DNA sequencing allow the sequencing of
over one million sequence tags per sequencing run (see Chapter 2.2.2)44,45 and may
thus allow the decoding of DNA-encoded libraries containing millions of chemical
compounds.
2.1.2 DNA libraries displaying multiple covalently linked chemical
entities—ESAC libraries
Watson-Crick and Hoogsteen46 base pairing allow the sequence-specific assembly of
oligonucleotides to form stable heterodimers and heterotrimers, respectively. Our
laboratory has exploited this feature for the combinatorial self-assembly of
oligonucleotide-chemical compound conjugates.47 In principle, the self-assembly of
two sublibraries of only 10³ members each, containing a constant complementary
hybridization domain, can yield after hybridization a combinatorial DNA-duplex library
with a complexity of 10⁶ uniformly represented library members
(Figure 2-9).
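The combinatorial gain of self-assembly can be sketched as follows: the number of assemblies is the product of the sublibrary sizes, since every member of one sublibrary can pair with every member of the other. The function names and the toy sublibraries are hypothetical.

```python
from itertools import product
from math import prod


def assembled_size(*sublibrary_sizes):
    """Number of distinct assemblies if every member pairs with every other."""
    return prod(sublibrary_sizes)


def assemble(*sublibraries):
    """Enumerate all combinations, one member taken from each sublibrary."""
    return list(product(*sublibraries))


# Toy sublibraries; compound names are hypothetical placeholders.
sub_a = ["A1", "A2", "A3"]
sub_b = ["B1", "B2", "B3", "B4"]
duplexes = assemble(sub_a, sub_b)

print(len(duplexes))                  # 12 pairs from 3 x 4 members
print(assembled_size(10**3, 10**3))   # 1000000
```

Only the individual sublibrary members have to be synthesized and purified, while the number of displayed combinations grows multiplicatively.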
[Figure 2-9 panels: a) single pharmacophore; b) affinity maturation with a known binder; c) duplex library; d) triplex library; e) selection and read-out steps I to VI on the target. Each conjugate comprises a compound, a hybridization domain and a code]
Figure 2-9: ESAC library technology overview. Small organic molecules are coupled to 5’-amino
modified oligonucleotides, containing a hybridization domain and a unique coding sequence, which
encodes the identity of the coupled molecule. The ESAC library can be used in single pharmacophore
format (a), in affinity maturations of known binders (b), or in de novo selections of binding molecules
by self-assembly of sublibraries in DNA-double strand format (c) as well as in DNA-triplexes (d).
The ESAC library in the selected format is used in a selection and read-out procedure (e). Following
incubation of the library (i) with the target protein of choice (ii) and washing of unbound molecules
(iii), the oligonucleotide codes of the binding compounds are PCR-amplified and compared with the
library without selection on oligonucleotide micro-arrays (iv, v). Identified binders/binding pairs are
validated after conjugation (if appropriate) to suitable scaffolds (vi).
A third strand can be added introducing Hoogsteen base pairing46. Hoogsteen and
reversed-Hoogsteen48,49 base pairing mediate the interaction of a third cognate
oligonucleotide with a Watson-Crick DNA double helix. Using a triplex DNA format,
three 10³-member sublibraries could yield a 10⁹-member library (Figure 2-9). Each
sub-library member would consist of an oligonucleotide containing a variable, coding
region flanked by a constant DNA sequence, carrying a suitable chemical
modification at the oligonucleotide extremity (Figure 2-9). This approach has been
termed ESAC (for Encoded Self-Assembling Chemical libraries). In contrast to the
library formats described in the previous section (see Chapter 2.1.1), in which only
one oligonucleotide in the DNA-heteroduplex would carry a chemical group, the
ESAC method allows one, two or three paired oligonucleotides to display
different pharmacophores. Moreover each sub-library member can be
individually produced and purified by HPLC in nanomolar quantities, thus enabling
reliable analytics and quality controls. These sublibraries can be used in at least four
different embodiments. In a first example, a sub-library can be paired with a
complementary oligonucleotide and used as a DNA encoded library displaying a
single covalently linked compound for affinity-based selection experiments (Figure
2-9a). Alternatively, a sub-library can be paired with an oligonucleotide displaying a
known binder to the target, thus enabling affinity maturation strategies (Figure 2-9b).
In a third embodiment, two individual sublibraries can be assembled combinatorially
and used for the de novo identification of bidentate binding molecules (Figure 2-9c).
Finally, three different sublibraries can be assembled to form a combinatorial triplex
library (Figure 2-9d). The multiple pharmacophore display approaches may lead to
high binding affinities, by virtue of a simultaneous engagement of adjacent binding
sites, thus exploiting the chelate effect in analogy to fragment-based drug discovery.50
The conjugation of two pharmacophores to the two strands of a DNA double helix
introduces a spacing of roughly 10-15 Å, with some flexibility between the binding
moieties and the core DNA structure. Preferential binders isolated from an affinity-
based selection can be PCR-amplified and decoded on complementary
oligonucleotide microarrays51,52 (Figure 2-9e) or by concatenation of the codes,
subcloning and sequencing53. The individual building blocks can eventually be
conjugated using suitable linkers to yield a drug-like high-affinity compound. The
characteristics of the linker (e.g. length, flexibility, geometry, chemical nature and
solubility) influence the binding affinity and the chemical properties of the resulting
binder.
A first 138-member ESAC library (termed ‘elib1’ library) consisted of
carboxylic acids covalently linked to 5′ amino-modified 48-mer oligonucleotides and
contained a biotin-oligonucleotide conjugate as positive control. The library was
hybridized with an oligonucleotide conjugated to a cyanine dye (irrelevant for the
binding) and subjected to affinity-based selection on streptavidin. A significant
enhancement of the biotin-oligonucleotide conjugate signal was observed after
selection and microarray-based decoding.47
In a second proof of principle, the 137-member ESAC library was employed in
affinity maturation experiments. A dansylamide and a benzoyl sulfonamide
conjugated at the 3’ extremity of an oligonucleotide were used as lead binders to
human serum albumin (HSA) and bovine carbonic anhydrase II (CAII) respectively.
The oligonucleotide derivatives were hybridized with the 137-member library and
subjected to selection using immobilized HSA and CAII. Following microarray-based
decoding, the enriched binding molecules were linked to the lead-binder with a set of
bifunctional linkers of different length and the affinities of the respective conjugates
towards the target protein were determined. The simultaneous engagement of the
lead-binder and the selected compound led to a 10–40-fold increase in affinity.47
Encouraged by the results, the ‘elib1’ ESAC library was extended from 137 compounds
to over 600 compounds and termed ‘elib2’ library. A further series
of bio-panning experiments on streptavidin and HSA was then performed, leading,
after microarray-based read-out, to the identification of novel target-specific binding
molecules with dissociation constants ranging from the mM to the fM range.54,55 Notably
the screening of the ‘elib2’ ESAC library towards HSA allowed the isolation of the
4-(p-iodophenyl)butanoic moiety. The compound discovered by our group represents
the core structure of a series of portable albumin binding molecules and of
Albufluor™, a recently developed fluorescein angiographic contrast agent currently
under clinical evaluation.55
Recently, ESAC technology has been used by our group for the isolation of potent
inhibitors of bovine trypsin56 and for the identification of novel inhibitors of
stromelysin-1 (MMP-3)57, a matrix metalloproteinase involved in both physiological
and pathological tissue remodeling processes. Benzamidine, a trypsin inhibitor with
an IC50 value in the 100 μM range, was used as lead in an ESAC-based affinity
maturation procedure. 5-(4-carbamidoylbenzylamino)-5-oxopentanoic acid was
conjugated at the 3’-end of an amino-modified oligonucleotide and hybridized with a
620-member ESAC sublibrary. After selection using immobilized trypsin and
microarray-based decoding, a number of bidentate binders were identified and
synthesized, allowing for different linkers connecting the benzamidine moiety to the
other pharmacophore identified in the ESAC procedure. The most active inhibitor
exhibited an IC50 value of 98 nM, but various bidentate ligands also revealed a
dramatically improved affinity, compared to a set of parental benzamidine derivatives,
whose IC50 values were in the 11-220 μM range. Similarly for the identification of
novel inhibitors of stromelysin-1 matrix metalloproteinase (MMP-3) an ESAC library
of 550 DNA-encoded chemical compounds was used. After selection on immobilized
MMP-3 and microarray-based decoding, the best candidate was conjugated to the
amino-modified 3′-extremity of a 24-mer oligonucleotide capable of pairing with the
initial 550 member ESAC sublibrary and used as lead for affinity maturation
selections. After a second round of selection, enrichment of one synergistic binding
moiety was identified. The newly discovered pharmacophores were used for the
synthesis of low-molecular weight bidentate MMP-3 inhibitors with a series of
diamino linkers. The bidentate binder was superior compared to DNA conjugates
displaying the individual pharmacophores or no pharmacophore at all. After
measuring the corresponding inhibition constants to MMP-3, the best binder exhibited
an IC50 value of 9.9 µM.
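The improvement obtained in the trypsin affinity maturation described above can be expressed as a fold change in IC50. The sketch below simply divides the IC50 values quoted in the text; the variable names are illustrative.

```python
best_ic50 = 98e-9                   # most active bidentate inhibitor (98 nM)
parental_range = (11e-6, 220e-6)    # parental benzamidine derivatives (11-220 uM)

# Fold improvement relative to the most and least potent parental derivative.
folds = [parental / best_ic50 for parental in parental_range]
for fold in folds:
    print(f"{fold:,.0f}-fold")
```

Relative to the parental benzamidine derivatives, the best bidentate inhibitor is thus roughly 100- to 2,000-fold more potent.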
In most cases, the spatial arrangement and the flexibility of the linker used
to conjugate the two pharmacophores identified after ESAC-library selection
dramatically influence the binding affinity of the corresponding bidentate ligand. The
identification of optimal linkers may sometimes be a tedious procedure. Furthermore,
the decoding of ESAC libraries in a multiple DNA-stranded format comprising over
10^4 compounds, as required for the de novo identification of binding molecules
(Figure 2-9c, 2-9d), cannot be efficiently achieved by a microarray-based approach due to suboptimal
read-out quality and to physical spotting limitation. In principle, high-throughput
sequencing techniques could be considered for the decoding of selections performed
with ESAC libraries (see Chapter 2.2.2).58
2.2 The decoding of DNA-encoded chemical libraries
The identification of specific binding compounds from DNA-encoded chemical
libraries requires the use of affinity-based selection strategies and of suitable decoding
techniques. Generally, selections are performed by capture of binding compounds on
a target protein, immobilized on a solid support. The stringency of both capture and
washing steps crucially influences the outcome of affinity selections.19,20,29,41,47 The
decoding strategy also greatly contributes to the successful use of DNA-encoded
chemical libraries. So far, most groups active in DNA-encoded library research
have used rudimentary techniques, mainly aiming to demonstrate the feasibility of
the DNA-encoding principle rather than to analyze exhaustively the decoding
aspect of the selection.19,20,29,41 Although many authors implicitly envisaged a
traditional Sanger-sequencing-based decoding (for an overview on Sanger sequencing
see Ref 65), the number of codes that must be sequenced, given the complexity of
the library, is unrealistic for a traditional Sanger-sequencing
approach. If one assumes a library complexity of 10^6 and an enrichment factor of 100
for good binders versus non-binders in a round of selection then, statistically, 10^5
sequences are required to identify preferential binding compounds with suitable
confidence. Furthermore, the number of sequences to be read grows
with library size. Nevertheless, a first implementation of
Sanger sequencing for decoding DNA-encoded chemical libraries in high-throughput
fashion was described by our laboratory.47 After selection and PCR amplification of
the DNA tags of the library compounds, concatamers containing multiple coding
sequences were generated and ligated into an EcoRI-digested pUC19 vector.
Subsequent sequencing of a representative number of the resulting colonies revealed
the frequencies of the codes present in the ESAC DNA sample before and after
selection. Besides Sanger-sequencing-based decoding, our group investigated
a microarray-based47 methodology and very recently implemented novel, robust
high-throughput sequencing techniques for efficiently decoding DNA-encoded
libraries.42
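The depth estimate quoted above can be reproduced with a back-of-the-envelope calculation. The following Python sketch is a simplified model and not the analysis used in this work: it assumes that all compounds start equimolar, that each binder is over-represented by the enrichment factor after one round of selection, and that read counts follow a Poisson distribution. The function names and the threshold of three reads are illustrative assumptions.

```python
import math


def expected_counts(n_reads, library_size, enrichment, n_binders=1):
    """Expected read counts for one binder and one non-binder.

    Simplified model (assumption): every compound starts equimolar and each
    binder is over-represented `enrichment`-fold after one round of selection.
    """
    total_weight = (library_size - n_binders) + n_binders * enrichment
    per_non_binder = n_reads / total_weight
    return per_non_binder * enrichment, per_non_binder


def prob_at_least(k, mean):
    """Poisson probability of observing at least k reads for a given mean."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below


# Figures from the text: 10^6 compounds, 100-fold enrichment, 10^5 reads.
binder, non_binder = expected_counts(10**5, 10**6, 100)
print(f"binder: {binder:.2f} reads, non-binder: {non_binder:.3f} reads")
print(f"P(binder seen >= 3 times)     = {prob_at_least(3, binder):.4f}")
print(f"P(non-binder seen >= 3 times) = {prob_at_least(3, non_binder):.6f}")
```

Under these assumptions a binder is expected roughly ten times in 10^5 reads, against roughly 0.1 times for a non-binder, which is consistent with the order of magnitude given in the text.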
2.2.1 Microarray-based decoding
A DNA microarray is a device for high-throughput investigations widely used in
molecular biology and in medicine.59 It consists of an arrayed series of microscopic
spots (‘features’ or ‘locations’) containing a few picomoles of oligonucleotides carrying
a specific DNA sequence (Figure 2-10). This can be a short section of a gene or another
DNA element that is used as a probe to hybridize a DNA or RNA sample under
suitable conditions. Probe-target hybridization is usually detected and quantified by
fluorescence-based detection of fluorophore-labeled targets to determine the relative
abundance of the target nucleic acid sequences. In standard microarrays, the probes
are attached to a solid surface by a covalent bond to a chemical matrix (via
epoxy-silane, amino-silane, lysine, polyacrylamide or others). The solid surface can be glass
or a silicon chip, in which case the array is commonly known as a gene chip (Affy-chip
when an Affymetrix chip is used, Figure 2-10). Other microarray platforms
(e.g. Illumina) use microscopic beads instead of a large solid support.
[Figure 2-10 labels: “feature”; millions of DNA-probe strands built up on each “feature”; probe oligonucleotide; current size of the last generation of GeneChip®: 1.28 cm × 1.28 cm]
Figure 2-10: Schematic representation of an Affymetrix microarray chip. Microscopic spots
(‘features’ or ‘locations’) on the solid support contain several million immobilized single-stranded
DNA probes. After hybridization of a fluorescently labelled DNA or RNA sample to the chip, detection
and quantification are carried out by fluorescence-based analysis. (Adapted from
http://www.affymetrix.com)
Microarray technology was originally derived from the Southern blotting60 technique,
in which DNA fragments are probed with a labelled oligonucleotide complementary
to the DNA segment. The use of a library of distinct DNAs in array format for
expression profiling was first described in 1987, when the arrayed DNAs were used to
identify genes whose expression is modulated by interferon.61 These first gene arrays
were prepared by spotting cDNAs onto filter paper with a pin-spotting device.
The use of miniaturized microarrays was first reported in 1995,62 and a
complete eukaryotic genome (Saccharomyces cerevisiae) on a microarray was
published in 1997.63
So far, DNA microarrays have found many applications in a variety of technologies
(gene expression profiling, SNP detection, comparative genomic hybridization,
alternative splicing determination) and have dramatically accelerated many types of
investigations.59 Over the last few years, our laboratory has used DNA microarrays for the
decoding of DNA-encoded chemical libraries.47 In this setting, 19-mer, 5'-amino-tagged
oligonucleotides, each containing a specific sequence representing the code of
an individual chemical compound in the library, are spotted in quintuplicate onto
25x75 mm polyethylene glycol-coated and epoxy-activated microarray slides using a
BioChip Arrayer robot and incubated in a humid chamber overnight at 25 °C.
Subsequently, the oligonucleotide tags of the binding compounds isolated from the
affinity-based selection are PCR amplified using a fluorescent primer and hybridized
onto the DNA-microarray slide. Afterwards, the microarrays are analyzed using a laser
scan-array and the spot intensities are detected and quantified. The enrichment of the
preferential binding compounds is revealed by comparing the spot intensities of the
DNA-microarray slides before and after selection.
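The before/after comparison of spot intensities can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the intensities are invented values in arbitrary units, the code names are placeholders, and averaging the five replicate spots and taking a simple ratio is one possible read-out, not necessarily the procedure used in this work (background subtraction and slide-to-slide normalization are omitted).

```python
from statistics import mean


def enrichment_ratios(before, after):
    """Ratio of mean spot intensity after vs. before selection, per code.

    `before` and `after` map each code to its replicate spot intensities
    (five values per code when spotted in quintuplicate).
    """
    ratios = {}
    for code, spots_before in before.items():
        baseline = mean(spots_before)
        if baseline <= 0:
            raise ValueError(f"non-positive baseline intensity for {code}")
        ratios[code] = mean(after[code]) / baseline
    return ratios


# Hypothetical intensities (arbitrary units) for three codes.
before = {
    "code_001": [100, 110, 90, 105, 95],
    "code_002": [200, 190, 210, 205, 195],
    "code_003": [150, 150, 150, 150, 150],
}
after = {
    "code_001": [900, 1100, 1000, 1050, 950],
    "code_002": [180, 220, 200, 210, 190],
    "code_003": [30, 30, 30, 30, 30],
}

ratios = enrichment_ratios(before, after)
for code, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{code}: {ratio:.2f}-fold")
```

In this invented data set the first code is enriched tenfold, the second is unchanged and the third is depleted fivefold.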
Although DNA microarrays have provided a powerful approach to decode DNA-encoded
chemical libraries and to rapidly interrogate biological systems at a genomic
level, several limitations restrict the scope of their application. Even for the last
generation of high-density microarray chips (up to 7x10^6 features), the spotting and
hybridization of DNA-encoded libraries is quite demanding. Additionally, the
fundamental reliance of microarrays on nucleic-acid hybridization results in a
“low-fidelity” hybridization analysis of highly related sequences because of
cross-hybridization. This problem is crucial in the decoding of DNA-encoded chemical
libraries. Since the differences between distinct compounds could be very small at the
level of the oligonucleotide tags, cross-hybridization may lead to false-positive
identifications. Additionally, it is difficult to confidently detect and quantify
low-abundance species by DNA-microarray-based decoding even if the enrichment after
selection is substantial. Moreover, microarray decoding is currently challenging
with regard to the reproducibility of results and is very dependent on specific platforms.
For instance, the “analog” rather than “digital” quantification limits the dynamic
range and the comparison between samples. Last but not least, from an economic point of
view, the technology is costly (DNA probes and robotic equipment). However, since
2004, massively parallel DNA sequencing technologies have become available,
offering dramatically lower per-base costs64 and promising to overcome the
limitations of microarrays. Millions of independently derived sequencing tags can
nowadays be simultaneously investigated in a single experiment at a cost below 1000
SFr.
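The cross-hybridization risk discussed above depends on how similar the oligonucleotide tags are to one another. As a rough illustration, the Python sketch below flags pairs of codes that differ in only a few positions. Using the Hamming distance as a proxy for cross-hybridization is an assumption made here for simplicity (real hybridization behaviour also depends on melting temperature, mismatch position and buffer conditions); the example codes and the threshold are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def risky_pairs(codes, min_distance):
    """Pairs of codes whose Hamming distance is below `min_distance`."""
    pairs = []
    for a, b in combinations(codes, 2):
        distance = hamming(a, b)
        if distance < min_distance:
            pairs.append((a, b, distance))
    return pairs


# Hypothetical 6-mer codes; real tags in this setting are longer.
codes = ["ACGTAC", "ACGTAA", "TTGCAC"]
for a, b, distance in risky_pairs(codes, min_distance=3):
    print(f"{a} / {b}: only {distance} mismatch(es)")
```

A check of this kind could be run when designing a code set, so that closely related tags are avoided before any array is spotted.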
2.2.2 Decoding by high throughput sequencing
Given the complexity of DNA-encoded chemical libraries (typically between
10^3 and 10^6 members), a conventional Sanger-sequencing-based decoding is unlikely
to be usable in practice, due both to the high cost per base of the sequencing65 and to
the tedious procedure involved65. However, nearly three decades have passed since the
invention of electrophoretic methods for DNA sequencing and various novel
sequencing technologies have recently been developed, each aiming to reduce costs to
the point at which the genomes of individual humans could be sequenced as part of
routine health care. Large-scale sequencing projects, including whole-genome
sequencing, have usually involved the Sanger sequencing method65 using fluorescent
chain-terminating nucleotide analogues66 and either slab gel or capillary
electrophoresis. Recent estimates of the cost of human genome sequencing with standard
sequencing technologies are between $10 million and $25 million. Alternative
sequencing methods have been described67,68,69,70,71; nonetheless, all these strategies
were essentially based on bacterial vectors and Sanger sequencing as the main final
generators of sequence information and consequently failed to deliver new
ultra-low-cost massive sequencing techniques. Very recently, new methods have exploited strategies
that parallelize the sequencing process, displacing the use of capillary electrophoresis
and producing thousands or millions of sequences at once.
Since the detection methods are often not sensitive enough for sequencing a single
molecule of DNA, the majority of the novel strategies use an in vitro amplification
step. Typically, individual DNA molecules are isolated along with primer-coated
beads in aqueous droplets within an oil phase by emulsion PCR. A polymerase
chain reaction (PCR) then coats each bead with several clonal copies (called a
“polony”) of the isolated library DNA molecule.72 This strategy is employed in the
method commercialized by 454 Life Sciences (acquired by Roche), in "polony
sequencing"73 and in SOLiD sequencing (developed by Agencourt and acquired by
Applied Biosystems)74. Each bead is then immobilized on a support for the
subsequent sequencing step. An alternative method for in vitro amplification is
"bridge-PCR", where fragments are amplified on primers anchored to a solid
surface. This system was developed by Solexa (now part of Illumina).75
Both approaches produce many physically isolated locations, each containing several
copies of a single DNA fragment. In 2006, Stephen Quake's laboratory described the
first second-generation method for ultra-high-throughput sequencing based on
single-molecule sequencing (later commercialized by Helicos), which skips the
amplification step and directly fixes DNA molecules to a surface.76
Once every single DNA sequence is physically localized to a separate position on a
support, various sequencing strategies may be applied to determine the DNA
sequences in parallel. "Sequencing by synthesis", in full analogy with the dye-termination
electrophoretic sequencing used in the Sanger method, employs the process of DNA
synthesis by DNA polymerase to identify the bases present in the complementary
DNA molecule.72 Pyrosequencing (used in “454” technology) also uses DNA
polymerization to add nucleotides, detecting and quantifying the number of
nucleotides added at a given location through the light emitted upon release of
the attached pyrophosphates.72,77 Alternatively, “reversible terminator methods” (used by
Illumina and Helicos) are employed.75,76 The nucleotides are added one at a time, then the
fluorescence corresponding to that position is detected, and the polymerization of
another nucleotide is enabled by removal of a blocking group. "Sequencing
by ligation" is another enzymatic method of sequencing, pioneered by the laboratory
of G.M. Church and employed in “polony sequencing” and in the SOLiD
technology offered by Applied Biosystems. A DNA ligase enzyme is used rather
than a polymerase, together with a pool of all possible oligonucleotide sequences of a fixed
length, labeled according to the sequenced position; these oligonucleotides are annealed and
ligated.73,74,78 Preferential ligation of matching sequences results in a signal
related to the complementary sequence at that position.
In this light, advances in high-throughput DNA sequencing technologies are likely to
revolutionize the strategies for the accurate decoding of DNA-encoded chemical
libraries of unprecedented size.
2.2.2.1 “454” technology
The “454” technology of the Genome Sequencer FLX System (GS FLX) was developed
by 454 Life Sciences and has recently (2005) been acquired by Roche. The GS FLX is
a next-generation high-throughput DNA sequencing system featuring long reads, high
accuracy, and ultra-high throughput.72,79 Currently, the GS FLX is one of the
most versatile high-throughput sequencing platforms available, supporting high-profile
studies in a wide range of categories.72,79
Figure 2-11 schematically depicts the workflow of the “454” technology. Initially,
large DNA samples, such as genomic material, are broken into smaller fragments
(between 300 and 800 basepairs) by nebulisation. The DNA sample is then
denatured to single-stranded DNA (sstDNA). Subsequently, specific short adaptors
(called A and B) are added to each fragment using standard molecular biology
techniques. An excess of sepharose beads carrying oligonucleotides complementary to
e.g. the A-adaptor sequence of the library fragments is added to the DNA library
previously generated, in order to ensure that each of these beads hybridizes to a unique
single-stranded DNA sequence. The bead-bound library is emulsified with the
amplification reagents in a water-in-oil mixture. An emulsion PCR is then
performed, yielding several clonal copies of a specific DNA fragment immobilized
on each bead (ca. 10 million identical DNA molecules per bead). Afterwards, the
emulsion is broken while the amplified fragments remain bound to their specific
beads. The clonally amplified on-bead fragments are enriched and loaded onto a
“PicoTiterPlate” device for sequencing (70x75 mm, containing 1.6 million wells), in
which the diameter of the single wells (44 μm) allows for only one bead (around 30 μm)
per well. After addition of a DNA bead incubation mix (containing DNA polymerase,
sulfurylase and luciferase), the fluidics subsystem of the Genome Sequencer FLX
instrument flows individual nucleotides in a fixed order across the wells containing
one bead each. Addition of one (or more) nucleotide(s) complementary to the
template strand yields a chemiluminescent signal recorded by the CCD camera.
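The principle of flowing nucleotides in a fixed order and reading one signal per flow can be sketched in a few lines of Python. This is an idealized toy model, not the instrument software: it assumes that the light signal is exactly proportional to the number of nucleotides incorporated in a flow, it works directly on the sequence of the strand being synthesized, and the flow order "TACG" is an assumption chosen for illustration.

```python
def flowgram(read, flow_order="TACG"):
    """Idealized flow signals for the strand being synthesized.

    Each flow offers one nucleotide; the signal is the number of consecutive
    identical bases incorporated in that flow (0 if none).
    """
    if not set(read) <= set(flow_order):
        raise ValueError("read contains bases absent from the flow order")
    signals, pos, flow = [], 0, 0
    while pos < len(read):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def decode(signals):
    """Rebuild the sequence by rounding each signal to a whole base count."""
    return "".join(base * round(intensity) for base, intensity in signals)


print(flowgram("TTACCG"))
print(decode(flowgram("GATTACA")))
```

Because one flow can incorporate several identical bases, the length of a homopolymer stretch has to be inferred from signal intensity, which is why the decoding step rounds each signal to the nearest integer.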
[Figure 2-11 panel labels (a to f): sstDNA library; sstDNA annealed to an excess of Capture Beads; emulsification and em-PCR (emulsify beads and PCR reagents, monoclonal amplification, break emulsion); partitioning, one bead per well (deposit beads into wells: 1 well = 1 bead = 1 clonal amplification; add enzymes); sequencing by synthesis (chemiluminescent signals upon nucleotide incorporation, pyrophosphate release, signal); amplicon sequences]
Figure 2-11: Workflow of the “454” high-throughput sequencing technology. Adaptors
(A and B) - specific for both the 3' and 5' ends - are added to each sample fragment. The adaptors are
used for purification, amplification, and sequencing steps. Single-stranded fragments with A and B
adaptors compose the sample library used for subsequent workflow steps (a). The single-stranded DNA
library is immobilized onto specifically designed DNA Capture Beads. Each bead carries a unique
single-stranded DNA library fragment. The bead-bound library is emulsified with amplification
reagents in a water-in-oil mixture resulting in microreactors containing just one bead with one unique
sample-library fragment (b). The emulsion PCR (em-PCR) is performed, amplifying each fragment to a
copy number of several million per bead. Subsequently, the emulsion is broken while the
amplified fragments remain bound to their specific beads (c). The enriched beads are loaded onto a
PicoTiterPlate device for sequencing. The diameter of wells allows for only one bead per well (d).
After addition of sequencing enzymes, nucleotides are flowed in a fixed order across the wells
containing one bead each. Addition of one (or more) nucleotide(s) complementary to the template
strand results in a chemiluminescent signal recorded by the CCD camera (e). The combination of signal
intensity and positional information allows the software to determine the sequence (f). (Adapted from
http://www.454.com)
The nucleotide flow described above enables parallel sequencing of hundreds of
thousands of beads, each carrying millions of copies of a unique single-stranded DNA
molecule. Typically, 400,000 individual reads are obtained simultaneously per 7.5-hour
instrument run. For sequencing-data analysis, different bioinformatics tools are
available supporting the various applications, including de novo assembly,
resequencing and amplicon variant detection by comparison with a known reference
sequence. Currently the 454 Genome Sequencer FLX instrument ensures read
accuracies of >99.5% over the first 250 bases and 200 Mb of sequence information
per day.72,79
In this Thesis we describe a novel, convenient implementation of “454”
high-throughput sequencing technology for the decoding of DNA-encoded chemical
libraries.
2.2.2.2 Solexa technology
Solexa sequencing technology, acquired by Illumina in 2007, is based on massively
parallel sequencing employing reversible terminator-based sequencing chemistry.75
Figure 2-12 schematically describes the Solexa technology process. Similarly to the
“454” technology (see Chapter 2.2.2.1), after fragmentation of the double-stranded
genomic DNA, adapters are ligated to both extremities. Subsequently, the
randomly fragmented genomic DNA is denatured to single-stranded DNA (sstDNA) and
hybridized to the complementary adapter sequences attached to a planar, optically
transparent surface. Following addition of unlabelled nucleotides and DNA
polymerase, the attached adapters are extended and “bridge”-amplified, resulting in an
ultra-high-density sequencing flow cell with ≥50 million clusters, each containing
~1,000 copies of the same template. These templates are sequenced using a four-color
DNA “sequencing-by-synthesis” technology that employs reversible terminators with
removable fluorescent dyes. The four fluorescent dye-nucleotides are added
simultaneously at the beginning of every chemistry cycle. After the unincorporated
reagents are washed away and the clusters are excited by laser, the fluorescence emission from each
cluster on the flow cell is recorded and the corresponding base called. Afterwards, the
fluorophore dyes at the 3’ terminus are removed and the next chemistry cycle is initiated.
By repeating the sequencing cycle a number of times, the entire template sequence of
each cluster fragment is determined. Furthermore, after completion of the first
sequence read, the templates can be regenerated in situ, enabling a second read from
the opposite end of the fragments.75
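The cycle-by-cycle base calling described above can be illustrated with a minimal Python sketch. It assumes that, for one cluster, each chemistry cycle yields four fluorescence intensities (one per dye, in the order A, C, G, T) and that the brightest channel determines the base; the data layout and function name are illustrative assumptions, and corrections applied in real pipelines (for example for dye cross-talk) are omitted.

```python
BASES = "ACGT"


def call_bases(cycles):
    """Call one base per cycle from four channel intensities (A, C, G, T)."""
    read = []
    for intensities in cycles:
        if len(intensities) != len(BASES):
            raise ValueError("expected four intensities per cycle")
        # The channel with the highest intensity determines the base call.
        brightest = max(range(len(BASES)), key=lambda i: intensities[i])
        read.append(BASES[brightest])
    return "".join(read)


# Hypothetical intensities for one cluster over three chemistry cycles.
cluster = [
    (0.9, 0.1, 0.0, 0.0),
    (0.0, 0.1, 0.8, 0.1),
    (0.1, 0.0, 0.1, 0.7),
]
print(call_bases(cluster))
```

Since exactly one terminator nucleotide is incorporated per cycle, the read length equals the number of cycles, which is why read length is tied to the number of chemistry cycles that can be run.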
[Figure 2-12 panel labels (a to h): DNA sample and adapter; adapter ligation; attach DNA to surface (attached terminus, free terminus); bridge PCR; denaturation; bridge PCR cycles; clusters; sequencing-by-synthesis cycles (1st cycle, 2nd cycle, ... n cycle); laser imaging; sequences determination]
Figure 2-12: Schematic description of Solexa sequencing workflow. Initially adapters are ligated to the
DNA samples (a) and hybridized to the complementary adapter sequences on the slide support (b).
Following addition of nucleotides and DNA polymerase, “bridge”-PCR is performed, resulting in an
ultra-high density sequencing flow cell with ≥50 million clusters, each containing ~1,000 copies of the
same template (c, d, e). “Sequencing-by-synthesis” technology employs reversible terminators with
removable fluorescent dyes. After incorporation of the fluorescent dye-nucleotides and washing away of the
unincorporated reagents, laser imaging captures the emitted fluorescence from each cluster; then the
fluorophore dyes at the 3’ terminus are removed and the next chemistry cycle is initiated (f, g). Repeating
the sequencing cycles, the sequence of each cluster-fragment is determined (h).
Currently, the range of applications of the Solexa technology includes gene
expression, small RNA discovery, and protein-nucleic acid interactions. So far, the
main limitation of the Solexa system, especially for the decoding of DNA-encoded
chemical libraries, is the short maximum read length,
currently up to 50 basepairs (standard 36 basepairs) for each DNA fragment, which can
be extended to 100 basepairs (72 basepairs on average) in the case of “double
reading” from both adaptor ends. On the other hand, the Solexa system allows the
generation of up to 600 Mb/day of sequence information, three times more than the
“454” Genome Sequencer FLX instrumentation, with comparable accuracy
(>98.5%).75
2.2.2.3 SOLiD technology
SOLiD (Sequencing by Oligonucleotide Ligation and Detection) technology was
first described in 2005 by the group of G.M. Church73 and has recently been
purchased by Applied Biosystems. The methodology is based on sequential ligation
with dye-labeled oligonucleotides.73 The ultra-high-throughput capability
and the unequalled accuracy of the SOLiD system, together with the broad
range of possible applications, place it at the vanguard of the next-generation
high-throughput sequencing technologies.
In full analogy to “454” technology (see Chapter 2.2.2.1), after preparation of a
suitable DNA fragment library containing specific adapters at the extremities, SOLiD
methodology employs emulsion PCR (em-PCR) to generate clonal bead populations
(Figure 2-13a). Following em-PCR, the templates are denatured and the beads carrying
the extended template are enriched and separated from the undesired beads. A suitable 3’-end
modification allows the selected beads to be covalently attached to the sequencing
glass slide (Figure 2-13a). Thereafter, the sequencing process is started. Typically, the
probe library set enabling the sequence determination contains 1,024 different 8-mer
single-stranded, 5’-fluorescent synthetic DNA oligonucleotides (Figure 2-13b). Each
probe comprises a fully randomized sequence of five bases, a cleavage site for
removing the 5’-fluorescent dye and an additional constant domain of three bases, as
depicted in Figure 2-13b. Importantly, only four different dyes are used for labelling
the entire probe library set (1,024 probes, 256 probes per dye). Thus, each of the
four dyes does not identify a single base, but rather represents four of the sixteen possible
di-base combinations at positions 4 and 5 of the corresponding probe (4 colours coding
16 possible di-base combinations, Figure 2-13b).
[Figure 2-13 labels. a) sstDNA sample: adapter ligation (I); bead hybridization (II); em-PCR and emulsion break (III); random covalent bead deposition on glass slide (IV). b) 1,024 octamer probes (4^5), 3’ to 5’: ligation site, degenerate bases (n) including di-base positions 4 and 5, cleavage site, universal bases (z), fluorescent dye; colour matrix of 1st base versus 2nd base (A, C, G, T); 4 dyes, 4 di-nucleotides per dye, 1,024 probes / 4 dyes = 256 probes per dye]
Figure 2-13: Sample preparation for SOLiD sequencing and schematic representation of the probe
system enabling the sequence identification. a) Single-stranded DNA sample fragments (sstDNA) are
ligated to specific adapters at the 5’ and 3’ termini (i). Following hybridization to capture beads
carrying the corresponding complementary adapter sequence (ii), emulsion PCR with suitable primers
is performed (iii). Lastly, the emulsion is broken and the amplified beads are covalently attached to the
sequencing glass slide by the 3’-end (iv). b) Each probe of the probe library comprises, from the 3’-end,
a random sequence of five bases, a cleavage site and an additional constant domain of three bases. Four
different dyes are used for labelling the entire probe library set (1,024 probes, 256 probes per dye). Each
dye represents four of the sixteen possible di-base combinations at positions 4 and 5 of the corresponding
probe.
The sequencing process starts with the hybridization of an n-base-long universal sequencing primer
to the adapter attached to the bead. Subsequently, a set of four 5’-fluorescent
removable di-base probes of fixed length together with DNA ligase are flowed over the
slide, competing for ligation to the sequencing primer (Figure 2-14). After
laser excitation, the fluorescence emission from each cluster (bead) on the flow cell reveals
the nature of the di-base probe ligated. Following cleavage of the fluorescent dye
at a specific position of the probe, the ligation process is repeated (Figure
2-14). Consequently, after every cycle a precise “di-base position” of each template
fragment is interrogated. Following a series of ligation cycles, the extension product is
removed and the template is reset with a primer complementary to the n-1 position for
a full second round of ligation cycles (Figure 2-14). After multiple cycles of reset
(typically five) and ligation, every base of the template sequence has been “double
interrogated” by different probes (Figure 2-14). Therefore, starting from a known base
(e.g. the last base of the initial adapter), it is possible to unambiguously translate the entire
colour-sequence into the corresponding base-sequence (Figure 2-14).
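The translation between base-sequence and colour-sequence can be made concrete with a small Python sketch. It assumes the commonly described two-base encoding in which each base is given a 2-bit value (A=0, C=1, G=2, T=3) and the colour of a di-base is the XOR of the two values, so that each of the four colours stands for four of the sixteen di-bases. The numeric colour labels 0 to 3 are placeholders; which fluorescent dye corresponds to which label is not specified here.

```python
BASES = "ACGT"


def encode(seq):
    """Colour-sequence of a base-sequence: one colour per adjacent base pair."""
    values = [BASES.index(base) for base in seq]
    return [a ^ b for a, b in zip(values, values[1:])]


def decode(first_base, colours):
    """Base-sequence from a known first base and the colour-sequence."""
    current = BASES.index(first_base)
    bases = [first_base]
    for colour in colours:
        current ^= colour  # the next base is fixed by the previous base and the colour
        bases.append(BASES[current])
    return "".join(bases)


template = "ACGTCGCATTCAC"  # example sequence shown in Figure 2-14
colours = encode(template)
print(colours)
print(decode(template[0], colours))
```

The decoding step shows why a known starting base is required: a colour alone is compatible with four different di-bases, and only the preceding base resolves the ambiguity.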
[Figure 2-14 panel labels (a to j): 1st ligation cycle, universal primer n hybridization; 1st ligation cycle and 1st di-base calling; fluorescent dye cleavage; 1st ligation cycle, 2nd di-base calling; repeating ligation cycle and base calling; 1st ligation cycle base calling complete; 2nd ligation cycle, universal primer n-1 hybridization; 2nd ligation cycle base calling; multiple cycles of di-base calling with universal primers n, n-1, n-2, n-3 and n-4 (1st to 5th cycle), interrogating di-base positions from 0,1 to 12,13 (each base is “double interrogated” by two different probes); after n ligation cycles: ‘colour-sequence’; corresponding base-sequence starting from a known base: A C G T C G C A T T C A C]
Figure 2-14: Workflow of sequencing by oligonucleotide ligation. An n-base-long universal sequencing
primer is hybridized to the adapter attached to the bead (a). Subsequently, the 5’-fluorescently
labelled probe library together with DNA ligase are flowed over the slide, competing for ligation to the
sequencing primer (b). The fluorescence emission from each cluster (bead) on the flow cell reveals the
nature of the di-base probe ligated (b). After cleavage of the fluorescent dye, the ligation process is
repeated until the terminal adapter (in green) is reached (c, d). Hence, the first cycle of ligation and
“di-base” calling is completed and the system reset (e). Following hybridization with an n-1 long universal
sequencing primer, a second cycle of ligation is started and a second round of “di-base” calling
accomplished (f, g). After repeating a number of ligation and “di-base” calling cycles (typically 5), each
base of the template sequence has been “double interrogated” by different probes and a
‘colour’-sequence can be generated (h, i). Starting from a known base (e.g. the last base of the initial adapter),
the entire ‘colour’-sequence is converted into the corresponding base-sequence (j).
Although the SOLiD double base interrogation might appear more cumbersome, it
facilitates the discrimination between system errors and true polymorphisms. In
essence, a true single nucleotide polymorphism (SNP) results in two consecutive
colour changes between the colour-sequence of the reference template and the
observed one, while sequencing errors unambiguously result in a single colour change
(Figure 2-15). The double base interrogation enables ultra high base calling accuracy
(>99.94%). Additionally, the SOLiD system is able to generate 600 Mb of sequence
information per day and up to 6 Gb in a single experiment.74
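The distinction between a true SNP and a sequencing error can be checked on the sequences shown in Figure 2-15. The Python sketch below is an illustration under assumptions, not the SOLiD software: it uses the commonly described XOR-based two-base encoding (A=0, C=1, G=2, T=3), counts the positions at which the colour-sequences differ, and applies a deliberately simple classification rule in which two adjacent colour changes are reported as a candidate SNP. The function names and category labels are hypothetical.

```python
BASES = "ACGT"


def encode(seq):
    """Colour-sequence of a base-sequence (XOR of adjacent 2-bit base values)."""
    values = [BASES.index(base) for base in seq]
    return [a ^ b for a, b in zip(values, values[1:])]


def colour_changes(reference, observed):
    """Indices at which the two colour-sequences differ."""
    pairs = zip(encode(reference), encode(observed))
    return [i for i, (ref, obs) in enumerate(pairs) if ref != obs]


def classify(reference, observed):
    """Simplified rule: one change = error, two adjacent changes = candidate SNP."""
    changed = colour_changes(reference, observed)
    if not changed:
        return "match"
    if len(changed) == 1:
        return "sequencing error"
    if len(changed) == 2 and changed[1] - changed[0] == 1:
        return "candidate SNP"
    return "unresolved"


expected = "ACGTCGCATTCAC"
print(classify(expected, "ACGTCGGATTCAC"))  # one base substituted
print(classify(expected, "ACGTCGGTAAGTG"))  # bases decoded from one miscalled colour
```

In the second case a single miscalled colour shifts every downstream base when the colour-sequence is decoded naively, yet the comparison in colour space still shows only one change, which is what makes the error recognizable.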
[Figure 2-15 panels. SNP (two colour changes): expected A C G T C G C A T T C A C, observed A C G T C G G A T T C A C. Sequencing error (single colour change): expected A C G T C G C A T T C A C, observed A C G T C G G T A A G T G.]
Figure 2-15: SOLiD discrimination between true polymorphism (SNP) and system sequencing errors.
True single nucleotide polymorphism (SNP) results in a consecutive double colour change between the
reference-template and the observed ‘colour’-sequence (left panel), while sequencing errors
unambiguously result in single colour change (right panel).
As for the Solexa system (see Chapter 2.2.2.2), the main drawback of the SOLiD
technology, particularly for the decoding of DNA-encoded chemical libraries, is
the short maximum read length, currently fixed at 35 basepairs for
standard applications. However, the double base interrogation feature of the SOLiD
approach is undoubtedly very attractive for high-fidelity decoding of large DNA-encoded
libraries, where a single miscalled base might be crucial for
the proper identification of the binding structures.
2.2.2.4 Single Molecule DNA Sequencing – Helicos technology
An alternative, ambitious solution to the issues of cost, speed and sensitivity
of conventional sequencing technologies, and to the exponentially increasing demand
for DNA and RNA sequence information, was very recently presented by Stephen
Quake's laboratory, which described the use of DNA polymerase and fluorescence
microscopy to obtain sequence information from single DNA molecules.76
Furthermore, single-DNA-molecule sensitivity might permit direct sequencing of
mRNA from rare cell populations or perhaps even individual cells.
The technology was commercialized in 2006 as Helicos True Single Molecule
Sequencing (tSMS). Initially, the DNA samples are cut into fragments comprising
up to 55 basepairs. Subsequently, the DNA library fragments are denatured, ligated to an
adaptor sequence at the 3’-terminus and captured on the flow cell by hybridization to
the complementary adapter sequences attached to the surface (Figure 2-16).
According to a sequencing-by-synthesis approach, reversible fluorescently labeled
nucleotides are sequentially added to the nucleic acid templates (Figure 2-16). The
polymerase catalyzes the sequence-specific incorporation of fluorescent nucleotides
into nascent complementary strands on all the templates. After a washing step, which
removes all non-reacted nucleotides, the incorporated nucleotides are imaged and
their positions recorded (Figure 2-16). Angstrom spatial resolution is not necessary
since the distance between hybridized templates is sufficiently large (0.1 micrometer range) and
the nucleotides are inserted sequentially; only the time resolution to discriminate
successive incorporations is required. Following removal of the fluorescent group, the
process continues with the flow of each of the other three bases (Figure 2-16).
Multiple four-base cycles thus result in the parallel determination of billions of
template sequences and the generation of up to 900 Mb/day of sequence information
(Figure 2-16).80 Unlike amplification-based sequencing technologies, in tSMS every
strand is unique and sequenced independently. As a result, the tSMS process is not
subject to the “dephasing” errors that occur when amplified DNA clusters fall out of
step.80,81
[Figure 2-16 panel labels (a to h): DNA sample; denaturation; 3‘-end adapter ligation; hybridization of DNA to surface; sequencing-by-synthesis, flow A; capture image, cleavage, flow T; capture image, cleavage, flow G; capture image, cleavage, flow C; capture image; sequencing-by-synthesis, next cycle; sequences]
Figure 2-16: True Single Molecule Sequencing (tSMS) workflow. The DNA library fragment is
denatured, ligated to an adaptor sequence at the 3’-terminus and hybridized on the flow cell (a, b, c).
Sequencing-by-synthesis is initiated by sequentially adding reversible fluorescently labelled nucleotides.
The polymerase catalyzes the sequence-specific incorporation of the fluorescent nucleotides
into the strands complementary to the templates. After a washing step, the incorporated
nucleotides are imaged and their positions recorded; the fluorescent group is then removed and
the next fluorescent nucleotide flowed (d, f, g).
Multiple sequencing-by-synthesis cycles result in the parallel determination of the template sequences
(h).
Although the Helicos methodology is very promising and displays an accuracy above
99%,80 no research applications have yet been reported in the literature. With regard to
the decoding of DNA-encoded chemical libraries, the read length is at present very
limited (55 base pairs). However, future technology improvements may permit the use
of True Single Molecule Sequencing for chemical library decoding.
3. RESULTS
3.1 DNA-Encoded Library “DEL4000”
DNA-encoding facilitates the construction and screening of large chemical libraries.
Here, we describe general strategies for the stepwise coupling of coding DNA
fragments to nascent organic molecules throughout individual reaction steps. The
methodology was exemplified by the construction of a DNA-encoded chemical library
containing 4’000 compounds, named “DEL4000” (DNA Encoded Library 4000). The
library was synthesized using a split-and-pool procedure, which featured
the following sequential steps: (i) conjugation of different N-Fmoc-amino acids to
distinct amino-modified synthetic oligonucleotides; (ii) deprotection of the amino
moiety; (iii) pool and split; (iv) amide bond formation with selected
carboxylic acids; (v) encoding of the carboxylic acid used in the previous step by
hybridization of partially complementary oligonucleotides followed by
Klenow-mediated DNA polymerization, yielding the final compounds in a double-stranded
DNA format. Moreover, the purity of the intermediates was extensively
investigated using HPLC and mass spectrometry.
3.1.1 Library design and synthesis
Figure 3-1 describes the strategy for the construction of a DNA-encoded chemical
library consisting of 20 x 200 modules (i.e., 4’000 compounds), joined together by the
formation of an amide bond.
[Figure 3-1, scheme not reproduced. Step labels: 20 Fmoc-amino acids (x20) coupled to
20 5’-(C12)NH2 oligonucleotides (x20): 1) Sulfo-NHS, EDC, DMSO, 30 °C, 15 min;
2) TEA/HCl pH = 10, 30 °C, o/n; 3) piperidine 500 mM, 4 °C, 1 h; 4) HPLC. Pool of the
20 amines (1-20). 1) Split into 200; 2) amide bond formation with 200 carboxylic acids;
3) EtOH precipitation. Encoding (annealing). Encoding (Klenow). 1) Ion exchange on
cartridge; 2) pool of 4000 compounds.]
Figure 3-1: Schematic representation of the strategy used for the synthesis and encoding of the
DEL4000 library. Initially, 20 different Fmoc-protected amino acids were coupled to unique
oligonucleotide derivatives carrying a primary amino group at the 5’-terminus. After deprotection
and HPLC purification, these derivatives were pooled and coupled to 200 carboxylic acids in parallel
reactions. The identity of each carboxylic acid was encoded by means of a Klenow polymerization
step, using a set of partially complementary oligonucleotides. This procedure resulted in a
4000-member library (DEL4000), in which each chemical compound was covalently attached to a
double-stranded DNA fragment containing two coding domains that unambiguously identify the
compound’s structure (i.e., the two chemical moieties used for compound synthesis).
Initially, 20 Fmoc-protected amino acids (for the structures see Appendix 9.2) were
chemically coupled to 20 individual amino-tagged oligonucleotides. After
deprotection and HPLC purification, the 20 resulting DNA-encoded primary amines
were coupled to 200 carboxylic acids (for the structures see Appendix 9.2), generating
a library of 4’000 members. In order to ensure that each library member contained a
different DNA code, a split-and-pool strategy was chosen, which also minimizes the
number of oligonucleotides needed for library construction. As indicated in Figure 3-1,
the 20 primary amines covalently linked to individual single-stranded
oligonucleotides were mixed and aliquoted into 200 reaction vessels, prior to coupling
with the 200 different carboxylic acids (one per vessel). Following the reaction, the
oligonucleotides in each vessel were precipitated as sodium phosphate adducts by
addition of an AcOH/AcONa solution (pH 4.7) and three volumes of ethanol. The
identities of the carboxylic acids used for the coupling reactions were encoded by
performing an annealing step with individual oligonucleotides, partially
complementary to the first oligonucleotide carrying the chemical modification. A
subsequent Klenow fill-in DNA-polymerization step yielded double-stranded DNA
fragments, each of which contained two identification codes (one corresponding to the
initial 20 compounds and one corresponding to the 200 carboxylic acids, see Figure
3-1). The 200 reaction mixtures were then purified on an anion exchange cartridge
and pooled. Model reactions performed prior to library construction had shown that
the yields of the amide bond forming reaction ranged between 51% and 98% (see
Chapter 3.1.2, Table 3-1). The resulting DNA-encoded chemical library, containing
4’000 compounds, was aliquoted at a total DNA concentration of 300 nM and stored
frozen prior to further use.
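The combinatorics of the split-and-pool scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name and the numeric labels for the building blocks are hypothetical, and the even distribution of the 300 nM total DNA over all library members is an assumption (real coupling yields differ between members).

```python
from itertools import product

def enumerate_library(n_amines=20, n_acids=200):
    """Return every (amine, acid) pair produced by one pool-and-split round."""
    amines = range(1, n_amines + 1)   # first building blocks (code 1)
    acids = range(1, n_acids + 1)     # second building blocks (code 2)
    return list(product(amines, acids))

library = enumerate_library()

# One coding oligonucleotide per building block, not one per compound.
oligos_needed = 20 + 200

# Assumption: all members are equally represented in the 300 nM stock.
total_dna_nM = 300.0
per_member_pM = total_dna_nM * 1000.0 / len(library)

print(len(library), oligos_needed, per_member_pM)  # 4000 220 75.0
```

Under this idealized assumption each member is present at roughly 75 pM in the stock, which is in line with the sub-nanomolar individual concentrations mentioned for the selections.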
3.1.2 Model Compounds
A high-quality library is crucial for reliable and reproducible selection experiments.
Unreacted oligonucleotides and side products may lead to erroneous interpretation of
the decoding and consequently to incorrect binder identification. Since the
library quality relies essentially on the yield of the reactions used to produce each
library member, model compound-oligonucleotide conjugates were
synthesized in order to validate reaction conditions, yields and product recovery.
Three 42mer 5’-amino acid-oligonucleotide conjugates, carrying a primary amino group
after Fmoc deprotection, were individually coupled to four different carboxylic
acids using a solution of N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide (EDC)
and N-hydroxysulfosuccinimide, with the pH finally buffered by addition of aqueous
triethylamine hydrochloride, pH 9.0. Following overnight stirring and quenching by
addition of Tris-Cl buffer, the reactions were analysed by HPLC and the masses of the
reacted oligonucleotides were determined by LC-ESI-MS. HPLC coupling yields and
recoveries ranged between 51% and 98% (Table 3-1, see also
Appendix 9.1).
Table 3-1: HPLC coupling yields and recovery assessed after peptide bond formation
between three selected 5’-Fmoc-deprotected amino acid-oligonucleotide conjugates and four different
model carboxylic acids (see also Appendix 9.1). The chemical structures shown in the original table
are not reproduced here; conjugates (columns) and carboxylic acids (rows) are numbered in the order
in which they appear.
*) Evaluated by measuring the absorption at 260 nm using a NanoDrop instrument (ND-1000 UV-Vis
spectrophotometer) following HPLC purification (see Chapter 3.1.7).

                   Conjugate 1              Conjugate 2              Conjugate 3
Carboxylic acid    Yield %  Recovery*) %    Yield %  Recovery*) %    Yield %  Recovery*) %
1                  98       90              83       68              65       60
2                  70       60              72       60              76       65
3                  >70      70              >64      64              >57      57
4                  >52      52              >55      55              >51      51
3.1.3 Oligonucleotides
Figure 3-2 schematically depicts the two distinct sets of oligonucleotides used for the
unambiguous encoding of the DEL4000 compounds. The first set (Figure 3-2a)
consisted of 20 unique 42mer single-stranded DNA oligonucleotides, comprising
three domains: an 18-nucleotide primer region (including an EcoRI restriction site)
for PCR amplification at the 5’-terminus, a region of six bases serving as code (each
code differing from the others by at least three bases, see Appendix 9.2) and a
hybridization domain of 18 nucleotides at the 3’-end. For the conjugation of the initial 20
Fmoc-protected amino acids, an NH2-(CH2)12 modification was added to the 5’-terminal
phosphate group. The general sequence was 5’-NH2-(CH2)12PO4-GGA GCT TGT
GAA TTC TGG XXXXXX GGA CGT GTG TGA ATT GTC-3’ (a list of all 20
codes used for library construction can be found in Appendix 9.2). The second set of
oligonucleotides (Figure 3-2b), used to encode the 200 carboxylic acids
employed in the second step of the synthesis of the DEL4000 library (see Chapter 3.1.1),
consisted of 200 distinct 44mer single-stranded DNA oligonucleotides with the general
sequence 5’-GTA GTC GGA TCC GAC CAC XXXXXXXX GAC AAT TCA CAC
ACG TCC-3’. The sequence contains an 18-nucleotide primer region (including a
BamHI restriction site) for PCR amplification at the 5’-terminus, a specific coding
region of eight nucleotides (each code differing from the others by at least four bases, see
Appendix 9.2) and a hybridization domain of 18 bases, complementary to the
hybridization domain of the first set of oligonucleotides (the list of all 200
codes used is given in Appendix 9.2).
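The domain layout and the minimum-distance requirement on the codes can be checked with a short sketch. The constant domains below are copied from the general sequences given above; the helper names and the three example codes are hypothetical placeholders (the actual codes are those listed in Appendix 9.2).

```python
# Constant domains taken from the general sequences in the text.
PRIMER_1 = "GGAGCTTGTGAATTCTGG"  # 18 nt, first set, contains the EcoRI site GAATTC
HYB_1 = "GGACGTGTGTGAATTGTC"     # 18 nt hybridization domain, first set
PRIMER_2 = "GTAGTCGGATCCGACCAC"  # 18 nt, second set, contains the BamHI site GGATCC
HYB_2 = "GACAATTCACACACGTCC"     # 18 nt hybridization domain, second set

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(codes):
    """Smallest Hamming distance between any two codes (needs >= 2 codes)."""
    return min(hamming(a, b)
               for i, a in enumerate(codes)
               for b in codes[i + 1:])

def build_42mer(code1):
    """First-set oligonucleotide: primer + 6 nt code + hybridization domain."""
    return PRIMER_1 + code1 + HYB_1

def build_44mer(code2):
    """Second-set oligonucleotide: primer + 8 nt code + hybridization domain."""
    return PRIMER_2 + code2 + HYB_2

# Hypothetical six-base codes, for illustration only.
example_codes = ["AAAAAA", "AAATTT", "CCCGGG"]

print(len(build_42mer(example_codes[0])))     # 42
print(len(build_44mer("AAAACCCC")))           # 44
print(reverse_complement(HYB_1) == HYB_2)     # True
print(min_pairwise_distance(example_codes))   # 3
```

A minimum distance of three between six-base codes means that a single sequencing error cannot turn one valid code into another, which is presumably the reason for the spacing requirement.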
[Figure 3-2, scheme not reproduced.
a) 5’-NH2-GGA GCT TGT GAA TTC TGG XXX XXX GGA CGT GTG TGA ATT GTC-3’ (42 nt, x 20):
NH2-(CH2)12 modification; 18 nt PCR primer domain with EcoRI restriction site; 6 nt code;
18 nt hybridization domain.
b) 5’-GTA GTC GGA TCC GAC CAC XXXXXXXX GAC AAT TCA CAC ACG TCC-3’ (44 nt, x 200):
18 nt PCR primer domain with BamHI restriction site; 8 nt code; 18 nt complementary
hybridization domain.]
Figure 3-2: Schematic representation of the oligonucleotide sets employed for the encoding of the
DEL4000 library. a) 20 unique 42mer single-stranded DNA 5’-NH2-(CH2)12PO4-oligonucleotides. The
sequences contain three domains: an 18-nucleotide primer region (including an EcoRI restriction site)
for PCR amplification, a coding region of six bases (each code differing from the others by at least
three bases, see Appendix 9.2) and a hybridization domain of 18 nucleotides at the 3’-end. The amino
modification serves as reactive group for the conjugation of the initial 20 Fmoc-protected amino acids.
b) A second set of 200 unique 44mer single-stranded DNA oligonucleotides served as identification
barcodes for the 200 carboxylic acids used in the synthesis of DEL4000. The sequences contain, from
the 5’-terminus: an 18-nucleotide primer region (including a BamHI restriction site) for PCR
amplification, a coding region of eight nucleotides (each code differing from the others by at least
four bases, see Appendix 9.2) and a complementary hybridization domain of 18 bases.
3.1.4 Compounds
Various considerations were taken into account in selecting the 20 Fmoc-protected
amino acids and the 200 carboxylic acids used to build the library. The
compounds had to be commercially available and suitable for conjugation to an
amino-modified oligonucleotide through a stable amide bond. Amide bond formation
on amino-tagged oligonucleotides had worked very well for the construction of
DNA-encoded ESAC libraries in our laboratory.47 We mainly utilized the amide bond
forming reaction for the conjugation of activated alkyl carboxylic acids to primary
amine moieties. The molecules selected were further restricted in size to between
100 and 300 Da (not counting removable protecting groups in the case of the
Fmoc-protected amino acids). We sought compounds with a range of functional groups and
with both hydrophobic and hydrophilic properties. A complete list of the structures of the
20 Fmoc-protected amino acids and the 200 carboxylic acids is given in Appendix
9.2.
The protocol for the amide bond formation reaction was set up by testing several
compounds and analyzing them by HPLC and MS (see Chapter 3.1.2). A typical
reaction procedure is schematically depicted in Figure 3-3.
[Figure 3-3, scheme not reproduced. a) Fmoc-amino acid + 5’-(C12)NH2 42mer
oligonucleotide (18mer / 6mer code / 18mer): 1. EDC 1 eq., S-NHS 4 eq., DMSO, rt, 30 min;
2. pH = 10, TEAA-HCl, rt, o/n; Fmoc removal: piperidine 50 eq., 4 °C, 2 h. b) Amide bond
formation with carboxylic acid: 1. EDC 1 eq., S-NHS 4 eq., DMSO, rt, 30 min; 2. pH = 10,
TEAA-HCl, rt, o/n.]
Figure 3-3: Reaction scheme of library synthesis. a) Coupling of Fmoc-amino acids to the initial
5’-amino oligonucleotides and Fmoc removal. b) Amide bond formation reaction enabling the final
coupling with 200 different carboxylic acids. The right panel schematically depicts the structure
of the oligonucleotide.
3.1.5 HPLC Purification
A challenging task in library construction was the separation of the conjugated
oligonucleotide from its unconjugated oligonucleotide precursor. Typically,
purifications after the first step of library synthesis were performed by reversed-phase
HPLC using an ion-pairing reagent. In order to avoid introducing contaminants, a
volatile buffer (TEAA, 100 mM, pH = 7; see Figure 3-4a) was employed and removed
under vacuum after the chromatographic step. The best purification profiles were
obtained using a C18 column with increased pH stability (Figure 3-4a).
Dimethylbutylammonium acetate (DMBAA, 100 mM,
pH = 7) was used for those oligonucleotides not sufficiently resolved by the TEAA
buffer.
In order to distinguish oligonucleotides and oligonucleotide conjugates from starting
compounds and side-products, absorption was monitored at 260 nm and 280 nm. The
oligonucleotide absorption ratio 260 nm: 280 nm is typically 1.8 : 1.
3.1.6 Mass Spectrometry
Electrospray ionization mass spectrometry (ESI-MS) was employed for the
characterization of the reaction products after oligonucleotide conjugation in the first
step of library construction. Removal of sodium and potassium from the
oligonucleotide is crucial for ESI-MS analysis: multiple sodium and potassium adducts
of the phosphate backbone dramatically decrease the sensitivity and complicate the
interpretation of the spectra. To avoid
manual desalting (e.g. by Zip Tips), desalting was performed on-flow before each
mass spectrometric analysis. While several combinations of column packing material
and buffer systems have been reported to efficiently desalt oligonucleotides on-flow
before mass spectrometry,82 the only system that worked successfully in our hands was
1,1,1,3,3,3-hexafluoroisopropanol (HFIP) as volatile acid component and
triethylamine (TEA) as ion-pairing reagent on a C18 column. Since TEA strongly
suppresses ionization, its concentration was kept at 5 mM, which still allowed sufficient
ion formation and desalting. This protocol enabled ESI-MS of oligonucleotides of
various sizes as multiply charged ions (Figure 3-4b), with a sensitivity down to 5
pmol.
[Figure 3-4, image not reproduced. a) HPLC chromatogram with peak labels: initial
oligonucleotide; compound-oligonucleotide conjugate; carboxylic acid compound. b) ESI-MS
spectrum with peaks labelled 7- to 14-.]
Figure 3-4: Example of oligonucleotide HPLC purification and mass spectrometry characterization.
a) HPLC purification of a typical coupling reaction of an Fmoc-amino acid and a
5’-amino-oligonucleotide after Fmoc removal. The green line indicates the absorption at 260 nm, the
red line at 280 nm. The chromatogram was recorded using TEAA (100 mM, pH = 7) as buffer system on a
C18 column. b) ESI-MS of a compound-oligonucleotide conjugate detected as multiply negatively
charged ions. The peaks corresponding to charge states from 7- to 14- are depicted.
3.1.7 Oligonucleotide concentration determination
Following HPLC purification and solvent removal under vacuum, the oligonucleotide
fractions were dissolved in water. The concentration was determined by measuring the
absorption at 260 nm using a NanoDrop instrument (ND-1000 UV-Vis
spectrophotometer). The extinction coefficient of each oligonucleotide was calculated
from its sequence, assuming the following molar extinction coefficients per
nucleotide: εT = 8400 M-1 cm-1; εA = 15200 M-1 cm-1; εC = 7050 M-1 cm-1; εG = 12010
M-1 cm-1. The ratio of absorbance at 260 nm and 280 nm was used to estimate the
purity of the DNA with respect to contaminants that absorb strongly at or near 280 nm.
A 260/280 ratio of ~1.8 was generally accepted as “pure” for DNA. The ratio of absorbance
at 260 nm and 230 nm was used as a secondary measure of nucleic acid purity with respect to
organic contaminants which absorb at or near 230 nm. Expected 260/230 values for
“pure” DNA are commonly in the range of 2.0-2.2.
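The concentration calculation described above amounts to summing the per-nucleotide extinction coefficients and applying the Beer-Lambert law. The following is a minimal sketch under two assumptions that are not stated in the text: the absorbance value is normalised to a 1 cm path length, and the contribution of the conjugated small molecule at 260 nm is neglected. Function names are illustrative.

```python
# Per-nucleotide molar extinction coefficients at 260 nm (M^-1 cm^-1),
# as listed in the text.
EPSILON_260 = {"T": 8400, "A": 15200, "C": 7050, "G": 12010}

def extinction_coefficient(seq):
    """Sum of the per-nucleotide coefficients over the whole sequence."""
    return sum(EPSILON_260[base] for base in seq.upper())

def concentration_uM(a260, seq, path_cm=1.0):
    """Beer-Lambert law, c = A / (epsilon * l), returned in micromolar."""
    return a260 / (extinction_coefficient(seq) * path_cm) * 1e6

def purity_ratios(a260, a280, a230):
    """Return the 260/280 and 260/230 absorbance ratios."""
    return a260 / a280, a260 / a230

# Example with the 18 nt primer domain of the first oligonucleotide set.
primer = "GGAGCTTGTGAATTCTGG"
print(extinction_coefficient(primer))
print(concentration_uM(0.5, primer))
```

For a full 42mer the same function is applied to the complete sequence including the six-base code, so each of the 20 oligonucleotides has a slightly different coefficient.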
3.1.8 Polymerase Klenow encoding
Following coupling with the 200 different carboxylic acids in the second step of
library construction (see Chapter 3.1.1, Figure 3-1), an annealing step was performed
with individual 44mer oligonucleotides, partially complementary to the 42mer
oligonucleotides carrying the chemical modification (see Chapter 3.1.3). A
subsequent Klenow fill-in DNA-polymerization step at 37 °C yielded
double-stranded DNA fragments, each of which contained both identification codes
(see Chapter 3.1.1, Figure 3-1). The Klenow fragment83 is a large protein fragment
produced when DNA polymerase I from E. coli is enzymatically cleaved by the
protease subtilisin. The Klenow fragment exhibits optimal performance at 37 °C;
it retains the 5’ → 3’ polymerase activity and the 3’ → 5’ exonuclease activity (for
removal of precoding nucleotides and proofreading), but lacks the 5’ → 3’ exonuclease
activity. The Klenow fragment is therefore well suited for fill-in reactions of
partially complementary DNA strands at mild temperature. Conversely, polymerase
fill-in using conventional Taq DNA polymerase requires a higher polymerization
temperature (75-80 °C), which may compromise the stability of the conjugated
compounds.
The products of the Klenow fill-in encoding were analysed by gel electrophoresis
on 20% Tris-borate-EDTA (TBE) and 15% Tris-borate-EDTA-urea (TBU)
polyacrylamide gels. In all reactions the polymerization went to completion.
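The outcome of the annealing and fill-in step can be modelled at the sequence level. The sketch below is an illustration only: the two codes are hypothetical placeholders, the function names are not from the thesis, and the enzymology is reduced to extending the 3’-end of the 42mer along the annealed 44mer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def klenow_fill_in(top, bottom, overlap=18):
    """Return the extended top strand (5' to 3') after fill-in.

    `top` and `bottom` are both written 5' to 3' and anneal through the
    last `overlap` nucleotides at their 3'-ends.
    """
    if reverse_complement(top[-overlap:]) != bottom[-overlap:]:
        raise ValueError("3'-ends are not complementary")
    # The polymerase extends the 3'-end of `top`, copying the
    # single-stranded 5' part of `bottom` (primer domain + code).
    return top + reverse_complement(bottom[:-overlap])

# Hypothetical codes; constant domains as given in Chapter 3.1.3.
oligo_42 = "GGAGCTTGTGAATTCTGG" + "AAATTT" + "GGACGTGTGTGAATTGTC"
oligo_44 = "GTAGTCGGATCCGACCAC" + "AAAACCCC" + "GACAATTCACACACGTCC"

product_top = klenow_fill_in(oligo_42, oligo_44)
product_bottom = reverse_complement(product_top)

print(len(product_top))     # 68
print(product_top[18:24])   # code 1 as synthesized: AAATTT
print(product_top[42:50])   # code 2 read on the top strand: GGGGTTTT
```

Note that on the compound-bearing strand the second code appears as the reverse complement of the code carried by the 44mer, so a decoding routine has to look it up in the matching orientation.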
3.1.9 Summary
The individual steps described above were used for the construction of the DEL4000
library as shown in Chapter 3.1.1, Figure 3-1. After dissolving the 20 Fmoc-protected
amino acids and the specific 5’-amino-modified oligonucleotide tags, a
peptide bond formation reaction was performed. Following Fmoc removal,
the reaction products were purified by HPLC and the appropriate fractions were dried
under vacuum, dissolved in water and analyzed by mass spectrometry. HPLC yields
for this first step were above 43% (65% on average). In a second step, each of the 20
compound-oligonucleotide conjugates was mixed in equimolar amounts (4 nmol each)
in order to generate a first DNA-encoded sub-library of 20 amino-tagged compounds.
The pool was then split equally into 200 vessels and each vessel underwent a second
peptide bond formation with a different carboxylic acid. Following precipitation of
the oligonucleotides of each reaction as phosphate adducts, the modification was
enzymatically encoded by Klenow-assisted polymerization using a further DNA
oligonucleotide fragment. At the same time, the encoding also generated the desired
double-stranded DNA format of the final DEL4000 library. After purification of the DNA
over ion-exchange cartridges, the 200 reaction vessels were pooled to produce the
final 4000-member DNA-encoded library (DEL4000). The library was
aliquoted at a total DNA concentration of 300 nM and stored frozen prior to further
use. Figure 3-5 schematically represents the general structure of a typical compound
in the library.
[Figure 3-5, scheme not reproduced. Domain layout (total: 68 bp): 1st constant domain,
18 bp; CODE1 (code of the 1st building block), 6 bp; 2nd constant domain, 18 bp; CODE2
(code of the 2nd building block), 8 bp; 3rd constant domain, 18 bp. The pharmacophore
compound is attached at the 5’-end of the upper strand.
5’-GGAGCTTGTGAATTCTGGXXXXXXGGACGTGTGTGAATTGTCYYYYYYYYGTGGTCGGATCCGACTAC-3’
3’-CCTCGAACACTTAAGACCXXXXXXCCTGCACACACTTAACAGYYYYYYYYCACCAGCCTAGGCTGATG-5’]
Figure 3-5: Schematic representation of the general structure of a typical compound in the DEL4000
library. Each pharmacophore compound was assembled from two different building blocks (in green
and red) in a split-&-pool fashion and was encoded by two corresponding DNA domains (green X and
red Y) of six and eight base pairs, respectively. The coding regions are flanked by two constant
PCR priming domains of 18 base pairs and separated by a constant spacer of 18 base pairs.
3.2 Selections using the DEL4000 library
In order to investigate the functionality of the newly synthesized DEL4000 library and
to validate the reliability of the selection and of the high-throughput sequencing
read-out procedure, DEL4000 was panned against three target proteins (streptavidin,
matrix metalloproteinase 3 and polyclonal human IgG) immobilized on a sepharose
support in three independent selection experiments. Although the concentration of an
individual library member is below 1 nM, binding compounds can be recovered
efficiently by selection with biotinylated target protein in solution at concentrations
above the dissociation constant Kd, followed by streptavidin capture. Similarly, the
selection can be performed with the protein of interest immobilized at high surface
density on a solid support (e.g., CNBr-activated sepharose), in full analogy to the
procedures commonly used for the selection of antibodies from phage display
libraries.84 Selections were therefore performed by incubating the DEL4000 library
with the target protein attached to a sepharose resin (Figure 3-6a).54 The resin,
containing the retained DNA-encoded binding molecules, was washed four times with
400 µL PBS and finally resuspended in 100 µL water for a subsequent PCR
amplification step followed by high-throughput sequencing (Figure 3-6a). The
sequences obtained by high-throughput sequencing were analysed using
an in-house program written in C++, and the frequency of each code combination
corresponding to an individual pharmacophore was plotted in a 3D graph in which
the xy plane represents the 4000 different sequences (compounds) of the library,
while the number of sequence counts for each compound is reported on the z axis
(Figure 3-6b).
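The decoding step can be illustrated with a short Python sketch. It is not the in-house C++ program mentioned above, whose details are not given here; it only shows one plausible way to extract the two codes from reads with the layout of Figure 3-5. The function names, the exact-match requirement on the constant domains and the rejection of ambiguous bases are all assumptions.

```python
from collections import Counter

CONST_1 = "GGAGCTTGTGAATTCTGG"  # 18 nt constant domain preceding code 1
CONST_2 = "GGACGTGTGTGAATTGTC"  # 18 nt constant spacer between the codes

def decode_read(read):
    """Return (code1, code2) for a read with the expected layout, else None."""
    start = read.find(CONST_1)
    if start < 0:
        return None
    frame = read[start:]
    if len(frame) < 50 or frame[24:42] != CONST_2:
        return None
    code1, code2 = frame[18:24], frame[42:50]
    # Discard reads with ambiguous base calls (e.g. N) inside a code.
    if not set(code1 + code2) <= set("ACGT"):
        return None
    return code1, code2

def count_codes(reads):
    """Count how often each (code1, code2) combination occurs."""
    counts = Counter()
    for read in reads:
        pair = decode_read(read)
        if pair is not None:
            counts[pair] += 1
    return counts

# Hypothetical reads for illustration.
tail = "GTGGTCGGATCCGACTAC"
reads = [
    CONST_1 + "AAAAAA" + CONST_2 + "GGGGTTTT" + tail,
    CONST_1 + "AAAAAA" + CONST_2 + "GGGGTTTT" + tail,
    CONST_1 + "AAATTT" + CONST_2 + "GGGGTTTT" + tail,
    CONST_1 + "NAAAAA" + CONST_2 + "GGGGTTTT" + tail,  # ambiguous base
    "ACGTACGT",                                         # unrelated read
]
for pair, n in count_codes(reads).most_common():
    print(pair, n)
```

A production decoder would additionally map each code to its building-block number and might tolerate mismatches in the constant regions; both refinements are omitted here.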
[Figure 3-6, image not reproduced. a) Workflow steps I-IV: DEL4000 library incubated
with target protein; washing; PCR; high-throughput sequencing. b) Left panel: raw
sequence reads with the CODE1 and CODE2 regions marked. Right panel: 3D plot with axes
Code 1 (1-20), Code 2 (1-200) and sequence counts.]
Figure 3-6: Selection and high-throughput sequencing workflow. a) DEL4000 was incubated with the
target protein immobilized on a sepharose resin (i). The resin, containing the retained DNA-encoded
binding molecules, was washed several times (ii) and used as template in a polymerase chain reaction
(PCR) amplification (iii) prior to decoding by high-throughput sequencing (iv). b) An in-house C++
program processes several thousand raw DNA sequences obtained by high-throughput sequencing (left
panel). The codes 1 and 2 present in every sequence are identified and plotted in a 3D graph (right
panel). The xy plane represents all 4000 possible compounds as combinations of
Code1 + Code2, while the z axis reports the number of counts for a specific combination
(compound).
3.2.1 Streptavidin selection
We initially assessed the relative composition of the new library and its
functionality by performing selection experiments on sepharose resin coated with
streptavidin. Since a variety of streptavidin ligands are known, with dissociation
constants ranging from the mM to the fM range,54 the challenge was to investigate
whether binders with different affinities could be isolated from a library containing
4’000 members. D-desthiobiotin was chosen as positive control binder for
streptavidin (Kd = 47 nM)54 and a D-desthiobiotin-oligonucleotide conjugate was
synthesized, unambiguously encoded and added to a final concentration of 1 pM to
the library of 4000 compounds (20 nM total DNA concentration). Subsequently, the
spiked library was added either to streptavidin-sepharose slurry or to a similar
amount of sepharose slurry without streptavidin. Both resins had been preincubated
with herring sperm DNA to prevent nonspecific binding. After incubation for 1 h at
25 ºC, the beads were washed four times with PBS buffer and used as template for PCR
amplification of the selected codes.
3.2.1.1 Identification of streptavidin binding molecules
Figure 3-7 shows the results of the high-throughput sequencing analysis performed
on the library before selection, after selection on unmodified sepharose resin used as
negative control, and after selection on streptavidin-coated sepharose.
Figure 3-7: Plots representing the frequency (i.e., sequence counts) of the 4000 library members before
selection, after selection on empty resin and after selection on streptavidin resin, as revealed by
high-throughput 454 sequencing. The chemical structures of some of the most relevant streptavidin
binders are indicated. The building blocks used in the two synthetic steps are shown in green and red,
respectively, together with their identification numbers. A known streptavidin binder
(desthiobiotin) had been mixed with the library at low concentration prior to the selections, serving as
positive control.
High-throughput sequencing of the library containing 4000 DNA-encoded compounds
yielded up to 12’000 sequences per sample. The counts for individual library codes (z
axis of the 20 x 200 matrices in Figure 3-7) indicate the abundance of the
corresponding oligonucleotide-compound conjugate. As expected, compounds were
found to be represented in comparable amounts in the library before selection: the
mean count and standard deviation for the 4’000 compounds were
1.72 ± 1.42 when 7’336 individual codes from the library before selection were analyzed.
Similarly, no striking enrichment was observed for selections on unmodified resin. By
contrast, the decoding of the streptavidin selection revealed a preferential enrichment
of certain classes of structurally-related compounds (Figure 3-7). In addition to
desthiobiotin, a biotin analogue with nanomolar affinity to streptavidin,54 which had
been spiked into the library as positive control prior to selection (see Chapter 3.2.1),
we observed an enrichment of derivatives of the thioester moiety 78, of the ester
moiety 49, as well as of other pharmacophores (e.g., 175). Fluorescent amide
derivatives of compounds 49 and 78 had previously been found to bind to streptavidin
with dissociation constants in the millimolar range, as assessed by fluorescence
polarization assays,54 while others (e.g., 175) had not previously been reported as
streptavidin binders.
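As a rough plausibility check of what counts as enrichment, one can estimate how likely a compound is to reach a given count by sampling alone. The sketch below is not the simulation referred to in Chapter 3.2.4; it assumes that reads are distributed uniformly at random over the 4000 compounds, so that the count per compound is approximately Poisson distributed. The function name and the chosen threshold are illustrative.

```python
import math

def poisson_tail(k_min, mean, n_terms=500):
    """P(X >= k_min) for X ~ Poisson(mean), summed directly in log space."""
    return sum(
        math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
        for k in range(k_min, k_min + n_terms)
    )

n_compounds = 4000
n_reads = 12000                   # upper bound on reads per sample (see text)
mean_count = n_reads / n_compounds

# Probability that one given compound reaches 30 counts by chance, and the
# expected number of such compounds in the whole library.
p_single = poisson_tail(30, mean_count)
expected_false_hits = n_compounds * p_single

print(mean_count, p_single, expected_false_hits)
```

Under this null model a count of 30 or more is vanishingly unlikely to arise by chance, which supports treating such compounds as genuinely enriched; the model ignores PCR bias and unequal representation in the starting library.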
3.2.1.2 Characterization of streptavidin binding molecules
In order to evaluate whether the extensions of the pharmacophore moieties 49 and 78
within the new 4’000-member chemical library (02, 07, 11, 15, 16, 17, depicted in
green in Figure 3-7, Chapter 3.2.1.1) contribute to an increased affinity towards
streptavidin, we measured the dissociation constants of the most enriched compounds
by fluorescence polarization at 25 °C, following conjugation to fluorescein (Figure
3-8a; see also Chapter 3.2.4). Additionally, to assess the specificity of the preferentially
enriched compounds, we determined their binding affinities towards two unrelated
proteins serving as negative controls (bovine carbonic anhydrase II and hen egg
lysozyme; Figure 3-8b, c), and we included four non-enriched compounds (15-117, 02-107,
13-40 and 15-78) in the analysis.
[Figure 3-8, plots and structures not reproduced. Panels: a) streptavidin, b) carbonic
anhydrase II, c) lysozyme. x axis: concentration [M], 10-8 to 10-3; y axis: fluorescence
polarization [mP], 0 to 400. Legend (sequence counts in brackets): 02-78 (108),
07-78 (73), 17-78 (70), 16-78 (55), 17-49 (32), 11-78 (48), 02-49 (41), 02-107 (0),
13-40 (2), 15-117 (0), 15-78 (7).]
Figure 3-8: Dissociation constants of the selected compounds determined by fluorescence polarization.
Individual compounds identified in the streptavidin selection experiments were synthesized as
fluorescein conjugates and incubated with different concentrations of target proteins (streptavidin,
bovine carbonic anhydrase II and hen egg lysozyme). a) The top streptavidin binding molecules
(various shades of blue), identified with at least 30 counts (see Chapter 3.2.4), exhibited preferential
binding towards streptavidin, with Kd values ranging between 350 nM and 11 μM. By contrast,
non-enriched compounds (shades of red) did not exhibit appreciable binding to streptavidin (Kd > 50
μM). b, c) Neither the streptavidin binders nor the non-enriched compounds exhibited appreciable
binding to carbonic anhydrase II or to lysozyme. The structures of the 11 compounds can be found in
Figure 3-7.
The dissociation constants towards streptavidin of the most enriched compounds
ranged between 350 nM and 11 μM [Kd (17-49) = 350 nM; Kd (02-78) = 385 nM; Kd
(17-78) = 374 nM; Kd (02-49) = 804 nM; Kd (16-78) = 1.1 μM; Kd (11-78) = 3.5 μM;
Kd (07-78) = 11 μM; Figure 3-8a]. These compounds, each represented at least 30
times in the high-throughput sequencing results, were found at least ten times more
frequently after selection on streptavidin than in the unselected library or than
would be predicted by a random statistical distribution
(for a simulation, see Chapter 3.2.4). By contrast, the four randomly chosen
negative-control compounds, each found at most 7 times after sequencing,
exhibited Kd values for streptavidin of 50 μM or higher (Figure 3-8a). Importantly,
none of the compounds exhibited appreciable binding affinity (Kd > 200 μM; Figure 3-8b, c)
towards lysozyme and carbonic anhydrase, which served as negative control proteins, thus
confirming the specificity of the streptavidin selection. Table 3-2 summarizes the
dissociation constants of the tested compounds towards the different targets.
Fluorescent   Counts after         Streptavidin   Lysozyme   Carbonic anhydrase
compound      DEL4000 selection    Kd (μM)        Kd (μM)    Kd (μM)
13-40         2                    54             384        703
11-78         48                   3.5            753        781
17-49         32                   0.35           1.9e3      225
17-78         70                   0.37           834        264
16-78         55                   1.1            5.2e3      1e5
15-117        0                    99             1.3e7      1.58e8
02-49         41                   0.80           1.9e4      452
02-78         108                  0.38           3.9e3      5.4e7
07-78         73                   11             9.8e8      448
15-78         7                    79             1.9e7      6.9e6
02-107        0                    50             694        1.4e8
Table 3-2: Complete list of the dissociation constants of the selected compounds (as fluorescein
conjugates) towards the different targets, as determined by fluorescence polarization measurements.
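The fitting model used to derive the Kd values in Table 3-2 is not specified in this chapter. As an illustration only, the following sketch assumes a simple 1:1 binding model in which the fluorescein conjugate is present at a concentration well below Kd, so that the fraction bound equals [P] / (Kd + [P]). The function names, the plateau values (50 and 300 mP) and the synthetic titration are hypothetical and are not taken from the experiments.

```python
def fp_signal(protein_conc, kd, fp_free, fp_bound):
    """Polarization (mP) for a 1:1 binding model, ligand depletion neglected."""
    fraction_bound = protein_conc / (kd + protein_conc)
    return fp_free + (fp_bound - fp_free) * fraction_bound


def fit_kd(concs, signals, fp_free, fp_bound, kd_min=1e-9, kd_max=1e-2, steps=701):
    """Coarse log-spaced grid search for the Kd that minimizes the squared error."""
    best_kd, best_err = None, float("inf")
    for i in range(steps):
        # Log-spaced candidate between kd_min and kd_max (0.01 decade per step)
        kd = kd_min * (kd_max / kd_min) ** (i / (steps - 1))
        err = sum((fp_signal(c, kd, fp_free, fp_bound) - s) ** 2
                  for c, s in zip(concs, signals))
        if err < best_err:
            best_kd, best_err = kd, err
    return best_kd


# Synthetic, noise-free titration generated with Kd = 350 nM (illustrative only)
concs = [10 ** e for e in (-8, -7.5, -7, -6.5, -6, -5.5, -5, -4.5, -4)]
signals = [fp_signal(c, 350e-9, 50.0, 300.0) for c in concs]
kd_est = fit_kd(concs, signals, fp_free=50.0, fp_bound=300.0)
```

In practice the two plateaus would be fitted together with Kd, and a titration that does not approach saturation within the tested concentration range only supports a lower bound on Kd.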
3.2.2 Polyclonal human IgG selection
Immunoglobulin G (IgG) is an immunoglobulin consisting of two γ heavy chains (H)
and two light chains (L) linked to each other by disulfide bonds, with a total
molecular weight of approximately 150 kDa.85 As for the other immunoglobulins, the
variable domains (V domains) of the heavy and light chains (VH and VL, respectively) of
IgG confer on the antibody the ability to bind a specific antigen, whereas the constant
domains (C domains, CH and CL, respectively) determine the isotype and therefore the
functional properties of the antibody.85 IgG is the most abundant immunoglobulin,
with four different isotypes (IgG1, 2, 3 and 4 in humans), representing 75% of
serum immunoglobulins in humans.85 IgG molecules are synthesised and secreted by
plasma B cells and are predominantly involved in the secondary antibody response.85
Two antigen binding sites allow the binding of IgG to a variety of pathogens (viruses,
bacteria and fungi), protecting the body against them by agglutination,
immobilization, complement activation, opsonization for phagocytosis and
neutralization of their toxins.85 IgG plays a fundamental role in the immune defence
against pathogens, and certain monoclonal antibodies can be used for pharmaceutical
applications. Consequently, the production and engineering of therapeutic antibodies
has attracted the interest of numerous pharmaceutical companies.86 For this reason, in
a second selection with the DEL4000 library we aimed to identify small organic molecules
which bind to polyclonal human IgG, immobilized on CNBr-activated
sepharose, and which could be useful for the affinity purification of human IgG in
industrial manufacturing.
3.2.2.1 Identification of polyclonal IgG binding molecules
After selection of the DEL4000 library on polyclonal human IgG-sepharose resin and
a PCR amplification step, decoding by high-throughput sequencing was performed. A
total of 39’092 sequence tags were identified. Figure 3-9 graphically summarizes the
high-throughput sequencing results, revealing a superior enrichment, after selection, of
the derivatives of compound 40 (927 overall combination counts) and of the
thiophene moiety 69 (927 overall combination counts). For example, bromide 02-
40 was identified 96 times out of a total of 39’092 identified sequence tags, while >50%
of library members were detected between 1 and 10 times and approximately 10% of
the compounds were identified more than 20 times (see also Chapter 3.2.4).
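The counts above can be put in perspective by comparing them with the value expected if all 4000 library members were equally represented among the 39’092 sequence tags (about 9.8 tags per compound). The sketch below also shows one way to aggregate counts per building block, under the assumption that the overall combination counts are the sum over all compounds sharing that building block. Only the value of 96 counts for 02-40 comes from the text; the other entries and all function names are hypothetical.

```python
from collections import defaultdict

LIBRARY_SIZE = 4000   # DEL4000 members
TOTAL_TAGS = 39092    # sequence tags reported for the IgG selection


def fold_over_uniform(count, total_tags=TOTAL_TAGS, library_size=LIBRARY_SIZE):
    """Ratio of an observed count to the count expected for an equimolar library."""
    expected = total_tags / library_size
    return count / expected


def counts_per_building_block(compound_counts):
    """Sum counts over all compounds sharing a building block.

    Keys are (first_block, second_block) tuples, e.g. ("02", "40").
    """
    first, second = defaultdict(int), defaultdict(int)
    for (a, b), n in compound_counts.items():
        first[a] += n
        second[b] += n
    return dict(first), dict(second)


# 02-40 = 96 is taken from the text; the other entries are made-up placeholders.
toy_counts = {("02", "40"): 96, ("16", "40"): 60, ("02", "69"): 45, ("18", "69"): 50}
first_totals, second_totals = counts_per_building_block(toy_counts)
enrichment_02_40 = fold_over_uniform(96)
```

Under this equimolar assumption, 96 counts correspond to roughly a ten-fold excess over the expected value.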
Figure 3-9: Plot representing the frequency (i.e., sequence counts) of DEL4000 library members after
selection on polyclonal human IgG resin, as revealed by high-throughput 454 sequencing. The
chemical structures of some of the most relevant compounds enriched are indicated. The building
blocks used in the two synthetic steps are indicated in green and red colour respectively, together with
an identification number.
3.2.2.2 Characterization of polyclonal IgG binding molecules by affinity
chromatography resins
Using the diamino linker O-bis-(aminoethyl) ethylene glycol, compounds 02-40 and
16-40 were coupled to CNBr-activated sepharose, and the resulting resins were
evaluated for their performance in the affinity capture of labelled (with Cy5 fluorescent dye
or with biotin) polyclonal human IgG, spiked into Chinese hamster ovary (CHO)
cell supernatant. After loading 100 μL (4 μM) of labelled polyclonal human IgG on 70
mg of either 02-40-sepharose resin or 16-40-sepharose resin, the affinity
chromatography columns were washed with 5 mL PBS, 5 mL of 500 mM NaCl, 0.5 mM
EDTA and 5 mL of 100 mM NaCl, 0.1% Tween 20, 0.5 mM EDTA, and eluted three
times with 200 μL of 100 mM triethylamine. All fractions were collected,
concentrated back to a final volume of 100 μL by centrifugation and subsequently
analyzed by gel electrophoresis. Figure 3-10 shows that both IgG labelled with the
fluorophore Cy5 and IgG labelled with biotin could be completely and selectively captured from
the supernatant, and could be eluted using 100 mM aqueous triethylamine solution.
[Figure 3-10 gel images. IgG (Cy5-labeled): Coomassie Blue staining and Cy5 detection. IgG (biotinylated):
Coomassie Blue staining and streptavidin-based blot. Lanes: M, In, W, E, (+). Marker bands labelled 225,
150, 102, 76, 52 and 38. Resin: compound 02-40 or 16-40 attached through the diamino linker.]
Figure 3-10: Affinity chromatography of CHO cell supernatant, spiked with human IgG labeled either
with Cy5 or with biotin, on resin containing compound 02-40 or 16-40. For antibody purifications,
relevant fractions were analyzed by SDS-PAGE both with Coomassie Blue staining and with a specific
detection method (Cy5 fluorescence and a streptavidin horseradish peroxidase-based blot,
respectively). M = molecular weight marker; In = input fraction for the chromatographic process; W =
pooled wash fractions; E = pooled eluted fractions. The lane (+) corresponds to Cy5- or biotin-
labeled polyclonal human IgG. In, W and E fractions were normalized to the same volume prior to
SDS-PAGE analysis.
3.2.3 Matrix metalloproteinase 3 (MMP3) selection
General catabolism of tissue structures by tumour-cell proteases provides access to
the vascular and lymphatic systems, thereby facilitating metastases and cancer
dissemination.87 Proteolytic enzymes, through their capacity to degrade extracellular
matrix (ECM) proteins, are important components of this process. Among protease-
like proteins, the matrix metalloproteinases (MMPs) are a group of 24 zinc-dependent
enzymes capable of degrading the ECM and the basement membrane and of processing
bioactive mediators.88 For this reason, MMPs have been the focus of much anticancer
research, with inhibitors investigated in clinical trials. The establishment of causal
relationships between MMP overexpression and tumour progression initially
encouraged the development of MMP inhibitors (MMPIs) as cancer therapeutics.89 In
addition to their connective-tissue-remodelling functions, MMPs are known to precisely
regulate the function of bioactive molecules by proteolytic processing. For example,
MMPs mediate cell-surface-receptor cleavage and release, cytokine and chemokine
activation and inactivation, and the release of apoptotic ligands.90 These processes are
involved in cell proliferation, adhesion and dispersion, migration, differentiation,
angiogenesis, apoptosis and host defence evasion, which are characteristic of the early stages of
tumour growth, before metastasis occurs.89,90 MMPIs may therefore be
suitable for blocking cancer progression. At the same time, inhibitors with insufficient
specificity may suppress normal tissue function or host defence processes. Very
recently our group used a 550-member ESAC library (for the technology see Chapter
2.1.2) in a two-step selection procedure for the identification of novel inhibitors of
stromelysin-1 (MMP-3), a matrix metalloproteinase involved in both physiological
and pathological tissue remodeling processes, yielding novel inhibitors with
micromolar potency suitable for subsequent medicinal chemistry optimization.57
Encouraged by these promising results, we decided to perform an MMP3 selection with
the larger DEL4000 library.
3.2.3.1 Identification of MMP3 binding molecules
Figure 3-11a shows the relative abundance of the individual compounds as obtained
from high-throughput sequencing. A different fingerprint compared to the streptavidin
and IgG selections was observed. Among the compounds which displayed the highest
enrichment, four compounds were selected (02-118, 13-17, 18-96, 17-104) and tested
for MMP3 binding and inhibition.
3.2.3.2 Characterization of MMP3 binding molecules
The MMP3 affinity constants of compounds 02-118, 13-17, 18-96 and 17-104 were
determined by fluorescence polarization at 25 °C, following conjugation to
fluorescein using the diamino linker O-bis-(aminoethyl) ethylene glycol (Figure 3-
11b). Compound 02-118 exhibited the best dissociation constant (Kd of 11 μM), while
the other selected compounds did not reveal appreciable binding to MMP3 (Figure
3-11b). Inhibition assays were performed by incubating MMP3
(500 nM) with a dilution series of each compound (02-118, 13-17, 18-96, 17-104), using
Mca-Pro-Leu-Gly-Leu-Dpa-Ala-Arg-NH2 as fluorogenic substrate. No
substantial inhibition was observed for any of the compounds tested. This observation
led us to conclude that compound 02-118 likely binds at a site outside the
catalytic pocket.
[Figure 3-11 plots. a) Sequence counts of DEL4000 library members, with the structures of compounds
02-118, 13-17, 18-96 and 17-104 highlighted. b) Titration curves against [MMP3] (M), 10^-7 to 10^-4,
for compounds 02-118, 13-17, 15-117, 17-104 and 18-96. Structure drawings omitted.]
Figure 3-11: DEL4000 library selection with human MMP3. a) The plot represents the frequency (i.e.,
sequence counts) of DEL4000 library members after selection on human MMP3 resin, as revealed by
high-throughput 454 sequencing. The chemical structures of some of the most relevant compounds
enriched are indicated. The building blocks used in the two synthetic steps are indicated in green and
red colour respectively, together with an identification number. b) Determination by fluorescence
polarization of the MMP3 affinity constants of compounds 02-118, 13-17, 18-96 and 17-104.
Compound 02-118 exhibited the best dissociation constant (Kd of 11 μM), while the other selected
compounds did not reveal appreciable binding to MMP3.
3.2.4 Computational simulation of DEL4000 selections
In order to assess whether the enrichment of a compound in high-throughput
sequencing procedures is statistically significant, we simulated the stochastic
distribution of sequence counts, using software written in-house in C++
(Dr. Y. Zhang). The program generated a pool of 4000 equally likely numerical codes
representing the 4000 members of the DEL4000 library. The software then randomly
picked from this pool as many codes as the number of sequences obtained
in the corresponding high-throughput sequencing experiment.
The simulation was repeated 100 times. The average simulated
distribution was plotted, displaying the number of codes (i.e., DNA-encoded
compounds in the library) which would be observed with a given number of counts
(i.e., number of sequences) in an ideal library before selection, in which all library
members were equally represented. This simulated distribution was compared with
the experimental distribution of the sequence counts observed for the members of the
library after 454-assisted sequencing of the PCR reaction before selection (Figure 3-
12a), after selection on Tris-quenched resin (Figure 3-12b), on streptavidin resin
(Figure 3-12c), on resin coated with human matrix metalloproteinase 3
(Figure 3-12d) and on resin coated with polyclonal human IgG (Figure 3-12e).
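The in-house C++ program is not reproduced in this thesis. The procedure described above can be sketched in Python as follows, assuming that codes are drawn uniformly and with replacement; the function name and the fixed random seed are illustrative choices.

```python
import random
from collections import Counter


def simulate_count_distribution(n_codes, n_sequences, n_repeats, seed=0):
    """Average number of codes observed with each count value.

    Each repeat draws n_sequences codes uniformly at random (with replacement)
    from n_codes equally likely codes, mimicking an equimolar library.
    """
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_repeats):
        draws = Counter(rng.randrange(n_codes) for _ in range(n_sequences))
        per_count = Counter(draws.values())   # codes seen k >= 1 times
        per_count[0] = n_codes - len(draws)   # codes never drawn
        totals.update(per_count)
    return {k: totals[k] / n_repeats for k in sorted(totals)}


# 4000 codes and 7336 sequences (library before selection), 100 repeats as in the text
distribution = simulate_count_distribution(4000, 7336, 100)
```

The returned dictionary maps a count value to the average number of codes observed with that count, which is the quantity plotted as the simulated curve.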
Figure 3-12: Simulated and experimental distribution of sequence counts observed for members of the
library before selection (a) and after selection on Tris-quenched resin (b), streptavidin-resin (c), as well
as resin coated with human matrix metalloproteinase 3 (d) and with polyclonal human IgG (e). The
plots display the number of codes (i.e., DNA-encoded compounds in the library), which were observed
with a given number of counts (i.e., number of sequences) either in the experimental 454-assisted
sequence of PCR reaction (performed before or after selection), or in a computer-assisted simulation.
While in the library before selection experimental findings and simulation are in excellent agreement,
in selection experiments certain compounds are enriched much more compared to what would be
predicted from the statistical distribution of sequence counts in an equimolar mixture of compounds.
The sequences of compounds in plot (c) identified with an asterisk were found more than 30-times;
these compounds were then chosen for the experimental affinity determination (Figure 3-8). The
individual plots exhibit a different maximum for the simulated curve of number of codes observed with
a certain number of counts, due to differences in the overall number of experimental sequences (e.g.,
7’336 overall sequence counts for the library before selection; 39’032 overall sequence counts for IgG
selections).
While in the library before selection experimental findings and simulation are in
excellent agreement, in selection experiments certain compounds are enriched much
more than would be predicted from the statistical distribution of
sequence counts in an equimolar mixture of compounds. The compounds in the plot of
Figure 3-12c whose sequences were identified more than 30 times (indicated with an
asterisk in Figure 3-12c) were then chosen for the experimental affinity determination
(see Chapter 3.2.1.1, Figure 3-7). Notably, the individual plots exhibit a different
maximum for the simulated curve of the number of codes observed with a certain number
of counts, due to differences in the overall number of experimental sequences (e.g.,
7336 overall sequence counts for the library before selection; 39032 overall sequence
counts for IgG selections).
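For an equimolar library, the simulated average can also be approximated analytically: if N sequences are drawn from M equally likely codes, the count of each code follows a binomial distribution with N trials and success probability 1/M, so the expected number of codes observed at least k times is M times the binomial tail probability. The sketch below illustrates this calculation under that assumption; it is not the method used in the thesis, and the function names are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Natural log of the binomial probability P(X = k) for n trials."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)


def expected_codes_with_at_least(k_min, n_sequences, n_codes):
    """Expected number of codes seen >= k_min times in an equimolar library."""
    p = 1.0 / n_codes
    # Sum the upper tail directly to avoid cancellation in 1 - P(X < k_min)
    tail = sum(math.exp(log_binom_pmf(k, n_sequences, p))
               for k in range(k_min, n_sequences + 1))
    return n_codes * tail


# Library before selection: 4000 codes, 7336 sequences
seen_at_least_once = expected_codes_with_at_least(1, 7336, 4000)
seen_30_or_more = expected_codes_with_at_least(30, 7336, 4000)
```

With 4000 codes and 7336 sequences, the expected number of codes observed 30 times or more is vanishingly small, so counts of this size are not expected from random sampling of an equimolar pool.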
3.3 General strategies for the stepwise construction of very large
DNA encoded chemical libraries
The demonstration that high-quality DNA-encoded chemical libraries could be
synthesized and decoded using 454 high-throughput sequencing technology
encouraged us to investigate methodologies for the construction of larger DNA-
encoded chemical libraries of unprecedented size (potentially comprising >10^6
compounds), featuring the stepwise addition of at least three independent sets of
chemical moieties and identification oligonucleotide tags. Therefore we investigated a
three-round split-&-pool chemical library synthesis based on selective deprotection
and reaction of di-amino carboxylic acid derivative core scaffolds, as well as three
different encoding strategies, featuring the stepwise insertion of three independent
oligonucleotide codes using experimental procedures based on the sticky-end
ligation of DNA fragments and/or on the annealing of partially complementary
oligonucleotides, followed by Klenow-assisted polymerization.
3.3.1 Selective deprotection and reaction of di-amine derivatives
The general strategy for the construction of a DNA-encoded chemical library
consisting of N x M x K modules (e.g., 10 x 200 x 200 compounds) joined together by
the formation of amide bonds using a split-&-pool procedure is given in Figure 3-
13 (see also Chapter 2.1.1.4, Figure 2-8). Initially, a set (e.g., N = 10) of di-amino
protected carboxylic acids is conjugated to distinct amino-modified synthetic
oligonucleotides. Cleavage of one amino protective group (PG1) of each of the
core scaffolds, followed by a split-&-pool amide bond formation reaction with selected
carboxylic acids (e.g., M = 200) and subsequent enzymatic encoding, leads to a first
sub-library pool of N x M members. After removal of the second amino
protective group (PG2) and splitting, the N x M pool undergoes an additional amide
bond formation reaction with suitable carboxylic acids (e.g., K = 200). Encoding of
this last set of carboxylic acids through enzymatic elongation of the
oligonucleotide tags, followed by pooling of the reactions, leads to the final library mix of N
x M x K member compounds (e.g., 400’000).
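The size of the library at each stage follows directly from the number of building blocks in each set. A minimal sketch of this arithmetic, with hypothetical function names and a deliberately tiny enumeration example:

```python
from itertools import product


def library_size(*set_sizes):
    """Number of compounds obtained by combining one member of each set."""
    size = 1
    for n in set_sizes:
        size *= n
    return size


def enumerate_codes(n, m, k):
    """Yield (scaffold, acid_1, acid_2) index triples, one per library member.

    Each triple stands for the three DNA codes that would identify a compound.
    """
    return product(range(1, n + 1), range(1, m + 1), range(1, k + 1))


full_size = library_size(10, 200, 200)          # scaffolds x acids x acids
sublibrary_size = library_size(10, 200)         # pool after the first encoding step
small_example = list(enumerate_codes(2, 3, 2))  # tiny illustrative library
```

Each split-&-pool round multiplies the number of compounds by the size of the new building-block set, while the number of separate reactions per round only equals the size of that set.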
[Figure 3-13 scheme: N = 10 di-amino carboxylic acid scaffolds (PG1NH / NHPG2) conjugated to oligonucleotides;
split and reaction with M = 200 carboxylic acids, encoding, pool; split and reaction with K = 200 carboxylic
acids, encoding, pool.]
Figure 3-13: General strategy for the construction of a DNA-encoded chemical library consisting of N
x M x K modules (e.g., 10 x 200 x 200 compounds). The amino protective groups (PG1) of a set (e.g., N
= 10) of di-amino protected carboxylic acids conjugated to unique oligonucleotide tags are removed.
Subsequently, an amide bond formation reaction with selected carboxylic acids (e.g., M = 200) is
performed in a split-&-pool fashion. After enzymatic encoding, cleavage of the second amine protective
group (PG2) allows an additional split-&-pool amide bond formation reaction with carboxylic acids
(e.g., K = 200). DNA encoding of the final modifications leads to the final DNA-encoded library of N x
M x K compounds (e.g., 400’000).
3.3.1.1 Orthogonal protective group and selective deprotection
The choice of appropriate orthogonal protective groups and of convenient di-amino
carboxylic acid core scaffolds is crucial for the construction of a DNA-encoded
library as described in the previous paragraph (Figure 3-13). A list of useful
protective groups for amino moieties, with removal conditions compatible with
DNA, is shown in Table 3-3. The DNA-compatibility
of the cleavage conditions (for Fmoc cleavage see Chapter 3.1.2) was assessed by
coupling a specific N-protected cis-2-aminocyclopentanecarboxylic acid to a 5’-
amino-modified oligonucleotide. Following purification by HPLC, the protective group
on the amino moiety was cleaved and the reaction was analyzed by HPLC and mass spectrometry
(Table 3-3).
Compounds 1a-d: N-protected cis-2-aminocyclopentanecarboxylic acid (structure drawings omitted)

Compound  Protective group (PG), name                     Type               Cleavage                                               HPLC yield
1a        9H-fluoren-9-ylmethyl carbamate (Fmoc)          Base labile        Piperidine 500 mM, water/DMSO, 4 ºC, 1 h               Quantitative
1b        4,5-dimethoxy-2-nitrobenzyl carbamate (Nvoc)    Photocleavable     366 nm, 1 mM AcOH/AcONa pH 4.7, Pyrex, 4 ºC, 30 min    Quantitative
1c        pent-4-enamide                                  Iodolactonization  I2, THF/water, 1 h                                     80 %
1d        2-(biphenyl-4-yl)propan-2-yl carbamate (Bpoc)   Acid labile        AcOH/AcONa, water, pH 3-4, 35 ºC, 1 h                  90 %
Table 3-3: Protective groups for amino moieties with cleavage conditions compatible with DNA. Cis-2-
aminocyclopentanecarboxylic acid was protected on the amino moiety with a selected protective group
and coupled to a 5’-amino-oligonucleotide. Following conjugation, the protective group
was removed and the yield of the cleavage was assessed by HPLC.
Prior to the preparation of the scaffolds, we investigated the selectivity of the orthogonal
removal of a combination of two amino protective groups in the presence of DNA. We
explored the combination of the Fmoc (base labile) and Nvoc (photocleavable) amino protective
groups. After coupling of Nα-Fmoc, Nε-Nvoc lysine (2) to an amino-
modified oligonucleotide, Fmoc was removed through addition of piperidine and the
completeness of the reaction was assessed by HPLC (Figure 3-14). ESI-MS confirmed the mass of the
expected N-Nvoc amino-acid oligonucleotide conjugate as
the only product of the reaction.
[Figure 3-14 scheme. (i) Nα-Fmoc lysine with NvocCl (1 eq.), Na2CO3 (2 eq.), water/dioxane, 2 h, giving
compound 2. (ii) 1) Sulfo-NHS, EDC, DMSO, 30 °C, 15 min; 2) (C12)-NH2 oligonucleotide, TEA/HCl pH = 10,
30 °C, o/n; 3) piperidine 500 mM, 4 °C, 1 h. Structure drawings omitted.]
Figure 3-14: Selective removal of the Fmoc protective group on a model Nα-Fmoc, Nε-Nvoc di-amino
carboxylic acid oligonucleotide conjugate. Initially the terminal amino moiety of 2-N-Fmoc lysine was
protected by means of the NvocCl reagent (i). Following coupling to a 5’-amino-oligonucleotide, piperidine
was added and Fmoc removed (ii). After HPLC, ESI-MS revealed the N-Nvoc amino-acid oligonucleotide
conjugate as the only product of the reaction.
3.3.1.2 Core scaffolds design and synthesis strategy
The confirmation that an Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid conjugated to an
amino-modified oligonucleotide allows the selective cleavage of the Fmoc moiety,
quantitatively yielding the corresponding N-Nvoc protected di-amino acid
oligonucleotide conjugate, led us to prepare a variety of Nα-Fmoc, Nε-Nvoc
di-amino carboxylic acid core scaffolds for investigating the feasibility of the library
synthesis pathway. Figure 3-15 depicts four convenient strategies for the preparation
of Nα-Fmoc, Nε-Nvoc di-amino carboxylic acids.
[Figure 3-15 scheme, panels a) to d). Reagents as far as legible in the scheme: b) di-amino carboxylic acid
(2 HCl salt): 1. NvocCl 1 eq., DIEA, DMF/water; 2. FmocCl. c) 1. Pd(PPh3)2Cl2, K2CO3, DME/EtOH/water, μW;
2. FmocCl (X = Br, I; R = substituent with primary amino moiety). d) NvocCl 1 eq., DIEA, DMF/water.
Structure drawings omitted.]
Figure 3-15: Scheme summarizing convenient strategies for the preparation of Nα-Fmoc, Nε-Nvoc di-
amino carboxylic acid scaffolds. a) Synthesis of chiral Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid
scaffolds. All eight initial diastereomers are commercially available. The synthetic strategy
allowing the preparation of the final Nα-Fmoc, Nε-Nvoc product can be found in Figure 3-
16. b) Preparation of a symmetric aromatic Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid scaffold. c)
Synthesis of biphenyl Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid core scaffolds by means of
microwave-assisted Suzuki cross-coupling, using either an amino boronic acid and suitable aromatic
halides or an amino carboxylic boronic acid and suitable amino halides. d) Synthesis of short alkyl
linkers as Nα-Fmoc, Nε-Nvoc di-amino carboxylic acids.
Notably, the strategy in Figure 3-15a allows in a straightforward fashion the
synthesis of a large variety of stereoisomeric core scaffolds starting from chiral
precursor compounds. Conversely, the strategies in Figure 3-15b,c,d describe
convenient pathways for the preparation of aromatic, bi-phenyl and alkyl Nα-Fmoc,
Nε-Nvoc carboxylic acid building-blocks respectively.
3.3.1.3 Model compounds for N-Fmoc, N’-Nvoc di-amino carboxylic acid core
scaffold based library.
In order to demonstrate the possibility of constructing a DNA-encoded library by
means of Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid core scaffolds in an approach with
three split-&-pool rounds, we initially synthesized two building blocks (Nα-Fmoc, Nε-
Nvoc-lysine and (1S,3R,4R)-3-Nvoc-amino-4-Fmoc-amino-cyclopentanecarboxylic
acid) according to the reaction schemes given previously (see Chapter 3.3.1.1, Figure
3-14) and in Figure 3-16a.
[Figure 3-16 scheme. a) Compounds 3 to 8; reagents in order of appearance: 1. MeSO2Cl, TEA, CH2Cl2;
2. NaN3, DMF; H2 (1 atm), Pd/C, MeOH; FmocCl, DIEA, DMF; HCl 2N, water/dioxane; NvocCl, Na2CO3,
water/dioxane. b) Coupling I (Sulfo-NHS, EDC, DMSO, 30 °C, 15 min); Nvoc removal (366 nm, Pyrex, 1 mM
AcOH/AcONa pH 4.7, 30 min); Coupling II (Sulfo-NHS, EDC, DMSO, 30 °C, 15 min) with desthiobiotin.
Structure drawings omitted.]
Figure 3-16: Preparation of DNA-encoded model compounds of Nα-Fmoc, Nε-Nvoc protected di-
amino carboxylic acid based library. a) Reaction scheme for the synthesis of an Nα-Fmoc, Nε-Nvoc di-
amino carboxylic acid chiral scaffold. The synthesis has been accomplished with an overall yield of 70
%. b) The N-Nvoc di-amino acid oligonucleotide conjugates were coupled by amide bond formation
reaction to biotin or to 3-p-tolylpropanoic acid. Irradiation at 366 nm at 4°C for 30 min in 1 mM
AcOH/AcONa (pH 4.7), enables Nvoc removal. In a final step, the two resulting compounds were
coupled to a further carboxylic acid (desthiobiotin). HPLC and ESI-MS analysis confirmed the identity
of the expected products.
Following coupling of the Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid derivative to a
5’-amino-modified oligonucleotide and Fmoc deprotection, the product was purified
by HPLC. Biotin or 3-p-tolylpropanoic acid, as model carboxylic acids, were
coupled by amide bond formation to the N-Nvoc di-amino acid
oligonucleotide conjugates. The reaction products were then purified from small organic
contaminants by precipitation as sodium phosphate adducts (Figure 3-16b).
Following Nvoc removal by irradiation at 366 nm at 4 °C for 30 min in 1 mM
AcOH/AcONa (pH 4.7), the oligonucleotide conjugates were precipitated once more as
sodium phosphate complexes (Figure 3-16b). In a final step, the two resulting
oligonucleotide-linked compounds were coupled to a further model carboxylic
acid (desthiobiotin). HPLC and mass spectrometry analysis revealed an overall
conversion of 35% and 38%, respectively, into the desired bi-substituted
oligonucleotide conjugates.
3.3.2 Stepwise DNA-encoding
Following the encouraging preliminary results on selective deprotection and split-&-pool
reaction of the di-amine derivatives (see Chapter 3.3.1), the next challenge prior to
library construction was to investigate methodologies for the stepwise addition of
at least three independent sets of oligonucleotide tags as identification codes for the
chemical building blocks. Three experimental procedures were explored to add
the oligonucleotide fragments in a stepwise fashion, either by enzymatic ligation of
sticky-end oligonucleotides or through annealing of partially complementary DNA
fragments followed by Klenow-assisted polymerization. The feasibility of the three
alternative strategies was demonstrated by gel-electrophoretic analysis and by DNA
sequencing of the final construct.
3.3.2 Encoding by ligation
Figure 3-17 features the stepwise addition of groups of chemical moieties onto an
initial scaffold followed by the sequential addition of the corresponding DNA codes
by an iterative ligation procedure. This scheme (Figure 3-17a) is conceptually simple
and can be implemented experimentally, but requires two double-stranded DNA
fragments with sticky ends for each encoding event (e.g., 200 + 200 + 200
oligonucleotides for a library containing 100 x 100 x 100 chemical groups). Native
PAGE analysis on a 20% TBE gel revealed the identity and purity of the DNA fragments
used in the encoding procedure (Figure 3-17b).
[Figure 3-17. a) Scheme of the sequential reaction and ligation steps introducing Code A, Code B and Code C.
b) Native PAGE gel; lanes M, a1, a2, a3, a4, a5; marker bands at 100, 60, 40 and 20; * marks the band
described in the caption.]
Figure 3-17: Stepwise encoding by ligation. a) Encoding strategy based on the sequential ligation of
double-stranded DNA fragments. b) Native PAGE analysis with a 20% TBE gel revealed the identity
and purity of the DNA fragments used in the encoding procedure. M: marker; a1) single strand 28mer
DNA fragment; a2) single stranded 32mer DNA fragment; a3) hybridization of 28mer (a1) with the
32mer (a2) DNA fragments; a4) double-stranded DNA 50mer first ligation step product; a5) Double-
stranded 78mer second ligation step product. *) The band is the hybridized oligonucleotide (of a 28mer
and 24mer) carrying the Code C which was used in excess.
3.3.2.1 Encoding by a combination of Klenow polymerase and ligation
An encoding strategy featuring the combination of the Klenow-assisted encoding
strategy (see Chapter 3.1.8) and the encoding by ligation (see Chapter 3.3.2.1) is
depicted in Figure 3-18a. A double-stranded DNA fragment generated by Klenow
fill-in using a biotinylated template is digested with a non-palindromic cutter (i.e.,
BssSI), followed by streptavidin capture of the biotinylated residual fragments
(Figure 3-18a). Subsequently, a ligation step with a complementary double-stranded
DNA fragment carrying the third code is performed (Figure 3-18a). Denaturing PAGE
analysis using a 15% TBE-Urea gel revealed the purity and identity of the DNA
fragments generated in the encoding steps (Figure 3-18b).
[Figure 3-18. a) Scheme; elements shown: reaction (Code A), annealing with a biotinylated Code B
oligonucleotide, Klenow fill-in, BssSI digestion (non-palindromic site), streptavidin capture, reaction,
ligation of Code C. b) Denaturing PAGE gel; lanes M, a1 to a6; marker bands at 100, 70 and 40; * marks
the band described in the caption.]
Figure 3-18: Stepwise encoding by combination of Klenow polymerase and ligation. a) Encoding
strategy based on the formation of a double-stranded DNA fragment by a Klenow-assisted
polymerization step, followed by the ligation of a DNA-fragment carrying the third code. b)
Denaturing PAGE analysis using a 15% TBE-Urea gel revealed the purity and identity of the DNA
fragments generated in the encoding steps. M = marker; a1) single strand 42mer DNA fragment a2)
44mer partially complementary 5’-biotinylated single stranded DNA fragment; a3) 27mer and 23mer
hybridized DNA ligation fragments; a4) Klenow assisted polymerization 68mer product; a5) BssSI
digestion product (54mer); a6) full-length (81mer) DNA fragment. *) The band is an artefact resulting
from incomplete denaturation. If excised, extracted and loaded on a gel, this band migrates at the
expected height of a double-stranded 81-base DNA fragment.
3.3.2.2 Encoding by Klenow polymerase
The synthetic and encoding strategy depicted in Figure 3-19a represents a natural
extension of the encoding strategy used in the assembly of the DEL4000 library (see
Chapter 3.1.8) and would require the lowest number of oligonucleotides for library
encoding (100 + 100 + 100 oligonucleotides for a library containing 100 x 100 x 100
chemical groups). The feasibility of the experimental procedure was demonstrated by
denaturing 15% TBE-Urea gel-electrophoretic analysis (Figure 3-19b), which
monitored the stepwise assembly of DNA fragments of suitable size.
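The oligonucleotide requirements quoted for the ligation-based and the Klenow-based schemes can be compared with a few lines of arithmetic. The sketch below assumes, as the figures quoted in the text suggest, two oligonucleotides per code for ligation and one per code for Klenow encoding; the function names are hypothetical.

```python
def oligos_required(set_sizes, oligos_per_code):
    """Total oligonucleotides needed to encode every building block once."""
    return sum(n * oligos_per_code for n in set_sizes)


def compounds_encoded(set_sizes):
    """Library size: product of the building-block set sizes."""
    total = 1
    for n in set_sizes:
        total *= n
    return total


sets = (100, 100, 100)
ligation_oligos = oligos_required(sets, oligos_per_code=2)  # both strands per code
klenow_oligos = oligos_required(sets, oligos_per_code=1)    # one strand per code
library = compounds_encoded(sets)
```

The number of oligonucleotides grows with the sum of the set sizes, whereas the number of encoded compounds grows with their product.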
[Figure 3-19. a) Scheme; elements shown: reaction (Code A), annealing with a biotinylated Code B
oligonucleotide, Klenow fill-in, streptavidin capture, reaction, second annealing and Klenow fill-in
introducing Code C. b) Denaturing PAGE gel; lanes M, a1 to a5; marker bands at 100, 70 and 40.]
Figure 3-19: Stepwise encoding by Klenow polymerization. a) Encoding strategy based on the
formation of a double-stranded DNA fragment by the sequential use of two Klenow-assisted
polymerization steps, starting from partially complementary oligonucleotides. b) Denaturing PAGE
analysis performed using a 15% TBE-Urea gel revealed the purity and identity of the DNA fragments
generated in the three Klenow-mediated encoding steps. M = marker; a1) single stranded 5’-
aminomodified 42mer DNA fragment; a2) partially complementary 3’-biotinylated single stranded
DNA fragment; a3) 42mer single-strand DNA fragment partially complementary to first Klenow step
product; a4) single-strand 66mer DNA product following first Klenow step polymerization and
purification; a5) full-length (90mer) DNA fragment, following purification.
3.3.3 Summary Based on the promising results achieved with the selective deprotection and reaction
of di-amine derivatives (see Chapter 3.3.1 and Chapter 3.3.2), we have investigated
the feasibility of a step-by-step encoding of a model library member comprising three
building blocks. The N-Fmoc, N’-Nvoc protected di-amino carboxylic acid depicted
in Figure 3-20a was coupled to a 5’-amino-modified oligonucleotide. Following
Fmoc removal, the oligonucleotide conjugate was coupled to a further model
carboxylic acid (3-p-tolylpropanoic acid) by amide bond formation reaction and
precipitation as sodium phosphate adducts (Figure 3-20a). A subsequent Nvoc
removal step (Chapter 3.3.1.3) allowed the modification of the resulting
oligonucleotide derivative, carrying a reactive primary amino group. In order to
generate a DNA fragment carrying three codes which unambiguously identify the building
blocks used for library construction, three experimental strategies were envisaged and
experimentally demonstrated. One of the three strategies is depicted in Figure 3-20,
featuring the use of a biotinylated oligonucleotide in the Klenow-assisted fill-in
reaction for the introduction of the second code. A third step in the encoding
procedure, featuring the ligation of a Cy3-labeled double stranded DNA fragment,
allowed the monitoring of the encoding procedure not only by EtBr DNA staining, but
also by fluorescence imaging of gel-electrophoresis (Figure 3-20b). DNA sequencing
confirmed the identity and the purity of the DNA constructs (Figure 3-20c).
[Figure 3-20, image content. a) Reaction scheme: coupling of the N-Fmoc, N-Nvoc di-amino carboxylic acid to the (C12)-amino oligonucleotide, Fmoc removal and HPLC; coupling of the second carboxylic acid and ethanol precipitation; Nvoc removal and Klenow encoding with a biotinylated oligonucleotide carrying a BssSI site; Cy5 coupling, ethanol precipitation and BssSI digestion; streptavidin capture; ligation encoding with a Cy3-labelled fragment. b) Gel lanes M, a1, a2, a3 and a4, imaged by ethidium bromide, Cy5 and Cy3 fluorescence; marker sizes 60, 70, 80 and 100; asterisks mark the artefact band. c) Sequences of colonies 1 to 8, each showing Code A = GTTAGT, Code B = GATTACCA and Code C = TTTGCT in the construct
5’-GGAGCTTGTGAATTCTGGGTTAGTGGACGTGTGTGAATTGTCGATTACCAGTACTCGTGAAATTTGCTAGGATCCATATTG-3’
3’-CCTCGAACACTTAAGACCCAATCACCTGCACACACTTAACAGCTAATGGTCATGAGCACTTTAAACGATCCTAGGTATAAC-5’
with the EcoRI restriction site at positions 10-15, Code A at 19-24, Code B at 43-50, the BssSI restriction site at 54-59, Code C at 63-68 and the BamHI restriction site at 70-75.]
Figure 3-20: Step-by-step synthesis and encoding of a model member of a library based on an N-Fmoc,
N-Nvoc di-amino carboxylic acid. a) The N-Fmoc, N-Nvoc di-amino carboxylic acid
compound was conjugated to a 5’-amino-modified oligonucleotide (42mer), (i). Following removal of
Fmoc and HPLC (i), coupling reaction with 3-p-tolylpropanoic acid was performed (ii). Subsequently
Nvoc was removed by irradiation at 366 nm and Klenow-assisted encoding was completed with a
partially complementary 5’-biotinylated oligonucleotide (44mer) carrying a BssSI restriction site (iii).
The extended DNA product was labelled with Cy5-N-hydroxysuccinimide ester reagent and digested
with BssSI enzyme (iv). Incubation with streptavidin sepharose beads allowed the removal of the small
DNA restriction products (v). A Cy3-labelled oligonucleotide (23mer) carrying the third code was ligated
(vi). b) Gel-electrophoretic analysis with specific detection methods (ethidium bromide, Cy5 and Cy3
fluorescence) monitored the stepwise assembly of DNA-fragments of suitable size. a1) Klenow assisted
polymerization 68mer product; a2) Klenow assisted 68mer product Cy5 coupled; a3) BssSI digestion
product (54mer); a4) full-length (81mer) DNA fragment after ligation with 23mer Cy3 labelled DNA
fragment. *) The band is an artefact resulting from incomplete denaturation. If excised, extracted and
loaded on a gel, this band migrates at the expected height of a double-stranded 81-base DNA fragment.
c) Bacterial cloning and Sanger sequencing of eight different bacterial colonies revealed the identity of
the DNA constructs.
4. DISCUSSION
We have constructed a high-quality DNA-encoded chemical library containing 4000
compounds (DEL4000). This library was used in selections for the identification of novel
streptavidin, MMP3 and IgG binders. High-throughput sequencing of the library
before and after selection revealed the preferential enrichment of binding molecules.
In the case of the newly discovered streptavidin binders, we have observed that both
building blocks used for the stepwise synthesis of compounds in the library may
contribute to the resulting binding affinity. For example, we observed a >100-fold
difference in binding affinity between compounds 02-78 and 15-78, with Kd values
of 385 nM and 78 μM, respectively, in line with their different recovery rates after
streptavidin selection (see Chapter 3.2.1.1 and 3.2.1.2, Figure 3-7 and Figure 3-8).
We have also shown that the encoding strategy followed for the construction of the
DEL4000 library can be extended, for example by incorporating a third set of
chemical groups and corresponding DNA-coding fragments (see Chapter 3.3.2).
Recent advances in ultra high-throughput DNA sequencing with 454 technology
indicate that it should be possible to sequence over one million sequence tags per
sequencing run.44 Therefore, provided that two orthogonal synthetic procedures are
used which feature high coupling yields and which preserve the integrity of the DNA
molecule, it should be possible to construct, select and decode DNA-encoded
libraries containing millions of chemical compounds.
The potential of using DNA tagging for the identification of binding compounds (e.g.,
in panning experiments) has long been recognized. However, in the last few years
research in the field of DNA-encoded chemical libraries has been advanced by the
development of novel methodologies for library construction and decoding. The
recent interest in DNA-encoded chemical libraries is mainly related to the possibility
of constructing libraries of unprecedented size, which can still be screened at low
concentrations for protein binding, thanks to ultra-sensitive DNA detection
experimental procedures, such as the polymerase chain reaction (PCR) and high-
throughput DNA sequencing. In full analogy to antibody phage libraries, DNA-
encoded chemical libraries do not rely on biological assays for the identification of
the binding molecules, but rather on the physical separation of binding molecules
from non-binders. Therefore affinity selection with DNA-encoded chemical libraries
as shown in this work can be performed in one reaction tube with standard laboratory
equipment, even with target proteins for which screening assays are not yet available.
While the work presented in this Thesis clearly illustrates the potential of DNA-encoded
chemical libraries, the remaining challenges for this
methodology include the improvement of the synthetic procedures, of the encoding
strategies and of the read-out methodologies (i.e., high-throughput sequencing). The
relatively narrow choice of reactions for the conjugation of chemical moieties to DNA
oligonucleotides still represents a limitation, which deserves to be addressed in the
future.
At present, large pharmaceutical companies typically screen a few hundred thousand
compounds in their high-throughput screening campaigns, facing enormous challenges
for the preparation, storage and screening of very large libraries of organic molecules,
not only from the synthetic point of view, but also in terms of logistics and analysis.
Furthermore, the costs associated with the identification of specific binding molecules
from a pool of candidates grow exponentially with the size of the chemical library to
be screened. Thus, the combination of large repertoires of organic molecules and
ingenious screening methodologies is recognized as an important approach for
isolating desired binding molecules. For this reason, selections of DNA-encoded
chemical libraries such as the one described in this Thesis may facilitate the
identification of binding molecules (“hits”) for pharmaceutical applications.
Among the selections described in this work, the identification of binders to
polyclonal human IgG appears to have the most direct application. At present,
monoclonal antibodies for therapeutic applications represent the fastest growing
sector of pharmaceutical biotechnology.86 Protein A sepharose, which is used in
virtually all industrial purification procedures for monoclonal antibodies, represents
the largest cost factor for the manufacture of therapeutic antibodies. In consideration
of the substantial costs, these resins are typically regenerated and re-used, which
complicates certain aspects of good manufacturing practice. It is conceivable to
replace protein A-based affinity supports with affinity purification supports based
on IgG-binding molecules, like the ones described in this work.
5. Material and Methods
5.1 Reagents and general remarks
Unless otherwise denoted, chemical compounds and proteins were from Sigma-
Aldrich-Fluka (Buchs, Switzerland), resin for solid phase synthesis from
Novabiochem (Laufelfingen, Switzerland), enzymes from New England Biolabs
(Ipswich, MA, USA) and HPLC grade lyophilized oligonucleotides were from IBA
GmbH (Göttingen, Germany). SpinX columns were purchased from Corning Costar
Incorporated (Acton, MA, USA) and ion-exchange cartridges for DNA purification
from Qiagen (Hilden, Germany), (PCR purification cat.no 28104, Nucleotides
removal cat.no 28306) and used according to the protocol described by the provider.
NMR spectra were recorded with a Bruker 400 MHz spectrometer, with TMS as the
internal standard. All reactions involving air- and water-sensitive materials were
performed in flame-dried glassware under argon by standard syringe, cannula and
septa techniques. Precoated Merck 60 F254 alumina silica gel sheets were used for
TLC.
5.2 Synthesis of DEL4000 DNA Encoded Library
The individual organic compounds to be coupled to the 5’ amino-modified 42-mer
oligonucleotides were dissolved in DMSO as 100 mM stock solutions, occasionally with
further addition of water or dilute hydrochloric acid. All HPLC purifications were performed on
an XTerra Prep RP18 column (5µm, 10x150mm) using a linear gradient from 10% to
40% MeCN in 100 mM TEAA, pH 7. LC-ESI-MS were performed on an XTerra RP18
column (5 µm, 4.6x20 mm) using a linear gradient from 0% to 50% MeOH over 1
min in 400 mM HFIP/5 mM TEA. The mass spectra were measured from m/z 900 to
2000 by a Waters Quattro Micro instrument (Waters, Milford, MA, USA).
Oligonucleotide quantification was performed measuring the absorption at 260 nm
using a NanoDrop instrument (ND-1000 UV-Vis spectrophotometer).
5.2.1 Synthesis of library model compounds oligonucleotide conjugate.
To a reaction volume of 310 µL, containing 70% (v/v) DMSO/water, compounds
were added to the respective final concentrations in the order: Fmoc-protected amino
acid (A, see Appendix) DMSO solution, 4 mM; N-hydroxysulfosuccinimide in
DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide in DMSO, 4
mM; aqueous triethylamine hydrochloride solution, pH 9.0, 80 mM; oligonucleotide
aqueous solution, 50 µM, (5’-amino-C12-GGA GCT TGT GAA TTC TGG ATC
TTA GGA CGT GTG TGA ATT GTC-3’). The reaction was stirred overnight at 25
°C; residual activated species were then quenched and simultaneously Fmoc
deprotected by addition of piperidine (500 mM in DMSO). Following HPLC
purification, coupling yield was estimated (see Appendix) and the desired fractions
were dried under reduced pressure and redissolved in 50 µL of water. The recovery
was determined measuring the absorption at 260 nm using a NanoDrop instrument
and the masses of the reacted oligonucleotides detected by LC-ESI-MS (see
Appendix). Subsequently, to a reaction volume of 310 µL, containing 70% (v/v)
DMSO/water, compounds were added to the respective final concentrations in the
order: amino acid (B, see Appendix) DMSO solution, 4 mM; N-
hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)-
carbodiimide in DMSO, 4 mM; the resulting compound oligonucleotide-conjugate
aqueous solution, 15 µM; aqueous triethylamine hydrochloride solution, pH 9.0, 80
mM. After overnight stirring at 25 °C, residual activated species were quenched by
addition of 50 µL Tris-Cl buffer, 500 mM pH 9.0. The mixture was allowed to
quantitatively precipitate by sequential addition of 25 µL of 1 M acetic acid, 12.5 µL
of 3 M sodium acetate buffer, pH 4.7 and 500 µL ethanol followed by 2 h incubation
at -23 ºC. The DNA was centrifuged and the resulting oligonucleotide pellet was
washed with ice-cold 90% (v/v) ethanol and then dissolved in 100 µL water.
Following HPLC, the coupling yield for this reaction step was determined (see
Appendix). The desired fractions were dried under reduced pressure and redissolved
in 50 µL of water. The recovery was determined measuring the absorption at 260 nm
using a NanoDrop instrument and the masses of the reacted oligonucleotides detected
by LC-ESI-MS (see Appendix).
5.2.2 Coupling reactions of 20 Fmoc-protected amino acids.
To a reaction volume of 310 µL, containing 70% (v/v) DMSO/water, compounds
were added to the respective final concentrations in the order: Fmoc-protected amino
acids DMSO solution, 4 mM; N-hydroxysulfosuccinimide in DMSO, 10 mM; N-
ethyl-N’-(3-dimethylaminopropyl)-carbodiimide in DMSO, 4 mM; aqueous
triethylamine hydrochloride solution, pH 9.0, 80 mM; oligonucleotide aqueous
solution, 50 µM (DEL_O_1_n, 1 ≤ n ≤ 20: 5’-amino-C12-GGA GCT TGT GAA TTC
TGG XXX XXX GGA CGT GTG TGA ATT GTC-3’, XXX XXX unambiguously
identifies the individual Fmoc-protected amino acid compound). All coupling
reactions were stirred overnight at 25 °C; residual activated species were then
quenched and simultaneously Fmoc deprotected by addition of piperidine (500 mM in
DMSO). Prior to HPLC purification 500 µL of 100 mM TEAA, pH 7.0, was added to
the reaction mixture. The reactions were then purified by HPLC and the desired
fractions were dried under reduced pressure and redissolved in 100 µL of water and
analyzed by LC-ESI-MS. The samples showed the expected Fmoc-deprotected
products. Typical coupling yields were >51% overall. 4.0 nmol of each DNA-compound
conjugate were pooled to generate a 20-member DNA-encoded sub-library.
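The final concentrations listed above translate into absolute amounts as follows. This is a minimal sketch with hypothetical helper names (`amount_nmol`, `stock_volume_ul`); the only stock concentration used is the 100 mM DMSO stock of the organic compounds given in Section 5.2, and no other stock concentration is assumed.

```python
REACTION_VOLUME_UL = 310.0  # reaction volume stated in the protocol


def amount_nmol(final_conc_molar, volume_ul):
    """Amount of substance (nmol) present at a final concentration in a volume."""
    return final_conc_molar * volume_ul * 1e-6 * 1e9


def stock_volume_ul(final_conc_molar, stock_conc_molar, volume_ul):
    """Volume of stock solution (µL) needed to reach the final concentration."""
    return final_conc_molar / stock_conc_molar * volume_ul


oligo_nmol = amount_nmol(50e-6, REACTION_VOLUME_UL)   # oligonucleotide, 50 µM
acid_nmol = amount_nmol(4e-3, REACTION_VOLUME_UL)     # Fmoc-amino acid, 4 mM
molar_excess = acid_nmol / oligo_nmol                 # acid over oligonucleotide
acid_stock_ul = stock_volume_ul(4e-3, 100e-3, REACTION_VOLUME_UL)

print(f"oligonucleotide per reaction: {oligo_nmol:.1f} nmol")
print(f"amino acid excess: {molar_excess:.0f}-fold")
print(f"100 mM amino acid stock needed: {acid_stock_ul:.1f} µL")
```

Under these conditions each reaction contains 15.5 nmol of oligonucleotide and an 80-fold molar excess of the amino acid.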
5.2.3 Coupling reactions of 200 carboxylic acids.
To a reaction volume of 310 µL, containing 70%
(v/v) DMSO/water, compounds were added to the respective final concentrations:
DMSO-dissolved carboxylic acid, 4 mM; N-hydroxysulfosuccinimide in DMSO, 10
mM; N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide in DMSO, 4 mM; aqueous
triethylamine hydrochloride, pH 9.0, 80 mM; DNA-oligonucleotide sub-library pool,
1.5 µM. All coupling reactions were stirred overnight at 25 °C; residual activated
species were then quenched by addition of 50 µL Tris-Cl buffer, 500 mM pH 9.0. The
mixture was allowed to quantitatively precipitate by sequential addition of 25 µL of 1
M acetic acid, 12.5 µL of 3 M sodium acetate buffer, pH 4.7 and 500 µL ethanol
followed by 2 h incubation at -23 ºC. The DNA was centrifuged and the resulting
oligonucleotide pellet was washed with ice-cold 90% (v/v) ethanol and then dissolved
in 100 µL water. Test coupling reactions were also performed under the reaction
conditions described above (see Chapter 3.1.2, Table 3-1), using model 42mer 5’-Fmoc-deprotected
amino acid oligonucleotide conjugates and model carboxylic acids.
The reactions were analysed by HPLC and the masses of the reacted oligonucleotides
detected by LC-ESI-MS. HPLC coupling yields and recovery for this step
were always >51%.
5.2.4 Polymerase Klenow encoding of 200 carboxylic acids reactions.
To a reaction volume of 50 µL, reagents were added to the respective final
concentrations: aqueous solution of the pool of 20 oligonucleotide conjugates coupled
with the specific carboxylic acid (see Chapter 5.2.3), 320 nM; 44mer
coding oligonucleotide (DEL_O_2_n, 1 ≤ n ≤ 200: 5'-GTA GTC GGA TCC GAC CAC
XXXX XXXX GAC AAT TCA CAC ACG TCC-3', where XXXX XXXX unambiguously
identifies the individual carboxylic acid compound; IBA), 600 nM; Klenow buffer
(NEB, cat.no B7002S); dNTPs (Roche, cat.no 11969064001), 0.5 mM; Klenow
Polymerase enzyme (NEB, cat.no M0210L), 5 units. The Klenow polymerization
reactions were incubated at 37 ºC for 1 h and then purified on ion-exchange cartridges
(Qiagen, cat.no 28306). The 200 purified reactions were dissolved in 50 µL of water,
each, and pooled to generate the 4000 member library (DEL4000) to a final total
oligonucleotide concentration of 300 nM.
5.2.5 Preparation of D-desthiobiotin oligonucleotide-conjugate (positive control)
D-desthiobiotin-oligonucleotide-conjugate was synthesized (DEL_O_1_21: 5’-amino-
C12-GGA GCT TGT GAA TTC TGG ATC GAG GGA CGT GTG TGA ATT
GTC-3’; coding sequence: ATC GAG) and unambiguously
encoded (DEL_O_2_201: 5'-GTA GTC GGA TCC GAC CAC TTCA CACA GAC
AAT TCA CAC ACG TCC-3'; coding sequence: TTCA CACA) as
described above. ESI-MS DEL_O_1_21 D-desthiobiotin conjugate: expected: 13572;
measured: 13573.
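The Klenow fill-in step can be checked in silico with the two positive-control oligonucleotides given above. The following is an illustrative sketch, not part of the original workflow; the function names are hypothetical, and the 18-nucleotide annealing region is inferred from the sequences (the 3' end of DEL_O_2_201 is the reverse complement of the 3' end of DEL_O_1_21).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def klenow_fill_in(oligo1, oligo2, overlap=18):
    """Top strand after annealing the 3' ends of two oligos and filling in.

    Both oligos are given 5' to 3'. Their last `overlap` bases must be
    mutually reverse complementary; oligo1 is then extended along oligo2.
    """
    if reverse_complement(oligo2[-overlap:]) != oligo1[-overlap:]:
        raise ValueError("3' ends are not complementary")
    return oligo1 + reverse_complement(oligo2[:-overlap])


# Sequences from the text, assembled from constant and coding parts.
DEL_O_1_21 = "GGAGCTTGTGAATTCTGG" + "ATCGAG" + "GGACGTGTGTGAATTGTC"
DEL_O_2_201 = "GTAGTCGGATCCGACCAC" + "TTCACACA" + "GACAATTCACACACGTCC"

product = klenow_fill_in(DEL_O_1_21, DEL_O_2_201)
print(len(product))  # 42 + 44 - 18 = 68 nucleotides
print(product)
```

In the extended strand the second code appears as the reverse complement of the coding oligonucleotide, which has to be taken into account when sequencing reads are decoded.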
5.3 Library DEL 4000 selections
5.3.1 Streptavidin selection.
The resulting library DEL4000 (total oligonucleotide conjugate concentration 300
nM) was diluted 1:15 in PBS (20 nM final concentration) and spiked with D-desthiobiotin
oligonucleotide-conjugate (final concentration 1 pM). 50 µL of the 20 nM library was
added either to 50 µL streptavidin-sepharose slurry (GE Healthcare, cat.no 17-5113-
01) or to 50 µL sepharose slurry without streptavidin. Both resins were preincubated
with PBS, 0.3 mg/mL herring sperm DNA. After incubation for 1 h at 25 ºC the
mixture was transferred to a SpinX column, the supernatant was removed, and the
resin washed 4x with 400 µL PBS. After washing, the resin was resuspended in 100
µL water.
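The amounts of material entering the selection can be estimated from the stated concentrations. The sketch below assumes, for illustration only, that all 4000 library members are present in equimolar amounts; the variable names are hypothetical and the volume of the resin slurry is ignored.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

LIBRARY_SIZE = 4000
stock_conc = 300e-9            # total conjugate concentration, mol/L
diluted_conc = stock_conc / 15 # 1:15 dilution in PBS
input_volume_l = 50e-6         # 50 µL of diluted library per selection

# Assumption: equimolar representation of every library member.
per_member_conc = diluted_conc / LIBRARY_SIZE
per_member_molecules = per_member_conc * input_volume_l * AVOGADRO

spike_conc = 1e-12             # D-desthiobiotin conjugate, 1 pM
spike_ratio = spike_conc / per_member_conc

print(f"diluted library: {diluted_conc * 1e9:.0f} nM")
print(f"each member: {per_member_conc * 1e12:.0f} pM")
print(f"molecules per member in 50 µL: {per_member_molecules:.2e}")
print(f"spike relative to an average member: {spike_ratio:.1f}x")
```

With these numbers each compound is present at about 5 pM, so the positive control is spiked in at one fifth of the concentration of an average library member.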
5.3.1.1 Identification of binding molecules.
The codes of the oligonucleotide-compound conjugates were amplified by PCR (total
volume 50 µL, 25 cycles of 1 min at 94 ºC, 1 min at 55 ºC, 40 s at 72 ºC) with either 5
µL of 100 fM DEL4000 library before selection as template, or 5 µL of each
resuspended resin after selection as template. The PCR primers DEL_P1_A (5’-GCC
TCC CTC GCG CCA TCA GGG AGC TTG TGA ATT CTG G-3’) and DEL_P2_B
(5’-GCC TTG CCA GCC CGC TCA GGT AGT CGG ATC CGA CCA C-3’)
additionally contain at one extremity a 19 bp domain (underlined) required for high-
throughput sequencing with the 454 Genome Sequencer system. The PCR products
were purified on ion-exchange cartridges. Subsequent high-throughput sequencing
was performed on a 454 Life Sciences-Roche GS 20 Sequencer platform (Sequencing
service by Eurofins MWG GmbH, Ebersberg, Germany). Analyses of the codes from
high-throughput sequencing were performed by an in-house program written in C++.
The frequency of each code has been assigned to each individual pharmacophore.
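The decoding step can be sketched as follows. This is not the in-house C++ program mentioned above, whose details are not given; it is a minimal Python illustration that assumes error-free constant regions and uses hypothetical function names. The constant sequences are taken from the oligonucleotide designs in Chapter 5.2, and the second code is read as a reverse complement because it is introduced by the Klenow fill-in.

```python
import re
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# 6-nt first code and 8-nt second code flanked by constant regions.
PATTERN = re.compile(r"TTCTGG([ACGT]{6})GGACGTGTGTGAATTGTC([ACGT]{8})GTGGTC")


def decode_read(read):
    """Return (code 1, code 2) for a read in either orientation, or None."""
    for candidate in (read, reverse_complement(read)):
        match = PATTERN.search(candidate)
        if match:
            # Code 2 is reported in the orientation of the coding oligo.
            return match.group(1), reverse_complement(match.group(2))
    return None


def count_codes(reads):
    """Count how often each code pair occurs in a collection of reads."""
    counts = Counter()
    for read in reads:
        codes = decode_read(read.upper())
        if codes is not None:
            counts[codes] += 1
    return counts


# Example: the positive-control construct, read in both orientations.
control = ("GGAGCTTGTGAATTCTGG" "ATCGAG" "GGACGTGTGTGAATTGTC"
           "TGTGTGAA" "GTGGTCGGATCCGACTAC")
print(count_codes([control, reverse_complement(control), "ACGTACGT"]))
```

Each code pair can then be mapped to its compound through a lookup table, and the counts before and after selection compared to identify enriched compounds.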
5.3.1.2 Synthesis of the binding molecules as fluorescein conjugates.
In a polypropylene syringe, 50 mg (46 µmol) of O-bis-(aminoethyl)ethylene glycol
trityl resin (Novabiochem, cat.no 01-64-0235) was suspended in a mixture of the
appropriate Fmoc-protected amino acid (100 µmol, 1 mL), HBTU (Aldrich, 200
µmol, 1 mL), and DIEA (Fluka, 400 µmol, 0.5 mL) in dry DMF. After overnight
incubation at 25 °C, the resin was washed 6x with 2 mL dry DMF and the Fmoc
moiety was removed by addition of 1 mL piperidine (50% in dry DMF) for 1 h at 25
°C. After washing 6x with 2 mL dry DMF, the corresponding carboxylic acid (100
µmol, 1 mL DMF) was added and a further amide bond formation reaction was
performed as described above. The resulting product was cleaved by treating the resin
10x with 2 mL TFA (1% in CH2Cl2). The methylene chloride fractions were quenched
in 5 mL saturated aqueous NaHCO3 and the aqueous phase was back-extracted 2x with 5 mL CH2Cl2.
The pooled organic phases were washed 3x with water, dried on Na2SO4 and
concentrated in vacuo. The crude product was reacted with 2 equivalents of
fluorescein isothiocyanate (800 µL of DMF) and 200 µL saturated aqueous NaHCO3 in the dark
overnight at 25 °C. Following HPLC purification on an XTerra Prep RP18 column (5
µm, 10x150 mm) using a linear gradient from 10% to 100% MeCN 0.1% TFA, the
desired fractions were collected and lyophilized. 2 mg of the fluorescein conjugates
were dissolved in DMSO as 5 mM stock solution. ESI-MS analysis confirmed the
mass of the expected FITC conjugate products: 02-78 (C45H49BrN4O10S2) m/z
expected: 949.93 measured: 951.31 [M+H+]; 07-78 (C50H59N5O12S2) m/z expected:
985.36 measured: 986.37 [M+H+]; 15-78 (C44H49N5O12S3) m/z expected: 935.25
measured: 936.25 [M+H+]; 02-107 (C47H47BrN4O9S) m/z expected: 923.87
measured: 925.12 [M+H+]; 13-40 (C46H49N5O14S) m/z expected: 927.30 measured:
928.42 [M+H+]; 11-78 (C49H58N4O13S2) m/z expected: 974.34 measured: 975.41;
17-49 (C45H49IN4O11S) m/z expected: 980.22 measured: 981.29 [M+H+]; 17-78
(C45H49IN4O10S2) m/z expected: 996.19 measured: 997.26; 16-78
(C43H48N4O10S3) m/z expected: 876.25 measured: 877.33; 15-117
(C45H45N5O12S2) m/z expected: 911.25 measured: 912.33; 02-49
(C45H49BrN4O11S) m/z expected: 932.23; measured: 933.32 [M+H+].
5.3.2 Affinity measurements.
In a total volume of 60 µL, fluorescein-compound conjugates (500 nM) were
incubated with increasing amounts of streptavidin (from 10 nM to 200 µM, BIOSPA,
cat.no S002-6) or MMP3 (from 33 nM to 40 µM) in PBS, 5% DMSO, for 1 h at 25
ºC. The fluorescence polarization was determined with a TECAN Polarion instrument
by excitation at 485 nm and measuring emission at 535 nm (ε = 72000 M⁻¹ cm⁻¹). All
the curves were fitted with a formula derived as follows. From the mass balance
[FC] = [FC]0 - [C] and the definition Kd = ([FC]*([P] - [C]))/[C], substituting and
solving for [C] gives
\[ [C]^2 - \left([P] + [FC]_0 + K_d\right)[C] + [P]\,[FC]_0 = 0 . \]
The solutions of the quadratic equation are
\[ [C] = \frac{\left([P] + [FC]_0 + K_d\right) \pm \sqrt{\left([P] + [FC]_0 + K_d\right)^2 - 4\,[P]\,[FC]_0}}{2} . \]
Considering that only the minus sign gives a meaningful solution and that FP = a*[FC] + b*[C] =
a*[FC]0 + (b-a)*[C], the solution of the quadratic equation can be expressed in terms of FP and
used in the fitting to determine the dissociation constant:
\[ FP = a\,[FC]_0 + (b - a)\,\frac{\left([P] + [FC]_0 + K_d\right) - \sqrt{\left([P] + [FC]_0 + K_d\right)^2 - 4\,[P]\,[FC]_0}}{2} . \]
Here [FC] = molar concentration of free fluorescein-compound conjugate; [FC]0 =
total (initial) molar concentration of fluorescein-compound conjugate (500
nM in the experiment); [C] = concentration of the complex; [P] = total molar concentration of protein; FP
= fluorescence polarization; a, b = proportionality constants; Kd = dissociation
constant.
5.3.3 Polyclonal human IgG selection.
The library DEL4000 (total oligonucleotide conjugate concentration 300 nM) was
diluted 1:15 in PBS (20 nM final concentration). 50 µL of the 20 nM library was
added to 50 µL IgG-sepharose slurry. The resin was preincubated with PBS, 0.3
mg/mL herring sperm DNA (Sigma). After incubation for 1 hour at 25 °C the mixture
was transferred to a SpinX column (Corning Costar Incorporated), the supernatant
was removed, and the resin washed four times with 400 µL PBS. After washing, the
resin was resuspended in 100 µL water.
5.3.3.1 Polyclonal human IgG coating of sepharose beads.
100 mg CNBr-activated sepharose (GE Healthcare, Piscataway, NJ) was swollen in
500 µL 1 mM HCl, washed (10 times with 500 µL 1 mM HCl, 3 times with 500 µL
0.1 M aqueous NaHCO3), and mixed with 2.5 mg/mL polyclonal human IgG (Sigma-Aldrich-
Fluka, Buchs, Switzerland) dissolved in 1.2 mL 0.1 M aqueous NaHCO3. After 4 h
incubation at 4 °C, the slurry was repeatedly and alternately washed with 0.1 M
aqueous NaHCO3, 0.1 M Tris-Cl, 0.5 M NaCl, pH 8.3 and 0.1 M NaOAc, 0.5 M NaCl, pH 4,
then stored in 1 mL of PBS at 4 °C.
5.3.3.2 Identification of human IgG binding molecules.
The codes of the oligonucleotide-compound conjugates were amplified by PCR (total
volume 50 µL, 25 cycles of 1 min at 94 ºC, 1 min at 55 ºC, 40 s at 72 ºC) with either 5
µL of 100 fM DEL4000 library before selection as template, or 5 µL of each
resuspended resin after selection as template. The PCR primers DEL_P1_A (5’-GCC
TCC CTC GCG CCA TCA GGG AGC TTG TGA ATT CTG G-3’) and DEL_P2_B
(5’-GCC TTG CCA GCC CGC TCA GGT AGT CGG ATC CGA CCA C-3’)
additionally contain at one extremity a 19 bp domain (underlined) required for high-
throughput sequencing with the 454 Genome Sequencer system. The PCR products
were purified on ion-exchange cartridges. Subsequent high-throughput sequencing
was performed on a 454 Life Sciences-Roche GS 20 Sequencer platform. Analyses of
the codes from high-throughput sequencing were performed by an in-house program
written in C++. The frequency of each code has been assigned to each individual
pharmacophore.
5.3.3.3 Synthesis of affinity chromatography resin containing the compound 02-40
or 16-40.
In a polypropylene syringe, 129 mg (120 µmol) of O-bis-(aminoethyl)ethylene glycol
trityl resin (Novabiochem, cat.no 01-64-0235) was suspended in a mixture of the
appropriate Fmoc-protected amino acid (180 µmol, 1 mL), HBTU (Aldrich, 360
µmol, 1mL), and DIEA (Fluka, 720 µmol, 0.5mL) in dry DMF. After overnight
incubation at 25°C, the resin was washed 6x with 2 mL dry DMF and the Fmoc
moiety was removed by addition of 3 mL piperidine (50% in dry DMF) for 1 h at 25
°C. After washing 6x with 2 mL dry DMF, 4-(4-(1-hydroxyethyl)-2-methoxy-5-
nitrophenoxy)butanoic acid (40, 54 mg, 180 µmol, 1 mL DMF) was added and a
further amide bond formation reaction was performed as described above. The
resulting product was cleaved by treating the resin 10x with 2 mL TFA (1% in
CH2Cl2). The dichloromethane fractions were quenched in 5 mL saturated aqueous NaHCO3 and
the aqueous phase was back-extracted 2x with 5 mL CH2Cl2. The pooled organic phases
were washed 3 times with water, dried on Na2SO4 and concentrated in vacuo.
Following HPLC purification on an XTerra Prep RP18 column (5 µm, 10x150 mm)
using a linear gradient from 10% to 100% MeCN 0.1% TFA, the desired fractions
were collected and lyophilized. ESI-MS analysis confirmed the mass of the expected
products: 02-40 (C29H41BrN4O9) m/z expected: 668.21 measured:
669.37 [M+H+]; 16-40 (C27H40N4O9S) m/z expected: 596.25 measured: 597.12
[M+H+]. 200 mg CNBr-activated sepharose (GE Healthcare, Piscataway, NJ) was
swollen in 1 mM HCl, washed, and mixed in separate tubes with 15 µmol of the
compounds dissolved in 2 mL 0.1 M aqueous NaHCO3, 10% DMF. After 4 h incubation
at 25 °C, the slurry was repeatedly and alternately washed with 0.1 M aqueous NaHCO3, 0.1
M Tris-Cl, 0.5 M NaCl, pH 8.3 and 0.1 M NaOAc, 0.5 M NaCl, pH 4, then stored in
PBS at 4 °C.
5.3.3.4 Polyclonal human IgG Cy5 labeling.
Polyclonal human IgG (Sigma-Aldrich-Fluka, Buchs, Switzerland) was labelled with
Cy5 Mono-reactive kit (Amersham, cat.no PA25001) according to the protocol of the
provider and purified over a PD10 column (GE Healthcare, cat.no 17-0851-01) as
described by the supplier.
5.3.3.5 Biotinylated polyclonal human IgG.
Polyclonal human IgG (Sigma-Aldrich-Fluka, Buchs, Switzerland) was labelled with
NHS-LC-Biotin reagent (Pierce, cat.no 21336) according to the protocol of the
provider and purified over a PD10 column (GE Healthcare, cat.no 17-0851-01) as
described by the supplier.
5.3.3.6 Affinity chromatography of CHO cell supernatant spiked with Cy5-labeled
or biotinylated human IgG on IgG-binding resin.
70 mg of the resin containing compound 02-40 or 16-40 were loaded on a
chromatography cartridge (Glen Research, cat.no 20-0030-00) and washed 3 times
with 1 mL PBS before loading a CHO cell supernatant (60 µL) spiked with Cy5-labeled
human IgG (40 µL, 9.68 µM) or with biotinylated human IgG (30 µL, 17.2 µM).
The flow-through, the washing fractions (washing 1x with 10 mL PBS; 1x with 10
mL 500 mM NaCl, 0.5 mM EDTA; 1x with 10 mL 100 mM NaCl, 0.1% Tween 20,
0.5 mM EDTA) and the eluate (elution 3 times with 200 µL aqueous triethylamine
100 mM) were collected and concentrated to a final volume of 100 µL by
centrifugation in a Vivaspin 500 tube (Vivascience, cat.no VS0101, 10,000 MW
cut-off). The samples were then analyzed by gel electrophoresis on a NuPAGE 4-12%
Bis-Tris Gel (Invitrogen, cat.no NP0321) using MOPS SDS as running buffer and
stained with Coomassie Blue. Cy5 activity was detected by a Diana III
Chemiluminescence Detection System (Raytest) by excitation at 675 nm and
measuring emission at 694 nm (ε = 250,000 M⁻¹ cm⁻¹). Western Blot analysis has been
performed transferring the proteins to NC membrane (Millipore, Billerica, MA, USA)
with the Xcell II blot module (Invitrogen) using standard procedures. The membrane
was quickly rinsed with water before soaking it twice in methanol. The membrane
was dried at room temperature for 15 min and incubated for 1 h with 1:500 dilutions
in 4% defatted milk-containing PBS of the following protein: Streptavidin-
horseradish peroxidase conjugate (HRP-Streptavidin, Amersham Biosciences, Little
Chalfont Buckinghamshire, UK, cat.no RPN1231V ). For detection of
immunoreactive bands the membrane was washed three times for 5 min with PBS and
soaked in chemiluminescent reagent (ECL Plus Western Blotting Detection System
from Amersham Biosciences) for 5 sec and exposed to BioMax films (Kodak, Hemel,
UK) in an autoradiographic cassette.
5.3.4 Human MMP3 selection.
The library DEL4000 (total oligonucleotide conjugate concentration 300 nM) was
diluted 1:15 in PBS (20 nM final concentration). 50 µL of the 20 nM library was added
to 50 µL MMP3-sepharose slurry. The resin was preincubated with PBS, 0.3 mg/mL
herring sperm DNA (Sigma). After incubation for 1 hour at 25 °C the mixture was
transferred to a SpinX column (Corning Costar Incorporated), the supernatant was
removed, and the resin washed 4x with 400 µL PBS. After washing, the resin was
resuspended in 100 µL water.
5.3.4.1 Human MMP3 coating of sepharose beads.
100mg CNBr-activated sepharose (GE Healthcare, Piscataway, NJ) was swollen in 1
mM HCl, washed, and mixed in separate tubes with 1 mg/ml polyclonal human IgG
(Sigma-Aldrich-Fluka, Buchs, Switzerland) dissolved in. After 4 hour incubation at
4 °C, the slurry was repeatedly and alternately washed with 0.1 M aqueous NaHCO3, 0.1 M
Tris-Cl, 0.5 M NaCl, pH 8.3 and 0.1 M NaOAc, 0.5 M NaCl, pH 4, then stored in PBS
at 4 °C.
5.3.4.2 Identification of human MMP3 binding molecules.
The codes of the oligonucleotide-compound conjugates were amplified by PCR (total
volume 50 µL, 25 cycles of 1 min at 94 ºC, 1 min at 55 ºC, 40 s at 72 ºC) with either 5
µL of 100 fM DEL4000 library before selection as template, or 5 µL of each
resuspended resin after selection as template. The PCR primers DEL_P1_A (5’-GCC
TCC CTC GCG CCA TCA GGG AGC TTG TGA ATT CTG G-3’) and DEL_P2_B
(5’-GCC TTG CCA GCC CGC TCA GGT AGT CGG ATC CGA CCA C-3’)
additionally contain at one extremity a 19 bp domain (underlined) required for high-
throughput sequencing with the 454 Genome Sequencer system. The PCR products
were purified on ion-exchange cartridges. Subsequent high-throughput sequencing
was performed on a 454 Life Sciences-Roche GS 20 Sequencer platform. Analyses of
the codes from high-throughput sequencing were performed by an in-house program
written in C++. The frequency of each code has been assigned to each individual
pharmacophore.
5.3.4.3 Synthesis of the MMP3 binding molecules as fluorescein conjugates.
In a polypropylene syringe, 50 mg (46 µmol) of O-bis-(aminoethyl)ethylene glycol
trityl resin (Novabiochem, cat.no 01-64-0235) was suspended in a mixture of the
appropriate Fmoc-protected amino acid (100 µmol, 1 mL), HBTU (Aldrich, 200
µmol, 1 mL), and DIEA (Fluka, 400 µmol, 0.5 mL) in dry DMF. After overnight
incubation at 25 °C, the resin was washed 6x with 2 mL dry DMF and the Fmoc
moiety was removed by addition of 1 mL piperidine (50% in dry DMF) for 1 h at 25
°C. After washing 6 times with 2 mL dry DMF, the corresponding carboxylic acid
(100 µmol, 1 mL DMF) was added and a further amide bond formation reaction was
performed as described above. The resulting product was cleaved by treating the resin
10 times with 2 mL TFA (1% in CH2Cl2). The methylene chloride fractions were
quenched in 5 mL saturated aqueous NaHCO3 and the aqueous phase was back-extracted 2 times with
5 mL CH2Cl2. The pooled organic phases were washed 3 times with water, dried on
Na2SO4 and concentrated in vacuo. The crude product was reacted with 2 equivalents
of fluorescein isothiocyanate (in 800 µL of DMF) and 200 µL saturated aqueous NaHCO3 in the dark
overnight at 25 °C. Following HPLC purification on an XTerra Prep RP18 column (5
µm, 10 × 150 mm) using a linear gradient from 10% to 100% MeCN (0.1% TFA), the
desired fractions were collected and lyophilized. 2 mg of the fluorescein conjugates
were dissolved in DMSO as 5 mM stock solution. ESI-MS analysis confirmed the
mass of the expected FITC conjugate products: 02-118 (C55H63BrN4O10S) m/z
expected: 1052.08 measured: 1053.30 [M+H+]; 13-17 (C49H55N5O11S) m/z
expected: 921.36 measured: 922.42 [M+H+]; 15-117 (C45H45N5O12S2) m/z
expected: 911.25 measured: 912.33 [M+H+]; 17-104 (C46H43I2N5O10S) m/z
expected: 1111.08 measured: 1112.19 [M+H+]; 18-96 (C49H45BrN4O9S) m/z
expected: 945.87 measured: 947.10 [M+H+].
5.3.5 Computational simulation
The simulated distribution of the number of codes represented by individual counts,
which are related to the probability that certain counts are experimentally found in a
non-biased mixture of equimolar compounds, was computed using home-written
software. The basic principle used in the simulation relies on the computer-assisted
random generation of numbers corresponding to any of the 4000 compounds in the
library. Repeating the simulation several times and averaging the results yields
fractional values for the number of codes associated with a given "count" value. For
example, a number-of-codes value of 0.1 corresponds to the observation of a given
"Counts" value in only 1 out of 10 simulations, each performed with a total number of
counts equal to the total number of experimental sequences in a given experiment.
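The principle described above can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the home-written software itself: every read is assigned to one of the compounds with equal probability, the number of codes observed with each count value is recorded, and the histogram is averaged over several runs. The function name, the parameters and the use of Python's `random` module are choices made for this sketch.

```python
import random
from collections import Counter


def simulate_count_histogram(n_compounds, n_reads, n_runs, seed=0):
    """Average number of codes observed with each count value.

    n_compounds: library size (4000 in the text).
    n_reads:     total number of sequences drawn per simulated experiment.
    n_runs:      number of independent repetitions that are averaged.
    Returns {count value: mean number of codes showing that count}.
    """
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_runs):
        # Draw n_reads compounds uniformly at random (unbiased equimolar mix).
        per_code = Counter(rng.randrange(n_compounds) for _ in range(n_reads))
        # Histogram: how many codes were seen once, twice, ...
        histogram = Counter(per_code.values())
        # Codes that were never drawn have a count of zero.
        histogram[0] = n_compounds - len(per_code)
        totals.update(histogram)
    return {count: totals[count] / n_runs for count in sorted(totals)}


# Small example; the read number is arbitrary and not taken from an experiment.
averaged = simulate_count_histogram(n_compounds=4000, n_reads=10000, n_runs=5)
```

Because the result is an average over runs, a count value that appears for a single code in only one of ten runs receives a number-of-codes value of 0.1, which is the situation described in the text.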
5.4 Stepwise coupling by selective deprotection and reaction of di-
amine derivatives.
5.4.1 DNA-compatible cleavage of different amino protective groups.
5.4.1.1 Synthesis of 2-pent-4-enamido-cis-cyclopentanecarboxylic acid (1c).
To a solution of cis-2-aminocyclopentanecarboxylic acid hydrochloride (0.1 mmol, 17 mg)
in 4 mL dioxane/water (1:1) were added 0.1 mmol (1
eq.) of 4-pentenoic N-hydroxysuccinimide ester (1e; for the synthesis see Chapter
5.4.1.4) and 0.4 mmol (34 mg) NaHCO3. The mixture was stirred at room
temperature for 3 h, then poured into aqueous 0.1 N HCl (20 mL) and extracted with
ethyl acetate (5 times, 5 mL). The combined organic phases were washed with 10 mL
of brine and dried over Na2SO4. After removal of the solvent under vacuum, the crude
product was dissolved in 1 mL of dry DMSO and used as such for the coupling to the
oligonucleotide. 1H NMR (400 MHz, CDCl3) δ = 1.73 (m, 2H), 1.82 (m, 1H), 2.10
(m, 3H), 2.32 (m, 2H), 2.41 (m, 2H), 3.15 (m, 1H), 4.55 (m, 1H), 5.10 (m, 2H), 5.82
(m, 1H), 6.37 (br, 1H) ppm. 13C NMR (100 MHz, CDCl3): δ = 22.1, 28.1, 29.6, 31.7,
35.7, 46.3, 52.2, 115.8, 136.7, 173.1, 178.0 ppm. ESI-MS 2-pent-4-enamido-cis-
cyclopentanecarboxylic acid (C11H17NO3) m/z expected: 211.12 measured: 212.07
[M+H+].
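The "expected" values quoted for the small molecules in this chapter can be reproduced from the molecular formula. The sketch below is an illustration written for this purpose, not part of the original work: it sums monoisotopic atomic masses and adds a proton or a sodium cation to obtain the m/z of the [M+H+] or [M+Na+] ion. The rounded atomic masses are standard literature values, and the function names are hypothetical.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes, rounded.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
    "Na": 22.98976928,
}
PROTON = 1.00727647      # mass of H+ (hydrogen atom minus one electron)
ELECTRON = 0.00054858


def parse_formula(formula):
    """Turn a flat formula such as 'C11H17NO3' into {'C': 11, 'H': 17, ...}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Monoisotopic mass of the neutral molecule."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def mz_protonated(formula):
    """m/z of the singly charged [M+H+] ion."""
    return monoisotopic_mass(formula) + PROTON


def mz_sodiated(formula):
    """m/z of the singly charged [M+Na+] ion."""
    return monoisotopic_mass(formula) + MONOISOTOPIC["Na"] - ELECTRON


mass_1c = monoisotopic_mass("C11H17NO3")   # compound 1c
ion_1c = mz_protonated("C11H17NO3")
```

For compound 1c this gives a neutral mass that rounds to the quoted 211.12; the parser handles only flat formulae without brackets or isotope labels.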
5.4.1.2 Synthesis of N-Bpoc cis-2-aminocyclopentanecarboxylic acid (1d).
Cis-2-aminocyclopentanecarboxylic acid hydrochloride (17 mg, 0.1 mmol) was
dissolved in 1 mL of water and Triton B was added (0.2 mmol, 0.1 mL, d = 0.920
g/mL, 40% in MeOH). Following evaporation of MeOH under reduced pressure, DMF
(2 mL) was added to the residue and the suspension evaporated under high vacuum at
50 °C. This procedure was repeated 3 times and 5 mL DMF was then added to the
residue followed by 0.1 mmol of methyl 4-((2-(biphenyl-4-yl)propan-2-
yloxy)carbonyloxy)benzoate (Bpoc carbonate reagent, 1 eq., 39 mg). The suspension
was heated at 50 °C and stirred for 5 h, during which time the solids dissolved.
Afterwards the DMF was removed at 50 °C under reduced pressure and the residue
partitioned between water (10 mL) and ether (5 mL). To facilitate phase separation,
the aqueous phase was acidified with citric acid to pH 4 and then extracted 5
times with 5 mL of ether. The combined ether phases were washed with aqueous
citric acid/citrate buffer, pH 4 (2 times, 10 mL) and with water (2 times, 5 mL), and dried (Na2SO4).
After removal of the solvent under vacuum, the crude product was dissolved in 1 mL of dry
DMSO and used as such for the coupling to the oligonucleotide. 1H NMR (400 MHz,
MeOD) δ = 1.55-1.80 (m, 12H), 2.73 (m, 1H), 3.95 (m, 1H), 7.33 (t, J = 8 Hz, 1H),
7.44 (m, 4H), 7.58 (m, 4H) ppm. 13C NMR (100 MHz, MeOD): δ = 23.2, 24.1, 29.5
(2C), 32.3, 50.8, 54.2, 81.7, 125.9, 127.8, 127.9, 128.3, 129.8, 140.8, 142.1, 147.3,
157.2, 181.7 ppm. ESI-MS N-Bpoc-cis-2-aminocyclopentanecarboxylic acid
(C22H25NO4) m/z expected: 367.18 measured: 390.04 [M+Na+].
5.4.1.3 Synthesis of N-Nvoc cis-2-aminocyclopentanecarboxylic acid (1b).
To a solution of cis-2-aminocyclopentanecarboxylic acid hydrochloride (0.1 mmol, 17 mg)
in 4 mL dioxane/water (1:1) were added 0.1 mmol (1
eq.) of NvocCl and 0.4 mmol (34 mg) NaHCO3. The mixture was stirred at
room temperature for 3 h, then poured into aqueous 0.1 N HCl (20 mL) and extracted
with ethyl acetate (5 times, 5 mL). The combined organic phases were washed with 10
mL of brine and dried over Na2SO4. After removal of the solvent under vacuum, the
crude product was dissolved in 1 mL of dry DMSO and used as such for the coupling to the
oligonucleotide. 1H NMR (400 MHz, MeOD) δ = 1.61-1.92 (m, 6H), 2.82 (m, 1H),
3.85 (s, 3H), 3.90 (s, 3H), 3.93 (m, 1H), 5.44 (s, 2H), 7.11 (s, 1H), 7.71 (s, 1H), 7.89
(s, 1H) ppm. 13C NMR (100 MHz, MeOD): δ = 21.1, 25.3, 32.6, 46.1, 52.1, 55.8,
61.3, 113.5, 128.9, 132.2, 143.3, 145.2, 159.6, 163.1, 173.8 ppm. ESI-MS confirmed
the mass of the expected products: N-Nvoc-cis-2-aminocyclopentanecarboxylic acid
(C16H20N2O8) m/z expected: 368.12 measured: 369.41 [M+H+].
5.4.1.4 Synthesis of 4-pentenoic N-hydroxy succinimide ester (1e).
N-hydroxysuccinimide (14 mmol, 1.61 g) was suspended in CH2Cl2 (10 mL) with
diisopropylethylamine (14 mmol). A solution of pent-4-enoyl chloride (13 mmol, d =
1.074 g/mL) in CH2Cl2 (10 mL) was added dropwise over 1 h to the suspension. The
mixture, which turned into a yellowish solution during this time, was stirred for a further 4
hours and then poured into water (80 mL), and the aqueous phase was extracted with CH2Cl2
(5 times, 5 mL). The organic phase was washed 2 times with 10 mL water, dried
(Na2SO4) and concentrated under reduced pressure.
The crude product, a white solid, was used as such in the next reaction. 1H NMR
(400 MHz, CDCl3) δ = 2.44 (m, 2H), 2.63 (t, J = 7.9 Hz, 2H), 2.7-2.8 (m, 2H), 5.02
(dd, J1 = 20 Hz, J2 = 2 Hz, 1H), 5.64 (dd, J1 = 24 Hz, J2 = 2 Hz, 1H), 5.77 (m, 1H)
ppm. 13C NMR (100 MHz, CDCl3): δ = 25.6, 28.3, 30.3, 99.1, 116.6, 135.2, 168.0,
169.1 ppm. ESI-MS 4-pentenoic N-hydroxy succinimide ester (C9H11NO4): m/z
expected: 197.07 measured: 198.06 [M+H+].
5.4.1.5 Synthesis of Nα-Fmoc-Nε-Nvoc-lysine (2).
To 4 mL of dioxane/water solution (1:1) of 0.1 mmol of Nα-Fmoc-lysine was added
0.1 mmol of NvocCl and 0.4 mmol (34 mg) NaHCO3. The mixture was allowed to stir
at room temperature for 3 h then poured into aqueous 0.1 N HCl (20 mL) and
extracted with ethyl acetate (5 times, 5 mL). The organic phases collected were
washed with 10 mL of brine and dried over Na2SO4. After removal of the solvent
under vacuum, the crude product was dissolved in 1 mL of dry DMSO and used as such for
the coupling to the oligonucleotide. 1H NMR (400 MHz, d6DMSO) δ = 1.4-1.75 (m,
6H), 3.1 (m, 2H), 3.85 (s, 3H), 3.89 (s, 3H), 3.93 (m, 1H), 4.20-4.29 (m, 3H), 5.44 (s,
2H), 7.17 (s, 1H), 7.33 (t, J =8.9 Hz, 2H), 7.43 (t, J =8.9 Hz, 2H), 7.70 (s, 1H), 7.75
(d, J =8.8 Hz, 2H), 7.88 (d, J =8.9 Hz, 2H). 13C NMR (100 MHz, d6DMSO): δ = 23.6,
29.8, 32.3, 37.9, 46.6, 53.2, 57.2 (2C), 64.3, 66.6, 108.3, 120.1, 125.2, 126.8, 128.1,
129.0, 139.2, 141.3, 144.3, 147.6, 154.4, 156.7, 157.3, 174.5 ppm. ESI-MS N-Fmoc-
N’-Nvoc-lysine (C31H33N3O10) m/z expected: 607.22 measured: 608.34 [M+H+].
5.4.1.6 Oligonucleotide conjugation of Bpoc or Nvoc N-protected cis-2-
aminocyclopentanecarboxylic acid derivatives and Nα-Fmoc Nε-Nvoc-lysine.
To a reaction volume of 300 µL containing 70% (v/v) DMSO/water were added 5 µL
of either the crude N-protected (Bpoc or Nvoc) cis-2-aminocyclopentanecarboxylic
acid derivative DMSO solution or the crude Nα-Fmoc-Nε-Nvoc-lysine DMSO
solution, followed, in this order, by the following compounds at the respective final
concentrations: N-hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-
dimethylaminopropyl)-carbodiimide in DMSO, 4 mM; aqueous triethylamine
hydrochloride solution, pH 9.0, 80 mM; oligonucleotide aqueous solution, 50 µM,
(5’-amino-C12-GGA GCT TGT GAA TTC TGG ATC TTA GGA CGT GTG TGA
ATT GTC-3’) and 1. All coupling reactions were stirred overnight at 25 °C; residual
activated species were then quenched by addition of 50 µL Tris-Cl buffer, 500 mM
pH 9.0. Prior to HPLC purification 500 µL of 100 mM TEAA, pH 7.0, was added to
the reaction mixture. The reactions were then purified by HPLC; the desired
fractions were dried under reduced pressure and redissolved in 100 µL of water, and
an aliquot (ca. 1 nmol) was analyzed by LC-ESI-MS. ESI-MS N-Nvoc cis-2-
aminocyclopentanecarboxylic oligonucleotide conjugate: expected: 13676 measured:
13678; ESI-MS N-Bpoc cis-2-aminocyclopentanecarboxylic oligonucleotide
conjugate: expected: 13675 measured: 13674; ESI-MS Nα-Fmoc-Nε-Nvoc-lysine
oligonucleotide conjugate: expected: 13915 measured: 13916.
5.4.1.7 Cleavage of 2-pent-4-enamido-cis-cyclopentanecarboxylic acid
oligonucleotide conjugate.
150 μL of a 27 μM aqueous solution of the oligonucleotide conjugate (according to
absorption measurement at 260 nm using a NanoDrop instrument) were added to 150
μL of 200 mM I2 solution in THF. After 1 h at room temperature the reaction was
quenched with 100 μL of aqueous 1 M sodium thiosulfate and purified by HPLC
(yield 80%). The desired fractions were dried under reduced pressure and analyzed by
LC-ESI-MS, revealing the expected product. ESI-MS cis-2-
aminocyclopentanecarboxylic oligonucleotide conjugate: expected: 13437 measured:
13437.
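As a quick plausibility check of the quantities in the iodine-mediated cleavage (5.4.1.7), the arithmetic can be written out as follows. This is only a worked example: it assumes that the volumes of the aqueous and THF solutions are additive on mixing, and the helper names are made up for the illustration.

```python
def amount_nmol(volume_ul, concentration_um):
    """Amount of substance in nmol from a volume in µL and a concentration in µM."""
    return volume_ul * concentration_um / 1000.0


def concentration_after_mixing(stock_concentration, stock_volume, total_volume):
    """Concentration after dilution, assuming additive volumes (same units in and out)."""
    return stock_concentration * stock_volume / total_volume


# 150 µL of a 27 µM conjugate solution
conjugate_nmol = amount_nmol(150, 27)

# 150 µL of 200 mM iodine mixed with 150 µL of conjugate solution
iodine_mm = concentration_after_mixing(200, 150, 150 + 150)

# Amount of deprotected conjugate recovered at the reported 80% yield
recovered_nmol = conjugate_nmol * 0.80
```

Under these assumptions about 4 nmol of conjugate is exposed to roughly 100 mM iodine, and about 3.2 nmol of product is recovered after HPLC.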
5.4.1.8 Cleavage of N-Bpoc cis-2-aminocyclopentanecarboxylic acid
oligonucleotide conjugate.
150 μL of a 27 μM aqueous solution of the oligonucleotide conjugate (according to
absorption measurement at 260 nm using a NanoDrop instrument) were added to 150
μL of aqueous AcOH/AcONa pH 3-4 and heated at 35 ºC for 1h. Subsequently the
mixture was directly injected in HPLC (yield 90%). The desired fractions were dried
under reduced pressure and analyzed by LC-ESI-MS, revealing the expected product.
ESI-MS cis-2-aminocyclopentanecarboxylic oligonucleotide conjugate: expected:
13437 measured: 13438.
5.4.1.9 Cleavage of N-Nvoc cis-2-aminocyclopentanecarboxylic acid and Nα-Fmoc
Nε-Nvoc-lysine oligonucleotide conjugate.
150 μL of a 27 μM aqueous solution of the oligonucleotide conjugate (according to
absorption measurement at 260 nm using a NanoDrop instrument) were added to 150
μL of aqueous 2mM AcOH/AcONa pH 4.7 in a pyrex glass vial and irradiated at 366
nm at 4 ºC for 30 min. Subsequently the mixture was directly injected in HPLC
(quantitative conversion). The desired fractions were dried under reduced pressure
and analyzed by LC-ESI-MS, revealing the expected product. Notably, with the Nα-
Fmoc, Nε-Nvoc-lysine oligonucleotide conjugate no Fmoc cleavage was observed.
ESI-MS cis-2-aminocyclopentanecarboxylic oligonucleotide conjugate: expected:
13437 measured: 13439. ESI-MS Nα-Fmoc-lysine oligonucleotide conjugate:
expected: 13676 measured: 13679.
5.4.2 Synthesis of model scaffolds for Nα-Fmoc, Nε-Nvoc di-amino
carboxylic acid derivative based library.
5.4.2.1 Synthesis of (1R,3R,4R)-methyl 3-azido-4-Boc-amino-
cyclopentanecarboxylate (4).
To a solution of the alcohol 3 (1 mmol, 259 mg) in CH2Cl2 (20 mL) was added
triethylamine (3 mmol, 0.45 mL) and methanesulfonyl chloride (1.6 mmol). The
solution was stirred for 45 min and then treated with water (100 mL). The aqueous phase
was extracted with CH2Cl2 (5 times, 25 mL) and the organic extract washed with
brine (2×25 mL), dried (Na2SO4) and concentrated under reduced pressure. The crude
was dissolved in 20 mL of DMF and a solution of sodium azide (20 mL DMF) added.
The suspension was heated at 70 ºC for 8h and then quenched in water (100 mL) and
extracted in ethyl acetate (5 times, 25 mL). Subsequently the organic extract was
washed with brine (2×25 mL), dried (Na2SO4) and concentrated under reduced
pressure prior to use in the further reaction. 1H NMR (400 MHz, MeOD) δ = 1.33 (s,
9H), 1.46 (m, 1H), 1.61 (m, 2H), 2.11 (m, 2H), 2.75 (m, 1H), 3.37 (m, 1H), 3.77 (s,
3H). ESI-MS C12H20N4O4 m/z expected: 284.15 measured: 306.96 [M+Na+].
5.4.2.2 Synthesis of (1S,3R,4R)-methyl 3-amino-4-Boc-amino-
cyclopentanecarboxylate (5).
A stirred suspension of the crude 4 (ca. 1 mmol) and Pd/C (102 mg, 10% Pd) in 5
mL MeOH was placed under an overpressure of H2 for 3 h. Subsequently the catalyst was filtered
off and the MeOH removed under reduced pressure. The crude was used as such in the
next reaction. 1H NMR (400 MHz, MeOD) δ = 1.38 (s, 9H), 1.47 (m, 1H), 1.59 (m,
2H), 2.26 (m, 2H), 3.10 (m, 1H), 3.57 (m, 1H), 3.77 (s, 3H). ESI-MS C12H22N2O4
m/z expected: 258.16 measured: 259.00 [M+H+].
5.4.2.3 Synthesis of (1S,3R,4R)-methyl 3-Fmoc-amino-4-Boc-amino-
cyclopentanecarboxylate (6).
FmocCl (1.25 mmol, 323 mg) and diisopropylethylamine (2.5 mmol, 0.44 mL) were
added to a solution of the crude 5 (ca. 1 mmol) in 25 mL DMF. Following 3h stirring,
the mixture was treated with water (100 mL) and the water phase extracted with ethyl
acetate (5 times, 25 mL) and the organic extract washed with brine (2 times 25 mL),
dried (Na2SO4) and concentrated under reduced pressure. The crude was directly used
in the next reaction. 1H NMR (400 MHz, CDCl3) δ = 1.41 (s, 9H), 1.46 (m, 1H), 1.55
(m, 2H), 2.32 (m, 2H), 3.77 (s, 3H), 3.96 (m, 1H), 4.12 (m, 1H), 4.77 (m, 1H) 5.02
(m, 2H), 7.29-7.41 (m, 4H), 7.53-7.89 (m, 4H). ESI-MS C27H32N2O6 m/z
expected: 480.23 measured: 481.06 [M+H+].
5.4.2.4 Synthesis of (1S,3R,4R)-methyl 3-Fmoc-amino-4-Nvoc-amino-
cyclopentanecarboxylate (8).
To a suspension of the crude 6 (ca. 1 mmol) in water/dioxane 1:1 (20 mL) was added
10 mL 6 N aqueous HCl and the mixture was stirred at 50 ºC for 5 h, during
which time the solids dissolved. The solvent was then removed and the residue
dissolved again in water/dioxane 1:1 (10 mL). Following adjustment of the pH = 9 by
Na2CO3, NvocCl (1.5 mmol, 414 mg) was added and the reaction stirred for 3h at
room temperature. The mixture was then treated with water (50 mL) and the water
phase extracted with ethyl acetate (5 times, 10 mL) and the organic extract washed
with brine (2×15 mL), dried (Na2SO4) and concentrated under reduced pressure.
Preparative HPLC was performed on an XTerra Prep RP18 column (5 µm,
10 × 150 mm) using a linear gradient from 10% to 100% MeCN (0.1% TFA); the desired
fractions were collected and lyophilized (yellowish solid, 73 mg). 1H NMR (400
MHz, MeOD) δ = 1.21 (m, 1H), 1.71 (m, 2H), 2.23 (m, 2H), 2.85 (m, 1H), 2.89 (m,
1H), 3.73 (s, 3H), 3.81 (s, 3H), 3.90-3.92 (m, 2H), 4.3 (m, 1H), 5.14 (d, J = 20 Hz, 1H),
5.41 (d, J = 20 Hz, 1H), 7.13-7.63 (m, 10H). ESI-MS C31H31N3O10 m/z expected:
605.20 measured: 606.02 [M+H+].
5.5 Stepwise encoding
Gel electrophoresis was performed using either 15% Tris-Borate-EDTA-Urea
denaturing polyacrylamide gels (TBE-Urea, Invitrogen, cat.no EC68852) or 20%
Tris-Borate-EDTA native polyacrylamide gels (TBE, Invitrogen, cat.no EC63152),
which were stained with SYBRgreenII. Ethanol precipitation of DNA was performed by
adding 1/10 volume of 3 M AcOH/AcONa buffer, pH 4.7, and 3 volumes of ethanol
relative to the volume of the DNA sample. After 2 h incubation at -23 ºC the mixture
was centrifuged in a table-top centrifuge for 40 min (16,000 g) at 4 ºC, the supernatant
removed and the pellet washed with 300 µL ice-cold 90% ethanol. After a further 20
min centrifugation (16,000 g) at 4 ºC, the pellet was dried and redissolved in water.
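The volume ratios of the general ethanol precipitation protocol can be turned into a small helper. The function below is a convenience sketch with a hypothetical name; it simply applies the 1/10 volume of acetate buffer and 3 volumes of ethanol stated above to an arbitrary sample volume.

```python
def ethanol_precipitation_volumes(sample_ul):
    """Reagent volumes (µL) for precipitating a DNA sample of sample_ul µL.

    Uses the ratios given in the text: 1/10 volume of 3 M AcOH/AcONa buffer
    (pH 4.7) and 3 volumes of ethanol, both relative to the sample volume.
    """
    buffer_ul = sample_ul / 10.0
    ethanol_ul = 3.0 * sample_ul
    return {
        "buffer_ul": buffer_ul,
        "ethanol_ul": ethanol_ul,
        "total_ul": sample_ul + buffer_ul + ethanol_ul,
    }


# Example: a 100 µL DNA sample (volume chosen arbitrarily for illustration)
volumes = ethanol_precipitation_volumes(100.0)
```

The total volume is useful for checking that the mixture still fits the tube used for centrifugation.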
5.5.1 Stepwise encoding by Ligation.
Hybridization of 3 pairs (A, B, C) of oligonucleotides (A: 5’-CAT GGA ATT CGC
TCA CTC CGA CTA GAG G-3’ and 5’-(Phosphate)-CGT ACC TCT AGT CGG
AGT GAG CGA ATT CCA TG-3’; B: 5’-(Phosphate)-TAC GTG AGC TTG ACC
TGG TGA G-3’ and 5’- (Phosphate)-GCT TCT CAC CAG GTC AAG CTC A-3’; C:
5’-(Phosphate)-AAG CAC GTT CGC TGG ATC CTC AAC TGT G-3’ and 5’-CAC
AGT TGA GGA TCC AGC GAA CGT-3’; underlined sequences represent coding
sequences) was carried out by mixing the oligonucleotides at a concentration of 1.25
µM per oligonucleotide in 1x ligase buffer (40 mM Tris-HCl, 10 mM MgCl2, 10 mM
DTT, 0.5 mM ATP, pH 7.8) and incubating the mixtures for 10 minutes at 50 °C.
Subsequently the ligations were performed mixing 10 µl of hybridized
oligonucleotide pairs A and B with 10 µl of 1x ligase buffer and 1 µL of T4 ligase
(Roche Applied Science, Basel, Switzerland), and incubated at 25 ºC for 2 hours. The
ligation product was purified using a Qiagen Nucleotide Removal Kit, and eluted with
50 µl of 10 mM Tris-HCl pH 8.0. 18 µl of the eluate was mixed with 10 µl of
hybridized oligonucleotide pair C (which was present in excess), 2 µl of 10x ligase
buffer, and 1 µl of T4 ligase, and incubated for 2 hours at 25 ºC. Aliquots of the two
starting oligonucleotides, and the different ligation products were subjected to
electrophoresis on a 20% TBE gel.
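The design of the three oligonucleotide pairs can be checked computationally. The sketch below is an illustration written for this text, not a tool used in the original work: it determines the single-stranded 5' overhangs left after hybridization of each pair, tests whether neighbouring overhangs are complementary, and assembles the strands expected after both ligations. The sequences are copied from the paragraph above; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def five_prime_overhangs(top, bottom, max_len=8):
    """Return the unpaired 5' ends (top, bottom) of a hybridized pair.

    Assumes that the two strands pair perfectly except for short
    5' overhangs of at most max_len nucleotides.
    """
    for i in range(max_len + 1):
        for j in range(max_len + 1):
            if top[i:] == revcomp(bottom[j:]):
                return top[:i], bottom[:j]
    raise ValueError("strands do not hybridize with short 5' overhangs")


def clean(seq):
    return seq.replace(" ", "")


A1 = clean("CAT GGA ATT CGC TCA CTC CGA CTA GAG G")
A2 = clean("CGT ACC TCT AGT CGG AGT GAG CGA ATT CCA TG")
B1 = clean("TAC GTG AGC TTG ACC TGG TGA G")
B2 = clean("GCT TCT CAC CAG GTC AAG CTC A")
C1 = clean("AAG CAC GTT CGC TGG ATC CTC AAC TGT G")
C2 = clean("CAC AGT TGA GGA TCC AGC GAA CGT")

overhang_A = five_prime_overhangs(A1, A2)
overhang_B = five_prime_overhangs(B1, B2)
overhang_C = five_prime_overhangs(C1, C2)

# Pair A can be joined to pair B, and pair B to pair C, if the facing
# overhangs are reverse complements of each other.
a_b_compatible = revcomp(overhang_A[1]) == overhang_B[0]
b_c_compatible = revcomp(overhang_B[1]) == overhang_C[0]

# Strands expected after both ligation steps.
ligated_top = A1 + B1 + C1
ligated_bottom = C2 + B2 + A2
```

With the sequences as printed, each junction carries a four-nucleotide overhang and the two ligated strands are fully complementary.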
5.5.2 Stepwise encoding by a combination of Klenow polymerase and Ligation.
To a reaction volume of 50 µL, reagents were added to the respective final
concentrations: a 42mer 5’-amino-C12-DNA-oligonucleotide (5’-GGA GCT TGT
GAA TTC TGG ATC TTA GGA CGT GTG TGA ATT GTC-3’), 2 µM, a 42mer 5’-
C6-biotinylated-oligonucleotide containing the non-palindromic BssSI restriction site
(in boldface type) (5’-GTA GTC GGA CAC GAG TAC TGG TAA TCG ACA ATT
CAC ACA CGT CC-3’; underlined sequences represent coding sequences), 3 µM,
Klenow buffer, dNTPs (Roche, cat.no 11969064001), 0.5 mM, and Klenow
Polymerase enzyme, 5 units. After incubation at 37 ºC for 1 h, the reaction mixture
was purified on ion-exchange cartridge and eluted in 25 µL of water. 8 units of BssSI
enzyme were added to the purified Klenow product in 50 µL of BssSI restriction
buffer. The restriction cutting reaction was carried out at 37 ºC for 1.5 h. 50 µL of
streptavidin-sepharose slurry (GE Healthcare, cat.no 17-5113-01) was added and the
slurry was incubated for 30 min at 4 ºC. After SpinX column centrifugation, the
supernatant was collected and purified on ion-exchange cartridge and eluted in 25 µL
of water. Subsequently, the following reagents were added to a final volume of 50
µL: a preincubated 1:1 mixture of hybridized oligonucleotides (27mer 5’-phosphate-
TCG TGA AAT TTG CTA GGA TCC ATA TTG-3’ and 23mer 5’-CAA TAT GGA
TCC TAG CAA ATT TC-3’), 3 µM, T4 ligase buffer (Roche Applied Science, Basel,
Switzerland) and T4 ligase (Roche Applied Science, Basel, Switzerland), 4 units. The
ligation was performed overnight at 16ºC then purified on ion-exchange cartridge.
Aliquots of the starting oligonucleotides, and the different Klenow, restriction and
ligation products were analyzed on a 15% TBE-Urea gel. Sequencing of the excised
band after the three stepwise encoding steps confirmed the identity of the expected product.
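The Klenow fill-in step relies on the two oligonucleotides annealing through their complementary 3' ends, after which each strand is extended along the other. The following sketch illustrates this for the two sequences given above; it is not software from the original work. It finds the longest 3'-terminal overlap and builds both extended strands. The function names are hypothetical, and the hexamer CACGAG is used here as the BssSI recognition sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def fill_in(strand_a, strand_b, min_overlap=10):
    """Simulate a polymerase fill-in of two strands annealed at their 3' ends.

    Returns (overlap length, extended strand_a, extended strand_b),
    all sequences written 5'->3'.
    """
    for k in range(min(len(strand_a), len(strand_b)), min_overlap - 1, -1):
        if strand_a[-k:] == revcomp(strand_b[-k:]):
            extended_a = strand_a + revcomp(strand_b[:-k])
            extended_b = strand_b + revcomp(strand_a[:-k])
            return k, extended_a, extended_b
    raise ValueError("no 3'-terminal overlap of sufficient length found")


amino_oligo = "GGA GCT TGT GAA TTC TGG ATC TTA GGA CGT GTG TGA ATT GTC".replace(" ", "")
biotin_oligo = "GTA GTC GGA CAC GAG TAC TGG TAA TCG ACA ATT CAC ACA CGT CC".replace(" ", "")

overlap, top, bottom = fill_in(amino_oligo, biotin_oligo)

# Position of the BssSI recognition sequence on the biotinylated strand
bsssi_position = bottom.find("CACGAG")
```

The extended strands are fully complementary to each other, and the restriction site introduced by the biotinylated oligonucleotide is present in the duplex, which is what the subsequent BssSI digestion requires.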
5.5.3 Stepwise encoding by Klenow Polymerase.
To a reaction volume of 50 µL, reagents were added to the respective final
concentrations: a 42mer 5’-amino-C12-DNA-oligonucleotide (5’-GGA GCT TGT
GAA TTC TGG ATC TTA GGA CGT GTG TGA ATT GTC-3’), 2 µM, 42mer 3’-
C6-biotinylated-oligonucleotide (5’-GTA GTC GGA TCC GAC CAC GTT CCT
GAC AAT TCA CAC ACG TCC-3’; underlined sequences represent coding
sequences), 3 µM, Klenow buffer, dNTPs (Roche, cat.no 11969064001), 0.5mM,
Klenow Polymerase enzyme, 5 units. The Klenow polymerization reaction was
incubated at 37 ºC for 1 h, purified on ion-exchange cartridge and eluted in 100 µL of
4 M urea. After incubating at 94 ºC for 2 min, 50 µl of streptavidin-sepharose slurry
(GE Healthcare, cat.no 17-5113-01) were added and the slurry was incubated for 1 h
at 4 ºC. The streptavidin sepharose resin and the supernatant were separated by
centrifugation in a SpinX column. The DNA in the supernatant was ethanol
precipitated as described above. The resulting single-stranded oligonucleotide was
mixed with a 42mer unmodified DNA oligonucleotide (5’-GTC GTA TCG CCA
TGG TCC AAC ATC GTA GTC GGA GAG GAC CAC-3’) and a Klenow
polymerization reaction was performed as described above. Aliquots of the three
starting oligonucleotides, and the different Klenow products were applied on a 15%
TBE-Urea gel.
5.5.4 Stepwise coupling and encoding of model compound for Nα-Fmoc, Nε-Nvoc
di-amino carboxylic acid derivative based library.
To a reaction volume of 310 µL, containing 70% (v/v) DMSO/water, compounds
were added to the respective final concentrations in the order: N-Fmoc-N’-Nvoc-
lysine (2) DMSO solution, 4 mM; N-hydroxysulfosuccinimide in DMSO, 10 mM; N-
ethyl-N’-(3-dimethylaminopropyl)-carbodiimide in DMSO, 4 mM; aqueous
triethylamine hydrochloride solution, pH 9.0, 80 mM; oligonucleotide aqueous
solution, 100 µM, (5’-amino-C12-GGA GCT TGT GAA TTC TGG GTT AGT GGA
CGT GTG TGA ATT GTC-3’; the underlined sequence represents the coding sequence). The
reaction was stirred overnight at 25 °C; residual activated species were then quenched
and simultaneously Fmoc deprotected by addition of piperidine (500 mM in DMSO).
Prior to HPLC purification 500 µL of 100 mM TEAA, pH 7.0, was added to the
reaction mixture. The reaction was then purified by HPLC and the desired fractions
were dried under reduced pressure and redissolved in 100 µL of water and analyzed
by LC-ESI-MS. The sample showed the expected Fmoc-deprotected product N’-
Nvoc-lysine DNA-oligonucleotide conjugate. Subsequently, a further amide bond forming
reaction was performed: to a final volume of 310 µL, containing 70%
(v/v) DMSO/water, the following compounds were added to the respective final
concentrations: 3-p-tolylpropanoic acid DMSO solution, 4 mM; N-
hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)-
carbodiimide in DMSO, 4 mM; aqueous triethylamine hydrochloride, pH9.0, 80 mM;
50 µM N’-Nvoc-lysine DNA-oligonucleotide conjugate. The reaction was stirred
overnight at 25 °C; residual activated species were then quenched by addition of 50
µL Tris-Cl buffer, 500 mM pH 9.0. The mixture was allowed to quantitatively
precipitate by sequential addition of 25 µL of 1 M acetic acid, 12.5 µL of 3 M sodium
acetate buffer, pH 4.7 and 500 µL ethanol followed by 2 h incubation at -23 ºC. The
DNA was centrifuged and the resulting oligonucleotide pellet was washed with ice-
cold 90% (v/v) ethanol and then dissolved in 300 µL of aqueous 1mM AcOH/AcONa
pH 4.7 in a pyrex glass vial. Following irradiation at 366 nm at 4 ºC for 30 min, an
aliquot of the mixture was injected in HPLC and the desired fractions analyzed by
LC-ESI-MS, revealing the expected product (3-p-tolylpropanoyl)-lysine
oligonucleotide conjugate. The encoding of the 3-p-tolylpropanoyl moiety was
achieved adding the following reagents to a final volume of 50 µL to the respective
final concentrations: (3-p-tolylpropanoyl)-lysine oligonucleotide conjugate (5’-GGA
GCT TGT GAA TTC TGG GTT AGT GGA CGT GTG TGA ATT GTC-3’,
the underlined sequence represents the coding sequence), 2 µM, a 42mer 5’-C6-biotinylated-
oligonucleotide containing the non-palindromic BssSI restriction site (in boldface
type) (5’-GTA GTC GGA CAC GAG TAC TGG TAA TCG ACA ATT CAC ACA
CGT CC-3’; underlined sequences represent coding sequences), 3 µM, Klenow
buffer, dNTPs (Roche, cat.no 11969064001), 0.5 mM, and Klenow Polymerase
enzyme, 5 units. After incubation at 37 ºC for 1 h, the reaction mixture was purified
on ion-exchange cartridge and eluted in 50 µL of water. Subsequently, 40 µL of the encoded
(3-p-tolylpropanoyl)-lysine oligonucleotide conjugate was coupled to Cy5 by
addition of Cy5-NHS ester (Amersham, cat.no PA25001) and aqueous triethylamine
hydrochloride solution, pH 9.0, to a final concentration of 4 mM and 80 mM
respectively. The reaction was stirred overnight at 25 °C. The mixture was then
allowed to quantitatively precipitate by sequential addition of 25 µL of 1 M acetic
acid, 12.5 µL of 3 M sodium acetate buffer, pH 4.7 and 500 µL ethanol followed by 2
h incubation at -23 ºC. The DNA was centrifuged and the resulting oligonucleotide
pellet was washed with ice-cold 90% (v/v) ethanol. Finally, the Cy5 moiety was
encoded: 8 units of BssSI enzyme were added to the N’-
Cy5-N-(3-p-tolylpropanoyl)-lysine oligonucleotide conjugate in 50 µL of BssSI
restriction buffer. The restriction cutting reaction was carried out at 37 ºC for 1.5 h. 50
µL of streptavidin-sepharose slurry (GE Healthcare, cat.no 17-5113-01) was added
and the slurry incubated for 30 min at 4 ºC. After SpinX column centrifugation, the
supernatant was collected and purified on ion-exchange cartridge and eluted in 25 µL
of water. Subsequently, the following reagents were added to a final volume of 50
µL: a preincubated 1:1 mixture of hybridized oligonucleotides (27mer 5’-phosphate-
TCG TGA AAT TTG CTA GGA TCC ATA TTG-3’ and 23mer 5’-CAA TAT GGA
TCC TAG CAA ATT TC-3’, underlined sequences represent coding sequences), 3
µM, T4 ligase buffer (Roche Applied Science, Basel, Switzerland) and T4 ligase
(Roche Applied Science, Basel, Switzerland), 4 units. The ligation was performed
overnight at 16ºC then purified on ion-exchange cartridge. Aliquots of the starting
oligonucleotides, and the different Klenow, restriction and ligation products were
analyzed on a 15% TBE-Urea gel. Sequencing of the excised band after the three
stepwise encoding steps confirmed the identity of the expected product.
5.5.5 Bacterial cloning and sequencing.
Following polyacrylamide gel electrophoresis, the band of interest was excised,
extracted in aqueous 10 mM Tris-Cl and PCR-amplified using the following
oligonucleotides as primers: DEL_P1 (5’-GGA GCT TGT GAA TTC TGG-3’,
underlined EcoRI restriction site) and DEL_P2 (5’-GTA GTC GGA TCC GAC CAC-
3’, underlined BamHI restriction site). The PCR products were purified on ion-
exchange cartridges, cloned into the pUC19 vector using the EcoRI and
BamHI restriction sites, and electroporated into TG1 bacteria. Sequencing of the vector from a number of
colonies was performed using an ABI PRISM 3130 Genetic Analyzer (Applied
Biosystems).
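Since the underlining of the restriction sites is lost in plain text, their positions in the two cloning primers can be recovered with a short script. The sketch below is an illustration only: it looks up the EcoRI (GAATTC) and BamHI (GGATCC) recognition sequences in DEL_P1 and DEL_P2, and also checks that the primers used for high-throughput sequencing (Chapter 5.3.4.2) consist of these primers preceded by a 19-nucleotide adaptor. The variable and function names are hypothetical.

```python
RECOGNITION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def clean(seq):
    return seq.replace(" ", "")


def find_sites(sequence):
    """Return {enzyme: 0-based position} for each site present in the sequence."""
    positions = {}
    for enzyme, site in RECOGNITION_SITES.items():
        index = sequence.find(site)
        if index != -1:
            positions[enzyme] = index
    return positions


DEL_P1 = clean("GGA GCT TGT GAA TTC TGG")
DEL_P2 = clean("GTA GTC GGA TCC GAC CAC")
DEL_P1_A = clean("GCC TCC CTC GCG CCA TCA GGG AGC TTG TGA ATT CTG G")
DEL_P2_B = clean("GCC TTG CCA GCC CGC TCA GGT AGT CGG ATC CGA CCA C")

sites_p1 = find_sites(DEL_P1)
sites_p2 = find_sites(DEL_P2)

# Length of the extra 5' domain carried by the sequencing primers
adaptor_a = DEL_P1_A[: len(DEL_P1_A) - len(DEL_P1)]
adaptor_b = DEL_P2_B[: len(DEL_P2_B) - len(DEL_P2)]
```

Each cloning primer contains exactly one of the two sites, so the amplified insert can be cloned directionally between the EcoRI and BamHI sites of the vector.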
6. REFERENCES
1 E.S. Lander, L.M. Linton, B. Birren, C. Nusbaum, M.C. Zody, J. Baldwin, K.
Devon, K. Dewar and M. Doyle et al., Initial sequencing and analysis of the
human genome, Nature 409 (2001), 860–921.
2 J.C. Venter, M.D. Adams, E.W. Myers, P.W. Li, R.J. Mural, G.G. Sutton,
H.O. Smith, M. Yandell and C.A. Evans et al., The sequence of the human
genome, Science 291 (2001), 1304–1351.
3 S.D. Patterson and R.H. Aebersold, Proteomics: the first decade and beyond,
Nat. Genet. 33 (2003) (Suppl.), 311–323.
4 R.L. Strausberg and S.L. Schreiber, From knowing to controlling: a path from
genomics to drugs using small molecule probes, Science 300 (2003), 294–295.
5 R.B. Stoughton, Applications of DNA microarrays in biology, Annu. Rev.
Biochem. 74 (2005), 53–82.
6 J. Drews, Drug discovery: a historical perspective, Science 287 (2000), 1960–
1964.
7 A. Furka, F. Sebestyen, M. Asgedom and G. Dibo, General method for rapid
synthesis of multicomponent peptide mixtures, Int. J. Pept. Protein Res. 37
(1991), 487–493.
8 R.A. Houghten, C. Pinilla, S.E. Blondelle, J.R. Appel, C.T. Dooley and J.H.
Cuervo, Generation and use of synthetic peptide combinatorial libraries for
basic research and drug discovery, Nature 354 (1991), 84–86.
9 K.S. Lam, S.E. Salmon, E.M. Hersh, V.J. Hruby, W.M. Kazmierski and R.J.
Knapp, A new type of synthetic peptide library for identifying ligand-binding
activity, Nature 354 (1991), 82–84.
10 R.B. Merrifield, Solid phase peptide synthesis. I. The synthesis of a
tetrapeptide, J. Am. Chem. Soc. 85 (1963), 2149–2154.
11 R. Frank, W. Heikens, G. Heisterberg-Moutsis and H. Blocker, A new general
approach for the simultaneous chemical synthesis of large numbers of
oligonucleotides: segmental solid supports, Nucl. Acids Res. 11 (1983), 4365–
4377.
12 R.A. Houghten, General method for the rapid solid-phase synthesis of large
numbers of peptides: specificity of antigen-antibody interaction at the level of
individual amino acids, Proc. Natl. Acad. Sci. U. S. A 82 (1985), 5131–5135.
13 G.P. Smith, Filamentous fusion phage: novel expression vectors that display
cloned antigens on the virion surface, Science 228 (1985), 1315–1317.
14 T. Clackson, H.R. Hoogenboom, A.D. Griffiths and G. Winter, Making
antibody fragments using phage display libraries, Nature 352 (1991), 624–
628.
15 E.T. Boder and K.D. Wittrup, Yeast surface display for screening
combinatorial polypeptide libraries, Nat. Biotechnol. 15 (1997), 553–557.
16 J. Hanes and A. Pluckthun, In vitro selection and evolution of functional
proteins by using ribosome display, Proc. Natl. Acad. Sci. U. S. A 94 (1997),
4937–4942.
17 J. Bertschinger, D. Grabulovsky, D. Neri, Selection of single domain binding
proteins by covalent DNA display, Protein Eng Des Sel 20 (2007), 57-68.
18 S. Brenner and R.A. Lerner, Encoded combinatorial chemistry, Proc. Natl.
Acad. Sci. U. S. A. 89 (1992), 5381–5383.
19 J. Nielsen, S. Brenner and K.D. Janda, Synthetic methods for the
implementation of encoded combinatorial chemistry, J. Am. Chem. Soc. 115
(1993), 9812–9813.
20 M.C. Needels, D.G. Jones, E.H. Tate, G.L. Heinkel, L.M. Kochersperger, W.J.
Dower, R.W. Barrett and M.A. Gallop, Generation and screening of an
oligonucleotide-encoded synthetic peptide library, Proc. Natl. Acad. Sci. U. S.
A. 90 (1993), 10700–10704.
21 T. Meo, C. Gramsch, R. Inan, V. Hollt, E. Weber, A. Herz and G. Riethmuller,
Monoclonal antibody to the message sequence Tyr-Gly-Gly-Phe of opioid
peptides exhibits the specificity requirements of mammalian opioid receptors,
Proc. Natl. Acad. Sci. U. S. A. 80 (1983), 4084–4088.
22 Mukund S. Chorghade, Drug Discovery and Development - Combinatorial
Chemistry in the Drug Discovery Process ISBN: 9780471398486 Ed. 2006
John Wiley & Sons, Inc., 129-167
23 K. FitzGerald, In vitro display technologies – new tools for drug discovery,
Drug Discov. Today 5 (2000), 253–258.
24 Pedersen, H., Gouilaev, A.H., Sams, K.C., Slok, F.A., Freskgard, P.-O.,
Holtmann, A., Kampmann Olsen, E., Husemoen Gitte, N., Felding, J., et al.,
2002. Methods for template-directed synthesis of and modification of
polymers and screening for desired activity. WO02103008.
25 Pedersen, H., Holtmann, A., Franch, T., Gouliaev, A.H., Felding, J., 2003.
Methods for template-directed synthesis of and modification of polymeric
libraries and their use in screening for biological activity. WO03078625.
26 Freskgard, P.-O., Franch, T., Gouliaev, A.H., Lundorf, M.D., Felding, J.,
Olsen, E.K., Holtmann, A., Jakobsen, S.N., Sams, C., et al., 2004.
Bifunctional substances and their use in preparation and enzyme-based
encoding of combinatorial libraries. WO2004039825.
27 Morgan, B., Hale, S., Arico-Muendel, C.C., Clark, M., Wagner, R., Israel,
D.I., Gefter, M.L., Benjamin, D., Hansen, N.J.V., et al., 2004. Methods and
building blocks for synthesis of combinatorial libraries of mols. comprising
functional moieties operatively linked to encoding oligonucleotides.
WO2005058479.
28 D.R. Halpin and P.B. Harbury, DNA display I. Sequence-encoded routing of
DNA populations, PLoS Biol. 2 (2004), 1015–1021.
29 D.R. Halpin and P.B. Harbury, DNA display. II. Genetic manipulation of
combinatorial chemistry libraries for small-molecule evolution, PLoS Biol. 2
(2004), 1022–1030.
30 T. Meo, C. Gramsch, R. Inan, V. Hollt, E. Weber, A. Herz and G. Riethmuller,
Monoclonal antibody to the message sequence Tyr-Gly-Gly-Phe of opioid
peptides exhibits the specificity requirements of mammalian opioid receptors,
Proc. Natl. Acad. Sci. U. S. A. 80 (1983), 4084–4088.
31 D.R. Halpin, J.A. Lee, S.J. Wrenn and P.B. Harbury, DNA display III. Solid-
phase organic synthesis on unprotected DNA, PLoS Biol. 2 (2004), 1031–
1038.
32 C.A. Lipinski, F. Lombardo, B.W. Dominy and P.J. Feeney, Experimental and
computational approaches to estimate solubility and permeability in drug
discovery and development settings, Adv. Drug Deliv. Rev. 46 (2001), 3–26.
33 Z.J. Gartner and D.R. Liu, The generality of DNA-templated synthesis as a
basis for evolving non-natural small molecules, J. Am. Chem. Soc. 123 (2001),
6961–6963.
34 C.T. Calderone, J.W. Puckett, Z.J. Gartner and D.R. Liu, Directing otherwise
incompatible reactions in a single solution by using DNA-templated organic
synthesis, Angew Chem., Int. Ed. Engl. 41 (2002), 4104–4108.
35 D. Summerer and A. Marx, DNA-templated synthesis: more versatile than
expected, Angew Chem., Int. Ed. Engl. 41 (2002), 89–90.
36 X. Li and D.R. Liu, DNA-templated organic synthesis: nature's strategy for
controlling chemical reactivity applied to synthetic molecules, Angew Chem.,
Int. Ed. Engl. 43 (2004), 4848–4870.
37 M.W. Kanan, M.M. Rozenman, K. Sakurai, T.M. Snyder and D.R. Liu,
Reaction discovery enabled by DNA-templated synthesis and in vitro
selection, Nature 431 (2004), 545–549.
38 Z.J. Gartner, M.W. Kanan and D.R. Liu, Multistep small-molecule synthesis
programmed by DNA templates, J. Am. Chem. Soc. 124 (2002), 10304–10306.
39 T.M. Snyder and D.R. Liu, Ordered multistep synthesis in a single solution
directed by DNA templates., Angew Chem., Int. Ed. Engl. 44 (2005), 7379–
7382.
40 J.B. Doyon, T.M. Snyder and D.R. Liu, Highly sensitive in vitro selections for
DNA-linked synthetic small molecules with protein binding affinity and
specificity, J. Am. Chem. Soc. 125 (2003), 12372–12373.
41 Z.J. Gartner, B.N. Tse, R. Grubina, J.B. Doyon, T.M. Snyder and D.R. Liu,
DNA-templated organic synthesis and selection of a library of macrocycles,
Science 305 (2004), 1601–1605.
42 Mannocci L., Zhang Y., Scheuermann J., Leimbacher M., De Bellis G., Rizzi
E., Dumelin C., Melkko S, Neri D., High-throughput sequencing allows the
identification of binding molecules isolated from DNA-encoded chemical
libraries, Proc. Natl. Acad. Sci. U. S. A. 105(46), (2008), 17670-17675.
43 Buller F., Mannocci L., Zhang Y., Dumelin C.E., Scheuermann J., Neri D.,
Design and synthesis of a novel DNA-encoded chemical library using Diels-
Alder cycloadditions, Bioorg Med. Chem. Lett. 18(22), (2008), 5926-5931.
44 Margulies M., et al., Genome sequencing in microfabricated high-density
picolitre reactors. Nature, 437(7057) (2005), 376-80.
45 S.C. Schuster, Nat. Methods 5 (1) (2008), 16-18.
46 K. Hoogsteen, The crystal and molecular structure of a hydrogen-bonded
complex between 1-methylthymine and 9-methyladenine. Acta
Crystallographica 16 (1963), 907-916.
47 S. Melkko, J. Scheuermann, C.E. Dumelin and D. Neri, Encoded self-
assembling chemical libraries, Nat. Biotechnol. 22 (2004), 568–574.
48 Cheng Y.K., Pettitt B.M., Stabilities of double- and triple-strand helical
nucleic acids. Prog Biophys Mol Biol. 58(3) (1992), 225-257.
49 Aich P., Ritchie S., Bonham K., Lee J.S., Thermodynamic and kinetic studies
of the formation of triple helices between purine-rich deoxyribo-
oligonucleotides and the promoter region of the human c-src proto-oncogene.
Nucleic Acids Res. 26(18) (1998), 4173-4177.
50 S. Melkko, C.E. Dumelin, J. Scheuermann, D. Neri, On the magnitude of the
chelate effect for the recognition of proteins by pharmacophores scaffolded by
self-assembling oligonucleotides. Chem Biol. 13(2) (2006), 225-231.
51 M. Lovrinovic and C.M. Niemeyer, DNA microarrays as decoding tools in
combinatorial chemistry and chemical biology, Angew Chem., Int. Ed. Engl.
44 (2005), 3179–3183.
52 M. Uttamchandani, D.P. Walsh, S.Q. Yao and Y.T. Chang, Small molecule
microarrays: recent advances and applications, Curr. Opin. Chem. Biol. 9
(2005), 4–13.
53 S. Melkko, J. Sobek, G. Guarda, J. Scheuermann, C.E. Dumelin and D. Neri,
Encoded self-assembling chemical libraries, Chimia 59 (2005), 798–802.
54 Dumelin C.E., Scheuermann J., Melkko S., Neri D., Selection of streptavidin
binders from a DNA-encoded chemical library. Bioconjug Chem., 17(2)
(2006), 366-70.
55 Dumelin CE, Trüssel S, Buller F, Trachsel E, Bootz F, Zhang Y, Mannocci L,
Beck SC, Drumea-Mirancea M, Seeliger MW, Baltes C, Müggler T, Kranz F,
Rudin M, Melkko S, Scheuermann J, Neri D. A portable albumin binder from
a DNA-encoded chemical library. Angew Chem Int Ed Engl. 47(17) (2008),
3196-201.
56 Melkko S, Zhang Y, Dumelin CE, Scheuermann J, Neri D., Isolation of high-
affinity trypsin inhibitors from a DNA-encoded chemical library. Angew Chem
Int Ed Engl. 46(25) (2007), 4671-4674.
57 Scheuermann J, Dumelin CE, Melkko S, Zhang Y, Mannocci L, Jaggi M,
Sobek J, Neri D., DNA-Encoded Chemical Libraries for the Discovery of
MMP-3 Inhibitors. Bioconjug Chem. 19(3) (2008), 778-785.
58 Melkko S., Neri D., 2002 Encoded Self-Assembling Chemical Libraries,
WO/2003/076943
59 Michael J. Heller, DNA-microarray Technology: Devices, Systems, and
Applications. Annu. Rev. Biomed. Eng., 4 (2002). 129–153.
60 Southern, E.M., Detection of specific sequences among DNA fragments
separated by gel electrophoresis. J Mol Biol., 98 (1975), 503-517.
61 Kulesh D.A., Clive D.R., Zarlenga D.S., Greene J.J., Identification of
interferon-modulated proliferation-related cDNA sequences. Proc Natl Acad
Sci USA, 84 (1987), 8453–8457.
62 Schena M., Shalon D., Davis R.W., Brown P.O., Quantitative monitoring of
gene expression patterns with a complementary DNA microarray. Science 270
(1995), 467–470.
63 Lashkari D.A., DeRisi J.L., McCusker J.H., Namath A.F., Gentile C., Hwang
S.Y., Brown P.O., Davis R.W., Yeast microarrays for genome wide parallel
genetic and gene expression analysis. Proc Natl Acad Sci USA 94 (1997),
13057–13062.
64 Shendure, J., Mitra, R.D., Varma, C., Church G.M., Advanced sequencing
technologies: methods and goals. Nat. Rev. Genet. 5 (2004), 335–344.
65 Sanger, F. , Nicklen, S. & Coulson, A. R. DNA sequencing with chain-
terminating inhibitors. Proc. Natl Acad. Sci. USA, 74 (1977), 5463–5467.
66 Prober, J. M. et al. A system for rapid DNA sequencing with fluorescent
chain-terminating dideoxynucleotides. Science, 238 (1987), 336–341.
67 Nyren, P., Pettersson, B. & Uhlen, M. Solid phase DNA minisequencing by an
enzymatic luminometric inorganic pyrophosphate detection assay. Anal.
Biochem. 208 (1993), 171–175.
68 Ronaghi, M. et al. Real-time DNA sequencing using detection of
pyrophosphate release. Anal. Biochem. 242 (1996), 84–89.
69 Jacobson, K. B. et al. Applications of mass spectrometry to DNA sequencing.
GATA 8 (1991), 223–229.
70 Bains, W. & Smith, G. C. A novel method for nucleic acid sequence
determination. J. Theor. Biol. 135 (1988), 303–307.
71 Jett, J. H. et al. High-speed DNA sequencing: an approach based upon
fluorescence detection of single molecules. Biomol. Struct. Dynam. 7 (1989),
301–309.
72 M. Margulies, M. Egholm, W.E. Altman, S. Attiya, J.S. Bader, L.A. Bemben,
J. Berka, M.S. Braverman and Y.J. Chen et al. Genome sequencing in
microfabricated high-density picolitre reactors, Nature 437 (2005), 376–380.
73 J. Shendure, G. J. Porreca, N.B. Reppas, X. Lin, J.P. McCutcheon, A.M.
Rosenbaum, M. D. Wang, K. Zhang, R.D. Mitra, G.M. Church. Accurate
Multiplex Polony Sequencing of an Evolved Bacterial Genome. Science,
309(5741) (2005), 1728 – 1732.
74 http://solid.appliedbiosystems.com/ - Applied Biosystems' SOLiD technology.
75 http://www.illumina.com/
76 Braslavsky I., Hebert H., Kartalov E., Quake S.R.. Sequence information can
be obtained from single DNA molecules. Proc. Natl Acad. Sci. USA, 100
(2003), 3960–3964.
77 M. Ronaghi, S. Karamohamed, B. Pettersson, M. Uhlen, P. Nyren. Real-time
DNA sequencing using detection of pyrophosphate release. Anal. Biochem.,
242 (1996), 84-89.
78 S. C. Macevicz, US Patent 5750341, filed 1995
79 Droege, M., The Genome Sequencer FLX™ System – Longer reads, More
applications, Straight forward bioinformatics & More Complete Data Sets. J.
Biotechnol. (2007), in press.
80 http://www.helicosbio.com/
81 Mitra R.D., Shendure J., Olejnik J., Olejnik E.K., Church G.M., Fluorescent in
situ sequencing on polymerase colonies. Anal. Biochem., 320(1) (2003), 55-
65.
82 Huber C.G., Oberacher H., Analysis of nucleic acids by on-line liquid
chromatography-mass spectrometry. Mass Spectrom Rev. 20(5) (2001), 310-
343.
83 Klenow H., Henningsen I., Selective Elimination of the Exonuclease Activity
of the Deoxyribonucleic Acid Polymerase from Escherichia coli B by Limited
Proteolysis. Proc Natl Acad Sci 65 (1970), 168–175.
84 Silacci M., Brack S., Schirru G., Mårlind J., Ettorre A., Merlo A., Viti F., Neri
D., Design, construction, and characterization of a large synthetic human
antibody phage display library. Proteomics, 5(9) (2005), 2340–2350.
85 Janeway, Travers, Walport, Shlomchik, Immunobiology, 6th Ed. (2005)
Churchill Livingstone.
86 Walsh G., Biopharmaceutical benchmarks 2006. Nature Biotechnology, 24(7)
(2006), 769-776.
87 Liotta L. A. et al. Metastatic potential correlates with enzymatic degradation
of basement membrane collagen. Nature, 284 (1980), 67–68.
88 Brinckerhoff C. E., Matrisian, L. M., Matrix metalloproteinases: a tail of a
frog that became a prince. Nature Rev. Mol. Cell Biol. 3 (2002), 207–214.
89 Coussens L. M., Fingleton B., Matrisian, L. M., Matrix metalloproteinase
inhibitors and cancer: trials and tribulations. Science 295 (2002), 2387–2392.
90 Egeblad M., Werb Z., New functions for the matrix metalloproteinases in
cancer progression. Nature Rev. Cancer. 2, (2002), 163–175.
7. Curriculum Vitae
Luca Mannocci
Wolfgang-Paulistrasse 10
ETH Zürich, HCI G398
CH-8093 Zürich
Switzerland
Tel.: +41 44 63 37 453
Fax.: +41 44 63 31 358
Personal Details
Name Luca Mannocci
Date of birth 07th of September 1979
Citizen Zürich
Nationality Italian
Civil state Unmarried
Address: Kolbenacker 34
8052 Zürich
Switzerland
Tel.: +41 43 44 39 900
Mobile: +41 76 43 76 485
Education
2005 – 2008 ETH Swiss Federal Institute of Technology, Zürich, Switzerland. Ph.D. in Sciences
2004 Italian State exam for the habilitation to the chemistry profession
1998 – 2004 Università degli Studi di Pisa, Pisa, Italy. Degree in Chemistry (Mark: 110/110 e lode).
1998 Liceo Scientifico Statale “Ulisse Dini” (Scientific Lyceum), Pisa, Italy. High school diploma (Mark: 60/60).
Research Experience
SWISS FEDERAL INSTITUTE OF TECHNOLOGY (ETH)
2005-2008 Doctoral student, Institute of Pharmaceutical Sciences.
Ph.D. Thesis “DNA-encoded Chemical Libraries”
Advisor: Prof. Dr. Dario Neri
UNIVERSITY OF PISA
2004-2005 Internship, Organic Chemistry Division of the Department of Chemistry
at the University of Pisa
Silver(I)-catalysed protiodesilylation
Advisor: Prof. Adriano Carpita and Prof. Renzo Rossi
Collaboration for natural science book publication: Prof. R. Rossi, Prof. A. Carpita, Prof. F. Bellina, “Sostanze organiche naturali e loro derivati da analoghi strutturali con proprietà antineoplastiche”, (2005), Ed. Plus. (“Naturally occurring substances and structural analogues with antineoplastic properties”).
2002-2004 Training in the Organic Chemistry Division of the Department of
Chemistry at the University of Pisa
Diploma Thesis: “La prima sintesi totale del (-)-nitidone, una sostanza naturale con proprietà antitumorali, e del suo enantiomero” (“First total synthesis of naturally-occurring (-)-nitidon and its enantiomer”).
Advisor: Prof. Adriano Carpita
Languages
Italian: Native speaker
English: Fluent
German: Basic knowledge
Hungarian: Basic knowledge
Publications and Patents
• Mannocci L., Zhang Y., Scheuermann J., Leimbacher M., De Bellis G., Rizzi E., Dumelin C.E., Melkko S., Neri D., “High-throughput sequencing allows the identification of binding molecules isolated from DNA-encoded chemical libraries”. PNAS, 2008, 105(46), 17670-17675.
• Mannocci L., Neri D., Melkko S., “DNA-Encoded Chemical Libraries”. SCREENING - Trends in Drug Discovery, 2009, 10, 16-18.
• Mannocci L., Melkko S., Neri D., DNA-encoded chemical libraries. US Patent Application 2008 No 61/008,249.
• Scheuermann J., Dumelin C.E., Melkko S., Zhang Y., Mannocci L., Jaggi M., Sobek J., Neri D., “DNA-encoded chemical libraries for the discovery of MMP-3 inhibitors”. Bioconjug Chem., 2008, 19(3), 778-785.
• Dumelin C.E., Trüssel S., Buller F., Trachsel E., Bootz F., Zhang Y., Mannocci L., Beck S.C., Drumea-Mirancea M., Seeliger M.W., Baltes C., Müggler T., Kranz F., Rudin M., Melkko S., Scheuermann J., Neri D. “A portable albumin binder from a DNA-encoded chemical library”. Angew Chem Int Ed Engl., 2008, 47(17), 3196-3201.
• Buller F., Mannocci L., Zhang Y., Dumelin C.E., Scheuermann J., Neri D., “Design and synthesis of a novel DNA-encoded chemical library using Diels-Alder cycloadditions”. Bioorg Med Chem Lett., 2008, 18(22), 5926-5931.
• A. Carpita, L. Mannocci, R. Rossi, “Silver(I)-catalysed Protiodesilylation of 1-(Trimethylsilyl)-1-alkynes”. Eur. J. Org. Chem. 2005, 12, 1367-1377.
• F. Bellina, A. Carpita, L. Mannocci, R. Rossi “First total synthesis of naturally-occurring (-)-nitidon and its enantiomer”, Eur. J. Org. Chem. 2004, 12, 2610-2619.
Poster Presentations
• L. Mannocci, Y. Zhang, J. Scheuermann, M. Leimbacher, G. De Bellis, E. Rizzi, C. Dumelin, S. Melkko, D. Neri, “Novel Strategies for the Synthesis, Selection and Decoding of DNA-Encoded Chemical Libraries” - Molecular Medicine Tri-Conference, San Francisco, USA, 25th - 28th March 2008.
• A. Carpita, L. Mannocci, R. Rossi, XVII Convegno Nazionale della Divisione di Chimica Farmaceutica della Società Chimica Italiana, Pisa (PI), Italy, 6th – 10th September 2004.
• A. Carpita, L. Mannocci - poster and flash communication, XXIX Convegno Nazionale della Divisione Chimica Organica, Potenza (PZ), Italy, 31st August – 4th September 2004.
• M. Biagetti, A. Carpita, L. Mannocci, R. Rossi “Studi sulla sintesi del (-)-nitidone”, communication, VI Convegno Nazionale “Giornate di Chimica delle Sostanze Naturali”, Vietri sul Mare (SA), Italy, 29th September – 1st October 2003. Proceedings, p. 3.
8. ACKNOWLEDGMENTS
First of all, I would like to express my sincere gratitude to my PhD advisor, Professor
Dr. Dario Neri, for giving me the privilege of pursuing my doctoral studies in his
laboratory. In every discussion, I perceived a brilliant intellect behind all
his answers, as well as enormous wisdom in his questions. I especially appreciated
his success-focused attitude towards research, and I was impressed by his excellent and
wide-ranging scientific knowledge, an inexhaustible source of creativity and curiosity.
I would like to thank Dr. Yixin Zhang, Dr. Jörg Scheuermann and Dr. Samu Melkko.
Their constant scientific and personal support inspired me in the day-to-day
experiments and was absolutely essential for the accomplishment of this work. Dr.
Yixin Zhang was an invaluable help in the design and setup of all the
bioinformatic tools. His expertise and precious advice were often crucial for the
synthesis and assembly of the library. Dr. Jörg Scheuermann introduced me to
the field of DNA-encoded chemistry and to laboratory life. I constantly benefited
from his priceless critical input and exciting discussions. Samu Melkko was a brilliant
support for the selection procedures and a very big help with the gene cloning and the
radioactive selections.
Further thanks go to the “chemistry team”: Christoph Dumelin, Fabian Buller,
Jean-Paul Gapian, Sabrina Trüssel, Madalina Jaggi and Ilona Molnàr for their priceless
support and for the numerous daily, open, controversial and stimulating scientific (and
non-scientific) discussions, which created the constantly and extraordinarily enjoyable
atmosphere typical of room G398. Special thanks go to Markus Leimbacher for
helping me with the assembly of the library during his Master's thesis and for his heroic
efforts in setting up the encoding strategies and the selection procedures.
I gratefully acknowledge my co-examiner Prof. Karl-Heinz Altmann for thoroughly
proofreading my thesis and for all his valuable input.
Additionally, I am most grateful to Gianluca De Bellis and Ermanno Rizzi from the
Institute for Biomedical Technologies of Milan, who enabled us to use “454” high-
throughput sequencing technology, by providing the platform.
Finally, an enormous hug goes to all present and former members (and friends) of
Professor Dario Neri’s group, who contributed to the pleasant atmosphere and
provided me with advice, friendship, vitality and, for many years, a home away from home.
Without you the winter here would certainly have been much colder and the clouds
not so bright. A smile lasts only for a while, but in memory it can last forever: you will
always have a special place in the treasure of my heart.
I owe not only gratitude but, above all, affection and esteem to Prof.
Adriano Carpita. His sincere friendship has accompanied and supported me throughout
these years. He taught me, with the art that is now my profession, that a
pleasant discovery is often more the fruit of tenacious, ingenious patience than of any other
talent.
Thanks and a hug are certainly due to my friend, companion in
grand dreams and adventurous travels, Dr. Dario Lombardi. With wine,
words and good cheer he has always helped me swallow the bitterest pills and leave my
mistakes at the bottom of the mug. I wish you with all my heart that you will soon reap all the success that
you have sown over these years with your talent.
I also want to express my most sincere gratitude to all my friends, “old” and
“new”, for their support and for their closeness despite the distance. And since it seems to me
as unfair to name some as to name none, I list them all, all hundred thousand: Alessio
Catarsi, Luca Mantilli, Mirko Sardelli, Enrico Marsili, Roberto Scamuzzi, Sandro
Orsini, Luca Reggiani, Giorgio La Corte, Francesco Attuali, Andrea Scarpellini,
Silvia Anthoine Dietrich, Giulio Casi, Stefania Capone, Andrea Chicca, Cesare
Borgia and all those who find no room here but surely have it among my
memories. I raise a toast to you, friends of today, friends of yesterday, friends, I hope, forever!
Special recognition goes to my Father, my Mother, my Grandparents and all my
loved ones who are present today, and to all those who unfortunately cannot be here with me to
celebrate this milestone. This work is dedicated to you who, throughout this
story of mine, with love and patience, have supported and encouraged me day by day to
pursue my dreams, to see kites where there were only clouds. There will always be a place
of honour for you in the treasure chest of my heart.
Finally, I owe an immense hug and special thanks to Anita. She has been
my supply of sunshine when winter seemed endless and my most powerful antidote
against the most varied vapours of this laboratory.
Perhaps it is true that life and dreams are leaves of the same book: to read them in order is to live,
to leaf through them at random is to dream; but this book speaks of you all the same. You are everything
I know about happiness. Thank you!
9. APPENDIX
9.1 Model compound oligonucleotide conjugates.
Fmoc-amino acid (A) conjugated to DEL_O_1
(5’-amino-C12-GGA GCT TGT GAA TTC TGG ATC TTA GGA CGT GTG TGA ATT GTC-3’)
Entry  Yield*) %  Recovery†) %  ESI-MS (Da) (expected MS in brackets)
1      97         53            13474 (13473)
2      90         65            13439 (13437)
3      73         55            13457 (13459)
Table 9-1: HPLC coupling yields and recovery assessed after the peptide bond formation reaction between three selected Fmoc-amino acids (A) and a model 5’-amino-oligonucleotide (DEL_O_1).
*) Determined by HPLC after Fmoc deprotection of the oligonucleotide-conjugated compound.
†) Evaluated by measuring the absorption at 260 nm using a NanoDrop instrument (ND-1000 UV-Vis spectrophotometer) following HPLC purification (see Chapter 3.1.6).
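The agreement between observed and expected ESI-MS masses in Table 9-1 can be checked with a few lines of Python. This is a minimal sketch: the mass values are copied from the table, while the function names and the 3 Da tolerance are illustrative assumptions, not criteria stated in the thesis.

```python
# Observed vs. expected ESI-MS masses (Da) for the three conjugates in Table 9-1.
TABLE_9_1 = [
    {"entry": 1, "observed": 13474, "expected": 13473},
    {"entry": 2, "observed": 13439, "expected": 13437},
    {"entry": 3, "observed": 13457, "expected": 13459},
]


def mass_deviation(observed, expected):
    """Return (absolute deviation in Da, relative deviation in ppm)."""
    delta = observed - expected
    ppm = delta / expected * 1e6
    return delta, ppm


def within_tolerance(rows, tol_da=3.0):
    """True if every row deviates by at most tol_da.

    tol_da is a hypothetical threshold chosen for illustration only.
    """
    return all(abs(r["observed"] - r["expected"]) <= tol_da for r in rows)


if __name__ == "__main__":
    for r in TABLE_9_1:
        delta, ppm = mass_deviation(r["observed"], r["expected"])
        print(f"entry {r['entry']}: {delta:+d} Da ({ppm:+.0f} ppm)")
    print("all within tolerance:", within_tolerance(TABLE_9_1))
```

The same helper could be applied to the masses of Tables 9-2 and 9-3 if they are entered in the same dictionary layout.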
Table 9-2: HPLC coupling yields and recovery assessed after the peptide bond formation reaction between three selected 5’-Fmoc-deprotected amino acid (A) oligonucleotide conjugates and four different model carboxylic acids (B).
**) The row schematically represents the structures of the Fmoc-deprotected amino acid (A) oligonucleotide conjugates; the column shows the structures of the model carboxylic acids (B).
*) Determined by HPLC after Fmoc deprotection of the oligonucleotide-conjugated compound.
†) Evaluated by measuring the absorption at 260 nm using a NanoDrop instrument (ND-1000 UV-Vis spectrophotometer) following HPLC purification (see Chapter 3.1.6).
‡) The calculated MS for the oligonucleotide-conjugated product is reported in brackets.
Each cell lists Yield*) % / Recovery†) % / ESI-MS‡) (Da). Columns: Fmoc-deprotected amino acid (A) oligonucleotide conjugates 1–3; rows: model carboxylic acids (B) 1–4, in the order of the original table.
Structures**)  Conjugate 1               Conjugate 2                Conjugate 3
Acid 1         98 / 90 / 13670 (13699)   83 / 68 / 13663 (13663)    65 / 60 / 13685 (13685)
Acid 2         70 / 60 / 13732 (13733)   72 / 60 / 13694 (13697)    76 / 65 / 13721 (13719)
Acid 3         >70 / 70 / 13647 (13644)  >64 / 64 / 13609 (13608)   >57 / 57 / 13632 (13630)
Acid 4         >52 / 52 / 13670 (13670)  >55 / 55 / 136332 (13634)  >51 / 51 / 13657 (13656)
9.2 Library synthesis overview
List of the 20 Fmoc-amino acids and of the oligonucleotide codes used as initial building
block for constructing DEL4000:
DEL_Cn  Name  MW  Coding sequence  HPLC Yield*) %  ESI-MS†) (Da)
1  (S)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(pyridin-4-yl)propanoic acid  388.42  ATCTTA  97  13474 (13474)
2  3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4-(4-bromophenyl)butanoic acid  480.35  GCTGCG  70  13610 (13608)
3  (1R,2S)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)cyclopentanecarboxylic acid  351.4  AGAACG  86  13497 (13496)
4  3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(pyridin-2-yl)propanoic acid  433.52  GACATC  53  13528 (13529)
5  (S)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(3-fluorophenyl)propanoic acid  405.43  ATTACT  64  13491 (13491)
6  (1S,4R)-4-(((9H-fluoren-9-yl)methoxy)carbonylamino)cyclopent-2-enecarboxylic acid  349.38  ACGGCA  72  13467 (13470)
7  (R)-3-(4-((((9H-fluoren-9-yl)methoxy)carbonylamino)methyl)phenyl)-2-(tert-butoxycarbonylamino)propanoic acid  516.58  AGAGAA  60  13586 (13685)
8  Acetic acid, [[5-[[(9H-fluoren-9-ylmethoxy)carbonyl]amino]-10,11-dihydro-5H-dibenzo[a,d]cyclohepten-2-yl]oxy]  505.56  TCCAAA  87  13588 (13585)
9  (S)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(thiazol-4-yl)propanoic acid  394.44  TCGATC  75  13482 (13481)
10  (S)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(1-benzyl-1H-imidazol-4-yl)propanoic acid  467.52  TCCGGC  60  13553 (13555)
11  5-(4-((((9H-fluoren-9-yl)methoxy)carbonylamino)methyl)-3,5-dimethoxyphenoxy)pentanoic acid  505.56  CGTGCA  53  13618 (13617)
12  (R)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(4-chlorophenyl)propanoic acid  421.88  GGGTAA  80  13598 (13598)
13  (R)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)hex-5-ynoic acid  349.38  CCCTCC  98  13358 (13357)
14  (S)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4,4-diphenylbutanoic acid  477.55  TCTCCA  70  13521 (13524)
15  (S)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-2-(phenylsulfonamido)propanoic acid  466.51  CAAGCT  80  13564 (13562)
16  (S)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4-(thiophen-3-yl)butanoic acid  407.48  GCACTG  43  13520 (13519)
17  (S)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4-(4-iodophenyl)butanoic acid  527.35  ACGAAT  64  13648 (13647)
18  (R)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4-(naphthalen-2-yl)butanoic acid  451.51  TATCAG  80  13562 (13562)
19  (R)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4-(naphthalen-1-yl)butanoic acid  451.51  TGAAAT  62  13587 (13586)
20  (S)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(4-hydroxyphenyl)propanoic acid  403.43  GTTAGT  56  13543 (13545)
Table 9-3: List of the 20 Fmoc-amino acids and of the oligonucleotide codes used as initial building blocks for constructing DEL4000.
*) Determined by HPLC after Fmoc deprotection of the oligonucleotide-conjugated compound.
†) The calculated expected MS for the Fmoc-deprotected oligonucleotide-conjugated product is reported in brackets.
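For decoding to be unambiguous, each building block in Table 9-3 needs its own coding sequence. The sketch below, with the six-nucleotide codes copied from the table, checks uniqueness and computes pairwise Hamming distances. The helper names (`hamming`, `min_pairwise_distance`, `DECODE`) are illustrative assumptions and are not taken from the bioinformatic tools used in the thesis.

```python
from itertools import combinations

# Six-nucleotide codes of the 20 Fmoc-amino acids (DEL_Cn -> code), from Table 9-3.
CODES = {
    1: "ATCTTA", 2: "GCTGCG", 3: "AGAACG", 4: "GACATC", 5: "ATTACT",
    6: "ACGGCA", 7: "AGAGAA", 8: "TCCAAA", 9: "TCGATC", 10: "TCCGGC",
    11: "CGTGCA", 12: "GGGTAA", 13: "CCCTCC", 14: "TCTCCA", 15: "CAAGCT",
    16: "GCACTG", 17: "ACGAAT", 18: "TATCAG", 19: "TGAAAT", 20: "GTTAGT",
}

# Reverse lookup: code -> building block number, as needed when decoding reads.
DECODE = {code: number for number, code in CODES.items()}


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(codes):
    """Smallest Hamming distance over all pairs of codes."""
    return min(hamming(a, b) for a, b in combinations(list(codes), 2))


if __name__ == "__main__":
    print("unique codes:", len(DECODE) == len(CODES))
    print("minimum pairwise distance:", min_pairwise_distance(CODES.values()))
```

A larger minimum distance means that more sequencing errors are needed to turn one valid code into another, so this kind of check is a common first step when designing or auditing a code set.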
List of the 200 carboxylic acids and of the oligonucleotide codes used as second building block for the
construction of DEL4000:
Num.  Formula  MW  Coding sequence  Supplier
1  C13H8N2O4S2  320.34771  TTTTTTTT  SALOR
2  C12H9NO5S  279.27323  GGGGTTTT  SALOR
3  C7H7NO3S  185.20274  CCCCTTTT  SALOR
4  C10H11NO4  209.20347  AAAATTTT  SALOR
5  C6H7N3O4  185.14039  ACGTGTTT  SALOR
6  C6H6ClN3O4  219.58542  CATGGTTT  SALOR
7  C8H15NO8  253.21065  GTACGTTT  SALOR
8  C6H6O2S  142.17752  TGCAGTTT  ALDRICH
9  C7H8O2S  156.20461  GACTCTTT  ALDRICH
10  C8H10O2S  170.2317  TCAGCTTT  ALDRICH
11  C16H29NO5  315.41323  AGTCCTTT  SIGMA
12  C13H25NO5  275.3479  CTGACTTT  FLUKA
13  C13H25NO5  275.3479  CGATATTT  FLUKA
14  C14H25NO6  303.35845  ATCGATTT  FLUKA
15  C14H19NO4  265.31183  TAGCATTT  FLUKA
16  C15H21NO4  279.33892  GCTAATTT  FLUKA
17  C16H23NO4  293.36601  CAGTTGTT  FLUKA
18  C21H25NO4  355.4377  ACTGTGTT  FLUKA
19  C15H20ClNO4  313.78395  TGACTGTT  FLUKA
20  C15H20ClNO4  313.78395  GTCATGTT  FLUKA
21  C19H23NO4  329.39946  GGTTGGTT  FLUKA
22  C16H20N2O4  304.3488  TTGGGGTT  FLUKA
23  C16H20N2O4  304.3488  AACCGGTT  FLUKA
24  C16H20F3NO4  347.3373  CCAAGGTT  FLUKA
25  C16H20F3NO4  347.3373  ATATCGTT  FLUKA
26  C16H20F3NO4  347.3373  CGCGCGTT  FLUKA
27  C16H20F3NO4  347.3373  GCGCCGTT  FLUKA
28  C14H20N2O4  280.3265  TATACGTT  FLUKA
29  C16H20N2O4  304.3488  TCCTAGTT  FLUKA
30  C21H25NO4  355.4377  GAAGAGTT  FLUKA
31  C7H16ClNO2  146.21107  CTTCAGTT  ALDRICH
32  C7H16ClNO3  162.21047  AGGAAGTT  FLUKA
33  C13H28ClNO4  260.354  AGCTTCTT  SIGMA
34  C17H36ClNO4  316.462  CTAGTCTT  SIGMA
35  C6H11NO4  161.15887  GATCTCTT  ALDRICH
36  C8H7NO5  197.14869  TCGATCTT  ALDRICH
37  C10H11NO4  209.20347  TAATGCTT  FLUKA
38  C9H8N2O5  224.17451  GCCGGCTT  ALDRICH
39  C10H10N2O5  238.2016  CGGCGCTT  ALDRICH
40  C13H17NO7  299.28294  ATTAGCTT  FLUKA
41  C4H7NaO3  104.10739  CCTTCCTT  FLUKA
42  C4H7NaO3  104.10739  AAGGCCTT  FLUKA
43  C5H10O3  118.13365  TTCCCCTT  FLUKA
44  C7H13NO4  175.18596  GGAACCTT  SALOR
45  C9H10O4  182.17765  GTGTACTT  FLUKA
46  C10H12O5  212.20414  TGTGACTT  FLUKA
47  C12H16O5  240.25832  ACACACTT  FLUKA
48  C12H20O3  212.2914  CACAACTT  ALDRICH
49  C8H14O4  174.19838  GCATTATT  ALDRICH
50  C18H22O6  334.37244  TACGTATT  SALOR
51  C11H14N4O4  266.25863  ATGCTATT  SIGMA
52  C14H11NO2  225.24927  CGTATATT  FLUKA
53  C15H13NO2  239.27636  CTCTGATT  FLUKA
54  C11H14N4O2S  266.32383  AGAGGATT  SALOR
55  C18H17NO2  279.34169  TCTCGATT  ALDRICH
56  C3H7N3O2  117.10814  GAGAGATT  ALDRICH
57  C5H11N3O2  145.16232  TGGTCATT  FLUKA
58  C4H10ClNO2  139.5828  GTTGCATT  ALDRICH
59  C5H10O2  102.13425  CAACCATT  ALDRICH
60  C10H18O2  170.25376  ACCACATT  ALDRICH
61  C16H22O3  262.35194  AATTAATT  SALOR
62  C5H8O3  116.11771  CCGGAATT  FLUKA
63  C7H10O3  142.15595  GGCCAATT  ALDRICH
64  C8H15NO3  173.21365  TTAAAATT  ALDRICH
65  C5H6N2O4  158.11457  GGTTTTGG  ALDRICH
66  C13H14N2O4  262.26753  TTGGTTGG  ALDRICH
67  C13H17N5O5  323.31094  AACCTTGG  SIGMA
68  C6H6N2O2S  170.19092  CCAATTGG  ALDRICH
69  C16H14O2S  270.35278  CAGTGTGG  SALOR
70  C9H10O2S  182.24285  ACTGGTGG  ALDRICH
71  C9H8O2S2  212.29091  TGACGTGG  ALDRICH
72  C9H10O2S  182.24285  GTCAGTGG  ALDRICH
73  C9H10O2S  182.24285  TCCTCTGG  ALDRICH
74  C9H10O2S  182.24285  GAAGCTGG  ALDRICH
75  C8H7ClO2S  202.66079  CTTCCTGG  ALDRICH
76  C12H10O2S  218.2763  AGGACTGG  FLUKA
77  C3H6O2S  106.14407  ATATATGG  ALDRICH
78  C8H14O3S  190.26298  CGCGATGG  ALDRICH
79  C9H9FO2  168.16928  GCGCATGG  ALDRICH
80  C10H9FO3  196.17983  TATAATGG  SIGMA
81  C9H8F2O2  186.15971  ACGTTGGG  ALDRICH
82  C9H6F4O2  222.14057  CATGTGGG  ALDRICH
83  C10H9F3O2  218.17723  GTACTGGG  ALDRICH
84  C11H8F6O2  286.17561  TGCATGGG  ALDRICH
85  C11H8F6O2  286.17561  TTTTGGGG  ALDRICH
86  C9H7F3O3  220.14954  GGGGGGGG  ALDRICH
87  C8H7ClO2  170.59679  CCCCGGGG  ALDRICH
88  C9H9ClO2  184.62388  AAAAGGGG  ALDRICH
89  C10H9ClO3  212.63443  CGATCGGG  ALDRICH
90  C9H9ClO3  200.62328  ATCGCGGG  ALDRICH
91  C8H6Cl2O3  221.04122  TAGCCGGG  ALDRICH
92  C13H12Cl2O4  303.14419  GCTACGGG  SIGMA
93  C19H16ClNO4  357.79667  GACTAGGG  SIGMA
94  C12H9ClO5  268.65553  TCAGAGGG  ALDRICH
95  C8H7BrO2  215.04779  AGTCAGGG  FLUKA
96  C8H7BrO2  215.04779  CTGAAGGG  FLUKA
97  C8H7BrO2  215.04779  CTCTTCGG  FLUKA
98  C9H9BrO2  229.07488  AGAGTCGG  ALDRICH
99  C8H7BrO3  231.04719  TCTCTCGG  ALDRICH
100  C10H9BrO3  257.08543  GAGATCGG  ALDRICH
101  C8H7IO2  262.04819  GCATGCGG  ALDRICH
102  C10H11IO2  290.10237  TACGGCGG  SIGMA
103  C8H7IO3  278.04759  ATGCGCGG  ALDRICH
104  C9H8INO3  305.07341  CGTAGCGG  FLUKA
105  C6H7NO3  141.12759  AATTCCGG  ALDRICH
106  C3H4N4O2  128.09093  CCGGCCGG  ALDRICH
107  C10H12O2  164.20594  GGCCCCGG  ALDRICH
108  C11H14O2  178.23303  TTAACCGG  ALDRICH
109  C10H12O2  164.20594  TGGTACGG  FLUKA
110  C10H12O2  164.20594  GTTGACGG  FLUKA
111  C15H12O2  224.26169  CAACACGG  ALDRICH
112  C7H8ClNO2  173.60031  ACCAACGG  FLUKA
113  C10H10O3  178.1894  TAATTAGG  ALDRICH
114  C11H12O3  192.21649  GCCGTAGG  ALDRICH
115  C11H12O3  192.21649  CGGCTAGG  ALDRICH
116  C8H8O3  152.15116  ATTATAGG  ALDRICH
117  C9H10O3  166.17825  AGCTGAGG  ALDRICH
118  C18H28O3  292.42206  GATCGAGG  ALDRICH
119  C9H10O3  166.17825  TCGAGAGG  ALDRICH
120  C9H10O3  166.17825  GTGTCAGG  FLUKA
121  C9H10O3  166.17825  TGTGCAGG  FLUKA
122  C9H10O3  166.17825  ACACCAGG  FLUKA
123  C10H12O4  196.20474  CACACAGG  ALDRICH
124  C11H14O5  226.23123  AAGGAAGG  FLUKA
125  C11H14O5  226.23123  TTCCAAGG  FLUKA
126 C10H12O3 180.20534 GGAAAAGG ALDRICH
127 C10H12O3 180.20534 CCTTTTCC FLUKA
128 C10H12O3 180.20534 AAGGTTCC FLUKA
129 C11H14O4 210.23183 TTCCTTCC ALDRICH
130 C11H14O4 210.23183 GTGTGTCC ALDRICH
131 C12H16O5 240.25832 TGTGGTCC FLUKA
132 C16H16O4 272.30352 ACACGTCC SALOR
133 C12H12O4 220.22704 CACAGTCC ALDRICH
134 C12H10O5 234.2105 AGCTCTCC ALDRICH
135 C16H14O3 254.28818 CTAGCTCC ALDRICH
136 C9H9NO3 179.17698 GATCCTCC ALDRICH
137 C10H11NO3 193.20407 TCGACTCC ALDRICH
138 C10H11NO3 193.20407 TAATATCC ALDRICH
139 C10H11NO3 193.20407 GCCGATCC ALDRICH
140 C10H7NO4 205.17159 CGGCATCC FLUKA
141 C9H8NNaO4 217.15821 ATTAATCC ALDRICH
142 C10H11NO3 193.20407 TGGTTGCC SIGMA
143 C11H13NO3 207.23116 GTTGTGCC FLUKA
144 C12H15NO3 221.25825 CAACTGCC ALDRICH
145 C12H15NO3 221.25825 ACCATGCC ALDRICH
146 C16H17NO3 271.31879 AATTGGCC ALDRICH
147 C16H17NO3 271.31879 CCGGGGCC ALDRICH
148 C11H11NO4 221.21462 GGCCGGCC ALDRICH
149 C20H21N3O6 399.40687 GCATCGCC SIGMA
150 C5H7ClN2O2 162.57674 TACGCGCC FLUKA
151 C7H8N2O4 184.15281 ATGCCGCC ALDRICH
152 C7H7NO4 169.13814 CGTACGCC ALDRICH
153 C9H9NO4 195.17638 CTCTAGCC ALDRICH
154 C3H5N4NaO2S 184.1527 AGAGAGCC ALDRICH
155 C9H10O2 150.17885 TCTCAGCC ALDRICH
156 C9H10O2 150.17885 GAGAAGCC ALDRICH
157 C11H14O2 178.23303 GACTTCCC SALOR
158 C9H10O2 150.17885 TCAGTCCC ALDRICH
159 C9H10O2 150.17885 AGTCTCCC ALDRICH
160 C10H12O2 164.20594 CTGATCCC ALDRICH
161 C10H12O2 164.20594 CGATGCCC ALDRICH
162 C10H10O2 162.19 ATCGGCCC ALDRICH
163 C12H10O2 186.2123 TAGCGCCC FLUKA
164 C12H10O2 186.2123 GCTAGCCC ALDRICH
165 C14H12O2 212.25054 TTTTCCCC ALDRICH
166 C15H14O2 226.27763 GGGGCCCC FLUKA
167 253.256 CCCCCCCC
168 154.139 AAAACCCC
169 202.208 ACGTACCC
170 170.138 CATGACCC
171 204.244 GTACACCC
172 202.208 TGCAACCC
173 C10H8N2O3 204.18686 ATATTACC SALOR
174 C11H10N2O4 234.21335 CGCGTACC SALOR
175 C10H12N4O4 252.23154 GCGCTACC SALOR
176 C12H16O2 192.26012 TATATACC SALOR
177 C8H8O3S 184.214 TCCTGACC
178 C16H15ClN4O4S 394.83935 GAAGGACC SALOR
179 C16H12N2O3 280.28564 CTTCGACC SALOR
180 C6H7N3O4S 217.20439 AGGAGACC SALOR
181 C11H10N2O3 218.21395 CAGTCACC SALOR
182 C15H14O5 274.27583 ACTGCACC SALOR
183 C11H14O2 178.23303 TGACCACC SALOR
184 C12H14N2O2 218.25758 GTCACACC SALOR
185 C10H8N2O3 204.18686 GGTTAACC SALOR
186 C9H8ClNO3 213.62201 TTGGAACC SALOR
187 C10H8F3NO3 247.17536 AACCAACC SALOR
188 C18H13ClO6 360.75371 CCAAAACC SALOR
189 C11H14N4O2S 266.32383 AATTTTAA SALOR
190 C10H3Cl4NO4 342.95171 CCGGTTAA SALOR
191 C10H9NO3S 223.25213 TTAATTAA SALOR
192 C9H8O2 148.16 TGGTGTAA ALDRICH
193 C10H8N2O3 204.184 GTTGGTAA
194 C10H12O2 164.21 CAACGTAA FLUKA
195 C11H14O2 178.23 ACCAGTAA Alfa Aesar
196 C9H9IO2 276.08 CTCTCTAA Trans World Chemicals
197 C8H13NO3 171.195 AGAGCTAA
198 C9H11NO4S 229.255 TCTCCTAA
199 C10H13O2N 179.2 GAGACTAA Aldrich
200 C7H13NO2S2 207.31516 GCATATAA SALOR