Bioinformatics and Biocomputing
Byoung-Tak Zhang
Center for Bioinformation Technology (CBIT) &
Biointelligence Laboratory
School of Computer Science and Engineering
Seoul National University
http://bi.snu.ac.kr/ or http://cbit.snu.ac.kr/
2
Outline
• Bioinformation Technology (BIT)
• DNA Chip Data Mining: IT for BT
• DNA Computing: BT for IT
• DNA Computing with DNA Chips
• Outlook
3
Human Genome Project
Genome Health Implications
• A New Disease Encyclopedia
• New Genetic Fingerprints
• New Diagnostics
• New Treatments
Goals
• Identify the approximately 40,000 genes in human DNA
• Determine the sequences of the 3 billion bases that make up human DNA
• Store this information in databases
• Develop tools for data analysis
• Address the ethical, legal and social issues that arise from genome research
4
Bioinformation Technology: Bioinformatics vs. Biocomputing
BT and IT
Bioinformatics
Biocomputing
5
Bioinformatics
6
What is Bioinformatics?
• Bioinformatics vs. Computational Biology
• Bioinformatik (in German): biology-based computer science as well as bioinformatics (in English)
Informatics – computer science
Bio – molecular biology
Bioinformatics – solving problems arising from biology using methodology from computer science.
7
Molecular Biology: Flow of Information
DNA → RNA → Protein → Function
8
DNA (Gene) → RNA → Protein
9
Nucleotide and Protein Sequence
[Fig.] DNA (Nucleotide) Sequence
CG2B_MARGL  Length: 388  April 2, 1997 14:55  Type: P  Check: 9613  ..
[...]
ARNNLQAGAK KELVKAKRGM TKSKATSSLQ SVMGLNVEPM
EKAKPQSPEP MDMSEINSAL EAFSQNLLEG VEDIDKNDFD
NPQLCSEFVN DIYQYMRKLE REFKVRTDYM TIQEITERMR
SILIDWLVQV HLRFHLLQET LFLTIQILDR YLEVQPVSKN
KLQLVGVTSM LIAAKYEEMY PPEIGDFVYI TDNAYTKAQI
RSMECNILRR LDFSLGKPLC IHFLRRNSKA GGVDGQKHTM
AKYLMELTLP EYAFVPYDPS EIAAAALCLS SKILEPDMEW
GTTLVHYSAY SEDHLMPIVQ KMALVLKNAP TAKFQAVRKK
YSSAKFMNVS TISALTSSTV MDLADQMC
Protein (Amino Acid) Sequence
10
Some Facts
• 10^14 cells in the human body.
• 3 × 10^9 letters in the DNA code in every cell in your body.
• DNA differs between humans by 0.2% (1 in 500 bases).
• Human DNA is 98% identical to that of chimpanzees.
• 97% of DNA in the human genome has no known function.
11
Topics in Bioinformatics
Structure analysis
  - Protein structure comparison
  - Protein structure prediction
  - RNA structure modeling
Pathway analysis
  - Metabolic pathway
  - Regulatory networks
Sequence analysis
  - Sequence alignment
  - Structure and function prediction
  - Gene finding
Expression analysis
  - Gene expression analysis
  - Gene clustering
12
Extension of Bioinformatics Concept
• Genomics
  - Functional genomics
  - Structural genomics
• Proteomics: large-scale analysis of the proteins of an organism
• Pharmacogenomics: developing new drugs that will target a particular disease
• Microarray: DNA chip, protein chip
13
Applications of Bioinformatics
• Drug design
• Identification of genetic risk factors
• Gene therapy
• Genetic modification of food crops and animals
• Biological warfare, crime etc.
• Personal Medicine?
• E-Doctor?
14
Bioinformatics as Information Technology
15
Background of Bioinformatics
• Biological information infrastructure
  - Biological information management systems
  - Analysis software tools
  - Communication networks for biological research
• Massive biological databases
  - DNA/RNA sequences
  - Protein sequences
  - Genetic map linkage data
  - Biochemical reactions and pathways
• Need to integrate these resources to model biological reality and exploit the biological knowledge that is being gathered.
16
Areas and Workflow of Bioinformatics
[Fig.] Diagram labels: Structural Genomics; Functional Genomics; Proteomics; Pharmacogenomics; Microarray (Biochip); Infrastructure of Bioinformatics.
17
DNA Chip Data Mining: IT for BT
18
cDNA Microarray
[Fig.] Workflow of a cDNA microarray experiment:
1. cDNA clones (probes)
2. PCR product amplification and purification
3. Printing of the microarray (0.1 nl/spot)
4. Hybridize the mRNA target to the microarray
5. Excitation (Laser 1, Laser 2) and emission
6. Scanning
7. Analysis: overlay images and normalize
19
The Complete Microarray Bioinformatics Solution
[Fig.] Components: Data Management, Databases, Statistical Analysis, Image Processing, Automation, Data Mining, Cluster Analysis.
20
DNA Chip Applications
• Gene discovery: gene/mutated gene
  - Growth, behavior, homeostasis …
• Disease diagnosis
  - Cancer classification
• Drug discovery: Pharmacogenomics
• Toxicological research: Toxicogenomics
21
Disease Diagnosis: Cancer Classification with DNA Microarray
- cDNA microarray data of 6,567 gene expression levels [Khan ’01].
- Filter genes that are correlated with the classification of cancer using PCA and ANN learning.
- Hierarchical clustering of the DNA chip samples based on the filtered 96 genes.
- Disease diagnosis based on DNA chip.
[Fig.] Flowchart of the experimental procedure.
22
Disease Diagnosis: Hierarchical Clustering Based on Gene Expression Levels
- Hierarchical clustering of cancer by 96 gene expression levels.
- The relation between gene expression and cancer category.
- Four cancer diagnostic categories
[Fig.] The dendrogram of four cancer clusters and gene expression levels (row: genes, column: samples).
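The clustering step can be illustrated with a minimal sketch of average-linkage hierarchical clustering on a correlation distance. This is not the procedure of [Khan ’01]; the distance, the linkage rule and the toy expression values below are assumptions chosen only to keep the example small.

```python
from itertools import combinations
from math import sqrt

def pearson_distance(a, b):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1.0 - cov / (sd_a * sd_b)

def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]

    def dist(c1, c2):
        # Average pairwise distance between the members of two clusters.
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(pearson_distance(profiles[i], profiles[j])
                   for i, j in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: dist(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy data (made up): four samples measured on three genes.
samples = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [3.0, 2.0, 1.0],
    [2.9, 2.2, 1.1],
]
clusters = average_linkage(samples, 2)
print(clusters)
```

Samples with similar profiles end up in the same cluster; recording the merge order instead of stopping at a fixed number of clusters would give the dendrogram.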
23
AI Methods for DNA Chip Data Analysis
• Classification and prediction
  - ANNs, support vector machines, etc.
  - Disease diagnosis
• Cluster analysis
  - Hierarchical clustering, probabilistic clustering, etc.
  - Functional genomics
• Genetic network analysis
  - Differential models, relevance networks, Bayesian networks, etc.
  - Functional genomics, drug design, etc.
24
Cluster Analysis
[DNA microarray dataset]
[Gene Cluster 1]
[Gene Cluster 2]
[Gene Cluster 3]
[Gene Cluster 4]
25
Methods for Cluster Analysis
• Hierarchical clustering [Eisen ’98]
• Self-organizing maps [Tamayo ’99]
• Bayesian clustering [Barash ’01]
• Probabilistic clustering using latent variables [Shin ’00]
• Non-negative matrix factorization [Shin ’00]
• Generative topographic mapping [Shin ’00]
26
Clustering of Cell Cycle-regulated Genes in S. cerevisiae (the Yeast)
• Identify cell cycle-regulated genes by cluster analysis.
  - 104 genes are already known to be cell cycle-regulated.
  - Known genes are clustered into 6 clusters.
• Cluster 104 known genes and other genes together.
• The same cluster → similar functional categories.
[Fig.] 104 known gene expression levels according to the cell cycle (row: time step, column: gene).
27
Probabilistic Clustering Using Latent Variables
g_i: ith gene
z_k: kth cluster
t_j: jth time step
p(g_i|z_k): generating probability of ith gene given kth cluster
v_k = p(t|z_k): prototype of kth cluster

\[ p(g_i \in z_k) = p(z_k \mid g_i) = \frac{p(g_i \mid z_k)\, p(z_k)}{p(g_i)} \]

\[ f(g, z, t) = \sum_i \sum_j \sum_k \log\big( p(z_k)\, p(g_i \mid z_k)\, p(t_j \mid z_k) \big) \qquad (*) \]

\[ \mathrm{similarity}(\mathbf{x}_i, \mathbf{v}_k) = \sum_j x_{ij}\, v_{kj} \]

(*): objective function (maximized by EM)
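A minimal sketch of the two per-gene quantities on this slide: the cluster posterior obtained by Bayes' rule and the inner-product similarity between an expression profile and a cluster prototype. The function names and all numbers are illustrative assumptions; the EM training loop itself is not shown.

```python
def posterior(p_gene_given_cluster, p_cluster):
    """p(z_k | g_i) for every cluster k via Bayes' rule."""
    joint = [pg * pz for pg, pz in zip(p_gene_given_cluster, p_cluster)]
    evidence = sum(joint)  # p(g_i), the sum of the joint over all clusters
    return [j / evidence for j in joint]

def similarity(x, v):
    """Inner product between expression profile x_i and prototype v_k."""
    return sum(xj * vj for xj, vj in zip(x, v))

# Made-up values for three clusters and one gene.
p_z = [0.5, 0.3, 0.2]            # p(z_k)
p_g_given_z = [0.01, 0.04, 0.0]  # p(g_i | z_k)
post = posterior(p_g_given_z, p_z)
best_cluster = post.index(max(post))
print(post, best_cluster)
```

Assigning each gene to the cluster with the largest posterior turns the soft, probabilistic result into a hard clustering.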
28
Experimental Result: Identify Cell Cycle-Regulated Genes
• Clustering result
[Table] Clustering result with α-factor arrest data. In 4 clusters, genes with a high probability of being cell cycle-regulated were found.
29
Experimental Result: Prototype Expression Levels of Found Clusters
[Fig.] Prototype expression levels of genes found to be cell cycle-regulated (4 clusters).
• The genes in the same cluster show similar expression patterns during the cell cycle.
• The genes with similar expression patterns are likely to have correlated functions.
30
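Clustering Using Non-negative Matrix Factorization (NMF)
• NMF (non-negative matrix factorization)

\[ G \approx WH, \qquad (G)_{i\mu} \approx (WH)_{i\mu} = \sum_{a=1}^{r} W_{ia} H_{a\mu} \]

G: gene expression data matrix
W: basis matrix (prototypes)
H: encoding matrix (in low dimension)

\[ G_{i\mu},\; W_{ia},\; H_{a\mu} \ge 0 \]

• NMF as a latent variable model
[Fig.] Hidden units h_1, h_2, …, h_r are connected to the visible units g_1, g_2, …, g_n through the weights W:

\[ \langle \mathbf{g} \rangle = W\mathbf{h} \]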
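One common way to compute such a factorization is the multiplicative update rule of Lee and Seung for the squared error. The slide does not say which algorithm was used, so the sketch below is an assumption; the matrix values, the rank and the iteration count are made up.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(g, w, h):
    """Squared reconstruction error ||G - WH||^2."""
    wh = matmul(w, h)
    return sum((g[i][j] - wh[i][j]) ** 2
               for i in range(len(g)) for j in range(len(g[0])))

def nmf(g, r, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative G (n x m) into W (n x r) and H (r x m)."""
    rng = random.Random(seed)
    n, m = len(g), len(g[0])
    w = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    for _ in range(iterations):
        # H <- H * (W^T G) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, g), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(r)]
        # W <- W * (G H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(g, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(r)]
             for i in range(n)]
    return w, h

# Toy expression matrix (made up): 4 genes x 3 time steps.
gene_matrix = [[1, 2, 3], [2, 4, 6], [1, 0, 1], [3, 2, 5]]
w, h = nmf(gene_matrix, r=2)
print(frobenius_error(gene_matrix, w, h))
```

Because every update multiplies by a non-negative ratio, W and H stay non-negative without any explicit constraint handling.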
31
Experimental Result: Five Clusters Found by NMF
• 5 prototype expression levels during the cell cycle.
[Fig.] Prototype expression levels (x-axis: time step in cell cycle, y-axis: expression level).
32
Clustering Using Generative Topographic Mapping (GTM)
• GTM: a nonlinear, parametric mapping y(x;W) from a latent space to a data space.
[Fig.] The mapping y(x;W) takes a grid in the latent space (x1, x2) to the data space (t1, t2, t3). Latent space to data space: generation. Data space to latent space: visualization.
33
Experimental Result: Clusters Found by GTM
• Three cell cycle-regulated clusters found by GTM
Phase | Cluster | Cluster center | No. of train data / no. in cluster | Correct no. / test data | Overall mean expression levels (Cln/b) of known genes
G1 | c1 | (-0.111 0.333) | 35 / 18 | 10 / 16 (62%) | (.894 .907 -.766 -.479)
G1 | c2 | (-0.111 0.111) | / 7 | 0 / 16 |
G2/M | c1 | (0.111 0.333) | 10 / 5 | 0 / 5 | (-.616 -1.01 1.832 1.596)
G2/M | c2 | (0.111 0.111) | / 3 | 3 / 5 (80%) |
M/G1 | c1 | (0.111 0.333) | 13 / 7 | 1 / 6 | (-.171 -.573 .091 .311)
M/G1 | c2 | (-0.111 -0.111) | / 2 | 0 / 6 |
M/G1 | c3 | (0.323 0.1) | / 2 | 0 / 6 |
S | | (0.111 -0.333) | 5 / 5 | 5 / 5 (100%) | (1.075 1.482 -.233 -.375)
S/G2 | | | 5 / | 1 / 2 | (.148 .184 -.367 -.044)
34
Experimental Result: Comparison with other methods
• Comparison of prototype expression levels
Phase | Cluster | No. of selected genes (total = 570) | Mean expression levels by GTM | No. of selected genes by Spellman (total = 800) | Mean expression levels by Spellman
G1 | c1 | 122 | (.92 .74 -.62 -.33) | 300 | (.66 .49 -.55 -.33)
G1 | c2 | 74 | (.79 .82 -.48 -.34) | |
G2/M | c1 | 33 | (-.59 -.96 1.34 1.29) | 195 | (-.32 -.62 .49 .54)
G2/M | c2 | 60 | (.08 -.30 .51 .57) | |
M/G1 | c1 | 120 | (.82 .65 -.65 -.38) | 113 | (-.21 -.61 -.04 .07)
M/G1 | c2 | 34 | (-.04 -.37 -.01 -.11) | |
M/G1 | c3 | 10 | (.32 .29 -.3 .05) | |
S | | 25 | (.84 .81 -.42 -.33) | 71 | (.46 .47 -.43 -.18)
S/G2 | | 92 | (.13 -.06 -.1 .01) | 121 | (.13 .05 -.16 .03)
35
Genetic Network Analysis
- Discover the complex regulatory interactions among genes.
- Disease diagnosis, pharmacogenomics and toxicogenomics
- Boolean networks
- Differential equations
- Relevance networks [Butte ’97]
- Bayesian networks [Friedman ’00] [Hwang ’00]
[Fig.] Basin of attraction of a 12-gene Boolean genetic network model [Somogyi ’96].
36
Bayesian Networks
• Represent the joint probability distribution among random variables efficiently using the concept of conditional independence.
[Fig.] Example Bayes net over A, B, C, D, E with edges A → C, B → C, B → D, C → E.

\[
\begin{aligned}
P(A,B,C,D,E) &= P(A)\,P(B \mid A)\,P(C \mid A,B)\,P(D \mid A,B,C)\,P(E \mid A,B,C,D) && \text{(by chain rule)} \\
&= P(A)\,P(B)\,P(C \mid A,B)\,P(D \mid B)\,P(E \mid C) && \text{(by the example Bayes net)}
\end{aligned}
\]

• D is independent of A and C given B.
• Observing C induces a dependency between A and B.
• E is independent of A and B given C.
An edge denotes a possible causal relationship between nodes.
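The factorization can be checked numerically with a small sketch. The structure follows the example net (A and B are parents of C, B is the parent of D, C is the parent of E); all conditional probability values are made up for illustration.

```python
from itertools import product

# Conditional probability tables for binary variables (all values made up).
p_a = 0.3                                  # P(A=1)
p_b = 0.6                                  # P(B=1)
p_c_given_ab = {(0, 0): 0.1, (0, 1): 0.5,  # P(C=1 | A, B)
                (1, 0): 0.7, (1, 1): 0.9}
p_d_given_b = {0: 0.2, 1: 0.8}             # P(D=1 | B)
p_e_given_c = {0: 0.1, 1: 0.6}             # P(E=1 | C)

def bern(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value else 1.0 - p_one

def joint(a, b, c, d, e):
    """P(A,B,C,D,E) = P(A) P(B) P(C|A,B) P(D|B) P(E|C)."""
    return (bern(p_a, a) * bern(p_b, b) * bern(p_c_given_ab[(a, b)], c)
            * bern(p_d_given_b[b], d) * bern(p_e_given_c[c], e))

def prob(**fixed):
    """Marginal probability of the fixed assignments, e.g. prob(d=1, b=1)."""
    total = 0.0
    for values in product((0, 1), repeat=5):
        assignment = dict(zip("abcde", values))
        if all(assignment[name] == v for name, v in fixed.items()):
            total += joint(*values)
    return total

# D is independent of A given B: both conditionals equal P(D=1 | B=1).
print(prob(d=1, a=1, b=1) / prob(a=1, b=1), prob(d=1, b=1) / prob(b=1))
```

The factored form needs 1 + 1 + 4 + 2 + 2 = 10 parameters for five binary variables, against 31 for an unrestricted joint table.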
37
Bayesian Networks Learning
• Dependence analysis [Margaritis ’00]
  - Mutual information and χ² test
• Score-based search

\[ p(D, S) = p(S)\, p(D \mid S) = p(S) \cdot \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})} \]

  D: data, S: Bayesian network structure
  - NP-hard problem
  - Greedy search
  - Heuristics to find good massive network structures quickly (local to global search algorithm)
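A sketch of the product over j and k for a single node i, computed in log space to avoid overflow in the Gamma function. It assumes a uniform Dirichlet prior with every α_ijk equal to the same value (so α_ij is that value times r_i); the function name and the counts are illustrative.

```python
from math import lgamma, log

def log_bd_family(counts, alpha=1.0):
    """Log of the BD score contribution of one node.

    counts[j][k] is N_ijk: how often the node takes its k-th value while
    its parents are in their j-th configuration. Every alpha_ijk is set
    to `alpha` (assumption), so alpha_ij = alpha * r_i.
    """
    total = 0.0
    for row in counts:
        alpha_ij = alpha * len(row)
        n_ij = sum(row)
        total += lgamma(alpha_ij) - lgamma(alpha_ij + n_ij)
        for n_ijk in row:
            total += lgamma(alpha + n_ijk) - lgamma(alpha)
    return total

# A binary node without parents (one parent configuration), seen as 0 once
# and as 1 once: Gamma(2)/Gamma(4) * Gamma(2)/Gamma(1) * Gamma(2)/Gamma(1) = 1/6.
score_no_parent = log_bd_family([[1, 1]])

# The same node with one binary parent that predicts it perfectly (made-up counts).
score_with_parent = log_bd_family([[1, 0], [0, 1]])
print(score_no_parent, score_with_parent)
```

A score-based search adds log p(S) and the contributions of all nodes, then compares candidate structures; a greedy search changes one edge at a time and keeps the change if the total increases.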
38
The Small Bayesian Network for Classification of Cancer
[Fig.] Bayesian network over the nodes Zyxin, Leukemia class, MB-1, C-myb and LTC4S.
• The Bayesian network was learned by full search using the BD (Bayesian Dirichlet) score with an uninformative prior [Heckerman ’95] from the DNA microarray data for cancer classification (http://waldo.wi.mit.edu/MPR/).
[Table] Comparison of the classification performance with other methods [Hwang ’00].
Method | Training error | Test error
Bayes nets | 0/38 | 2/34
Neural trees | 0/38 | 1/34
RBF networks | 0/38 | 1.3/34
39
Large-Scale Bayesian Network with 1171 Genes
- Genetic networks for understanding the regulatory interactions among genes and their derivatives
- Pharmacogenomics and Toxicogenomics
[Fig.] The Bayesian network structure constructed from DNA microarray data for cancer classification (partial view).
40
DNA Computing: BT for IT
41
DNA Computing: Biomolecules as Computer
011001101010001 ATGCTCGAAGCT
42
Why DNA Computing?
• 6.022 × 10^23 molecules / mole
• Immense, brute force search of all possibilities
  - Desktop: 10^9 operations / sec
  - Supercomputer: 10^12 operations / sec
  - 1 µmol of DNA: 10^26 reactions
• Favorable energetics: Gibbs free energy, ΔG = −8 kcal mol^-1
• 1 J for 2 × 10^19 operations
• Storage capacity: 1 bit per cubic nanometer
43
HPP (Hamiltonian Path Problem)
Flow of DNA Computing
[Fig.] A directed graph with nodes 0 to 6 is solved in the following steps:
Encoding → Ligation → PCR (Polymerase Chain Reaction) → Gel Electrophoresis → Affinity Column → Decoding → Solution
Node encoding:
Node 0: ACG   Node 3: TAA
Node 1: CGA   Node 4: ATG
Node 2: GCA   Node 5: TGC
Node 6: CGT
Solution strand: ACGCGAGCATAAATGTGCCGT (path 0 → 1 → 2 → 3 → 4 → 5 → 6)
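The flow can be mimicked in software with a short sketch that treats each laboratory step as a filter over a pool of strands. The node codes are the ones on the slide; the edge list is hypothetical, since the graph's edges are not recoverable, and the exhaustive walk enumeration stands in for random ligation.

```python
NODE_CODES = {0: "ACG", 1: "CGA", 2: "GCA", 3: "TAA", 4: "ATG", 5: "TGC", 6: "CGT"}
CODE_TO_NODE = {code: node for node, code in NODE_CODES.items()}

# Hypothetical directed edges (assumption, not taken from the slide).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
         (0, 3), (3, 2), (2, 1), (4, 1)]

def encode(path):
    """Concatenate the 3-base codes of the nodes on a path."""
    return "".join(NODE_CODES[n] for n in path)

def decode(strand):
    """Split a strand into 3-base codes and map them back to nodes."""
    return [CODE_TO_NODE[strand[i:i + 3]] for i in range(0, len(strand), 3)]

def ligate(max_nodes):
    """Enumerate every walk along the edges with up to max_nodes nodes."""
    pool, frontier = [], [[n] for n in NODE_CODES]
    while frontier:
        walk = frontier.pop()
        pool.append(encode(walk))
        if len(walk) < max_nodes:
            frontier.extend(walk + [b] for a, b in EDGES if a == walk[-1])
    return pool

pool = ligate(max_nodes=7)
# PCR: amplify only strands that start at node 0 and end at node 6.
pcr = [s for s in pool if s.startswith(NODE_CODES[0]) and s.endswith(NODE_CODES[6])]
# Gel electrophoresis: keep strands whose length equals 7 nodes x 3 bases.
gel = [s for s in pcr if len(s) == 3 * len(NODE_CODES)]
# Affinity column: keep strands that contain every node, one node at a time.
affinity = gel
for node in NODE_CODES:
    affinity = [s for s in affinity if node in decode(s)]
solutions = [decode(s) for s in affinity]
print(solutions)
```

With this edge list the length filter still lets through a walk that visits node 3 twice; only the affinity step removes it, which is why both filters are needed.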
44
Biointelligence on a Chip?
Biological Computer
Molecular Electronics
Bioinformation Technology
Computing Models: the limit of conventional computing models
Computing Devices: the limit of silicon semiconductor technology
Information Technology
Biotechnology
Biointelligence Chip
45
Intelligent Biomolecular Information Processing
[Fig.] Diagram labels: GFP, Cytochrome c, Controller, Reaction Chamber (Calculating).
46
Evolvable Biomolecular Hardware
• Sequence-programmable and evolvable molecular systems have been constructed as cell-free chemical systems using biomolecules such as DNA and proteins.
47
DNA Computers vs. Conventional Computers
DNA-based computers | Microchip-based computers
slow at individual operations | fast at individual operations
can do billions of operations simultaneously | can do substantially fewer operations simultaneously
can provide huge memory in small space | smaller memory
setting up a problem may involve considerable preparations | setting up only requires keyboard input
DNA is sensitive to chemical deterioration | electronic data are vulnerable but can be backed up easily
48
Molecular Operators for DNA Computing
• Hybridization: complementary pairing of two single-stranded polynucleotides
• Ligation: attaching sticky ends to a blunt-ended molecule
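Hybridization can be modeled on strings with Watson-Crick complementarity. The sketch below assumes perfect, full-length pairing of strands written 5’ to 3’; real hybridization also tolerates mismatches and partial overlaps, which are ignored here.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Strand that pairs with `strand`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def can_hybridize(strand_a, strand_b):
    """True if the two 5'-to-3' strands pair perfectly over their full length."""
    return strand_b == reverse_complement(strand_a)

print(reverse_complement("ATGC"))       # GCAT
print(can_hybridize("ATGC", "GCAT"))    # True
```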
49
Research Groups
• MIT, Caltech, Princeton University, Bell Labs
• EMCC (European Molecular Computing Consortium) is composed of national groups from 11 European countries
• BioMIP Institute (BioMolecular Information Processing) at the German National Research Center for Information Technology (GMD)
• Molecular Computer Project (MCP) in Japan
• Leiden Center for Natural Computation (LCNC)
50
Applications of Biomolecular Computing
• Massively parallel problem solving
• Combinatorial optimization
• Molecular nano-memory with fast associative search
• AI problem solving
• Medical diagnosis
• Cryptography
• Drug discovery
• Further impact in biology and medicine:
  - Wet biological databases
  - Processing of DNA labeled with digital data
  - Sequence comparison
  - Fingerprinting
51
NACST (Nucleic Acid Computing Simulation Toolkit)
GUI
DNA Sequence Generator
Genetic Algorithm
Ligation Unit
PCR Unit
Electrophoresis Unit
Affinity Column Unit
Enzyme Unit
NACST Engine Controller
DNA Sequence Optimizer
52
[Fig.] NACST: Inputs and Outputs
53
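Combinatorial Problem Solver
TSP (Traveling Salesman Problem)
[Fig.] A weighted graph with seven cities (0 to 6); the edge weights shown are 3, 5, 7, 9 and 11.
Representations
City sequences (two halves per city):
P1A: AGCT   P1B: TAGG
P2A: ATGG   P2B: CATG
P3A: CGAT   P3B: CGAA
Edge sequences (complement of the source city's B half, weight sequence, complement of the destination city's A half):
W1→3 (P1B, P3A): ATCC GCCT GCTA
W1→2 (P1B, P2A): ATCC ATCA TACC
Tour: 0 → 1 → 2 → 3 → 4 → 5 → 6 → 0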
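The two edge strands on this slide are consistent with a simple construction: the base-by-base complement of the source city's second half, then the weight sequence, then the complement of the destination city's first half. The sketch below reproduces them; the function and dictionary names are illustrative, and reading the weight sequences GCCT and ATCA as fixed inputs is an assumption.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# City halves as listed on the slide: city -> (A half, B half).
CITIES = {1: ("AGCT", "TAGG"), 2: ("ATGG", "CATG"), 3: ("CGAT", "CGAA")}

def complement(strand):
    """Base-by-base complement, kept in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)

def edge_strand(src, dst, weight_seq):
    """Edge strand that bridges the B half of src and the A half of dst."""
    return complement(CITIES[src][1]) + weight_seq + complement(CITIES[dst][0])

print(edge_strand(1, 3, "GCCT"))  # ATCCGCCTGCTA
print(edge_strand(1, 2, "ATCA"))  # ATCCATCATACC
```

Because each edge strand pairs with the end of one city and the start of the next, hybridization followed by ligation chains cities into candidate tours.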
54
Combinatorial Problem Solver
• Weight representation methods
1. Molecules with high G-C content tend to hybridize easily.
2. Molecules with high G-C content tend to be denatured at higher temperature.
3. Molecules with a larger population in the tube have a higher probability of hybridizing.
[Fig.] Experimental flow: Hybridization/Ligation → PCR/Gel electrophoresis → Affinity chromatography → PCR/Gel electrophoresis → Temperature Gradient Gel Electrophoresis
Graduate PCR
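The link between G-C content and denaturation temperature can be made concrete with the Wallace rule of thumb for short oligonucleotides, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). The slide does not state which melting model was used, so the rule and the example sequences below are assumptions for illustration.

```python
def gc_content(strand):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in strand) / len(strand)

def wallace_tm(strand):
    """Approximate melting temperature in degrees Celsius (Wallace rule)."""
    at = sum(base in "AT" for base in strand)
    gc = sum(base in "GC" for base in strand)
    return 2 * at + 4 * gc

# Two made-up weight sequences of equal length.
for seq in ("AATTATAT", "GGCCGCGC"):
    print(seq, gc_content(seq), wallace_tm(seq))
```

Under this rule a sequence made only of G and C melts at twice the temperature of an A-T sequence of the same length, which is the property a temperature gradient can exploit to separate strands by encoded weight.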
55
Experimental Results for 4-TSP
Hybridization (37 °C), Ligation (16 °C, 15 hr)
PCR (36 cycles), Gel electrophoresis (10% polyacrylamide gel)
[Fig.] Gel lanes: 50 bp marker, oligomer mixture, ligation result, final PCR result (140 bp).
56
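Molecular Theorem Prover
• Resolution refutation method
• Problem under consideration:

\[ P \wedge Q \rightarrow R,\quad S \wedge T \rightarrow Q,\quad S,\quad T,\quad P; \qquad R = \text{true?} \]

• Turn \(A \rightarrow B\) into \(\neg A \vee B\), and add \(\neg R\) (the negated goal):

\[ \neg P \vee \neg Q \vee R,\quad \neg S \vee \neg T \vee Q,\quad S,\quad T,\quad P,\quad \neg R \]

[Fig.] Resolution tree: ¬P ∨ ¬Q ∨ R and P give ¬Q ∨ R; ¬S ∨ ¬T ∨ Q and S give ¬T ∨ Q, which with T gives Q; Q and ¬Q ∨ R give R; R and ¬R give nil. R is true!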
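The refutation can be reproduced in software with a naive saturation procedure: keep resolving pairs of clauses until the empty clause appears or nothing new can be derived. This is a minimal sketch of propositional resolution, not a model of the molecular implementation; writing clauses as sets of strings with "~" for negation is an assumption of the example.

```python
def negate(literal):
    """Flip a literal: 'P' <-> '~P'."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolve(c1, c2):
    """All resolvents of two clauses (frozensets of literals)."""
    return [(c1 - {lit}) | (c2 - {negate(lit)})
            for lit in c1 if negate(lit) in c2]

def refutes(clauses):
    """True if the empty clause (nil) can be derived from the clauses."""
    known = set(clauses)
    while True:
        new = set()
        for c1 in known:
            for c2 in known:
                for resolvent in resolve(c1, c2):
                    if not resolvent:   # empty clause: contradiction found
                        return True
                    new.add(resolvent)
        if new <= known:                # saturation: nothing new derived
            return False
        known |= new

# Knowledge base from the slide in clause form.
KB = [frozenset({"~P", "~Q", "R"}), frozenset({"~S", "~T", "Q"}),
      frozenset({"S"}), frozenset({"T"}), frozenset({"P"})]

print(refutes(KB + [frozenset({"~R"})]))  # True: adding ~R is contradictory, so R holds
```

Adding the negated goal and deriving nil proves the goal; if the clause set saturates without nil, the goal does not follow.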
57
Molecular Theorem Prover (Abstract Implementation)
• Method 1   • Method 2
[Fig.] Clause strands used in both methods: (¬S ¬T Q), (¬Q ¬P R), P, ¬R, T, S.
58
Molecular Theorem Prover (Experiments for Method 1)
I. [...]: 100 pmol each → total 20 µl
II. Denaturation (95 °C, 10 min)
III. Annealing: 95 °C for 1 min → 15 °C, 1 °C down/min
IV. Polyacrylamide gel electrophoresis (PAGE), 20%
V. Detection of solution: 75 bp ds DNA
[Fig.] Gel image with lanes 1 to 6 (labels: Mixture, Reaction) and a 20 bp DNA marker (TaKaRa); the 20 bp and 200 bp bands are marked.
59
Solving Logic Problems by Molecular Computing
• Satisfiability Problem
  - Find Boolean values for variables that make the given formula true

\[ (x_1 \vee x_3 \vee x_4) \wedge (x_4) \wedge (x_2 \vee x_3) \]

• 3-SAT Problem
  - Every NP problem can be seen as the search for a solution that simultaneously satisfies a number of logical clauses, each composed of three variables.
  [Eq.] Example formula: clauses of the form (x_i or x_j or x_k) joined by AND; the negation marks are not recoverable from the source.
DNA Computing with DNA Chips
61
DNA Chips for DNA Computing
I. Make: oligomer synthesis
II. Attach (immobilize): 5’-HS-C6-T15-CCTTvvvvvvvvTTCG-3’
III. Mark: hybridization
IV. Destroy: enzyme reaction (e.g., EcoRI)
V. Unmark
VI. Readout: N cycles, PCR
62
Variable Sequences and the Encoding Scheme
63
Three-dimensional Plot and Histogram of the Fluorescence
• S3: w=0, x=0, y=1, z=1
• S7: w=0, x=1, y=1, z=1
• S8: w=1, x=0, y=0, z=0
• S9: w=1, x=0, y=0, z=1
• y=1: (w ∨ x ∨ y)
• z=1: (w ∨ ¬y ∨ z)
• x=0 or y=1: (¬x ∨ y)
• w=0: (¬w ∨ ¬y)
• Four spots with high fluorescence intensity correspond to the four expected solutions.
• DNA sequences identified in the readout step via addressed array hybridization.
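The mark, destroy and unmark cycle can be simulated on bit strings. The sketch assumes the four-clause formula (w ∨ x ∨ y) ∧ (w ∨ ¬y ∨ z) ∧ (¬x ∨ y) ∧ (¬w ∨ ¬y); the negations are an assumption, chosen because they are consistent with the four solutions S3, S7, S8 and S9 listed on this slide.

```python
from itertools import product

VARIABLES = "wxyz"

# Each clause is a list of (variable, required value) pairs; a strand
# satisfies the clause if it matches at least one pair.
CLAUSES = [
    [("w", 1), ("x", 1), ("y", 1)],   # (w or x or y)
    [("w", 1), ("y", 0), ("z", 1)],   # (w or not y or z)
    [("x", 0), ("y", 1)],             # (not x or y)
    [("w", 0), ("y", 0)],             # (not w or not y)
]

# Make and attach: one surface-bound strand per 4-bit assignment.
strands = [dict(zip(VARIABLES, bits)) for bits in product((0, 1), repeat=4)]

for clause in CLAUSES:
    # Mark: strands that satisfy the current clause hybridize and are protected.
    marked = [s for s in strands if any(s[var] == val for var, val in clause)]
    # Destroy the unmarked strands, then unmark the survivors for the next cycle.
    strands = marked

# Readout: the surviving strands encode the satisfying assignments.
solutions = ["".join(str(s[var]) for var in VARIABLES) for s in strands]
print(solutions)
```

One cycle is run per clause, so the number of laboratory cycles grows with the number of clauses while every strand on the surface is processed in parallel.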
64
Outlook
• IT is becoming increasingly important to the advancement of BT.
  - Bioinformatics
  - DNA Microarray Data Mining
• IT can benefit much from BT.
  - Biocomputing and Biochips
  - DNA Computing (with DNA Chips)
• Bioinformation technology (BIT) is essential as a next-generation information technology.
  - In Silico Biology vs. In Vivo Computing
65
References
• [Barash ’01] Barash, Y. and Friedman, N., Context-specific Bayesian clustering for gene expression data, Proc. of RECOMB’01, 2001.
• [Butte ’97] Butte, A.J. et al., Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks, Proc. Natl Acad. Sci. USA, 94, 1997.
• [Eisen ’98] Eisen, M.B. et al., Cluster analysis and display of genome-wide expression patterns, Proc. Natl Acad. Sci. USA, 95, 1998.
• [Friedman ’00] Friedman, N. et al., Using Bayesian networks to analyze expression data, Proc. of RECOMB’00, 2000.
• [Heckerman ’95] Heckerman, D. et al., Learning Bayesian networks: the combination of knowledge and statistical data, Machine Learning, 20(3), 1995.
• [Hwang ’00] Hwang, K.-B. et al., Applying machine learning techniques to analysis of gene expression data: cancer diagnosis, CAMDA’00, 2000.
66
References
• [Khan ’01] Khan, J. et al., Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks, Nature Medicine, 7(6), 2001.
• [Margaritis ’00] Margaritis, D. and Thrun, S., Bayesian network induction via local neighborhoods, Proc. of NIPS’00, 2000.
• [Shin ’00] Shin, H.-J. et al., Probabilistic models for clustering cell cycle-regulated genes in the yeast, CAMDA’00, 2000.
• [Somogyi ’96] Somogyi, R. and Sniegoski, C.A., Modeling the complexity of genetic networks: understanding multigenic and pleiotropic regulation, Complexity, 1(6), 1996.
• [Tamayo ’99] Tamayo, P. et al., Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation, Proc. Natl Acad. Sci. USA, 96, 1999.
67
Web Resources: Bioinformatics
• ANGIS - The Australian National Genomic Information Service: http://morgan.angis.su.oz.au/
• Australian National University (ANU) Bioinformatics: http://life.anu.edu.au/
• BioMolecular Engineering Research Center (BMERC): http://bmerc-www.bu.edu/
• Brutlag bioinformatics group: http://motif.stanford.edu/
• Columbia University Bioinformatics Center (CUBIC): http://cubic.bioc.columbia.edu/
• European Bioinformatics Institute (EBI): http://www.ebi.ac.uk/
• European Molecular Biology Laboratory (EMBL): http://www.embl-heidelberg.de/
• Genetic Information Research Institute: http://www.girinst.org/
• GMD-SCAI: http://www.gmd.de/SCAI/scai_home.html
• Harvard Biological Laboratories: http://golgi.harvard.edu/
• Laurence H. Baker Center for Bioinformatics and Biological Statistics: http://www.bioinformatics.iastate.edu/
• NASA Center for Bioinformatics: http://biocomp.arc.nasa.gov/
• NCSA Computational Biology: http://www.ncsa.uiuc.edu/Apps/CB/
• Stockholm Bioinformatics Center: http://www.sbc.su.se/
• USC Computational Biology: http://www-hto.usc.edu/
• W. M. Keck Center for Computational Biology: http://www-bioc.rice.edu/
68
Web Resources: Biocomputing
• European Molecular Computing Consortium (EMCC): http://www.csc.liv.ac.uk/~emcc/
• BioMolecular Information Processing (BioMip): http://www.gmd.de/BIOMIP
• Leiden Center for Natural Computation (LCNC): http://www.wi.leidenuniv.nl/~lcnc/
• Biomolecular Computation (BMC): http://bmc.cs.duke.edu/
• DNA Computing and Informatics at Surfaces: http://www.corninfo.chem.wisc.edu/writings/DNAcomputing.html
• SNU Molecular Evolutionary Computing (MEC) Project: http://scai.snu.ac.kr/Research/
69
Web Resources: Biochips
• DNA Microarray (Genome Chip): http://www.gene-chips.com/
• Large-Scale Gene Expression and Microarray Link and Resources: http://industry.ebi.ac.uk/~alan/MicroArray/
• The Microarray Centre at The Ontario Cancer Institute: http://www.oci.utoronto.ca/services/microarray/
• Lab-on-a-Chip resources: http://www.lab-on-a-chip.com/
! Mailing List: [email protected]
70
Books: Bioinformatics
• Cynthia Gibas and Per Jambeck, Developing Bioinformatics Computer Skills, O’Reilly, 2001.
• Peter Clote and Rolf Backofen, Computational Molecular Biology: An Introduction, John Wiley & Sons, Inc., 2000.
• Arun Jagota, Data Analysis and Classification for Bioinformatics, 2000.
• Hooman H. Rashidi and Lukas K. Buehler, Bioinformatics Basics: Applications in Biological Science and Medicine, 1999.
• Pierre Baldi and Soren Brunak, Bioinformatics: The Machine Learning Approach, MIT Press, 1998.
• Andreas Baxevanis and B. F. Francis Ouellette, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, John Wiley & Sons, Inc., 1998.
71
Books: Biocomputing
• Cristian S. Calude and Gheorghe Paun, Computing with Cells and Atoms: An Introduction to Quantum, DNA and Membrane Computing, Taylor & Francis, 2001.
• Paun, G., Ed., Computing With Bio-Molecules: Theory and Experiments, Springer, 1999.
• Gheorghe Paun, Grzegorz Rozenberg and Arto Salomaa, DNA Computing: New Computing Paradigms, Springer, 1998.
• C. S. Calude, J. Casti and M. J. Dinneen, Unconventional Models of Computation, Springer, 1998.
• Tono Gramss, Stefan Bornholdt, Michael Gross, Melanie Mitchell and Thomas Pellizzari, Non-Standard Computation: Molecular Computation - Cellular Automata - Evolutionary Algorithms - Quantum Computers, Wiley-VCH, 1997.
72
More information at http://cbit.snu.ac.kr/ or http://bi.snu.ac.kr/