cs 7010: computational methods in bioinformatics (course review) dong xu computer science department...
Post on 18-Dec-2015
CS 7010: Computational Methods in Bioinformatics (course review)
Dong Xu
Computer Science Department
271C Life Sciences Center
1201 East Rollins Road
University of Missouri-Columbia
Columbia, MO 65211-2060
E-mail: [email protected]
573-882-7064 (O)
http://digbio.missouri.edu
Technical Definitions
NIH (http://www.bisti.nih.gov/)
Bioinformatics: “research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, represent, describe, store, analyze, or visualize such data”.
Computational Biology: “the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems”.
Course Topics
Data interpretation in analytical technologies
Data management and computational infrastructure
Discovery from data mining
Modeling, prediction and design
Theoretical in silico biology
Cover classical/mainstream bioinformatics problems from a computer science perspective
Discovery from Data Mining (I)
Data source
  Genomic / protein sequence
  Microarray data
  Protein interaction
Complicated data
  Large-scale, high-dimension
  Noisy (false positives and false negatives)
Discovery from Data Mining (II)
Pattern/knowledge discovery from data
  Many biological data are generated by biological processes which are not well understood
  Interpretation of such data requires discovery of convoluted relationships hidden in the data
    Which segment of a DNA sequence represents a gene or a regulatory region
    Which genes are possibly responsible for a particular disease
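As a toy illustration of the question "which segment of a DNA sequence represents a gene", the sketch below scans the forward strand for open reading frames (an ATG start followed in-frame by a stop codon). It is only a naive sketch: the function name `find_orfs` and the `min_codons` threshold are assumptions made for this example, and real gene finders use far richer statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs.

    Hypothetical helper for illustration: an ORF here is an ATG followed
    in the same reading frame by the first stop codon, and `end` is
    exclusive (it includes the stop codon).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                # keep it only if it has enough codons before the stop
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ATG
    return orfs

if __name__ == "__main__":
    seq = "CCATGAAATAGGG"
    for start, end in find_orfs(seq):
        print(start, end, seq[start:end])
```

The reverse strand, overlapping ORFs, and introns are deliberately ignored to keep the example short.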
Discovery from Data Mining (III)
Modeling, Prediction and Design (I)
Modeling and prediction of biological objects/processes
  Sequence comparison
  Secondary structure prediction
  Gene finding
  Regulatory sequence identification
Modeling, Prediction and Design (II)
Prediction of outcomes of biological processes
  Computing will become an integral part of modern biology through an iterative process of model formulation, computational prediction, and experimental validation
From prediction to engineering design
  Drug design
  Protein structure prediction to protein engineering
  Design of genetically modified species
Scope of Bioinformatics
data management; data mining; modeling; prediction; theory formulation
engineering aspect
scientific aspect
bioinformatics
an indispensable part of biological science
genes, proteins, protein complexes, pathways, cells, organisms, ecosystem
computer science, biology, statistics, mathematics, physics, chemistry, engineering,…
Bioinformatics Foundations
Technology Biology/medicine Computer Science Statistics
From interdisciplinary field to a distinct discipline
Course Coverage
A general introduction to the field of bioinformatics
  Problem definitions: from biological problem to computable problem
  Key computational techniques
A way of thinking: tackling a “biological problem” computationally
  How to look at a biological problem from a computational point of view
  How to formulate a computational problem to address a biological issue
  How to collect statistics from biological data
  How to build a computational model
  How to design algorithms for the model
  How to test and evaluate a computational algorithm
  How to assess the confidence of a prediction result
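One common way to assess the confidence of a result is to compare it against a null distribution built by shuffling the data. The sketch below is an illustrative assumption rather than a method prescribed by the course: it scores two equal-length sequences by the number of identical positions and estimates an empirical p-value by shuffling one of them. The names `identity` and `empirical_p_value` are hypothetical.

```python
import random

def identity(a, b):
    """Number of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b))

def empirical_p_value(a, b, n_shuffles=1000, seed=0):
    """Fraction of shuffled versions of b that score at least as well as b.

    Shuffling keeps the letter composition of b but destroys its order,
    which gives a simple null model for "unrelated" sequences.
    """
    rng = random.Random(seed)       # seeded so the run is repeatable
    observed = identity(a, b)
    letters = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if identity(a, letters) >= observed:
            hits += 1
    # add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_shuffles + 1)

if __name__ == "__main__":
    a = "ACGTACGTACGTACGTACGT"
    print(empirical_p_value(a, a))  # identical sequences: very small p-value
```

A small p-value says the observed score is rarely matched by chance under this null model. It does not by itself say the sequences are biologically related.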
Dong’s top 10 list for computational methods in BI
1. Dynamic programming
2. Neural network
3. Hidden Markov Model
4. Hypothesis test
5. Bayesian statistics
6. Clustering
7. Information theory
8. Support Vector Machine
9. Maximum likelihood
10. Sampling search (Gibbs, Monte Carlo, etc)
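To make the first item on the list concrete, here is a minimal sketch of dynamic programming applied to pairwise sequence comparison: the Needleman-Wunsch recurrence for the optimal global alignment score. The scoring values (match +1, mismatch -1, linear gap -1) and the function name `global_alignment_score` are assumptions chosen for illustration, not parameters given in the course.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score (Needleman-Wunsch, linear gap cost).

    Only two rows of the dynamic-programming table are kept, so memory
    is proportional to len(b); the alignment itself is not recovered.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]

if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Each cell depends only on its three neighbours, which is what makes the problem solvable in time proportional to len(a) * len(b) instead of enumerating all alignments.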
Research Areas
1. “Solved” problems
2. “Developed” areas with remaining challenges that are hard to solve
3. Developing areas
4. Emergent areas
5. Future directions
“Solved” Problems
DNA sequence base calling and assembly
Pairwise sequence comparison
Protein secondary structure prediction
Disordered region in proteins
Transmembrane segment prediction
Subcellular localization
Signal peptide prediction
Protein geometry
Homology modeling
Physical/genetic mapping informatics
“Developed” areas with remaining challenges
Gene finding
Phylogenetic tree construction and evolution
Protein docking
Drug design
Protein design
Linkage analysis and quantitative traits (QTL)
Microarray data collection
Gene expression clustering
Developing Areas
Multiple sequence comparison and remote homolog search
Repetitive sequence analysis
Protein structure comparison
Protein tertiary structure prediction
RNA secondary structure prediction
Regulatory sequence analysis
Computational proteomics
Protein interaction networks
Gene ontology and function prediction
Computational neural science and applications in various species and systems (e.g., cancer)
Emergent Areas
Pathway (regulatory network) prediction
ChIP-chip analysis
Tiling array analysis
Haplotype/SNP analysis
Computational comparative genomics
Text (literature) mining
Small RNA and anti-sense regulation
Alternative splicing prediction
Computational metabolomics
Possible future directions
Genome semantics
Membrane protein structure prediction
RNA tertiary structure prediction
Post-translational modification
Dynamics of regulatory networks
Virtual cell/organism modeling
Phenotype-genotype relationship
… (nobody knows)
Where is the science going? (1)
Bioinformatics has been a “technology” for biological research: interpretation of data generated by bench biologists
We are starting to see a trend in which computational predictions guide experimental design
As more high-throughput technologies become available, discovery-driven science will play an increasingly important role in biology research
As computational techniques continue to mature for biological applications, we will see more and more computational applications with powerful prediction capabilities
Where is the science going? (2)
Like physics, where general rules and laws are taught at the start, biology will surely be presented to future generations of students as a set of basic systems ....... duplicated and adapted to a very wide range of cellular and organismic functions, following basic evolutionary principles constrained by Earth’s geological history. --Temple Smith, Current Topics in Computational Molecular Biology
Major research centers (1)
National Center for Biotechnology Information (NCBI) of NIH (http://www.ncbi.nlm.nih.gov/)
  The home of many important databases, including GenBank
  The home of many important bioinformatics tools, including BLAST
European Molecular Biology Laboratory (EMBL) (http://www.embl-heidelberg.de/)
  Has some of the most powerful research groups in bioinformatics
  Has numerous tools and databases
Major research centers (2)
Sanger Institute (http://www.sanger.ac.uk/)
The Institute for Genomic Research (TIGR, http://www.tigr.org/)
Swiss-Prot (http://www.tigr.org/)
Major research centers (3)
Major universities in the US
University of California at Santa Cruz
University of California at San Diego
Washington University
University of Southern California
Stanford University
Columbia University
Boston University
Harvard University
MIT
Virginia Tech
Major journals
Bioinformatics
Nucleic Acids Research
Genome Research
Journal of Computational Biology
Journal of Bioinformatics and Computational Biology
In Silico Biology
Briefings in Bioinformatics
Applied Bioinformatics
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Proteins: Structure, Function, and Bioinformatics
Journal of Computer Science and Technology
Genomics, Proteomics and Bioinformatics
…
Major conferences
Intelligent Systems for Molecular Biology (ISMB)
Annual International Conference on Research in Computational Molecular Biology (RECOMB)
IEEE Computational Systems Bioinformatics Conference (CSB)
Pacific Symposium on Biocomputing (PSB)
European Conference on Computational Biology (ECCB)
IEEE International Conference on Bioinformatics and Bioengineering (BIBE)
International Workshop on Genome Informatics (GIW)
Asia-Pacific Bioinformatics Conference (APBC)
…
Academicians
Michael Waterman
Phil Green
Gene Myers
Barry Honig
No Nobel Prize winner yet…
Discussions
Scope of the new biology (large-scale)
Technology (tool development) vs. science (biological application)
Knowledge vs. prediction
Experimental vs. computational/theoretical
First principle vs. empirical / statistical
Automated vs. curated
One machine can do the work of fifty ordinary men. No machine can do the work of one extraordinary man.
Choosing Bioinformatics as a Career - 1
Field outlook
Must be a believer in bioinformatics (for its value to science)
Must have strong motivation and be willing to go the extra mile (learn more disciplines)
Technologist vs. technician
Choosing Bioinformatics as a Career - 2
Molecular & cellular and evolutionary biology: understanding the science
Computational, mathematical, and statistical sciences: mastering the techniques
High-throughput measurement technologies: knowing what biological data are obtainable