CS 7010: Computational Methods in Bioinformatics (course review) Dong Xu Computer Science Department 271C Life Sciences Center 1201 East Rollins Road University of Missouri-Columbia Columbia, MO 65211-2060 E-mail: [email protected] 573-882-7064 (O) http://digbio.missouri.edu

Post on 18-Dec-2015

CS 7010: Computational Methods in Bioinformatics (course review)

Dong Xu
Computer Science Department
271C Life Sciences Center
1201 East Rollins Road
University of Missouri-Columbia
Columbia, MO 65211-2060
E-mail: [email protected]
573-882-7064 (O)
http://digbio.missouri.edu

Technical Definitions

NIH (http://www.bisti.nih.gov/)

Bioinformatics: “research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, represent, describe, store, analyze, or visualize such data”.

Computational Biology: “the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems”.

Course Topics

Data interpretation in analytical technologies
Data management and computational infrastructure
Discovery from data mining
Modeling, prediction and design
Theoretical in silico biology

Cover classical/mainstream bioinformatics problems from a computer science perspective

Discovery from Data Mining (I)

Discovery from Data Mining (II)

Data source
  Genomic / protein sequence
  Microarray data
  Protein interaction

Complicated data
  Large-scale, high-dimension
  Noisy (false positives and false negatives)

Discovery from Data Mining (III)

Pattern/knowledge discovery from data
  many biological data are generated by biological processes which are not well understood
  interpretation of such data requires discovery of convoluted relationships hidden in the data
  which segment of a DNA sequence represents a gene or a regulatory region
  which genes are possibly responsible for a particular disease

Modeling, Prediction and Design (I)

Modeling and prediction of biological objects/processes
  Sequence comparison
  Secondary structure prediction
  Gene finding
  Regulatory sequence identification

Modeling, Prediction and Design (II)

Prediction of outcomes of biological processes
  computing will become an integral part of modern biology through an iterative process of model formulation, computational prediction, and experimental validation

From prediction to engineering design
  Drug design
  Protein structure prediction to protein engineering
  Design genetically modified species

Scope of Bioinformatics

[Diagram: bioinformatics, an indispensable part of biological science, with an engineering aspect and a scientific aspect]

Tasks: data management; data mining; modeling; prediction; theory formulation

Subjects: genes, proteins, protein complexes, pathways, cells, organisms, ecosystem

Disciplines: computer science, biology, statistics, mathematics, physics, chemistry, engineering, …

Bioinformatics Foundations

Technology
Biology/medicine
Computer Science
Statistics

From interdisciplinary field to a distinct discipline

Course Coverage

A general introduction to the field of bioinformatics
  problem definitions: from biological problem to computable problem
  key computational techniques

A way of thinking: tackling a “biological problem” computationally
  how to look at a biological problem from a computational point of view
  how to formulate a computational problem to address a biological issue
  how to collect statistics from biological data
  how to build a computational model
  how to design algorithms for the model
  how to test and evaluate a computational algorithm
  how to assess the confidence of a prediction result
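The "test and evaluate" step can be made concrete by comparing predictions against known labels. The following is a minimal sketch only: the helper name `evaluate_predictions` and the toy labels are assumptions for illustration and do not come from the course material.

```python
def evaluate_predictions(predicted, actual):
    """Compare binary predictions with known labels.

    Hypothetical helper for illustration; both arguments are
    equal-length sequences of booleans (True = positive).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives

    def ratio(num, den):
        # Return None when the ratio is undefined (empty denominator).
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Toy data (made up): five positions, "is this position inside a gene?"
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
metrics = evaluate_predictions(predicted, actual)
```

Sensitivity and specificity are reported separately because noisy biological data typically contain both false positives and false negatives, and a single accuracy figure can hide either one.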

Dong’s top 10 list for computational methods in BI

1. Dynamic programming

2. Neural network

3. Hidden Markov Model

4. Hypothesis test

5. Bayesian statistics

6. Clustering

7. Information theory

8. Support Vector Machine

9. Maximum likelihood

10. Sampling search (Gibbs, Monte Carlo, etc)
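Dynamic programming, the first item on the list, is commonly illustrated with global pairwise sequence alignment (the Needleman-Wunsch recurrence). The sketch below is a minimal example under stated assumptions: a linear gap penalty and illustrative scoring values (match = 1, mismatch = -1, gap = -2) that are not taken from the course, and a hypothetical function name.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of strings a and b.

    Needleman-Wunsch recurrence with a linear gap penalty.
    The scoring values are illustrative assumptions.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            diag = prev[j - 1] + substitution  # align a[i-1] with b[j-1]
            up = prev[j] + gap                 # gap in b
            left = curr[j - 1] + gap           # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


score_identical = global_alignment_score("ACGT", "ACGT")
score_one_gap = global_alignment_score("ACGT", "AGT")
```

Only two rows of the table are kept, so memory grows with the length of one sequence; recovering the alignment itself, rather than just its score, would require the full table or a traceback structure.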

Research Areas

1. “Solved” problems

2. “Developed” areas with remaining challenges hard to solve

3. Developing areas

4. Emergent areas

5. Future directions

“Solved” Problems

DNA sequence base calling and assembly
Pairwise sequence comparison
Protein secondary structure prediction
Disordered region in proteins
Transmembrane segment prediction
Subcellular localization
Signal peptide prediction
Protein geometry
Homology modeling
Physical/genetic mapping informatics

“Developed” areas with remaining challenges

Gene finding
Phylogenetic tree construction and evolution
Protein docking
Drug design
Protein design
Linkage analysis and quantitative traits (QTL)
Microarray data collection
Gene expression clustering

Developing Areas

Multiple sequence comparison and remote homolog search
Repetitive sequence analysis
Protein structure comparison
Protein tertiary structure prediction
RNA secondary structure prediction
Regulatory sequence analysis
Computational proteomics
Protein interaction networks
Gene ontology and function prediction
Computational neural science and applications in various species and systems (e.g., cancer)

Emergent Areas

Pathway (regulatory network) prediction
ChIP-chip analysis
Tiling array analysis
Haplotype/SNP analysis
Computational comparative genomics
Text (literature) mining
Small RNA and anti-sense regulation
Alternative splicing prediction
Computational metabolomics

Possible future directions

Genome semantics
Membrane protein structure prediction
RNA tertiary structure prediction
Post-translational modification
Dynamics of regulatory networks
Virtual cell/organism modeling
Phenotype-genotype relationship
… (nobody knows)

Where is the science going? (1)

Bioinformatics has been a “technology” for biological research: interpretation of data generated by bench biologists

We are starting to see a trend in which computational predictions guide experimental design

As more high-throughput technologies become available, discovery-driven science will play an increasingly important role in biology research

As computational techniques continue to mature for biological applications, we will see more and more computational applications with powerful prediction capabilities

Where is the science going? (2)

Like physics, where general rules and laws are taught at the start, biology will surely be presented to future generations of students as a set of basic systems ....... duplicated and adapted to a very wide range of cellular and organismic functions, following basic evolutionary principles constrained by Earth’s geological history. --Temple Smith, Current Topics in Computational Molecular Biology

Major research centers (1)

National Center for Biotechnology Information (NCBI) of NIH (http://www.ncbi.nlm.nih.gov/)
  the home of many important databases including GenBank
  the home of many important bioinformatics tools including BLAST

Major research centers (2)

European Molecular Biology Laboratory (EMBL) (http://www.embl-heidelberg.de/)
  has some of the most powerful research groups in bioinformatics
  has numerous tools and databases

Major research centers (3)

Sanger Institute (http://www.sanger.ac.uk/)

The Institute for Genomic Research (TIGR, http://www.tigr.org/)

Swiss-Prot (http://www.tigr.org/)

Major Universities in US

University of California at Santa Cruz
University of California at San Diego
Washington University
University of Southern California
Stanford University
Columbia University
Boston University
Harvard University
MIT
Virginia Tech

Major journals

Bioinformatics
Nucleic Acids Research
Genome Research
Journal of Computational Biology
Journal of Bioinformatics and Computational Biology
In silico Biology
Briefings in Bioinformatics
Applied Bioinformatics
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Proteins: structure, function and bioinformatics
Journal of Computer Science and Technology
Genomics, Proteomics and Bioinformatics
…

Major conferences

Intelligent Systems for Molecular Biology (ISMB)

Annual International Conference on Research in Computational Molecular Biology (RECOMB)

IEEE/Computational Systems Bioinformatics Conference (CSB)

Pacific Symposium on Biocomputing (PSB)

European Conference on Computational Biology (ECCB)

IEEE International Conference on Bioinformatics and Bioengineering (BIBE)

International Workshop on Genome Informatics (GIW)

Asia-Pacific Bioinformatics Conference (APBC)

Academicians

Michael Waterman
Phil Green
Gene Myers
Barry Honig

No Nobel Prize winner yet…

Discussions

Scope of the new biology (large-scale)
Technology (tool development) vs. science (biological application)
Knowledge vs. prediction
Experimental vs. computational/theoretical
First principle vs. empirical / statistical
Automated vs. curated

One machine can do the work of fifty ordinary men. No machine can do the work of one extraordinary man.

Choosing Bioinformatics as Career - 1

Field outlook
Must be a believer in bioinformatics (for its value to science)
Must have strong motivation and be willing to go the extra mile (learn more disciplines)
Technologist vs. technician

Choosing Bioinformatics as Career - 2

Molecular & cellular and evolutionary biology
  understanding the science

Computational, mathematical, and statistical sciences
  mastering the techniques

High-throughput measurement technologies
  knowing what biological data are obtainable