giri narasimhangiri/teach/bioinf/s11/lecx1.pdf · world bank statistics" data: world bank...

32
CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 [email protected] www.cis.fiu.edu/~giri/teach/BioinfS11.html

Upload: others

Post on 12-Apr-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

CAP 5510: Introduction to BioinformaticsCGS 5166: Bioinformatics Tools"

Giri Narasimhan ECS 254; Phone: x3748

[email protected] www.cis.fiu.edu/~giri/teach/BioinfS11.html

Page 2: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 2

DNA Chips & Images"

Page 3: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 3

Microarray Data"

Gene Expression Level

Gene1

Gene2

Gene3

Page 4: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 4

K-Means Clustering: Example"

Example from Andrew Moore’s tutorial on Clustering.

Page 5: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 5

Start

Page 6: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 6

Page 7: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 7

Page 8: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 8

Start

End

Page 9: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 9

K-Means Clustering [McQueen ʼ67]"

Repeat "  Start with randomly chosen cluster centers "   Assign points to give greatest increase in score "   Recompute cluster centers "   Reassign points

until (no changes) Try the applet at: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/

AppletH.html

Page 10: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 10

Comparisons" Hierarchical clustering

"  Number of clusters not preset. "  Complete hierarchy of clusters "  Not very robust, not very efficient.

 K-Means "  Need definition of a mean. Categorical data? "  More efficient and often finds optimum clustering.

Page 11: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 11

Functionally related genes behave similarly across experiments

Page 12: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 12

Self-Organizing Maps [Kohonen]" Kind of neural network.  Clusters data and find complex relationships

between clusters.  Helps reduce the dimensionality of the data.  Map of 1 or 2 dimensions produced.  Unsupervised Clustering  Like K-Means, except for visualization

Page 13: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 13

SOM Architectures"  2-D Grid   3-D Grid   Hexagonal Grid

Page 14: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 14

SOM Algorithm" Select SOM architecture, and initialize weight

vectors and other parameters.  While (stopping condition not satisfied) do for each

input point x "  winning node q has weight vector closest to x. "  Update weight vector of q and its neighbors. "  Reduce neighborhood size and learning rate.

Page 15: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 15

SOM Algorithm Details" Distance between x and weight vector:  Winning node:  Weight update function (for neighbors):

 Learning rate:

iwx −

)]()()[,,()()1( kwkxixkkwkw iii −+=+ µ

ii

wxxq −=min)(

⎟⎟

⎜⎜

⎛ −−= 2

2)(

exp)(),,( 0

σηµ

xqi rrkixk

Page 16: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 16

World Bank Statistics" Data: World Bank statistics of countries in 1992.  39 indicators considered e.g., health, nutrition,

educational services, etc.  The complex joint effect of these factors can can

be visualized by organizing the countries using the self-organizing map.

Page 17: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 17

World Poverty PCA"

Page 18: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 18

World Poverty SOM"

Page 19: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 19

World Poverty Map"

Page 20: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 20

Page 21: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 21

Page 22: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 22

Viewing SOM Clusters on PCA axes"

Page 23: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 23

1

5

4

3

2

1

2 3 4 5 6

SOM Example [Xiao-rui He]"

Page 24: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 24

Neural Networks"

Σ Input X

Synaptic Weights W

ƒ(•)

Bias θ

Output y

Page 25: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 25

Learning NN"

×

×

× Σ

Σ

Adaptive Algorithm

Input X

1

Weights W

- +

Desired Response

Error

Page 26: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 26

Types of NNs"  Recurrent NN   Feed-forward NN   Layered

Other issues"  Hidden layers possible   Different activation functions possible

Page 27: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 27

Application: Secondary Structure Prediction"

Page 28: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 28

Polymerase Chain Reaction (PCR)" For testing, large amount of DNA is needed

"   Identifying individuals for forensic purposes  (0.1 microliter of saliva contains enough epithelial cells)

"   Identifying pathogens (viruses and/or bacteria)  PCR is a technique to amplify the number of copies of a

specific region of DNA.  Useful when exact DNA sequence is unknown  Need to know “flanking” sequences  Primers designed from “flanking” sequences

Page 29: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 29

PCR"

DNA

Region to be amplified!Flanking Regions with

known sequence!

Reverse Primer!

Millions of Copies

Forward Primer!

Flanking Regions with known sequence!

Page 30: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 30

PCR " ""

Page 31: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 31

Schematic outline of a typical PCR cycle"

Target DNA

Primers

dNTPs

DNA polymerase

Page 32: Giri Narasimhangiri/teach/Bioinf/S11/Lecx1.pdf · World Bank Statistics" Data: World Bank statistics of countries in 1992. 39 indicators considered e.g., health, nutrition, educational

2/22/11 CAP5510 / CGS 5166 32

Picture Copyright: AccessExcellence @ the National Museum of Health