Analysis of GO annotation at cluster level by Agnieszka S. Juncker

Upload: nissim-franklin

Post on 30-Dec-2015


Analysis of GO annotation at cluster level

by Agnieszka S. Juncker

The DNA Array Analysis Pipeline

• Question / Experimental Design

• Array design / Probe design

• Buy Chip/Array

• Sample Preparation / Hybridization

• Image analysis

• Normalization

• Expression Index Calculation

• Comparable Gene Expression Data

• Statistical Analysis / Fit to Model (time series)

• Advanced Data Analysis: Clustering, PCA, Classification, Promoter Analysis, Meta analysis, Survival analysis, Regulatory Network

GO annotations

Gene Ontology

Gene Ontology (GO) is a collection of controlled vocabularies describing the biology of a gene product in any organism

There are 3 independent sets of vocabularies, or ontologies:

• Molecular Function (MF): e.g. "DNA binding" and "catalytic activity"

• Cellular Component (CC): e.g. "organelle membrane" and "cytoskeleton"

• Biological Process (BP): e.g. "DNA replication" and "response to stimulus"

Gene Ontology structure

GO structure, example 2

KEGG pathways

• KEGG PATHWAYS: a collection of manually drawn pathway maps representing our knowledge on the molecular interaction and reaction networks, for a large selection of organisms

• 1. Metabolism: Carbohydrate, Energy, Lipid, Nucleotide, Amino acid, Other amino acid, Glycan, PK/NRP, Cofactor/vitamin, Secondary metabolite, Xenobiotics

• 2. Genetic Information Processing

• 3. Environmental Information Processing

• 4. Cellular Processes

• 5. Human Diseases

• 6. Drug Development

KEGG pathway example 1

KEGG pathway example 2

Cluster analysis and GO

Analysis example:

• Partitioning clustering of genes into e.g. 15 clusters based on expression profiles

• Assignment of GO terms to genes in clusters

• Looking for GO terms overrepresented in clusters
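The first two steps of this analysis can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the expression matrix and the GO annotation are simulated, the names `expr`, `has_term` and `enrichment` are made up for this example, and k-means stands in for whichever partitioning method is actually used.

```r
# Minimal sketch with simulated data (nothing here comes from a real experiment)
set.seed(1)
n_genes <- 300

# Hypothetical expression matrix: one row per gene, one column per time point
expr <- matrix(rnorm(n_genes * 12), nrow = n_genes)

# Partitioning clustering of genes into 15 clusters based on expression profiles
km <- kmeans(expr, centers = 15, iter.max = 50, nstart = 5)
cluster <- km$cluster

# Hypothetical annotation: TRUE if a gene carries the GO term of interest
has_term <- runif(n_genes) < 0.1

# Fraction of annotated genes in each cluster, compared with the overall fraction
frac_in_cluster <- tapply(has_term, cluster, mean)
frac_overall <- mean(has_term)
enrichment <- frac_in_cluster / frac_overall

# Clusters with the largest ratio are candidates for overrepresentation
head(sort(enrichment, decreasing = TRUE))
```

A ratio well above 1 only suggests overrepresentation; whether it is larger than expected by chance still has to be assessed with a statistical test.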

Hypergeometric test

• The hypergeometric distribution arises from sampling without replacement from a fixed population.

• Urn: 20 white balls out of 100 balls; sample: 10 balls

• We want to calculate the probability of drawing 7 or more white balls out of 10 balls, given the distribution of balls in the urn
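In R the urn example can be computed with the built-in `phyper` function. The sketch below uses the numbers from the slide; the variable names are chosen here for illustration. In the GO setting, the urn would correspond to all genes, the white balls to the genes annotated with a given GO term, and the drawn balls to the genes in one cluster.

```r
white_in_urn <- 20   # white balls in the urn
other_in_urn <- 80   # remaining balls (100 - 20)
drawn <- 10          # balls drawn without replacement

# P(X >= 7) = 1 - P(X <= 6), hence q = 6 together with lower.tail = FALSE
p_7_or_more <- phyper(6, white_in_urn, other_in_urn, drawn, lower.tail = FALSE)

# The same probability, summed term by term from the point probabilities
p_check <- sum(dhyper(7:10, white_in_urn, other_in_urn, drawn))

p_7_or_more
```

The result is roughly 0.0004, so drawing 7 or more white balls would be a very unlikely outcome if the balls were drawn at random.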

Yeast cell cycle

[Figure: time series experiment (sampling over time) and the resulting gene expression profiles for Gene1 and Gene2, plotted against time]

R stuff

Indexing of a matrix (used when you wish to select a subset of your data, e.g. specific rows or columns):

• Example 1

    rowindex <- 1:10
    colindex <- 1:5
    datamatrix[rowindex, colindex]  # first 10 rows, first 5 columns
    datamatrix[1:10, 1:5]           # gives the same as above

A “missing” row index (or column index) means that all rows (or columns) are selected

• Example 2

    datamatrix[1:5, ]   # first 5 rows, all columns
    datamatrix[, 5:10]  # all rows, columns 5 to 10
    datamatrix[, ]      # is the same as datamatrix
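To try the indexing examples without real data, a small test matrix can be created first. The matrix below is a made-up stand-in for `datamatrix` (20 rows, 10 columns), assumed here only so that the commands can be run.

```r
# Hypothetical stand-in for datamatrix: 20 rows, 10 columns, filled column-wise
datamatrix <- matrix(1:200, nrow = 20, ncol = 10)

sub1 <- datamatrix[1:10, 1:5]   # first 10 rows, first 5 columns
sub2 <- datamatrix[1:5, ]       # first 5 rows, all columns
sub3 <- datamatrix[, 5:10]      # all rows, columns 5 to 10

dim(sub1)   # 10 rows, 5 columns
dim(sub2)   # 5 rows, 10 columns
dim(sub3)   # 20 rows, 6 columns

# Selecting a single row returns a plain vector unless drop = FALSE is given
onerow <- datamatrix[1, , drop = FALSE]
dim(onerow)  # 1 row, 10 columns
```

Checking `dim()` after each subsetting step is a simple way to confirm that the intended rows and columns were selected.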