lab gene expression data analysis

Data manipulation: Biostatistics & gene expression data analysis (Microarray, NGS & qRT-PCR). Theme: Transcriptional Program in the Response of Human Fibroblasts to Serum. Lab#. Etienne Z. Gnimpieba, BRIN WS 2012, Sioux Falls, May 30 2012.

Upload: usd-bioinformatics

Post on 11-May-2015


TRANSCRIPT

Page 1: Lab Gene Expression Data Analysis

Data manipulation: Biostatistic & Gene expression data analysis

(Microarray, NGS & qRT-PCR)

Theme: Transcriptional Program in Response of Human Fibroblasts to Serum.

Lab#.

Etienne Z. Gnimpieba, BRIN WS 2012

Sioux Falls, May 30 2012

Page 2: Lab Gene Expression Data Analysis

Data manipulation: Gene expression data analysis

OMIC World

[Figure: the OMIC world. Entity labels: DNA, mRNA, E, S, P. Process labels: Transcription, Translation, Degradation, Gene Repression, Catalyse. Omics layers: Genomics, Functional Genomics, Transcriptomics, Proteomics, Metabolomics.]

Page 3: Lab Gene Expression Data Analysis

OMIC World

GENOMICS

Page 4: Lab Gene Expression Data Analysis


Etienne Z. Gnimpieba, BRIN WS 2012

Sioux Falls, May 31 2012

Excel used in genomics

• Frouin, V. & Gidrol, X. (2005)
• CBB group (Berlin)
• Transcriptome ENS (France)

• How to select columns
• How to use functions
• How to anchor a cell value in a function
• How to copy the function result and not the function itself
• How to sort data by columns
• How to search and replace

Page 5: Lab Gene Expression Data Analysis

Excel used in genomics: Pre-treatment


1. Open the file containing the experiment series (your expression matrix) in Excel, using the tab character as the column separator. Click on the second worksheet, named Fibroblast real, and look it over quickly: it is a realistic data set from a microarray experiment. Then click back on the first worksheet, named Fibroblast lab. We will be using a condensed version.

2. For one column (corresponding to one DNA microarray experiment), calculate the mean value using the Excel AVERAGE function. Check whether the value obtained is equal to zero.

3. If it is not, center each experiment (15MIN, 30MIN, 2HR, etc.) by subtracting the column mean from every log2(Ratio) value in that column:

- Subtract the average value of each column from the corresponding individual values (for the first cell, B2-$B$37). Place these values in the corresponding table on the right (starting at R2). Drag the formula down with the fill handle to quickly finish a column.

- Continue centering the data for each column (each DNA microarray experiment), filling in the blank table on the right. Use the AVERAGE function again to find the mean of each column in the new table. Each average should now be zero.

- Be careful: if there are missing values (empty cells), replace the empty content with NULL or NA so that no zero value is introduced into the Excel calculations for that cell. A missing value is not the same as a true zero!

- Be careful with decimal separator handling in Excel (period or comma)!

Centering and scaling data
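The centering steps above can also be sketched outside Excel. The following is a minimal Python illustration, not part of the original lab: the function name `center_columns` and the toy values are hypothetical, and missing measurements are represented by `None` so that they are never treated as zero.

```python
from statistics import mean

def center_columns(rows):
    """Subtract each column's mean from its values; None marks a missing value."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        # Only observed values contribute to the column mean.
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(mean(observed))
    return [
        [None if v is None else v - col_means[j] for j, v in enumerate(row)]
        for row in rows
    ]

# Hypothetical log2(ratio) values: rows are genes, columns are experiments.
log2_ratios = [
    [0.5, 1.0],
    [1.5, None],  # missing measurement, kept as missing rather than 0
    [1.0, 3.0],
]
centered = center_columns(log2_ratios)
```

After centering, the mean of the observed values in each column is zero, which is the same check the lab performs with AVERAGE on the new table.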

Page 6: Lab Gene Expression Data Analysis

Excel used in genomics: Differential expression analysis (1)


Significance Analysis of Microarrays (SAM): SAM is an Excel macro, freely available to academics on the web, that searches for differentially expressed genes using a resampling method. Because it runs inside an Excel spreadsheet, the tool is easy to use for most microarray users. Using SAM requires several modifications to your data file:

The ratio or intensity values in the Excel sheet must use periods, not commas, as the decimal separator.

The header line depends on the type of analysis you want to perform; refer to the SAM manual for more information. You must highlight your header if you don’t want to lose the experiment information.

Two annotation columns are available. SAM always references its calculations to the line number in the source sheet.

Website: http://www-stat.stanford.edu/~tibs/SAM/
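The decimal separator requirement can be handled before the data ever reach Excel. Below is a small Python sketch, under the assumption that values arrive as text cells; the helper name `parse_decimal` is hypothetical, and it does not attempt to handle thousands separators.

```python
def parse_decimal(text):
    """Convert a text cell to a float, accepting a comma or a period as decimal separator.

    Empty cells and the markers NA / NULL are returned as None (missing), not 0.
    """
    cleaned = text.strip()
    if cleaned == "" or cleaned.upper() in {"NA", "NULL"}:
        return None
    return float(cleaned.replace(",", "."))

# Hypothetical cells as they might appear in an exported text file.
cells = ["1,25", "-0.5", "NA", ""]
values = [parse_decimal(c) for c in cells]
```

Writing the converted values back with a period as separator gives a file that meets the requirement stated above.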

Page 7: Lab Gene Expression Data Analysis


Excel used in genomics : Differential expression analysis (2)


Under the Add-Ins tab, locate the “SAM” toolbar command. Highlight cells R2 to AF37, then select SAM. When the SAM macro is launched from the toolbar, a settings window appears. For details on the available options, refer to the SAM manual. The first important setting is whether or not the source data have been log2-transformed; in this case we will select Unlogged. Then, because the resampling relies on a random number generator, initialize it by clicking “Generate Random Seed” several times.

Click “OK”. Once all the requested iterations have run, SAM displays a plot that compares each gene's score in the real data with its score in the randomized data. The differentially expressed genes are the ones that move away from the 45° line.

The table that appears indicates, for each delta value, the number of putative differentially expressed genes, the significant genes, and the number of false positive genes estimated using the False Discovery Rate (FDR). The user can change the delta value according to the number of false positive or significant genes he or she wants to obtain.

Choose a delta value by selecting “Manually Enter Delta”. Enter your own delta value between 0 and 0.25. Then if you select the “List Significant Genes” button, SAM displays the list of differentially expressed genes in the “SAM output” sheet according to the delta value you chose.

This sheet summarizes the selected parameters and gives you the list of induced and repressed genes.
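To make the idea of observed versus randomized scores and the delta threshold more concrete, here is a simplified two-class sketch in Python. It is an illustration only and not the SAM macro's implementation: the function names, the constant `s0`, the toy matrix, and the delta value (chosen for the scale of the toy data) are all assumptions, and the real tool selects cutoffs and estimates the FDR more carefully.

```python
import random
from math import sqrt
from statistics import mean, variance

def d_score(values, labels, s0=0.1):
    """SAM-style relative difference between group 1 and group 2 for one gene."""
    a = [v for v, g in zip(values, labels) if g == 1]
    b = [v for v, g in zip(values, labels) if g == 2]
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    std_err = sqrt(pooled * (1 / len(a) + 1 / len(b)))
    # s0 (assumed value) damps the scores of genes with very low variance.
    return (mean(a) - mean(b)) / (std_err + s0)

def call_genes(matrix, labels, delta, n_perm=200, seed=0):
    """Return row indices whose observed score differs from the expected score by more than delta."""
    rng = random.Random(seed)
    observed = sorted((d_score(row, labels), i) for i, row in enumerate(matrix))
    expected = [0.0] * len(matrix)
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # randomize which sample belongs to which group
        permuted = sorted(d_score(row, shuffled) for row in matrix)
        for rank, d in enumerate(permuted):
            expected[rank] += d / n_perm  # average score at each rank
    return sorted(i for (d, i), e in zip(observed, expected) if abs(d - e) > delta)

# Toy data: gene 0 differs strongly between the groups, the others are noise.
labels = [1, 1, 1, 2, 2, 2]
matrix = [
    [5.0, 5.1, 4.9, 0.0, 0.1, -0.1],
    [0.2, -0.1, 0.0, 0.1, -0.2, 0.0],
    [0.0, 0.1, -0.1, 0.1, 0.0, -0.1],
    [-0.1, 0.0, 0.2, 0.0, 0.1, -0.2],
]
significant = call_genes(matrix, labels, delta=10.0)
```

Plotting the sorted observed scores against the expected scores gives the kind of plot described above: genes far from the 45° line are the ones returned by `call_genes`, and a larger delta returns fewer genes.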

Page 8: Lab Gene Expression Data Analysis


GEPAS: Gene Expression Pattern Analysis Suite


• Verify that the data file FibroGEPAS.txt is in your folder
• Open the file
• Open the GEPAS portal at http://www.transcriptome.ens.fr/gepas/index.html
• Click on “Tools”:

- Preprocessing: preprocess DNA array data files (log-transformation, replicate handling, missing value imputation, filtering and normalization)
- Filtering
- Viewing
- Clustering
- Differential expression
- Classification
- Data mining

Review this section. Familiarize yourself with each section listed under Tools.
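Two of the preprocessing steps named above, log-transformation and missing value imputation, can be sketched in a few lines of Python. This is an assumption-laden illustration: the function names are hypothetical, and row-mean imputation is used only as a simple stand-in, not as a description of what GEPAS itself does.

```python
import math
from statistics import mean

def log2_transform(ratios):
    """Log2-transform expression ratios, leaving missing values (None) untouched."""
    return [None if r is None else math.log2(r) for r in ratios]

def impute_row_mean(values):
    """Replace missing values with the mean of the gene's observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical ratio profile for one gene across four time points.
profile = [1.0, 2.0, None, 4.0]
logged = log2_transform(profile)
filled = impute_row_mean(logged)
```

Transforming before imputing keeps the imputed value on the same log2 scale as the rest of the profile.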

Page 9: Lab Gene Expression Data Analysis

Resolution process

Context

Specification & aims

Statement of problem / Case study: The temporal program of gene expression during a model physiological response of human cells, the response of fibroblasts to serum, was explored with a complementary DNA microarray representing about 8600 different human genes. Genes could be clustered into groups on the basis of their temporal patterns of expression in this program. Many features of the transcriptional program appeared to be related to the physiology of wound repair, suggesting that fibroblasts play a larger and richer role in this complex multicellular response than had previously been appreciated.

Gene Expression Data Analysis

16 Vishwanath R. Iyer, Science, 1999

Conclusion: ?

Aim: The purpose of this lab is to introduce the gene expression data analysis process. We simulate its application to “Transcriptional Program in the Response of Human Fibroblasts to Serum”, so that we can understand how a researcher comes to identify significantly expressed genes from microarray datasets.

T1. Gene expression overview

- T1.1. Review of the place of genomics in the OMIC world
- T1.2. Microarray data techniques and process
- T1.3. Data analysis cycle and tools

T2. Excel used in genomics

Objective: use basic Excel functionality to meet some gene expression data analysis needs.

- T2.1. Column manipulation, functions, anchoring, copying function results, sorting data, search and replace
- T2.2. Experiment comparison: data pre-treatment
- T2.3. Differentially expressed genes from replicate experiments (SAM)

T3. GEPAS: Gene Expression Pattern Analysis Suite

Objective: use the GEPAS suite to apply the whole microarray data analysis process to the fibroblast data.

- Preprocessing
- Viewing
- Clustering
- Differential expression
- Classification
- Data mining

Acquired skills

- Gene expression data overview
- Excel used for genomics
- Microarray data analysis using GEPAS

[Figure: microarray workflow. Labels: Target preparation, Hybridization, Slide scanning, Data analysis, Expression profile clustering.]

Page 10: Lab Gene Expression Data Analysis

END.