![Page 1: Statistical Analysis of Gene Expression Data With Oracle ... · PDF fileStatistical Analysis of Gene Expression Data With Oracle & “R” (- data mining) Patrick E. Hoffman Sc.D](https://reader031.vdocuments.us/reader031/viewer/2022022004/5aac9ad67f8b9a2e088d33ae/html5/thumbnails/1.jpg)
Statistical Analysis of Gene Expression Data With Oracle & “R” (data mining)
Patrick E. Hoffman, Sc.D., Senior Principal Analytical Consultant

Agenda (Oracle & R Analysis)
Tools
Loading Data
Statistical Analysis
  – Most Important Genes (MDL)
  – Correlations
  – t statistics, etc.
  – R Visualizations
Predictive models?
Other?

Analysis Tools
Tools for DB & PL/SQL development
  – ODM (Oracle Data Miner)
  – TOAD (from Quest)
  – JDeveloper (free from Oracle): Java & PL/SQL development
  – Enterprise Manager Console (Oracle client): managing the whole DB
  – SQL*Plus (command-line SQL or PL/SQL)
R Project
  – Open-source implementation of the S language, comparable to S-Plus
TextPad (editor)

The data
Affy gene expression (old AML/ALL data set)
7129 genes
72 patients (combined train/test)
Can be expanded to the new chips (50,000 genes)

Loading Data into Oracle DB?
From flat file?
SQL*Loader
Oracle Warehouse Builder
  – Production (auto sql*ldr)
  – All types of data
Oracle Data Miner (ODM)
  – Quick & Easy
From R-Project tables

[Screenshots: using ODM’r to load a CSV file]

Load Data From R to DB
Set up ODBC driver (in Windows)
Download the RODBC package (from CRAN)
Load the CSV file
Use the sqlSave routine to store it in the DB

R code – send csv to db
# R script to load a CSV file into an Oracle DB
# Install the RODBC package from CRAN first (menu option or install.packages("RODBC"))
library(RODBC)

# Use a standard Microsoft ODBC connection.
# Set up a DSN name to connect to the correct service;
# ODBC must already be set up to connect to the Oracle DB.
chan1 <- odbcConnect(dsn="")
odbcGetInfo(chan1)  # make sure channel is ok

# load in a csv file
filename <- "D:\\clients\\affy\\gs_EXPRESSION.csv"
csv <- read.csv(filename)

# drop the old table (errors = TRUE stops the script if the table does not exist)
sqlDrop(chan1, "GS_EXPRESSION", errors = TRUE)
# load the csv file to a table in the Oracle db
sqlSave(chan1, csv, tablename = "GS_EXPRESSION",
        rownames = FALSE, colnames = TRUE, fast = FALSE)  # fast did not work on 10g

7000 genes? 1000 column DB Limit?
Gone! Use Transactional Format or Nested Columns
Convert the flat-file table to one of these formats
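As a minimal sketch of what a flat-to-transactional conversion does, the R snippet below reshapes a small made-up table into one row per (case, gene) pair. The toy values, the gene names, and the choice of cases as rows are assumptions for illustration only; the column names caseid, attrib and attr_value follow the AFFY_TRANS queries in this deck. It is not the affy_to_trans PL/SQL procedure itself.

```r
# Toy flat table: one row per case, one column per gene (values are made up).
flat <- data.frame(
  caseid = c(1, 2, 3),
  geneA  = c(10.5, 20.1, 30.2),
  geneB  = c(-5.0, 0.0, 5.0)
)

gene_cols <- setdiff(names(flat), "caseid")

# stack() turns the gene columns into (values, ind) pairs, one row per cell,
# ordered gene by gene.
long <- stack(flat[gene_cols])

# Transactional layout: one row per (caseid, attrib) with a single value column.
trans <- data.frame(
  caseid     = rep(flat$caseid, times = length(gene_cols)),
  attrib     = as.character(long$ind),
  attr_value = long$values
)

print(trans)
```

The row count grows to cases times genes, but the column count stays at three, which is what sidesteps a per-table column limit.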

Analysis
PL/SQL package affy
  – affy_to_trans: flat to transactional
  – affy_ai: calculate MDL attribute importance
  – trans_to_affy: convert top genes to flat form
  – corr_genes: correlate genes to genes
  – corr_cases: correlate cases to cases
  – corr_target: correlate genes to target
  – T-statistic: calculate t-statistics
  – anova: stats for multiple classes

Convert gene expression table to transactional format
Load data into table GS_EXPRESSION
Use the affy package:
exec affy.affy_to_trans('GS_EXPRESSION');
OUTPUT will be in the table AFFY_TRANS

Normal format

Transactional Format (72*700)

PLSQL Affy_to_Trans

ODM will give stats

And Histograms

For Transactional & Flat Tables

Histogram for One Gene

Attribute Importance: Target is ALL or AML
This is the Minimum Description Length (MDL) algorithm

ODM GUI or PL/SQL API

Top Genes by MDL (no normalization)

Classification & Clustering (ODM)
Transactional or flat tables
SVM, Naïve Bayes, Adaptive Bayes
Java and PL/SQL APIs
Java-based GUI
Advanced K-means, Orthogonal Clustering
Lift and ROC curves

10g Statistics & SQL Analytics
Ranking functions
  – rank, dense_rank, cume_dist, percent_rank, ntile
Window Aggregate functions (moving and cumulative)
  – avg, sum, min, max, count, variance, stddev, first_value, last_value
LAG/LEAD functions
  – Direct inter-row reference using offsets
Reporting Aggregate functions
  – sum, avg, min, max, variance, stddev, count, ratio_to_report
Statistical Aggregates
  – Correlation, linear regression family, covariance
Linear regression
  – Fitting of an ordinary-least-squares regression line to a set of number pairs
  – Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions
Descriptive Statistics
  – average, standard deviation, variance, min, max, median (via percentile_cont), mode, group-by & roll-up
  – DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- 3 sigma values, top/bottom 5 values
Correlations
  – Pearson’s correlation coefficient, Spearman's and Kendall's (both nonparametric)
Cross Tabs
  – Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
Hypothesis Testing
  – Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
Distribution Fitting
  – Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential
Pareto Analysis (documented)
  – 80:20 rule, cumulative results table
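To illustrate the linear regression entry in the list above: for an ordinary-least-squares line, the slope equals the covariance of the number pairs divided by the variance of x, which is one reason the regression functions are often used alongside the covariance and correlation aggregates. The R check below uses made-up numbers and only demonstrates the identity; it does not call any Oracle function.

```r
# Made-up number pairs for illustration.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Slope and intercept computed from covariance and variance.
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# The same line fitted by ordinary least squares.
fit <- lm(y ~ x)

print(c(intercept = intercept, slope = slope))
print(coef(fit))
```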

Analytical Functions with TX format
Correlating cases with cases
Correlating genes with genes
Correlating genes with TARGET (ALL/AML)
T-statistics
ANOVA

Correlating Cases
-- put the caseids to correlate in view aftran2
create view aftran2 as select distinct(caseid) from affy_trans;

-- a.caseid < b.caseid keeps each pair of cases once
create table pcor1 as
select a.caseid p1, b.caseid p2,
       corr(a.attr_value, b.attr_value) corr
from affy_trans a, affy_trans b
where a.caseid < b.caseid and
      a.attrib = b.attrib and
      a.caseid in (select * from aftran2)
group by a.caseid, b.caseid
having corr(a.attr_value, b.attr_value) > .93 or
       corr(a.attr_value, b.attr_value) < -.93;

199 case pairs correlated > .93
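The same pairwise idea can be sketched in R on a toy matrix. The values, the case names, and the genes-as-rows layout are assumptions for illustration; only the .93 threshold and the p1/p2/corr column names come from the query in this deck.

```r
# Toy matrix: rows are hypothetical genes, columns are cases (values are made up).
expr <- cbind(
  case1 = c(1, 2, 3, 4, 5),
  case2 = c(2, 4, 6, 8, 11),
  case3 = c(5, 3, 4, 1, 2)
)

# Pearson correlation between every pair of columns.
cm <- cor(expr)

# Keep each unordered pair once, like a.caseid < b.caseid in the SQL.
idx <- which(upper.tri(cm), arr.ind = TRUE)
pairs <- data.frame(
  p1   = colnames(cm)[idx[, "row"]],
  p2   = colnames(cm)[idx[, "col"]],
  corr = cm[idx]
)

# Keep strongly correlated or anti-correlated pairs.
threshold <- 0.93
strong <- pairs[abs(pairs$corr) > threshold, ]
print(strong)
```

With 72 cases there are 72 * 71 / 2 = 2556 distinct pairs to screen, so the threshold filter is what keeps the result table small.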

Correlating Genes
-- note: with a.attrib fixed to one gene, a.attrib < b.attrib keeps only genes
-- whose names sort after it; use <> to compare against every other gene
select a.attrib p1, b.attrib g2,
       corr(a.attr_value, b.attr_value) corr
from affy_trans a, affy_trans b
where a.attrib < b.attrib and
      a.seqnum = b.seqnum and
      a.attrib = 'X95735_at'  -- zyxin
group by a.attrib, b.attrib
having corr(a.attr_value, b.attr_value) > .5 or
       corr(a.attr_value, b.attr_value) < -.5;

208 genes correlate with zyxin across 72 patients

Correlating Genes with TARGET
-- corr needs numeric input, so TARGET is assumed to be stored as a number
-- note: a.attrib < b.attrib keeps only genes whose names sort after 'TARGET'
select a.attrib p1, b.attrib g2,
       corr(a.attr_value, b.attr_value) corr
from affy_trans a, affy_trans b
where a.attrib < b.attrib and
      a.seqnum = b.seqnum and
      a.attrib = 'TARGET'  -- AML/ALL
group by a.attrib, b.attrib
having corr(a.attr_value, b.attr_value) > .5 or
       corr(a.attr_value, b.attr_value) < -.5;

57 genes correlate with TARGET

R Correlations
library(RODBC)
chan1 <- odbcConnect(dsn="")
odbcGetInfo(chan1)  # make sure channel is ok

# get gene expression data from db, but drop gene name
gs1 <- sqlQuery(chan1, query = "create table gs1 as SELECT * FROM GS_EXPRESSION")
gs1 <- sqlQuery(chan1, query = "alter table gs1 drop column gene")
gs <- sqlQuery(chan1, query = "SELECT * FROM GS1")

# get correlation matrix (one row and column per remaining table column)
cm <- cor(gs, use = "pairwise.complete.obs")

# Write the new table to a file.
write.table(cm, file = "D:\\clients\\affy\\gs_cm.csv", append = FALSE,
            quote = FALSE, sep = ",", eol = "\n", na = "", dec = ".",
            row.names = TRUE, col.names = TRUE)

# draw the matrix as a heatmap without reordering rows or columns
heatmap(cm, Rowv = NA, Colv = NA, symm = TRUE)

Case Correlation Matrix

Heatmap from R Correlation Matrix

Other Statistics
CORR_S: Spearman’s rho correlation coefficient
CORR_K: Kendall's tau-b correlation coefficient
STATS_T_TEST_INDEP: equal variances
STATS_T_TEST_INDEPU: unequal variances
STATS_ONE_WAY_ANOVA
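For comparison, base R offers the same rank-based coefficients through the method argument of cor(). The sketch below uses made-up data with a monotonic but nonlinear relationship to show how the rank correlations differ from Pearson's; it is an R analogue, not the Oracle functions listed above.

```r
# Made-up data: y increases with x, but not linearly.
x <- 1:10
y <- x^3

pearson  <- cor(x, y)                       # linear association
spearman <- cor(x, y, method = "spearman")  # rank-based (rho)
kendall  <- cor(x, y, method = "kendall")   # rank-based (tau)

print(c(pearson = pearson, spearman = spearman, kendall = kendall))
```

Because the ordering of y matches the ordering of x exactly, both rank coefficients reach 1 while Pearson's stays below 1.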

T statistics for each gene (tx format)
-- TARGET (b) is the grouping value, the gene expression (a) is the sample
-- the p-value is multiplied by 7130, the number of tests (Bonferroni correction)
create table t_stats as
select a.attrib, count(*) cnt,
       avg(a.attr_value) avg_atr,
       avg(b.attr_value) avg_trg,
       STATS_T_TEST_INDEP(b.attr_value, a.attr_value, 'STATISTIC') t_observed,
       STATS_T_TEST_INDEP(b.attr_value, a.attr_value)*7130 two_sided_p_value
from affy_trans a, affy_trans b
where a.attrib in (select * from aftran1) and
      b.attrib = 'TARGET' and
      a.caseid = b.caseid
group by a.attrib;

T-statistic output table (Bonferroni correction)
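A rough R analogue of the per-gene t-test with a Bonferroni correction is sketched below. The data are simulated (6 hypothetical genes, 20 cases), not the AML/ALL set, and gene1 is shifted on purpose so that one gene stands out. p.adjust() multiplies each p-value by the number of tests and caps the result at 1.

```r
set.seed(1)

# Simulated data: 6 hypothetical genes x 20 cases, two classes of 10 cases each.
n_genes <- 6
target  <- rep(c("ALL", "AML"), each = 10)
expr <- matrix(rnorm(n_genes * 20), nrow = n_genes,
               dimnames = list(paste0("gene", 1:n_genes), NULL))

# Shift gene1 in the AML cases so the classes clearly differ for that gene.
expr[1, target == "AML"] <- expr[1, target == "AML"] + 3

# Equal-variance two-sample t-test for each gene (each row of expr).
res <- t(apply(expr, 1, function(v) {
  tt <- t.test(v ~ target, var.equal = TRUE)
  c(t_observed = unname(tt$statistic), p_value = tt$p.value)
}))

# Bonferroni correction over all genes tested.
res <- cbind(res, p_bonferroni = p.adjust(res[, "p_value"], method = "bonferroni"))

print(res[order(res[, "p_bonferroni"]), ])
```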

F-distribution one way ANOVA
drop table anova;
-- TARGET (b) is the grouping value, the gene expression (a) is the sample
-- the p-value is multiplied by 7130, the number of tests (Bonferroni correction)
create table anova as
select a.attrib, count(*) cnt,
       avg(a.attr_value) avg_atr,
       avg(b.attr_value) avg_trg,
       STATS_ONE_WAY_ANOVA(b.attr_value, a.attr_value, 'F_RATIO') f_ratio,
       STATS_ONE_WAY_ANOVA(b.attr_value, a.attr_value, 'SIG')*7130 p_value
from affy_trans a, affy_trans b
where a.attrib in (select * from aftran1) and
      b.attrib = 'TARGET' and
      a.caseid = b.caseid
group by a.attrib;

Top genes by ANOVA
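With only two classes (ALL and AML), a one-way ANOVA and an equal-variance t-test carry the same information: the F ratio is the square of the t statistic and the p-values agree. The R sketch below checks this on made-up values for a single hypothetical gene; ANOVA becomes the more general tool once there are more than two classes.

```r
# Made-up expression values for one hypothetical gene in two classes.
values  <- c(5.1, 4.9, 5.3, 5.0, 6.8, 7.1, 7.0, 6.5)
classes <- factor(rep(c("ALL", "AML"), each = 4))

# One-way ANOVA and equal-variance t-test on the same data.
anova_fit <- oneway.test(values ~ classes, var.equal = TRUE)
t_fit     <- t.test(values ~ classes, var.equal = TRUE)

f_ratio <- unname(anova_fit$statistic)
t_obs   <- unname(t_fit$statistic)

cat("F ratio:", f_ratio, " t squared:", t_obs^2, "\n")
cat("ANOVA p:", anova_fit$p.value, " t-test p:", t_fit$p.value, "\n")
```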

R for Plotting and Visualization
# get data and plot all variables (a data frame plots as a scatterplot matrix)
a1 <- sqlQuery(chan1, query = "select * from anova")
plot(a1)

Scatterplot Matrix in R

Histograms of Expression Distribution
# Generate histograms of gene expression cases, with labels and cut-offs
pam1 <- 40      # bars
pam2 <- 5000    # ceiling
pam3 <- -1000   # floor
wx <- 1000      # width
hx <- 714       # height
nom <- names(csv)   # gene expression data is in csv
N <- 4              # plot only columns 2..N, i.e. 3 histograms
par(mfrow = c(N-1, 1))
# pam1 equal-width bins between floor and ceiling
b <- seq(pam3, pam2, length.out = pam1 + 1)
for (num in 2:N) {
  h1 <- csv[, num]
  h1[h1 > pam2] <- pam2   # clamp to ceiling
  h1[h1 < pam3] <- pam3   # clamp to floor
  h <- hist(h1, breaks = b,
            main = paste("Histogram of", nom[num], "clamp at", pam2, pam3),
            xlab = nom[num], col = 5)
}

3 columns of expression dist.

Histogram – no labels
N <- ncol(csv)  # number of columns
pam1 <- 40      # bars
pam2 <- 5000    # ceiling
pam3 <- -1000   # floor
# make the break points: pam1 equal-width bins between floor and ceiling
b <- seq(pam3, pam2, length.out = pam1 + 1)
# one panel per plotted column (columns 2..N), no margins
par(mfrow = c(N-1, 1), mar = c(0, 0, 0, 0))
for (num in 2:N) {
  h1 <- csv[, num]
  h1[h1 > pam2] <- pam2   # clamp to ceiling
  h1[h1 < pam3] <- pam3   # clamp to floor
  h <- hist(h1, breaks = b, main = "", xlab = "", axes = FALSE, col = 5)
}

All 72 cases of expression dist.

More?
Many other applications of R/Bioconductor
Many other applications of Oracle
Other code is available
Oracle Informatics Consulting

Life Sciences DM Workshop
A one-day onsite technical session educating organizations on how to leverage one of their most valuable assets to provide insight into the operations of their business, the behavioral patterns of their customers, and hidden relationships found deep within corporate data that can have a direct impact on the bottom line.

Life Sciences DM Blueprint
A documented technical roadmap providing the organization with the strategy to integrate and deploy Life Sciences technology. This includes recommendations based on feedback from the Life Sciences workshop, focusing on source data preparation, mining methodologies, and supporting architecture.

Life Sciences DM Insight
A five-day onsite engagement focused on providing a detailed analysis of the business problem, data preparation, model build and analysis, and knowledge deployment. It extends the analysis of the Life Sciences workshop and culminates in a technical roadmap with a strategy to integrate and deploy Life Sciences technology.

Life Sciences DM Quickstart
A thirty-day engagement focused on taking a business problem and transforming it into a Life Sciences solution. This includes transforming the business problem, preparing the data, creating the mining model, and knowledge deployment. Upon completion, results will be delivered mapped to the initial business problem.

Life Sciences DM Services
A series of custom services focused on delivering Life Sciences methodologies and solutions to provide insight into the operations of the business, the behavioral patterns of its customers, and hidden relationships found deep within corporate data that can have a direct impact on the bottom line.

Life Science Informatics experience
Gene expression analysis
Sequence analysis (BLAST, exon/intron prediction)
Clinical/medical data analysis
QSAR/Cheminformatics – Isis, Molconz, Predictive Tox
Animal studies
Protein analysis (arrays, mass spec)
Ontologies and text mining

Data Mining & Informatics Services
Contact Richard Solari
508-477-5765
630-561-9950
Contact Patrick Hoffman
781-744-0783
617-755-6740