introduction to igraph and shiny
An Introduction to Graphs
Chris Hammill
2015-04-01
Chris Hammill An Introduction to Graphs 2015-04-01 1 / 47
About Me
Graduate Student in Biology
Bioinformatics Research Assistant
R Aficionado
Data Analysis/Visualization Contractor
Alumnus of this course
Why I’m Here
Talk about my research
Teach you a bit about graphs
Introduce you to some useful packages
Get you excited about interactive analysis
Outline
Introduce graphs
Introduce igraph
Introduce Interactivity with Shiny
Introduce the diabetes project
Demo the diabetes project app
Offer resources
This presentation was written in R Markdown!
The slides and code will be made available via D2L
So What Are Graphs?
[Figure: a plot of y (0 to 100) against x (0 to 50)]
This?
Nope!
So What Are Graphs?
Graphs are a formal system for representing connections between things
Graphs are composed of nodes (or vertices) and edges (connections)
Edges can be weighted or unweighted, directed or not
Graphs have recently been rebranded as networks
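To make the definition concrete, here is a minimal base R sketch of one small undirected, weighted graph written two ways: as an edge list and as an adjacency matrix. The node names and weights are invented purely for illustration.

```r
# A toy undirected, weighted graph; node names and weights are made up.
nodes <- c("A", "B", "C", "D")
edges <- data.frame(
  from   = c("A", "A", "B"),
  to     = c("B", "C", "C"),
  weight = c(1.0, 2.5, 0.5),
  stringsAsFactors = FALSE
)

# Adjacency matrix: entry [i, j] holds the weight of edge i-j (0 means no edge).
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))
for (k in seq_len(nrow(edges))) {
  adj[edges$from[k], edges$to[k]] <- edges$weight[k]
  adj[edges$to[k], edges$from[k]] <- edges$weight[k]  # undirected: mirror it
}

print(adj)
```

A directed graph would skip the mirrored assignment, and an unweighted one would store 1 instead of a weight. Node D has no edges, which shows up as a row of zeros.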
So What Are Graphs?
[Figure: a network diagram with nodes numbered 1 to 10]
So This?
Yup!
Graphs in Math
Graphs were first described by Euler (of e fame)
- The bridges of Königsberg
The name "graph" is due to Sylvester (1878), which is widely considered frustrating
Graphs For the Rest of Us
Graphs were brought out of the math domain primarily by social scientists
For example, Sampson (1968) did a social network analysis of monks in a monastery, identifying social dynamics
But More Importantly
And
And
So
Graphs are everywhere
Social Networks? Graphs
Internet? Graph
Metabolic pathways? Graphs
Due to this amazing generality, graph-based representations and algorithms can be incredibly useful for both exploration and inference
What Can We Learn From Graphs?
Disclaimer: I’m still learning plenty about what can be done using graphs, so this section will necessarily be oversimplified.
Typically graphs are used to answer questions about the nature of their connections (although graph representations can be used to carry out immensely complex calculations as well, as you might have noticed when you learned about artificial neural networks)
Typical questions include:
1 Where are the hubs (highly connected nodes)?
2 Can the graph be subdivided into clusters or communities?
3 Are there unexpected connections?
But as with any data representation, you’re usually limited by your ability to ask interesting questions, not the representation’s ability to answer them
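As a rough sketch of how the first two questions might be asked in practice, the snippet below uses the igraph package (introduced later in the talk) on Zachary's karate club network, a small social network that ships with igraph. The choice of data set and of the fast-greedy community algorithm are assumptions made for illustration, not something taken from the talk.

```r
library(igraph)

# Zachary's karate club: a classic 34-node social network bundled with igraph.
karate <- make_graph("Zachary")

# 1. Hubs: the three vertices with the most connections.
deg  <- degree(karate)
hubs <- order(deg, decreasing = TRUE)[1:3]
print(hubs)

# 2. Communities: fast-greedy modularity optimisation, one of several
#    community-detection algorithms igraph offers.
communities <- cluster_fast_greedy(karate)
print(table(membership(communities)))
```

The third question, spotting unexpected connections, usually depends on domain knowledge about which links should or should not exist, so it is left out of this sketch.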
Graph Properties
Degree Distribution
Degree is the number of edges a node has
The distribution of degrees in a graph is interesting and can hint at the process generating the graph
Diameter
The length of the longest shortest path between any two nodes
Average Path
The average shortest-path length between pairs of nodes
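These three properties can be worked out by hand on a tiny example. The base R sketch below uses a four-node path graph (1 - 2 - 3 - 4, chosen only for illustration) and the Floyd-Warshall algorithm for shortest paths. It assumes a connected, unweighted, undirected graph.

```r
# A path graph on four nodes: 1 - 2 - 3 - 4 (chosen only for illustration).
adj <- matrix(0, 4, 4)
adj[cbind(1:3, 2:4)] <- 1
adj <- adj + t(adj)

# Degree: the number of edges touching each node.
deg <- rowSums(adj)

# All-pairs shortest path lengths via Floyd-Warshall.
n  <- nrow(adj)
sp <- ifelse(adj > 0, 1, Inf)
diag(sp) <- 0
for (k in seq_len(n)) {
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      sp[i, j] <- min(sp[i, j], sp[i, k] + sp[k, j])
    }
  }
}

pairLengths   <- sp[upper.tri(sp)]   # one entry per unordered pair of nodes
graphDiameter <- max(pairLengths)    # the longest shortest path
averagePath   <- mean(pairLengths)   # the mean shortest-path length

print(table(deg))
print(graphDiameter)
print(averagePath)
```

Here the two end nodes have degree 1 and the two middle nodes degree 2; the diameter is 3 (from node 1 to node 4) and the average path length is 10/6.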
Creating and Using Graphs
Manipulating graphs with R is typically done with the igraph package, so let’s try it out:
First off, install igraph and attach it with the usual code
install.packages("igraph")
library(igraph)
Create a Random Graph
For exploration’s sake, let’s generate a random graph (an Erdős-Rényi random graph)
randomGraph <- erdos.renyi.game(20, 0.2)
plot(randomGraph)
[Figure: plot of randomGraph, with vertices numbered 1 to 20]
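Because the graph is random, every run gives a different picture and different statistics. The sketch below fixes the random seed so the graph can be regenerated. The seed value is arbitrary, and sample_gnp() is the name recent igraph releases use for the same generator, so treat that call as an assumption about your installed version.

```r
library(igraph)

set.seed(42)                       # arbitrary seed, so the graph can be regenerated
g <- sample_gnp(n = 20, p = 0.2)   # newer name for erdos.renyi.game(20, 0.2)

vcount(g)                          # number of vertices
ecount(g)                          # number of edges; varies with the seed
plot(g, vertex.size = 12, vertex.label.cex = 0.8)
```

With 20 vertices there are 190 possible edges, and each is included with probability 0.2, so about 38 edges are expected on average.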
Summary Statistics
Degree
hist(degree(randomGraph))
[Figure: histogram of degree(randomGraph); x-axis degree(randomGraph), y-axis Frequency]
Summary Statistics
Diameter
diameter(randomGraph)
## [1] 4
Path Length
average.path.length(randomGraph)
## [1] 2.052632
Other Useful Commands
# Pull out all the vertices
V(graph)
# Pull out all the edges
E(graph)
# Change a component of the edges (or vertices)
E(graph)$weight <- newWeights
# Get all node pairs
get.edgelist(graph)
# Compute the adjacency matrix
get.adjacency(graph)
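The commands above assume that `graph` and `newWeights` already exist. A small self-contained sketch follows, using a five-vertex ring and made-up weights as stand-ins. It uses as_edgelist() and as_adjacency_matrix(), the names recent igraph releases give to get.edgelist() and get.adjacency().

```r
library(igraph)

# A ring of five vertices, used here only as a stand-in for `graph`.
graph <- make_ring(5)

V(graph)                         # vertex sequence
E(graph)                         # edge sequence

newWeights <- c(1, 2, 3, 4, 5)   # one made-up weight per edge
E(graph)$weight <- newWeights

edgeList <- as_edgelist(graph)   # two-column matrix, one row per edge
adjMat   <- as_adjacency_matrix(graph, attr = "weight", sparse = FALSE)

print(edgeList)
print(adjMat)
```

Passing attr = "weight" fills the matrix with edge weights rather than 0/1 indicators.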
Switching gears
Let’s talk about exploratory analysis
Interactivity
A typical first pass of data analysis involves:
1 Visualizing your data
2 Searching for hypotheses to test
3 Tuning parameters and repeating steps 1 and 2
You will waste untold hours (if you pursue science) doing guess-and-check plot parameter tuning
You will grow weary in your search and likely settle for less than optimal choices
Why not take the guesswork out and make it faster to explore parameter space?
Enter Shiny
Shiny is a framework developed by the people at RStudio to bring interactivity to R
Provides a tool to bring your analyses into the modern age
Not to mention the benefit in presenting your analyses to non-experts when they can see for themselves how parameters affect the results.
Slightly frustrating interface, but very little new needs to be learned
So How Does Shiny Work
A shiny app is composed of (at least) two files
1 server.R
2 UI.R
server.R is responsible for performing the calculations in the app
UI.R is responsible for coordinating input from the user and output from the server
Minimal Example
server.R
library(shiny)

shinyServer(function(input, output){
  # Redraw the curve whenever one of the sliders a, b or c changes
  output$quadraticPlot <- renderPlot({
    x <- seq(-2, 2, length.out = 500)
    y <- input$a * x^2 + input$b * x + input$c
    plot(y ~ x,
         xlim = c(-2, 2),
         ylim = c(-2, 4),
         type = "l")
  })
})
Minimal Example
UI.R
library(shiny)

shinyUI(fluidPage(
  # One slider per coefficient of y = a*x^2 + b*x + c
  sliderInput("a", "a", min = -2L, max = 2L, value = 1),
  sliderInput("b", "b", min = -1L, max = 1L, value = 0),
  sliderInput("c", "c", min = -2L, max = 2L, value = 0),
  plotOutput("quadraticPlot")
))
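The same quadratic explorer can also be written as a single file, which is sometimes easier to share. The sketch below is one way to do that with shinyApp(). The file layout and the step = 0.1 slider argument are additions made for illustration and are not part of the original two-file example.

```r
library(shiny)

# The quadratic explorer from the minimal example, in one file (e.g. app.R).
ui <- fluidPage(
  # step = 0.1 is an assumption added here so the sliders move smoothly.
  sliderInput("a", "a", min = -2, max = 2, value = 1, step = 0.1),
  sliderInput("b", "b", min = -1, max = 1, value = 0, step = 0.1),
  sliderInput("c", "c", min = -2, max = 2, value = 0, step = 0.1),
  plotOutput("quadraticPlot")
)

server <- function(input, output) {
  output$quadraticPlot <- renderPlot({
    x <- seq(-2, 2, length.out = 500)
    y <- input$a * x^2 + input$b * x + input$c
    plot(y ~ x, xlim = c(-2, 2), ylim = c(-2, 4), type = "l")
  })
}

app <- shinyApp(ui = ui, server = server)

# Launch only in an interactive session, so sourcing the file never blocks.
if (interactive()) runApp(app)
```

The input IDs ("a", "b", "c") in the UI must match the names read from `input` in the server function, and the output ID "quadraticPlot" must match on both sides.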
A Not So Minimal Example
[Figure: network plot of the diabetes project features, with nodes labelled by variable name (e.g. Pedigree, Hgb_A1C, Glucose_Fasting, ctla4_ct60g_ga); legends: −log(p), dataSet (gen, new, old), Number of Observations]
Diabetes Project
Attempting to predict health outcomes for Newfoundlanders suffering from type 1 diabetes mellitus
Data from a large cohort of diabetes patients gathered ~10 years ago
Heterogeneous mix of data sources, types, and completeness
Lots of data cleaning
The Data
Three major data sources
1 Diabetes database
contains information about 631 study participants at the time of study start
2 Genetics Data
contains genotype markers for 591 study participants (and family members)
3 2014 Checkup Database
contains survey data and chart review for ~100 study participants
This analysis is only concerned with the individuals for whom we have updated information
After cleaning, 300 features exist for the participants
Analysis Approach
Considering each feature, how well does it correlate with the rest of the features?
Pairwise correlation measures can be treated as a distance measure between features
Correlations can be filtered by significance level
Each significant correlation can be viewed as an edge connecting the two features
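The approach can be sketched in base R on made-up data. The snippet below is an illustration under stated assumptions, not the project's actual code: the four features are simulated, the Pearson test from cor.test() stands in for whatever correlation measure was used, and the 0.01 threshold is arbitrary.

```r
set.seed(1)

# Made-up data: 50 observations of 4 features; f2 is built to track f1.
f1 <- rnorm(50)
features <- data.frame(
  f1 = f1,
  f2 = f1 + rnorm(50, sd = 0.3),
  f3 = rnorm(50),
  f4 = rnorm(50)
)

# p-value of the Pearson correlation test for every pair of features.
n <- ncol(features)
pValues <- matrix(1, n, n, dimnames = list(names(features), names(features)))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    p <- cor.test(features[[i]], features[[j]])$p.value
    pValues[i, j] <- p
    pValues[j, i] <- p
  }
}

# Keep only the significant pairs; each TRUE is an edge between two features.
threshold <- 0.01                 # arbitrary cut-off for this sketch
adjacencyMat <- pValues < threshold
diag(adjacencyMat) <- FALSE       # a feature is not its own neighbour

# List each edge once as a (row, col) pair of feature indices.
print(which(adjacencyMat & upper.tri(adjacencyMat), arr.ind = TRUE))
```

The resulting logical matrix is symmetric, so it describes an undirected graph whose nodes are features.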
Creating the Graph
Challenge in going from

Spreadsheet Representation
head(bigtable[25:28, c(1, 21, 23, 41)])

##    Pedigree dietician_new nephrologist_new Hgb_A1C_new
## 25    93001             0                0         8.7
## 26    94001             3                0        10.2
## 27   101001             0                0         9.2
## 28   105001             0                0        13.7
[Figure: the features drawn as a network, with nodes labelled by variable name (e.g. Pedigree, Hgb_A1C, Glucose_Fasting, ctla4_ct60g_ga); legends: −log(p), dataSet (gen, new, old), Number of Observations]
Producing the Base Graph
Convert to a distance matrix
bt <- pCorrelationMatrix(bigtable)
Convert to Adjacency Matrix
adjacencyMat <- bt < threshold
Create an igraph Object
network <- graph.adjacency(adjacencyMat)
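The last step can be tried without the project data. The sketch below builds a small logical adjacency matrix by hand (the feature names are invented) and converts it with graph_from_adjacency_matrix(), the name recent igraph releases use for graph.adjacency(). Building an undirected graph and dropping the diagonal are assumptions that seem natural for a symmetric correlation matrix.

```r
library(igraph)

# A hand-written logical adjacency matrix; the feature names are invented.
featureNames <- c("f1", "f2", "f3", "f4")
adjacencyMat <- matrix(FALSE, 4, 4,
                       dimnames = list(featureNames, featureNames))
adjacencyMat["f1", "f2"] <- adjacencyMat["f2", "f1"] <- TRUE
adjacencyMat["f2", "f3"] <- adjacencyMat["f3", "f2"] <- TRUE

# Multiply by 1 to get a numeric 0/1 matrix, then build an undirected graph.
network <- graph_from_adjacency_matrix(adjacencyMat * 1,
                                       mode = "undirected",
                                       diag = FALSE)

V(network)$name     # vertex names are taken from the matrix dimnames
degree(network)     # f4 has no significant partner, so its degree is 0
```

Features with no significant correlations stay in the graph as isolated vertices, which may or may not be wanted in a plot.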
Converting the Igraph to a data.frame
Create a data.frame of vertices

getVertices <- function(graph, vertexNames = NULL){
  # Lay the graph out in 2D; one row of (x, y) coordinates per vertex
  vertices <- as.data.frame(layout.fruchterman.reingold(graph))
  names(vertices) <- c("x","y")
  # Label vertices by index unless names are supplied
  vertices$vertexName <- 1:nrow(vertices)
  if(!is.null(vertexNames)) vertices$vertexName <- vertexNames
  vertices$size <- get.vertex.attribute(graph, "weight")

  vertices
}
Converting the Igraph to a data.frame
Create a data.frame of edges
getEdges <- function(graph, vertices){
  edgeLocations <- get.edgelist(graph)
  # For each edge, look up the rows of its two end vertices
  edgeCoords <- mapply(function(v1,v2){
    c(vertices[v1,], vertices[v2,])
  }, edgeLocations[,1], edgeLocations[,2])
  # Keep only the x and y coordinates of both ends
  edgeFrame <- as.data.frame(t(edgeCoords))[,c(1,2,5,6)]
  edgeFrame[,1:4] <- lapply(edgeFrame[,1:4], as.numeric)
  edgeFrame$weight <- get.edge.attribute(graph, "weight")
  edgeFrame$npo <- get.edge.attribute(graph, "npo")

  names(edgeFrame) <- c("x0", "y0", "x1", "y1", "weight", "npo")

  return(edgeFrame)
}
Do Both and Smoosh ’em Together
graph2frame <- function(graph, vertexNames = NULL){
  vertices <- getVertices(graph, vertexNames)
  edges <- getEdges(graph, vertices)

  # Give vertices the same columns as edges, filled with NA
  names(vertices) <- c("x0","y0", "vertexName", "size")
  vertices$x1 <- NA
  vertices$y1 <- NA
  vertices$weight <- NA
  vertices$npo <- NA
  vertices$use <- "vertex"

  # And give edges the vertex-only columns
  edges$vertexName <- NA
  edges$use <- "edge"
  edges$size <- NA

  rbind(vertices, edges)
}
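One reason to put vertices and edges in a single data.frame is that a plotting library can then draw both from one object. The sketch below assumes ggplot2 (listed under Resources) is the plotting layer, which the slides do not state outright. It uses a small hand-made frame with the columns graph2frame() returns, minus weight and npo, so it runs on its own.

```r
library(ggplot2)

# A hand-made frame in the shape graph2frame() returns: 3 vertices, 2 edges.
graphFrame <- data.frame(
  x0         = c(0, 1, 2, 0, 1),
  y0         = c(0, 1, 0, 0, 1),
  x1         = c(NA, NA, NA, 1, 2),
  y1         = c(NA, NA, NA, 1, 0),
  vertexName = c("a", "b", "c", NA, NA),
  size       = c(3, 5, 4, NA, NA),
  use        = c("vertex", "vertex", "vertex", "edge", "edge"),
  stringsAsFactors = FALSE
)

# Split on the `use` column so each layer gets only its own rows.
edgeRows   <- subset(graphFrame, use == "edge")
vertexRows <- subset(graphFrame, use == "vertex")

p <- ggplot() +
  geom_segment(data = edgeRows,
               aes(x = x0, y = y0, xend = x1, yend = y1),
               colour = "grey50") +
  geom_point(data = vertexRows, aes(x = x0, y = y0, size = size)) +
  geom_text(data = vertexRows, aes(x = x0, y = y0, label = vertexName),
            vjust = -1.2)

print(p)
```

Edges are drawn first so that the vertex points sit on top of the line ends.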
The App
Resources
Igraph
Ggplot
Shiny
R Markdown
Knitr
Datatables for R
My Blog!
Thanks For Having Me
Any questions?