introduction to igraph and shiny

Upload: chris-hammill

Post on 17-Jul-2015

Page 1: Introduction To Igraph and Shiny

An Introduction to Graphs

Chris Hammill

2015-04-01

Chris Hammill An Introduction to Graphs 2015-04-01 1 / 47

About Me

Graduate Student in Biology

Bioinformatics Research Assistant

R Aficionado

Data Analysis/Visualization Contractor

Alumnus of this course

Why I’m Here

Talk about my research

Teach you a bit about graphs

Introduce you to some useful packages

Get you excited about interactive analysis

Outline

Introduce graphs

Introduce igraph

Introduce Interactivity with Shiny

Introduce the diabetes project

Demo the diabetes project app

Offer resources

This presentation was written in R Markdown!

The slides and code will be made available via D2L

So What Are Graphs?

[Figure: a scatter plot of y against x, with x running from 0 to 50 and y from 0 to 100]

This?

So What Are Graphs?

[Figure: the same scatter plot of y against x (x from 0 to 50, y from 0 to 100)]

Nope!

So What Are Graphs

Graphs are a formal system for representing connections between things

Graphs are composed of nodes (or vertices) and edges (connections)

Edges can be weighted or unweighted, directed or not

Graphs have recently been rebranded as networks
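To make these terms concrete, here is a minimal base-R sketch (no packages needed) that stores one small, made-up undirected graph as an edge list, converts it to an adjacency matrix, and then builds a weighted, directed variant of the same edges. The node numbers and weights are illustrative assumptions, not data from the talk.

```r
# A made-up undirected graph on 4 nodes, stored as an edge list:
# one row per edge, naming the two nodes it connects.
edges <- matrix(c(1, 2,
                  2, 3,
                  3, 1,
                  3, 4),
                ncol = 2, byrow = TRUE)
n <- 4

# Adjacency matrix: adj[i, j] is 1 when nodes i and j are connected.
adj <- matrix(0, nrow = n, ncol = n)
adj[edges] <- 1            # mark i -> j for every edge
adj[edges[, 2:1]] <- 1     # undirected, so also mark j -> i
print(adj)

# Weighted and directed: store a weight instead of 1 and skip the
# mirroring step, so wadj[i, j] and wadj[j, i] can differ.
weights <- c(0.5, 2, 1, 3)
wadj <- matrix(0, nrow = n, ncol = n)
wadj[edges] <- weights
print(wadj)
```

An undirected graph always gives a symmetric adjacency matrix; a directed one generally does not.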

So What Are Graphs?

[Figure: a network diagram of ten nodes labelled 1 to 10]

So This?

So What Are Graphs

[Figure: the same network diagram of ten nodes labelled 1 to 10]

Yup!

Graphs in Math

Graphs were first described by Euler (of e fame)

The bridges of Königsberg

The name "graph" is due to Sylvester (1878), which is widely considered frustrating

Graphs For the Rest of Us

Graphs were brought out of the math domain primarily by social scientists

For example, Sampson (1968) did a social network analysis of monks in a monastery, identifying social dynamics

But More Importantly

And

And

So

Graphs are everywhere

Social Networks? Graphs

Internet? Graph

Metabolic pathways? Graphs

Due to this amazing generality, graph-based representations and algorithms can be incredibly useful for both exploration and inference

What Can We Learn From Graphs?

Disclaimer: I'm still learning plenty about what can be done using graphs, so this section will necessarily be oversimplified.

Typically, graphs are used to answer questions about the nature of their connections (although graph representations can also be used to carry out immensely complex calculations, as you might have noticed when you learned about artificial neural networks). Typical questions include:

1. Where are the hubs (highly connected nodes)?
2. Can the graph be subdivided into clusters or communities?
3. Are there unexpected connections?

But as with any data representation, you're usually limited by your ability to ask interesting questions, not the representation's ability to answer them.
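Questions 1 and 2 can be illustrated without any packages. The sketch below, on a made-up 7-node graph, finds hubs as the highest-degree nodes and splits the graph into connected components with a breadth-first search. Connected components are only the crudest answer to question 2: community-detection algorithms (igraph ships several, for example `cluster_fast_greedy()`) look for densely linked groups inside a single connected piece. The helper name `componentLabels()` is hypothetical.

```r
# Made-up undirected graph: node 1 is linked to nodes 2-5, nodes 2 and 3
# are also linked, and nodes 6 and 7 form a separate pair.
n <- 7
edges <- matrix(c(1, 2,  1, 3,  1, 4,  1, 5,  2, 3,  6, 7),
                ncol = 2, byrow = TRUE)
adj <- matrix(0, n, n)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Hubs: the node(s) with the largest degree (row sums of the adjacency matrix).
deg <- rowSums(adj)
hubs <- which(deg == max(deg))
print(hubs)

# Connected components by breadth-first search: every node reachable
# from an unlabelled start node receives that start's component number.
componentLabels <- function(adj) {
  n <- nrow(adj)
  label <- rep(NA_integer_, n)
  current <- 0L
  for (start in seq_len(n)) {
    if (!is.na(label[start])) next
    current <- current + 1L
    label[start] <- current
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nbrs <- which(adj[v, ] != 0 & is.na(label))
      label[nbrs] <- current
      queue <- c(queue, nbrs)
    }
  }
  label
}

print(componentLabels(adj))
```

Here node 1 is the single hub, and nodes 6 and 7 end up in a component of their own.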

Graph Properties

Degree Distribution

Degree is the number of edges a node has

The distribution of degrees in a graph is interesting and can hint at the process that generated the graph

Diameter

What is the longest shortest path between any two nodes?

Average Path Length

What is the average shortest-path length between two nodes?
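These three properties are easy to compute by hand for a small graph. The base-R sketch below uses a made-up five-node path graph (1-2-3-4-5), tabulates its degree distribution, and gets all shortest-path lengths by breadth-first search from every node, which is valid for unweighted graphs. Pairs with no connecting path are skipped before taking the maximum (diameter) and the mean (average path length). The helper `shortestPaths()` is hypothetical, not an igraph function.

```r
# Made-up undirected, unweighted path graph: 1 - 2 - 3 - 4 - 5.
n <- 5
edges <- cbind(1:4, 2:5)
adj <- matrix(0, n, n)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Degree distribution: how many nodes have each degree.
deg <- rowSums(adj)
print(table(deg))

# All-pairs shortest path lengths via breadth-first search from each node.
# Unreachable pairs keep the value Inf.
shortestPaths <- function(adj) {
  n <- nrow(adj)
  dist <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    dist[s, s] <- 0
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      for (w in which(adj[v, ] != 0)) {
        if (is.infinite(dist[s, w])) {
          dist[s, w] <- dist[s, v] + 1
          queue <- c(queue, w)
        }
      }
    }
  }
  dist
}

d <- shortestPaths(adj)
pairDist <- d[upper.tri(d)]                # each unordered pair once
pairDist <- pairDist[is.finite(pairDist)]  # drop unreachable pairs

print(max(pairDist))   # diameter: the longest shortest path
print(mean(pairDist))  # average shortest-path length
```

For this path graph the diameter is 4 (node 1 to node 5) and the average over its 10 node pairs is 2.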

Creating and Using Graphs

Manipulating graphs in R is typically done with the igraph package, so let's try it out.

First off, install igraph and attach it with the usual code:

install.packages("igraph")
library(igraph)

Create a Random Graph

For exploration's sake, let's generate a random graph (an Erdős-Rényi random graph):

randomGraph <- erdos.renyi.game(20, 0.2)   # 20 nodes; each pair linked with probability 0.2
plot(randomGraph)

[Figure: plot(randomGraph) output, a random network whose 20 vertices are labelled 1 to 20]
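`erdos.renyi.game()` does the sampling for you, but the model itself is short enough to write out. In the G(n, p) version of the Erdős-Rényi model (the function's default type), every one of the n(n - 1)/2 possible undirected edges is included independently with probability p, so with n = 20 and p = 0.2 the expected edge count is 0.2 * 190 = 38. A base-R sketch, with a hypothetical helper name:

```r
# Sketch of the Erdos-Renyi G(n, p) model: each unordered pair of
# distinct nodes gets an edge independently with probability p.
sampleGnp <- function(n, p) {
  adj <- matrix(0L, n, n)
  upper <- upper.tri(adj)                  # each unordered pair exactly once
  adj[upper] <- as.integer(runif(sum(upper)) < p)
  adj + t(adj)                             # mirror so the graph is undirected
}

set.seed(1)
adj <- sampleGnp(20, 0.2)
print(sum(adj) / 2)   # number of edges in this draw (expected value 38)
```

Because every pair is decided independently, degrees in such a graph follow a binomial distribution, which is one baseline to compare an observed degree histogram against.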

Summary Statistics

Degree

hist(degree(randomGraph))

[Figure: Histogram of degree(randomGraph); x-axis degree(randomGraph), y-axis Frequency]

Summary Statistics

Diameter

diameter(randomGraph)

## [1] 4

Path Length

average.path.length(randomGraph)

## [1] 2.052632

Other Useful Commands

# Pull out all the vertices
V(graph)

# Pull out all the edges
E(graph)

# Change a component of the edges (or vertices)
E(graph)$weight <- newWeights

# Get all node pairs
get.edgelist(graph)

# Compute the adjacency matrix
get.adjacency(graph)
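To try these commands on something concrete, the sketch below builds a small made-up graph with `make_graph()` and uses the underscore-style names introduced in igraph 1.0 (`as_edgelist()`, `as_adjacency_matrix()`), which newer releases prefer over the dotted names on the slide. `sparse = FALSE` asks for an ordinary R matrix rather than a sparse one, and `attr = "weight"` fills the matrix with the edge weights instead of 1s.

```r
library(igraph)

# Made-up undirected graph: a triangle 1-2-3 plus an edge from 3 to 4.
graph <- make_graph(c(1, 2,  2, 3,  3, 1,  3, 4), directed = FALSE)

print(V(graph))                         # all vertices
print(E(graph))                         # all edges
E(graph)$weight <- c(1, 2, 3, 4)        # one weight per edge, in edge order

print(as_edgelist(graph))               # two-column matrix of node pairs
print(as_adjacency_matrix(graph, sparse = FALSE))                   # 0/1 entries
print(as_adjacency_matrix(graph, attr = "weight", sparse = FALSE))  # weights
```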

Switching gears

Let's talk about exploratory analysis

Interactivity

A typical first pass of data analysis involves:

1. Visualizing your data
2. Searching for hypotheses to test
3. Tuning parameters and repeating steps 1 and 2

You will waste untold hours (if you pursue science) doing guess-and-check plot parameter tuning

You will grow weary in your search and likely settle for less than optimal choices

Why not take the guesswork out and make it faster to explore parameter space?

Enter Shiny

Shiny is a framework developed by the people at RStudio to bring interactivity to R

Provides a tool to bring your analyses into the modern age

Not to mention the benefit when presenting your analyses to non-experts, who can see for themselves how parameters affect the results.

Slightly frustrating interface, but very little new needs to be learned

So How Does Shiny Work

A Shiny app is composed of (at least) two files:

1. server.R
2. UI.R

server.R is responsible for performing the calculations in the app

UI.R is responsible for coordinating input from the user and output from the server

Minimal Example

server.R

library(shiny)

shinyServer(function(input, output) {
  output$quadraticPlot <- renderPlot({
    x <- seq(-2, 2, length.out = 500)
    y <- input$a * x^2 + input$b * x + input$c
    plot(y ~ x,
         xlim = c(-2, 2),
         ylim = c(-2, 4),
         type = "l")
  })
})

Minimal Example

UI.R

library(shiny)

shinyUI(fluidPage(
  sliderInput("a", "a", min = -2L, max = 2L, value = 1),
  sliderInput("b", "b", min = -1L, max = 1L, value = 0),
  sliderInput("c", "c", min = -2L, max = 2L, value = 0),
  plotOutput("quadraticPlot")
))

A Not So Minimal Example

[Figure: screenshot of the diabetes project app, a network of study features (e.g. Pedigree, Hgb_A1C, Smoke, ctla4_rs1863800, ptpns1_rs6075340) with legends for -log(p), dataSet (gen, new, old) and Number of Observations]

Diabetes Project

Attempting to predict health outcomes for Newfoundlanders suffering from type one diabetes mellitus

Data from a large cohort of diabetes patients gathered ~10 years ago

Heterogeneous mix of data sources, types, and completeness

Lots of data cleaning

The Data

Three major data sources:

1. Diabetes database: contains information about 631 study participants at the time of study start
2. Genetics data: contains genotype markers for 591 study participants (and family members)
3. 2014 Checkup Database: contains survey data and chart review for ~100 study participants

This analysis is only concerned with the individuals for whom we have updated information

After cleaning, 300 features exist for the participants

Analysis Approach

Consider each feature: how well does it correlate with the rest of the features?

Pairwise correlation measures can be treated as a distance measure between features

Correlations can be filtered by significance level

Each significant correlation can be viewed as an edge connecting the two features
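One way to make "filtered by significance level" concrete, assuming Pearson correlation (the slides do not say which measure is used): for two features with sample correlation r computed over n complete observations, the standard test statistic follows a t distribution with n - 2 degrees of freedom under the null hypothesis of no correlation, and this is what R's `cor.test()` reports for `method = "pearson"`. A pair becomes an edge when its two-sided p-value falls below the chosen threshold.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad
p = 2\,\Pr\left(T_{n-2} \ge |t|\right)
```

Because features can have missing values, n may differ from one pair to the next, so the same r can be significant for one pair and not for another.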

Creating the Graph

Challenge in going from

Spreadsheet Representation

head(bigtable[25:28, c(1, 21, 23, 41)])

##    Pedigree dietician_new nephrologist_new Hgb_A1C_new
## 25    93001             0                0         8.7
## 26    94001             3                0        10.2
## 27   101001             0                0         9.2
## 28   105001             0                0        13.7

[Figure: the feature network built from the same data, with legends for -log(p), dataSet (gen, new, old) and Number of Observations]

Producing the Base Graph

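Convert to a distance matrix

bt <- pCorrelationMatrix(bigtable)   # author's helper; not shown in the slides

Convert to an adjacency matrix

adjacencyMat <- 1 * (bt < threshold)   # threshold: significance cutoff; 1 = edge, 0 = no edge

Create an igraph object

network <- graph_from_adjacency_matrix(adjacencyMat, mode = "undirected", diag = FALSE)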
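The slides do not show `pCorrelationMatrix()`. Below is a hypothetical stand-in, `pValueMatrix()`, sketched under stated assumptions: every column is numeric, the measure is Pearson correlation, and `cor.test()` is left to use only complete pairs of observations. Its diagonal is set to 1 so a feature is never linked to itself, and the toy data are invented.

```r
# Hypothetical stand-in for the author's pCorrelationMatrix(): a symmetric
# matrix of pairwise Pearson correlation p-values between columns of df.
pValueMatrix <- function(df) {
  k <- ncol(df)
  p <- matrix(1, k, k, dimnames = list(names(df), names(df)))  # diagonal stays 1
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      p[i, j] <- p[j, i] <- cor.test(df[[i]], df[[j]])$p.value
    }
  }
  p
}

# Invented toy data: y tracks x closely, z is independent noise.
set.seed(42)
x <- rnorm(50)
toy <- data.frame(x = x,
                  y = x + rnorm(50, sd = 0.3),
                  z = rnorm(50))

pv <- pValueMatrix(toy)
threshold <- 0.05
adjacencyMat <- 1 * (pv < threshold)   # 1 = significant pair, 0 = not
print(adjacencyMat)
```

The resulting 0/1 matrix is the kind of input the slide then turns into an igraph object. With roughly 300 features this runs about 45,000 tests, so a fixed threshold will admit some edges by chance.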

Converting the Igraph to a data.frame

Create a data.frame of vertices

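getVertices <- function(graph, vertexNames = NULL){
  # Force-directed (Fruchterman-Reingold) layout: one x, y row per vertex
  vertices <- as.data.frame(layout.fruchterman.reingold(graph))
  names(vertices) <- c("x", "y")
  # Label vertices by index unless names are supplied
  vertices$vertexName <- 1:nrow(vertices)
  if(!is.null(vertexNames)) vertices$vertexName <- vertexNames
  # Vertex size comes from the graph's "weight" vertex attribute
  vertices$size <- get.vertex.attribute(graph, "weight")

  vertices
}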

Converting the Igraph to a data.frame

Create a data.frame of edges

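getEdges <- function(graph, vertices){
  # Two-column matrix of endpoints, one row per edge
  edgeLocations <- get.edgelist(graph)
  # Look up both endpoints' rows in the vertex table
  edgeCoords <- mapply(function(v1, v2){
    c(vertices[v1, ], vertices[v2, ])
  }, edgeLocations[, 1], edgeLocations[, 2])
  # With the four columns built by getVertices() (x, y, vertexName, size),
  # columns 1-2 and 5-6 hold the x, y of each endpoint
  edgeFrame <- as.data.frame(t(edgeCoords))[, c(1, 2, 5, 6)]
  edgeFrame[, 1:4] <- lapply(edgeFrame[, 1:4], as.numeric)
  edgeFrame$weight <- get.edge.attribute(graph, "weight")
  edgeFrame$npo <- get.edge.attribute(graph, "npo")

  names(edgeFrame) <- c("x0", "y0", "x1", "y1", "weight", "npo")

  return(edgeFrame)
}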

Do Both and Smoosh ’em Together

graph2frame <- function(graph, vertexNames = NULL){
  vertices <- getVertices(graph, vertexNames)
  edges <- getEdges(graph, vertices)

  # Give both tables the same columns so rbind() can stack them;
  # "use" records whether a row is a vertex or an edge
  names(vertices) <- c("x0", "y0", "vertexName", "size")
  vertices$x1 <- NA
  vertices$y1 <- NA
  vertices$weight <- NA
  vertices$npo <- NA
  vertices$use <- "vertex"

  edges$vertexName <- NA
  edges$use <- "edge"
  edges$size <- NA

  rbind(vertices, edges)
}
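The `mapply()` step in `getEdges()` can also be written with plain matrix indexing. This hypothetical `edgeSegments()` sketch takes a two-column coordinate matrix (one row per vertex, as a layout function returns) and a two-column edge list of integer vertex ids, and returns one row per edge. Note that `get.edgelist()` returns vertex names rather than ids when the graph has a `name` attribute, so those would need matching to row numbers first. The column names x0, y0, x1, y1 match the arguments of base R's `segments()`.

```r
# Hypothetical vectorised version of the edge-coordinate lookup:
# index the layout rows directly by each edge's endpoints.
edgeSegments <- function(coords, edgelist) {
  data.frame(x0 = coords[edgelist[, 1], 1],
             y0 = coords[edgelist[, 1], 2],
             x1 = coords[edgelist[, 2], 1],
             y1 = coords[edgelist[, 2], 2])
}

# Made-up layout for three vertices and two edges (1-2 and 2-3).
coords <- matrix(c(0, 0,
                   1, 0,
                   1, 1), ncol = 2, byrow = TRUE)
edgelist <- matrix(c(1, 2,
                     2, 3), ncol = 2, byrow = TRUE)
seg <- edgeSegments(coords, edgelist)
print(seg)
```

Indexing avoids building a list-valued matrix and converting it back with `as.numeric()`, and it does not depend on the vertex table having exactly four columns.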

The App

Resources

Igraph

Ggplot

Shiny

R Markdown

Knitr

Datatables for R

My Blog!

Thanks For Having Me

Any questions?
