responsenet esti yeger-lotem, laura riva, linhui julie su, aaron d gitler, anil g cashikar, oliver d...

36
ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan, David R Karger, Susan Lindquist & Ernest Fraenkel Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity, Nature Genetics 2009 Presented by Shahar Zini June 2013

Upload: amos-blake

Post on 18-Dec-2015

218 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

ResponseNetEsti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King,

Pavan K Auluck, Melissa L Geddie, Julie S Valastyan, David R Karger, Susan Lindquist & Ernest Fraenkel

Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity,

Nature Genetics 2009Presented by Shahar Zini

June 2013

Page 2: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Introduction

• Gene expression profiling alone reveals very little about the inner workings of the cell and the mechanism involved.• In the last semester we’ve seen many efforts to combine data from

gene expression and protein-protein/protein-DNA knowledge in order to reveal more about the processes that occur within the cell.• ResponseNet is a method which adds another high-throughput

experimental data into the mix: genetic screening.

Page 3: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Genetic screening

• A method which involves alteration or deletion of an organism’s gene products and the examination of the phenotype of the resulting organism.• Phenotype: the composite of an organism’s observable characteristics• The process:

• Pick-up an healthy organism• Delete/alter/overexpress a gene or a group of genes• Test the resulting organism’s phenotype against a desirable characteristic (called

“scoring”)

• For example: delete the MEP1 gene in a yeast cell and check for impaired growth.

Page 4: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Genetic screening

• This process can be used as a high-throughput procedure, meaning one could test alterations of 1000x of genes for numerous desired characteristics.• The set of genes found to affect a process is called “genetic hit”.• Can reveal an answer to some questions like: “What genes, if

removed, will cause a yeast cell to die?”• Just like gene expression: lots of data, yet, does not reveal a lot about

the procedures the took place inside the cell to cause the observed phenotype.

Page 5: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,
Page 6: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Motivation

• Cells respond to stimuli by changes in various processes. • Efforts to identify components of these responses using both methods

(gene expression and genetic screening) yields different sets of genes.• Genetic screens tend to identify response regulators.• Gene expression tend to identify metabolic responses.• Combining the data together may reveal something deeper about the

processes of the cell.• It is biologically plausible that some of the reported response regulators

will be connected through regulatory pathways to differently expressed genes.

Page 7: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,
Page 8: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

ResponseNet

• Identify molecular interaction paths connecting genetic hits and differentially expressed genes.• Create a graph containing 2 types of nodes: genes and proteins.• Genetic hits represented as their corresponding protein

• Fill the graph with (mostly bi-directional) edges connecting protein to another protein if they have a known interaction• Fill the graph with directional edges connecting a protein node to a

gene node if the protein is probable transcriptional regulator of the gene.

Page 9: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Genetic hit

Differentially expressed gene

Protein

Page 10: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

The signaling and regulatory sub-network, by which stimulus-related proteins detected by genetic screens may lead to the measured transcriptomic response.

Stimulus-related proteins

stimulus-related genes

intermediary proteins

TFs

Page 11: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Min-cost flow

• Represent the problem as a flow problem.• Each edge’s weight set to the negative log of it’s probability

• Try to pass flow from the genetic hits to the differently expressed genes using regulatory proteins.

• Strong flow => Highly regulated path => Important component in the process• Rank nodes based on amount of flow to distinguish core component of

the cellular response and peripherals.

Page 12: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Genetic hit

Differentially expressed gene

Protein

Source

Sink

Page 13: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,
Page 14: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Weighting scheme – protein-protein

• There are K types of evidence for a protein-protein interaction• For example, “Two-hybrid” connection can be tested using direct experiment

and using high-throughput experiment• We trust the direct experiment more

• We are interested in finding regulatory pathways, and thus we would like to emphasis interactions between proteins that function together in a common response pathway.

Page 15: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Weighting scheme – protein-protein

• For each pair of proteins A and B we construct a vector X = (x1,x2…,xk)• Xi is 1 There is evidence of type i that A and B are connected• wab = P(a and b are connected | X)• By bayes’ rule:

• = P(X | a and b are connected) * P(a and b are connected) / P(X)• P(X) = P(X | a and b are connected) * P(a and b are connected)

+ P(X | a and b are not connected) * P(a and b are not connected)• Assume independence between types of evidence:• P(X | a and b are [not] connected) = P(Xi| a and b are [not] connected)

Page 16: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Weighting scheme – protein-protein

• In order to calculate values using the equations derived we need to calculate the prior distributions P(Xk | a and b are [not] connected) • They used 3 sets of data:• A set of response pathways• A positive set containing all interacting protein pairs in a common response

pathway based on reliable GO annotation (direct assays or expert knowledge)• A negative set of interacting protein pairs known not to be in a common

response pathway

• They capped the weight to 0.7 because some interactions received a score close to 1 (probably well studied proteins)

Page 17: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Weighting scheme – protein-gene

• They used data obtained using “Chip-chip” and “Chip-chip motif” methods.• Those experiments are used to detect transcription factors for genes. • The weighting scheme is irrelevant, but it was set so that the weight

of each edge would reflect the interaction’s reliability.

Page 18: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Flow problem details• For each genetic hit, the weight and capacity of the edge from the source to the

node is set to:

• Strength is determined by results of the experiment• For each edge connecting a differentially expressed gene to the sink, set the weight

and capacity to:

• Strength is determined by fold-change or p-value of the experiment’s results• All other capacities are set to 1 (and weight using the weighting scheme)

Source

Sink

Page 19: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Linear programming

• Min-cost flow optimization can be solved efficiently using linear programming• The objective function used included a nice twist, so we would

investigate it further:

• The first term is what you would expect from min-cost-flow problem• Reduce the sum of cost*flow in all edges

• The second term is used to control the size of the input connected by flow

Page 20: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

• Smaller values will result in an empty flow (any flow will increase the cost)• A value of 0 means you will only allow for perfect edges (weight=1)• As the value increases, you give room to more and more

imperfection, and allow for lower-probabilities edges to be added• Adding more edges => better coverage• Meaning, we can adjust the coverage of the solution and so the

conflict between coverage and accuracy continues.

Page 21: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,
Page 22: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Results

• Validation of algorithm results on well-studied response pathways• STE5 deletion• DNA Damage

• Broader validation on less-studied response pathways• Alpha-synuclein toxicity

Page 23: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

STE5 deletion• A protein that coordinates a MAPK cascade activated by pheromone• Multi-tier pathway activated by the presence of pheromone• Simple and well-studied response pathway

• Used a set of genetic hits associated with STE5 and differentially expressed genes from a yeast strain lacking STE5

Page 24: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,
Page 25: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

DNA Damage

• There are many mechanisms in the cell which identifies and fixes damage created to the DNA.• A much more complex pathway, but still highly-studied• The input is very large:• 91 genetic hits• 9 differentially expressed genes (exposed to mutagen)

• Large output

Page 26: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,
Page 27: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Results – DNA Damage

• Contains 361 edges between 166 intermediate proteins• Highly enriched for DNA damage stimulus (p<10-14) and DNA repair

(p<10-14)

Page 28: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

MEC1, RAD53,RFC2,RFC3,RFC4 and RFC5 are essential genesAnd thus could not have been detected via genetic screening

Some of highly ranked parts of the network are proteins thatHave been uncovered by years of intense research. (MEC1,RAD53, RFC2,RFC3,RFC4,RFC5, and RFX1)

Page 29: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Broader experiments

• In order to evaluate the algorithm’s results better a broader evaluation took place• The writers used data from 101 different perturbation, each involved

the inactivation of a single gene • In most cases, the response pathways are poorly studied• The genetic hits contained the genetic interactors of the inactive gene• The differentially expressed genes are based on microchip profiling of the

inactivated strain

Page 30: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Broader experiments

• In order to validate the results, the identity of the inactivated gene was hidden from the algorithm • The result of each test were considered successful if:

• They included the inactivated gene of the specific test and it was ranked significantly well

• They were significantly enriched for a specific biological process attributed to the specific gene

• Significance was tested by creating a randomized inputs for each test:• 50 times the genetic hits inputs were randomized• 50 times the differentially expressed genes were randomized• A significant solution was one which was better (in ranking / enrichment level) than

95% of the randomized solutions

Page 31: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Broader experiments – results

• Finding the inactivated gene was successful in 25 of the tests• In 8 of the tests the results contained the inactivated gene but it was not scored

high enough

• Finding enriched results where successful in 28 of the tests• The success rate for cases based on incomplete genetic hits was 40% and

under full genetic screens it was 47% => An indication that the algorithm works well even under very limited data sets• A typical result covered only 1% of the yeast genes• We can infer that both criteria were important• Overall: 41% success (considered high because of the low quality data)

Page 32: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Alpha-synuclein toxicity

• A protein known to be a key component to Parkinson disease.• Despite intense research, the cellular pathways leading to cell death are just

beginning to get uncovered.• Genetic screen experiment detected 77 proteins• No significant overlap between genetic hits and transcriptional profiling results

were found, meaning, we have lots of data, but no of to explain it• Applying ResponseNet revealed a sub-network connecting 34 genetic hits and

166 differentially expressed genes shading light on the inner processes effecting alpha-synyclein.• Some of the conclusions were tested in the lab, and confirmed.• Real contribution to the understanding of the Parkinson disease!

Page 33: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Freedom of data

• The authors complained that many experiments are taking place world-wide but only a few release source codes and complete data sets on the internet• So, they decided to open their algorithm and method to the world by

creating a web page that anyone can upload data into and receive results in minutes• One can use yeast data (as used in the article) or can upload it’s own

data representing the protein-protein and protein-gene relations in a different organism

Page 34: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,
Page 35: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Any Questions?

Page 36: ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,

Questions

• What is the motivation to connect genetic hits and differentially expressed genes?• Why do the authors capped the weight of each edge? What would

happen they would have skipped this stage?• What would happen if we set the gamma parameter to value which is

too high?

Please send answers to my mail (shshzi at gmail)