Learning Regulatory Networks that Represent Regulator States and Roles
Keith Noto and Mark Craven
K. Noto and M. Craven, Learning Regulatory Network Models that Represent Regulator States and Roles. To appear in Lecture Notes in Bioinformatics.
Task
• Given:
  – Gene expression data
  – Other sources of data
    • e.g. sequence data, transcription factor binding sites, transcription unit predictions
• Do:
  – Construct a model that captures regulatory interactions in a cell
Key Ideas: States and Roles
[Figure: network fragment with nodes Cellular Condition, Effector, Regulator Expression, Regulator State, and two Regulatee Expression nodes]
• Regulator states
  – Cannot be observed
  – Depend on more than regulator expression
  – We use cellular conditions as surrogates/predictors of regulation effectors
• Regulator roles
  – Is a regulator an activator or a repressor?
  – We use sequence analysis to predict these roles
Network Variables and Structure
• Hidden Regulator States: “activated” or “inactivated”
• Cellular Conditions: “stationary growth phase”, “heat shock”, ...
• Regulatees: expression states represented as a mixture of Gaussians
• Regulators: expression states represented as a mixture of Gaussians
• Connect where we have evidence of regulation
• Select relevant parents
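
The mixture-of-Gaussians representation of expression states can be illustrated with a small sketch. This is not the authors' code: the two-state setup, the parameter values, and the function names are assumptions chosen only to show how a measured expression level maps to a probability over states.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def state_posterior(x, components):
    """Return P(state | expression = x) for each mixture component.

    components maps a state name to (weight, mean, std).
    """
    joint = {s: w * gaussian_pdf(x, m, sd) for s, (w, m, sd) in components.items()}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}

# Hypothetical parameters for one gene (illustrative values, not from the slides).
EXAMPLE_COMPONENTS = {"low": (0.5, -1.0, 0.5), "high": (0.5, 1.0, 0.5)}

# An expression level near the "high" mean should favor the "high" state.
posterior = state_posterior(0.8, EXAMPLE_COMPONENTS)
```

Here `posterior` is a dictionary with one probability per expression state; the probabilities sum to one.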
Network Parameters: Hidden Nodes use CPD-Trees
• Parents selected from regulator expression, cellular conditions
• May contain context-sensitive independence
[Figure: CPD-tree for the hidden node metJ state, with candidate parents Growth Medium, Heat Shock, Growth Phase and metJ. Internal tests: metJ = Low expression vs. metJ ≠ Low expression; Growth Phase = Log Phase vs. Growth Phase ≠ Log. Leaf values: P(metJ state = activated) = 0.001, 0.994, 0.004]
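
A CPD-tree lookup can be sketched as follows. This is an illustrative data structure, not the authors' implementation. The class and function names are hypothetical, and the way the three leaf probabilities are attached to branches is an assumption, since the slide text does not preserve which leaf belongs to which branch.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    p_activated: float  # P(regulator state = activated) at this leaf

@dataclass
class Split:
    variable: str           # parent variable tested at this node
    value: str              # follow if_equal when the parent takes this value
    if_equal: "Node"
    if_not_equal: "Node"

Node = Union[Leaf, Split]

def p_activated(tree, assignment):
    """Walk the tree using the parents' values and return the leaf probability."""
    node = tree
    while isinstance(node, Split):
        if assignment[node.variable] == node.value:
            node = node.if_equal
        else:
            node = node.if_not_equal
    return node.p_activated

# ASSUMED arrangement of the leaf values 0.001, 0.994 and 0.004.
METJ_TREE = Split(
    variable="metJ", value="low",
    if_equal=Leaf(0.001),
    if_not_equal=Split(
        variable="growth_phase", value="log",
        if_equal=Leaf(0.994),
        if_not_equal=Leaf(0.004),
    ),
)

p = p_activated(METJ_TREE, {"metJ": "high", "growth_phase": "log"})
```

Under this assumed arrangement the tree shows context-sensitive independence: once metJ expression is low, the growth phase is never consulted.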
Initializing Roles
[Figure: DNA around the metA transcription unit, showing the predicted transcription start site*, the -35 position, and upstream/downstream binding sites; metRstate and metJstate are parents of metA]
• Binding sites
  – metR binds upstream; considered an activator
  – metJ binds downstream; considered a repressor

CPT for regulatee metA
metR state    metJ state    P(Low)  P(High)
activated     activated     0.6     0.4
activated     inactivated   0.2     0.8
inactivated   activated     0.9     0.1
inactivated   inactivated   0.5     0.5

*Predicted transcription start sites from Bockhorst et al., ISMB '03
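
The idea of turning binding-site positions into initial roles and an initial CPT can be sketched roughly as below. Everything specific here is an assumption for illustration: the function names, the `boundary` cutoff, the example offsets, and the additive `step` heuristic are hypothetical and are not the authors' procedure or numbers.

```python
from itertools import product

def role_from_site(site_offset, boundary):
    """Guess a role from where a binding site lies relative to a boundary.

    Offsets are relative to the transcription start site; negative is upstream.
    The boundary is a hypothetical cutoff supplied by the caller.
    """
    return "activator" if site_offset < boundary else "repressor"

def initial_cpt(roles, step=0.3):
    """Build an initial table P(regulatee expression | regulator states).

    roles maps regulator name -> "activator" or "repressor".
    Heuristic (assumed): start at P(High) = 0.5, raise it for each activated
    activator and lower it for each activated repressor.
    """
    names = sorted(roles)
    cpt = {}
    for states in product(("activated", "inactivated"), repeat=len(names)):
        p_high = 0.5
        for name, state in zip(names, states):
            if state == "activated":
                p_high += step if roles[name] == "activator" else -step
        p_high = min(0.99, max(0.01, p_high))  # keep probabilities off 0 and 1
        cpt[states] = {"low": 1.0 - p_high, "high": p_high}
    return names, cpt

# Hypothetical site offsets: metR upstream, metJ downstream of the cutoff.
roles = {
    "metR": role_from_site(-60, boundary=-35),
    "metJ": role_from_site(10, boundary=-35),
}
names, cpt = initial_cpt(roles)  # keys of cpt follow the order in names
```

The resulting table only biases the starting point; training is then free to revise it.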
Training the Model
• Initialize the parameters
  – Activators tend to bind more upstream than repressors
• Use an EM algorithm to set parameters
  – E-Step: Determine expected states of regulators
  – M-Step: Update CPDs
• Repeat until convergence
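
The training loop above can be sketched on a deliberately tiny model. This is a simplified illustration, not the authors' algorithm: it assumes a single hidden regulator state with no parents, binary (low/high) regulatee expression instead of Gaussian mixtures, and plain tables instead of CPD-trees. The function names and the pseudo-count smoothing are assumptions.

```python
import math

def smoothed(count, total, pseudo=0.5):
    """Estimate a probability with a small pseudo-count to avoid exact 0 or 1."""
    return (count + pseudo) / (total + 2.0 * pseudo)

def em_hidden_state(data, p_act, p_high, max_iter=100, tol=1e-9):
    """EM for one hidden binary regulator state with binary regulatee children.

    data:   one row per experiment, each row a list of 0/1 (low/high) values.
    p_act:  initial P(state = activated).
    p_high: per regulatee, [P(high | inactivated), P(high | activated)].
    """
    p_high = [list(pair) for pair in p_high]  # do not modify the caller's lists
    prev_ll = -math.inf
    n = len(data)
    ll = prev_ll
    for _ in range(max_iter):
        # E-step: expected state of the regulator in each experiment.
        resp, ll = [], 0.0
        for row in data:
            lik = [1.0 - p_act, p_act]  # index 0 = inactivated, 1 = activated
            for j, x in enumerate(row):
                for s in (0, 1):
                    lik[s] *= p_high[j][s] if x == 1 else 1.0 - p_high[j][s]
            total = lik[0] + lik[1]
            ll += math.log(total)
            resp.append(lik[1] / total)
        if abs(ll - prev_ll) < tol:  # repeat until convergence
            break
        prev_ll = ll
        # M-step: update the probability tables from expected counts.
        n_act = sum(resp)
        p_act = smoothed(n_act, n)
        for j in range(len(p_high)):
            high_act = sum(r for r, row in zip(resp, data) if row[j] == 1)
            high_inact = sum(1.0 - r for r, row in zip(resp, data) if row[j] == 1)
            p_high[j] = [smoothed(high_inact, n - n_act), smoothed(high_act, n_act)]
    return p_act, p_high, ll

# Toy data: three regulatees that are all high in half of the experiments.
toy_data = [[1, 1, 1]] * 5 + [[0, 0, 0]] * 5
# The initial tables encode an "activator" role: high is more likely when activated.
fit_p_act, fit_p_high, fit_ll = em_hidden_state(toy_data, 0.5, [[0.3, 0.7]] * 3)
```

The role-based initialization matters because the hidden state is otherwise symmetric: starting from an activator-style table keeps "activated" tied to high regulatee expression. For many regulatees, the likelihood products would need to be computed in log space.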
Experimental Data and Procedure
• Expression measurements from Affymetrix microarrays (Fred Blattner’s lab, University of Wisconsin-Madison)
• Regulator binding site predictions from TRANSFAC, EcoCyc, cross-species comparison (McCue et al., Genome Research 12, 2002)
• Experimental data consists of:
  – 90 experiments
  – 6 cellular condition variables (between two and seven values each)
  – 296 regulatees
  – 64 regulators
• Cross-validation
  – Microarrays held aside for testing
  – Conditions from test microarrays do not appear in training set
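
One way to build folds whose test conditions never appear in training is to split by condition setting rather than by microarray. The sketch below assumes that "condition" means the full combination of condition-variable values for a microarray; that reading, the function name, and the example values are assumptions rather than details from the slides.

```python
from collections import defaultdict

def condition_folds(conditions, n_folds):
    """Split microarrays into folds so that no condition setting spans two folds.

    conditions: one hashable condition setting per microarray.
    Returns a list of (train_indices, test_indices) pairs.
    """
    groups = defaultdict(list)
    for idx, cond in enumerate(conditions):
        groups[cond].append(idx)
    # Assign whole groups to folds round-robin, in a deterministic order.
    folds = [[] for _ in range(n_folds)]
    for k, cond in enumerate(sorted(groups, key=repr)):
        folds[k % n_folds].extend(groups[cond])
    splits = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(conditions)) if i not in held_out]
        splits.append((train, sorted(test)))
    return splits

# Hypothetical condition settings (growth phase, stress) for eight microarrays.
example_conditions = [
    ("log", "none"), ("log", "none"), ("log", "heat shock"),
    ("stationary", "none"), ("stationary", "none"),
    ("stationary", "heat shock"), ("log", "heat shock"), ("log", "none"),
]
splits = condition_folds(example_conditions, n_folds=2)
```

Each microarray appears in exactly one test fold, and the condition settings of a test fold are absent from its training set.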
Model                                                         Classification Error   Average Squared Error   Log Likelihood
Our Model (3 iterations of adding missing TFs)                13.34%                 0.51                    -12,004
Baseline #2 (No hidden nodes, using cellular conditions)      12.42%                 0.51                    -12,193
Baseline #1 (No hidden nodes, no cellular conditions)         22.16%                 0.75                    -13,363
Random Initialization (3 iterations of adding missing TFs)    14.19%                 0.54                    -11,893