nima asgharbeygi, pat langley, stephen bay center for the study of language and information stanford...

40
Nima Asgharbeygi, Pat Langley, Stephen Bay Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Center for the Study of Language and Information Stanford University Stanford University Kevin Arrigo Kevin Arrigo Department of Geophysics Department of Geophysics Stanford University Stanford University Computational Revision of Computational Revision of Ecological Process Models Ecological Process Models S. Dzeroski, J. Sanchez, K. Saito, J. Shrager, and L. Todorovski fo S. Dzeroski, J. Sanchez, K. Saito, J. Shrager, and L. Todorovski fo ions to this research, which is funded by the US National Science Fo ions to this research, which is funded by the US National Science Fo

Upload: john-jimenez

Post on 26-Mar-2015

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Nima Asgharbeygi, Pat Langley, Stephen BayNima Asgharbeygi, Pat Langley, Stephen BayCenter for the Study of Language and InformationCenter for the Study of Language and Information

Stanford UniversityStanford University

Kevin ArrigoKevin ArrigoDepartment of Geophysics Department of Geophysics

Stanford UniversityStanford University

Computational Revision of Computational Revision of Ecological Process ModelsEcological Process Models

Thanks to S. Dzeroski, J. Sanchez, K. Saito, J. Shrager, and L. Todorovski for their Thanks to S. Dzeroski, J. Sanchez, K. Saito, J. Shrager, and L. Todorovski for their contributions to this research, which is funded by the US National Science Foundation.contributions to this research, which is funded by the US National Science Foundation.

Page 2: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Data Mining vs. Scientific DiscoveryData Mining vs. Scientific Discovery

induce predictive models from large (often business) data sets;induce predictive models from large (often business) data sets; represent models in notations invented by AI researchers.represent models in notations invented by AI researchers.

There exist two computational paradigms for discovering explicit There exist two computational paradigms for discovering explicit knowledge from data. knowledge from data.

The The data miningdata mining movement develops computational methods that: movement develops computational methods that:

This talk focuses on applications of the second framework to This talk focuses on applications of the second framework to environmental and ecosystem modeling. environmental and ecosystem modeling.

constructing models from (often small) scientific data sets;constructing models from (often small) scientific data sets; stated in formalisms invented by scientists themselves.stated in formalisms invented by scientists themselves.

In contrast,In contrast, computational scientific discovery computational scientific discovery focuses onfocuses on::

Page 3: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Observations from the Ross SeaObservations from the Ross Sea

Page 4: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

A Model of Ross Sea EcosystemA Model of Ross Sea Ecosystem

model RossSeaEcosystemmodel RossSeaEcosystem

variables: phyto, zoo, nitro, residuevariables: phyto, zoo, nitro, residueobservables: phyto, nitroobservables: phyto, nitro

d[phyto,t,1] = d[phyto,t,1] = 0.307 0.307 phyto phyto 0.495 0.495 zoo + 0.411 zoo + 0.411 phyto phyto

d[zoo,t,1] = d[zoo,t,1] = 0.251 0.251 zoo + 0.615 zoo + 0.615 0.495 0.495 zoo zoo

d[residue,t,1] = 0.307 d[residue,t,1] = 0.307 phyto + phyto + 0.251 0.251 zoo + 0.385 zoo + 0.385 0.495 0.495 zoo zoo 0.005 0.005 residue residue

d[nitro,t,1] = d[nitro,t,1] = 0.098 0.098 0.411 0.411 phyto + 0.005 phyto + 0.005 residue residue

Page 5: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Inductive Revision of Ecosystem ModelsInductive Revision of Ecosystem Models

RevisionRevision

observationsobservations

initial modelinitial model

revised modelrevised model

model RossSeaEcosystemmodel RossSeaEcosystem

variables: phyto, zoo, nitro, residuevariables: phyto, zoo, nitro, residueobservables: phyto, nitroobservables: phyto, nitro

d[phyto,t,1] = d[phyto,t,1] = 0.307 0.307 phyto phyto 0.495 0.495 zoo zoo + 0.411 + 0.411 phyto phyto

d[zoo,t,1] = d[zoo,t,1] = 0.251 0.251 zoo + 0.615 zoo + 0.615 0.495 0.495 zoo zoo

d[residue,t,1] = 0.307 d[residue,t,1] = 0.307 phyto + phyto + 0.251 0.251 zoo zoo + 0.385 + 0.385 0.495 0.495 zoo zoo 0.005 0.005 residueresidue

d[nitro,t,1] = d[nitro,t,1] = 0.098 0.098 0.411 0.411 phyto + 0.005 phyto + 0.005 residueresidue

model RossSeaEcosystemmodel RossSeaEcosystem

variables: phyto, zoo, nitro, residuevariables: phyto, zoo, nitro, residueobservables: phyto, nitroobservables: phyto, nitro

d[phyto,t,1] = d[phyto,t,1] = 0.307 0.307 phyto phyto 0.495 0.495 zoo zoo + 0.411 + 0.411 phyto phyto

d[zoo,t,1] = d[zoo,t,1] = 0.251 0.251 zoo + 0.615 zoo + 0.615 0.495 0.495 zoo zoo

d[residue,t,1] = 0.307 d[residue,t,1] = 0.307 phyto + phyto + 0.251 0.251 zoo zoo + 0.385 + 0.385 0.495 0.495 zoo zoo 0.005 0.005 residueresidue

d[nitro,t,1] = d[nitro,t,1] = 0.098 0.098 0.411 0.411 phyto + 0.005 phyto + 0.005 residueresidue

Page 6: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

A Space of Ecosystem ModelsA Space of Ecosystem Models

Model revision requires ways to constrain search through this space. Model revision requires ways to constrain search through this space.

Page 7: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Phytoplankton Loss in Ross Sea EcosystemPhytoplankton Loss in Ross Sea Ecosystem

model RossSeaEcosystemmodel RossSeaEcosystem

variables: phyto, zoo, nitro, residuevariables: phyto, zoo, nitro, residueobservables: phyto, nitroobservables: phyto, nitro

d[phyto,t,1] = d[phyto,t,1] = 0.307 0.307 phyto phyto 0.495 0.495 zoo + 0.411 zoo + 0.411 phyto phyto

d[zoo,t,1] = d[zoo,t,1] = 0.251 0.251 zoo + 0.615 zoo + 0.615 0.495 0.495 zoo zoo

d[residue,t,1] = 0.307 d[residue,t,1] = 0.307 phyto phyto + + 0.251 0.251 zoo + 0.385 zoo + 0.385 0.495 0.495 zoo zoo 0.005 0.005 residue residue

d[nitro,t,1] = d[nitro,t,1] = 0.098 0.098 0.411 0.411 phyto + 0.005 phyto + 0.005 residue residue

Phytoplankton loss is a Phytoplankton loss is a processprocess that affects two variables; no model that affects two variables; no model should include one influence without the other.should include one influence without the other.

Page 8: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Grazing in the Ross Sea EcosystemGrazing in the Ross Sea Ecosystem

model RossSeaEcosystemmodel RossSeaEcosystem

variables: phyto, zoo, nitro, residuevariables: phyto, zoo, nitro, residueobservables: phyto, nitroobservables: phyto, nitro

d[phyto,t,1] = d[phyto,t,1] = 0.307 0.307 phyto phyto 0.495 0.495 zoo zoo + 0.411 + 0.411 phyto phyto

d[zoo,t,1] =d[zoo,t,1] = 0.251 0.251 zoo zoo + 0.615 + 0.615 0.495 0.495 zoo zoo

d[residue,t,1] = 0.307 d[residue,t,1] = 0.307 phyto phyto + + 0.251 0.251 zoo zoo + 0.385 + 0.385 0.495 0.495 zoo zoo 0.005 0.005 residue residue

d[nitro,t,1] = d[nitro,t,1] = 0.098 0.098 0.411 0.411 phyto + 0.005 phyto + 0.005 residue residue

We can view an ecosystem model as a set of processes that provide We can view an ecosystem model as a set of processes that provide an alternative way to encode its assumptions. an alternative way to encode its assumptions.

Page 9: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Process Model of Ross Sea EcosystemProcess Model of Ross Sea Ecosystem

model RossSeaEcosystemmodel RossSeaEcosystem

variables: phyto, zoo, nitro, residuevariables: phyto, zoo, nitro, residueobservables: phyto, nitroobservables: phyto, nitro

process phyto_lossprocess phyto_loss equations:equations: d[phyto,t,1] = d[phyto,t,1] = 0.307 0.307 phyto phyto

d[residue,t,1] = 0.307 d[residue,t,1] = 0.307 phyto phyto

process zoo_lossprocess zoo_loss equations:equations: d[zoo,t,1] = d[zoo,t,1] = 0.251 0.251 zoo zoo

d[residue,t,1] = 0.251 d[residue,t,1] = 0.251 zoo zoo

process zoo_phyto_grazingprocess zoo_phyto_grazing equations:equations: d[zoo,t,1] = 0.615 d[zoo,t,1] = 0.615 0.495 0.495 zoo zoo

d[residue,t,1] = 0.385 d[residue,t,1] = 0.385 0.495 0.495 zoo zood[phyto,t,1] = d[phyto,t,1] = 0.495 0.495 zoo zoo

process nitro_uptakeprocess nitro_uptake equations:equations: d[phyto,t,1] = 0.411 d[phyto,t,1] = 0.411 phyto phyto

d[nitro,t,1] = d[nitro,t,1] = 0.098 0.098 0.411 0.411 phyto phyto

process nitro_remineralization;process nitro_remineralization; equations:equations: d[nitro,t,1] = 0.005 d[nitro,t,1] = 0.005 residue residue

d[residue,t,1 ] = d[residue,t,1 ] = 0.005 0.005 residue residue

Page 10: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Inductive Revision of Process ModelsInductive Revision of Process Models

RevisionRevision

initial modelinitial model

observationsobservations revised modelrevised model

model RossSeaEcosystemmodel RossSeaEcosystem

variables: phyto, zoo, nitro, residuevariables: phyto, zoo, nitro, residueobservables: phyto, nitroobservables: phyto, nitro

d[phyto,t,1] = d[phyto,t,1] = 0.307 0.307 phyto phyto 0.495 0.495 zoo zoo + 0.411 + 0.411 phyto phyto

d[zoo,t,1] = d[zoo,t,1] = 0.251 0.251 zoo + 0.615 zoo + 0.615 0.495 0.495 zoo zoo

d[residue,t,1] = 0.307 d[residue,t,1] = 0.307 phyto + phyto + 0.251 0.251 zoo zoo + 0.385 + 0.385 0.495 0.495 zoo zoo 0.005 0.005 residueresidue

d[nitro,t,1] = d[nitro,t,1] = 0.098 0.098 0.411 0.411 phyto + 0.005 phyto + 0.005 residueresidue

model RossSeaEcosystemmodel RossSeaEcosystem

variables: phyto, zoo, nitro, residuevariables: phyto, zoo, nitro, residueobservables: phyto, nitroobservables: phyto, nitro

d[phyto,t,1] = d[phyto,t,1] = 0.307 0.307 phyto phyto 0.495 0.495 zoo zoo + 0.411 + 0.411 phyto phyto

d[zoo,t,1] = d[zoo,t,1] = 0.251 0.251 zoo + 0.615 zoo + 0.615 0.495 0.495 zoo zoo

d[residue,t,1] = 0.307 d[residue,t,1] = 0.307 phyto + phyto + 0.251 0.251 zoo zoo + 0.385 + 0.385 0.495 0.495 zoo zoo 0.005 0.005 residueresidue

d[nitro,t,1] = d[nitro,t,1] = 0.098 0.098 0.411 0.411 phyto + 0.005 phyto + 0.005 residueresidue

process exponential_growth process exponential_growth variables: P {population} variables: P {population} equations: d[P,t] = [0, 1,equations: d[P,t] = [0, 1,] ] P P

process logistic_growthprocess logistic_growth variables: P {population}variables: P {population} equations: d[P,t] = [0, 1, equations: d[P,t] = [0, 1, ] ] P P (1 (1 P / [0, 1, P / [0, 1, ])])

process constant_inflowprocess constant_inflow variables: I {inorganic_nutrient}variables: I {inorganic_nutrient} equations: d[I,t] = [0, 1, equations: d[I,t] = [0, 1, ]]

process consumptionprocess consumption variables: P1 {population}, P2 {population}, variables: P1 {population}, P2 {population}, nutrient_P2 nutrient_P2 equations: d[P1,t] = [0, 1, equations: d[P1,t] = [0, 1, ] ] P1 P1 nutrient_P2, nutrient_P2, d[P2,t] = d[P2,t] = [0, 1, [0, 1, ] ] P1 P1 nutrient_P2 nutrient_P2

process no_saturationprocess no_saturation variables: P {number}, nutrient_P {number}variables: P {number}, nutrient_P {number} equations: nutrient_P = Pequations: nutrient_P = P

process saturationprocess saturation variables: P {number}, nutrient_P {number}variables: P {number}, nutrient_P {number} equations: nutrient_P = P / (P + [0, 1, equations: nutrient_P = P / (P + [0, 1, ])])

generic processesgeneric processes

Page 11: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Generic Processes for Aquatic EcosystemsGeneric Processes for Aquatic Ecosystems

generic process exponential_lossgeneric process exponential_loss generic process remineralizationgeneric process remineralization variables: S{species}, D{detritus}variables: S{species}, D{detritus} variables: N{nutrient}, variables: N{nutrient}, D{detritus}D{detritus} parameters: parameters: [0, 1] [0, 1] parameters: parameters: [0, 1] [0, 1] equations:equations: d[S,t,1] = d[S,t,1] = 1 1 S S equations: equations: d[N, t,1] = d[N, t,1] = D D

d[D,t,1] = d[D,t,1] = S S d[D, t,1] = d[D, t,1] = 1 1 DD

generic process grazinggeneric process grazing generic process constant_inflowgeneric process constant_inflow variables: S1{species}, S2{species}, D{detritus}variables: S1{species}, S2{species}, D{detritus} variables: variables: N{nutrient}N{nutrient} parameters: parameters: [0, 1], [0, 1], [0, 1] [0, 1] parameters: parameters: [0, 1] [0, 1] equations:equations: d[S1,t,1] = d[S1,t,1] = S1 S1 equations: equations: d[N,t,1] = d[N,t,1] =

d[D,t,1] = (1 d[D,t,1] = (1 ) ) S1 S1d[S2,t,1] = d[S2,t,1] = 1 1 S1 S1

generic process nutrient_uptakegeneric process nutrient_uptake variables: S{species}, N{nutrient}variables: S{species}, N{nutrient} parameters: parameters: [0, [0, ], ], [0, 1], [0, 1], [0, 1] [0, 1] conditions:conditions: N > N > equations:equations: d[S,t,1] = d[S,t,1] = S S

d[N,t,1] = d[N,t,1] = 1 1 S S

Page 12: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

A Method for Process Model RevisionA Method for Process Model Revision

1. Find all ways to instantiate available generic processes with 1. Find all ways to instantiate available generic processes with specific variables, subject to type constraints;specific variables, subject to type constraints;

2. Generate candidate model structures by deleting the current 2. Generate candidate model structures by deleting the current processes and adding new ones, subject to complexity limits; processes and adding new ones, subject to complexity limits;

3. For each generic model, carry out search through parameter 3. For each generic model, carry out search through parameter space to find good coefficients [difficult];space to find good coefficients [difficult];

4. Return a list of revised models ordered by their overall scores.4. Return a list of revised models ordered by their overall scores.

We have implemented RPM, an algorithm that revises an initial We have implemented RPM, an algorithm that revises an initial process model in four main stages:process model in four main stages:

The evaluation metric can be squared error or description length The evaluation metric can be squared error or description length based on error and distance from the initial model.based on error and distance from the initial model.

Page 13: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Observations from the Ross SeaObservations from the Ross Sea

Page 14: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Revised Model of Ross Sea EcosystemRevised Model of Ross Sea Ecosystem

model RossSeaEcosystemmodel RossSeaEcosystem

variables: phyto, zoo, nitro, residue, light, G, growth_rate, nitro_rate, variables: phyto, zoo, nitro, residue, light, G, growth_rate, nitro_rate, light_ratelight_rateobservables: phyto, nitro, lightobservables: phyto, nitro, light

d[phyto,t,1] = d[phyto,t,1] = 0.307 0.307 phyto phyto G G zoo zoo + growth_rate + growth_rate phyto phyto

d[zoo,t,1] = 0.615 d[zoo,t,1] = 0.615 G G zoo zoo

d[residue,t,1] = 0.307 d[residue,t,1] = 0.307 phyto phyto ++ 0.385 0.385 G G zoo zoo 0.083 0.083 residue residue

d[nitro,t,1] = d[nitro,t,1] = 1 1 n_to_c n_to_c growth_rate growth_rate phyto phyto + 0.083 + 0.083 n_to_c n_to_c residueresidue

G = 0.415 G = 0.415 (1 (1 –– exp( exp(– – 1 1 0.27 0.27 phyto) phyto)

growth_rate = r_max growth_rate = r_max min(nitro_rate, light_rate) min(nitro_rate, light_rate)

nitro_rate = nitro / (nitro + 4.33)nitro_rate = nitro / (nitro + 4.33)

light_rate = light / (light + 11.67)light_rate = light / (light + 11.67)

n_to_c = 0.251, r_max = 0.194, remin_rate = 0.0676n_to_c = 0.251, r_max = 0.194, remin_rate = 0.0676

Page 15: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Initial Results on Ross Sea Training DataInitial Results on Ross Sea Training Data

The best revised model reproduces the observations quite well.The best revised model reproduces the observations quite well.

Page 16: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Initial Results on Ross Sea Test DataInitial Results on Ross Sea Test Data

But the model predicts nearly the same behavior for both years.But the model predicts nearly the same behavior for both years.

Page 17: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Revised Results on Ross Sea Test DataRevised Results on Ross Sea Test Data

Refitting initial values for zooplankton gives better generalization.Refitting initial values for zooplankton gives better generalization.

Page 18: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Results on Data from Protist StudyResults on Data from Protist Study

Page 19: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Results on Data from Rinkobing FjordResults on Data from Rinkobing Fjord

Page 20: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

specify a quantitative process model of the target system;specify a quantitative process model of the target system;

display and edit the model’s structure and details graphically;display and edit the model’s structure and details graphically;

simulate the model’s behavior over time and situations;simulate the model’s behavior over time and situations;

compare the model’s predicted behavior to observations; compare the model’s predicted behavior to observations;

invoke a revision module in response to detected anomalies.invoke a revision module in response to detected anomalies.

Because few scientists want to be replaced, we are developing Because few scientists want to be replaced, we are developing PPROMETHEUSROMETHEUS, an interactive environment that lets users:, an interactive environment that lets users:

The environment offers computational assistance in forming and The environment offers computational assistance in forming and evaluating models but lets the user retain control. evaluating models but lets the user retain control.

Interfacing with ScientistsInterfacing with Scientists

Page 21: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Viewing and Editing a Process ModelViewing and Editing a Process Model

Page 22: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

computational scientific discovery (e.g., Langley et al., 1983);computational scientific discovery (e.g., Langley et al., 1983);

theory revision in machine learning (e.g., Towell, 1991);theory revision in machine learning (e.g., Towell, 1991);

qualitative physics and simulation (e.g., Forbus, 1984);qualitative physics and simulation (e.g., Forbus, 1984);

languages for scientific simulation (e.g., languages for scientific simulation (e.g., STELLA, MATLABSTELLA, MATLAB););

interactive tools for data analysis (e.g., Schneiderman, 2001).interactive tools for data analysis (e.g., Schneiderman, 2001).

Intellectual InfluencesIntellectual Influences

Our approach to computational discovery incorporates ideas from Our approach to computational discovery incorporates ideas from many traditions:many traditions:

Our work combines ideas from machine learning, AI, programming Our work combines ideas from machine learning, AI, programming languages, and human-computer interaction.languages, and human-computer interaction.

Page 23: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Directions for Future ResearchDirections for Future Research

produce additional results on other ecosystem modeling tasksproduce additional results on other ecosystem modeling tasks

develop improved methods for fitting model parametersdevelop improved methods for fitting model parameters

implement heuristic methods for searching the structure spaceimplement heuristic methods for searching the structure space

utilize knowledge of subsystems to further constrain searchutilize knowledge of subsystems to further constrain search

augment the modeling environment to make it more usableaugment the modeling environment to make it more usable

Despite our progress to date, we need further work in order to:Despite our progress to date, we need further work in order to:

Process modeling has great potential to aid model development Process modeling has great potential to aid model development in environmental science.in environmental science.

Page 24: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Contributions of the ResearchContributions of the Research

a new formalism for representing scientific process models;a new formalism for representing scientific process models;

an encoding for background knowledge as generic processes; an encoding for background knowledge as generic processes;

an algorithm for revising process models with time-series data;an algorithm for revising process models with time-series data;

an interactive environment for model construction/utilization.an interactive environment for model construction/utilization.

In summary, our work on computational discovery has produced:In summary, our work on computational discovery has produced:

We have demonstrated this approach to model revision on both We have demonstrated this approach to model revision on both ecosystem modeling and an environmental domain. ecosystem modeling and an environmental domain.

The PThe PROMETHEUSROMETHEUS modeling/revision environment is available at: modeling/revision environment is available at:

http://www.isle.org/process.htmlhttp://www.isle.org/process.html

Page 25: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

End of PresentationEnd of Presentation

Page 26: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

The Challenge of Systems ScienceThe Challenge of Systems Science

focusing on synthesis rather than analysis in their operation;focusing on synthesis rather than analysis in their operation;

using computer modeling as one of their central methods;using computer modeling as one of their central methods;

developing system-level models with many variables / relations;developing system-level models with many variables / relations;

evaluating models on observational, not experimental, data. evaluating models on observational, not experimental, data.

Disciplines like Earth science differ from traditional disciplines by:Disciplines like Earth science differ from traditional disciplines by:

Constructing such models are complex tasks that would benefit Constructing such models are complex tasks that would benefit from computational aids, but existing methods are insufficient. from computational aids, but existing methods are insufficient.

Page 27: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Why Are Process Models Interesting?Why Are Process Models Interesting?

they incorporate they incorporate scientific formalismsscientific formalisms rather than AI notations; rather than AI notations;

that are easily that are easily communicable communicable to scientists and engineers;to scientists and engineers;

they move beyond descriptive generalization to they move beyond descriptive generalization to explanationexplanation;;

while retaining the while retaining the modularitymodularity needed to support induction. needed to support induction.

Process models are a crucial target for machine learning because: Process models are a crucial target for machine learning because:

These reasons point to process models as an ideal representation These reasons point to process models as an ideal representation for scientific and engineering knowledge.for scientific and engineering knowledge.

Process models are an important alternative to formalisms used Process models are an important alternative to formalisms used currently in machine learning. currently in machine learning.

Page 28: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Advantages of Quantitative Process ModelsAdvantages of Quantitative Process Models

they embed quantitative relations within qualitative structure;they embed quantitative relations within qualitative structure;

that refer to notations and mechanisms familiar to experts;that refer to notations and mechanisms familiar to experts;

they provide dynamical predictions of changes over time;they provide dynamical predictions of changes over time;

they offer causal and explanatory accounts of phenomena;they offer causal and explanatory accounts of phenomena;

while retaining the modularity needed to support induction.while retaining the modularity needed to support induction.

Process models offer scientists a promising framework because: Process models offer scientists a promising framework because:

Quantitative process models provide an important alternative to Quantitative process models provide an important alternative to formalisms used currently in ecosystem modeling. formalisms used currently in ecosystem modeling.

Page 29: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Inductive Process ModelingInductive Process Modeling

Our response is to design, construct, and evaluate computational Our response is to design, construct, and evaluate computational methods for methods for inductive process modelinginductive process modeling, which: , which:

represent scientific models as sets of quantitative processes;represent scientific models as sets of quantitative processes;

use these models to predict and explain observational data;use these models to predict and explain observational data;

search a space of process models to find good candidates;search a space of process models to find good candidates;

utilize background knowledge to constrain this search. utilize background knowledge to constrain this search.

This framework has great potential to aid environmental science, This framework has great potential to aid environmental science, but it raises new computational challenges. but it raises new computational challenges.

Page 30: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Challenges of Inductive Process ModelingChallenges of Inductive Process Modeling

process models characterize behavior of dynamical systems; process models characterize behavior of dynamical systems;

variables are continuous but can have discontinuous behavior; variables are continuous but can have discontinuous behavior;

observations are not independently and identically distributed;observations are not independently and identically distributed;

models may contain unobservable processes and variables;models may contain unobservable processes and variables;

multiple processes can interact to produce complex behavior. multiple processes can interact to produce complex behavior.

Process model induction differs from typical learning tasks in that:Process model induction differs from typical learning tasks in that:

Compensating factors include a focus on deterministic systems and Compensating factors include a focus on deterministic systems and the availability of background knowledge. the availability of background knowledge.

Page 31: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Generating Predictions and ExplanationsGenerating Predictions and Explanations

To utilize or evaluate a given process model, we must simulate its To utilize or evaluate a given process model, we must simulate its behavior over time: behavior over time:

specify initial values for input variables and time step size;specify initial values for input variables and time step size;

on each time step, determine which processes are active;on each time step, determine which processes are active;

solve active algebraic/differential equations with known values;solve active algebraic/differential equations with known values;

propagate values and recursively solve other active equations; propagate values and recursively solve other active equations;

when multiple processes influence the same variable, assume when multiple processes influence the same variable, assume their effects are additive. their effects are additive.

This performance method makes specific predictions that we can This performance method makes specific predictions that we can compare to observations. compare to observations.

Page 32: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Generic Processes as Background KnowledgeGeneric Processes as Background Knowledge

the variables involved in a process and their types;the variables involved in a process and their types;

the parameters appearing in a process and their ranges; the parameters appearing in a process and their ranges;

the forms of conditions on the process; andthe forms of conditions on the process; and

the forms of associated equations and their parameters.the forms of associated equations and their parameters.

Our framework casts background knowledge as Our framework casts background knowledge as generic processesgeneric processes that specify: that specify:

Generic processes are building blocks from which one can compose Generic processes are building blocks from which one can compose a specific process model. a specific process model.

Page 33: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Estimating Parameters in Process ModelsEstimating Parameters in Process Models

1. Selects random initial values that fall within ranges specified 1. Selects random initial values that fall within ranges specified in the generic processes;in the generic processes;

2. Improves these parameters using the Levenberg-Marquardt 2. Improves these parameters using the Levenberg-Marquardt method until it reaches a local optimum;method until it reaches a local optimum;

3. Generates new candidate values through random jumps along 3. Generates new candidate values through random jumps along dimensions of the parameter vector and continue search; dimensions of the parameter vector and continue search;

4. If no improvement occurs after N jumps, it restarts the search 4. If no improvement occurs after N jumps, it restarts the search from a new random initial point.from a new random initial point.

To estimate the parameters for each generic model structure, the To estimate the parameters for each generic model structure, the IPM algorithm:IPM algorithm:

This multi-level method gives reasonable fits to time-series data This multi-level method gives reasonable fits to time-series data from a number of domains, but it is computationally intensive. from a number of domains, but it is computationally intensive.

Page 34: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

A Process Model for an Aquatic EcosystemA Process Model for an Aquatic Ecosystem

model Ross_Sea_Ecosystemmodel Ross_Sea_Ecosystem

variables: phyto, nitro, residue, light, growth_rate, effective_light, variables: phyto, nitro, residue, light, growth_rate, effective_light, ice_factorice_factorobservables: phyto, nitro, light, ice_factorobservables: phyto, nitro, light, ice_factor

process phyto_lossprocess phyto_loss equations:equations: d[phyto,t,1] = d[phyto,t,1] = 0.1 0.1 phyto phyto

d[residue,t,1] = 0.1 d[residue,t,1] = 0.1 phyto phyto

process phyto_growthprocess phyto_growth equations:equations: d[phyto,t,1] = growth_rate d[phyto,t,1] = growth_rate phyto phyto

process phyto_uptakes_nitroprocess phyto_uptakes_nitro conditions:conditions: nitro > 0nitro > 0 equations:equations: d[nitro,t,1] = d[nitro,t,1] = 1 1 0.204 0.204 growth_rate growth_rate phyto phyto

process growth_limitationprocess growth_limitation equations:equations: growth_rate = 0.23 growth_rate = 0.23 min(nitrate_rate, light_rate) min(nitrate_rate, light_rate)

process nitrate_availabilityprocess nitrate_availability equations:equations: nitrate_rate = nitrate / (nitrate + 5)nitrate_rate = nitrate / (nitrate + 5)

process light_availabilityprocess light_availability equations:equations: light_rate = effective_light / (effective_light + 50)light_rate = effective_light / (effective_light + 50)

process light_attenuationprocess light_attenuation equations:equations: effective_light = light effective_light = light ice_factor ice_factor

Page 35: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Generic Processes for Aquatic EcosystemsGeneric Processes for Aquatic Ecosystems

generic process exponential_lossgeneric process exponential_loss generic process remineralizationgeneric process remineralization variables: S{species}, D{detritus}variables: S{species}, D{detritus} variables: N{nutrient}, variables: N{nutrient}, D{detritus}D{detritus} parameters: parameters: [0, 1] [0, 1] parameters: parameters: [0, 1] [0, 1] equations:equations: d[S,t,1] = d[S,t,1] = 1 1 S S equations: equations: d[N, t,1] = d[N, t,1] = D D

d[D,t,1] = d[D,t,1] = S S d[D, t,1] = d[D, t,1] = 1 1 DD

generic process grazinggeneric process grazing generic process constant_inflowgeneric process constant_inflow variables: S1{species}, S2{species}, D{detritus}variables: S1{species}, S2{species}, D{detritus} variables: variables: N{nutrient}N{nutrient} parameters: parameters: [0, 1], [0, 1], [0, 1] [0, 1] parameters: parameters: [0, 1] [0, 1] equations:equations: d[S1,t,1] = d[S1,t,1] = S1 S1 equations: equations: d[N,t,1] = d[N,t,1] =

d[D,t,1] = (1 d[D,t,1] = (1 ) ) S1 S1d[S2,t,1] = d[S2,t,1] = 1 1 S1 S1

generic process nutrient_uptakegeneric process nutrient_uptake variables: S{species}, N{nutrient}variables: S{species}, N{nutrient} parameters: parameters: [0, [0, ], ], [0, 1], [0, 1], [0, 1] [0, 1] conditions:conditions: N > N > equations:equations: d[S,t,1] = d[S,t,1] = S S

d[N,t,1] = d[N,t,1] = 1 1 S S

Page 36: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Inductive Process ModelingInductive Process Modeling

process exponential_growth process exponential_growth variables: P {population} variables: P {population} equations: d[P,t] = [0, 1,equations: d[P,t] = [0, 1,] ] P P

process logistic_growthprocess logistic_growth variables: P {population}variables: P {population} equations: d[P,t] = [0, 1, equations: d[P,t] = [0, 1, ] ] P P (1 (1 P / [0, 1, P / [0, 1, ])])

process constant_inflowprocess constant_inflow variables: I {inorganic_nutrient}variables: I {inorganic_nutrient} equations: d[I,t] = [0, 1, equations: d[I,t] = [0, 1, ]]

process consumptionprocess consumption variables: P1 {population}, P2 {population}, variables: P1 {population}, P2 {population}, nutrient_P2 nutrient_P2 equations: d[P1,t] = [0, 1, equations: d[P1,t] = [0, 1, ] ] P1 P1 nutrient_P2, nutrient_P2, d[P2,t] = d[P2,t] = [0, 1, [0, 1, ] ] P1 P1 nutrient_P2 nutrient_P2

process no_saturationprocess no_saturation variables: P {number}, nutrient_P {number}variables: P {number}, nutrient_P {number} equations: nutrient_P = Pequations: nutrient_P = P

process saturationprocess saturation variables: P {number}, nutrient_P {number}variables: P {number}, nutrient_P {number} equations: nutrient_P = P / (P + [0, 1, equations: nutrient_P = P / (P + [0, 1, ])])

model AquaticEcosystemmodel AquaticEcosystem

variables: nitro, phyto, zoo, nutrient_nitro, variables: nitro, phyto, zoo, nutrient_nitro, nutrient_phytonutrient_phytoobservables: nitro, phyto, zooobservables: nitro, phyto, zoo

process phyto_exponential_growthprocess phyto_exponential_growth equations: d[phyto,t] = 0.1 equations: d[phyto,t] = 0.1 phyto phyto

process zoo_logistic_growthprocess zoo_logistic_growth equations: d[zoo,t] = 0.1 equations: d[zoo,t] = 0.1 zoo / (1 zoo / (1 zoo / 1.5) zoo / 1.5)

process phyto_nitro_consumptionprocess phyto_nitro_consumption equations: d[nitro,t] = equations: d[nitro,t] = 1 1 phyto phyto nutrient_nitro, nutrient_nitro, d[phyto,t] = 1 d[phyto,t] = 1 phyto phyto nutrient_nitro nutrient_nitro

process phyto_nitro_no_saturationprocess phyto_nitro_no_saturation equations: nutrient_nitro = nitroequations: nutrient_nitro = nitro

process zoo_phyto_consumptionprocess zoo_phyto_consumption equations: d[phyto,t] = equations: d[phyto,t] = 1 1 zoo zoo nutrient_phyto, nutrient_phyto, d[zoo,t] = 1 d[zoo,t] = 1 zoo zoo nutrient_phyto nutrient_phyto

process zoo_phyto_saturationprocess zoo_phyto_saturation equations: nutrient_phyto = phyto / (phyto + 0.5)equations: nutrient_phyto = phyto / (phyto + 0.5)

InductionInduction

training datatraining data

generic processesgeneric processes

process modelprocess model

Page 37: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

The NPPc Portion of CASAThe NPPc Portion of CASA

NPPc = NPPc = monthmonth max max (E(E··IPAR, 0)IPAR, 0)

E = 0.56 · T1 · T2 · WE = 0.56 · T1 · T2 · W

T1 = 0.8 + 0.02 · Topt – 0.0005 · ToptT1 = 0.8 + 0.02 · Topt – 0.0005 · Topt22

T2 = 1.18 / [(1 + T2 = 1.18 / [(1 + ee 0.2 · (Topt – Tempc – 10)0.2 · (Topt – Tempc – 10) ) · (1 + ) · (1 + ee 0.3 · (Tempc – Topt – 10)0.3 · (Tempc – Topt – 10) )] )]

W = 0.5 + 0.5 · EET / PETW = 0.5 + 0.5 · EET / PET

PET = 1.6 · (10 · Tempc / AHI)PET = 1.6 · (10 · Tempc / AHI)AA · PET-TW-M if Tempc > 0 · PET-TW-M if Tempc > 0

PET = 0 if Tempc < 0PET = 0 if Tempc < 0

A = 0.00000068 · AHIA = 0.00000068 · AHI33 – 0.000077 · AHI – 0.000077 · AHI22 + 0.018 · AHI + 0.49 + 0.018 · AHI + 0.49

IPAR = 0.5 · FPAR-FAS · Monthly-Solar · Sol-ConverIPAR = 0.5 · FPAR-FAS · Monthly-Solar · Sol-Conver

FPAR-FAS = FPAR-FAS = minmin [(SR-FAS – 1.08) / [(SR-FAS – 1.08) / SRSR (UMD-VEG) , 0.95] (UMD-VEG) , 0.95]

SR-FAS = (Mon-FAS-NDVI + 1000) / (Mon-FAS-NDVI – 1000)SR-FAS = (Mon-FAS-NDVI + 1000) / (Mon-FAS-NDVI – 1000)

Page 38: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Results of Revising the NPP ModelResults of Revising the NPP Model

Initial model:Initial model:

E = 0.56 · T1 · T2 · WE = 0.56 · T1 · T2 · W

T2 = 1.18 / [(1 + e T2 = 1.18 / [(1 + e 0.2 · (Topt – Tempc – 10)0.2 · (Topt – Tempc – 10) ) · (1 + e ) · (1 + e 0.3 · (Tempc – Topt – 10)0.3 · (Tempc – Topt – 10) )] )]

PET = 1.6 · (10 · Tempc / AHI)PET = 1.6 · (10 · Tempc / AHI)AA · PET-TW-M · PET-TW-M

SR SR {3.06, 4.35, 4.35, 4.05, 5.09, 3.06, 4.05, 4.05, 4.05, 5.09, 4.05} {3.06, 4.35, 4.35, 4.05, 5.09, 3.06, 4.05, 4.05, 4.05, 5.09, 4.05}

RMSE on training data = 465.212 RMSE on training data = 465.212 andand r r 2 2 = 0.799 = 0.799

Revised model:Revised model:

E = 0.353 · T1E = 0.353 · T10.000.00 · T2 · T2 0.080.08 · W · W 0.000.00

T2 = 0.83 / [(1 + e T2 = 0.83 / [(1 + e 1.0 · (Topt – Tempc – 6.34)1.0 · (Topt – Tempc – 6.34) ) · (1 + e ) · (1 + e 1.0 · (Tempc – Topt – 11.52)1.0 · (Tempc – Topt – 11.52) )] )]

PET = 1.6 · (10 · Tempc / AHI)PET = 1.6 · (10 · Tempc / AHI) AA · PET-TW-M · PET-TW-M

SR SR {0.61, 3.99, 2.44, 10.0, 2.21, 2.13, 2.04, 0.43, 1.35, 1.85, 1.61} {0.61, 3.99, 2.44, 10.0, 2.21, 2.13, 2.04, 0.43, 1.35, 1.85, 1.61}

Cross-validated RMSE = 397.306 Cross-validated RMSE = 397.306 andand r r 2 2 = 0.853 [ 15= 0.853 [ 15 % reduction ]% reduction ]

••

••

••

Page 39: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

Generic Processes for Photosynthesis RegulationGeneric Processes for Photosynthesis Regulation

generic process translationgeneric process translation generic process transcriptiongeneric process transcription variables: P{protein}, M{mRNA}variables: P{protein}, M{mRNA} variables: M{mRNA}, R{rate} variables: M{mRNA}, R{rate} parameters: parameters: [0, 1] [0, 1] parameters: parameters: equations:equations: d[P,t,1] = d[P,t,1] = M M equations: equations: d[M,t,1] = Rd[M,t,1] = R

generic process regulate_onegeneric process regulate_one generic process regulate_twogeneric process regulate_two variables: R{rate}, S{signal} variables: R{rate}, S{signal} variables: R{rate}, S{signal} variables: R{rate}, S{signal} parameters: parameters: [ [1 , 1] 1 , 1] parameters: parameters: [ [1 , 1], 1 , 1], [0, 1] [0, 1] equations:equations: R = R = S S equations: equations: R = R = S S

d[S, t,1] = d[S, t,1] = 1 1 S S

generic process automatic_degradationgeneric process automatic_degradation generic process controlled_degradationgeneric process controlled_degradation variables: C{concentration}variables: C{concentration} variables: D{concentration}, variables: D{concentration}, E{concentration}E{concentration} conditions:conditions: C > 0C > 0 conditions: conditions:D > 0, E > 0D > 0, E > 0 parameters: parameters: [0, 1] [0, 1] parameters: parameters: [0, 1] [0, 1] equations:equations: d[C,t,1] = d[C,t,1] = 1 1 C C equations: equations: d[D,t,1] = d[D,t,1] = 1 1 E E

d[E,t,1] = d[E,t,1] = 1 1 E Egeneric process photosynthesisgeneric process photosynthesis variables: L{light}, P{protein}, R{redox}, S{ROS}variables: L{light}, P{protein}, R{redox}, S{ROS} parameters: parameters: [0, 1], [0, 1], [0, 1] [0, 1] equations:equations: d[R,t,1] = d[R,t,1] = L L P P

d[S,t,1] = d[S,t,1] = L L P P

Page 40: Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin Arrigo Department of Geophysics Stanford

A Process Model for Photosynthetic RegulationA Process Model for Photosynthetic Regulation

model photo_regulationmodel photo_regulation

variables: light, mRNA_protein, ROS, redox, transcription_ratevariables: light, mRNA_protein, ROS, redox, transcription_rateobservables: light, mRNAobservables: light, mRNA

process photosynthesis;process photosynthesis; equations:equations: d[redox,t,1] = 0.0155 d[redox,t,1] = 0.0155 light light protein protein

d[ROS,t,1] = 0.019 d[ROS,t,1] = 0.019 light light protein protein

process protein_translationprocess protein_translation process mRNA_transcriptionprocess mRNA_transcription equations:equations: d[protein,t,1] = 7.54 d[protein,t,1] = 7.54 mRNA mRNA equations:equations: d[mRNA,t,1] = transcription_rated[mRNA,t,1] = transcription_rate

process regulate_one_1process regulate_one_1 process regulate_two_2process regulate_two_2 equations:equations: transcription_rate = 0.99 transcription_rate = 0.99 light light equations:equations: transcription_rate = 1.203 transcription_rate = 1.203 redox redox

d[redox,t,1] = d[redox,t,1] = 0.0002 0.0002 redoxredox

process automatic_degradation_1process automatic_degradation_1 process controlled_degradation_1process controlled_degradation_1 conditions:conditions: protein > 0protein > 0 conditions: conditions:redox > 0, ROS > 0redox > 0, ROS > 0 equations:equations: d[protein,t,1] = d[protein,t,1] = 1.91 1.91 protein protein equations:equations: d[redox,t,1] = d[redox,t,1] = 0.0003 0.0003 ROS ROS

d[ROS,t,1] = d[ROS,t,1] = 0.0003 0.0003 ROS ROS