

    HIERARCHAL INDUCTIVE PROCESS MODELING AND ANALYSIS

    Youri Noël Nelson

    A Thesis Submitted to the University of North Carolina Wilmington in Partial Fulfillment

    of the Requirements for the Degree of Master of Science

    Department of Mathematics and Statistics

    University of North Carolina Wilmington

    2011

    Approved by

    Advisory Committee

    Michael Freeze Xin Lu

    Wei Feng Stuart Borrett

    Chair Co-Chair

    Accepted by

    Dean, Graduate School

    TABLE OF CONTENTS

    ABSTRACT

    DEDICATION

    ACKNOWLEDGMENTS

    LIST OF TABLES

    LIST OF FIGURES

    LIST OF SYMBOLS

    1 INTRODUCTION

    2 METHOD

    2.1 HIPM Description

    2.1.1 Measure of Fit

    2.1.2 Entities specification and model library

    2.2 Experiment Design

    3 COMPUTATIONAL RESULTS

    3.1 Increase in number of time-series input

    3.2 Value of Information

    3.3 Summary

    4 ANALYTICAL ANALYSIS

    4.1 Most recurrent models

    4.2 Preliminaries

    4.3 Model A

    4.4 Model B

    4.5 Model C

    4.6 Effects of increasing the number of constraints

    5 CONCLUSION

    REFERENCES

    APPENDIX

    A. Sample CIAO data - 1997

    B. Full entity specification file

    C. Full Ross Sea generic model library

    D. Models selected in both experiments 8 and 19

    E. Models selected in both experiments 8 and 21

    ABSTRACT

    Understanding phytoplankton dynamics in the Ross Sea polynya may yield knowledge useful in addressing the world's rising carbon dioxide levels. Modeling such dynamics is a lengthy and tedious process that can be aided by computational tools like HIPM. This system relies on existing knowledge, in the form of time-series data and a process library, to construct and then evaluate candidate models. In this research, models were ranked by sum of squared error from lowest to highest, the lowest-error model being the best fit. Some of the questions that arise from the use of HIPM concern the amount and value of the time series provided to the software, from which we formulated two hypotheses. Does providing more time series improve the output of the system? Do time series for different variables provide output of different quality? Through 31 experiments and mathematical analysis, we began to answer these questions. The computational results showed that our first hypothesis does not always hold, which we believe is due to the way fit is measured. The mathematical analysis, on the other hand, revealed many variations in the structure of the zooplankton equation across the experiments, which may indicate that the process library needs to be better defined and that the system should consider not only the phytoplankton species Phaeocystis antarctica but also diatoms. This thesis provides a starting point for answering these hypotheses, but further research is still needed.

    DEDICATION

    This thesis is dedicated to all my friends and family who have supported me in this incredible journey I started five years ago. More importantly, I want to dedicate it to our Lord and Savior, as I certainly would not be here today without his help, support and comfort.

    “I can do anything through God who strengthens me.” (Philippians 4:13)

    I also want to dedicate this to my nephew Noah Nelson and my niece Sarah Nelson for always putting a smile on my face during the tough times, for their unconditional love, and for always making me want to persevere. I love you beyond words.

    Thank you, Christel & Douglas Nelson, Lara Nelson, Celio & Elise Nelson, Sven Diebold, Andrew & Robin Nelson, Ed & Pat Nelson, Joann Nelson, Philip Varvaris, Luke Brown, Taylor Jackson and Bud Edwards (for always being there at the right place at the right time), and all my other friends and family members who are not named here but are present in my heart and to whom I am so grateful for all the words of encouragement and support throughout the years.

    ACKNOWLEDGMENTS

    I would like to thank Dr. Feng, Dr. Borrett, Dr. Simmons, Dr. Freeze and Dr. Lu for all their help and support in this endeavor, as well as my friend Brevin Rock for his advice on completing a Master's thesis.

    LIST OF TABLES

    1 Example of entity definition and instantiation (P)

    2 Example of process definition (Growth)

    3 Data contained in CIAO set

    4 Cutoff Value Chart

    5 Model A Parameter Values

    6 Model B Parameter Values

    7 Model C Parameter Values

    LIST OF FIGURES

    1 Initial Conceptual Model

    2 Tree diagram representing the process library

    3 Map of the Ross Sea

    4 reMSE summary - Part 1

    5 reMSE summary - Part 2

    6 reMSE summary - Part 3

    7 Good-Fit Models vs. Number of Input Time Series

    8 Mean Activation Values Graph

    LIST OF SYMBOLS

    $P$ = Amount of phytoplankton present in the system (mg Chl a/m³)

    $D$ = Detritus concentration (mg C/m³)

    $F$ = Iron concentration (µM)

    $Z$ = Zooplankton concentration (mg C/m³)

    $N$ = Nitrate concentration (µM)

    $E_{ice}(t)$ = Sea ice concentration

    $E_{TH_2O}(t)$ = Temperature of the water (°C)

    $E_{PUR}(t)$ = Photosynthetically usable radiation (µmol photons m⁻² s⁻¹)

    $E_{TH_2O}^{max}$ = Maximum water temperature

    $E_{TH_2O}^{min}$ = Minimum water temperature

    $a_i$ = Optimal parameters of the system selected by the HIPM software

    1 INTRODUCTION

    Whether in biology, mathematics, physics, ecology, or any other science, all fields share a common objective: to explain and describe the world that surrounds us. All of them build upon collected observations to explain recurring phenomena. To explain and depict some of these phenomena, scientists make use of models, which can take a variety of forms, including conceptual, formal, physical and diagrammatic (Haefner, 2005).

    Models are widely used in science, and researchers continue to look for tools or techniques that will enhance their ability to construct new models or improve existing ones. The appropriate modeling technique depends on the task. For instance, Haefner (2005) uses a Forrester diagram, a qualitative model formulation, to model a hypothetical agro-ecosystem. Another example comes from biology: predator-prey interactions can be described with differential equation models like those formulated by Lotka and Volterra (Berryman 1992). Models are useful for studying systems because they let researchers conduct experiments and test theories that would otherwise be unethical or impossible to perform, and because they enable researchers to predict the behavior of the various components of an ecosystem.

    Model construction is a difficult and lengthy endeavor. For a given system there may be many different combinations of processes (e.g., grazing, decay, growth) that could plausibly explain the behavior being studied, so exploring and evaluating all these possibilities is a tedious task. In the past, limited computational power restricted scientists' ability to investigate complex models, and known or suspected processes were often left out to simplify calculations; as computational power increased, so did our capacity to evaluate more intricate models (Oreskes 2000). In addition, numerical models of natural systems are non-unique: there are multiple ways to represent the same dynamics. Creating computational tools that quickly and automatically evaluate multiple models therefore seemed a promising way to search through this extensive model space. The success of machine learning and data mining in commercial domains led scientists to investigate automated modeling for that particular purpose (Fayyad et al., 1996).

    The act of gathering small pieces of information and combining them with prior knowledge to formulate a complex overview of the object or process studied is called induction. Induction avoids searching the entire space of possible equations by piecing together only the meaningful terms; for instance, a predator-prey model will need terms specifying growth and death (Todorovski et al. 2005). Inductive modeling methods (e.g., LAGRAMGE, HIPM, ARIMA, FUSE) use the principles of induction to construct models of the studied system. Methods used for commercial applications, such as the Knowledge Discovery in Databases (KDD) process, were insufficient for scientific purposes because they only described, and did not explain, the observed system behavior (Langley et al. 2006). A simple example is the modeling of water consumption in a city: a water company could easily create a numerical model based on previous years that gives a good estimate of projected water consumption over time, but that model may not explain why consumption fluctuates the way it does. In other words, the commercial methods produced models that are useful for making accurate predictions about a system but are very limited when it comes to explaining which processes drive the system's behavior; these methods did not explore the space of all possible models. Thus, induction methods had to be enhanced to automate the task of building and evaluating multiple models (Dzeroski et al. 1995).
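    As a worked illustration of piecing together only meaningful terms, the classic Lotka-Volterra predator-prey model can be read as a sum of process fragments, one per growth or loss process. The symbols below (prey $x$, predator $y$, positive rate constants $\alpha, \beta, \gamma, \delta$) are introduced here for illustration only:

$$\frac{dx}{dt} = \underbrace{\alpha x}_{\text{prey growth}} \underbrace{-\,\beta x y}_{\text{predation}}, \qquad \frac{dy}{dt} = \underbrace{\delta x y}_{\text{predator growth}} \underbrace{-\,\gamma y}_{\text{predator death}}$$

    An inductive method searching for such a model only needs to choose among alternative fragments for each term, rather than among arbitrary equations.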

    In this thesis, I used the hierarchical inductive process modeling technique, which is implemented as a computer program called HIPM (Langley et al. 2006; Bridewell et al. 2005; Dzeroski et al. 1995; Borrett et al. 2007). Inductive process modeling methods such as HIPM (Bridewell et al. 2008; Borrett et al. 2007; Langley et al. 2006; Todorovski et al. 2005) search through two spaces. The first is made up of mathematical formulations and alternative model structures, which consist of entities, processes and the connections binding the two; the second is made up of parameter values (Borrett et al. 2007). The system takes as input a hierarchy of generic processes, a set of entities, and a set of observed time series for the entities' variables (Todorovski et al. 2005). A process is an action on the system, defined by means of mathematical equation fragments together with rules for combining these fragments with the rest of the equations; an entity is an object grouping the properties of an organism or nutrient by means of variables and parameters. HIPM performs one of two searches for the model structure: a heuristic search or an exhaustive search. With the exhaustive option selected, HIPM creates all the possible model structures from the given background knowledge and selects the best set of parameters for each model structure. Finally, the system ranks the models by their sum of squared error (Todorovski et al. 2005).
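    The exhaustive structure search described above can be pictured as a Cartesian product: pick one alternative set of processes per state variable, then discard combinations that violate the library's constraints. The Python sketch below illustrates that idea under stated assumptions; the variable names, the process names (`growth_exp`, `grazing_holling2`, and so on) and the single constraint are hypothetical and are not taken from HIPM or the Ross Sea library.

```python
from itertools import product

# Hypothetical library: for each state variable, the alternative sets of
# processes that could act on it. Names are illustrative, not HIPM's.
ALTERNATIVES = {
    "P": [("growth_exp",), ("growth_exp", "mortality")],
    "Z": [("grazing_linear",), ("grazing_holling2",), ("grazing_ivlev",)],
    "D": [(), ("decay",)],  # () means no detritus process
}

def satisfies_constraints(structure):
    """Hypothetical constraint: phytoplankton mortality requires a detritus process."""
    return not ("mortality" in structure["P"] and not structure["D"])

def enumerate_structures(alternatives):
    """Exhaustive search: one alternative per variable (Cartesian product),
    keeping only the combinations that satisfy the constraints."""
    names = sorted(alternatives)
    for combo in product(*(alternatives[name] for name in names)):
        structure = dict(zip(names, combo))
        if satisfies_constraints(structure):
            yield structure

structures = list(enumerate_structures(ALTERNATIVES))
print(len(structures))  # 9 of the 2 * 3 * 2 = 12 raw combinations
```

    With two phytoplankton options, three grazing options and two detritus options there are 12 raw combinations; the toy constraint removes the 3 that include mortality without a detritus process. Because the count multiplies across variables, adding alternatives for a single process such as grazing quickly enlarges the structure space.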

    This system can represent complex system dynamics. For example, in a study of photosynthesis regulation it generated a model that reproduced both the qualitative shape and the quantitative details of the time-series data while incorporating processes that made biological sense (Langley et al. 2006). In our case, we studied the phytoplankton dynamics of the Ross Sea aquatic ecosystem.

    In this thesis I used the HIPM tool, combined with the appropriate process library, to study the phytoplankton dynamics of the Ross Sea ecosystem. Here the term process library refers to the collection of processes (e.g., grazing, decay, growth) and entities (e.g., phytoplankton, zooplankton, nitrate), together with their relations to one another. It is represented graphically in Figure 2.

    Figure 1: This schematic represents the interactions between entities and the exogenous variables driving the model. Here, P, Z, D, NO3 and Fe are the state variables. PUR, T and Ice are the exogenous variables acting on the system and influencing the state variables. The arrows represent the interaction of one variable onto another (Borrett, unpublished research).

    Arrigo, Borrett, Bridewell and Langley used HIPM and the Ross Sea process library to create and search a space of 1120 possible model structures to explain the temporal dynamics of phytoplankton and nitrogen in the Ross Sea ecosystem; all models contained five state variables: phytoplankton, zooplankton, detritus, nitrogen and iron. Time series for both phytoplankton and nitrogen were available and were given to HIPM along with the process library. Their initial research found that 200 model structures had a good fit, where a good fit was defined as a sum of squared error less than or equal to 0.2. From a computer scientist's standpoint, reducing the search space from 1120 model structures to 200 is a great accomplishment; for a biologist, however, the solution is not specific enough and offers few insights into the ecosystem dynamics. There is a need for ways to constrain the search further, bringing down the number of good-fit models and making the output useful to biologists.

    Figure 2: A tree diagram representing the process library constructed for the Ross Sea ecosystem problem. The interaction between processes and entities is defined in the library as explained in Section 2.1.2 (Borrett et al. 2007).

    Superficially, HIPM appears related to equation discovery, a subfield of machine learning (Langley, 1995; Mitchell, 1997) that uses computational methods to investigate collections of measurements and observations in search of quantitative laws (Todorovski, 2003). For example, the LAGRAMGE system takes as input background knowledge, encoded as a grammar specifying the space of possible equations, together with a dependent variable, and outputs the best equation for that variable; it can only search for one variable at a time (Dzeroski et al. 1993, Todorovski 2003). Equation discovery is in turn related to the system identification methods of Ljung (1993), which are further removed from inductive process modeling.

    The main assumption behind system identification is that the model structure is known and that the primary concern is finding adequate parameter values; equation discovery focuses on both structure and parameter values (Todorovski et al. 1998). Both approaches produce descriptive models that summarize and predict the data, but they fail to search through the space of alternative explanations: these methods do not consider models with theoretical variables, nor alternative processes that could explain certain dynamics (Bridewell et al. 2005).

    The Southern Ocean covers an area equivalent to about 10% of the global ocean. It is a key element of the global ocean system, as it links all major ocean basins and facilitates the global distribution of deep water, and it is considered to play an important part in the global carbon (C) cycle (Arrigo et al. 2003). The Ross Sea polynya (an area of open water surrounded by sea ice) is one of the most productive ecosystems in the Southern Ocean, as it experiences some of the largest phytoplankton blooms in the region (Arrigo et al. 1994, 1998, 2000, 2003). Phytoplankton productivity (photosynthesis) is important to the carbon cycle because it removes carbon dioxide (CO2) from surface water, part of which is then exported to the deep ocean. What makes the Ross Sea polynya so interesting to ecologists, compared to other locations such as Terra Nova Bay, is the type of phytoplankton dominating the ecosystem. In the Ross Sea polynya Phaeocystis antarctica dominates, whereas diatoms (species such as Fragilariopsis spp.) dominate in Terra Nova Bay. Phaeocystis antarctica is thought to resist grazing more than other phytoplankton species, which could imply that more carbon is carried from shallow water to depth as uneaten phytoplankton, and the carbon they have fixed, sink (Tagliabue and Arrigo 2003). Deep ocean water has a longer residence time than shallow water, meaning that carbon trapped in deep ocean water is removed from atmospheric circulation for much longer than the carbon contained in surface water.

    Figure 3: Map of the southwestern Ross Sea showing the Ross Sea polynya, located north of the Ross Ice Shelf, and the Terra Nova Bay polynya, located on the western continental shelf (Arrigo et al. 2003).

    Thus, there is an incentive to understand the ecological processes that control phytoplankton productivity and community composition (which species dominates) in the Ross Sea. Fluctuations in phytoplankton populations could affect atmospheric CO2 levels (Carlson et al. 1998), and knowing why Phaeocystis antarctica predominates would be useful to scientists as they entertain the idea of altering phytoplankton populations around the world to create carbon sinks, a temporary solution to our CO2 problem. These elements initiated the search for the best process explanation of the phytoplankton dynamics in the Ross Sea: by determining which processes act upon the system and which entities are most important, scientists will accumulate knowledge that may prove valuable in the fight against rising CO2 levels.

    As mentioned, the tool I have chosen for model search relies on measurements and observations of one or more variables of a system to make inferences about the remaining variables, for which no data are available, and about the processes at work in the system. In Borrett's study, the only state variables for which measurements were available were phytoplankton and nitrate. Ultimately, the goal is to select model structures that are good approximations of the natural system and give good insight into the processes at work in it. However, here I was faced with an underconstrained optimization problem: there were no data for three of the five state variables. Indeed, one of the big challenges of using HIPM for this particular ecosystem is that the data used to conduct the search are very expensive to collect, and collection becomes especially complicated for iron (Fe), which is difficult to measure. This raises two questions. Does knowing data for more than one state variable significantly narrow down the number of good-fit models? Does knowledge about certain variables have more optimization power than knowledge about others? For example, if we could only afford to collect data for one of the five variables in the system, would phytoplankton give us better model output (fewer good-fit models) in HIPM than zooplankton or detritus?

    This is an important question because, as scientists try to advance their knowledge of the Ross Sea, they need to make educated decisions about what information to collect in order to use resources efficiently.

    This thesis is structured in five parts. First, I describe the method used to obtain the data for my analysis, including the HIPM software and an overview of the data sets. I then turn to the quantitative analysis, looking strictly at the results generated by the HIPM software and discussing what they tell us from an ecological standpoint. In Section 4, I move to the analytical part of the analysis, picking and studying some of the best-fit models selected during the quantitative analysis. I then discuss these analytical results and, in the next section, tie them back to the biology in an effort to link qualitative and quantitative research. Through this analysis we see how we can improve HIPM's model selection method and assist scientists in finding the model that most accurately explains the processes at work in the observed ecosystem.

    2 METHOD

    The method employed in this thesis involves constructing process models from continuous data. To assist in this task we used a software package named HIPM. It is the output and model selection efficiency of this software that we are investigating. To better understand the task at hand, it is important to define what HIPM does, as well as the steps we take to test its efficiency.

    2.1 HIPM Description

    Ecologists rely heavily on system modeling to build ecological theory and to guide environmental assessment and management (Borrett et al. 2007). Typically, scientists build and study a few models, basing the model structure on previous research or on a judgment call about which entities and processes should or should not be included. One of the aspirations, and problems, of modeling natural systems is to capture the essence of the system needed for the model's purpose by figuring out what can be left out. In that regard, deciding which entities and processes to include, and which mathematical formulations and parameter values are best for a given structure, becomes an essential part of this search. Choosing among the possible model structures presents an intricate and time-consuming challenge for ecologists who want to navigate this space (Borrett et al. 2007). In searching through this space of possible models, we are guided by the claim made by Langley et al. (1987), which we support, that we must look for models that fit real-life observations. In summary, we are faced with the problem of constructing models anchored in domain theory, conducting a time-consuming search, and linking the models to empirical data (Borrett et al. 2007). This is where the HIPM software comes into play; HIPM stands for Hierarchical Inductive Process Modeling. This scientific approach (Langley et al. 2005) assumes the following:

    • Given: Time-series data for continuous variables.

    • Given: Background knowledge about the entities of the system; in other words, constraints on the variables and other parameters driving these entities.

    • Given: Background knowledge about the types of processes that may be involved in driving the ecosystem, as well as the constraints that may exist for those processes.

    Then the task for the software is to perform a search through the structure and

    parameter space defined by the process-entity library to find the models that best

    fit the data. HIPM operates in four phases.

    1. In an exhaustive search, it first finds all the possible instantiations of the generic processes for all variables. This means that the system finds all the possible combinations of processes that can affect a given variable (an example is given in Section 2.1.2). For our purposes we used the exhaustive search option programmed into the software, but a heuristic search option is also available.

    2. The system then assembles these instantiations into models: it combines, into a generic model, one instantiation of generic processes for each variable present in the system. It uses the constraints given by the user to determine which instantiations can be linked together into a generic model, and it searches exhaustively to find all the possible models. In our study it produces 1120 model structures, mainly due to the large number of different grazing processes potentially present in the ecosystem.

    3. It searches for the parameter values of each model using the constraints defined by the user. To infer these parameters, the system picks a random set of values that respects the constraints and, using the Levenberg-Marquardt nonlinear least-squares method, finds a local optimum. To avoid entrapment in local minima, the system restarts the parameter estimation from multiple random points, retaining only the parameters that produce the lowest error. In our experiments we set the number of restarts to 128. This technique has been found to produce reasonable matches to time series in multiple systems (Langley et al. 2007).

    4. It evaluates the performance of the resulting models by comparing their predicted values against the observed time series, calculating the relative mean squared error (reMSE); models with the lowest reMSE are considered the best-fit models.
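    Phases 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not HIPM's implementation: two hypothetical candidate structures (exponential and linear growth) are fitted to synthetic data, and a simple compass (pattern) search stands in for the Levenberg-Marquardt optimizer. Each candidate is restarted from several random points inside hypothetical parameter bounds, the lowest-error fit is kept, and the candidates are ranked by sum of squared errors.

```python
import math
import random

# Two hypothetical candidate structures, each with parameters (p0, r).
def exponential(params, t):
    p0, r = params
    return p0 * math.exp(r * t)

def linear(params, t):
    p0, r = params
    return p0 + r * t

def sse(model, params, times, observed):
    """Sum of squared errors between model predictions and observations."""
    return sum((model(params, t) - y) ** 2 for t, y in zip(times, observed))

def pattern_search(f, x0, bounds, step=0.5, tol=1e-6, max_iter=20000):
    """Compass search: try +/- step on each coordinate (clipped to bounds),
    keep any improvement, and halve the step when no move helps.
    Stand-in for Levenberg-Marquardt; it only finds a local optimum."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for delta in (step, -step):
                trial = list(x)
                trial[i] = min(hi, max(lo, x[i] + delta))
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step /= 2
    return x, fx

def fit_with_restarts(model, times, observed, bounds, restarts=8, seed=0):
    """Restart the local search from random points inside the bounds and
    keep the parameters with the lowest error."""
    rng = random.Random(seed)

    def objective(params):
        return sse(model, params, times, observed)

    best = None
    for _ in range(restarts):
        start = [rng.uniform(lo, hi) for lo, hi in bounds]
        params, err = pattern_search(objective, start, bounds)
        if best is None or err < best[1]:
            best = (params, err)
    return best

times = list(range(10))
observed = [math.exp(0.3 * t) for t in times]  # synthetic data: p0 = 1, r = 0.3
bounds = [(0.0, 10.0), (0.0, 1.0)]             # hypothetical parameter constraints

results = {name: fit_with_restarts(model, times, observed, bounds)
           for name, model in [("exponential", exponential), ("linear", linear)]}
ranking = sorted(results, key=lambda name: results[name][1])  # lowest error first
print(ranking)  # ['exponential', 'linear']
```

    The experiments in this thesis used 128 restarts; the sketch uses 8 to keep it fast. Restarting from random points does not guarantee the global optimum; it only makes entrapment in a poor local minimum less likely.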

    2.1.1 Measure of Fit

    As mentioned above, HIPM evaluates and selects the best model structure and set of parameters according to a fitness measure. The system currently uses the sum of squared errors (SSE) to evaluate fitness (Bridewell et al. 2007), defined as follows:

$$\sum_{i=1}^{n} \mathrm{SSE}(x_i, x_i^{obs}) = \sum_{i=1}^{n} \sum_{k=1}^{m} \left(x_{i,k} - x_{i,k}^{obs}\right)^2$$

    where $x_1, \ldots, x_n$ are the variables being fitted, each with $m$ observed values. To account for variables of varying scale, the system uses a relative mean squared error, defined as follows:

$$\mathrm{reMSE} = \frac{1}{nm} \sum_{i=1}^{n} \frac{\mathrm{SSE}(x_i, x_i^{obs})}{s^2(x_i^{obs})}$$

    Here $s^2(x_i^{obs})$ is the sample variance of the observations of $x_i$. Throughout this thesis we refer to the relative mean squared error as reMSE. The main advantage of this rescaling is that errors can be compared across variables and data sets of different scales. Typically, an reMSE of 1.0 or above signifies that the model performs poorly: a model that simply predicted the mean of each observed series would score $(m-1)/m$, which is close to 1. Conversely, the lower the reMSE, the better the fit.
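    The two formulas above translate directly into code. The Python sketch below assumes, as the formula does, that every variable has the same number m of observations (at least two, so the sample variance exists); the variable names and numbers are hypothetical. It also shows the score of a "model" that only predicts the mean of each observed series.

```python
from statistics import variance  # sample variance (divides by m - 1)

def sse(predicted, observed):
    """Sum of squared errors for one variable."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

def remse(predicted, observed):
    """Relative mean squared error over n variables with m observations each.

    predicted, observed: dicts mapping a variable name to a list of m values.
    Each variable's SSE is divided by the sample variance of its observations.
    """
    n = len(observed)
    m = len(next(iter(observed.values())))
    total = sum(sse(predicted[name], obs) / variance(obs)
                for name, obs in observed.items())
    return total / (n * m)

# Hypothetical observations for two variables, m = 4 each.
observed = {"P": [1.0, 2.0, 3.0, 4.0], "N": [10.0, 8.0, 6.0, 4.0]}

# Predicting each series' mean gives SSE = (m - 1) * s^2, so reMSE = (m - 1) / m.
mean_model = {name: [sum(obs) / len(obs)] * len(obs) for name, obs in observed.items()}
print(remse(mean_model, observed))  # approximately 0.75
print(remse(observed, observed))    # 0.0 for a perfect fit
```

    Dividing by each series' own variance is what lets a nitrate error and a phytoplankton error, measured in different units, be added into one score.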

    2.1.2 Entities specification and model library

    Each entity of a system is defined by a combination of variables and parameters

    which makes them actors but also receivers of action in the model. A distinction is

    to be made between generic entity and instantiated entity. Indeed, a formal generic

    entity has a name and a set of properties which can include both variables and

    parameters. In a given model the parameters of an instantiated entity do not change, whereas its variables do. Every variable in the entity has a name and a rule that determines how the contributions of multiple processes and their subprocesses are combined (e.g. summed, minimum, product). Every parameter has a name and a range that constrains its possible values. Instantiated entities, on the other hand, have each variable either associated with a time-series or given an initial value, and each parameter assigned a real value. A field is also included to indicate the parent generic entity (Borrett et al. 2007). A given generic entity can be instantiated multiple times, so the generic entity can be thought of as a blueprint for the instantiated entities. For example, in our system we defined the entity phytoplankton as presented in Table 1. Here the entity's name is “P”. It contains the variables “conc”, “growth_rate” and “growth_lim”, each with the rule determining how it is aggregated across processes. The next part of the entity definition is the list of parameters relevant to this entity, such as “max_growth”, with possible values in the (0.4, 0.8) range. The generic entity definition in Table 1 is followed by an instantiated entity, “phyto”, which refers to its parent generic entity “pe”. Each of its variables is then given one of two things. It can be the name of a time-series to which the model will be fitted, as for “conc”, where “PHA_c” refers to the phytoplankton column of the CIAO data set. Alternatively, it can be an initial value, such as 0 for “growth_rate”, indicating that this particular state variable will not be fitted to a time-series. The label “system”, as opposed to “exogenous”, states that the variable depends on the system, unlike independent variables such as solar radiation or water temperature. The full instantiated entity library can be found in Appendix B and the generic entity library in Appendix C.
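    The generic/instantiated distinction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not HIPM's actual API. The names `GenericEntity`, `InstantiatedEntity`, `instantiate`, `combine` and `COMBINERS` are hypothetical. Parameter ranges are treated as closed intervals, and only a subset of the Table 1 parameters is used. The sketch shows a generic entity acting as a blueprint, made of combining rules plus parameter ranges. It also shows an instantiated entity that keeps a field pointing back to its parent.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical mapping from combining-rule names to Python functions.
COMBINERS = {"sum": sum, "prod": math.prod, "min": min}


@dataclass
class GenericEntity:
    """Blueprint: a combining rule per variable, a range per parameter."""
    name: str
    variable_rules: Dict[str, str]
    parameter_ranges: Dict[str, Tuple[float, float]]

    def instantiate(self, instance_name: str,
                    parameters: Dict[str, float]) -> "InstantiatedEntity":
        # Every assigned value must belong to a known parameter and lie
        # inside its (assumed closed) range.
        for pname, value in parameters.items():
            if pname not in self.parameter_ranges:
                raise KeyError(f"unknown parameter {pname!r}")
            low, high = self.parameter_ranges[pname]
            if not low <= value <= high:
                raise ValueError(f"{pname}={value} outside [{low}, {high}]")
        return InstantiatedEntity(instance_name, self, dict(parameters))

    def combine(self, variable: str, contributions: List[float]) -> float:
        # Aggregate the contributions of several processes with the
        # variable's rule, e.g. "min" for growth_lim.
        return COMBINERS[self.variable_rules[variable]](contributions)


@dataclass
class InstantiatedEntity:
    name: str
    parent: GenericEntity          # field indicating the parent generic entity
    parameters: Dict[str, float]   # fixed real values for this model


# Subset of the Table 1 definitions.
P = GenericEntity(
    "P",
    {"conc": "sum", "growth_rate": "prod", "growth_lim": "min"},
    {"max_growth": (0.4, 0.8), "death_rate": (0.02, 0.04)},
)
phyto = P.instantiate("phyto", {"max_growth": 0.59, "death_rate": 0.025})
```

    With this sketch, `P.instantiate("bad", {"max_growth": 0.9})` raises a `ValueError`. `P.combine("growth_lim", [0.7, 0.4])` returns 0.4, the smallest limitation term.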

    For HIPM to be fully functional, it also needs a library of processes. Processes are the physical, chemical, or biological actions that drive change in dynamic models. Just as we distinguished between generic and instantiated entities, we distinguish between generic and instantiated processes. A generic process is defined by four elements. The first is a name. The second is the set of entity roles through which entities tie into the process. The third is the set of subprocesses tied to that process. The fourth is one or more equations. A generic process can also include a set of Boolean conditions that determine whether the process is active. These conditions make the process dynamic, turning it on and off depending on whether they are satisfied (Borrett et al. 2007). For instance, the photosynthesis process could be set to occur only when an environmental light variable is greater than zero. Table 2 gives an example of a generic process named “growth”. Any of the entities “P”, “N”, “D” and “E” can take a role in this process. Next comes the list of subprocesses linked to the process, each with the entities that can take a role in it. Last comes the equation that defines the process. This equation refers to the “conc” and “growth_rate” variables, which all entities must have. An instantiated process takes on a specific name and is bound to specific instantiated entities filling its roles (P, N, D or E). Each instantiated entity then takes its role in the equation of the instantiated process. The contributions of all instantiated processes are aggregated according to the rules defined in the generic entities. This organization in terms of entities and processes is what drives inductive process modeling: it makes systems of equations easier to construct by building them from fragments.
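    The Boolean activation conditions and the aggregation step can likewise be sketched. This is hypothetical code, not HIPM's implementation. It assumes that each process equation contributes a term to the rate of change of `P.conc` and that these terms are summed, matching the "sum" rule Table 1 gives for `conc`. The growth term mirrors the Table 2 equation `P.growth_rate * P.conc`. Following the photosynthesis example, it is active only when light is greater than zero. The `loss` process, the `light` key and all numerical values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class InstantiatedProcess:
    name: str
    condition: Callable[[State], bool]       # Boolean activation test
    contribution: Callable[[State], float]   # term added to d(P.conc)/dt


def conc_rate_of_change(processes: List[InstantiatedProcess],
                        state: State) -> float:
    """Sum the contributions of the processes whose conditions hold."""
    return sum(p.contribution(state) for p in processes if p.condition(state))


# Growth follows P.growth_rate * P.conc and only runs when there is light.
growth = InstantiatedProcess(
    "growth",
    condition=lambda s: s["light"] > 0,
    contribution=lambda s: s["growth_rate"] * s["conc"],
)
# Hypothetical always-active loss term.
loss = InstantiatedProcess(
    "loss",
    condition=lambda s: True,
    contribution=lambda s: -s["death_rate"] * s["conc"],
)

day = {"light": 50.0, "growth_rate": 0.3, "conc": 2.0, "death_rate": 0.025}
night = dict(day, light=0.0)
print(conc_rate_of_change([growth, loss], day))    # growth and loss active
print(conc_rate_of_change([growth, loss], night))  # only loss active
```

    With these made-up values the daytime rate is about 0.55. At night the growth process is switched off, so only the loss term of -0.05 remains.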

    Table 1: An example of a generic entity definition with its variables and parameters, followed by an example of an instantiated entity, Phytoplankton (P). In the instantiated entity, the variable “conc” is given a time series and the other variables are given initial values.

    pe = lib.add_generic_entity("P",
        {"conc": "sum",
         "growth_rate": "prod",
         "growth_lim": "min"},
        {"max_growth": (0.4, 0.8),
         "exude_rate": (0.001, 0.2),
         "death_rate": (0.02, 0.04),
         "Ek_max": (1, 100),
         "sinking_rate": (0.0001, 0.25),
         "biomin": (0.02, 0.04),
         "PhotoInhib": (200, 1500)})

    p1 = entity_instance(pe, "phyto",
        {"conc": ("system", "PHA_c", (0, 600)),
         "growth_rate": ("system", 0, (0, 1)),
         "growth_lim": ("system", 1, (0, 1))},
        {"max_growth": 0.59,
         "exude_rate": 0.19,
         "death_rate": 0.025,
         "Ek_max": 30,
         "biomin": 0.025,
         "PhotoInhib": 200})

    Table 2: Defining a process - Growth

    lib.add_generic_process(
        "growth", "",
        [("P", [pe], 1, 1), ("N", [no3, fe], 1, 100),
         ("D", [de], 1, 1), ("E", [ee], 1, 1)],
        [("limited_growth", ["P", "N", "E"], 0),
         ("exudation", ["P"], 1),
         ("nutrient_uptake", ["P", "N"], 0)],
        {},
        {},
        {"P.conc": "P.growth_rate * P.conc"})

    In summary, HIPM’s power lies in its knowledge of the modeled domain and in its ability to estimate parameters (Bridewell et al. 2007).

    2.2 Experiment Design

    Having established how HIPM works, let us consider the problem at hand. In theory, HIPM is an extremely powerful tool that permits a search through a wide structure and parameter space. However, previous research has shown that a more thorough investigation of HIPM’s output is necessary to evaluate its potential and usefulness to biologists. In our example of the Ross Sea ecosystem, with the process-entity library set up as described, the search space contains 1120 possible models. Each model can take on a wide variety of parameter sets depending on the constraints given to the software. The phytoplankton dynamic models of the Ross Sea have five variables: Phytoplankton (P), Zooplankton (Z), Detritus (D), Nitrate (N) and Iron (F). In previous research, real-life time series of Phytoplankton and Nitrate were available for this particular ecosystem, and these data were fed to HIPM. HIPM returned about 200 possible models with an reMSE less than or equal to 0.2. From a computer science standpoint this is a good improvement, since the search space is reduced from 1120 possible models to about 200. For a biologist, however, this is still a large number of models approximating the studied ecosystem, and testing every one of these 200 models would be extremely time-consuming. We therefore need to lower the number of possible models to a point that biologists deem reasonable and useful. We assume that increasing the number of constraints (i.e. adding real-life time series of a variable for which no empirical data were previously available) would help model discrimination in HIPM. This would, however, require scientists to go into the field and collect time series for one of the variables in the system, which is very expensive. Can HIPM therefore be used to make an informed decision about which variable would yield the most discriminatory power, if there is any difference between variables at all? This is what we investigate, and in light of these elements we have formulated two hypotheses:

    • Hypothesis 1 (increasing the number of constraints): increasing the number of time-series available to HIPM for model selection will induce better fits. In other words, increasing the number of known time-series of system variables leads to better model discrimination and therefore better model selection.

    • Hypothesis 2 (variables carry different amounts of information): some variables will have more discriminatory power than others and will restrict the set of best fit models more.

    To test our two hypotheses, we needed a full data set including time-series for all variables of the system. Only then could we compare the results depending on whether or not certain time-series are included as constraints for HIPM. Since no full data set with real-life data was available, we turned to a simulated data set, the “Coupled Ice and Ocean model” (CIAO) dataset. This dataset is generated from a three-dimensional ecosystem model that spans the entire water column and multiple stations across the Ross Sea. For our purposes, however, only a portion of these data is used: the top 5 meters at the Ross Sea Polynya station 01. The information contained in the CIAO dataset is listed in Table 3.

    Table 3: Information included in the CIAO data set. NOTE: A sample of the CIAO 1997 data can be found in Appendix A.

    Symbol  Units                    Description
    JDAY    day                      Day of the measurements
    TEMP    °C                       Temperature of the water
    DPML    m                        Mixed layer depth
    AI      -                        Sea ice concentration
    NITR    µM                       Nitrate concentration
    PHOS    mg Chla/m³               Phosphate concentration
    SILC    µM                       Silicate concentration
    IRON    nM or µM                 Iron concentration
    PARL    µmol photons m⁻² s⁻¹     Solar radiation used by organisms in photosynthesis
    PHA     mg Chla/m³               Phaeo chlorophyll concentration
    DIAT    mg Chla/m³               Diatom chlorophyll concentration
    ZOO     mg C/m³                  Zooplankton concentration
    DET     mg C/m³                  Detritus concentration
    PURL    µmol photons m⁻² s⁻¹     Photosynthetically usable radiation

    In addition to a full data set, HIPM needs a working library that, as stated in Section 2.1.2, defines both entities and processes. The process-entity library we used, available in Appendices B and C, was previously put together by Bridewell, Borrett, Langley and Arrigo. All the processes and subprocesses in which the instantiated entities can take a role in our study are represented in Figure 2.

    With the background knowledge HIPM needs for successful runs in place, we designed thirty-one experiments. Each experiment represents one possible combination of time-series constraints that could be entered into the software. For example, if we had time-series for Iron and Nitrate and fed them into HIPM, they would act as additional constraints in the model selection process: to be selected, models would have to exhibit behavior close to the given time-series. All the experiments are summarized in Table 4.
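    Thirty-one is exactly the number of non-empty subsets of the five state variables (2^5 - 1 = 31). The minimal Python sketch below enumerates those subsets by increasing size, keeping the variable order P, Z, D, N, F. The labels it produces match the panel titles visible in Figure 4 (1[P] through 12[Z,F]). Table 4 itself is not reproduced here, so assuming the remaining experiment IDs follow the same numbering is an assumption. `STATE_VARIABLES` and `constraint_combinations` are illustrative names.

```python
from itertools import combinations

# The five state variables of the Ross Sea models (Section 2.2).
STATE_VARIABLES = ["P", "Z", "D", "N", "F"]


def constraint_combinations(variables):
    """Return every non-empty subset of `variables`, smallest subsets first."""
    experiments = []
    for size in range(1, len(variables) + 1):
        experiments.extend(combinations(variables, size))
    return experiments


experiments = constraint_combinations(STATE_VARIABLES)
for exp_id, constrained in enumerate(experiments, start=1):
    print(f"{exp_id}[{','.join(constrained)}]")
```

    `itertools.combinations` preserves the input order, so the first lines read 1[P], 2[Z], ..., 6[P,Z], 7[P,D]. The last experiment, 31[P,Z,D,N,F], constrains all five variables at once.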

    3 COMPUTATIONAL RESULTS

    The main objective of this work is to determine how best to use HIPM to assist scientists in their decision-making process when selecting the model that most accurately represents an ecosystem. The first need is to narrow down the number of good fit models capable of describing the system. We did this by feeding HIPM additional time series for one or more of the state variables, thus providing more constraints. Did this assumption hold true? Secondly, if adding constraints to HIPM does reduce that number, do observations of a specific state variable have more reducing power than those of the other state variables? The data collected helped us answer these questions and discuss the efficiency of HIPM in its current state.

    Thirty-one different experiments were performed, each returning a measure of fit value (reMSE) for every one of the 1120 models tested. This makes for a large amount of data to analyze. To get a better idea of what these data look like, we took the models with an reMSE between 0 and 2, ranked their reMSE values from lowest to highest and plotted them (see Figures 4, 5 and 6). We did not look at reMSE values higher than 2.0. As stated previously, models with an reMSE higher than 1.0 are typically classified as poorly performing, because such values indicate a very large difference between observed and expected values. We estimated that the (0, 2) range would be sufficient for our purpose, as it would encompass most models. Based on these initial results, we chose an reMSE of 0.5 as our good fit model cutoff, so any model under that cutoff is considered a good fit. This cutoff was chosen because the graphs seemed to exhibit a turning point or slight step pattern around this reMSE value, as in the graphs for experiments 1, 5 or 20.
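    The ranking and cutoff procedure just described can be sketched as follows. This is a minimal illustration, not the code used in this thesis. It takes reMSE values as already computed (the measure itself is defined in Section 2.1.1). It treats "under the cutoff" as strictly less than 0.5. The model names and reMSE values are made up. `rank_and_count`, `GOOD_FIT_CUTOFF` and `PLOT_RANGE` are illustrative names.

```python
from typing import Dict, List, Tuple

GOOD_FIT_CUTOFF = 0.5     # models strictly under this reMSE are "good fit"
PLOT_RANGE = (0.0, 2.0)   # only reMSE values in this range were plotted


def rank_and_count(remse_by_model: Dict[str, float]
                   ) -> Tuple[List[Tuple[str, float]], int]:
    """Rank models from lowest to highest reMSE, keep those inside the
    plotted range, and count the models under the good fit cutoff."""
    ranked = sorted(remse_by_model.items(), key=lambda item: item[1])
    low, high = PLOT_RANGE
    plotted = [(model, value) for model, value in ranked if low <= value <= high]
    good_fit = sum(1 for _, value in ranked if value < GOOD_FIT_CUTOFF)
    return plotted, good_fit


# Made-up reMSE values for five hypothetical models.
scores = {"m1": 0.31, "m2": 1.7, "m3": 0.48, "m4": 2.6, "m5": 0.5}
plotted, good_fit = rank_and_count(scores)
print(plotted)    # m4 falls outside the plotted range
print(good_fit)   # 2: m5 sits exactly on the cutoff and is not counted
```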

    Figure 4: reMSE values ranked from lowest to highest (x axis: model rank; y axis: reMSE, 0.0 to 2.0). The reMSE = 0.5 level marks the good fit model cutoff, so any model under that value is considered a good fit model. The experimental setup and ID number of each run are indicated in the top right corner of each panel. Panel titles and good fit model counts: 1[P] 197; 2[Z] 101; 3[D] 366; 4[N] 439; 5[F] 509; 6[P,Z] 5; 7[P,D] 61; 8[P,N] 25; 9[P,F] 79; 10[Z,D] 8; 11[Z,N] 1; 12[Z,F] 0.
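    The good fit model counts printed on the panels can be tabulated to give a first, purely descriptive look at the two hypotheses of Section 2.2. The sketch below assumes that the counts were read correctly from the panel titles and uses only the first thirteen experiments. It is not the analysis carried out in this thesis. Here, fewer good fit models is taken to mean stronger discrimination. `GOOD_FIT_COUNTS` and the two helper functions are illustrative names.

```python
# Good fit model counts (reMSE < 0.5) as printed on the figure panels;
# 13[D,N] is the first panel of the following figure.
GOOD_FIT_COUNTS = {
    ("P",): 197, ("Z",): 101, ("D",): 366, ("N",): 439, ("F",): 509,
    ("P", "Z"): 5, ("P", "D"): 61, ("P", "N"): 25, ("P", "F"): 79,
    ("Z", "D"): 8, ("Z", "N"): 1, ("Z", "F"): 0, ("D", "N"): 67,
}


def rank_single_constraints(counts):
    """Single-variable experiments, from fewest to most good fit models."""
    singles = [(combo[0], n) for combo, n in counts.items() if len(combo) == 1]
    return sorted(singles, key=lambda item: item[1])


def pairs_tighter_than_singles(counts):
    """For each two-variable experiment, check that it leaves no more good
    fit models than either of its single-variable experiments."""
    return {
        combo: n <= min(counts[(v,)] for v in combo)
        for combo, n in counts.items()
        if len(combo) == 2
    }


print(rank_single_constraints(GOOD_FIT_COUNTS))
print(all(pairs_tighter_than_singles(GOOD_FIT_COUNTS).values()))
```

    On these counts Zooplankton alone leaves the fewest good fit models and Iron alone the most. Every two-variable experiment shown leaves no more good fit models than either of its single-variable experiments.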

    [Figure panel 13[D,N]: 67 Good Fit Models; reMSE values ranked from lowest to highest, y axis 0.0 to 2.0]