stochastic catastrophe models and multimodal distributions

STOCHASTIC CATASTROPHE MODELS AND MULTIMODAL DISTRIBUTIONS¹ by Loren Cobb, University of South Florida, Tampa

Nonlinear models such as those appearing in the applied catastrophe theory literature are almost universally deterministic, as opposed to stochastic (probabilistic). The purpose of this article is to show how to convert a deterministic catastrophe model into a stochastic model with the aid of several reasonable assumptions, and how to calculate explicitly the resulting multimodal equilibrium probability density. Examples of such models from epidemiology, psychology, sociology, and demography are presented. Lastly, a new statistical technique is presented, with which the parameters of empirical multimodal frequency distributions may be estimated.

KEY WORDS: catastrophe theory, stochastic models, epidemiology, parameter estimation, multimodal distributions.

EMPIRICAL frequency distributions which unmistakably possess more than one mode, or relative maximum, arise from time to time in all the sciences. Unfortunately, none of the commonly used theoretical probability densities, normal, exponential, gamma, etc., are bimodal, and so the phenomenon is usually ignored altogether. It is not generally recognized that bimodality is strong evidence for the existence of a fundamentally nonlinear underlying stochastic process, and that to each such process there corresponds a possibly multimodal probability density.

By fundamentally nonlinear we mean that there is more than one stable equilibrium position available to the system described by the variable in question. By stochastic we mean the system is continuously perturbed by random influences. It wanders about the general neighborhood of one of the stable equilibria, occasionally crossing over into the neighborhood of an adjoining stable equilibrium. A random sample drawn from an ensemble of such systems, each possessing an identical dynamic, would yield a multimodal frequency distribution.

Nonlinear dynamic systems can and do exhibit quite exotic behavior, even when they are unperturbed in the stochastic sense. In this paper we shall restrict our attention to a class of nonlinear systems that are particularly well behaved in many ways: those that are topologically equivalent to the canonical catastrophes of Thom. Although this class is not large compared to the universe of nonlinear systems, a strong argument can be made that it encompasses within its scope almost all of the fundamentally nonlinear models that are likely to be used in the behavioral sciences for some time to come. In the first section of this paper we present a stochastic form of these nonlinear systems which is seen to generate equilibrium probability densities that bear a very close relationship to the potential functions of the canonical catastrophes. Several varieties of the stochastic cusp are discussed in the next section. In the third we shift our attention to the problems of the statistical analysis of data using catastrophe models. In the last section we present a new technique for analyzing multimodal distributions with a worked example.

¹ The research reported here was supported by the Laboratory for Applied Mathematics in Behavioral Systems, Department of Psychiatry, University of South Florida. The author gratefully acknowledges the assistance of Brad Crain, Tim Poston, and especially Ian Stewart, who provided the computer graphics found in this article. All errors are the author's responsibility.

Behavioral Science, Volume 23, 1978 0006-7940/78/2305-0$01.W

© 1978 James G. Miller, M.D., Ph.D., Editor

STOCHASTIC CATASTROPHE THEORY

The development of extensions to the calculus that are sufficiently powerful to cope with random functions has been one of the cardinal achievements of 20th century mathematics. The earliest efforts in this direction occurred in economics (Bachelier, 1900), physics (Pearson, 1905; Einstein, 1905/1956), and genetics (Fisher, 1930; Wright, 1938).

Common to all of these areas was a preoccupation with the process now known as diffusion. It was seen in the fluctuations of the stock market, in the Brownian motion of small particles immersed in fluids or gases, and in the changes in gene frequencies in randomly breeding populations. In each of these cases, and in many more since discovered, the state of the system changes in accordance with a differential equation one of whose terms is in some sense random.

The rigorous theory of diffusion processes was developed by Wiener (1923), Kolmogorov (1931), and Feller (1954). Itô's (1951; Itô & McKean, 1965) stochastic integral opened the way to a full-fledged theory of the stochastic calculus on which there are now many texts. An excellent introductory text on stochastic differential equations has been written by Soong (1973).

These developments occurred independently of the equally dramatic growth and maturation of differential topology and dynamic systems theory, which are extensions of the calculus in quite different directions. It would seem, however, that there is a natural marriage to be made between these modern progeny of classical Newtonian mathematics. The first section of this article represents an attempt to indicate the points of contact between them that seem most promising for the behavioral sciences.

To simplify the exposition, the development of stochastic catastrophe theory here will be limited to the so-called cuspoid catastrophes, which are characterized by having only one behavioral dimension. The extension of stochastic methods to several behavioral variables is difficult (Soong, 1973, p la), especially in the nonlinear case, which includes all the umbilic catastrophes.

In one behavioral dimension, however, the results are easily obtained through the solution of an ordinary differential equation. As explained in detail by Fararo (1978), the dynamics of a deterministic gradient system are described by a differential equation of the form

$$dx/dt = -\partial P/\partial x, \qquad (1)$$

where the potential $P(x, a, b, \ldots)$ is a real-valued function of the behavioral variable $x$ and several control variables $(a, b, \ldots)$. This differential equation implies that the variable $x$ changes in the direction of decreasing potential at a rate proportional to the slope or gradient of the potential field. The equilibrium manifold of this system is the set of values of $x$ such that $dx/dt = 0$, i.e., the solutions of $\partial P/\partial x = 0$. For example, if $P(x, b) = x^4 - 2bx^2$, then the system is in equilibrium when $4x^3 - 4bx = 0$. Thus, for any particular control value $b_0$, there are equilibria at $x = 0$, and, if $b_0$ is positive, at $x = \pm (b_0)^{1/2}$, as displayed in Fig. 1. This bifurcation of the solution set as a control variable changes is a characteristic of nonlinear systems in general, and of catastrophe models in particular. We note in passing that the potential examined above is related to the cusp potential, which has an additional linear term.
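The example above can be tabulated in a few lines of Python. This is only a minimal sketch: the function names `potential`, `equilibria`, and `is_stable` are illustrative and do not come from the article, and stability is judged by the sign of the second derivative of $P$, which is an assumption consistent with the gradient dynamics of Eq. (1).

```python
import math

def potential(x, b):
    """Potential P(x, b) = x^4 - 2 b x^2 from the example in the text."""
    return x**4 - 2.0 * b * x**2

def equilibria(b):
    """Real solutions of dP/dx = 4x^3 - 4bx = 0 for a given control value b."""
    if b > 0:
        r = math.sqrt(b)
        return [-r, 0.0, r]
    return [0.0]

def is_stable(x, b):
    """Stable when P has a strict local minimum (P'' = 12x^2 - 4b > 0).

    The degenerate case b = 0, x = 0 (where P'' = 0) is not classified here.
    """
    return 12.0 * x**2 - 4.0 * b > 0

for b in (-1.0, 1.0):
    print(b, [(x, is_stable(x, b)) for x in equilibria(b)])
```

For negative $b$ the single equilibrium at zero is stable; for positive $b$ it becomes unstable and two stable equilibria appear at $\pm b^{1/2}$.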

FIG. 1. The equilibrium solution set for $x^3 - bx = 0$.

Let us suppose that the rate of change in the behavioral variable is not strictly determined by Eq. (1), but rather that there is a probability density over the range of possible rates of change in $x$. To maintain correspondence with the deterministic case, let us assume that Eq. (1) specifies the expected or mean rate of change in $x$. In place of $dx/dt$ we must now use the awkward expression $\lim_{h \to 0} h^{-1} E\{x(t + h) - x(t) \mid x(t)\}$, where $E$ refers to the expected value of the random variable contained in the curly braces. In the interest of clarity, let us substitute for this expression the function $m(x)$, which we will understand as the mean rate of change in $x$. Following standard practice in biomathematics, $m(x)$ will be called the drift function of the diffusion process. In the stochastic catastrophes the drift function satisfies Eq. (1), so that we have

$$m(x) = \lim_{h \to 0} h^{-1} E\{x(t + h) - x(t) \mid x(t)\} = -\partial P/\partial x. \qquad (2)$$

Thus the stochastic system behaves like the deterministic system on the average. The probabilistic nature of the rate of change in the behavioral variable requires some additional assumptions beyond that stated in Eq. (2). In the following we shall assume that the trajectory of $x(t)$ is continuous, that is, that instantaneous jumps do not occur. Further, we shall assume that the variance of the rate of change in $x$ can also be represented by a function of $x$. There are several equivalent ways to define this function, and the form chosen for Eq. (3) turns out to be quite convenient for the material to come:

$$u(x) = \lim_{h \to 0} h^{-1} E\{(1/2)[x(t + h) - x(t)]^2 \mid x(t)\}. \qquad (3)$$

The function $u(x)$ is often referred to as the infinitesimal variance of the process. In many cases it is reasonable to assume that $u(x)$ is a polynomial, e.g., $u(x) = \epsilon x^2$, where $\epsilon$ is a small positive constant. The closer $\epsilon$ is to zero, the more the stochastic system will resemble a deterministic system. Thus $\epsilon$ can be understood as a randomness coefficient.
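A process with drift $m(x)$ and infinitesimal variance $u(x)$ can be simulated approximately by an Euler-Maruyama scheme. The sketch below is an illustration and not part of the original article: the name `simulate`, the step size, and the parameter values are assumptions. Because Eq. (3) carries a factor of 1/2, each increment is given noise variance $2u(x)\,dt$.

```python
import math
import random

def simulate(m, u, x0, dt=1e-3, steps=5000, seed=0):
    """Euler-Maruyama path of a diffusion with drift m and infinitesimal variance u."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        # Drift contributes m(x) dt; the noise term has variance 2 u(x) dt,
        # so that E{(1/2) dx^2} / dt tends to u(x), as in Eq. (3).
        x += m(x) * dt + math.sqrt(2.0 * u(x) * dt) * rng.gauss(0.0, 1.0)
    return x

b, eps = 1.0, 0.01                               # assumed illustrative values
drift = lambda x: -(4.0 * x**3 - 4.0 * b * x)    # m(x) = -dP/dx for P = x^4 - 2bx^2
variance = lambda x: eps                         # constant infinitesimal variance
print(simulate(drift, variance, x0=0.5))
```

With such a small randomness coefficient the path started at 0.5 settles near the stable equilibrium at 1 and fluctuates around it; setting `u` to zero recovers the deterministic trajectory of Eq. (1).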

The three assumptions stated above place the stochastic processes considered here within the class of diffusion processes, whose mathematics have been under intensive study. It has been shown many times (Ricciardi, 1977, p. 38) that the conditional probability density function $f(x \mid t)$ of a diffusion process changes in accordance with a partial differential equation superficially similar to Eq. (1):

$$\partial f/\partial t = -\partial F/\partial x, \qquad (4)$$

where the function $F(x, t)$, called the probability flux, is defined by

$$F(x, t) = mf - \partial(uf)/\partial x. \qquad (5)$$

The first term on the right-hand side of Eq. (5) can be thought of as a purely deterministic flow. The second term is a purely stochastic flow caused by diffusion. An equilibrium probability density function $f^*(x)$ is defined as any solution to $\partial f/\partial t = 0$, i.e., any density function that does not change with time. By Eq. (4), this occurs when $\partial F/\partial x = 0$, i.e., when the probability flux is constant across all values of $x$, the behavioral variable. Unless very special boundary conditions obtain, the only possibility for constant flux is zero flux. The problem of finding the equilibrium density reduces to solving $F = 0$ for some density $f^*$, which is a far simpler task than solving Eq. (4) directly.

When the probability flux $F$ is zero, Eq. (5) reduces to an ordinary differential equation:

$$f^{-1}\,\partial f/\partial x = (m - u')/u. \qquad (6)$$

The difficulty of solving this equation depends entirely on the functions $m$ and $u$. It is interesting to note that if $m$ is linear and $u$ is at most quadratic, then the solutions to Eq. (6) are exactly the probability densities of the venerable Pearson system (Pearson, 1894, 1895; Elderton & Johnson, 1969). These include many familiar densities: exponential, normal, beta, gamma, and others. Since in elementary catastrophe models the function $m$ is always a polynomial of degree greater than one, the stochastic catastrophe densities appear to form an extension of the Pearson system. In fact, the structure of Eq. (6) indicates that the indefinite integral

$$\Phi(x) = -\int u^{-1}(m - u')\,dx, \qquad (7)$$

if it exists, will provide a solution to Eq. (6). This solution is

$$f^*(x) = k \exp(-\Phi(x)), \qquad (8)$$

where the normalizing constant $k$ is chosen so that $\int f^*\,dx = 1$, and $\Phi$ is as defined in Eq. (7). From this solution we see that the stochastic densities $f^*$ are contained within the family of canonical or regular exponential densities (Ferguson, 1967, p. 125; Bury, 1975, p. 106).
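Eqs. (7) and (8) can also be evaluated numerically when the integral has no convenient closed form. The following sketch is illustrative only: the function name `equilibrium_density`, the trapezoid rule, and the finite interval `[lo, hi]` are assumptions, and $u(x)$ must not vanish on the interval.

```python
import math

def equilibrium_density(m, u, du, lo, hi, n=2000):
    """Tabulate f*(x) = k exp(-Phi(x)) on [lo, hi], with Phi taken from Eq. (7).

    m, u, du are callables for the drift, the infinitesimal variance, and u'.
    """
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    g = [(m(x) - du(x)) / u(x) for x in xs]       # right-hand side of Eq. (6)
    log_f = [0.0]
    for i in range(1, n + 1):                     # cumulative trapezoid: log f = -Phi
        log_f.append(log_f[-1] + 0.5 * h * (g[i - 1] + g[i]))
    top = max(log_f)
    f = [math.exp(v - top) for v in log_f]        # shift by the maximum to avoid overflow
    area = h * (sum(f) - 0.5 * (f[0] + f[-1]))    # trapezoid rule for the normalizer k
    return xs, [v / area for v in f]

# Linear drift with constant variance: the result should be a normal density
# with mean 0 and variance eps, a member of the Pearson system.
eps = 0.25
xs, f = equilibrium_density(lambda x: -x, lambda x: eps, lambda x: 0.0, -4.0, 4.0)
print(f[len(f) // 2], 1.0 / math.sqrt(2.0 * math.pi * eps))
```

The printed pair compares the tabulated peak height with the peak of the corresponding normal density.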

The relationship between an equilibrium density $f^*$ and the potential function $P$ of a stochastic catastrophe model can be seen most clearly when $u(x) = \epsilon$, in other words, when the infinitesimal variance is a small constant. In this special case $\Phi = P/\epsilon$, and thus the solution of Eq. (8) simplifies to

$$f^*(x) = k \exp[-P(x)/\epsilon]. \qquad (9)$$

We see that for this class of stochastic systems the potential for the system is proportional to the logarithm of the equilibrium probability density. For example, if $P$ is quadratic in $x$, then $f^*$ is normal, while if $P$ is linear in $x$, then $f^*$ is exponential. In the elementary catastrophes, however, the potential $P$ is a polynomial of degree higher than two, which leads to the appearance of multimodal probability densities. In many applications, moreover, the infinitesimal variance $u$ does indeed depend on $x$, in which case Eq. (9) is not valid.

Several examples at this point may clarify the meaning of these statements. To find the equilibrium probability density for a stochastic version of the deterministic model depicted in Fig. 1, for which $P(x) = x^4 - 2bx^2$, we make the ad hoc assumption that $u(x) = \epsilon$. Then for fixed $b$ we have by Eq. (9):

$$f^*(x) = k \exp[-(x^4 - 2bx^2)/\epsilon]. \qquad (10)$$

For each value of the control variable $b$, the above formula provides a probability density function as depicted in Fig. 2. Note that the most likely values of $x$ are among the solutions of the deterministic system: $x = 0$ if $b < 0$, and $x = \pm b^{1/2}$ if $b > 0$. The bifurcation of the probability density function is clearly visible in Fig. 2. As $b$ crosses zero the density shifts from unimodal to bimodal.
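The shift from one mode to two can be checked numerically. The sketch below tabulates Eq. (10) on a grid and lists its local maxima; the function names, the grid limits, and the value $\epsilon = 0.5$ are assumptions made for illustration.

```python
import math

def density(b, eps, lo=-3.0, hi=3.0, n=600):
    """Tabulate Eq. (10) on a grid, normalized by the trapezoid rule."""
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    w = [math.exp(-(x**4 - 2.0 * b * x**2) / eps) for x in xs]
    area = h * (sum(w) - 0.5 * (w[0] + w[-1]))
    return xs, [v / area for v in w]

def modes(xs, f):
    """Grid points that are strict local maxima of the tabulated density."""
    return [round(xs[i], 2) for i in range(1, len(f) - 1)
            if f[i] > f[i - 1] and f[i] > f[i + 1]]

for b in (-1.0, 1.0):
    print(b, modes(*density(b, 0.5)))
```

For $b = -1$ the only mode is at zero, while for $b = 1$ the modes sit at $\pm 1$, the stable equilibria of the deterministic system.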

If we view the equilibrium density $f_n^*$ as a function of $n = \epsilon^{-1}$, where $\epsilon$ is the randomness coefficient in Eq. (9), then the sequence of densities $f_n^*$ as $n \to \infty$ becomes more and more concentrated at its peaks. In the limit the density is nonzero only at these points, but its area is still one, i.e., $\int f_n^*\,dx = 1$ for all $n$. The points where $g(x) = \lim_{n \to \infty} f_n^*$ is nonzero are the stable equilibrium points of the nonstochastic dynamic system. The limit density $g(x)$ as defined above is an example of a generalized function (Lighthill, 1964). When the potential is quadratic in $x$, the limit density $g(x)$ is the well-known Dirac delta function $\delta(x - x_m)$, where $x_m$ represents the point of minimum potential. When the potential has several minima, the limit density resembles a weighted sum of delta functions.

FIG. 2. A set of equilibrium densities for Eq. (10). Note the resemblance to Fig. 1. (Computer graphics by Ian Stewart.)

The simple relationship between the equilibrium probability density $f^*$ and the potential indicated by Eq. (9) breaks down when $u(x)$, the infinitesimal variance, does depend on $x$. Such dependence is a natural feature of models in economics ($x$ = dollars), population dynamics ($x$ = organisms), genetics ($x$ = alleles), and other areas. Fortunately, it is still possible to extract significant information about the shape of $f^*$ from the functions $m$ and $u$ without explicitly calculating $f^*$. For example, the minima and maxima of $f^*$ occur when $\partial f^*/\partial x = 0$, and this information is available directly from Eq. (6). We find that the extreme points of the equilibrium density occur when $m(x) - u'(x) = 0$. Thus the modes, and antimodes, of the equilibrium density are displaced away from the stable and unstable equilibria of the corresponding deterministic system, i.e., the solutions to $m(x) = 0$.

The interesting phenomenon of barriers in the equilibrium density appears when there are points $x_b$ for which $u(x_b) = 0$. Generally speaking, these barriers form the natural boundaries of the domain of the density. If the drift function $m(x_b)$ is zero at a barrier $x_b$, then a system at $x_b$ is clearly trapped, since neither the deterministic nor the stochastic component of change is nonzero. Such barriers are called absorbing, and stochastic systems which contain absorbing barriers are called transient. The mean length of time before entrapment has been called the persistence of the transient stochastic system (Ludwig, 1975). The equilibrium density of a transient system does not satisfy Eq. (8). In general, it is a generalized function that is nonzero only on the absorbing barrier(s).

In stochastic catastrophe theory, the function $m(x)$ is obtained from the universal unfolding of the potential. Therefore, it is not generic for the zeroes of $m$ and $u$ to coincide. In other words, the barriers of stochastic catastrophe theory are not ordinarily absorbing, although it can happen in special cases. The nonabsorbing barriers, i.e., those points $x_b$ for which $u(x_b) = 0$ while $m(x_b) \neq 0$, can be roughly classified as either repelling or attracting, depending on whether $f^*(x_b) = 0$ or $+\infty$.

Intuitively, systems located in the neighborhood of a repelling barrier tend to move away toward regions of higher probability. Systems located near an attracting barrier tend to persist in its neighborhood, although they are not trapped on the barrier. The rigorous theory of barriers in one-dimensional diffusion processes was developed by Feller (1954). A thorough account can be found in Mandl (1968), although his treatment is at a very high level of abstraction, using semi-groups of contraction operators. Ricciardi (1977) outlines Feller's original typology of barriers, which differs somewhat from the simpler and less precise one presented above.

Fortunately, neither the type nor number of barriers changes under the topological transformations permitted in catastrophe theory. In fact, although we shall not undertake to prove this here, the number of modes and antimodes is also invariant up to a diffeomorphism of the potential function. This is the sense in which there is a stochastic catastrophe theory.

There is an instructive example from epidemiology of a stochastic dynamic system whose equilibrium density has a minimum, a maximum, a repelling barrier, and an absorbing barrier. Consider a randomly mixing population and a nonfatal infectious disease that does not confer immunity. Let the variable x represent the fraction of the population infected and assume that the mean and variance of the rate of change in x can be modeled, at least approximately, by

$$m(x) = ax(1 - x) - bx, \qquad u(x) = \epsilon x(1 - x). \qquad (11)$$

In these equations the parameter $a$ is the intensity of transmission from infectives to susceptibles and $b$ is the recovery rate. Note that $u(x)$ was chosen so as to be zero at both $x = 0$ and $x = 1$, which are the natural boundaries of the process since $x$ is a fraction. The deterministic version of this model, in which $\epsilon = 0$, belongs to the family of fold catastrophes. In this version, any initial nonzero value of $x$ will lead to an explosive increase in infectives until the equilibrium value $x^* = 1 - b/a$ is reached.

The stochastic version of this epidemic has an absorbing barrier at zero, since both $m(0) = 0$ and $u(0) = 0$. In order to study its behavior, therefore, we must resort to special methods. Our approach is to appeal to the structural stability of equations such as Eq. (11) and examine a sequence of epidemic processes which include the above as a special case. Stated in another way, we shall study the equilibrium densities of the stochastic fold catastrophes that are structurally close to Eq. (11).

For the purposes of this discussion, the most convenient perturbation of Eq. (11) is the following:

$$m_c(x) = ax(1 - x) - bx + c(1 - x), \quad \text{for any } c > 0. \qquad (12)$$

The parameter $c$ in this new equation may be interpreted as the intensity of transmission of the disease from an external source to the susceptibles, i.e., specifically not from person-to-person transmission. Clearly, model Eq. (12) corresponds to Eq. (11) when $c = 0$. Model Eq. (12), however, does not have any absorbing barriers as long as $c$ is positive. We shall now examine the sequence of equilibrium densities of Eq. (12) as $c$ approaches zero.

In the stochastic version ($\epsilon > 0$) of Eq. (12), much the same behavior is observed as in the deterministic version ($\epsilon = 0$). Explosive growth in the number of infectives occurs until an equilibrium density $f_c^*$ is established with a maximum at $x^* = 1 - (b + c)/a$. However, if $c$ is so small that $c < \epsilon$, then the stochastic epidemic differs strikingly from its deterministic counterpart near the zero infectives point. A threshold appears below which the epidemic is unlikely to get off the ground due, intuitively, to the nonzero chance that all the infectives may recover before transmitting the disease to any susceptible.

Eq. (8), applied to the model specified in Eq. (12), yields the following explicit solution for the equilibrium probability density of infectives:

$$f_c^* = k\,x^{-1 + c/\epsilon}\,(1 - x)^{-1 + b/\epsilon}\,\exp(ax/\epsilon). \qquad (13)$$

This density is depicted for $a > b > \epsilon > c > 0$ in Fig. 3. The barrier at $x = 0$ is attracting because $c < \epsilon$, confirming the threshold phenomenon mentioned earlier. By contrast, the barrier at $x = 1$ is repelling because $b > \epsilon$.
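The behavior of Eq. (13) near the two barriers can be inspected through its logarithm, which avoids the singularity at zero. In this sketch the function name `log_density` is an assumption, the normalizing constant $k$ is omitted, and the parameter values are those quoted in the caption of Fig. 3.

```python
import math

def log_density(x, a, b, c, eps):
    """Unnormalized logarithm of Eq. (13) for 0 < x < 1."""
    return ((c / eps - 1.0) * math.log(x)
            + (b / eps - 1.0) * math.log(1.0 - x)
            + a * x / eps)

a, b, eps, c = 1.0, 0.19, 0.1, 0.01    # values from the caption of Fig. 3

# Attracting barrier at 0 (c < eps): the density grows without bound as x -> 0.
print(log_density(1e-6, a, b, c, eps), log_density(1e-3, a, b, c, eps))

# Repelling barrier at 1 (b > eps): the density falls to zero as x -> 1.
print(log_density(1.0 - 1e-6, a, b, c, eps), log_density(1.0 - 1e-3, a, b, c, eps))
```

The first pair of values increases toward the barrier at zero and the second pair decreases toward the barrier at one, in line with the classification given above.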

FIG. 3. The equilibrium density for Eq. (13) with $a = 1$, $b = .19$, $\epsilon = .1$, and $c = .01$. (Horizontal axis: epidemic size.)

The most likely size of the epidemic is the larger of the two solutions to $m_c(x) - u'(x) = 0$. The epidemic threshold size, beyond which the epidemic is almost sure to occur, is the smaller of these solutions. Specifically, if we let $d = 1 - (b + c - 2\epsilon)/a$, then these solutions are given by

$$(1/2)d \pm (1/2)[d^2 + 4(c - \epsilon)/a]^{1/2}. \qquad (14)$$

In the special case when $c = \epsilon$ the attracting barrier at zero becomes repelling and the most likely epidemic size is $1 - b/a + \epsilon/a$. Note that this is still slightly perturbed away from the deterministic epidemic size by a displacement of $\epsilon/a$.

However, in order to determine the equilibrium density of the system defined by Eq. (11), we must examine the sequence of equilibrium densities Eq. (13) as $c \to 0$. As this limit is approached, the first term in Eq. (13) resembles $x^{-1}$, which is not integrable. Hence almost all the probability accumulates in the positive neighborhood of $x = 0$. In this region the second and third terms are effectively unity and, therefore, $k = c/\epsilon$ so that the area under the density is one. Thus the difference between $f_c^*$ and the density described by

$$g_c(x) = (c/\epsilon)\,|x|^{-1 + c/\epsilon} \qquad (15)$$

becomes negligible as $c \to 0$. Eq. (15), conveniently enough, converges to the generalized function $2\delta(x)$ as $c$ approaches zero, where $\delta(x)$ is the Dirac delta function (Lighthill, 1964, p. 28). Therefore, the limiting density as $c \to 0$ is the one-sided generalized function defined by the sequence

$$f_n^*(x) = x^{-1 + n^{-1}}/n, \quad (x \geq 0). \qquad (16)$$

The preceding heuristic argument illustrates how the equilibrium densities of the stochastic catastrophes degenerate into generalized functions when they contain absorbing barriers. It is worth reiterating, however, that absorbing barriers are not generic, and that models which have them are not structurally stable.
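The two solutions in Eq. (14) for the epidemic threshold and the most likely epidemic size can be checked against the condition $m_c(x) - u'(x) = 0$ directly. The sketch below uses the parameter values from the caption of Fig. 3; the function names `epidemic_extrema` and `residual` are illustrative assumptions.

```python
import math

def epidemic_extrema(a, b, c, eps):
    """Threshold and most likely epidemic size from Eq. (14)."""
    d = 1.0 - (b + c - 2.0 * eps) / a
    root = math.sqrt(d * d + 4.0 * (c - eps) / a)
    return 0.5 * d - 0.5 * root, 0.5 * d + 0.5 * root

def residual(x, a, b, c, eps):
    """m_c(x) - u'(x), with m_c from Eq. (12) and u(x) = eps x (1 - x)."""
    m_c = a * x * (1.0 - x) - b * x + c * (1.0 - x)
    du = eps * (1.0 - 2.0 * x)
    return m_c - du

threshold, mode = epidemic_extrema(1.0, 0.19, 0.01, 0.1)
print(threshold, mode)
print(residual(threshold, 1.0, 0.19, 0.01, 0.1), residual(mode, 1.0, 0.19, 0.01, 0.1))
```

With these values $d = 1$, so the threshold falls close to 0.1 and the mode close to 0.9. Setting $c = \epsilon$ moves the smaller solution to zero and the larger to $1 - b/a + \epsilon/a$.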

THE STOCHASTIC CUSP CATASTROPHES

The cusp catastrophe model is by far the most commonly applied form of the elementary catastrophes, and we shall consider its stochastic forms in some detail. The expected behavior of the canonical cusp is determined by

$$m(x) = -x^3 + bx + a, \qquad (17)$$

where $b$ and $a$ are the control variables, named by Zeeman the splitting and normal factors, respectively. This model has already been depicted for $a = 0$ in Fig. 1, the deterministic version, and Fig. 2, a stochastic version. Note that increasing the splitting factor $b$ past the bifurcation point causes the equilibrium probability density to split into a bimodal form.

The action of the normal factor on the probability density can be isolated in this stochastic version by considering a system characterized by Eq. (17) for fixed $b$ and $u(x) = \epsilon$. Then the equilibrium probability density function is, by Eq. (9),

$$f^* = k \exp[-(1/4)(x^4 - 2bx^2 - 4ax)/\epsilon]. \qquad (18)$$

The extrema of these densities occur at the roots of $x^3 - bx - a$. The discriminant of this cubic is $D = a^2/4 - b^3/27$. The bifurcation set consists of those values of $a$ and $b$ for which the discriminant is negative. In this case there are three real roots. Within the bifurcation set, therefore, $f^*$ has two modes and one antimode. As depicted in Fig. 4, the relative height of the modes depends strongly on $a$, the normal factor.

There is an intimate relationship between multimodal densities such as Eq. (18) and the familiar normal probability density function. If $\epsilon$ is small enough, each mode of Eq. (18) closely resembles the normal density. In fact, this is true generally for any multimodal density based on Eq. (9), i.e., any nonsingular equilibrium density for a gradient stochastic system whose potential function resembles a polynomial of degree higher than two. This can be seen by expanding the potential function in a Taylor series near one of its stable points, say $x_1$. At this root, the first and second derivatives of the potential with respect to $x$ are $P'(x_1) = 0$ and $P''(x_1) > 0$. As a second approximation, therefore, $P(x) = P(x_1) + (1/2)(x - x_1)^2 P''(x_1)$. From this it can be seen that the density $f^* = k \exp(-P/\epsilon)$ is approximately normal in the neighborhood of $x_1$, with mean $\mu = x_1$ and variance $\sigma^2 = \epsilon/P''(x_1)$.
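The quality of this normal approximation near a mode can be checked numerically. The sketch below compares the exact ratio $f^*(x)/f^*(x_1)$ for the cusp potential of Eq. (18) with the corresponding Gaussian kernel; the function names and the choice $a = 0$, $b = 1$, $\epsilon = 0.01$ (so that $x_1 = 1$ is a stable point) are assumptions made for illustration.

```python
import math

def cusp_potential(x, a, b):
    """P(x) = (1/4)(x^4 - 2bx^2 - 4ax), the potential appearing in Eq. (18)."""
    return 0.25 * (x**4 - 2.0 * b * x**2 - 4.0 * a * x)

def normal_approximation(x, x1, a, b, eps):
    """Return (exact, gaussian) values of f*(x)/f*(x1) near a stable point x1."""
    exact = math.exp(-(cusp_potential(x, a, b) - cusp_potential(x1, a, b)) / eps)
    curvature = 3.0 * x1**2 - b            # P''(x1); positive at a mode
    sigma2 = eps / curvature               # variance of the approximating normal
    gauss = math.exp(-(x - x1) ** 2 / (2.0 * sigma2))
    return exact, gauss

exact, gauss = normal_approximation(1.05, 1.0, a=0.0, b=1.0, eps=0.01)
print(exact, gauss)
```

The two values agree to within about one percent at a distance of 0.05 from the mode, and the agreement tightens as $\epsilon$ decreases, because the density then concentrates where the quadratic term dominates.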

An important difference in interpretation between the deterministic and stochastic versions of a cusp catastrophe emerges from Fig. 4. Whereas in the deterministic models shifts from one domain of attraction to another occur only at the boundaries of the bifurcation set, in the stochastic models these shifts can take place at any point within the bifurcation set. However, the likelihood of these spontaneous catastrophes is strongly dependent on $\epsilon$. For small $\epsilon$ the likelihood is extremely small. The persistence of a domain of attraction in a stochastic dynamic system can be defined as the mean length of time until the system spontaneously departs the domain (Ludwig, 1975). Thus in the stochastic theory the concept of stability is largely replaced by the concept of persistence.

An interesting empirical example of a bimodal distribution that may well have arisen from a stochastic dynamic system of the cusp type appeared recently in Science (McKay, Sinisterra, McKay, Gomez, & Lloreda, 1978). The bimodal distributions arose in a controlled experiment on the effects of a program of enhanced nutrition and preschool education on the intelligence of children from impoverished families.

The control group exhibited a unimodal IQ distribution with mode at about IQ = 78. By contrast, the experimental group with the longest treatment program exhibited a bimodal IQ distribution with modes at 80 and 110, and antimode at 95. The investigators have been unable to find any variable capable of discriminating between children who responded to the program and those who did not (Gomez, personal communication). Thus, it would appear that the treatment acted in part as a splitting factor, particularly since it is evident from the published distributions that the longer the treatment the greater the gap between the two modes. This implies that the relationship between IQ and environment is fundamentally nonlinear in a sense not yet seen in contemporary models of intelligence. As this article goes to press this hypothesis is under investigation.

The second form of the stochastic cusp that we shall consider is the cusp epidemic in which $u(x) = x(1 - x)$. The same techniques that we have used on prior stochastic models are applicable here and shall not be repeated. It is worth noting, however, the application of this model to public opinion catastrophes in which a certain viewpoint long held by the minority, e.g., disapproval of Richard Nixon, suddenly becomes the majority opinion.

If $x$ measures the fraction of the population who hold this viewpoint, then the probability density of $x$ has the general shape shown in Fig. 5, assuming the dynamics resemble the cusp catastrophe. Each domain or mode is persistent, due to pressures of conformity. However, changes in the control parameters will change the relative height and location of the two modes. Indeed, the splitting factor can be identified as the social forces which cause public opinion to polarize, and the normal factor as the social forces which cause one viewpoint to predominate. The stochastic randomness parameter ($\epsilon$) determines the likelihood of a spontaneous change from one domain to the other. The Watergate crisis was characterized by increases in both factors, leading to Nixon's precipitous loss of public support. Flay (1978) discusses some closely related attitudinal catastrophe models whose stochastic form can be easily visualized.

In many circumstances, particularly in models of population dynamics, it may be reasonable to suppose that the control variables $a$ and $b$ do not have specific values, but instead have probability densities of their own. When this is so, the unconditional equilibrium density of the behavioral variable may not present a clearly bimodal form even if the conditional densities $f^*(x \mid a, b)$ are bimodal for values of $a$ and $b$ that are in the bifurcation set. Formally, if $h(a, b)$ represents the joint probability density of $a$ and $b$, then the unconditional density of $x$ is

$$f(x) = \iint f^*(x \mid a, b)\,h(a, b)\,da\,db. \qquad (19)$$
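Eq. (19) can be approximated by replacing the joint density $h(a, b)$ with a finite set of weighted control points. The sketch below is only an illustration under that assumption: the conditional densities follow Eq. (18), and the function names, grid limits, weights, and control values are all hypothetical.

```python
import math

def conditional_density(a, b, eps, xs, h):
    """Eq. (18) tabulated on the grid xs and normalized by the trapezoid rule."""
    w = [math.exp(-0.25 * (x**4 - 2.0 * b * x**2 - 4.0 * a * x) / eps) for x in xs]
    area = h * (sum(w) - 0.5 * (w[0] + w[-1]))
    return [v / area for v in w]

def unconditional_density(controls, eps, lo=-4.0, hi=4.0, n=800):
    """Discrete analogue of Eq. (19).

    controls is a list of (a, b, weight) triples whose weights sum to one;
    it stands in for the joint density h(a, b).
    """
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    f = [0.0] * (n + 1)
    for a, b, weight in controls:
        cond = conditional_density(a, b, eps, xs, h)
        f = [fi + weight * ci for fi, ci in zip(f, cond)]
    return xs, f

# Hypothetical spread of the normal factor a at a fixed splitting factor b.
controls = [(-0.5, 1.0, 0.25), (0.0, 1.0, 0.5), (0.5, 1.0, 0.25)]
xs, f = unconditional_density(controls, eps=0.5)
step = xs[1] - xs[0]
print(step * (sum(f) - 0.5 * (f[0] + f[-1])))   # total area under the mixture
```

Because each conditional density is normalized on the same grid, the mixture also has unit area; how clearly it remains bimodal depends on the spread of the control points.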

FIG. 5. Equilibrium density of a stochastic cusp epidemic. Both barriers, at 0 and 1, are repelling. (Horizontal axis: fraction of population.)

The Watergate example of a public opinion catastrophe given earlier can be modified to illustrate this point. Let $x$ represent an individual's degree of support for Richard Nixon, not, as before, the fraction of the population who support him. For further specificity, let $b$, the splitting factor, represent the degree of political polarization experienced by the individual and let $a$, the normal factor, represent the amount of information the individual has received about the Nixon administration's malfeasance. Then at any given time during the Watergate era, there was a joint density of $a$ and $b$ in the sense that there were individuals spread all across the control space. With the passage of time, this joint density moved steadily toward higher values of $a$, information about Watergate, with a consequent steady stream of individuals leaving the bifurcation set and suffering a loss of confidence in Nixon. There was also a general movement toward higher political polarization, so that the sudden changes in opinion during the waning days of the Nixon administration tended to be more spectacular. Polls taken in any given week, however, showed wide variation in all three variables, even at the end.

This last example demonstrates the difficulties in the statistical detection of cuspoid catastrophes when the control variables actually vary in an uncontrolled fashion through a population. The next section is devoted to a consideration of other statistical difficulties.

STATISTICAL ANALYSIS

Anyone who has seriously contemplated the use of a catastrophe model in the analysis of empirical data will have noticed several severe difficulties. In large measure this explains the almost total lack of statistical papers in the catastrophe literature. This section is intended to identify these difficulties and to propose suitable compromises, assumptions, and procedures by which they may be surmounted. We shall be almost exclusively concerned here with the statistical estimation of the parameters of a catastrophe model. Formal hypothesis testing lies beyond the scope of this paper.

The first major difficulty is created by the topological principles on which catastrophe theory is founded. Without the concept of topological equivalence, it is impossible to classify the local features of gradient systems into the elementary catastrophes. Topological equivalence plays havoc with statistical model fitting because no measure of goodness of fit is invariant under the transformations permissible in differential topology.

The traditional least squares criterion is invariant only under linear transformations, i.e., orthogonal rotation and linear changes in scale of the coordinate system. However, any two coordinate systems are topologically equivalent if there exists a smooth, i.e., infinitely differentiable, mapping with a smooth inverse between them. Such mappings are called diffeomorphisms.

Obviously the class of diffeomorphisms is much larger than the class of linear transformations. The discovery of goodness of fit criteria that are invariant up to diffeomorphism would be a most valuable contribution to statistical theory. Although this seems to be an implicit goal of nonparametric statistical theory, it is clearly yet to be achieved.
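The lack of invariance can be illustrated with the simplest least squares problem, fitting a single location value. The sketch below is only an illustration: the three data values and the choice of y = exp(x) as the diffeomorphism are assumptions, not taken from the paper. The fit survives a linear change of coordinates but not the nonlinear one.

```python
import math

def least_squares_location(values):
    """The constant c minimizing sum((v - c)**2) is the arithmetic mean."""
    return sum(values) / len(values)

x = [0.0, 1.0, 2.0]  # hypothetical measurements
fit_x = least_squares_location(x)

# Linear change of coordinates y = 2x + 3: fit in y, then map the fit back to x.
fit_linear = (least_squares_location([2 * v + 3 for v in x]) - 3) / 2

# Smooth, smoothly invertible change y = exp(x): fit in y, then map back with log.
fit_diffeo = math.log(least_squares_location([math.exp(v) for v in x]))

print(fit_x, fit_linear, fit_diffeo)
```

The first two numbers agree, while the third differs, so the least squares answer depends on which of two topologically equivalent coordinate systems was used for the measurements.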

In the absence of such criteria, it is incumbent upon the modeler to use such nonlinear transformations on the measured variables as appear appropriate to bring them to within a linear transformation of a canonical, i.e., elementary, catastrophe. This is no trivial task since it can be achieved only through an adequate theory of the substantive process at hand.

Behavioral Science, Volume 23, 1978

An instructive example of this is provided by the behavior of gases near the critical point of pressure, volume, and temperature. There is an empirically obvious discontinuity in behavior near the critical point for most gases, indicating a breakdown of Boyle's Law in this region. By no means is it immediately obvious how this discontinuity can be sensibly related to a canonical catastrophe. It happens, however, that the empirical description of this discontinuity (the van der Waals equation) can be transformed into a cusp model if the behavioral variable is taken to be the reciprocal of volume, i.e., density, assuming the number of molecules is constant. The resulting equation is within a linear transformation and a change of origin of the canonical cusp. Therefore, data on pressure, density, and temperature should be far more amenable to empirical catastrophe model fitting than data on pressure, volume, and temperature.
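The transformation to density can be written out as a short derivation. This is a sketch under the assumption of the standard molar form of the van der Waals equation; the symbols $a_w$, $b_w$ (the van der Waals constants), $R$, and $\rho = 1/V$ are introduced here for illustration and are distinct from the control factors a and b used elsewhere in the paper.

```latex
\begin{align*}
\Bigl(P + \frac{a_w}{V^2}\Bigr)(V - b_w) &= RT
  && \text{(van der Waals, one mole)} \\
(P + a_w\rho^2)(1 - b_w\rho) &= RT\rho
  && \text{(divide by } V,\ \rho = 1/V) \\
a_w b_w\,\rho^3 - a_w\,\rho^2 + (RT + b_w P)\,\rho - P &= 0
  && \text{(cubic in } \rho) \\
x^3 + p\,x + q &= 0,
  \qquad \rho = x + \frac{1}{3b_w}
  && \text{(change of origin)} \\
p &= \frac{RT + b_w P}{a_w b_w} - \frac{1}{3b_w^2}, \\
q &= -\frac{2}{27 b_w^3} + \frac{RT + b_w P}{3 a_w b_w^2} - \frac{P}{a_w b_w}.
\end{align*}
```

Both $p$ and $q$ are linear in $P$ and $T$, which is the sense in which the density form lies within a linear transformation and a change of origin of the cusp equilibrium surface. Setting $p = q = 0$ recovers the familiar critical values $P_c = a_w/(27b_w^2)$, $RT_c = 8a_w/(27b_w)$, and $\rho_c = 1/(3b_w)$.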

Even if these severe limitations are accepted, there are still difficulties. An essential feature of catastrophe models is that for each configuration of the independent control variables, there may be more than one stable value for the dependent behavioral variable. Now almost all parameter estimation procedures are based on the minimization of the departures of empirical observations from the predictions of a model. Therefore, models which make several different predictions for the same configuration of independent variables cannot be analyzed without fundamental modifications to the usual concept of estimation. This difficulty is irrelevant if time series data are available because in this case the dependent variable is the change in the behavioral variable ($\Delta x$), a single-valued polynomial function of the independent variables. If it can be assumed that the data take the form of a snapshot sample of a population of similar stochastic dynamic systems, then for each configuration of the control variables in the bifurcation set there will be a multimodal frequency distribution of the behavioral variable.

As shown in the first section of this paper, the parent probability density function for the empirical frequency distribution will be a member of the canonical exponential family of densities, which includes most but not all of the major densities. In the next section we show how the traditional least squares technique can be adapted to the estimation of the parameters of this family.
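The multivalued prediction that defeats ordinary estimation can be made concrete for the cusp. The sketch below assumes the drift a + bx - x^3, one common normalization of the canonical cusp; the function name `cusp_equilibria` is hypothetical. It counts the equilibria available at a given point of the control space.

```python
def cusp_equilibria(a, b):
    """Number of distinct real roots of a + b*x - x**3 = 0 (assumed canonical cusp drift)."""
    disc = 4 * b**3 - 27 * a**2          # discriminant of the cubic x^3 - b*x - a
    if disc > 0:
        return 3                         # two stable equilibria separated by an unstable one
    if disc < 0:
        return 1                         # a single stable equilibrium
    return 1 if a == 0 and b == 0 else 2 # on the boundary of the bifurcation set

for a, b in [(0.0, 1.0), (1.0, 1.0), (0.0, -1.0)]:
    print(a, b, cusp_equilibria(a, b))
```

Wherever the count is 3, a model of this kind offers two stable predictions for the same control values, and a snapshot sample of many such systems would be expected to show two modes.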

STATISTICAL ESTIMATION

The multimodal probability densities that arise naturally from stochastic catastrophe systems in equilibrium present some unique statistical problems. As a descriptive statistic the mean loses most of its usefulness, for example, since it can be expected to fall between the two modes of a bimodal frequency distribution. Similarly, the standard deviation fails to describe the peakedness of multimodal data.
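A small numerical illustration of why the mean is uninformative here. The density f(x) ∝ exp(2x^2 - x^4) is a hypothetical symmetric bimodal example chosen for convenience, with modes at x = ±1; it is not one of the paper's fitted densities.

```python
import math

def unnormalized(x):
    # Hypothetical symmetric bimodal density, f(x) proportional to exp(2x^2 - x^4)
    return math.exp(2 * x**2 - x**4)

step = 0.001
grid = [-4 + i * step for i in range(8001)]   # covers [-4, 4]; the tails are negligible
weights = [unnormalized(x) for x in grid]
total = sum(weights)

mean = sum(x * w for x, w in zip(grid, weights)) / total
variance = sum((x - mean) ** 2 * w for x, w in zip(grid, weights)) / total

# Height of the density at the mean, relative to its height at a mode (x = 1).
relative_height_at_mean = unnormalized(mean) / unnormalized(1.0)

print(mean, math.sqrt(variance), relative_height_at_mean)
```

The mean lands on the antimode at zero, where the density is well below its value at either mode, and the standard deviation says nothing about how sharply each mode is peaked.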

The failure of these traditional parameters results from the parametric complexity of the canonical exponential family of densities whose multimodal members typically require at least four descriptive statistics as compared with two for the typical unimodal density. For example, the bimodal density generated by a stochastic cusp in statistical equilibrium requires parameters for bifurcation and symmetry in addition to location and scale. Furthermore, the location and scale parameters turn out to have very little relationship to the simple mean and standard deviation of the density.

In the statistical literature, the canonical exponential families have been closely associated with the theory of sufficient statistics (Fisher, 1922) rather than with multimodality. Roughly speaking, a set of sufficient statistics for a sample summarizes the whole of its information relevant to the parent probability density. For example, the first two moments of a sample are sufficient for a normal parent, the first moment only is sufficient for an exponential parent, while the first four moments are sufficient for a parent of the cusp form, Eq. (18).

The canonical exponential family of probability density functions has two pleasant properties. Each family has a set of jointly sufficient statistics and maximum likelihood estimators exist for the parameters of each family. Unfortunately, existence theorems for these maximum likelihood estimators (Crain, 1976) do not simultaneously show how these estimators are to be computed. The iterative procedure suggested by Crain is extremely slow because each iteration requires a numerical integration. An entirely different estimation procedure, which involves only the solution of a system of linear equations, is presented next.

The task is to estimate the parameters of the probability density function $f(x) = \exp[-\Phi(x)]$, based on a set of independent observations of a random variable $X$ with density $f$. Recall from Eq. (7) that

$$\Phi'(x) = -\frac{m(x) - u'(x)}{u(x)}, \tag{20}$$

where $m$ and $u$ are the drift and infinitesimal variance of the dynamic system, respectively. We shall restrict our attention at the outset to the case in which $u(x) = 1$ and

$$m(x) = \beta_1 + \beta_2 x + \beta_3 x^2 + \cdots + \beta_n x^{n-1}, \tag{21}$$

where $\beta_n < 0$ and $n$ is even. We seek estimates for the $\{\beta_k\}$ or, alternatively, for the corresponding parameters $\{\theta_k\}$ of

$$\Phi(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \cdots + \theta_n x^n, \tag{22}$$

where $\beta_k = -k\theta_k$ by differentiation. Notice that if $n = 2$, then $f$ is normal; if $n = 4$, then $f$ is possibly bimodal; and if $n = 6$, then $f$ is possibly trimodal. The parameter $\theta_0$ is not free since it must be chosen so that the area under the density is unity. Once again, we note that in this special case $\Phi$ is the potential function for the stochastic gradient system because $u$ is constant.
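The relation between the two parametrizations, and the role of θ0 as a normalizing constant, can be sketched numerically. The function names, integration limits, and grid sizes below are assumptions made for illustration; the conversion itself is just β_k = -kθ_k with u(x) = 1.

```python
import math

def theta_from_beta(beta, lo=-8.0, hi=8.0, steps=16000):
    """Return [theta_0, ..., theta_n] for drift coefficients beta_1..beta_n (u = 1)."""
    theta = [-b / k for k, b in enumerate(beta, start=1)]   # theta_k = -beta_k / k
    h = (hi - lo) / steps

    def kernel(x):
        return math.exp(-sum(t * x**k for k, t in enumerate(theta, start=1)))

    area = h * sum(kernel(lo + (i + 0.5) * h) for i in range(steps))  # midpoint rule
    return [math.log(area)] + theta   # theta_0 makes exp[-Phi] integrate to one

def count_modes(theta, lo=-4.0, hi=4.0, steps=4000):
    """Count strict local minima of the potential Phi on a grid (these are the modes of f)."""
    def phi(x):
        return sum(t * x**k for k, t in enumerate(theta))
    ys = [phi(lo + i * (hi - lo) / steps) for i in range(steps + 1)]
    return sum(1 for i in range(1, steps) if ys[i] < ys[i - 1] and ys[i] < ys[i + 1])

normal_theta = theta_from_beta([0.0, -1.0])            # n = 2: m(x) = -x
quartic_theta = theta_from_beta([0.0, 4.0, 0.0, -4.0]) # n = 4: m(x) = 4x - 4x^3
print(normal_theta, count_modes(normal_theta))
print(quartic_theta, count_modes(quartic_theta))
```

For the n = 2 case the computed θ0 is the logarithm of the usual normal normalizing constant, and the n = 4 example chosen here has two modes, consistent with "possibly bimodal".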

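The exponential family defined by Eq. (21) has a remarkable property, heretofore apparently unknown, which holds the key for the estimation problem. It is based on the observation that, under the above definitions, $\Phi'(x) = -m(x)$ and $f(x) = \exp[-\Phi(x)]$, so that $m(x) = f'(x)/f(x)$. The property is stated in the following

Lemma: If $p(x)$ is any polynomial, then

$$E\{p(X)m(X)\} = -E\{p'(X)\}. \tag{23}$$

Proof:

$$E\{p(X)m(X)\} = \int_{-\infty}^{+\infty} p(x)m(x)f(x)\,dx$$

$$= \int_{-\infty}^{+\infty} p(x)f'(x)\,dx$$

$$= p(x)f(x)\Big|_{-\infty}^{+\infty} - \int_{-\infty}^{+\infty} p'(x)f(x)\,dx$$

$$= -E\{p'(X)\},$$

as claimed. The boundary term vanishes because $\beta_n < 0$ and $n$ is even, so that $f$ decays faster than any polynomial grows.

This lemma enables the construction of a least squares estimation procedure for $\beta = (\beta_1, \beta_2, \ldots, \beta_n)$. In essence, we are looking for a particular $\beta^*$ which minimizes $Q(\beta)$, a quadratic criterion:

$$Q(\beta) = E\Bigl\{\Bigl[m(X) - \sum_{j=1}^{n} \beta_j X^{j-1}\Bigr]^2\Bigr\}. \tag{24}$$

The vector $\beta^*$ for which $Q(\beta)$ is at a minimum is a least squares estimator in the sense that the polynomial $\beta_1^* + \beta_2^* x + \cdots + \beta_n^* x^{n-1}$ has the least expected squared deviation from $m(x)$. This differs from the more usual minimum chi square sense.

For the purposes of the estimation theorem to follow, we make the following definitions:

$$\Gamma_{ij} = E\{X^{i+j-2}\}, \quad 1 \le i, j \le n, \tag{25}$$

$$\alpha_i = -(i - 1)E\{X^{i-2}\}, \quad 1 \le i \le n, \tag{26}$$

so that $\alpha_1 = 0$. The matrix $\Gamma$ and the vector $\alpha$ are easily estimated by the moments of the observations of $X$. We are now ready to state and prove the least squares estimation theorem.

Theorem: The $\beta^*$ which minimizes $Q(\beta)$ is:

$$\beta^* = \Gamma^{-1}\alpha. \tag{27}$$

Proof:

$$Q(\beta) = E\Bigl\{\Bigl[m(X) - \sum_j \beta_j X^{j-1}\Bigr]^2\Bigr\}$$

$$\Rightarrow\ \partial Q/\partial\beta_i = E\Bigl\{-2X^{i-1}m(X) + 2X^{i-1}\sum_j \beta_j X^{j-1}\Bigr\}.$$

$$\partial Q/\partial\beta_i = 0\ \Rightarrow\ E\{X^{i-1}m(X)\} = \sum_j \beta_j^* E\{X^{i+j-2}\}$$

$$\Rightarrow\ -(i - 1)E\{X^{i-2}\} = \sum_j \beta_j^* E\{X^{i+j-2}\} \quad \text{(by the Lemma with } p(x) = x^{i-1})$$

$$\Rightarrow\ \alpha = \Gamma\beta^*\ \Rightarrow\ \beta^* = \Gamma^{-1}\alpha,$$

as claimed.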
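The estimator can be checked numerically on a density whose drift coefficients are known. The sketch below is an assumption-laden illustration, not the author's program: the test density, the integration grid, and the function names are all hypothetical. It takes m = f'/f, for which integration by parts gives E{X^(i-1) m(X)} = -(i-1) E{X^(i-2)}, so the right-hand side of the linear system carries a minus sign.

```python
import math

def moments_of(beta, max_order, lo=-6.0, hi=6.0, steps=24000):
    """E{X^k}, k = 0..max_order, for f(x) proportional to exp(sum_k beta_k x^k / k)."""
    h = (hi - lo) / steps
    xs = [lo + (i + 0.5) * h for i in range(steps)]          # midpoint rule nodes
    ws = [math.exp(sum(b * x**k / k for k, b in enumerate(beta, start=1))) for x in xs]
    total = sum(ws)
    return [sum(w * x**k for x, w in zip(xs, ws)) / total for k in range(max_order + 1)]

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def estimate_beta(mu, n):
    """mu[k] = E{X^k} for k = 0..2n-2; returns the solution of Gamma beta = alpha."""
    gamma = [[mu[i + j] for j in range(n)] for i in range(n)]   # 0-based form of E{X^(i+j-2)}
    alpha = [0.0] + [-i * mu[i - 1] for i in range(1, n)]       # 0-based form of -(i-1)E{X^(i-2)}
    return solve(gamma, alpha)

true_beta = [0.0, 4.0, 0.0, -4.0]          # m(x) = 4x - 4x^3, a symmetric bimodal case
recovered = estimate_beta(moments_of(true_beta, 6), 4)
print(recovered)
```

With exact moments the linear system returns the drift coefficients themselves; with sample moments in place of `moments_of`, the same two functions give the least squares estimate.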

It can be shown that the empirical moment matrix $\Gamma$ is nonsingular and hence invertible as long as the observations of $X$ include at least $2n - 1$ distinct values (Cramér, 1946, p. 131). To ensure reliable estimates, however, the sample size should be considerably larger than this since the procedure requires estimates of moments up to order $2n - 2$.

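It is instructive to study the application of this new theorem to the simplest nontrivial case, where $n = 2$. Using the theorem and definitions Eqs. (25) and (26), we obtain the matrix equation:

$$\begin{pmatrix} 1 & E\{X\} \\ E\{X\} & E\{X^2\} \end{pmatrix} \begin{pmatrix} \beta_1^* \\ \beta_2^* \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}.$$

Expressing the solution in terms of $\mu = E\{X\}$ and $\sigma^2 = E\{X^2\} - (E\{X\})^2$, we find that $m(x) = -(x - \mu)/\sigma^2$, and therefore that

$$f(x) = \exp\bigl[-\tfrac{1}{2}(x - \mu)^2/\sigma^2\bigr]. \tag{29}$$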
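The elimination behind this result is short enough to write out. The sketch uses $E\{m(X)\} = 0$ and $E\{X\,m(X)\} = -1$, which follow from integration by parts when $m = f'/f$; these two values are the entries of $\alpha$ for $n = 2$. The constant $\theta_0$ is included here only to show where the normalization goes.

```latex
\begin{align*}
\beta_1^* + \mu\,\beta_2^* &= 0, \\
\mu\,\beta_1^* + (\sigma^2 + \mu^2)\,\beta_2^* &= -1
  && \bigl(E\{X^2\} = \sigma^2 + \mu^2\bigr) \\[4pt]
\sigma^2\,\beta_2^* &= -1
  && \text{(second row minus } \mu \times \text{first)} \\
\beta_2^* = -\frac{1}{\sigma^2}, \qquad \beta_1^* &= \frac{\mu}{\sigma^2} \\[4pt]
m(x) = \beta_1^* + \beta_2^* x &= -\frac{x - \mu}{\sigma^2} \\
\Phi(x) = -\int m(x)\,dx &= \theta_0 + \frac{(x - \mu)^2}{2\sigma^2},
  \qquad \theta_0 = \tfrac{1}{2}\log(2\pi\sigma^2).
\end{align*}
```

So for $n = 2$ the least squares estimates reduce to the sample mean and variance, as one would hope.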


Thus we find that the normal probability density is the member of the canonical exponential family for which $n = 2$, as previously claimed.

As a concrete demonstration of the simplicity and flexibility of the technique outlined above, consider its application to the empirical distribution of crude birth rates (CBR) throughout the nations of the world, depicted as the histogram in Fig. 6. Unfortunately, for computers with only six decimal figures of accuracy, which is the usual case, it is necessary to transform the data to roughly standard form. With the transformation $X = (\mathrm{CBR} - 31)/13$, the first ten empirical moments are:

$E\{X\} = 0.00417208$
$E\{X^2\} = 1.00772$
$E\{X^3\} = 0.0440177$
$E\{X^4\} = 1.28114$
$E\{X^5\} = 0.155052$
$E\{X^6\} = 1.8093$
$E\{X^7\} = 0.387787$
$E\{X^8\} = 2.7488$
$E\{X^9\} = 0.865522$
$E\{X^{10}\} = 4.41053$

These moments are all that are required to fit canonical exponential densities up to $n = 6$. Using the estimation theorem on these moments for $n = 2$, $n = 4$, and $n = 6$, we obtain the approximations depicted in Fig. 6 in, respectively, a solid line, long dashes, and short dashes. Note the improvement in detail as the parametric complexity of the densities increases.

[Figure 6: histogram with three fitted density curves; horizontal axis: ANNUAL CRUDE BIRTH RATE, 11 to 51]

FIG. 6. The empirical histogram of crude birth rates (annual live births per 1000 population) for a sample of 59 countries (Weinstein, 1976, p. 88). Also shown are the estimated densities of Eqs. (28), solid line, (29), long dashes, and (30), short dashes.

The estimated densities are:

$$f_2(x) = \exp[.004x - .4962x^2] \tag{28}$$

$$f_4(x) = \exp[-2.142x + 6.428x^2 + .846x^3 - 2.659x^4] \tag{29}$$

$$f_6(x) = \exp[1.22x - .2005x^2 - 2.252x^3 + 3.345x^4 + 1.025x^5 - 1.680x^6] \tag{30}$$
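The fit can be retraced from the printed moments alone. The following is a minimal sketch, not the author's program: the function names are hypothetical, and the right-hand side uses -(i-1)E{X^(i-2)}, the sign that follows from m = f'/f and that yields a negative leading coefficient. The exponent coefficient of x^k is β_k/k.

```python
# Empirical moments of X = (CBR - 31)/13 as printed in the text; MU[k] = E{X^k}, k = 0..10.
MU = [1.0, 0.00417208, 1.00772, 0.0440177, 1.28114, 0.155052,
      1.8093, 0.387787, 2.7488, 0.865522, 4.41053]

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def exponent_coefficients(n, mu=MU):
    """Coefficients c_1..c_n with f(x) proportional to exp(c_1 x + ... + c_n x^n)."""
    gamma = [[mu[i + j] for j in range(n)] for i in range(n)]   # moment matrix
    alpha = [0.0] + [-i * mu[i - 1] for i in range(1, n)]       # right-hand side
    beta = solve(gamma, alpha)                                  # drift coefficients
    return [b / k for k, b in enumerate(beta, start=1)]         # c_k = beta_k / k

for n in (2, 4, 6):
    print(n, [round(c, 4) for c in exponent_coefficients(n)])
```

For n = 2 this reproduces the coefficients of Eq. (28) to the digits printed. The higher-order fits are more sensitive to rounding in the printed moments, so agreement with Eqs. (29) and (30) should be checked rather than assumed.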

The estimation procedure outlined above should prove useful in those special circumstances in which no explanation can be found for the multimodality of an empirical frequency distribution. In this case it may be supposed that the stochastic dynamic system underlying the data is fundamentally nonlinear and that its stable and unstable equilibria correspond to the modes and antimodes of the distribution. The estimation procedure will yield the estimated potential function for the system from which the topologically equivalent catastrophe may be deduced. Thus the material presented in this section effectively complements the material in the first section. It is possible to estimate a potential function from empirical observations of a system in statistical equilibrium and it is possible to predict equilibrium probability densities from specified potential functions. These are the fundamental requirements for a true statistical systems theory.
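Reading equilibria off an estimated potential can be sketched with the quartic fit of Eq. (29). The code below is illustrative only: the coefficients are copied from the printed equation, while the search interval, grid, and function names are assumptions. It locates the zeros of the estimated drift and maps them back to crude birth rates through CBR = 31 + 13x.

```python
# Exponent coefficients of x, x^2, x^3, x^4 as printed in Eq. (29).
COEFFS_F4 = [-2.142, 6.428, 0.846, -2.659]

def drift(x):
    """Derivative of the exponent of f, i.e. the estimated drift m(x) = -Phi'(x)."""
    return sum(k * c * x ** (k - 1) for k, c in enumerate(COEFFS_F4, start=1))

def equilibria(lo=-3.0, hi=3.0, steps=600):
    """Bracket sign changes of the drift on a grid, refine by bisection, and classify."""
    found = []
    h = (hi - lo) / steps
    for i in range(steps):
        a, b = lo + i * h, lo + (i + 1) * h
        if drift(a) * drift(b) < 0:
            kind = "mode" if drift(a) > 0 else "antimode"   # + to - is a density maximum
            for _ in range(60):
                mid = 0.5 * (a + b)
                if drift(a) * drift(mid) <= 0:
                    b = mid
                else:
                    a = mid
            found.append((kind, 0.5 * (a + b)))
    return found

points = [(kind, 31 + 13 * x) for kind, x in equilibria()]   # back to the CBR scale
for kind, cbr in points:
    print(kind, round(cbr, 1))
```

The two modes play the part of the stable equilibria and the antimode between them the unstable one, which is the correspondence the paragraph above relies on.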

The statistical theory of nonlinear systems is obviously only in its infancy. One possible avenue into this theory is provided by nonlinear time series analysis, an approach ignored in this paper even though it is clearly superior for systems not in statistical equilibrium. The avenue afforded by the equilibrium approach leads to a variety of useful statistical concepts.

For example, a suitable modification of the quadratic criterion Eq. (24) results in a more general estimation theorem which is valid for stochastic systems with barriers (Cobb, 1978). This theorem links together the normal, gamma, and beta densities and all their multimodal analogs into a single least squares framework. Lastly, a multivariate version has been developed which includes multiple regression as the linear special case. Thus stochastic catastrophe theory may provide the key to a new generation of fundamentally nonlinear parametric statistical procedures.

REFERENCES

Bachelier, L. Théorie de la spéculation. Annales Scientifiques de l'École Normale Supérieure, 3e Série, Tome 17, 21-86, 1900.

Bury, K. V. Statistical models in applied science. New York: Wiley, 1975.

Cobb, L. Multimodal exponential families: Least squares theory. Presented at the Institute of Mathematical Statistics Convention, San Diego, August 1978.

Crain, B. Exponential models, maximum likelihood estimation, and the Haar condition. J. Amer. stat. Ass., 1976, 71, 737-740.

Cramér, H. Mathematical methods of statistics. Princeton: Princeton Univ. Press, 1946.

Einstein, A. [Investigations on the theory of the Brownian movement] (R. Fürth, Ed., and A. D. Cowper, trans.). New York: Dover, 1956. (Originally published, 1905.)

Elderton, W. P., & Johnson, N. L. Systems of frequency curves. London: Cambridge Univ. Press, 1969.

Fararo, T. J. An introduction to catastrophes. Behav. Sci., 1978, 23, 291-317.

Feller, W. Diffusion processes in one dimension. Trans. Amer. math. Soc., 1954, 97, 1-31.

Ferguson, T. S. Mathematical statistics. New York: Academic Press, 1967.

Fisher, R. A. On the mathematical foundations of theoretical statistics. London: Royal Society of London, Philosophical Transactions, 1922.

Fisher, R. A. The genetical theory of natural selection. Oxford: Clarendon, 1930.

Flay, B. Applications of catastrophe theory to social psychology. Behav. Sci., 1978, 23, 335-350.

Itô, K. On stochastic differential equations. New York: Amer. Math. Soc. (Memoirs, AMS #4), 1951.

Itô, K., & McKean, H. P. Diffusion processes and their sample paths. New York: Springer-Verlag, 1965.

Kolmogorov, A. N. Ueber die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Math. Ann., 1931, 104, 415-458.

Lighthill, M. J. Fourier analysis and generalized functions. London: Cambridge Univ. Press, 1964.

Ludwig, D. Persistence of dynamical systems under random perturbations. SIAM Rev., 1975, 17, 605-660.

Mandl, P. Analytical treatment of one-dimensional Markov processes. New York: Springer-Verlag, 1968.

McKay, H., Sinisterra, L., McKay, A., Gomez, H., & Lloreda, P. Improving cognitive ability in chronically deprived children. Science, 1978, 200, 270-278.

Pearson, K. Contributions to the mathematical theory of evolution. Phil. Trans. Royal Society, London A, 1894, 185, 71.

Pearson, K. Contributions to the mathematical theory of evolution, II. Phil. Trans. Royal Society, London A, 1895, 186, 343.

Pearson, K. The problem of random walk. Nature, 1905, 72, 294, 342.

Ricciardi, L. M. Diffusion processes and related topics in biology. New York: Springer-Verlag, 1977.

Soong, T. T. Random differential equations in science & engineering. New York: Academic Press, 1973.

Weinstein, J. A. Demographic transition and social change. Morristown, N.J.: General Learning, 1976.

Wiener, N. Differential space. J. math. phys. Mass. Inst. Tech., 1923, 2, 131-174.

Wright, S. The distribution of gene frequencies under irreversible mutation. Proc. nat. Acad. Sci., 1938, 24, 253-259.

(Manuscript received March 17, 1978; revised May 1978)
