

A Multi-Objective Multi-Modal Optimization Approach for Mining Stable Spatio-Temporal Patterns

Michèle Sebag (1), Nicolas Tarrisson (1), Olivier Teytaud (1), Julien Lefevre (2) & Sylvain Baillet (2)

(1) TAO, CNRS - INRIA - Université Paris-Sud, F-91405 - Orsay
(2) LENA, CNRS, La Pitié Salpétrière, F-75651 - Paris

Abstract

This paper, motivated by functional brain imaging applications, addresses the discovery of stable spatio-temporal patterns. This problem is formalized as a multi-objective multi-modal optimization problem: on the one hand, the target patterns must show good stability over a wide spatio-temporal region (antagonistic objectives); on the other hand, experts are interested in finding all such patterns (global and local optima). The proposed algorithm, termed 4D-Miner, is empirically validated on artificial and real-world datasets; it shows good performance and scalability, detecting target spatio-temporal patterns within minutes from datasets of more than 400 MB.

1 Introduction

Spatio-temporal data mining is concerned with finding specific patterns in databases describing temporally situated spatial objects. Many approaches have been developed in signal processing and computer science to address such a goal, ranging from Fourier Transforms to Independent Component Analysis [Hyvarinen et al., 2001], mixtures of models [Chudova et al., 2003] or string kernel machines [Saunders et al., 2004], to name a few. These approaches aim at particular pattern properties (e.g. independence, generativity) and/or focus on particular data characteristics (e.g. periodicity). However, in some application domains, the above properties are not relevant to extract the patterns of interest.

The approach presented in this paper is motivated by such an application domain, functional brain imaging. Non-invasive techniques, specifically magnetoencephalography (MEG) [Hämäläinen et al., 1993], provide measures of the human brain activity with an unprecedented temporal resolution (time step 1 millisecond). This resolution enables researchers to investigate the timing of basic neural processes at the level of cell assemblies [Pantazis et al., 2005]. A key task is to find out which brain areas are active and when, i.e. to detect spatio-temporal regions with highly correlated activities. It is emphasized that such regions, referred to as stable spatio-temporal patterns (STPs), are neither periodic nor necessarily independent. Currently, STPs are manually detected, which (besides being a tedious task) severely hampers the reproducibility of the results.

This paper addresses the automatic detection of stable spatio-temporal patterns, i.e. maximal spatio-temporal regions with maximal correlation. This detection problem cannot be fully formalized as a multi-objective optimization (MOO) problem [Deb, 2001], because experts are interested in several active brain areas: an STP might be relevant even though it corresponds to a smaller spatio-temporal region, with a less correlated activity, than another STP. The proposed approach thus extends MOO along the lines of multi-modal optimization [Li et al., 2002], defining a multi-objective multi-modal optimization framework (MoMOO). MoMOO is tackled by an evolutionary algorithm termed 4D-Miner, using a diversity criterion to relax the multi-objective search (as opposed to diversity-enforcing heuristics in MOO, e.g. [Corne et al., 2000; Laumanns et al., 2002]; more on this in section 2.3).

To the best of our knowledge, both the extension of evolutionary algorithms to MoMOO and the use of multi-objective optimization within spatio-temporal data mining are new, although MOO attracts increasing attention in the domain of machine learning and knowledge discovery (see, e.g., [Ghosh and Nath, 2004; Francisci et al., 2003]). Experimental validation on artificial and real-world datasets demonstrates the good scalability and performance of 4D-Miner; sought STPs are found within minutes from medium-sized datasets (450 MB), on a 2.4 GHz Pentium PC.

The paper is organized as follows. Section 2 formalizes the detection of stable spatio-temporal patterns as a multi-modal multi-objective optimization problem, and motivates the use of evolutionary algorithms [Bäck, 1995; Goldberg, 2002] for their detection. Section 3 gives an overview of 4D-Miner. Section 4 describes the experiment goals and setting. Section 5 reports on the extensive validation of 4D-Miner on artificial and real-world data. Section 6 discusses the approach with respect to relevant work, and the paper ends with perspectives for further research.

2 Position of the problem; notations, criteria

2.1 Notations and definitions

Let $N$ be the number of measure points. To the $i$-th measure point is attached a spatial position¹ $M_i = (x_i, y_i, z_i)$ and a temporal activity $C_i(t)$, $t \in [1, T]$. Let $I = [t_1, t_2] \subset [1, T]$ be a time interval, and let $\bar{C}_i^I$ denote the average activity of the $i$-th measure point during $I$. The $I$-alignment $\sigma_I(i, j)$ of measure points $i$ and $j$ over $I$ is defined as:

$$\sigma_I(i, j) = \langle i, j \rangle_I \times \left(1 - \frac{|\bar{C}_i^I - \bar{C}_j^I|}{|\bar{C}_i^I|}\right), \quad \text{where} \quad \langle i, j \rangle_I = \frac{\sum_{t=t_1}^{t_2} C_i(t)\, C_j(t)}{\sqrt{\sum_{t=t_1}^{t_2} C_i(t)^2 \times \sum_{t=t_1}^{t_2} C_j(t)^2}}$$

¹ MEG measure points actually belong to a 2D shape (the surface of the skull). However, the approach equally handles 2D or 3D spatio-temporal data.

Figure 1: Magneto-Encephalography Data (N = 151, T = 875)

As the sought patterns do not need to be spherical, ellipsoidal distances are considered. Only axis-parallel ellipsoids will be considered throughout the paper. For each weight vector $w = (a, b, c)$ ($w \in \mathbb{R}_+^3$), distance $d_w$ is defined on the measure points as:

$$d_w(i, j) = \sqrt{a(x_i - x_j)^2 + b(y_i - y_j)^2 + c(z_i - z_j)^2}$$
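As an illustration only, the two definitions of this section can be sketched in a few lines of Python (this is not the authors' implementation). The function names are hypothetical, time indices are 0-based, and the handling of degenerate cases (zero mean, zero norm) is an assumption, since the text does not cover them.

```python
import math

def mean_activity(curve, t1, t2):
    """Average activity of one measure point over I = [t1, t2] (inclusive bounds)."""
    window = curve[t1:t2 + 1]
    return sum(window) / len(window)

def normalized_product(ci, cj, t1, t2):
    """<i, j>_I: inner product of the two curves over I, divided by their norms."""
    ts = range(t1, t2 + 1)
    num = sum(ci[t] * cj[t] for t in ts)
    den = math.sqrt(sum(ci[t] ** 2 for t in ts) * sum(cj[t] ** 2 for t in ts))
    return num / den if den > 0 else 0.0  # assumption: all-zero curves score 0

def i_alignment(ci, cj, t1, t2):
    """sigma_I(i, j) = <i, j>_I * (1 - |mean_i - mean_j| / |mean_i|)."""
    mi, mj = mean_activity(ci, t1, t2), mean_activity(cj, t1, t2)
    if mi == 0:
        return 0.0  # assumption: the zero-mean case is not defined in the text
    return normalized_product(ci, cj, t1, t2) * (1 - abs(mi - mj) / abs(mi))

def ellipsoidal_distance(pi, pj, w):
    """d_w(i, j) for positions pi, pj and non-negative axis weights w = (a, b, c)."""
    return math.sqrt(sum(wk * (u - v) ** 2 for wk, u, v in zip(w, pi, pj)))
```

Note that the alignment is normalized by the mean of the first curve only, so in general $\sigma_I(i, j) \neq \sigma_I(j, i)$; two curves with identical shape but different average levels are penalized by the second factor.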

2.2 Multi-objective formalization

A pattern $X = (i, I, w, r)$ is characterized by a center point $i$, $i \in [1, N]$, a time interval $I$, an ellipsoidal distance $d_w$, and a radius $r > 0$. The spatial neighborhood $B(i, w, r)$ is defined as the set of measure points $j$ such that $d_w(i, j)$ is less than $r$.

The spatial amplitude of $X$, noted $a(X)$, is the number of measure points in $B(i, w, r)$. The temporal amplitude of $X$, noted $\ell(X)$, is the number of time steps in interval $I$. The spatio-temporal alignment of $X$, noted $\sigma(X)$, is defined as the average, over all measure points in $B(i, w, r)$, of their $I$-alignment with respect to the center $i$:

$$\sigma(X) = \frac{1}{a(X)} \sum_{j \in B(i, w, r)} \sigma_I(i, j)$$
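The three criteria can be computed directly from these definitions. The Python sketch below is illustrative and not the authors' code: the `Pattern` type and function names are hypothetical, indices are 0-based, and degenerate curves are scored 0 by assumption.

```python
import math
from typing import NamedTuple, Tuple

class Pattern(NamedTuple):
    center: int                          # index i of the center measure point
    interval: Tuple[int, int]            # I = (t1, t2), inclusive bounds
    weights: Tuple[float, float, float]  # w = (a, b, c)
    radius: float                        # r > 0

def d_w(p, q, w):
    """Axis-parallel ellipsoidal distance between two positions."""
    return math.sqrt(sum(wk * (u - v) ** 2 for wk, u, v in zip(w, p, q)))

def i_alignment(ci, cj, t1, t2):
    """I-alignment of two activity curves over [t1, t2]."""
    ts = range(t1, t2 + 1)
    num = sum(ci[t] * cj[t] for t in ts)
    den = math.sqrt(sum(ci[t] ** 2 for t in ts) * sum(cj[t] ** 2 for t in ts))
    mi = sum(ci[t] for t in ts) / len(ts)
    mj = sum(cj[t] for t in ts) / len(ts)
    if den == 0 or mi == 0:
        return 0.0  # assumption: degenerate curves score 0
    return (num / den) * (1 - abs(mi - mj) / abs(mi))

def neighborhood(x, positions):
    """B(i, w, r): indices of the points strictly closer than r to the center."""
    c = positions[x.center]
    return [j for j, p in enumerate(positions) if d_w(c, p, x.weights) < x.radius]

def criteria(x, positions, activities):
    """Return (a(X), l(X), sigma(X)) for pattern x."""
    t1, t2 = x.interval
    members = neighborhood(x, positions)  # always contains the center since r > 0
    a = len(members)
    ell = t2 - t1 + 1
    sigma = sum(i_alignment(activities[x.center], activities[j], t1, t2)
                for j in members) / a
    return a, ell, sigma
```

Each evaluation scans all $N$ measure points to build the neighborhood, then reads $\ell(X)$ time steps for every member of the neighborhood.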

Interesting spatio-temporal patterns, according to the expert's first specifications, show maximal spatial and temporal amplitudes together with a maximal spatio-temporal alignment. Specifically, a solution pattern is such that i) increasing its spatial or temporal amplitude would cause the spatio-temporal alignment to decrease; ii) the only way of increasing its spatio-temporal alignment is by decreasing its spatial or temporal amplitude. It thus comes naturally to formalize the STP detection as a multi-objective optimization problem (see [Deb, 2001] and references therein; [de la Iglesia et al., 2003; Bhattacharyya, 2000] for related data mining approaches).

The three criteria $a$, $\ell$ and $\sigma$ induce a partial order on the set of patterns, referred to as Pareto domination.

Definition 1. (Pareto domination)
Let $c_1, \ldots, c_K$ be $K$ real-valued criteria defined on domain $\Omega$, and let $X$ and $Y$ belong to $\Omega$. $X$ Pareto-dominates $Y$, noted $X \succ Y$, iff $c_k(X)$ is greater than or equal to $c_k(Y)$ for all $k = 1..K$, and the inequality is strict for at least one $k_0$.

$$(X \succ Y) \iff \left[\forall k = 1..K,\ c_k(X) \geq c_k(Y)\right] \text{ and } \left[\exists k_0 \text{ s.t. } c_{k_0}(X) > c_{k_0}(Y)\right]$$

The set of non-dominated solutions with respect to a set of criteria is classically referred to as the Pareto front [Deb, 2001].
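Definition 1 translates almost literally into code. The Python sketch below is a minimal illustration (all criteria are maximized); the function names and the sample criteria triples are invented for the example and do not come from the experiments.

```python
def dominates(cx, cy):
    """True iff criteria vector cx Pareto-dominates cy (all criteria maximized)."""
    return (all(a >= b for a, b in zip(cx, cy))
            and any(a > b for a, b in zip(cx, cy)))

def pareto_front(points):
    """Non-dominated subset of a list of criteria vectors (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (a, l, sigma) triples, invented for illustration only.
patterns = [(10, 5, 0.9), (8, 5, 0.9), (3, 20, 0.4), (10, 5, 0.5)]
front = pareto_front(patterns)
# (8, 5, 0.9) and (10, 5, 0.5) are both dominated by (10, 5, 0.9);
# (3, 20, 0.4) survives thanks to its larger temporal amplitude.
```

A vector never dominates itself, since the strict inequality fails on every criterion.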

2.3 Multi-modal multi-objective formalization

However, the above criteria only partially account for the expert's expectations: an STP might have a lesser spatio-temporal alignment and amplitude than another one, and still be worthy, provided that it corresponds to another active brain area. Therefore, not all sought STPs belong to the Pareto front.

Multi-modal optimization, interested in finding global and local optima of the fitness landscape, has been extensively studied in the EC literature [Li et al., 2002]. Based on the multi-modal framework, a new framework extending multi-objective optimization and referred to as multi-modal multi-objective optimization (MoMOO), is thus proposed. Formally, let us first define a relaxed inclusion relationship, noted $p$-inclusion.

Definition 2. ($p$-inclusion)
Let $A$ and $B$ be two subsets of set $\Omega$, and let $p$ be a positive real number ($p \in [0, 1]$). $A$ is $p$-included in $B$, noted $A \subset_p B$, iff $|A \cap B| > p \times |A|$.

Given an adequate definition of the support of a candidate solution $X$ (see below), multi-modal Pareto domination can be defined as follows:

Definition 3. (multi-modal Pareto domination)
With the same notations as in Def. 1, $X$ mo-Pareto dominates $Y$, noted $X \succ_{mo} Y$, iff the support of $Y$ is $p$-included in that of $X$, and $X$ Pareto-dominates $Y$.

$$X \succ_{mo} Y \iff \left[(Supp(Y) \subset_p Supp(X)) \text{ and } (X \succ Y)\right]$$

In the case of STPs, the support of $X = (i, I, w, r)$ is naturally defined as $Supp(X) = B(i, w, r) \times I$, with $Supp(X) \subset [1, N] \times [1, T]$.
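To make Definitions 2 and 3 concrete, the following Python sketch represents a support explicitly as a set of (measure point, time step) pairs. It is an illustration under that assumed representation, not the authors' implementation; all names and the sample values are hypothetical.

```python
def dominates(cx, cy):
    """Pareto domination on criteria vectors (all criteria maximized)."""
    return (all(a >= b for a, b in zip(cx, cy))
            and any(a > b for a, b in zip(cx, cy)))

def p_included(a, b, p):
    """A is p-included in B iff |A & B| > p * |A| (Definition 2)."""
    return len(a & b) > p * len(a)

def support(members, interval):
    """Supp(X) = B(i, w, r) x I as a set of (measure point, time step) pairs."""
    t1, t2 = interval
    return {(j, t) for j in members for t in range(t1, t2 + 1)}

def mo_dominates(cx, supp_x, cy, supp_y, p):
    """X mo-Pareto dominates Y iff Supp(Y) is p-included in Supp(X) and X dominates Y."""
    return p_included(supp_y, supp_x, p) and dominates(cx, cy)

# Invented example: one large pattern and two smaller ones with equal criteria.
big = support([0, 1, 2, 3], (10, 29))
inside = support([1, 2], (12, 19))     # lies within the large pattern
elsewhere = support([7, 8], (100, 107))  # another region altogether
```

With these values, the small pattern lying inside the large one is mo-dominated, whereas the one located elsewhere is not, even though both have the same criteria values. This is how a locally optimal STP in a different brain area can remain a solution.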

In contrast with [Corne et al., 2000; Laumanns et al., 2002], where diversity-based heuristics are used for a better sampling of the Pareto front, diversity is used in MoMOO to redefine and extend the set of solutions.

2.4 Discussion

Functional brain imaging sets two specific requirements on spatio-temporal data mining. First, the expert's requirements are subject to change. Much background knowledge is involved in the manual extraction of stable spatio-temporal patterns, e.g. about the expected localization of the active brain areas. A flexible approach, accommodating further criteria and allowing the user to customize the search (in particular, tuning the thresholds on the minimal spatial or temporal amplitudes, or spatio-temporal alignment) is thus highly desirable.

Second, the approach must be scalable with respect to the data size (number of measure points and temporal resolution). Although the real data size is currently limited, the computational cost must be controllable in order to efficiently adjust the user-supplied parameters; in other words, the mining algorithm must be an any-time algorithm [Zilberstein, 1996].

The approach proposed is therefore based on evolutionary algorithms (EAs); these are widely known as stochastic, population-based optimization algorithms [Bäck, 1995; Goldberg, 2002] that are highly flexible. In particular, EAs address multi-modal optimization [Li et al., 2002] and they can be harnessed to sample the whole Pareto front associated with a set of optimization criteria, with a moderate overhead cost compared to the standard approach (i.e., optimizing a weighted sum of the criteria gives a single point of the Pareto front) [Deb, 2001]. Last, the computational resources needed by EAs can be controlled in a straightforward way by limiting the number of generations and/or the population size.

3 Overview of 4D-Miner

This section describes the 4D-Miner algorithm designed for the detection of stable spatio-temporal patterns.

3.1 Initialization

Following [Daida, 1999], special care is devoted to the initialization of this evolutionary algorithm. The extremities of the Pareto front, where STPs display a high correlation (respectively, a low correlation) on a small spatio-temporal region (resp. a wide region), do not fulfill the expert's expectations. Accordingly, some user-supplied thresholds are set on the minimal spatio-temporal amplitude and alignment.

In order to focus the search on relevant STPs, the initial population is generated using the initialization operator, sampling patterns $X = (i, I, w, r)$ as follows:
• Center $i$ is uniformly drawn in $[1, N]$;
• Weight vector $w$ is set to $(1, 1, 1)$ (initial neighborhoods are based on Euclidean distance);
• Interval $I = [t_1, t_2]$ is such that $t_1$ is drawn with uniform distribution in $[1, T]$; the length $t_2 - t_1$ of $I$ is drawn according to a Gaussian distribution $\mathcal{N}(\min_\ell, \min_\ell/10)$, where $\min_\ell$ is a time length threshold²;
• Radius $r$ is deterministically computed from a user-supplied threshold $\min_\sigma$, corresponding to the minimal $I$-alignment desired:

$$r = \min_k \{ d_w(i, k) \ \text{s.t.}\ \sigma_I(i, k) > \min_\sigma \}$$

² In case $t_2$ is greater than $T$, another interval $I$ is sampled.

All patterns $X$ whose spatial amplitude $a(X)$ is less than a user-supplied threshold $\min_a$ are non-admissible; they will not be selected to generate new offspring. The user-supplied thresholds thus govern the proportion of usable individuals in the initial population.

The computational complexity is in $O(P \times N \times \min_\ell)$, where $P$ is the population size, $N$ is the number of measure points and $\min_\ell$ is the average length of the intervals.
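The initialization operator can be sketched as follows in Python; this is an illustration, not the authors' code. Several points are assumptions: indices are 0-based; the second parameter of the Gaussian is taken as a standard deviation; and the radius is read as the distance to the closest measure point whose alignment with the center does not exceed $\min_\sigma$, so that every point strictly inside the radius is sufficiently aligned (the printed radius rule can be read in more than one way). The sketch also assumes $\min_\ell$ is much smaller than $T$, otherwise the resampling loop may not terminate.

```python
import math
import random

def d_w(p, q, w):
    """Axis-parallel ellipsoidal distance between two positions."""
    return math.sqrt(sum(wk * (u - v) ** 2 for wk, u, v in zip(w, p, q)))

def i_alignment(ci, cj, t1, t2):
    """I-alignment of two activity curves over [t1, t2]."""
    ts = range(t1, t2 + 1)
    num = sum(ci[t] * cj[t] for t in ts)
    den = math.sqrt(sum(ci[t] ** 2 for t in ts) * sum(cj[t] ** 2 for t in ts))
    mi = sum(ci[t] for t in ts) / len(ts)
    mj = sum(cj[t] for t in ts) / len(ts)
    if den == 0 or mi == 0:
        return 0.0  # assumption: degenerate curves score 0
    return (num / den) * (1 - abs(mi - mj) / abs(mi))

def sample_pattern(positions, activities, min_ell, min_sigma, rng):
    """Draw one pattern (i, I, w, r) following the four initialization steps."""
    n, T = len(positions), len(activities[0])
    i = rng.randrange(n)                   # center: uniform over measure points
    w = (1.0, 1.0, 1.0)                    # Euclidean distance initially
    while True:                            # resample I until it fits in [0, T-1]
        t1 = rng.randrange(T)
        length = max(0, round(rng.gauss(min_ell, min_ell / 10)))
        t2 = t1 + length
        if t2 < T:
            break
    # Assumed reading of the radius rule: distance to the closest point whose
    # alignment with the center does not exceed min_sigma.
    violators = [d_w(positions[i], positions[k], w) for k in range(n)
                 if k != i
                 and i_alignment(activities[i], activities[k], t1, t2) <= min_sigma]
    r = min(violators) if violators else math.inf
    return i, (t1, t2), w, r
```

Under this reading, computing the radius requires one alignment per measure point over an interval of length close to $\min_\ell$, which is consistent with the stated $O(P \times N \times \min_\ell)$ cost for a population of size $P$.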

3.2 Variation operators

From parent $X = (i, I, w, r)$, mutation generates an offspring by one among the following operators:
• replacing center $i$ with another measure point in $B(i, w, r)$;
• mutating $w$ and $r$ using self-adaptive Gaussian mutation [Bäck, 1995];
• incrementing or decrementing the bounds of interval $I$;
• generating a brand new individual (using the initialization operator).
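The four mutation operators listed above can be sketched as one Python function. This is a simplified illustration, not the authors' code: patterns are plain dictionaries, a single self-adapted step size is stored under a hypothetical `step` key, the learning rate `tau` and the uniform choice among operators are assumptions, and interval bounds move by one time step at a time.

```python
import math
import random

def mutate(x, neighbors, T, rng, fresh, op=None):
    """Return a mutated copy of pattern x (x itself is left untouched).

    x         -- dict with keys center, interval, weights, radius, step
    neighbors -- indices of the measure points in B(i, w, r)
    fresh     -- callable returning a brand new pattern (initialization operator)
    op        -- force one operator (0..3); drawn uniformly when None (assumption)
    """
    op = rng.randrange(4) if op is None else op
    y = dict(x)
    if op == 0:                            # move the center inside B(i, w, r)
        others = [j for j in neighbors if j != x["center"]]
        if others:
            y["center"] = rng.choice(others)
    elif op == 1:                          # self-adaptive Gaussian mutation of w and r
        tau = 1.0 / math.sqrt(2 * 4)       # assumed learning rate for 4 variables
        step = x["step"] * math.exp(tau * rng.gauss(0, 1))
        y["step"] = step                   # the step size is inherited by the offspring
        y["weights"] = tuple(max(1e-9, wk + step * rng.gauss(0, 1))
                             for wk in x["weights"])
        y["radius"] = max(1e-9, x["radius"] + step * rng.gauss(0, 1))
    elif op == 2:                          # shift one bound of I by one time step
        t1, t2 = x["interval"]
        if rng.random() < 0.5:
            t1 = min(max(0, t1 + rng.choice((-1, 1))), t2)
        else:
            t2 = max(min(T - 1, t2 + rng.choice((-1, 1))), t1)
        y["interval"] = (t1, t2)
    else:                                  # restart from the initialization operator
        y = fresh()
    return y
```

Clamping the weights and the radius to a small positive value keeps the offspring within the domain $w \in \mathbb{R}_+^3$, $r > 0$; this safeguard is a choice of the sketch.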

The crossover operator, starting from parent $X$, first selects the mate pattern $Y = (i', I', w', r')$ by tournament selection: among $K$ patterns uniformly selected in the population, $Y$ is the one minimizing the sum of the Euclidean distance between $i$ and $i'$ and the distance between the centers of $I$ and $I'$. $K$ is set to $P/10$ in all the experiments. The offspring is generated by:
• replacing $i$ with the center $i'$ of the mate pattern $Y$;
• replacing $w$ (resp. $r$) using an arithmetic crossover of $w$ and $w'$ (respectively $r$ and $r'$);
• replacing $I$ by the smallest interval containing $I$ and $I'$.

An offspring is said to be admissible iff it satisfies the user-supplied thresholds mentioned in the initialization step.
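Mate selection and crossover can be sketched as follows in Python (an illustration, not the authors' code). Patterns are plain dictionaries; excluding the parent itself from the tournament and drawing the mixing coefficient of the arithmetic crossover uniformly in $[0, 1)$ are assumptions of the sketch.

```python
import math
import random

def interval_center(interval):
    """Midpoint of a time interval (t1, t2)."""
    return (interval[0] + interval[1]) / 2

def select_mate(x, population, positions, k, rng):
    """Among k uniformly drawn patterns, keep the one closest to x in space and time."""
    pool = [y for y in population if y is not x]   # assumption: x cannot mate with itself
    candidates = rng.sample(pool, min(k, len(pool)))

    def closeness(y):
        space = math.dist(positions[x["center"]], positions[y["center"]])
        time = abs(interval_center(x["interval"]) - interval_center(y["interval"]))
        return space + time

    return min(candidates, key=closeness)

def crossover(x, y, rng):
    """Offspring of parent x and mate y, following the three replacement rules."""
    alpha = rng.random()                           # assumed mixing coefficient
    mix = lambda u, v: alpha * u + (1 - alpha) * v
    return {
        "center": y["center"],                     # center of the mate
        "weights": tuple(mix(u, v) for u, v in zip(x["weights"], y["weights"])),
        "radius": mix(x["radius"], y["radius"]),
        "interval": (min(x["interval"][0], y["interval"][0]),
                     max(x["interval"][1], y["interval"][1])),  # hull of I and I'
    }
```

Because the mate is chosen close to the parent in both space and time, taking the interval hull tends to merge overlapping or adjacent patterns rather than to bridge unrelated regions.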

3.3 Selection scheme

A Pareto archive is constructed with size $L$ (set to $10 \times P$ in the experiments).

A steady-state scheme is used; at each step, an admissible parent $X$ is selected among $K$ uniformly drawn individuals in the population, by retaining the one which is dominated by the fewest individuals in the archive (Pareto tournament [Deb, 2001]). A single offspring $Y$ is generated from $X$ by applying a variation operator among the ones mentioned above.

Offspring $Y$ is evaluated by computing criteria $a(Y)$, $\ell(Y)$, $\sigma(Y)$. $Y$ is rejected if it is mo-Pareto dominated in the population; otherwise, it replaces a non-admissible individual in the population if any; if there is none, it replaces an individual selected after anti-Pareto tournament (the individual, out of $K$ randomly selected ones in the population, that is dominated by the most individuals in the archive).

The archive is updated every $P$ generations, replacing the mo-Pareto dominated individuals in the archive with individuals selected from the population after Pareto tournament.
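The two tournaments used by the selection scheme differ only in the direction of the comparison. The Python sketch below illustrates them on criteria vectors; it is not the authors' code, the names are hypothetical, and the sample values are invented. In a full step, the candidates would be $K$ individuals drawn uniformly from the population.

```python
def dominates(cx, cy):
    """Pareto domination on criteria vectors (all criteria maximized)."""
    return (all(a >= b for a, b in zip(cx, cy))
            and any(a > b for a, b in zip(cx, cy)))

def domination_count(c, archive):
    """Number of archive members that Pareto-dominate criteria vector c."""
    return sum(dominates(a, c) for a in archive)

def pareto_tournament(candidates, archive):
    """Winner: the candidate dominated by the fewest archive members."""
    return min(candidates, key=lambda c: domination_count(c, archive))

def anti_pareto_tournament(candidates, archive):
    """Loser to be replaced: the candidate dominated by the most archive members."""
    return max(candidates, key=lambda c: domination_count(c, archive))

# Invented (a, l, sigma) values for illustration only.
archive = [(10, 10, 0.9), (5, 20, 0.5)]
candidates = [(9, 9, 0.8), (4, 9, 0.4), (11, 3, 0.95)]
parent = pareto_tournament(candidates, archive)        # dominated by 0 members
victim = anti_pareto_tournament(candidates, archive)   # dominated by 2 members
```

Ties are broken here by the order of the candidates, which is an arbitrary choice of the sketch.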

4 Experimental setting and goal

This section presents the goal of the experiments, describes the artificial and real-world datasets used, and finally gives the parameters of the algorithm and the performance measures used to evaluate the algorithm.

4.1 Goals of experiments and datasets

The initial goal is to provide the expert with a usable STP detection algorithm. The real datasets have been collected from people observing a moving ball. Each dataset involves 151 measure points and the number of time steps is 875. As can be noted from Fig. 1, which shows a representative dataset, the amplitude of the activities varies widely along the time dimension. The other goal is to assess the scalability and the performance of 4D-Miner, which is done using artificial datasets.

The artificial datasets are generated as follows. $N$ measure points are located in uniformly selected locations in the 3D domain $[0, 1]^3$. Activities are initialized from random cumulative uniform noise, with $C_i(t + 1) = C_i(t) + \epsilon$, where $\epsilon$ is drawn according to $U(0, \tau)$. Every target STP $S = (i, w, I, r, \delta)$ is defined from a center $i$, a weight vector $w$, a time interval $I$, a radius $r$, and a fading factor $\delta$, used to bias the activities as detailed below. The activity $C_S$ of the STP $S$ is the average activity in the spatio-temporal region $B(i, w, r) \times I$. Thereafter, activities are biased according to the target STPs: for each measure point $j$, for each STP $S$ such that $j$ is influenced by $S$ ($d_w(i, j) < r$), the activity $C_j(t)$ is smoothed as

$$C_j(t) = (1 - e^{-\alpha_i(j,t)})\, C_j(t) + e^{-\alpha_i(j,t)}\, C_S, \quad \text{where} \quad \alpha_i(j, t) = d_w(i, j) + \delta \times d(t, I)$$

and $d(t, I)$ is the distance of $t$ to interval $I$ (0 if $t$ belongs to $I$, otherwise the minimum distance of $t$ to the bounds of $I$).

Figure 2: An artificial STP (T = 2000, N = 2000); axes: Time, Activity

The scalability of 4D-Miner is assessed from its empirical complexity depending on the number $N$ of measure points and the number $T$ of time steps.
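The generation of artificial activities described above can be sketched in Python as follows. This is an illustration rather than the authors' generator: function names are hypothetical, time indices are 0-based, and the curve is assumed to start from the first noise increment.

```python
import math
import random

def cumulative_noise(T, tau, rng):
    """Activity curve with C(t+1) = C(t) + eps, eps drawn from U(0, tau)."""
    curve, level = [], 0.0
    for _ in range(T):
        level += rng.uniform(0, tau)
        curve.append(level)
    return curve

def time_distance(t, interval):
    """d(t, I): 0 inside I, otherwise distance to the nearest bound of I."""
    t1, t2 = interval
    if t1 <= t <= t2:
        return 0
    return min(abs(t - t1), abs(t - t2))

def smooth(c_jt, c_s, d_space, delta, t, interval):
    """Pull activity c_jt toward the STP activity c_s with weight exp(-alpha)."""
    alpha = d_space + delta * time_distance(t, interval)
    pull = math.exp(-alpha)
    return (1 - pull) * c_jt + pull * c_s

def imprint(curve, c_s, d_space, delta, interval):
    """Apply the smoothing to a whole curve for one target STP."""
    return [smooth(v, c_s, d_space, delta, t, interval) for t, v in enumerate(curve)]
```

At the center of the STP and inside $I$, $\alpha = 0$ and the activity is replaced by $C_S$; the influence decays exponentially with the spatial distance to the center and, at a rate set by $\delta$, with the temporal distance to $I$.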

The performance of 4D-Miner is measured using the standard criteria in information retrieval, recall and precision. The recall is the fraction of target STPs that are identified by 4D-Miner, i.e. $p$-included in an individual in the archive; the precision is the fraction of relevant individuals in the archive ($p$-included in a target STP). For each experimental setting (see below), the recall and precision are averaged over 21 independent runs. The number of target STPs is set to 10 in all experiments. Each STP influences a number of measure points varying in [10, 20], during intervals uniformly selected in $[1, T]$ with length varying in [15, 25], and $\delta$ uniformly selected in [0, .05]. Each problem instance thus includes STPs with various levels of difficulty; detection becomes harder as the spatio-temporal support ($r$ and $I$) shrinks and as the $\delta$ parameter increases (the regularity is not visible outside interval $I$, as in Fig. 2).
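The two performance measures can be computed as in the Python sketch below, where each target STP and each archived individual is represented by its support, taken here as a plain set (an assumption of the sketch). The names and the sample sets are hypothetical.

```python
def p_included(a, b, p):
    """A is p-included in B iff |A & B| > p * |A|."""
    return len(a & b) > p * len(a)

def recall_precision(targets, archive, p):
    """Recall and precision of an archive with respect to the target STPs.

    recall    -- fraction of targets p-included in some archived individual
    precision -- fraction of archived individuals p-included in some target
    """
    found = sum(any(p_included(t, x, p) for x in archive) for t in targets)
    relevant = sum(any(p_included(x, t, p) for t in targets) for x in archive)
    recall = found / len(targets) if targets else 0.0
    precision = relevant / len(archive) if archive else 0.0
    return recall, precision

# Invented supports for illustration only.
targets = [{1, 2, 3, 4}, {10, 11, 12, 13}]
archive = [{1, 2, 3, 4, 5}, {20, 21}, {30, 31}, {2, 3}]
recall, precision = recall_precision(targets, archive, 0.5)
```

Note that the two measures use the inclusion in opposite directions: a target counts as found when it is covered by an archived pattern, while an archived pattern counts as relevant when it lies mostly within a target.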

4.2 Experimental setting

The experiments reported in the next section consider a population size $P = 200$, evolved along 8000 generations (8,000 fitness evaluations per run). A few preliminary runs were used to adjust the operator rates; the mutation and crossover rates are respectively set to .7 and .3.

The number $N$ of measure points ranges in {500, 1000, 2000, 4000}. The number $T$ of time steps ranges in {1000, 2000, 4000, 8000}.

The thresholds used in the initialization are: $\min_a = 5$ (minimum number of curves supporting a pattern); $\min_\ell = 5$ (minimum temporal amplitude of a pattern); $\min_\sigma = .1$ (minimum spatio-temporal alignment of two curves in a pattern).

For computational efficiency, the $p$-inclusion is computed as: $X$ is $p$-included in $Y$ if the center $i$ of $X$ belongs to the spatial support of $Y$, and there is an overlap between their time intervals.
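This simplified test avoids building the full supports. A minimal Python sketch, assuming the spatial support of $Y$ is available as a set of measure-point indices and intervals are inclusive pairs (names are hypothetical):

```python
def intervals_overlap(i1, i2):
    """True iff the inclusive intervals i1 = (s1, e1) and i2 = (s2, e2) share a time step."""
    return i1[0] <= i2[1] and i2[0] <= i1[1]

def fast_p_included(x_center, x_interval, y_members, y_interval):
    """Approximate test: the center of X lies in B(Y) and the time intervals overlap."""
    return x_center in y_members and intervals_overlap(x_interval, y_interval)
```

For example, `fast_p_included(3, (10, 20), {2, 3, 4}, (18, 30))` holds, whereas moving the second interval to `(21, 30)` makes it fail. The test costs one set lookup and two comparisons, and no longer depends on $p$.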

The maximal size of the datasets ($T = 8000$, $N = 4000$) is 456 MB. Computational runtimes are given for a Pentium IV PC, 2.4 GHz. 4D-Miner is written in C++.

5 Experimental validation

This section reports on the experiments done using 4D-Miner.

5.1 Experiments on real datasets

Figure 3: A stable spatio-temporal pattern, involving 8 measure points within interval [289,297], alignment .2930; axes: Time, Activity

Typical STPs found in the real datasets are shown in Figs. 3 and 4, showing all curves belonging to the STP plus the time-window of the pattern. The runtime is less than 1 minute (PC Pentium 1.4 GHz). As discussed in section 2.3, many relevant STPs are Pareto-dominated: typically the STP shown in Fig. 3 is dominated by the one in Fig. 4.

These patterns are considered satisfactory by the expert. All experiments confirm the importance of the user-defined thresholds, controlling the quality of the initial population. Due to the variability of the data, these threshold parameters are adjusted for each new experiment.

The coarse tuning of the parameters can be achieved based on the desired proportion of admissible individuals in the initial population. However, the fine-tuning of the parameters has not been automated so far, and it still requires running 4D-Miner a few times. For this reason, the control of the computational cost through the population size and number of generations is one of the key features of the algorithm.

Figure 4: Another stable spatio-temporal pattern involving 9 measure points within interval [644,664], alignment .3963; axes: Time, Activity

5.2 Multi-objective multi-modal optimization

The good scalability of 4D-Miner is illustrated in Fig. 5. The empirical complexity of the approach is insensitive to the number of time steps $T$ and linear in the number $N$ of measure points. This computational robustness confirms the analysis (section 3.1): the evaluation of each pattern has linear complexity in $N$. On-going work is concerned with exploiting additional data structures inspired from [Wu et al., 2004] in order to decrease the complexity in $N$.

Figure 5: Computational cost vs N, for T = 1000, 2000, 4000, 8000 (in seconds, on PC 2.4 GHz); axes: N, runtime (sec.)

Table 1 reports the recall achieved by 4D-Miner over the range of experiments, averaged over 21 independent runs, together with the standard deviation. On-line performances are illustrated in Fig. 6.

N       T = 1,000   T = 2,000   T = 4,000   T = 8,000
500     98 ± 5      93 ± 9      92 ± 7      79 ± 16
1000    96 ± 6      96 ± 6      82 ± 14     67 ± 12
2000    96 ± 5      87 ± 12     72 ± 14     49 ± 15
4000    89 ± 10     81 ± 13     56 ± 14     32 ± 16

Table 1: Recall achieved by 4D-Miner vs N and T (average percentage and standard deviation over 21 runs)

Figure 6: Recall achieved by 4D-Miner vs the number of fitness evaluations (T = 4000, N = 500, 1000, 2000, 4000, average over 21 runs); axes: Fitness evaluations (from $1 \cdot 10^4$ to $7 \cdot 10^4$), Recall

These results confirm the robustness of the proposed approach: a graceful degradation of the recall is observed as $T$ and $N$ increase. It must be noted that STPs occupy a negligible fraction of the spatio-temporal domain (circa $10^{-4}$ for $T = 8000$, $N = 4000$). The average precision is low, ranging from 12 to 20% (results omitted due to space limitations). However, post-pruning can be used to sort the wheat from the chaff in the final archive, and increase the precision up to 50% without decreasing the recall; the post-pruning straightforwardly removes the STPs with small spatial or temporal amplitudes.

Quite different effects are obtained when the archive is pruned along the search (e.g. by increasing the thresholds on the minimal spatial and temporal amplitudes), which decreases the overall performances by an order of magnitude; interestingly, similar effects are observed in constrained evolutionary optimization when the fraction of admissible solutions is very low.
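The post-pruning step mentioned above amounts to a filter over the final archive. A minimal Python sketch, assuming each archived pattern is summarized by its criteria triple $(a, \ell, \sigma)$; the function name, the threshold values and the sample archive are hypothetical and are not taken from the experiments:

```python
def post_prune(archive, min_a, min_ell):
    """Keep the archived patterns whose spatial and temporal amplitudes reach the thresholds."""
    return [x for x in archive if x[0] >= min_a and x[1] >= min_ell]

# Invented archive of (a, l, sigma) triples, for illustration only.
final_archive = [(12, 18, 0.6), (5, 5, 0.9), (15, 4, 0.7), (9, 20, 0.3)]
kept = post_prune(final_archive, min_a=8, min_ell=10)
# kept -> [(12, 18, 0.6), (9, 20, 0.3)]
```

The filter is applied once, after the search has finished, so it leaves the evolutionary dynamics untouched; as noted above, applying tighter thresholds during the search has a very different effect.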

A final remark is that the performances heavily depend upon the user-supplied thresholds (section 3.1), controlling the diversity and effective size of the population. Indeed, a parametric model of the dataset would enable automatic adjustment of these parameters. It might also be viewed as an advantage of 4D-Miner that it does not require any prior model of the data, and that inexpensive preliminary runs can be used to adjust the needed parameters.

6 Relevant work

This brief review does not claim exhaustiveness; the interested reader is referred to [Shekhar et al., 2003; Roddick and Spiliopoulou, 2002] for comprehensive surveys. Spatio-temporal data mining applications (e.g., remote sensing, environmental studies, medical imaging) are known to be computationally heavy; as most standard statistical methods do not scale up properly, new techniques have been developed, including randomized variants of statistical algorithms. Many developments are targeted at efficient access primitives and/or complex data structures (see, e.g., [Wu et al., 2004]); another line of research is based on visual and interactive data mining (see, e.g., [Keim et al., 2004]), exploiting the unrivaled capacities of human eyes for spotting regularities in 2D data. Spatio-temporal data mining mostly focuses on clustering, outlier detection, denoising, and trend analysis. For instance, [Chudova et al., 2003] used EM algorithms for non-parametric characterization of functional data (e.g. cyclone trajectories), with special care regarding the invariance of the models with respect to temporal translations. The main limitation of such non-parametric models, including Markov Random Fields, is their computational complexity, sidestepped by using randomized search for model estimates.

7 Discussion and Perspectives

This paper has proposed a stochastic approach for mining stable spatio-temporal patterns. Indeed, a very simple alternative would be to discretize the spatio-temporal domain and compute the correlation of the signals in each cell of the discretization grid. However, it is believed that the proposed approach presents several advantages compared to the brute force, discretization-based, alternative. First of all, 4D-Miner is a fast and frugal algorithm; its good performance and scalability have been successfully demonstrated on medium-sized artificial datasets. Second, data mining applications specifically involve two key steps, exemplified in this paper: i) understanding the expert's goals and requirements; ii) tuning the parameters involved in the specifications. With regard to both steps, the ability to work under bounded resources is a very significant advantage; any-time algorithms allow the user to check whether the process can deliver useful results, at a low cost.

Further research is concerned with extending 4D-Miner in a supervised learning perspective (finding the STPs that are complete, i.e. active in several persons undergoing the same experiment, and correct, i.e. not active in the control experiment). The challenge is to directly handle the additional constraints of completeness and correctness in the multi-objective multi-modal optimization framework presented here.

Acknowledgments

The authors are partially supported by the PASCAL Network of Excellence (IST-2002-506778); this study was supported by a grant from the French Ministry of Research & Technology "ACI, New Applications of Mathematics".

References

[Bäck, 1995] T. Bäck. Evolutionary Algorithms in theory and practice. New York: Oxford University Press, 1995.

[Bhattacharyya, 2000] S. Bhattacharyya. Evolutionary algorithms in data mining: multi-objective performance modeling for direct marketing. In KDD'00, pages 465–473. ACM Press, 2000.

[Chudova et al., 2003] D. Chudova, S. Gaffney, E. Mjolsness, and P. Smyth. Translation-invariant mixture models for curve clustering. In KDD'03, pages 79–88. ACM Press, 2003.

[Corne et al., 2000] D. Corne, J. D. Knowles, and M. J. Oates. The Pareto envelope-based selection algorithm for multi-objective optimisation. In PPSN VI, pages 839–848. Springer Verlag, 2000.

[Daida, 1999] J. M. Daida. Challenges with verification, repeatability, and meaningful comparison in Genetic Programming. In GECCO'99, pages 1069–1076. Morgan Kaufmann, 1999.

[Deb, 2001] K. Deb. Multi-Objective Optimization Using Evolutionary Algorithms. John Wiley, 2001.

[Francisci et al., 2003] D. Francisci and M. Collard. Multi-Criteria Evaluation of Interesting Dependencies according to a Data Mining Approach. In CEC'03, Vol. 3, pages 1568–1574. IEEE Press, 2003.

[Ghosh and Nath, 2004] A. Ghosh and B. Nath. Multi-objective rule mining using genetic algorithms. Inf. Sci., 163(1-3):123–133, 2004.

[Goldberg, 2002] D. Goldberg. The Design of Innovation: Lessons from and for Genetic and Evolutionary Algorithms. MIT Press, 2002.

[Hyvarinen et al., 2001] A. Hyvarinen, J. Karhunen, and E. Oja. Independent Component Analysis. Wiley, New York, 2001.

[Hämäläinen et al., 1993] M. Hämäläinen, R. Hari, R. Ilmoniemi, J. Knuutila, and O. V. Lounasmaa. Magnetoencephalography: theory, instrumentation, and applications to noninvasive studies of the working human brain. Rev. Mod. Phys., 65:413–497, 1993.

[de la Iglesia et al., 2003] B. de la Iglesia, M. S. Philpott, A. J. Bagnall, and V. J. Rayward-Smith. Data Mining Using Multi-Objective Evolutionary Algorithms. In CEC'03, Vol. 3, pages 1552–1559. IEEE Press, 2003.

[Keim et al., 2004] D. A. Keim, J. Schneidewind, and M. Sips. Circleview: a new approach for visualizing time-related multidimensional data sets. In Proc. of Advanced Visual Interfaces, pages 179–182. ACM Press, 2004.

[Laumanns et al., 2002] M. Laumanns, L. Thiele, K. Deb, and E. Zitzler. Combining convergence and diversity in evolutionary multi-objective optimization. Evolutionary Computation, 10(3):263–282, 2002.

[Li et al., 2002] J.-P. Li, M. E. Balazs, G. T. Parks, and P. J. Clarkson. A species conserving genetic algorithm for multimodal function optimization. Evolutionary Computation, 10(3):207–234, 2002.

[Pantazis et al., 2005] D. Pantazis, T. E. Nichols, S. Baillet, and R. M. Leahy. A comparison of random field theory and permutation methods for the statistical analysis of MEG data. Neuroimage, to appear in 2005.

[Roddick and Spiliopoulou, 2002] J. F. Roddick and M. Spiliopoulou. A survey of temporal knowledge discovery paradigms and methods. IEEE Trans. on Knowledge and Data Engineering, 14(4):750–767, 2002.

[Saunders et al., 2004] C. Saunders, D. R. Hardoon, J. Shawe-Taylor, and G. Widmer. Using string kernels to identify famous performers from their playing style. In ECML04, pages 384–395. Springer Verlag, 2004.

[Shekhar et al., 2003] S. Shekhar, P. Zhang, Y. Huang, and R. R. Vatsavai. Spatial data mining. In H. Kargupta and A. Joshi, eds, Data Mining: Next Generation Challenges and Future Directions. AAAI/MIT Press, 2003.

[Wu et al., 2004] K. L. Wu, S. K. Chen, and P. S. Yu. Interval query indexing for efficient stream processing. In 13th ACM Conf. on Information and Knowledge Management, pages 88–97, 2004.

[Zilberstein, 1996] S. Zilberstein. Resource-bounded reasoning in intelligent systems. Computing Surveys, 28(4), 1996.