a tree for rocks—hierarchies in stratigraphic databases

8
Computers & Geosciences 29 (2003) 921–928 A tree for rocks—hierarchies in stratigraphic databases Robert Huber a , Jens Klump b, *, Stefan G . otz c a Fachbereich Geowissenschaften, Universit . at Bremen, Postfach 330 440, 28334 Bremen, Germany b GeoForschungsZentrum Potsdam, Telegrafenberg A3, Potsdam 14473, Germany c Geologisches Institut, Universit . at Karlsruhe, KaiserstraX e 12, Karlsruhe 76131, Germany Received 5 April 2002; received in revised form 19 March 2003; accepted 25 April 2003 Abstract In this paper we present a new data model for a computer-supported management of stratigraphic information in relational databases, with special consideration to the hierarchical order of stratigraphic units. The huge amount of data on stratigraphic units provided by earth scientists during the last decades requires data models which—besides names, ages and background information such as used taxa in biostratigraphic units—can store both the position of a single unit within a stratigraphic column as well as its position within the stratigraphic hierarchy. The hierarchy management in the present data model is based on a modification of the nested-set data model, which is the core of the stratigraphy.net stratigraphic database system (http://www.stratigraphy.net). The nested-set data model allows one to store alternative classifications of stratigraphic units with theoretically unlimited number of tree levels and unlimited amount of tree elements. Multiple hierarchies based on different disciplines such as biostratigraphy or lithostratigraphy can be stored in databases using the model presented here. In comparison to other models, the modified nested-set data model allows a high query performance. It permits an almost unlimited flexibility in describing the complex nature of stratigraphical information without limitations in the definition of standard query language queries. The application of this model is presented with detailed examples from the stratigraphy.net database. r 2003 Elsevier Ltd. All rights reserved. Keywords: Nested-set model; Data model; Lithostratigraphy; Biostratigraphy; Chronostratigraphy 1. Introduction Among the community of earth scientists there is a growing demand for modern, computer-aided manage- ment of stratigraphic information. Over the last two decades, the use of both traditional techniques, such as biostratigraphy and lithostratigraphy, and modern methods, such as isotope stratigraphy, has produced numerous well-defined stratigraphic and geochronologic units. Furthermore, the ongoing debate about the quality and the exact temporal position of these units and their boundaries resulted in a sizeable body of publications dealing with new interpretations and dating of stratigraphic and geochronologic units (e.g. Harland et al., 1990; Cande and Kent, 1992; Gradstein et al., 1995). Modern relational databases provide an excellent tool for the management of stratigraphic information. Relational databases consist of a set of tables that are interrelated by shared key attributes. The structure of these tables and how they relate to each other should ideally represent the relevant aspects of the ‘‘real world’’ and at the same time provide maximum flexibility when it comes to storing and retrieving information. Efforts have been undertaken already to find such a struc- ture for stratigraphic information (e.g. Diehl, 1992; Treworgy, 1992). A particular challenge for the implementation of an adequate database structure is the arrangement of hierarchically information. Hierarchies are very ARTICLE IN PRESS *Corresponding author. Tel.: +49-331-288-1702; fax: +49- 331-288-1703. E-mail address: [email protected] (J. Klump). 0098-3004/03/$ - see front matter r 2003 Elsevier Ltd. All rights reserved. doi:10.1016/S0098-3004(03)00092-X

Upload: robert-huber

Post on 15-Sep-2016

213 views

Category:

Documents


1 download

TRANSCRIPT

Computers & Geosciences 29 (2003) 921–928

A tree for rocks—hierarchies in stratigraphic databases

Robert Hubera, Jens Klumpb,*, Stefan G .otzc

aFachbereich Geowissenschaften, Universit .at Bremen, Postfach 330 440, 28334 Bremen, GermanybGeoForschungsZentrum Potsdam, Telegrafenberg A3, Potsdam 14473, Germany

cGeologisches Institut, Universit .at Karlsruhe, KaiserstraX e 12, Karlsruhe 76131, Germany

Received 5 April 2002; received in revised form 19 March 2003; accepted 25 April 2003

Abstract

In this paper we present a new data model for a computer-supported management of stratigraphic information in

relational databases, with special consideration to the hierarchical order of stratigraphic units. The huge amount of

data on stratigraphic units provided by earth scientists during the last decades requires data models which—besides

names, ages and background information such as used taxa in biostratigraphic units—can store both the position of a

single unit within a stratigraphic column as well as its position within the stratigraphic hierarchy. The hierarchy

management in the present data model is based on a modification of the nested-set data model, which is the core of the

stratigraphy.net stratigraphic database system (http://www.stratigraphy.net). The nested-set data model allows one to

store alternative classifications of stratigraphic units with theoretically unlimited number of tree levels and unlimited

amount of tree elements. Multiple hierarchies based on different disciplines such as biostratigraphy or lithostratigraphy

can be stored in databases using the model presented here. In comparison to other models, the modified nested-set data

model allows a high query performance. It permits an almost unlimited flexibility in describing the complex nature of

stratigraphical information without limitations in the definition of standard query language queries. The application of

this model is presented with detailed examples from the stratigraphy.net database.

r 2003 Elsevier Ltd. All rights reserved.

Keywords: Nested-set model; Data model; Lithostratigraphy; Biostratigraphy; Chronostratigraphy

1. Introduction

Among the community of earth scientists there is a

growing demand for modern, computer-aided manage-

ment of stratigraphic information. Over the last two

decades, the use of both traditional techniques, such as

biostratigraphy and lithostratigraphy, and modern

methods, such as isotope stratigraphy, has produced

numerous well-defined stratigraphic and geochronologic

units. Furthermore, the ongoing debate about the

quality and the exact temporal position of these units

and their boundaries resulted in a sizeable body of

publications dealing with new interpretations and dating

of stratigraphic and geochronologic units (e.g. Harland

et al., 1990; Cande and Kent, 1992; Gradstein et al.,

1995).

Modern relational databases provide an excellent tool

for the management of stratigraphic information.

Relational databases consist of a set of tables that are

interrelated by shared key attributes. The structure of

these tables and how they relate to each other should

ideally represent the relevant aspects of the ‘‘real world’’

and at the same time provide maximum flexibility when

it comes to storing and retrieving information. Efforts

have been undertaken already to find such a struc-

ture for stratigraphic information (e.g. Diehl, 1992;

Treworgy, 1992).

A particular challenge for the implementation of

an adequate database structure is the arrangement of

hierarchically information. Hierarchies are very

ARTICLE IN PRESS

*Corresponding author. Tel.: +49-331-288-1702; fax: +49-

331-288-1703.

E-mail address: [email protected] (J. Klump).

0098-3004/03/$ - see front matter r 2003 Elsevier Ltd. All rights reserved.

doi:10.1016/S0098-3004(03)00092-X

common in stratigraphy. For example, chronostrati-

graphic units are arranged as Eonothem–Erathem–

System–Series–Stage, lithostratigraphic units are arranged

as Group–Formation–Member–Bed, etc. (Salvador, 1994;

Murphy and Salvador, 1999). These hierarchies can be

represented as a tree structure (Fig. 1).

In such a system, each stratigraphic or geochronologic

unit of a higher rank may contain one ore more units of

a lower rank, similarly, a unit of a lower rank belongs to

exactly one unit of a higher rank. Each higher-rank unit

is characterized by all features of its lower-rank units. In

addition, the nature of the stratigraphic column requires

the position of the single units within one level of the

tree to be fixed since that position represents the

chronological order of the units (Fig. 1).

Within the rules for establishing a stratigraphic

classification this basic hierarchical pattern is compli-

cated by a certain flexibility. For example, the establish-

ment of Sub- and Supersystems (e.g. the Pennsylvanian

and Mississippian Subsystem, see Remane et al., 2001) is

allowed in chronostratigraphic units (Salvador, 1994).

Additionally, some stratigraphic units are only defined

for specific regions, prominent examples being the

Ordovician or the Carboniferous (Remane et al., 2001).

Most earth scientists require stratigraphic-back-

ground information for their work, like ages of the

stratigraphic units, their properties, and their position

within the stratigraphic hierarchy. The international

stratigraphic guide (Salvador, 1994) gives an example of

the problems earth scientists might encounter: ‘‘Any

stratigrapher will readily understand if a colleague states

that he has been studying the Jurassic, or the Miocene or

the Turonian of some area. However, if only the name of

a formation, of a biostratigraphic zone, or of some other

type of more local stratigraphic unit is mentioned,

stratigraphers in other parts of the world may not be

able to recognize even approximately the position of the

unit within the stratigraphic column’’.

To facilitate better and easier communication and

handling of stratigraphic information, we present a new

data structure for a relational database which allows one

to store and retrieve all kinds of stratigraphic hierar-

chies, thus meeting the demands of modern information

management in the earth sciences.

1.1. State of the art: present database concepts on

stratigraphic hierarchies

1.1.1. The ‘‘code model’’

The easiest way to model stratigraphic hierarchies, as

proposed for example by Treworgy (1992) or Diehl

(1992), is to provide an attribute that holds an encoded

key representing the whole path from the root of the tree

down to a distinct tree element (Fig. 2). For example, an

8-digit key can hold a tree structure with four levels,

each containing at most 10 elements. An advantage of

this model is that the information contained in the key is

readable by the geoscientist, especially when previously

used stratigraphic codes can still be used in the newly

designed database (Diehl, 1992). However, this also

means that a certain degree of knowledge about this

code is needed when the data are being entered or

retrieved; the code must be known by the data producer

and the data user.

There are some disadvantages to the ‘‘code model’’.

Mainly its restriction to a distinct number of elements

and to a distinct number of hierarchical levels. This is

ARTICLE IN PRESS

Fig. 1. Representation of hierarchy of chronostratigraphic

units as a basic tree structure. Highest rank unit, Eonothem,

can be understood as root of a tree branching to its

corresponding lower-rank units, providing nodes branching to

next level, etc. Relation between units is expressed by arcs

between nodes. Stratigraphic units within a distinct level are not

laterally connected by arcs, their temporal order is not

expressed by a simple tree. In present example, temporal

arrangement of nodes belonging to a distinct level would follow

consecutive numbers of nodes from right to left side as

indicated by arrow.

Fig. 2. Hierarchical position of a stratigraphic stage expressed

by a numerical code in ‘‘code model’’. Each stratigraphic rank

is described by distinct digits of code. Here first digit encodes an

Eonothem, next two digits Erathem, etc. Thus, complete code

describes path from Eonothem down to Stage. In present

example code—taken from Giles et al. (1997)—structure’s

numbers represent Fennian stage.

R. Huber et al. / Computers & Geosciences 29 (2003) 921–928922

not desirable in the light of a permanently growing body

of progressively refined scientific data. Furthermore, the

‘‘code model’’ imposes restrictions on the use of

standard query languages (SQL). Special algorithms

for the analysis of the coded data within the key are

needed in addition to a standard database query

interface (Diehl, 1992; Al-Mishwat, 1997), which in

most cases requires additional software which frequently

is proprietary. It is not possible to query such a database

without an additional non-standard software package,

which in turn means stronger dependence on persons or

institutions who know how to decipher the code.

1.1.2. The ‘‘chain model’’

The ‘‘chain model’’ (Morris, 2000a, b) uses one table

for each level of the tree, e.g. one table for Eon, Era,

System, Series, etc. Each of these tables is related to the

table representing the higher level by a 1:n (one-to-

many) relation. As a result a cascade or chain of tables

represents the hierarchy of the tree (Fig. 3). This data

structure makes it easy to formulate queries in an SQL

because there is no more need to decipher codes.

However, the restriction to a distinct number of levels

can cause new problems: what if a Sub- or Superlevel is

introduced? Another problem occurs if one of the levels

required by the data model is missing in the real world.

For example, the recent edition of the International

Stratigraphic Chart (Remane et al., 2001) presents the

Subsystem Pennsylvanian as a member of the Carboni-

ferous System which is divided into several Stages. In

this case the Series level is missing, a situation that

cannot be handled by the chain model, because an entry

in table ‘‘Series’’ is required.

Stratigraphic information in a database is often

needed to identify that an entry of a database table

belongs to a distinct stratigraphic unit or age. For

example, if we want to assign an age to a fossil in a

collection, a table ‘‘Fossil’’ might have a column

‘‘Stratigraphic Age’’. The chain model requires the user

to always enter within the hierarchy the stratigraphic or

geochronologic unit of the lowest level (and thus the

highest stratigraphic precision). To be able to enter

fuzzy ages like ‘‘Phanerozoic’’ the table ‘‘Fossil’’ would

now need additional columns to hold information on

each level of the stratigraphic hierarchy. It is obvious

that such a data structure in turn very much complicates

queries, because the table and column which hold the

required data are not known from the start of a query

process.

1.1.3. The ‘‘parent–child’’ or ‘‘Adjacency model’’

In many relational databases, tree structures are

modeled by the ‘parent–child’ or ‘adjacency model’.

This model often is used in taxonomic databases (e.g.

Berendson, 1995, 1999; Morris, 2000a) where hierarchies

among taxonomic categories play an important role.

Such ‘phylogenetic’ lineages can be used to describe

stratigraphic hierarchies (Bain and Giles, 1997; Giles

et al., 1997). The simplest solution of this model (Fig. 4)

consists of a table which has a column which holds a

number (child) as a unique identifiers of each table row

and a column which holds the names of stratigraphic or

geochronologic units. Another column of this table

holds another number (parent) which has to follow two

rules: the number has to be used already as a ‘child’ in

another row and the number must be different from the

‘child’ number of the same row. By this construct, a

higher rank of the stratigraphic or geochronologic unit

can be found by finding the table row which holds the

same ‘parent’ number than the ‘child’ number of the

current row. This structure is called an adjacency matrix

(Celko, 1999).

The advantages of the adjacency model is the clarity

in the relation between parent and child elements. By

just reading the table, it would be possible for a

geoscientist to decide to which higher rank a strati-

graphic unit belongs. But the model gives only the

immediate higher rank of a stratigraphic unit. If we need

to know the top unit with the highest rank within a

stratigraphic hierarchy, we encounter some difficulties.

ARTICLE IN PRESS

Fig. 3. Database structure of a typical ‘‘chain model’’. Database tables are drawn as bold labeled rectangular boxes containing most

important columns of each table, primary and foreign keys. Names of tables and their positions within chain indicate rank of units.

Entries in primary key columns must be unique. Foreign key entries can occur once or repeatedly, but they cannot contain values other

than corresponding primary key. This one-to-many (1:n) relation between tables is further indicated by arrows which are always

pointing from foreign key to primary key column. To represent hierarchy as in Fig. 2, table ‘Eonothem’ would have entry

‘Eonothem id’=‘8’, table ‘Erathem’ would have ‘Erathem id’=‘80’ and ‘Eonothem id’=‘8’, and so on. Such tables typically have

additional columns such as ‘Eonothem name’ which would have entry ‘Phanerozoic’ in present example.

R. Huber et al. / Computers & Geosciences 29 (2003) 921–928 923

In a lithostratigraphic context, someone might need to

know to which Formation, Group or Supergroup a

lithological member belongs. As described above, we

can easily find the ‘parent’ of a distinct unit from the

database table. This ‘parent’ can then be used as the

starting point (‘child’) to get to the next higher-rank

unit, and so on. Because we cannot know how many

units are between the top unit and our starting point,

this task requires a recursive query until no more

‘parent’ can be found, which means, we cannot retrieve

this information with a standard SQL query.

1.1.4. The ‘‘nested-set’’ data model on stratigraphic

hierarchies in relational databases and its use in the

stratigraphy.net data model: description and discussion

The data model we introduce here is being used by the

stratigraphy.net information system and represents only

a subset of a much larger model. It can be found on the

internet at http://www.stratigraphy.net. The complete

stratigraphy.net data model consists of about 80 tables,

and describing the entire system is beyond the scope

of this paper. However, the subset we show in this

paper contains all necessary entities (tables) to suffi-

ciently describe stratigraphic units and their hierarchical

positions.

The core tables of the present model (Fig. 5) are

zone name and potential zone. Zone name simply holds

names of stratigraphic or geochronologic units such as

‘Jurassic’ or ‘Warwickshire Group’ and is introduced

here mainly to avoid redundancies (duplicates of

database entries) in case of, e.g., homonym names of

stratigraphic units. Further, authors of stratigraphic

literature can follow a different concept of a strati-

graphic unit while using the same name. Therefore table

potential zone was introduced. To track these concepts,

table potential zone relates to tables zone name and

zone rank which allows one to store different opinions

on the rank of a stratigraphic unit. To completely

describe a stratigraphic concept of an author, it is

essential to know the literature reference where the

concept was introduced. Because a citation can as well

describe more than one stratigraphic unit, the resulting

n:m (one-to-many) relation is resolved in our model by

the entity ‘‘zone documents’’. This separation of names

and concept was first proposed by Berendson (1995,

1999) for taxonomic databases.

The definition of stratigraphic or geochronologic units

additionally requires the user to assign boundaries to

each unit (Fig. 5) which is done by an entry in the table

‘‘boundary’’ where an upper and lower boundary can be

assigned to each ‘‘potential zone’’. Additional tables,

such as ‘‘boundary absolute ages’’ and ‘‘documents’’,

can store the assigned absolute ages of the boundaries

and their literature reference.

Information about the position of these potential

stratigraphic or geochronologic units within the strati-

graphic hierarchy are given in two separate ways. In the

easier case, hierarchy information can be assigned to a

ARTICLE IN PRESS

Fig. 4. (A) In ‘‘parent–child’’ model every lower-rank unit, ‘child’, knows to which higher-rank unit, ‘parent’, it belongs as indicated by

direction of arrows between nodes. (B) Same tree as in (A) as nested sets, drawn here as ovals. Root element is represented by set

containing all subordinate sets, parent–child relationship is shown here as set containment. Redrawn after Celko (see footnote 1). (C)

Database structure of parent–child model consists of one table where hierarchy is stored. Array which points from table columns

‘Child’ to ‘Parent’ indicates that table is referencing itself (self-reference). For example we introduced in Fig. 1 an entry ‘80’ in column

‘Child’ and entry ‘8’ in column ‘Parent’ which would mean that stratigraphic unit ‘8’ contains lower-rank unit ‘80’, or translated:

Phanerozoic contains Palaeozoic. Rank of units has to be stored in column ‘Rank’. Additional information on stratigraphic units such

as their names or ages can be stored in other tables referencing the ‘Stratigraphy’ table.

R. Huber et al. / Computers & Geosciences 29 (2003) 921–928924

stratigraphic unit by the use of a pointer to another

table called ‘‘zone rank’’ which holds the names of

the stratigraphic ranks, e.g. ‘‘Period’’, ‘‘Epoch’’, etc.

(Fig. 5). The ranking of a stratigraphic unit within the

stratigraphic hierarchy has to be solved separately by

means of a tree-like data structure.

Tree structures can be described as unidirectional

graphs that consist of a set of nodes that branch to other

nodes and which are connected by arcs. The top of the

tree is called the ‘‘root’’, and those nodes of the tree that

have no subtrees beneath them are called ‘‘leaf nodes’’.

In stratigraphic databases, the nodes would represent

stratigraphic units and the arcs would assign their

position within the hierarchy. An interesting method to

model such hierarchies for relational databases is the

‘‘nested-set model’’ proposed by Celko (1999) (see also

the author’s excellent articles on the web1,2,3,4 on

this subject). With some modifications, this model

ideally meets the needs of stratigraphic databases. In

the database model we present here, the tables

‘‘zone theme’’, ‘‘zone trees’’ and ‘‘zone hierarchy type’’

are used to represent the tree structure (Fig. 5).

The grammatical plural that is used in the name of the

table ‘‘zone trees’’ indicates that many trees can be

stored in this table. For example, beneath the tree

structure of the international stratigraphic chart (Re-

mane et al., 2001) additional hierarchies of lithostrati-

graphic zonations can be stored in the same table. The

table ‘‘zone theme’’ holds a description of these single

trees, and each node of a distinct tree points to this table

by the use of a foreign key (Fig. 5). The individual

stratigraphic units or ‘‘potential zones’’ in this context

are represented by the foreign key ‘‘potential zone id’’

which points to the table ‘‘potential zone’’ (Fig. 5).

To understand how a single table, ‘‘zone trees’’, can

store all necessary information of multiple hierarchies,

the attributes ‘‘tree left’’ and ‘‘tree right’’ are of

particular importance (Fig. 5). These two numerical

values are assigned to each stratigraphic unit within the

tree and are critical to the entire information about the

hierarchical organization of a tree. To understand how

the numbering of the nodes works, imagine a person

who was ordered to attach labels on the nodes of a tree.

Starting at the root, the person has to follow the path

around the tree and will visit each node twice, at the left

ARTICLE IN PRESS

Fig. 5. Stratigraphic hierarchy and information handling within stratigraphy.net data model. Database tables are drawn as rectangular

boxes containing most important columns of each table. Names of primary and foreign keys end with ‘ ID’. Relations between tables

are indicated by arrows pointing from foreign key to primary key column.

1Celko, J., 1996. A look at SQL trees. DBMS Magazine

online: http://www.dbmsmag.com/9603d06.html.2Celko, J., 1996. SQL lessons. DBMS Magazine online:

http://www.dbmsmag.com/9604d06.html.3Celko, J., 1996. Nontraditional databases. DBMSMagazine

online: http://www.dbmsmag.com/9605d06.html.4Celko, J., 1996. When good data goes bad data DBMS

Magazine online: http://www.dbmsmag.com/9606d06.html.

R. Huber et al. / Computers & Geosciences 29 (2003) 921–928 925

side as well as at the right side. Each time the person

reaches a side of the node, he attaches a label to the tree

and writes consecutive numbers on these labels (Fig. 6).

When we write these numbers to a database table

(Fig. 6), the way the nodes have been numbered

‘‘tree left’’ and ‘‘tree right’’ has some interesting

properties:

(i) The root will always have 1 as the ‘‘tree left’’ value

and the number of nodes multiplied by 2 as the

‘‘tree right’’ value.

(ii) The difference between the ‘‘tree right’’ and the

‘‘tree left’’ value of ‘‘leaf nodes’’ will always be 1.

(iii) Complete branches of the tree can be identified by

the ‘‘tree left’’ and ‘‘tree right’’ values of their

nodes which lay between the ‘‘tree left’’ and

‘‘tree right’’ values of the root of the branch

(Fig. 7). For example, if we would search all

lower-rank stratigraphic units which belong to a

distinct formation, the ‘‘tree left’’ and ‘‘tree right’’

values of these units would lay between the

‘‘tree left’’ and ‘‘tree right’’ values of this formation.

Furthermore, and most useful to stratigraphy, not

only the vertical hierarchy within the tree is covered by

the model, but also the lateral position within one level

of the tree is fixed and given by the order of the

‘‘tree left’’ and ‘‘tree right’’ values of this tree level. This

feature is especially useful for stratigraphic databases, as

we can now find the youngest member of a formation by

simply searching the lowest ‘‘tree left’’ or ‘‘tree right’’

value within one level of the tree (Fig. 7). This intrinsic

order of elements within one level of a tree is also

extremely useful to be able to store alternative opinions

or regional differences on stratigraphic subdivisions.

The regional subdivisions of the Ordovician provide a

good example for this problem. In Great Britain, the

Ordovician is subdivided by the stages Tremadocian,

Arenigian, Llanvirnian, Caradocian and Ashgillian,

whereas in China the stages Ichangian, Yushanian,

Zhejiangian, Neichian-shanian and Chientang-klangian

are used (Remane et al., 2001). These two subdivisions

are completely independent and do not correlate, but

both belong to the same System, the Ordovician. In our

data model, we have to treat the British as well as the

Chinese stages as components of the level below the

Ordovician, but assign an additional attribute ‘‘layer’’ in

table ‘‘zone trees’’ (Figs. 5 and 7) that hold information

about the regional subdivision to which they belong. For

example, all British stages would have an entry ‘‘1’’ for

this attribute and all Chinese the entry ‘‘2’’. The change

of this number from ‘‘1’’ to ‘‘2’’ then marks the position

where the British subdivision of the Ordovician ends and

the Chinese subdivision starts. At the same time, the

‘‘tree left’’ and ‘‘tree right’’ values still determine the

vertical position of the stages within each subdivision

(Fig. 7). This is called ‘‘folding’’ and allows a distinction

of entries in table ‘‘zone trees’’ at the point where the

value for ‘‘layer’’ changes.

ARTICLE IN PRESS

Fig. 6. Principle of numbering left and right values of nodes in

nested-set model. Footprints indicate direction of walk and x

marks stops where nodes have to be labeled using consecutive

numbers.

Fig. 7. Expanding nested-set model: lateral position (from left

to right) of each node within one level is given by order of left

and right values, indicated by dotted arrays. Fixed order of

nodes can be used to store stratigraphic (temporal) position of a

stratigraphic unit. In example, youngest unit within one given

level has highest left as well as right value. By introducing an

additional attribute (here: layer) this can be used to ‘fold’ and

divide one level at position where value of this attribute

changes. As a result alternative layers can be introduced as

indicated here by gray shaded circles.

R. Huber et al. / Computers & Geosciences 29 (2003) 921–928926

The features of our data model described above are

sufficient as long as the component units of a higher

rank are in chronological order. In lithostratigraphy,

however, component units can also change laterally: for

example, a basin facies can pass laterally into the margin

facies. It does not really matter in a hierarchical scheme,

if the members of one level are laterally or vertically

arranged. But it is essential for stratigraphical database

applications and the type of arrangement has to be

stored somewhere. In our data model we use a table

‘‘zone hierarchy type’’ (Fig. 5). The table ‘‘zone trees’’

points to this table where the information about the type

of hierarchical arrangement like ‘‘chronological order’’,

‘‘lateral order’’ of the components of a unit is stored.

Some example SQL code for the above examples is

described in detail by Celko (1999, see footnotes 1–4). It

can easily be rewritten and adapted to our model. The

use of SQL as the only query tool required and the

flexibility of the model for possible queries speak in

favor of this model. Because no more analyses of

number codes or recursive query procedures are needed,

the performance of queries is significantly improved in

comparison to other models.

The model we present here also has an operational

consequence when it comes to inserting or deleting

nodes. With every new node we add to the tree, all

entries in the columns ‘‘tree left’’ and ‘‘tree right’’ have

to be updated to ensure data integrity (see footnotes 2

and 3 for sample code). This complex action requires

that the addition of a node has to happen in one distinct

transaction. Therefore the database engine has to ensure

the safety of transactions, which is especially important

in a multi-user environment. This is a very critical point,

and limits the number of possible database engines that

can be used. Most modern relational databases ensure

this transactional safety and additionally offer the use of

‘‘check-conditions’’ and database triggers to ensure data

integrity. However, since the vast majority of queries

will only read from the database, this weakness is by far

compensated by the flexibility of the data model for

queries and analyses, when compared to other models.

2. Conclusions

Modeling information in tree-like structures is a

common problem in databases designed to hold

geoscientific information, and several attempts to solve

this problem have been published previously (Diehl,

1992; Treworgy, 1992). When compared to the ‘‘nested-

set model’’, most stratigraphic data models show some

serious restrictions either in compatibility, flexibility, or

in their capacity to handle advanced database queries,

they fail also when it comes to handling more complex

hierarchies.

The nested-set data model we present here allows

stratigraphic data to be stored in their hierarchical

context regardless of the complexity of their hierachical

arrangement, and to retrieve this data solely by the use

of SQL. No more procedural or recursive client-

language elements are needed. Therefore, the retrieval

of hierarchical information allows a better performance

of the database-management system than is possible in

the other models discussed above.

The natural order of stratigraphic units within one

hierarchical level in the nested-set model enables us to

store stratigraphic data not only with respect to their

hierarchical position but also with respect to their lateral

and thus temporal position.

The data model we present here and which is

implemented for the stratigraphy.net project further

enhances the capabilities of the nested-set model.

Multiple tree structures regardless of their stratigraphic

type (e.g. chronostratigraphy or lithostratigraphy) can

be stored. Further, alternative opinions or regionally

differing stratigraphic zonations can sufficiently be

handled by the model as well as lateral instead of

temporal zonations (e.g. facies changes) within the

hierarchy.

Acknowledgements

We would like to thank J.C. Matti and an anonymous

reviewer for their constructive and helpful comments.

References

Al-Mishwat, A.T., 1997. STASSAGE: a FORTRAN program

to decode stratigraphic ages from the international IGBA-

DAT database. Computers & Geosciences 23 (3), 305–315.

Bain, K.A., Giles, J.R.A., 1997. A standard model for storage

of geological map data. Computers & Geosciences 23 (6),

613–620.

Berendson, W.G., 1995. The concept of ‘‘potential taxa’’ in

databases. Taxon 44, 207–212.

Berendson, W.G., 1999. A comprehensive reference model for

biological collections and surveys. Taxon 48, 511–562.

Cande, S.C., Kent, D.V., 1992. A new geomagnetic polarity

time scale for the Late Cretaceous and Cenozoic. Journal of

Geophysical Research 197, 13917–13951.

Celko, J., 1999. SQL for Smarties: Advanced SQL Program-

ming. Morgan Kaufmann Publishers, San Francisco, CA,

576pp.

Diehl, P., 1992. Datenstrukturen und Abfragealgorithmen f .ur

stratigraphische Bohrprofile in einem relationalen DB-

System. Beitr.age zur Mathematischen Geologie und Geoin-

formatik 4, 134–139.

Giles, J.R.A., Lowe, D.J., Bain, K.A., 1997. Geological

dictionaries—critical elements of every geological database.

Computers & Geosciences 23 (6), 621–626.

ARTICLE IN PRESSR. Huber et al. / Computers & Geosciences 29 (2003) 921–928 927

Gradstein, F.M., Agterberg, F.P., Ogg, J.G., Herdenbol, J., van

Veen, P., Thierry, J., Huang, Z., 1995. A Triassic, Jurassic

and Cretaceous time scale. In: Bergren, W.A., Kent, D.V.,

Aubry, M.-P., Hardenbol, J. (Eds.), Geochronology Time

Scales and Global Stratigraphic Correlation. SEPM Special

Publications, Vol. 54. SEPM, Tulsa, Oklahoma, pp. 95–126.

Harland, W.B., Armstrong, R.L., Cox, A.V., Craig, L.E.,

Smith, A.G., Smith, D.G., 1990. A Geologic Time Scale

1989. Cambridge University Press, Cambridge, 131pp.

Morris, P.J., 2000a. A data model for invertebrate paleonto-

logical collections information. In: White, R.D., Allmon,

W.D. (Eds.), Guidelines for the Management and Curation

of Invertebrate Fossil Collections: Including a Data Model

and Standards for Computerization. Paleontological

Society Special Publications., Vol. 10. Paleontological

Society, Pittsburgh, PA, pp. 105–108.

Morris, P.J., 2000b. Appendix IX: entity relationship diagram;

Appendix X: entity documentation; Appendix XI: concep-

tual list of entites; Appendix XII: relationship cardinalities.

In: White, R.D., Allmon, W.D. (Eds.), Guidelines for the

Management and Curation of Invertebrate Fossil Collec-

tions: Including a Data Model and Standards for Compu-

terization. Paleontological Society Special Publications,

Vol. 10. Paleontological Society, Pittsburgh, PA, pp. 155–

260.

Murphy, M.A., Salvador, A., 1999. International stratigraphic

guide—an abridged version. Episodes 4 (22), 255–271.

Remane, J., Cita, M.B., Dercourt, J., Bouysse, B., Repetto,

F.L., Faure-Muret, A. (Eds.), 2001. International Strati-

graphic Chart, Suppl. 1. IUGS and UNESCO, Trondheim,

Norway, p. 16.

Salvador, A. (Ed.), 1994. International Stratigraphic Guide. A

Guide to Stratigraphic Classification, Terminology and

Procedure, 2nd Edition. IUGS and Geological Society of

America, Boulder, Colorado, 214pp.

Treworgy, C., 1992. Designing a stratigraphic database: an

example from the coal section of the Illinois State

Geological Survey. Geobyte 7, 33–37.

ARTICLE IN PRESSR. Huber et al. / Computers & Geosciences 29 (2003) 921–928928