Discovering Descriptive Knowledge Lecture 18

Upload: alize

Post on 25-Feb-2016

Page 1: Discovering Descriptive Knowledge

Discovering Descriptive Knowledge

Lecture 18

Page 2: Discovering Descriptive Knowledge

Descriptive Knowledge in Science

In an earlier lecture, we introduced the representation and use of taxonomies and laws.

Informatics tools for working with taxonomies

• represent them as a collection of hypotheses about categories and their is-a relationships;

• use them to organize knowledge and to classify new observations.

Informatics tools for working with laws

• represent them as hypotheses about quantitative and/or qualitative relationships among an object’s properties;

• use them to predict the static or dynamic properties of an entity or an interconnected system.
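As a minimal sketch of the taxonomy half of this slide, a set of is-a hypotheses can be stored as child-to-parent links and queried to classify a category. The category names and the helper names `PARENT`, `lineage`, and `is_a` are invented for illustration and do not come from any particular tool.

```python
# Hypothetical is-a links: each category maps to its parent (None marks the root).
PARENT = {
    "animal": None,
    "bird": "animal",
    "mammal": "animal",
    "raven": "bird",
}

def lineage(category, parent=PARENT):
    """Return the chain of categories from `category` up to the root."""
    chain = []
    while category is not None:
        chain.append(category)
        category = parent[category]
    return chain

def is_a(category, ancestor, parent=PARENT):
    """True if `ancestor` appears on the is-a path above (or at) `category`."""
    return ancestor in lineage(category, parent)

raven_lineage = lineage("raven")
```

Here `is_a("raven", "animal")` follows two links upward, which is the sense in which a taxonomy organizes knowledge: anything asserted of animals is inherited by ravens.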

Page 3: Discovering Descriptive Knowledge

The Taxonomy Formation Task

Taxonomy formation consists of three tasks that may be solved separately or simultaneously:

• the construction of categories;

• the organization of the categories into a hierarchy; and

• the explicit definition of the categories.

Informatics tools for taxonomy formation fall into two general categories:

• those that analyze finite batches of observations and create separate taxonomies for each batch; and

• those that incrementally construct and refine taxonomies based on an effectively continuous stream of data.

Page 4: Discovering Descriptive Knowledge

Cluster 3.0

Cluster is designed to construct and organize categories from a batch of gene expression data.

As input, Cluster takes gene expression levels from multiple experiments. The program clusters genes based on their expression patterns across experiments.

Scientists can select the clustering method and set the available parameters.

Cluster produces a text file that contains the taxonomy.
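The kind of computation described on this slide can be sketched with a small agglomerative clustering routine. This is not Cluster's code or file format: the choice of average linkage over a Pearson-correlation distance is an assumption made for the example, the gene names and expression values are made up, and the function names are hypothetical.

```python
from math import sqrt

def pearson_distance(x, y):
    """Return 1 minus the Pearson correlation of two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def average_linkage(profiles):
    """Merge the two closest clusters until one remains; return a nested tuple."""
    # Each cluster is a pair: (tree built so far, list of member gene names).
    clusters = [(name, [name]) for name in profiles]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i][1] for b in clusters[j][1]]
                d = sum(pearson_distance(profiles[a], profiles[b])
                        for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Made-up expression levels for four hypothetical genes across four experiments.
PROFILES = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.1, 6.0, 4.2, 2.0],
}
tree = average_linkage(PROFILES)
```

With these values the rising profiles (geneA, geneB) end up in one branch and the falling profiles (geneC, geneD) in the other; the nested tuple plays the role of the taxonomy.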

Page 5: Discovering Descriptive Knowledge

Cluster 3.0: Results

Viewing the taxonomy produced by Cluster requires a separate program, such as TreeView.

[Figure: screenshot of the viewer, with panels labeled taxonomy, selected section of the taxonomy, data, and gene annotations.]

Page 6: Discovering Descriptive Knowledge

ReTAX

ReTAX is an interactive environment that helps scientists revise taxonomies in response to new observations.

A taxonomy in ReTAX includes hierarchically organized categories and their definitions.

The data for ReTAX consist of a set of features (such as the size of a plant’s leaf or the type of its fruit) together with a category.

As a scientist enters data, ReTAX ensures that the new item’s features

• match or specialize the category’s defining features; and

• distinguish it from other categories in the taxonomy.

If the new item violates either of these rules, then ReTAX attempts to revise its taxonomy.
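The two consistency rules can be illustrated with a short check. This is a simplified sketch rather than ReTAX's implementation: it assumes each category definition lists the allowed values for each defining feature, it does not model the revision step, and the category names, features, and function names are invented.

```python
def satisfies(item, definition):
    """True if every defining feature appears in the item with an allowed value."""
    return all(item.get(f) in allowed for f, allowed in definition.items())

def check_item(item, category, taxonomy):
    """Return the list of rule violations for placing `item` in `category`."""
    problems = []
    # Rule 1: the item must match the defining features of its own category.
    if not satisfies(item, taxonomy[category]):
        problems.append("does not match the definition of " + category)
    # Rule 2: the item must not also fit the definition of another category.
    for other, definition in taxonomy.items():
        if other != category and satisfies(item, definition):
            problems.append("not distinguished from " + other)
    return problems

# Hypothetical definitions; feature names and values are made up.
TAXONOMY = {
    "genus_X": {"fruit": {"berry"}, "leaf_size": {"small", "medium"}},
    "genus_Y": {"fruit": {"capsule"}, "leaf_size": {"small"}},
}
ok = check_item({"fruit": "berry", "leaf_size": "small"}, "genus_X", TAXONOMY)
bad = check_item({"fruit": "capsule", "leaf_size": "small"}, "genus_X", TAXONOMY)
```

An empty list means the item is consistent with the taxonomy; a non-empty list is the point at which a tool like ReTAX would start looking for a revision.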

Page 7: Discovering Descriptive Knowledge

ReTAX

[Figure: a botanical taxonomy rooted at Ericaceae, with the genera Andromeda, Gaultheria, Pernettya, … and the species A. uva-ursi, P. tasmanica, G. oppositifolia, G. rupestris, G. antipoda.]

Working in the context of a botanical taxonomy like this one, ReTAX replicated historical revisions.

In the course of its use, ReTAX

• identified descriptive features that were insufficient for distinguishing members of two taxa;

• searched for new features to refine the taxa; and

• eventually suggested that the genera Pernettya and Gaultheria should be merged.

Page 8: Discovering Descriptive Knowledge

Qualitative Law Discovery

Qualitative laws fall into two primary categories:

• those involving categorical statements about objects, such as “all ravens are black”; and

• those describing qualitative changes, such as “temperature and pressure increase proportionately”.

Informatics tools that discover categorical relationships have received the majority of the attention in this area.

These tools typically address a supervised learning task:

• data are described by multiple features (color = black, wings = present);

• one of these features serves as a target for classification (species = C. corax); and

• the tool relates the features to the target.

Page 9: Discovering Descriptive Knowledge

RL

RL addresses the supervised learning task to produce qualitative laws that are expressed as logical rules.

Each rule states that if all of its conditions are true of a datum, then the datum is assigned to the target class.

As input, RL takes a data set and information that controls the characteristics of the rules, such as

• taxonomies of the values for features,

• constraints among features in each rule,

• the minimum accuracy of a rule, and

• the maximum number of features in a rule.
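To make the rule format and two of these controls concrete, the sketch below checks whether a rule covers a datum, measures its accuracy on a data set, and filters rules by a minimum accuracy and a maximum number of features. It is not RL's algorithm or input format; the dictionary layout, function names, and data are assumptions for the example, reusing the raven features from the earlier slide.

```python
def covers(rule, datum):
    """A rule covers a datum when all of its conditions hold."""
    return all(datum.get(f) == v for f, v in rule["if"].items())

def accuracy(rule, data, target):
    """Fraction of covered data whose target value equals the rule's class."""
    covered = [d for d in data if covers(rule, d)]
    if not covered:
        return 0.0
    return sum(d[target] == rule["then"] for d in covered) / len(covered)

def acceptable(rule, data, target, min_accuracy, max_features):
    """Keep a rule only if it is short enough and accurate enough."""
    return (len(rule["if"]) <= max_features
            and accuracy(rule, data, target) >= min_accuracy)

# Made-up observations; "species" is the target feature.
DATA = [
    {"color": "black", "wings": "present", "species": "C. corax"},
    {"color": "black", "wings": "present", "species": "C. corax"},
    {"color": "white", "wings": "present", "species": "other"},
    {"color": "black", "wings": "absent", "species": "other"},
]
STRICT = {"if": {"color": "black", "wings": "present"}, "then": "C. corax"}
LOOSE = {"if": {"color": "black"}, "then": "C. corax"}

strict_ok = acceptable(STRICT, DATA, "species", min_accuracy=0.9, max_features=2)
loose_ok = acceptable(LOOSE, DATA, "species", min_accuracy=0.9, max_features=2)
```

On this data the two-condition rule is correct for every datum it covers, while the one-condition rule also covers a black, wingless item of another species and falls below the 0.9 threshold.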

Page 10: Discovering Descriptive Knowledge

RL

As an example, consider the task of finding law-like relationships that link medical findings to a disease class.

The data are patient findings, and the target is a syndrome that covers several ailments (lower respiratory syndrome).

RL produces rules that relate the findings to the syndrome.

Each rule has numeric measures of support.

RL has been applied

• to identify carcinogens, and

• to determine parameters for crystallographic experiments.

Page 11: Discovering Descriptive Knowledge

Quantitative Law Discovery

Quantitative laws may describe:

• algebraic relationships such as Newton’s second law of motion, a = F/m; and

• dynamic responses such as the unbounded growth of a population, dP/dt = kP.

Informatics tools address both classes of laws through a variety of techniques.

BACON discovers quantitative, algebraic laws through problem space search guided by declarative heuristics.

Cubist discovers conditional, algebraic laws using techniques for linear regression.
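The flavor of BACON's heuristic search can be sketched with a single step: if two quantities rise together, test whether their ratio is constant; if one rises as the other falls, test whether their product is constant. This is a simplified illustration, not BACON's code; the full system applies such heuristics recursively to newly defined terms. The function names, tolerance, and data are assumptions.

```python
def nearly_constant(values, tolerance=0.01):
    """True if every value lies within a relative tolerance of the mean."""
    mean = sum(values) / len(values)
    return all(abs(v - mean) <= tolerance * abs(mean) for v in values)

def propose_term(x, y):
    """Return (term, constant) if a ratio or product of x and y is constant."""
    pairs = sorted(zip(x, y))          # order observations by increasing x
    ys = [b for _, b in pairs]
    rising = all(b2 > b1 for b1, b2 in zip(ys, ys[1:]))
    falling = all(b2 < b1 for b1, b2 in zip(ys, ys[1:]))
    if rising:                         # x and y increase together: try the ratio
        name, values = "y/x", [b / a for a, b in pairs]
    elif falling:                      # y decreases as x increases: try the product
        name, values = "x*y", [a * b for a, b in pairs]
    else:
        return None
    if nearly_constant(values):
        return name, sum(values) / len(values)
    return None

# Made-up measurements consistent with a = F/m.
force = [1.0, 2.0, 3.0, 4.0]           # mass held at 2
accel_vs_force = [0.5, 1.0, 1.5, 2.0]
mass = [1.0, 2.0, 3.0]                 # force held at 6
accel_vs_mass = [6.0, 3.0, 2.0]

ratio_law = propose_term(force, accel_vs_force)
product_law = propose_term(mass, accel_vs_mass)
```

The first call finds that a/F is constant (equal to 1/m for the fixed mass), and the second finds that m·a is constant (equal to the fixed force); together these are the two halves of a = F/m.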

Page 12: Discovering Descriptive Knowledge

LAGRAMGE

LAGRAMGE and its precursor, LAGRANGE, were the first in a line of law discovery systems for differential equations.

LAGRAMGE takes as input

• time series for multiple variables,

• an indicator that identifies the dependent variable, and

• knowledge about the structure of plausible solutions.

As output, the system produces an algebraic or differential equation for the dependent variable.
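As a rough illustration of this input and output, the sketch below fixes one plausible equation structure, dP/dt = kP from the earlier slide, and estimates k from a time series. This is not LAGRAMGE's search procedure, which considers many candidate structures; here the structure is assumed in advance, derivatives are approximated by central differences, and k is fit by least squares. The function name and the synthetic data are made up.

```python
from math import exp

def fit_growth_rate(times, values):
    """Least-squares estimate of k in dP/dt = k*P from a sampled time series."""
    num = den = 0.0
    for i in range(1, len(times) - 1):
        # Central-difference approximation of dP/dt at interior sample i.
        dpdt = (values[i + 1] - values[i - 1]) / (times[i + 1] - times[i - 1])
        num += dpdt * values[i]
        den += values[i] ** 2
    return num / den

# Synthetic series generated from P(t) = 100 * exp(0.3 * t).
TIMES = [0.1 * i for i in range(51)]
SERIES = [100.0 * exp(0.3 * t) for t in TIMES]

k = fit_growth_rate(TIMES, SERIES)
```

The estimate comes out close to the generating rate of 0.3, with a small bias from the finite-difference approximation. A discovery system would additionally compare the fit of several candidate structures before reporting an equation.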

LAGRAMGE has been applied in ecosystem dynamics, fjord hydrodynamics, and other domains.

Page 13: Discovering Descriptive Knowledge

Discovering Descriptive Knowledge: Summary

Computational scientific discovery has a long history, particularly in the context of descriptive knowledge.

Such systems have played a large role in exploring, analyzing, and understanding data.

Work in this area laid the foundations for the field of data mining both in terms of research and applications.

However, the discovery of descriptive knowledge

• can lead to a shallow interpretation of data;

• generally avoids statements of causality; and

• makes limited contact with the rich, theoretical content of a scientific discipline.

Next we will discuss systems that address these concerns.