UMBC, an Honors University in Maryland

Using the Semantic Web to Support Ecoinformatics

Andriy Parafiynyk
University of Maryland, Baltimore County
http://ebiquity.umbc.edu/paper/html/id/319/Using-the-Semantic-Web-to-Support-Ecoinformatics

Joint work with Tim Finin, Joel Sachs, Cynthia Sims Parr, Rong Pan, Lushan Han, Li Ding (UMBC), Allan Hollander (UCD), David Wang (UMCP)

This research was supported by NSF ITR 0326460 and matching funds received from the USGS National Biological Information Infrastructure.




Invasive Species

• Invasive species cost the U.S. economy over $138 billion per year [1].

• By various estimates, these species contribute to the decline of 35 to 46 percent of U.S. endangered and threatened species [2].

• The invasive species problem is growing as the number of pathways of invasion increases.

[1] Pimentel et al. 2000. Environmental and economic costs associated with non-indigenous species in the United States. BioScience 50:53-65.

[2] Charles Groat, Director, U.S. Geological Survey, http://www.usgs.gov/invasive_species/plw/usgsdirector01.html


Currently, the most common ways of dealing with data among biologists:

• Journal articles

• Excel spreadsheets

• Local databases

• Some information is on-line in HTML/XML


The Semantic Web can offer:

• Ontologies to arrive at a common vocabulary and define exactly what is what across disciplines (multiple ontologies with mappings are possible)

• Constant on-line data availability with convenient ways of data acquisition and processing

• Data discovery (Swoogle)

• Data integration from different sources, queries on data from multiple sources

• Expanding the knowledge base by inferencing

• Data can be easily updated or added, users notified


Collect data OR find data tables in literature or data registry OR email author of data

Massage data manually

Write up metadata record

Register dataset with data registry

Start over for next project

Run analyses

Publish paper

Post supplemental data file on web

Create local spreadsheet

Build automatically updating dynamic dataset

Develop intelligent query for semantic web data

Download to local spreadsheet

Run analyses

Publish paper

Reanalyze using latest dataset

(Query and data already publicly available)


An NSF ITR collaborative project with

• University of Maryland, Baltimore County

• University of Maryland, College Park

• U. of California, Davis

• Rocky Mountain Biological Laboratory


Food Webs

• A food web models the trophic (feeding) relationships between organisms in an ecosystem

– Food web simulators are used to explore the consequences of changes in the ecosystem, such as the introduction or removal of a species

– A location's food web is usually constructed from studies of the frequencies of the species found there and the known trophic relations among them.

• Goal: automatically construct a food web for a new location using existing data and knowledge

• ELVIS: Ecosystem Location Visualization and Information System
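
A food web of this kind can be sketched as a directed graph of consumer-to-resource links. The following is a minimal illustration, not the project's simulator: the species names, the `LINKS` list, and both function names are hypothetical, and the "no remaining food" rule is a deliberately crude stand-in for what a real food web simulator would model.

```python
from collections import defaultdict

# Hypothetical toy food web: each pair is (consumer, resource).
LINKS = [
    ("pike", "perch"),
    ("perch", "ostracod"),
    ("ostracod", "algae"),
    ("snail", "algae"),
]


def build_web(links):
    """Map each consumer to the set of species it eats."""
    web = defaultdict(set)
    for consumer, resource in links:
        web[consumer].add(resource)
    return dict(web)


def consumers_left_without_food(web, removed):
    """Return consumers that lose every known resource, cascading upward.

    A consumer counts as lost once all of its resources are removed or lost.
    """
    gone = set(removed)
    changed = True
    while changed:
        changed = False
        for consumer, resources in web.items():
            if consumer not in gone and resources <= gone:
                gone.add(consumer)
                changed = True
    return gone - set(removed)
```

Calling `consumers_left_without_food(build_web(LINKS), {"ostracod"})` on this toy web reports the perch and the pike, while the snail is unaffected because it still has algae to eat.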


East River Valley Trophic Web

http://www.foodwebs.org/


Species List Constructor

Click a county, get a species list


The problem

• We know which species exist in the location and can further restrict and fill in with other ecological models

• But we don’t know which of them might be eaten by a potential invasive, or which might eat the invasive

• We can reason from taxonomic data (similar species) and known natural history data (size, mass, habitat, etc.) to fill in the gaps.


Food Web Constructor

Predict food web links using database and taxonomic reasoning.

In a new estuary, Nile Tilapia could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected.
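
The three roles in this example (competitor, predators, prey) correspond to simple lookups over a list of feeding links. The sketch below is only an illustration of that idea: the function names are hypothetical, and the only link taken from the slide is that tilapia and ostracods both eat algae; the "small fish" predator is invented for the example.

```python
def neighbors_of(links, focal):
    """Split a focal species' neighbors into predators and prey.

    links is an iterable of (consumer, resource) pairs.
    """
    predators = {c for c, r in links if r == focal}
    prey = {r for c, r in links if c == focal}
    return predators, prey


def competitors_of(links, newcomer):
    """Species that share at least one resource with the newcomer."""
    newcomer_prey = {r for c, r in links if c == newcomer}
    return {c for c, r in links if r in newcomer_prey and c != newcomer}


# Hypothetical links; only the tilapia/ostracod/algae relation comes from the slide.
LINKS = [
    ("Nile tilapia", "algae"),
    ("ostracod", "algae"),
    ("small fish", "ostracod"),
]

competitors = competitors_of(LINKS, "Nile tilapia")
predators, prey = neighbors_of(LINKS, "ostracod")
```

Here `competitors` contains the ostracod, and `predators` and `prey` hold the species that could be affected indirectly if the ostracod population changes.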


Evidence Provider

Examine evidence for predicted links.


ELVIS

• Final goal: ELVIS (Ecosystem Location Visualization and Information System) as an integrated set of web services for constructing food webs for a given location.


Background Ontologies

• SpireEcoConcepts:

– confirmed and potential food web links

– bibliographic information of food web studies

– ecosystem terms

– taxonomic ranks

• California Wildlife Habitat Relationships Ontology

– life history

– geographic range

– management information

• ETHAN (Evolutionary Trees and Natural History): concepts and properties for ‘natural history’ information on species, derived from data in the Animal Diversity Web and other taxonomic sources


Data representation: ETHAN Ontology

• ethan_animals.owl: phylogenetic information about organisms

• ethan_keywords.owl: geographic range, habitats, physical description, trophic information, reproduction, lifespan, behavioral information, conservation status

• Information in triples:

– “Esox lucius” is a subclass of “Esox”

– “Esox lucius” has max mass “1.4 kg”

– “Esox” eats “Actinopterygii”
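
The three statements above can be held as subject-predicate-object tuples. The following plain-Python sketch shows the shape of the data and a wildcard lookup over it; the predicate names (`subClassOf`, `maxMass`, `eats`) are illustrative stand-ins rather than the actual ETHAN property URIs, and `match` is a hypothetical helper.

```python
# Each statement is a (subject, predicate, object) tuple.
TRIPLES = [
    ("Esox lucius", "subClassOf", "Esox"),
    ("Esox lucius", "maxMass", "1.4 kg"),
    ("Esox", "eats", "Actinopterygii"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Everything stated directly about Esox lucius.
about_pike = match(TRIPLES, s="Esox lucius")
```

Note that the lookup only returns statements made directly about a subject. Concluding that Esox lucius also eats Actinopterygii requires following the subclass link, which is the job of a reasoner.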


Using ETHAN and OWL inferencing to predict success of invasive species

• Known food web links: rabbit eats carrot

• What about hare?

Yes, with high probability, since both are subclasses of the same class in the taxonomic hierarchy, have the same habitat, etc.

yummy!!!

yummy???
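
The rabbit and hare argument can be sketched as a lookup over taxonomic siblings. This is a plain-Python stand-in for what an OWL reasoner combined with project-specific rules would do, not the project's method: the `PARENT` table, the use of Leporidae as the shared class, and the `suggest_links` function are assumptions made for illustration, and no probability is computed.

```python
# Hypothetical taxonomy: child -> parent class.
PARENT = {"rabbit": "Leporidae", "hare": "Leporidae", "fox": "Canidae"}

# Known food web links as (consumer, resource) pairs.
KNOWN_LINKS = {("rabbit", "carrot")}


def suggest_links(species, parent, known_links):
    """Suggest resources for a species from the links of its taxonomic siblings.

    Returns a dict mapping each suggested resource to the sorted list of
    siblings whose known links support the suggestion.
    """
    siblings = {
        other for other, cls in parent.items()
        if other != species and cls == parent.get(species)
    }
    evidence = {}
    for consumer, resource in known_links:
        if consumer in siblings:
            evidence.setdefault(resource, []).append(consumer)
    return {res: sorted(sup) for res, sup in evidence.items()}
```

For the hare, the function suggests carrot with the rabbit as supporting evidence; for the fox it suggests nothing, because no sibling has a known link. Keeping the supporting species alongside each suggestion mirrors the idea of being able to examine the evidence for a predicted link.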


• http://swoogle.umbc.edu/

• Running since summer 2004

• 1.8M RDF docs, 320M triples, 10K ontologies, 15K namespaces, 1.3M classes, 175K properties, 43M instances, 600 registered users


Applications and use cases

Supporting Semantic Web developers

– Ontology designers, vocabulary discovery, who’s using my ontologies or data?, use analysis, errors, statistics, etc.

Searching specialized collections

– Spire: aggregating observations and data from biologists

– InferenceWeb: searching over and enhancing proofs

– SemNews: Text Meaning of news stories

Supporting Semantic Web tools

– Triple shop: finding data for SPARQL queries


Search for ontologies that contain these terms


746 ontologies were found that had these two terms

By default, ontologies are ordered by their ‘popularity’, but they can also be ordered by date or size.


We can also search for any RDF documents containing these terms


5,378 documents were found that had these two terms


UMBC Triple Shop

• http://sparql.cs.umbc.edu/tripleshop2/

• Finding datasets in the absence of the FROM clause

• Constraints by URI domain or namespace (more coming)

• Reasoning (none/rdfs/owl)

• Dataset persistence: queries and results can be saved, tagged, annotated, shared, searched for, etc.
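
The idea of supplying a dataset for a query that has no FROM clause can be sketched in two steps: constrain candidate documents by domain, then add one FROM clause per remaining document. This is an illustration only and does not use or describe the Triple Shop API; the function names, the example query, and all URLs are hypothetical.

```python
import re
from urllib.parse import urlparse


def filter_by_domain(urls, allowed_domain):
    """Keep URLs whose host is the allowed domain or one of its subdomains."""
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if host == allowed_domain or host.endswith("." + allowed_domain):
            kept.append(url)
    return kept


def add_from_clauses(query, urls):
    """Insert one FROM clause per URL before the first WHERE keyword."""
    from_block = "".join(f"FROM <{u}>\n" for u in urls)
    return re.sub(
        r"\bWHERE\b",
        lambda m: from_block + m.group(0),
        query,
        count=1,
        flags=re.IGNORECASE,
    )


# Hypothetical query and candidate documents.
QUERY = "SELECT ?s ?o\nWHERE { ?s <http://example.org/eats> ?o }"
CANDIDATES = [
    "http://data.example.org/fish.owl",
    "http://other.example.com/birds.rdf",
]

full_query = add_from_clauses(QUERY, filter_by_domain(CANDIDATES, "example.org"))
```

The string rewrite is adequate for a simple SELECT query like the one shown; a robust tool would parse the query rather than search for the WHERE keyword.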



. . . leaving out the FROM clause

What are body masses of fishes that eat fishes?

Swoogle Triple Shop
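
To make the meaning of this question concrete, the sketch below evaluates it over a handful of in-memory triples. It is not the SPARQL query shown in the talk: the predicate names, the treatment of Actinopterygii as the "fish" class, and every triple except the 1.4 kg mass of Esox lucius are assumptions made for illustration.

```python
# Hypothetical triples in the spirit of the ETHAN examples.
TRIPLES = [
    ("Esox lucius", "subClassOf", "Esox"),
    ("Esox", "subClassOf", "Actinopterygii"),
    ("Perca", "subClassOf", "Actinopterygii"),
    ("Esox lucius", "eats", "Perca"),
    ("Esox lucius", "maxMass", "1.4 kg"),
    ("Perca", "maxMass", "0.5 kg"),
]


def is_a(triples, cls, ancestor):
    """True if cls equals ancestor or reaches it through subClassOf links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(o for s, p, o in triples if s == current and p == "subClassOf")
    return False


def fish_eating_fish_masses(triples, fish_class="Actinopterygii"):
    """Map each fish that eats another fish to its recorded mass values."""
    eaters = {
        s for s, p, o in triples
        if p == "eats" and is_a(triples, s, fish_class) and is_a(triples, o, fish_class)
    }
    return {
        fish: [o for s, p, o in triples if s == fish and p == "maxMass"]
        for fish in eaters
    }
```

On this toy data only Esox lucius qualifies, since it is the only fish with a recorded fish prey. The subclass walk in `is_a` plays the role that RDFS or OWL reasoning would play when the query runs against a real triple store.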


specify dataset


RDF documents were found that might have useful data


We’ll select them all and add them to the current dataset.


We’ll run the query against this dataset to see if the results are as expected.



The results can be produced in any of several formats



Results

http://sparql.cs.umbc.edu/tripleshop2/


Looks like a useful dataset. Let’s save it and also materialize it in the TS triple store.


Contributions

• OWL ontologies for ecoinformatics domain

– data representation

– data sharing

– inferencing

• OWL data discovery

• Ability to automatically construct datasets relevant to the query

• Dataset storage/sharing