Community cyberinfrastructure and X-informatics - Assessment of convergence and innovation based on project experience

Peter Fox, High Altitude Observatory, NCAR

CSIG 2008, Aug 11

Work performed in part with Deborah McGuinness (RPI), Rob Raskin (JPL), Krishna Sinha (VT), Luca Cinquini (NCAR), Patrick West (NCAR), Stephan Zednik (NCAR), Paulo Pinheiro da Silva (UTEP), Li Ding (RPI) and others


Outline
• Background and inevitabilities
• Informatics -> e-Science
• Informatics methodology, e.g. Semantic Web as an approach and a technology
  – Virtual Observatories: use cases, some examples, and non-specialist use
  – Data ingest, integration, mining and where we are heading
• Discussion

Background

Scientists should be able to access a global, distributed knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available

But… data is obtained by multiple instruments, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) metadata. It may be inconsistent, incomplete, evolving, and distributed.

And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…

But data has Lots of Audiences

[Figure: audiences for information products, ranked from More Strategic to Less Strategic. From “Why EPO?”, a NASA internal report on science education, 2005. Annotation: information products have SCIENTISTS TOO.]

Shifting the Burden from the User to the Provider

The Astronomy approach; data-types as a service

[Diagram: applications VO App1, VO App2 and VO App3 reach databases DB1, DB2, DB3, … DBn through a VO layer consisting of VOTable, Simple Image Access Protocol, Simple Spectrum Access Protocol and Simple Time Access Protocol. Annotations: limited interoperability; lightweight semantics; limited meaning, hard coded; limited extensibility; under review.]

Open Geospatial Consortium: Web {Feature, Coverage, Mapping} Service

Sensor Web Enablement: Sensor {Observation, Planning, Analysis} Service

use the same approach

Mind the Gap!

• As a result of finding out who is doing what, sharing experience/expertise, and substantial coordination:
• There is/was still a gap between science and the underlying infrastructure and technology that is available
• Cyberinfrastructure is the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.

Informatics - information science includes the science of (data and) information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, behavior, and interactions of natural and artificial systems that store, process and communicate (data and) information. It also develops its own conceptual and theoretical foundations. Since computers, individuals and organizations all process information, informatics has computational, cognitive and social aspects, including study of the social impact of information technologies. (Wikipedia)

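Progression after progression

[Diagram: a progression of layers labeled IT, Cyber Infrastructure, Cyber Informatics, Core Informatics, Science Informatics (aka Xinformatics), and Science, SBAs. The informatics layers are grouped under the label Informatics.]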

Virtual Observatories

Make data and tools quickly and easily accessible to a wide audience.

Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer in their preferred language: i.e. appear to be local and integrated.

Likely to provide controlled vocabularies that may be used for interoperation in appropriate domains along with database interfaces for access and storage -> thus part IT, part CI, part Informatics

[Diagram: layered architecture. Labels: VO Portal, Web Serv., VO API; Semantic mediation layer - VSTO - low level; Semantic mediation layer - mid-upper-level; Education, clearinghouses, other services, disciplines, etc.; DB1, DB2, DB3, … DBn. Each level is marked as added value: metadata, schema, data; query, access and use of data; semantic query, hypothesis and inference; semantic interoperability.]

Mediation Layer
• Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
• Maps queries to underlying data
• Generates access requests for metadata, data
• Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
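To make the mapping step of the mediation layer concrete, here is a minimal sketch in Python. It assumes a tiny in-memory dictionary stands in for the ontology; the concept names, product names and the `AccessRequest` type are hypothetical and are not taken from VSTO.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical ontology fragment: each parameter concept points to the
# instruments that measure it and the data products that hold it.
ONTOLOGY = {
    "NeutralTemperature": {
        "instruments": ["FabryPerotInterferometer"],
        "products": ["fpi_temperature_v1"],
    },
    "ElectronDensity": {
        "instruments": ["IncoherentScatterRadar"],
        "products": ["isr_density_v2"],
    },
}


@dataclass(frozen=True)
class AccessRequest:
    """A request against one underlying data product (illustrative only)."""
    product: str
    start: str
    end: str


def mediate(parameter: str, start: str, end: str) -> List[AccessRequest]:
    """Map a concept-level query to access requests on underlying products."""
    entry = ONTOLOGY.get(parameter)
    if entry is None:
        raise KeyError(f"unknown parameter concept: {parameter}")
    return [AccessRequest(product, start, end) for product in entry["products"]]


requests = mediate("NeutralTemperature", "2005-01-01", "2005-01-31")
```

The caller names a concept, not a file or a table; the lookup that turns the concept into product-level requests is the part a real mediation layer would do with an ontology and a reasoner.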

Semantic Web Methodology and Technology Development Process

• Establish and improve a well-defined methodology vision for Semantic Technology based application development

• Leverage controlled vocabularies, etc.

[Diagram: iterative development cycle. Labels: Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy.]

Science and technical use cases

Find data which represents the state of the neutral atmosphere anywhere above 100km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.
– Extract information from the use-case - encode knowledge
– Translate this into a complete query for data - inference and integration of data from instruments, indices and models

Provide semantically-enabled, smart data query services via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
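The requirement that constraints may arrive in any order and in any combination can be sketched as a function whose constraints are all optional keyword arguments. This is an illustrative Python sketch, not the observatory's service interface; the `Record` type and the sample records are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Record:
    instrument: str
    parameter: str
    day: date


def query(
    records: Sequence[Record],
    instrument: Optional[str] = None,
    parameter: Optional[str] = None,
    date_range: Optional[Tuple[date, date]] = None,
) -> List[Record]:
    """Return the records that satisfy every constraint that was supplied."""
    hits = []
    for r in records:
        if instrument is not None and r.instrument != instrument:
            continue
        if parameter is not None and r.parameter != parameter:
            continue
        if date_range is not None and not (date_range[0] <= r.day <= date_range[1]):
            continue
        hits.append(r)
    return hits


# Illustrative records only; these are not real holdings.
RECORDS = [
    Record("FPI", "NeutralTemperature", date(2005, 1, 26)),
    Record("FPI", "NeutralWind", date(2005, 2, 3)),
    Record("ISR", "ElectronDensity", date(2005, 1, 26)),
]
```

Because each constraint is independent and optional, `query(RECORDS, parameter="NeutralWind")` and `query(RECORDS, instrument="FPI", date_range=(...))` go through the same code path.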

Inferred plot type and return required axes data

But data has Lots of Audiences

[Figure: audiences ranked from More Strategic to Less Strategic. From “Why EPO?”, a NASA internal report on science education, 2005.]

What is a Non-Specialist Use Case?

A teacher accesses the internet, goes to an Educational Virtual Observatory and enters a search for “Aurora”.

Someone should be able to query a virtual observatory without having specialist knowledge

What should the User Receive?

Teacher receives four groupings of search results:

1) Educational materials: http://www.meted.ucar.edu/topics_spacewx.php and http://www.meted.ucar.edu/hao/aurora/

2) Research, data and tools: via research VOs but the search for brightness, or green/red line emission is mediated for them

3) Did you know?: Aurora is a phenomenon of the upper terrestrial atmosphere (ionosphere) also known as Northern Lights

4) Did you mean?: Aurora Borealis or Aurora Australis, etc.
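One way to picture the mediated search is a lookup that expands a lay term into the four groupings listed on this slide. The Python sketch below is hypothetical: the `VOCABULARY` table is hand-written from the slide text, whereas a real educational VO would presumably derive it from an ontology.

```python
from typing import Dict, List

# Hypothetical mediation vocabulary; entries paraphrase the slide.
VOCABULARY: Dict[str, Dict[str, List[str]]] = {
    "aurora": {
        "educational_materials": [
            "http://www.meted.ucar.edu/topics_spacewx.php",
            "http://www.meted.ucar.edu/hao/aurora/",
        ],
        # Research terms the teacher never has to type.
        "research_terms": ["brightness", "green line emission", "red line emission"],
        "did_you_know": [
            "Aurora is a phenomenon of the upper terrestrial atmosphere "
            "(ionosphere) also known as Northern Lights"
        ],
        "did_you_mean": ["Aurora Borealis", "Aurora Australis"],
    }
}


def search(term: str) -> Dict[str, List[str]]:
    """Return the result groupings for a lay search term, or {} if unknown."""
    entry = VOCABULARY.get(term.strip().lower())
    if entry is None:
        return {}
    return {group: list(items) for group, items in entry.items()}


results = search("Aurora")
```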

Semantic Information Integration: Concept map for educational use of science data in a lesson plan

Informatics issues for Virtual Observatories

• Scaling to large numbers of data providers and redefining the roles/relations among them
• Branding and attribution (where did this data come from and who gets the credit, is it the correct version, is this an authoritative source?)
• Provenance/derivation (propagating key information as it passes through a variety of services, copies of processing algorithms, …)
• Crossing discipline boundaries
• Data quality, preservation, stewardship
• Security, access to resources, policies

Provenance

• Origin or source from which something comes, its intention for use, whom or what it was generated for, the manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery; documented in detail sufficient to allow reproducibility
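The elements of this definition can be read as the fields of a record that travels with a data product. The Python sketch below is one possible reading under that assumption; the field names and sample values are invented for illustration and do not come from any project schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvenanceRecord:
    """Provenance fields paraphrased from the definition on the slide."""
    source: str          # origin or source from which the item comes
    intended_use: str    # its intention for use
    generated_for: str   # whom or what it was generated for
    method: str          # manner of manufacture
    place: str           # place of manufacture, production or discovery
    time: str            # time of manufacture, production or discovery
    history: List[str] = field(default_factory=list)  # subsequent owners/steps

    def add_step(self, description: str) -> None:
        """Append one processing or ownership step, preserving order."""
        self.history.append(description)


# Entirely hypothetical example values.
record = ProvenanceRecord(
    source="example polarimeter",
    intended_use="coronal brightness studies",
    generated_for="example observing program",
    method="automated ingest pipeline",
    place="example observatory",
    time="2005-01-26T18:49:09Z",
)
record.add_step("calibrated")
record.add_step("quick look image generated")
```

Keeping the history as an ordered list is what allows the derivation to be replayed, which is the reproducibility requirement the definition ends on.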

Use cases

• Who (person or program) added the comments to the science data file for the best vignetted, rectangular polarization brightness image from January 26, 2005, 1849:09 UT taken by the ACOS Mark IV polarimeter?
• What were the cloud cover and atmospheric seeing conditions during the local morning of January 26, 2005 at MLSO?
• Find all good images on March 21, 2008.
• Why are the quick look images from March 21, 2008, 1900 UT missing?
• Why does this image look bad?
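Two of these questions ("find all good images" and "why are the images missing") can be answered from a log that keeps quality flags and observer notes next to each expected image. The Python sketch below assumes such a log exists; the entries, times and notes are invented, not real observatory records.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ImageEntry:
    day: str                 # ISO date, e.g. "2008-03-21"
    time_ut: str             # e.g. "1900"
    quality: Optional[str]   # "good", "bad", or None if no image was produced
    note: str                # observer or pipeline comment


# Invented log entries for illustration.
LOG = [
    ImageEntry("2008-03-21", "1830", "good", ""),
    ImageEntry("2008-03-21", "1900", None, "computer crash"),
    ImageEntry("2008-03-21", "1930", "bad", "rain, cloud"),
]


def good_images(log: List[ImageEntry], day: str) -> List[ImageEntry]:
    """Answer 'find all good images on <day>'."""
    return [e for e in log if e.day == day and e.quality == "good"]


def why_missing(log: List[ImageEntry], day: str, time_ut: str) -> Optional[str]:
    """Answer 'why is the image missing?' from the recorded note, if any."""
    for e in log:
        if e.day == day and e.time_ut == time_ut and e.quality is None:
            return e.note
    return None
```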

Quick look browse

[Screenshot: quick look browse with observer log notes, e.g. “Yasukawa: Computer crash” and “Yasukawa: Rain, cloud”.]

Visual browse

Search

A Better Way to Access Data

The Problem

Scientists only use data from a single instrument because it is difficult to access, process, and understand data from multiple instruments. A typical data query might be:

“Give me the temperature, pressure, and water vapor from the AIRS instrument from Jan 2005 to Jan 2008”

“Search for MLS/Aura Level 2, SO2 Slant Column Density from 2/1/2007”

A Solution

Using a simple process, SESDI allows data from various sources to be registered in an ontology so that it can be easily accessed and understood. Scientists can use only the ontology components that relate to their data. An SESDI query might look like:

“Show all areas in California where sulfur dioxide (SO2) levels were above normal between Jan 2000 and Jan 2007”

This query will pull data from all available sources registered in the ontology and allow seamless data fusion. Because the query is measurement related, scientists do not need to understand the details of the instruments and data types.
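A measurement-related query of this kind can be sketched as a lookup keyed by the measured quantity instead of by instrument. The Python sketch below is an assumption-laden illustration: the `REGISTRY` contents, source names, values and the `normal` threshold are all invented, and SESDI's actual registration mechanism is not shown in the slides.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Observation:
    source: str
    region: str
    year: int
    value: float


# Hypothetical registry: every source registered under the "SO2" concept.
REGISTRY: Dict[str, List[Observation]] = {
    "SO2": [
        Observation("satellite_A", "California", 2003, 1.8),
        Observation("ground_B", "California", 2005, 0.4),
        Observation("satellite_A", "Hawaii", 2004, 2.5),
    ]
}


def above_normal(
    measurement: str, region: str, start_year: int, end_year: int, normal: float
) -> List[Observation]:
    """Pull matching observations from all sources registered for a measurement."""
    return [
        o
        for o in REGISTRY.get(measurement, [])
        if o.region == region and start_year <= o.year <= end_year and o.value > normal
    ]


hits = above_normal("SO2", "California", 2000, 2007, normal=1.0)
```

The caller never names an instrument or a file format; which sources contribute is decided by what has been registered under the measurement.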

Determine the statistical signatures of volcanic forcings on the height of the tropopause

Detection and attribution relations…

Leveraged VSTO semantic framework indicating how volcano and atmospheric parameters and databases can immediately be plugged in to the semantic data framework to enable data integration.

Discussion (1)

• Taken together, an emerging set of collected experience manifests an emerging informatics core capability that is starting to take data intensive science into a new realm of realizability and, potentially, sustainability
  – Use cases
  – X-informatics
  – Core Informatics
  – Cyber Informatics

• Evolvable technical infrastructure

Progression after progression

[Diagram: a progression of layers labeled IT, Cyber Infrastructure, Cyber Informatics, Core Informatics, Science Informatics, and Science, Societal Benefit Areas, Edu. The informatics layers are grouped under the label Informatics.]

One example:
• CI = OPeNDAP server running over HTTP/HTTPS
• Cyberinformatics = Data (product) and service ontologies, triple store
• Core informatics = Reasoning engine (Pellet), OWL, CMAP
• Science (X) informatics = Use cases, science domain terms, concepts in an ontology

Discussion (2)
• The data and information challenges are (almost) being identified as increasingly common
• Data and information science is becoming the ‘fourth’ column (along with theory, experiment and computation)
• Semantics are a key ingredient for progress in informatics
• A sustained involvement of key inter-disciplinary team members is very important -> leads to incentives, rewards, etc. and a balance of research and production

Summary
• Informatics is playing a key role in filling the gap between science (and the spectrum of non-expert) use and generation and the underlying cyberinfrastructure
  – This is evident due to the emergence of Xinformatics (world-wide)
• Our experience is implementing informatics as semantics in Virtual Observatories (as a working paradigm) and Grid environments
  – VSTO is only one example of success
  – Data mining, data integration, smart search, provenance
• Informatics is a profession and a community activity and requires efforts in all 3 sub-areas (science, core, cyber) and must be synergistic

More Information
• Virtual Solar Terrestrial Observatory (VSTO): http://vsto.hao.ucar.edu, http://www.vsto.org
• Semantically-Enabled Science Data Integration (SESDI): http://sesdi.hao.ucar.edu
• Semantic Provenance Capture in Data Ingest Systems (SPCDIS): http://spcdis.hao.ucar.edu
• SAM/Semantic Knowledge Integration Framework (SKIF): http://skif.hao.ucar.edu
• Conferences: numerous
• Journals: Earth Science Informatics
• Texts: <empty>, a few are in progress
• Courses:
  – Semantic e-Science, fall 2008 course at RPI
  – Geoinformatics, at Purdue
• Contact: Peter Fox [email protected]

Spare room

Translating the Use-Case - non-monotonic?

Input

Physical properties: State of neutral atmosphere

Spatial:

• Above 100km

• Toward arctic circle (above 45N)

Conditions:

• High geomagnetic activity

Action: Return Data

Specification needed for query to CEDARWEB

Instrument

Parameter(s)

Operating Mode

Observatory

Date/time

Return-type: data

GeoMagneticActivity has ProxyRepresentation

GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)

Kp is a GeophysicalIndex

hasTemporalDomain: “daily”

hasHighThreshold: xsd_number = 8

Date/time when Kp >= 8
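The last step of this translation, turning "high geomagnetic activity" into concrete dates, reduces to a threshold test on the Kp index. A minimal Python sketch follows; the threshold of 8 is the `hasHighThreshold` value from the slide, while the sample dates and Kp values are invented.

```python
from typing import Dict, List

KP_HIGH_THRESHOLD = 8  # hasHighThreshold value stated on the slide


def high_activity_days(
    kp_by_day: Dict[str, float], threshold: float = KP_HIGH_THRESHOLD
) -> List[str]:
    """Return, in date order, the days whose Kp value is at or above the threshold."""
    return sorted(day for day, kp in kp_by_day.items() if kp >= threshold)


# Invented sample values, keyed by ISO date so that sorting is chronological.
sample = {
    "2005-01-01": 5.0,
    "2005-01-02": 9.0,
    "2005-01-03": 8.0,
    "2005-01-04": 7.7,
}
days = high_activity_days(sample)
```

The resulting list of dates is what fills the Date/time slot of the query specification, so the user never has to know that Kp is the proxy being used.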

VSTO - semantics and ontologies in an operational environment: vsto.hao.ucar.edu, www.vsto.org

Web Service

Partial exposure of Instrument class hierarchy - users seem to LIKE THIS

Semantic filtering by domain or instrument hierarchy

Semantic Web Services

Semantic Web Services

OWL document returned using VSTO ontology - can be used either syntactically or semantically

Semantic Web Services

Semantic Web Services

VSTO achievements
• Conceptual model and architecture developed by combined team: KR experts, domain experts, and software engineers
• Semantic framework developed and built with a small, cohesive, carefully chosen team in a relatively short time (deployments in 1st year)
• Production portal released, includes security, etc. with community migration (and so far endorsement)
• VSTO ontology version 1.2 (vsto.owl) in production, 2.0 in preparation
• Web Services encapsulation of semantic interfaces in use
• Solar Terrestrial use-cases are driving the completion of the ontologies (e.g. instruments)
• Using ontologies and the overall framework in other applications (volcanoes, climate, oceans, water, …)

Semantic Web Basics
• The triple: {subject-predicate-object}
  Interferometer is-a optical instrument
  Optical instrument has focal length
  An ontology is a representation of this knowledge
• W3C is the primary (but not sole) governing organization for languages, specifications, best practices, etc.
  – RDF - Resource Description Framework
  – OWL 1.0 - Web Ontology Language (OWL 1.1 on the way)
• Encode the knowledge in triples in a triple-store; software is built to traverse the semantic network, which can be queried or reasoned upon
• Put semantics between/in your interfaces, i.e. between layers and components in your architecture, i.e. between ‘users’ and ‘information’ to mediate the exchange
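The two example triples are enough to show what "traversing the semantic network" buys: a property stated on a class can be inferred for its subclasses. The Python sketch below uses a plain set of tuples as a stand-in for a triple-store; it is not RDF or OWL tooling, and the function names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# The two triples from the slide, as (subject, predicate, object).
TRIPLES: Set[Triple] = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "has", "FocalLength"),
}


def ancestors(store: Set[Triple], subject: str) -> Set[str]:
    """Follow is-a links transitively from a subject."""
    found: Set[str] = set()
    frontier = [subject]
    while frontier:
        node = frontier.pop()
        for s, p, o in store:
            if s == node and p == "is-a" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def properties(store: Set[Triple], subject: str) -> Set[str]:
    """Properties stated on the subject or inherited through is-a."""
    classes = {subject} | ancestors(store, subject)
    return {o for s, p, o in store if p == "has" and s in classes}
```

Nothing in the store says that an interferometer has a focal length; `properties(TRIPLES, "Interferometer")` derives it by walking the is-a link, which is a very small instance of reasoning over the network.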

Semantic Web Benefits
• Unified/abstracted query workflow: Parameters, Instruments, Date-Time
• Decreased input requirements for query: in one case reducing the number of selections from eight to three
• Generates only syntactically correct queries: which could not always be ensured in previous implementations without semantics
• Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to only expose coherent queries (portal and services)
• Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required, now exposed as smart web services
  – understanding of coordinate systems, relationships, data synthesis, transformations, etc.
  – returns independent variables and related parameters
• A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)
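Exposing only coherent queries can be sketched as narrowing each menu by what is already selected and rejecting combinations the background knowledge rules out. In the Python sketch below, the `MEASURES` table is a hypothetical stand-in for the ontology and reasoner; the instrument and parameter names are illustrative.

```python
from typing import Dict, List, Set

# Hypothetical background knowledge: which instrument measures which parameter.
MEASURES: Dict[str, Set[str]] = {
    "FabryPerotInterferometer": {"NeutralTemperature", "NeutralWind"},
    "IncoherentScatterRadar": {"ElectronDensity", "IonTemperature"},
}


def valid_parameters(instrument: str) -> List[str]:
    """Parameters worth offering once an instrument has been chosen."""
    return sorted(MEASURES.get(instrument, set()))


def valid_instruments(parameter: str) -> List[str]:
    """Instruments worth offering once a parameter has been chosen."""
    return sorted(i for i, params in MEASURES.items() if parameter in params)


def build_query(instrument: str, parameter: str, date_time: str) -> Dict[str, str]:
    """Build a query only if the selections are coherent with each other."""
    if parameter not in MEASURES.get(instrument, set()):
        raise ValueError(f"{instrument} does not measure {parameter}")
    return {"instrument": instrument, "parameter": parameter, "date_time": date_time}
```

Since either selection can come first, the same table serves both orders of the Parameters, Instruments, Date-Time workflow.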

Example 1: Registration of Volcanic Data

SO2 Emission from Kilauea east rift zone - vehicle-based (Source: HVO)

Abbreviations: t/d = metric tonne (1000 kg)/day, SD = standard deviation, WS = wind speed, WD = wind direction east of true north, N = number of traverses

Location Codes:

• U - Above the 180° turn at Holei Pali (upper Chain of Craters Road)

• L - Below Holei Pali (lower Chain of Craters Road)

• UL - Individual traverses were made both above and below the 180° turn at Holei Pali

• H - Highway 11

Registering Volcanic Data (1)

Registering Volcanic Data (2)

• No explicit lat/long data

• Volcano identified by name

• Volcano ontology framework will link name to location
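Linking a name to a location at registration time can be sketched as a lookup that enriches each incoming row. In the Python sketch below, the `VOLCANO_LOCATIONS` table is a hypothetical stand-in for the volcano ontology, the coordinates are approximate and for illustration only, and the emission value is invented.

```python
from typing import Dict, Tuple

# Stand-in for the volcano ontology: name -> (latitude, longitude) in degrees.
# Approximate summit coordinates, for illustration only.
VOLCANO_LOCATIONS: Dict[str, Tuple[float, float]] = {
    "Kilauea": (19.42, -155.29),
}


def register(row: Dict[str, object]) -> Dict[str, object]:
    """Attach lat/lon to a data row that identifies its volcano only by name."""
    name = str(row["volcano"])
    if name not in VOLCANO_LOCATIONS:
        raise KeyError(f"volcano not in ontology: {name}")
    lat, lon = VOLCANO_LOCATIONS[name]
    return {**row, "lat": lat, "lon": lon}


# Invented emission value; only the shape of the row matters here.
registered = register({"volcano": "Kilauea", "so2_t_per_day": 1200.0})
```

Once rows carry coordinates, they can be matched against gridded satellite data that has no notion of volcano names.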

Example 2: Registration of Atmospheric Data

Satellite data for SO2 emissions

Abbreviation: SCD = Slant Column Density, in Dobson Units (DU)

Registering Atmospheric Data (1)

SAM Project Objectives

S. Graves, R. Ramachandran

• To create a prototype Semantic Analysis and Mining framework (SAM) comprising:
  – Data mining and knowledge extraction web services
  – Linked ontologies describing the mining services, data and the problem domain
  – Web-based client
• To allow users to discover and explore existing data and services, compose workflows for mining and invoke these workflows.
  – Semantic search
  – Automated web service invocation
  – Automated web service composition
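Automated service composition can be illustrated by chaining services whose described output type matches the next service's input type. The Python sketch below uses a breadth-first search over hypothetical service descriptions; the service and type names are invented and this is not the SAM implementation.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical mining services, each described by (input type, output type).
SERVICES: Dict[str, Tuple[str, str]] = {
    "subset": ("RawImage", "SubsetImage"),
    "extract_features": ("SubsetImage", "FeatureVector"),
    "classify": ("FeatureVector", "ClassLabel"),
}


def compose(source: str, target: str) -> Optional[List[str]]:
    """Find the shortest chain of services turning `source` data into `target`.

    Returns the ordered service names, [] if no service is needed,
    or None if no chain exists.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, path = queue.popleft()
        if current == target:
            return path
        for name, (inp, out) in SERVICES.items():
            if inp == current and out not in seen:
                seen.add(out)
                queue.append((out, path + [name]))
    return None


workflow = compose("RawImage", "ClassLabel")
```

In a semantic setting the type matching would come from the linked ontologies, so that a service accepting a broader class could also be chained; the exact string match here is the simplest possible substitute.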

Data Mining Ontology: Design

Courtesy: R. Ramachandran

Data Mining Ontology: Snapshot

Courtesy: R. Ramachandran

The Information Era: Interoperability

Modern information and communications technologies are creating an “interoperable” information era in which ready access to data and information can be truly universal. Open access to data and services enables us to meet the new challenges of understanding the Earth and its space environment as a complex system:

• managing and accessing large data sets
• higher space/time resolution capabilities
• rapid response requirements
• data assimilation into models
• crossing disciplinary boundaries.

Virtual Observatories
• Conceptual examples:
• In-situ: Virtual measurements
  – Related measurements
• Remote sensing: Virtual, integrative measurements
  – Data integration
• Managing virtual data products/sets

Virtual Solar Terrestrial Observatory
• A distributed, scalable education and research environment for searching, integrating, and analyzing observational, experimental, and model databases.
• Subject matter covers the fields of solar, solar-terrestrial and space physics
• Provides virtual access to specific data, model, tool and material archives containing items from a variety of space- and ground-based instruments and experiments, as well as individual and community modeling and software efforts bridging research and educational use
• 3 year NSF-funded (OCI/SCI) project - completed
• Several follow-on projects

Problem definition

• Data is coming in faster, in greater volumes and outstripping our ability to perform adequate quality control

• Data is being used in new ways and we frequently do not have sufficient information on what happened to the data along the processing stages to determine if it is suitable for a use we did not envision

• We often fail to capture, represent and propagate manually generated information that needs to go with the data flows

• Each time we develop a new instrument, we develop a new data ingest procedure and collect different metadata and organize it differently. It is then hard to use with previous projects

• The task of event determination and feature classification is onerous and we don't do it until after we get the data

Building blocks

• Data formats and metadata: IAU standard FITS, with SoHO keyword convention, JPEG, GIF
• Ontologies: OWL-DL and RDF
• The proof markup language (PML) provides an interlingua for capturing the information agents need to understand results and to justify why they should believe the results.
• The Inference Web toolkit provides a suite of tools for manipulating, presenting, summarizing, analyzing, and searching PML in efforts to provide a set of tools that will let end users understand information and its derivation, thereby facilitating trust in and reuse of information.
• Capturing semantics of data quality, event, and feature detection within suitable community ontology packages (SWEET, VSTO)