Some Thoughts on Data and eScience Mike Conlon University of Florida [email protected]

Post on 26-Sep-2020


Page 1: Some thoughts on data and escience - University of Floridaufdcimages.uflib.ufl.edu/IR/00/00/07/08/00001/Some...Some thoughts on data and escience Author Mike Conlon Subject Data and

Some Thoughts on Data and eScience

Mike Conlon

University of Florida

[email protected]

What Does Data Look Like?

5. Harvesting

Get a world map showing temperature sensors

What Are the Data Processes?

Producing Data

Data Sharing

Photograph by J. G. Park, Flickr.com. Photograph by Ell Brown, Flickr.com.

Creative Commons

Data Archive

The Role of the Archive

• Collate data, final semantics, ready for consumption

A Consumption Scenario

Find all faculty members whose genetic work is implicated in breast cancer

VIVO will store information about faculty and associate them with genes.

Diseasome associates genes with diseases.

The query resolves across VIVO and the data sources it links to.
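
The scenario amounts to a join on genes between two sources. The sketch below illustrates that idea only: the names, the two in-memory tables, and the function faculty_for_disease are hypothetical stand-ins, not VIVO's data model or query interface. A real deployment would more likely express this as a query over linked RDF stores.

```python
# Hypothetical sample data. Neither table comes from VIVO or from the slides.
faculty_genes = [            # stand-in for faculty-to-gene associations
    ("Jane Smith", "BRCA1"),
    ("Jane Smith", "TP53"),
    ("Raj Patel", "CFTR"),
    ("Ana Lopez", "BRCA2"),
]
gene_diseases = [            # stand-in for a gene-to-disease source
    ("BRCA1", "breast cancer"),
    ("BRCA2", "breast cancer"),
    ("CFTR", "cystic fibrosis"),
    ("TP53", "Li-Fraumeni syndrome"),
]


def faculty_for_disease(disease, faculty_genes, gene_diseases):
    """Return sorted names of faculty linked to `disease` through a shared gene."""
    # Step 1: genes the second source associates with the disease.
    genes = {gene for gene, d in gene_diseases if d == disease}
    # Step 2: faculty whose work touches any of those genes.
    return sorted({person for person, gene in faculty_genes if gene in genes})


print(faculty_for_disease("breast cancer", faculty_genes, gene_diseases))
# ['Ana Lopez', 'Jane Smith']
```

The join key is the gene identifier, so the two sources only need to agree on how genes are named; neither has to know about the other's faculty or disease records.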

Data Reasoning

Data integration continues to be a serious bottleneck for the expectations of increased productivity in the pharmaceutical and biotechnology domain.

“Linked Life Data” integrates common public datasets that describe the relationships between gene, protein, interaction, pathway, target, drug, disease and patient and currently consist of more than 5 billion RDF statements.

The dataset interconnects more than 20 complete data sources and helps to understand the “bigger picture” of a research problem by linking previously unrelated data from heterogeneous knowledge.

From the LarKC (Large Knowledge Collider) http://www.larkc.eu/overview/

Public, structured linked data about investigators' interests, activities and accomplishments, and tools to use that data to advance science

Information is stored using the Resource Description Framework (RDF) as subject-predicate-object “triples”

[Diagram: subject "Jane Smith"; predicates "professor in", "author of", "has affiliation with"; objects "Dept. of Genetics", "College of Medicine", "Journal article", "Book chapter", "Book", "Genetics Institute"]

A Web of Data – The Semantic Web
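
To make the triple model concrete, here is a minimal sketch that holds a few subject-predicate-object statements in memory and matches them against a pattern. The labels are taken from this slide, but which predicate connects to which object is an assumption, since the diagram's edges do not survive in the transcript. Triple and match are hypothetical helper names, and real RDF identifies resources by URI rather than by plain strings.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Assumed pairings of the slide's labels; the original diagram may differ.
triples = [
    Triple("Jane Smith", "professor in", "Dept. of Genetics"),
    Triple("Jane Smith", "has affiliation with", "Genetics Institute"),
    Triple("Jane Smith", "author of", "Journal article"),
    Triple("Jane Smith", "author of", "Book chapter"),
    Triple("Jane Smith", "author of", "Book"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


# What has Jane Smith authored?
authored = [t.object for t in match(triples, subject="Jane Smith", predicate="author of")]
print(authored)
# ['Journal article', 'Book chapter', 'Book']
```

Because every statement has the same three-part shape, new kinds of facts can be added without changing a schema, and one pattern-matching routine answers questions about any of them.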

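library(XML)    # xmlParse, getNodeSet, xmlValue, xmlAttrs
library(RCurl)  # getURI

# Recursively build an organization tree from VIVO RDF/XML.
# 'rdf' is a URI, or RDF/XML text already fetched with getURI().
# The prefixes rdfs: and j.1: must match those declared in the RDF/XML.
processOrg <- function(rdf) {
  x <- xmlParse(rdf)
  name <- xmlValue(getNodeSet(x, "//rdfs:label")[[1]])
  subs <- getNodeSet(x, "//j.1:hasSubOrganization")
  # Fetch each sub-organization's RDF and recurse. Wrapping each result
  # in list() keeps one entry per child instead of flattening the tree.
  u <- NULL
  for (s in subs) {
    sub.rdf <- getURI(xmlAttrs(s)["resource"])
    u <- c(u, list(processOrg(sub.rdf)))
  }
  # subs is NULL when the organization has no sub-organizations
  list(name = name, subs = u)
}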

VIVO produces human- and machine-readable formats

Software reads RDF from VIVO and displays

VIVO Searchlight

Some Questions Regarding Data Processes

Shared Understanding of Data

Provenance

Who pays?

http://vivo.ufl.edu/individual/mconlon