O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

Post on 31-Jul-2015


ORGANIZING THE INTERNET OF THINGS

ACTIONABLE INSIGHT THROUGH ONTOLOGIES

Boris Adryan

badryan@gmail.com

• Computational biologist

• Research group leader

• Advisor at

• 2015 Fellow of the

Who is @BorisAdryan

• Why a biologist is interested in large, unstructured data

• What is wrong with the IoT in its current state

• How biologists deal with similar problems

• Which academic concepts would be useful in the IoT

WHAT TO EXPECT IN THE NEXT HOUR…(including questions!)

• Why a biologist is interested in large, unstructured data

• What is wrong with the IoT in its current state

• How biologists deal with similar problems

• Which academic concepts would be useful in the IoT

WHAT TO EXPECT IN THE NEXT 10 MINUTES

DNA = storage of a blueprint

RNA = ‘active copy’ of DNA

protein = the building blocks of cells and tissues

LIFE AS WE KNOW IT

transcription

translation

Gregor Johann Mendel, exhibited in the Library at the NIMR

‣ Reading DNA information

‣ Determining “the sequence of a gene” was a PhD in the early 1980s

‣ Data processing was mainly transcribing the observation into a research paper

BIOLOGY THEN AND NOW: SEQUENCE INFORMATION

Sanger sequencing ca. 1980

http://www.eplantscience.com

189,739,230,107 base pairs on 15th April 2015 (from 159,813,411,760 base pairs in April 2014)

‣ We can sequence a human genome in half a day

‣ Sequence databases grow faster than storage capacity

‣ Data processing is the key step in scientific understanding

BIOLOGY THEN AND NOW: SEQUENCE INFORMATION

1990: automation, kilobases a day

2007: next-gen seq, megabases a day

2015: 1000s of instruments worldwide

BIOLOGY THEN AND NOW: GENE ACTIVITY INFORMATION

‣ When are genes needed?

‣ Classical molecular biology workflow, taking days…

‣ Data is semi-quantitative; testing one gene at a time

Northern blot, ca. 1995

‣ High-throughput gene expression profiling since mid-1990s

‣ Quantitative information for every gene in an organism

‣ Key challenge is the graphical representation and interpretation of the data

screenshot from FlyBase, today

26 ATP

‣ Signal transduction and metabolic pathways

‣ Characterisation of proteins and substrates that mediate chemical reactions

‣ Nobel prize material

BIOLOGY THEN AND NOW: BIOCHEMISTRY

‣ We know about 250k metabolites

‣ 100k protein structures

‣ on the order of 10k different chemical reactions

BIOLOGY THEN AND NOW: BIOCHEMISTRY

“The Robot Scientist”

“small molecules”(Organic & Biomolecular Chemistry Blog)

protein(via the Protein Databank, www.pdb.org)

‣ Everything is connected

‣ Big, noisy, often unstructured data

‣ We are learning how biological entities depend on each other

DNA > RNA > proteins

• Why a biologist is interested in large, unstructured data

• What is wrong with the IoT in its current state

• How biologists deal with similar problems

• Which academic concepts would be useful in the IoT

WHAT TO EXPECT IN THE NEXT 5 MINUTES

‣ Everything is connected

‣ Big, noisy, often unstructured data

www.thingslearn.com

Analytics, context integration, machine learning and predictive modelling for the IoT.

0 clean shirts left

+ washing machine estimates 97% of your last pack of powder used

+ it’s Wednesday, 23:55

+ the last four Thursdays had a morning business meeting

+ the car is parked 20 m from a shop

+ last retail activity: 8 sec ago

Send immediate text reminder to pick up washing powder + send tweet from @BorisHouse

“need identified” + “notification appropriate”

Actionable insight. From everything.
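The washing powder scenario can be sketched as a small rule over combined context. This is only an illustration, not the method used by any product mentioned in the talk: the `Context` fields, the function names and all thresholds (95% powder used, 50 m to the shop, 60 seconds since the last purchase) are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Context:
    """Hypothetical snapshot of the facts listed on the slide."""
    clean_shirts: int
    powder_used: float              # fraction of the last pack used, 0.0 to 1.0
    now: datetime
    recent_thursday_meetings: int   # morning meetings on the last four Thursdays
    metres_to_shop: float
    seconds_since_purchase: float


def need_identified(ctx: Context) -> bool:
    # weekday(): Monday is 0, so Thursday is 3
    tomorrow_is_thursday = (ctx.now.weekday() + 1) % 7 == 3
    return (
        ctx.clean_shirts == 0
        and ctx.powder_used >= 0.95            # assumed threshold
        and tomorrow_is_thursday
        and ctx.recent_thursday_meetings == 4
    )


def notification_appropriate(ctx: Context) -> bool:
    # Assumed thresholds: the user is near a shop and is shopping right now.
    return ctx.metres_to_shop <= 50 and ctx.seconds_since_purchase <= 60


def decide(ctx: Context) -> list:
    """Return the actions to trigger, or an empty list if none apply."""
    if need_identified(ctx) and notification_appropriate(ctx):
        return ["text reminder: pick up washing powder", "tweet from @BorisHouse"]
    return []


# The values from the slide; 29 July 2015 was a Wednesday.
slide_context = Context(0, 0.97, datetime(2015, 7, 29, 23, 55), 4, 20.0, 8.0)
print(decide(slide_context))
```

No single fact triggers the reminder; the need and the appropriateness of the notification only emerge once data from unrelated things is interpreted together.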

NO ANALYTICAL FLEXIBILITY IN M2M/IOT

Matt Hatton, Machina Research, The BLN IoT ‘14

Internet replaces wire

It’s all about the context

M2M

consumer

IoT

defined I-P-O like it’s 1975

context


Is this hot?

LIFE SCIENCE STRATEGIES DON’T WORK IN THE IOT

- There is no commonly accepted

  - ‘catalogue’ of things,

  - ‘ontology’ of things,

  - ‘data format’ of things,

  - ‘meta data’ for things.

- Most businesses are driven by revenue, not long-term strategic vision

- Service providers have no need to publish

- Data can be highly personal (cheap excuse)

unless they’re

Trojan Room coffee pot, ca. 1993

Oct. 1995

“The Internet of Things”, Kevin Ashton, ca. 1999

20 YEARS OF NON-CONVERGENT EVOLUTION

FIRST DATA

POTENTIAL RECOGNISED

TODAY’S REALITY

“ignorant coexistence”

➡ Commonly accepted platforms and formats for data exchange

➡ Meta-data deposition is a must

➡ Infrastructure provides entry point for computational knowledge inference

“designed to ask questions”

• Why a biologist is interested in large, unstructured data

• What is wrong with the IoT in its current state

• How biologists deal with similar problems

• Which academic concepts would be useful in the IoT

WHAT TO EXPECT IN THE NEXT 10 MINUTES

Oct. 1995

TOWARDS MIAMI STANDARD AND DATA REPOSITORIES

cf. IoT, Nov. 1993

MInimal Annotation for MIcroarray Info

META DATA, SHARING AND DATA REPOSITORIES

founded in Nov. 1999

But this is a complex and ambitious project, and is one of the biggest challenges that bioinformatics has yet faced. Major difficulties stem from the detail required to describe the conditions of an experiment, and the relative and imprecise nature of measurements of expression levels. The potentially huge volume of data only adds to these difficulties.

Nature, Feb. 2000

Nov. 2000 Oct. 2002

Wide adoption as requirement for publication in scientific journals

META DATA, SHARING AND DATA REPOSITORIES

cf. IoT 2014

since 2003

http://en.wikipedia.org/wiki/Silo

THE LIFE SCIENCES FIXED THEIR KNOWLEDGE REPRESENTATION PROBLEM

FORMALISING KNOWLEDGE

FORMALISING KNOWLEDGE WITH GENE ONTOLOGY

CURRENT GOVERNMENT INVESTMENTS INTO GENE ONTOLOGY

NIH alone has spent $44,616,906 on the ontology structure since 2001 (I don’t have data for UK/EU spending)

~100 full-time salaries for experts with domain-specific knowledge

~40,000 terms

story

measurements + meta data

open, public repositories

human curators

ontology terms

community

PUBLISH OR PERISH

ok?

journal

informal exchange - no credit!

funders

assessment

The majority of this infrastructure is paid for by governments and charities

industry!

OUR PROBLEM IS KNOWLEDGE

DATA != INSIGHT

WITHOUT ORGANISING IT

• Why a biologist is interested in large, unstructured data

• What is wrong with the IoT in its current state

• How biologists deal with similar problems

• Which academic concepts would be useful in the IoT

WHAT TO EXPECT IN THE NEXT 10 MINUTES

measurements + meta data

storage & provenance

human curators

ontology terms

user

PUBLISH OR YOU’RE NOT DOING IOT

ok?

Maybe the majority of this infrastructure should be paid for by governments?

company cloud

device registration

“ “

privileges data

added value

WHAT IS AN ONTOLOGY?

used to establish conceptual connection between entities

knowledge inference

finger

ontology structure

- body part

- limb

- arm

- hand

- thumb

- finger

ontology rules

‣ controlled vocabulary

‣ clearly defined relationships

is a

is a

connects to

part of

With ontological reasoning, a computer can infer that “finger is a body part”, although we haven’t explicitly defined it that way.
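A minimal sketch of this kind of inference, assuming a toy version of the body-part ontology. The slide does not say which relationship links which pair of terms, so the edges in `EDGES` are assumptions for illustration. The sketch also follows every relationship type alike, whereas a real reasoner applies separate rules to “is a” and “part of”.

```python
# Assumed edges: (term, relationship, parent term). Only the terms and the
# relationship names come from the slide; the pairing is hypothetical.
EDGES = [
    ("thumb", "part of", "hand"),
    ("finger", "part of", "hand"),
    ("hand", "part of", "arm"),
    ("arm", "is a", "limb"),
    ("limb", "is a", "body part"),
]

# Index the direct parents of each term.
PARENTS = {}
for term, relation, parent in EDGES:
    PARENTS.setdefault(term, []).append((relation, parent))


def ancestors(term):
    """Return every term reachable from `term` by following parent edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# "finger" has no direct edge to "body part", but the link can be inferred.
print("body part" in ancestors("finger"))
```

Only five direct statements are stored; every other connection, such as the one between “thumb” and “limb”, is derived from the structure.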

ARE PEOPLE NOT ALREADY USING ONTOLOGIES IN THE IOT?

Semantic Sensor Network Ontology

“thermostat”

The idea is not new! Cf. extension of the semantic web with the Semantic Sensor Network.

‣ catalogs

‣ conventions

http://www.w3.org/2005/Incubator/ssn/ssnx/ssn

ONTOLOGIES HAVE TO BE PRAGMATIC COMPROMISES

Gene Ontology annotation

15 years of research

47 publications

100+ authors

50+ PhDs

15 direct annotations

~150 inferred annotations

THE THREE BRANCHES OF

Adapted from Anurag et al., Mol. BioSyst., 2012, 8, 346-352

Localization: Where is an entity acting?

Function: What does the entity do?

Process: When is the entity needed?

inferences on “is a”

“part of”

“regulates”

“has part”

from geneontology.org

from Ashburner et al., Nat Genet. 2000, 25(1):25-9.

GO AND CONTEXT

THE BRANCHES OF GO AND THE IOT

Localization: inside, (my?) home, living room

Function:

- measures temperature

- regulates temperature

- interacts with user directly

- interacts with user via app

Process:

- regulation of temperature

- measurement of ambient temperature

- ‘is proxy / is avatar’ for: presence, fire, ice age
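The three branches could be attached to a thing as plain annotation sets. The sketch below is an assumption about how such annotations might be stored and queried, not an existing IoT ontology or API. `ThingAnnotation`, `find_things` and the second device (“window sensor”) are hypothetical; the thermostat terms are taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class ThingAnnotation:
    """Hypothetical GO-style annotation of a thing along three branches."""
    name: str
    localization: set = field(default_factory=set)
    function: set = field(default_factory=set)
    process: set = field(default_factory=set)


thermostat = ThingAnnotation(
    name="thermostat",
    localization={"inside", "home", "living room"},
    function={"measures temperature", "regulates temperature",
              "interacts with user via app"},
    process={"regulation of temperature",
             "measurement of ambient temperature",
             "is proxy for presence"},
)

# Invented second device, included only to make the query meaningful.
window_sensor = ThingAnnotation(
    name="window sensor",
    localization={"inside", "home", "living room"},
    function={"interacts with user via app"},
    process={"is proxy for presence"},
)

THINGS = [thermostat, window_sensor]


def find_things(things, branch, term):
    """Names of all things annotated with `term` in the given branch."""
    return [t.name for t in things if term in getattr(t, branch)]


print(find_things(THINGS, "process", "is proxy for presence"))
print(find_things(THINGS, "function", "measures temperature"))
```

A query by process finds both devices although they do very different things, which is the sense in which a thing can act as a proxy or avatar for something it was not built to measure.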

A LAST WORD ON PRAGMATISM

“perfect” ontology

The SSN Ontology allows for inference entirely on the basis of its structure and annotation. In reality, many parameters are difficult to establish and the effort to annotate things outweighs the utility.

“crude” ontology

A simplified structure allows for quick annotation even by non-specialists. The lack of detail can lead to clashes in the ontology, so more smartness has to go into the software, which means more coding effort.

1 billion different things

1 million use cases

0 clean shirts left

+ washing machine estimates 97% of your last pack of powder used

+ it’s Wednesday, 23:55

+ the last four Thursdays had a morning business meeting

+ the car is parked 20 m from a shop

+ last retail activity: 8 sec ago

Send immediate text reminder to pick up washing powder + send tweet from @BorisHouse

“need identified” + “notification appropriate”

Actionable insight. From everything.

“not home”

“buying”

credit card: “highly personal device” ~ alive and awake

3% left and

not pressed

“indicator of esteem”

Today’s biology is a quantitative, data-rich science.

Infrastructure for ‘big data’ was driven by academics.

Data is only useful if it can be turned into knowledge.

Understanding of data requires ‘data about the data’.

Meta-data should be in a universally understood format.

Ontologies provide context.

Gene Ontology (GO) is a de facto standard.

Human curation is key to GO.

Public funders and industry contribute significantly to GO.

Should governments be involved in IoT?

GO is not ‘one size fits all’, but it has a few useful concepts.

What does the thing do? Thing function.

For what can the thing be an avatar? Thing process.

Where is the thing? Thing localization.

@BorisAdryan
