jack verhoosel | semantics in dairy farming: towards a common dairy ontology

23
SEMANTICS FOR BIG DATA APPLICATIONS IN SMART DAIRY FARMING Jack Verhoosel, Jacco Spek Presentation at the Semantics 2016 conference 14 September 2016

Upload: semanticsconference

Post on 07-Jan-2017

67 views

Category:

Technology


1 download

TRANSCRIPT

Page 2: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

2

aantallen NL

noemen

Collaboration project3 Cooperations 7 SME’s 5 Research institutes7 Real farmers

Timeline: SDF1: 2011 – 2014

Nothern part of the NetherlandsWebsite (in Dutch):

http://www.smartdairyfarming.nl/nl/

Goal of SDF:to support dairy farmers in the care of individual animals. with the specific goal of a longer productive stay at the farm due to improvement of individual health.

Challenge SDF2:more farmers: from 7 to 60 (and prepare for 2500)more sensor suppliers and more data consumersincorporate semantics and big data analysis

Numbers for the Dutch situation:• 15000+ farmers• in total more then 1.5 million milk cows• 20 to 200+ datafields per cow• many different stakeholders in the

chain

SDF 1.0 (2011 – 2014)SDF 2.0 (2015 – 2017)

Page 3: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

3

Starting point: Cow centric

thinking

Starting point: Farmer in control

“De boer aan het roer”

Real time analysis models (at different organisations)

Sensors from different suppliers: Lely, Delaval, Agis, Gallagher,…

Other data sources, CRV, FC, AgriFirm, Weather, Satellite

InfoBroker: Open platform for sharing (sensor) data producers and consumers

Cow specificsWorkinstructions (SOP)

This project is made possible by:

Data sharing in the dairy chain

Think big,start small 12GB sensordata per year for

7 farms => 310 GB triples

From 7 to 50 farms of 15.000 in NL

Page 4: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

4

InfoBroker concept

InfoBroker functionalities: Open interfaces for data exchange (API) Authentication

who are you (are you allowed to login) Permissions

which data may be used by whom to be set by the farmers

Namingservice location where the data can be found

– static data– cow-centric sensor data

Integration combining info from different sources

Pay-per-use fixed costs (connections) variable costs (used data)

So: no central datastore for (sensor)data! but indeed a broker and reduces/prevents duplication

cow specific work instructions (SOPs)

InfoBroker

cow centric data

cow centricdata

Cow centricSensor data

Static data (e.g. feed) Cow centric

Sensor data Static data

(e.g. date of birth)

Dashboard

ModelModel

Modelx 15.000+

Page 5: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

5

InfoBroker – Facts & Figures  Farm 1 Farm 2 Farm 3 Farm 4 Farm 5 Farm 6 Farm 7# cows/calves 459 186 315 239 706 202 351

Behaviour x       x    

Temperature x       x    

Activity x x x x x x  

Milk production x x     x x x

Food intake   x       x x

Weight x x x x x x x

Water intake     x x      

Milk intake     x x      

Date: february 2015NB1: this are “sensor data categories” at a farmNB2: not all animals are monitored for SDF (e.g. 3 and 4 only calves)

Page 6: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

InfoBroker – Facts & Figures

6

Number of cowsvs time

Number of sensorfieldsvs time

Page 7: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

7

WHY LINKED DATA AND SEMANTICS?

1. To make the various data sets accessible in an automatically linkable manner for easier integration

2. To align the semantics of the datasets in isolation as well as in combination using ontologies

3. To enable a rich set of questions to be queried on the datasets for better analysis

Page 8: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

AgriFirm:“How much feed did a group of cows at a dairy farm take in a certain period per type of feed and how strong is the correlation with milk yield?”

CRV:“What was the average weight per day over the last lactation period of a cow and what was the weight in/decrease over that period?

BIG DATA ANALYSIS QUESTIONS

Page 9: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

11

SIMPLE EXAMPLE OF LINKED DATA

Subject Objectpredicate

Cow Animal

is a (type)

Parcel

grazes on

Parcel

Grasslandis a (type)

40 ha

has surface

Page 10: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

12

LINKED DATA ROADMAP*

The four design principles of Linked Data (by Tim Berners Lee):1. Use Uniform Resource Identifiers

(URIs) as names for things. 2. Use HTTP URIs so that people can

look up those names. 3. When someone looks up a URI,

provide useful information, using the standards (RDF*, SPARQL).

4. Include links to other URIs so that they can discover more things.

LODRefine

*Based on PLDN LD roadmap

Page 11: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

13

….. sensor data….. sensor data

ONTOLOGY-BASED SDF

Cow specific data per farm per sensor equipment

Delaval sensor data

Lely sensor data

Visualization and analysis apps

Common Dairy Ontology

MS-ontology

Measurements Triples Static Triples

ST-ontology

Static data (e.g. date of birth)

Cow specific data per farm

mapping mapping

Nedap sensor data

Agis sensor data

Page 12: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

14

COMMON DAIRY ONTOLOGY

“What was the average

weight per day and

weight in/decrease over

the last lactation period

of a cow in a group ?”

Page 13: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

15

ONTOLOGY MAPPING

CommonDairy Ontology

rdfs:label = “Activity2hours” rdfs:label = “BodyWeight”

Measurement ontology

Page 14: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

16

PLASIDO: OUR BIG, LINKED DATA PLATFORMPowerful server: 128GB memory, 5TB storageTriplestores:

Marmotta Triplestore with Relational TripleDBJena Fuseki Triplestore with Native GraphDBVirtuoso Relational ClassicalDB with SPARQL-2-SQL interface

All SDF data of 2014 and 2015 retrieved from InfoBrokerConverted into triples using LODRefine and RDF generatorFrom 12GB to 310GB, increase of factor 25Stored in Marmotta and Fuseki for comparison

Application development:Angular-Javascript JSON converterGoogle Visualization

LODRefine

Page 15: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

17

POC DEMO: AGRIFIRM AND CRV

Page 16: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

18

POC DEMO: STATIC COW INFO

Page 17: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

19

POC DEMO: MILKYIELD / FEEDINTAKE

Correlation?How strong?

AI data analytics!

Page 18: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

20

POC DEMO: FEEDINTAKE PER FEED

Page 19: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

21

PLASIDO PERFORMANCE TESTS1. All 2014 SDF triple data from Infobroker into Apache

Marmotta triplestore12GB of CSV data turned into +/- 310 GB of RDF triplesMarmotta makes use of classical relational database to store triplesSimple queries with only one parameter can be easily answeredMore complex queries let to unacceptable response timeMain reason is inefficient access to underlying RDB

2. Next step: switch all data to Apache Jena Fuseki triplestoreStill +/- 310 GB of RDF triplesFuseki makes use of modern graph database to store triplesSimple queries with only one parameter can be easily answeredMore complex analysis queries lead to long, but still acceptable response time, upto 15 minutesSo, an acceptable performance.See next slide for some numbers.

Page 20: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

PLASIDO PERFORMANCEWith set-up 2 using Apache Jena Fuseki triplestore with graphDB

Page 21: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

23

PLASIDO PERFORMANCE TESTSConclusion of previous 2 tests: working with large triplesets is acceptableHowever, is it really necessary to put all data into triples?Why not use only the CDO as semantic interface and leave all data in classical table database format?

3. Next step: put all data into tables and use RDB of Virtuoso Only +/- 10 GB of raw dataMakes use of classical relational database to store raw data in table formSPARQL interface based on CDOMapping between SPARQL and SQL to translate queries towards RDBSimple queries with only one parameter can be easily answeredMore complex analysis queries lead to errors, because the SPARQL-2-SQL mapping generates too large SQL queries and results that cannot be handled by VirtuosoSo, an unacceptable performance for this Virtuoso-based solution.

Page 22: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

24

CONCLUSION AND NEXT STEPS

CDO ontology application in SDF architectureFurther enhancement of CDO to cover the dairy domain extensivelyArchitectural study to enable the use of CDO with InfoBrokerCDO as semantic interface for InfoBrokerApply AI deep learning algorithms to perform data analytics

More extensive performance studiesOther linked data platforms to deal with big data: D2RQOther possibilities for improvements: Linked Data FragmentsDealing with heterogeneous sensordata: differently measuredEnabling analysis based on incomplete sensordataRDF stream processing for dealing with streaming sensor data

Page 23: Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

25

THANK YOU FOR YOUR ATTENTION!