Linked Data Based Data Management for data.gov.sg

Aravind Sesagiri Raamkumar, KM CoP, April 4th 2015


Posted on 17-Jul-2015



• KM deals with…

Knowledge Creation

Knowledge Accumulation

Knowledge Transfer

Knowledge Sharing

Knowledge Utilization

• All of the above support better decision making in organizations

• Facts are connected to form information

• Information when contextualized becomes Knowledge

• The underlying structure in our brain is similar to a network or a graph with nodes connected by edges

• Nodes are data

• Edges are relations

• Nodes + Edges = Information

• A group of nodes and edges forms a hypergraph (KNOWLEDGE)

• Q: How is data represented in the real world?

• A: ROWS AND COLUMNS (the classic relational model)

Data Silo

• No natural integration

• Re-invention of the wheel

• Cumbersome process

Resource Description Framework (RDF)

• Statements are triples: Subject-Predicate-Object

• Example: Jurong belongs to the West Zone

• Linked Data representation format:

Subject (Resource): http://data.gov.sg/resource/area/Jurong_West

Predicate (Vocabulary & Resource): http://data.gov.sg/ontology/property/has_zone

Object (Resource): http://data.gov.sg/resource/zone/West

• The same subject can also carry literal values, here using the W3C WGS84 vocabulary:

http://www.w3.org/2003/01/geo/wgs84_pos#lat: 12.5555

http://www.w3.org/2003/01/geo/wgs84_pos#long: 0.21222
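The triples above can be written in N-Triples, the simplest RDF serialization, which puts one subject-predicate-object statement on each line. Below is a minimal Python sketch. It assumes the latitude and longitude literals are typed as xsd:decimal, and that 12.5555 pairs with lat and 0.21222 with long, as the slide's layout suggests. The helper names are illustrative only.

```python
# Minimal sketch: the slide's Jurong West statements written as N-Triples.
# Assumptions: lat/long literals are typed as xsd:decimal, and the pairing
# lat=12.5555 / long=0.21222 follows the slide's reading order.

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
JURONG_WEST = "http://data.gov.sg/resource/area/Jurong_West"
HAS_ZONE = "http://data.gov.sg/ontology/property/has_zone"
WEST_ZONE = "http://data.gov.sg/resource/zone/West"


def iri(value: str) -> str:
    """N-Triples writes IRIs inside angle brackets."""
    return f"<{value}>"


def typed_literal(lexical: str, datatype: str) -> str:
    """A literal with an explicit datatype, e.g. "1.5"^^<...#decimal>."""
    return f'"{lexical}"^^<{datatype}>'


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """One statement per line: subject, predicate, object, then ' .'"""
    return f"{subject} {predicate} {obj} ."


statements = [
    # Resource -> resource: "Jurong West belongs to the West zone"
    ntriple(iri(JURONG_WEST), iri(HAS_ZONE), iri(WEST_ZONE)),
    # Resource -> literal values from the W3C WGS84 vocabulary
    ntriple(iri(JURONG_WEST), iri(GEO + "lat"), typed_literal("12.5555", XSD_DECIMAL)),
    ntriple(iri(JURONG_WEST), iri(GEO + "long"), typed_literal("0.21222", XSD_DECIMAL)),
]

if __name__ == "__main__":
    print("\n".join(statements))
```

Every term is a full IRI or a typed literal, so any RDF parser can read these lines without extra context.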

Ontology (Vocabulary)

• This structure can be re-used

• Rules can be included

• Similar to a database schema
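The point that "rules can be included" can be made concrete with two standard RDFS entailment rules. A property's rdfs:domain types its subjects, and its rdfs:range types its objects. Below is a minimal Python sketch. The class URIs .../class/Area and .../class/Zone are hypothetical, since only has_zone appears on the slides, and a real reasoner would apply many more rules.

```python
# Minimal sketch of ontology "rules": RDFS entailment rules rdfs2 (domain)
# and rdfs3 (range). The Area and Zone class URIs are hypothetical; only
# has_zone appears on the slides.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"

HAS_ZONE = "http://data.gov.sg/ontology/property/has_zone"
AREA = "http://data.gov.sg/ontology/class/Area"   # hypothetical class
ZONE = "http://data.gov.sg/ontology/class/Zone"   # hypothetical class

# Schema-level triples: the re-usable structure, like a database schema
ontology = {
    (HAS_ZONE, RDFS_DOMAIN, AREA),
    (HAS_ZONE, RDFS_RANGE, ZONE),
}

# Instance-level triples: the data itself
data = {
    ("http://data.gov.sg/resource/area/Jurong_West", HAS_ZONE,
     "http://data.gov.sg/resource/zone/West"),
}


def infer_types(ontology, data):
    """Return the rdf:type triples implied by rdfs:domain and rdfs:range.

    Only these two rules are applied; there is no subclass reasoning.
    """
    inferred = set()
    for subject, predicate, obj in data:
        for prop, relation, cls in ontology:
            if prop != predicate:
                continue
            if relation == RDFS_DOMAIN:
                inferred.add((subject, RDF_TYPE, cls))
            elif relation == RDFS_RANGE:
                inferred.add((obj, RDF_TYPE, cls))
    return inferred


if __name__ == "__main__":
    for triple in sorted(infer_types(ontology, data)):
        print(triple)
```

The data never states that Jurong_West is an Area. That fact follows from the schema, which is what makes a shared ontology more than a naming convention.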

• DBpedia (http://dbpedia.org/) is the linked data representation of Wikipedia

• Wiki page: http://en.wikipedia.org/wiki/Boon_Lay_(Jurong_West)

• DBpedia page: http://dbpedia.org/page/Boon_Lay_(Jurong_West)

• Example: querying DBpedia's linked data with SPARQL to rank Singapore-born actors by the number of works they star in

http://dbpedia.org/sparql

SELECT ?actor (COUNT(?film) AS ?films)
WHERE
{
  ?film <http://dbpedia.org/ontology/starring> ?actor .
  ?actor <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Singapore> .
}
GROUP BY ?actor
ORDER BY DESC(?films)
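The query can also be sent programmatically. The Python sketch below uses only the standard library. It builds a SPARQL 1.1 Protocol GET request against the endpoint above, asks for the W3C SPARQL JSON results format, and flattens a response into plain dictionaries. It does not actually contact DBpedia: SAMPLE_RESPONSE is a hand-written placeholder, not real query output.

```python
# Minimal sketch: send the query over HTTP (SPARQL 1.1 Protocol, query passed
# as the "query" GET parameter) and read the W3C SPARQL JSON results format.
# The request is only built here, not sent; SAMPLE_RESPONSE is a made-up
# placeholder, not DBpedia output.

import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"
QUERY = """
SELECT ?actor (COUNT(?film) AS ?films)
WHERE {
  ?film <http://dbpedia.org/ontology/starring> ?actor .
  ?actor <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Singapore> .
}
GROUP BY ?actor
ORDER BY DESC(?films)
"""


def build_request(endpoint: str, query: str) -> Request:
    """GET request that asks for JSON results through the Accept header."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_results(body: str) -> list:
    """Flatten each binding to {variable: value}; unbound variables are omitted."""
    doc = json.loads(body)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]


SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["actor", "films"]},
    "results": {"bindings": [
        {"actor": {"type": "uri", "value": "http://example.org/actor/A"},
         "films": {"type": "literal", "value": "3",
                   "datatype": "http://www.w3.org/2001/XMLSchema#integer"}},
    ]},
})

if __name__ == "__main__":
    req = build_request(ENDPOINT, QUERY)
    print(req.full_url[:60])
    print(parse_results(SAMPLE_RESPONSE))
    # To run it for real: urllib.request.urlopen(req).read().decode("utf-8")
```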

Linked Open Data Cloud (Web of Data)

Wait there’s more….

• Corporate and private data is a no-go

• Public data

• Freely available data

• Data of global importance

• Data that doesn’t require much domain knowledge

• Ideal Data: Government Open Data

http://wheredoesmymoneygo.org/bubbletree-map.html#/~/grand-total--2010-

UK Linked Data http://data.gov.uk/linked-data/who-is-doing-what

iDA Singapore launched the data.gov.sg portal and mGov@SG public services in June 2011

Data.gov.sg provides 5000+ public data sets from 50 government agencies

Purpose: building applications and supporting research using the data

data.gov.sg

Why did we do this project?

• To prescribe a linked data migration framework for data.gov.sg (DGS) data sets

• To give a first-hand view of the required migration activities

• To anticipate the issues at each step

• To evaluate and recommend Linked Data tools

• To help IDA understand the benefits of Linked Data

Data sources: ABC Water Proj (R), Agency Websites, Singstat publications, Ministries

Formats: XLS, HTML, PDF

Agencies: Accountant-General's Department; Accounting and Corporate Regulatory Authority; Agency for Science, Technology & Research; Attorney-General's Chambers; Building & Construction Authority; Central Narcotics Bureau; Central Provident Fund Board; Civil Aviation Authority of Singapore; Department of Statistics; Economic Development Board; Energy Market Authority; Health Sciences Authority; Housing & Development Board; Immigration & Checkpoints Authority; Infocomm Development Authority of Singapore; Inland Revenue Authority of Singapore; Institute of Technical Education; Intellectual Property Office of Singapore; JTC Corporation; Judiciary, Subordinate Courts; Judiciary, Supreme Court; Land Transport Authority; Majlis Ugama Islam Singapura; Maritime & Port Authority of Singapore; Media Development Authority; Monetary Authority of Singapore; Nanyang Polytechnic; National Environment Agency; National Heritage Board; National Library Board; National Parks Board; Ngee Ann Polytechnic; People's Association; Public Service Division; Public Transport Council; Public Utilities Board; Republic Polytechnic; Sentosa Development Corporation; Singapore Civil Defence Force; Singapore Customs; Singapore Land Authority; Singapore Police Force; Singapore Polytechnic; Singapore Sports Council; Singapore Workforce Development Agency; Spring Singapore; Temasek Polytechnic; Urban Redevelopment Authority

Ministries: Ministry of Community Development, Youth & Sports; Ministry of Education; Ministry of Foreign Affairs; Ministry of Health; Ministry of Law (Community Mediation Unit); Ministry of Manpower; Ministry of Transport

Category codes: C - Community, Cul - Culture, E - Environment, Emp - Employment, Edu - Education, H - Health, F - Family, R - Recreation, S - Sports

Themes: BFA Buildings (C), Green Building (E), Breast Screen (H), Cervical Screen (H), Healthier Dining (H), Quit Centers (H), Infocomm Access (C), Silver Infocomm (C), Wireless Hotspots (R), Child Care (F), Disability (F), Elder Care (F), Family (F), Family Friendly Establishments (F), Student Care (F), Community Mediation Center (C), After Death Facilities (E), Funeral Parlours (E), Dengue Cluster (H), Hawker Center (E), NEA Offices (E), Recycling Bins (E), Waste Disposal Site (E), Waste Treatment (E), Heritage Sites (Cul), Monuments (Cul), Museums (Cul), Libraries (Cul), Streets and Places (Cul), CD Councils (C), Community Clubs (C), Constituency Offices (C), Other Facilities (C), Other PA Networks (C), PA Headquarters (C), Residents Committee (C), Water Venture (C), National Parks (R), Skyrise Greenery (E), Sports Clubs (S), CET Centers (Emp), WDA Service Points (Emp), Kindergartens (Edu)

Operations: Get Token, Address Search, Agency Data Search, Static Map, Get Layer Info, Mashup, Get Related Data, Get Directions, Public Transportation, Reverse Geocode

Map-related APIs from various agencies

Traffic-related APIs from Land Transport Authority

Tourism-related APIs from the Singapore Tourism Board

Environment-related APIs from the National Environment Agency

Library-related data feeds & web services from National Library Board

SG Government Data Eco System (DGS Eco System)

• SG DATA, supplied by the statutory boards

• TEXTUAL: unstructured data

• SPATIAL and API: structured data

• Organized by themes, operations and categories

Proposed Linked Data Migration Framework for DGS

Steps (and the parties responsible for each):

• Specification: Identification, Analysis (Govt Agencies and IDA)

• Object Modeling (Govt Agencies' Domain Experts)

• Ontology Modeling (Ontology Modelers)

• URI Naming (IDA and Web Architects)

• RDF Creation: S2R, D2R, A2R (Developers)

• External Linking (Developers and Domain Experts)

• Datasets Publication (Developers)

• Discovery & Exploitation: Re-use, Create (Web Architects)

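Input, Process and Output of each framework step

1) Specification
Input: Objectives, Specifications
Process: Project Duration, Dataset Prioritization, Dataset License Setting, Implementation Mode Selection
Output: Roadmap, Architecture Overview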

2) Object Modeling
Input: Relational Model, Dataset Overview
Process: Drawing objects on a whiteboard
Output: Conceptual View

3) Ontology Modeling
Input: Conceptual View, Public Vocabularies
Process: Re-use of existing vocabularies, creation of new vocabularies
Output: OWL, RDFS and RDF vocabulary files
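The re-use-or-create decision in Ontology Modeling could look like the minimal Python sketch below. The column names and the table of re-usable terms (rdfs:label and the WGS84 lat/long properties) are hypothetical choices. Columns with no match become new properties under the http://data.gov.sg/ontology/property/ namespace used on the RDF slide.

```python
# Minimal sketch of the re-use-or-create decision in Ontology Modeling.
# Assumptions: the dataset column names and the re-use table are
# hypothetical; new terms go under the data.gov.sg ontology namespace
# seen on the RDF slide.

REUSED_TERMS = {
    "name": "http://www.w3.org/2000/01/rdf-schema#label",
    "latitude": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
    "longitude": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
}
NEW_PROPERTY_BASE = "http://data.gov.sg/ontology/property/"


def property_for(column: str) -> tuple:
    """Return (property IRI, reused?) for a dataset column."""
    key = column.strip().lower()
    if key in REUSED_TERMS:
        return REUSED_TERMS[key], True
    # No public term fits: create a new one in the local vocabulary
    return NEW_PROPERTY_BASE + "_".join(key.split()), False


def vocabulary_report(columns):
    """Split columns into re-used and newly created properties."""
    reused, created = {}, {}
    for column in columns:
        iri, is_reused = property_for(column)
        (reused if is_reused else created)[column] = iri
    return reused, created


if __name__ == "__main__":
    reused, created = vocabulary_report(["Name", "Latitude", "Longitude", "Has Zone"])
    print("re-used:", reused)
    print("new:", created)
```

Only the "created" set would need to be written into the new OWL/RDFS vocabulary files. The re-used terms are already defined by their publishers.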

4) URI Naming
Input: Resources, Classes and Properties
Process: Visualization of the URI minting process
Output: URI Administration, URI Lifecycle
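URI Naming can be sketched as a small minting function that follows the pattern on the RDF slide, http://data.gov.sg/resource/<type>/<Name>. The normalization rules here are assumptions, not a published DGS policy: ASCII folding, underscores for spaces, and parentheses kept in the way DBpedia URIs keep them. Whatever rules are chosen, a minted URI should stay stable for its whole lifecycle, because external datasets will link to it.

```python
# Minimal sketch of URI minting, following the pattern on the RDF slide:
# http://data.gov.sg/resource/<type>/<Name_With_Underscores>.
# The normalization rules below are assumptions, not a published DGS policy.

import re
import unicodedata

RESOURCE_BASE = "http://data.gov.sg/resource/"


def mint_uri(resource_type: str, label: str) -> str:
    """Build a readable resource URI from a type and a label."""
    # Fold accented characters to plain ASCII (e.g. "é" -> "e")
    ascii_label = (unicodedata.normalize("NFKD", label)
                   .encode("ascii", "ignore").decode("ascii"))
    # Collapse every run of characters other than letters, digits and
    # parentheses into a single underscore
    local_name = re.sub(r"[^A-Za-z0-9()]+", "_", ascii_label.strip()).strip("_")
    return f"{RESOURCE_BASE}{resource_type.strip().lower()}/{local_name}"


if __name__ == "__main__":
    print(mint_uri("area", "Jurong West"))
    print(mint_uri("zone", "West"))
```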

5) RDF Creation
Input: ER Model, Spreadsheets, DBMS, API
Process: Conversion to RDF triples using mapping files
Output: RDF Triples
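For the spreadsheet case (S2R), RDF Creation with mapping files can be sketched as follows. The MAPPING dictionary stands in for a mapping file. Its layout is hypothetical and is not the syntax of D2RQ or any other real mapping language. The one-row CSV is an illustrative sample that mirrors the Jurong West example.

```python
# Minimal sketch of mapping-file driven conversion (spreadsheet rows to RDF
# triples). The MAPPING dict stands in for a mapping file; its layout is
# hypothetical and is NOT the D2RQ mapping language.

import csv
import io

MAPPING = {
    # Template that turns a row's cells into the subject URI
    "subject": "http://data.gov.sg/resource/area/{area}",
    # column -> (predicate, object template)
    "properties": {
        "zone": ("http://data.gov.sg/ontology/property/has_zone",
                 "http://data.gov.sg/resource/zone/{zone}"),
    },
}


def rows_to_triples(csv_text: str, mapping: dict) -> list:
    """Apply the mapping to every CSV row and collect (s, p, o) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Trim cells and swap inner spaces for underscores so they fit in URIs
        cells = {k: "_".join((v or "").split()) for k, v in row.items()}
        subject = mapping["subject"].format(**cells)
        for column, (predicate, obj_template) in mapping["properties"].items():
            if cells.get(column):  # skip empty cells instead of minting bad URIs
                triples.append((subject, predicate, obj_template.format(**cells)))
    return triples


SAMPLE_CSV = "area,zone\nJurong West,West\n"  # hypothetical one-row spreadsheet

if __name__ == "__main__":
    for triple in rows_to_triples(SAMPLE_CSV, MAPPING):
        print(triple)
```

Keeping the mapping separate from the converter means a new spreadsheet needs only a new mapping, not new code. That is the main reason the framework converts through mapping files.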

6) External Linking
Input: Government and external data sets
Process: Linking based on similarity algorithms
Output: Outbound links
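Similarity-based linking can be sketched with Python's difflib. This is a stand-in, not the algorithm used by any particular linking tool. Labels are lower-cased and parenthetical qualifiers are dropped. A local resource gets an owl:sameAs outbound link to its best external match if the SequenceMatcher ratio reaches a hypothetical threshold of 0.85. The labels are hand-written samples, not data fetched from DBpedia.

```python
# Minimal sketch of similarity-based external linking. Assumptions: labels
# are compared with difflib's SequenceMatcher ratio (a stand-in for a real
# linking tool's algorithm), parenthetical qualifiers are ignored, and
# matches at or above a hypothetical 0.85 threshold become owl:sameAs links.

import re
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label: str) -> str:
    """Lower-case and drop parenthetical parts such as '(Jurong West)'."""
    return re.sub(r"\s*\([^)]*\)", "", label).strip().lower()


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link(local: dict, external: dict, threshold: float = 0.85) -> list:
    """Link each local resource to its best external match above threshold."""
    links = []
    for local_uri, local_label in local.items():
        best_uri, best_score = None, 0.0
        for ext_uri, ext_label in external.items():
            score = similarity(local_label, ext_label)
            if score > best_score:
                best_uri, best_score = ext_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((local_uri, OWL_SAME_AS, best_uri))
    return links


# Hypothetical labels written by hand for illustration (not fetched from DBpedia)
LOCAL = {
    "http://data.gov.sg/resource/area/Boon_Lay": "Boon Lay",
    "http://data.gov.sg/resource/area/Jurong_West": "Jurong West",
}
EXTERNAL = {
    "http://dbpedia.org/resource/Boon_Lay_(Jurong_West)": "Boon Lay (Jurong West)",
    "http://dbpedia.org/resource/Singapore": "Singapore",
}

if __name__ == "__main__":
    for triple in link(LOCAL, EXTERNAL):
        print(triple)
```

"Jurong West" gets no link here because neither sample label is close enough. In practice, candidates near the threshold are the ones the domain experts in this step would review by hand.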

7) Datasets Publication
Input: RDF Triples, Ontologies
Process: Data Insertion, VoID Modeling and Data Retrieval (via SPARQL and API); API-to-SPARQL conversion
Output: VoID Triples, JSON data
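VoID Modeling describes each published dataset with the VoID vocabulary so that it can be discovered. Below is a minimal Python sketch that emits a Turtle description containing a triple count and a distinct-subject count. The dataset URI http://data.gov.sg/dataset/areas and the endpoint http://data.gov.sg/sparql are hypothetical placeholders.

```python
# Minimal sketch of VoID modeling for publication: describe a dataset with
# the VoID vocabulary (void:Dataset, void:sparqlEndpoint, void:triples,
# void:distinctSubjects) in Turtle. Dataset and endpoint URIs are
# hypothetical placeholders.


def void_description(dataset_uri: str, endpoint_uri: str, triples) -> str:
    """Return a Turtle VoID description with simple statistics."""
    triples = list(triples)
    distinct_subjects = {s for s, _, _ in triples}
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f"    void:sparqlEndpoint <{endpoint_uri}> ;",
        f"    void:triples {len(triples)} ;",
        f"    void:distinctSubjects {len(distinct_subjects)} .",
    ])


SAMPLE_TRIPLES = [
    ("http://data.gov.sg/resource/area/Jurong_West",
     "http://data.gov.sg/ontology/property/has_zone",
     "http://data.gov.sg/resource/zone/West"),
    ("http://data.gov.sg/resource/area/Jurong_West",
     "http://www.w3.org/2003/01/geo/wgs84_pos#lat", "12.5555"),
    ("http://data.gov.sg/resource/area/Jurong_West",
     "http://www.w3.org/2003/01/geo/wgs84_pos#long", "0.21222"),
]

if __name__ == "__main__":
    print(void_description("http://data.gov.sg/dataset/areas",
                           "http://data.gov.sg/sparql", SAMPLE_TRIPLES))
```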

8) Discovery & Exploitation
Input: Actual Data, Existing Apps
Process: Gamification, Crowdsourcing, Catalog Registration, External Reference
Output: New Apps


Resource allocation per step (the eight values sum to 100, i.e. percentage shares):

1) Specification: 10
2) Object Modeling: 15
3) Ontology Modeling: 15
4) URI Naming: 5
5) RDF Creation: 20
6) External Linking: 5
7) Datasets Publication: 15
8) Discovery & Exploitation: 15

• 1) Specification

• 2) Object Modeling

• 3) Ontology Modeling

• 4) URI Naming

• 5) RDF Creation

• 6) External Linking

• 7) Datasets Publication

• 8) Discovery & Exploitation

• Linked Data's representation of data sets eases Knowledge Creation and facilitates Knowledge Transfer

• Data in LD format becomes universally recognizable and consumable, because it is identified by shared URIs and described with shared vocabularies (e.g. http://live.dbpedia.org/page/Lee_Kuan_Yew)

• The prescribed framework outlines the steps for converting SG Govt datasets to LD format

• Probable issues were identified, and the best-suited Linked Data software libraries and tools were recommended

Thank You

Link to Original CI presentation: http://www.slideshare.net/aravindsraamkumar/proposed-linked-data-migration-framework-for-singapore-government-datasets

• Linking Enterprise Data by David Wood (2010): http://www.springer.com/gp/book/9781441976642

• Beginner book: Linked Data: Evolving the Web into a Global Data Space: http://linkeddatabook.com/editions/1.0/