hdi iii - healthdata.gov - now, next and challenges

Post on 26-Jan-2015

DESCRIPTION

This is a presentation that will be given at the 2012 Health Datapalooza (http://hdiforum.org), describing the new healthdata.gov site, its PaaS/DaaS direction, and related i2/ONC developer challenges.

TRANSCRIPT

healthdata.gov now and next

challenges overview

hhs ocio, health datapalooza 2012

2

session agenda

• now – tools and features

• next – target architecture

• challenges – explanations in sequence

3

now – tools and features

• Drupal – publishing workflow and community engagement

• Solr – faceted search

• CKAN – ‘on demand resources’ (RESTful API and feeds)

• EC2 – powered by GovCloud

• github.com/hhs – public repos coming soon!

4

publishing workbench

• insert interesting workbench screenshot

5

community engagement

• insert interesting community engagement screenshot

• question and/or ideas example

7

hub.healthdata.gov/api/rest/dataset

step 1: HTTP GET /dataset collection as JSON (GUID or name)

8

hub.healthdata.gov/api/rest/dataset/{name}

step 2: HTTP GET each /dataset (as JSON, RDF/XML, or N3)
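The two steps on these slides can be sketched as below. This is a minimal sketch, assuming the hub still answers at the URLs shown (it may not), that the scheme is `http://`, and that step 1 returns a JSON array of dataset names or GUIDs. The function names and the injectable `fetch` parameter are hypothetical, added so the logic can be exercised without a network call.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Path taken from the slides; the http:// scheme and current availability are assumptions.
BASE = "http://hub.healthdata.gov/api/rest/dataset"


def http_get(url):
    """Fetch a URL and return the response body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def list_datasets(fetch=http_get):
    """Step 1: GET the /dataset collection, assumed to be a JSON array of names or GUIDs."""
    return json.loads(fetch(BASE))


def get_dataset(name, fetch=http_get):
    """Step 2: GET one dataset record as JSON, addressed by name or GUID."""
    return json.loads(fetch(BASE + "/" + quote(name, safe="")))
```

Passing a stand-in for `fetch` (for example the `get` method of a dict of canned responses) lets the two steps be tried offline before pointing them at a live catalog.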

9

hub.healthdata.gov/api/search/dataset?q=medicare+costs

JSON results for ‘medicare’ and ‘costs’ search query
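A search request like the one on this slide could be built as follows. This is a sketch under assumptions: the `http://` scheme is not on the slide, and the response shape (`count` plus a `results` list) is assumed rather than confirmed by the deck. `search_url` and `result_names` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode

# Path taken from the slide; the http:// scheme is an assumption.
SEARCH = "http://hub.healthdata.gov/api/search/dataset"


def search_url(terms):
    """Build the search URL; urlencode turns the spaces between terms into '+'."""
    return SEARCH + "?" + urlencode({"q": " ".join(terms)})


def result_names(body):
    """Read dataset names from a response assumed to look like {"count": n, "results": [...]}."""
    return json.loads(body).get("results", [])
```

Calling `search_url(["medicare", "costs"])` reproduces the query string shown on the slide, `q=medicare+costs`.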

10

hub.healthdata.gov/feeds/dataset.atom

atom feed for all datasets (including recent updates and changes)

11

hub.healthdata.gov/feeds/custom.atom?q=medicare+cost

custom search query result atom feed (anything with ‘medicare+cost’)
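Either feed could be read with the standard library, roughly as sketched here. The sketch assumes the feeds are ordinary Atom 1.0 documents with `title` and `updated` on each entry; `feed_entries` is a hypothetical name, and the feed text would come from an HTTP GET of one of the feed URLs above.

```python
import xml.etree.ElementTree as ET

# Atom 1.0 namespace in the {uri}tag form that ElementTree expects.
ATOM = "{http://www.w3.org/2005/Atom}"


def feed_entries(atom_xml):
    """Return (title, updated) pairs for every entry in an Atom document."""
    root = ET.fromstring(atom_xml)
    return [
        (entry.findtext(ATOM + "title"), entry.findtext(ATOM + "updated"))
        for entry in root.iter(ATOM + "entry")
    ]
```

Polling the dataset feed and comparing the `updated` values between runs is one simple way to notice the recent updates and changes the slide mentions.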

12

next – target architecture

• linked data

– (closed) google knowledge graph

– open health knowledge graph

• integration framework

– top down modeling

– bottom up mapping

– social curation

13

#gkg – (closed) ‘things, not strings’

“The Knowledge Graph helps us understand the relationships between things [… that are] linked in our graph. […] It’s not just a catalog of objects; it also models all these inter-relationships.” source

15

health.data.gov/id/hospital/393303

16

clinical quality linked data (HDI II)

17

lifting and enrichment

18

Linked Data Integration Framework

(diagram labels: GKG/Watson/Siri/…, healthdata.gov, HKG, PCAST DEAS, Health Data Actor, Variety, Volume, Velocity)

19

social meta/data – graph curation

20

i2 challenges

• two types

– three domain specific

• improve the integration and liquidity of data made available

– four platform specific

• enhance the capabilities of the technology components

• 3 release rounds

– sequenced to leverage dependencies

• round 1: June through October 2012

• round 2: November 2012 through May 2013

• round 3: June through December 2013

21

round 1 challenges

• June 2012 through October 2012

– domain specific

• [1.1] cross domain and domain specific metadata

– voluntary consensus standards organizations, de facto standards, other

– platform specific

• [1.2] Simplified Sign On (SSO)

– WebID identity provider and relying parties, HDP infrastructure components

– $35K: $20K 1st, $10K 2nd, $5K 3rd place prizes

22

round 2 challenges

• November 2012 through May 2013

– domain specific

• [2.3] Mapping, Reconciliation and Correlation

– structural variety, authoritative URIs, linking heuristics

– platform specific

• [2.4] Faceted Browsing and Visualization

– D3 (backbone, jQuery, etc.)

• [2.5] Custom API

– Linked Data API ‘configurator’ for dataset resources

» each of these builds on [1.1] results

23

round 3 challenges

• June 2013 through December 2013

– domain specific

• [3.6] Correlating HHS and NHS Classifications

– structural variety, authoritative URIs, linking heuristics

– platform specific

• [3.7] Linked Data API based Data Element Access Services

– ‘securing the data, not just the device’

» builds on [1.1], [1.2], and [2.5]

24

domain challenge [1.1]

• Metadata

– requests the application of existing voluntary consensus standards for metadata common to all open government data

– and invites new designs for health domain specific metadata to classify datasets in our growing catalog, creating entities, attributes and relations

– that form the foundations for better discovery, integration and liquidity.

• 374 on challenge.gov

25

W3C SKOS – concept schemes

26

W3C DCAT – data catalogs

27

hub.healthdata.gov/dataset/hospice-medicare-cost-report-data.rdf

rdf/xml output uses dublin core and dcat metadata

(mapping issues to work out, N3 output is incomplete, etc.)
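Reading the Dublin Core and DCAT values out of that RDF/XML could look roughly like this. It is a sketch only: it assumes the output wraps each record in a `dcat:Dataset` element with a `dct:title` child, which the slide does not confirm (it notes the mapping is still being worked out), and it uses `xml.etree.ElementTree`, which handles this one XML shape rather than RDF in general. `dataset_summaries` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF, DCAT and Dublin Core terms.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def dataset_summaries(rdf_xml):
    """Return the URI and title of every dcat:Dataset element in an RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    about = "{%s}about" % NS["rdf"]
    return [
        {"uri": ds.get(about), "title": ds.findtext("dct:title", namespaces=NS)}
        for ds in root.iter("{%s}Dataset" % NS["dcat"])
    ]
```

A full RDF library would be the safer choice once the N3 and RDF/XML outputs need to be treated interchangeably.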

28

https://github.com/HHS/hd2-ckan/blob/master/templates/package/read.rdf

ckan script that creates dc and dcat metadata tags / values

(thanks @JoshData! public github repo soon :-)

29

W3C Data Cube – statistics

refactor CQLD vocabs/data? start here and follow imports

30

W3C Provenance – change mgmt

apply to CKAN /revisions

31

hub.healthdata.gov/revision

32

W3C org – organization

33

quantity, units, dimensions, time

34

OGC GeoSPARQL – geospatial

36

CQLD domain specific

37

platform challenge [1.2]

• WebID based SSO

– will improve community engagement

– by providing simplified sign on (SSO) for external users interacting across multiple HDP technology components,

– making it easier for community collaborators to contribute,

– leveraging new approaches to decentralized authentication.

• 375 on challenge.gov

38

relying party WebID login

39

identity provider WebID login

40

edit WebID property ACL at IdP

41

property is now visible to the RP

42

domain challenge [2.3]

• Mapping, Reconciliation and Correlation

– builds on the Metadata domain challenge [1.1]

– begins by acknowledging disparate open government publishing practices

– and seeks the demonstration of an innovative and automated solution for transforming semi-structured data into structured data,

– reconciles decentralized distributions about the same data entity against the master identity of an authoritative source,

– and correlates these master identities when multiple authoritative sources exist,

– enabling the network effect by introducing strong identity resolution techniques that ease the ability to aggregate different data about the same entities from independent publishers.
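The reconciliation step can be illustrated with a deliberately simple sketch: variant spellings from independent publishers are normalized and matched against an authoritative name-to-URI table. Exact matching on normalized names is an assumption made for brevity, not the challenge's required technique, and `normalize`, `reconcile` and the sample table are hypothetical.

```python
import re


def normalize(label):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", label.lower()).split())


def reconcile(labels, authority):
    """Map each publisher label to the authoritative URI with the same normalized name, else None."""
    index = {normalize(name): uri for name, uri in authority.items()}
    return {label: index.get(normalize(label)) for label in labels}


# Hypothetical authority table; the URI follows the /id/hospital/ pattern shown earlier in the deck.
AUTHORITY = {"Example General Hospital": "http://health.data.gov/id/hospital/000000"}
```

Labels that come back as `None` are the ones that would need fuzzier linking heuristics or human curation.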

43

automating structural transformations

44

‘reconciling’ strings to things

45

result: turtle is the new JSON!

46

link automation heuristics editor

47

platform challenge [2.4]

• Faceted Browsing and Visualization

– builds on the Metadata domain challenge [1.1]

– uses the most popular browser based UI frameworks and libraries to realize novel exploration and discovery techniques for traversing large amounts of interrelated data,

– contributing to a growing collection of open source widgets that make it easy for third parties to create new applications and embed health data in their content.

48

surfing the domain schemata

no domain knowledge required to discover entities and relationships

49

agents construct e/r queries

Siri, which {LA County} Hospitals have the best {Heart Attack} stats?

50

d3 (jQuery, backbone, etc.)

51

platform challenge [2.5]

• Custom API

– also builds on the Metadata domain challenge [1.1]

– makes it possible to tune programmatic access in accordance with dataset metadata, leveraging an existing ‘Web 3.0’ framework and Linked Data API (LDA) implementation to provide specialized interfaces

52

a ‘Web 3.0’ API ‘configurator’

• Linked Data API (LDA)

– http://code.google.com/p/linked-data-api/

• open source impl here

– http://code.google.com/p/puelia-php/

• example usage here

– http://reference.data.gov.uk/doc/department

• example api reference docs here

– http://environment.data.gov.uk/lab/doc/api-bwq-reference-v0.2.html

• commercialization example here

– http://kasabi.com/tour
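A request against an LDA endpoint such as the example usage above might be composed like this. The sketch assumes the endpoint accepts a format suffix and the `_page`, `_pageSize` and `_view` query parameters from the Linked Data API specification; whether a given deployment honors them, or is still online, is not something the slides establish. `lda_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def lda_url(endpoint, fmt="json", page=0, page_size=10, view=None):
    """Compose a paged list request for a Linked Data API endpoint."""
    params = {"_page": page, "_pageSize": page_size}
    if view is not None:
        # Named views are defined per deployment; none is assumed by default.
        params["_view"] = view
    return "%s.%s?%s" % (endpoint, fmt, urlencode(params))
```

For instance, `lda_url("http://reference.data.gov.uk/doc/department")` yields the same resource with `.json?_page=0&_pageSize=10` appended.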

53

domain challenge [3.6]

• Correlating HHS – NHS Classifications

– builds on both the Metadata [1.1] and Mapping, Reconciliation and Correlation [2.3] domain challenges,

– and uses the US and UK health domain specific classification schemes to exercise the capabilities demonstrated by the automated solution to [2.3],

– resulting in better international integration of frameworks for understanding societal outcomes and their corresponding health statistics.

54

platform challenge [3.7]

• Linked Data API based Data Element Access Services

– builds on the Metadata domain challenge [1.1], and the WebID based SSO [1.2], and Custom API [2.5] platform challenges

– augmenting WebID based authentication with metadata driven authorization,

– introducing an innovative security and privacy implementation of ‘data element access services’ (DEAS) as described by the PCAST Health IT Report,

– resulting in a Custom API configured by domain specific metadata that governs fine grained access to provide the right data to the right user.

• ‘secure the data, not just the devices’
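The idea of metadata driven, fine grained access can be sketched as an attribute filter: each data element carries a list of roles allowed to see it, and the API returns only the elements the authenticated user is entitled to. The plain dict policy here is a stand-in assumed for illustration, not the Privacy Preference Ontology itself, and `visible_attributes` is a hypothetical name.

```python
def visible_attributes(record, policy, user_roles):
    """Return only the attributes whose policy entry shares at least one role with the user.

    Attributes with no policy entry are hidden (default deny).
    """
    roles = set(user_roles)
    return {
        attr: value
        for attr, value in record.items()
        if roles & set(policy.get(attr, ()))
    }
```

With such a filter, two users requesting the same resource can receive all of its attributes or none of them, depending only on the roles their WebID is granted.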

55

LDA + PPO = DEAS

56

Privacy Preference Ontology (PPO)

57

user 1 AuthZ ‘1101’ all attributes

58

multiple machine readable formats

59

user 2 AuthZ ‘1101’ no attributes

60

thanks!

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix drm: <http://vocab.data.gov/def/drm#> .
@prefix sdo: <http://schema.org/> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix dc: <http://purl.org/dc/terms/> .

<http://hhs.gov/staff/georgethomas#>
    rdf:type drm:DataSteward , sdo:Person ;
    vcard:email "george dot thomas 1 at hhs dot gov" ;
    dc:contributor <healthdata.gov> , <data.gov/semantic> .
