hdi iii - healthdata.gov - now, next and challenges
DESCRIPTION
This is a presentation that will be given at the 2012 Health Datapalooza (http://hdiforum.org), describing the new healthdata.gov site, its PaaS/DaaS direction, and related i2/ONC developer challenges.
TRANSCRIPT
![Page 1: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/1.jpg)
healthdata.gov
now and next
challenges overview
hhs ocio, health datapalooza 2012
![Page 2: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/2.jpg)
2
session agenda
• now – tools and features
• next – target architecture
• challenges – explanations in sequence
![Page 3: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/3.jpg)
3
now – tools and features
• Drupal – publishing workflow and community engagement
• Solr – faceted search
• CKAN – ‘on demand resources’ (RESTful API and feeds)
• EC2 – powered by GovCloud
• github.com/hhs – public repos coming soon!
![Page 4: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/4.jpg)
4
publishing workbench
• insert interesting workbench screenshot
![Page 5: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/5.jpg)
5
community engagement
• insert interesting community engagement
screenshot
• question and/or ideas example
![Page 7: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/7.jpg)
7
hub.healthdata.gov/api/rest/dataset
step 1: HTTP GET /dataset
collection as JSON
(GUID or name)
![Page 8: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/8.jpg)
8
hub.healthdata.gov/api/rest/dataset/{name}
step 2: HTTP GET
each /dataset
(as JSON, RDF/XML, or N3)
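The two steps above can be sketched in Python. This is a minimal illustration, not the site's own client: the `http://` scheme, the helper names, and the sample response body are assumptions, and the step 1 response is assumed to be a JSON array of dataset names or GUIDs, as the slide describes. No network call is made unless `fetch` is invoked.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint path from the slides; the http:// scheme is an assumption.
BASE = "http://hub.healthdata.gov/api/rest/dataset"


def fetch(url: str) -> str:
    """Plain HTTP GET returning the response body as text (not called below)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def parse_dataset_list(body: str) -> list[str]:
    """Step 1: the collection is assumed to be a JSON array of names or GUIDs."""
    names = json.loads(body)
    if not isinstance(names, list):
        raise ValueError("expected a JSON array of dataset identifiers")
    return [str(n) for n in names]


def dataset_url(name: str) -> str:
    """Step 2: URL of a single dataset, addressed by name or GUID."""
    return f"{BASE}/{quote(name, safe='')}"


# Hypothetical step 1 response body, for illustration only.
sample_body = '["hospice-medicare-cost-report-data", "example-dataset"]'
urls = [dataset_url(n) for n in parse_dataset_list(sample_body)]
```

Against a live hub, `parse_dataset_list(fetch(BASE))` would replace the sample body, and each URL in `urls` would be fetched in turn.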
![Page 9: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/9.jpg)
9
hub.healthdata.gov/api/search/dataset?q=medicare+costs
JSON results for ‘medicare’ and ‘costs’
search query
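A short Python sketch of building that search URL and reading the result. The `http://` scheme and helper names are assumptions, and the response shape (a JSON object with `count` and `results` keys) is assumed from the legacy CKAN search API rather than taken from the slides.

```python
import json
from urllib.parse import urlencode

# Path from the slide; the http:// scheme is an assumption.
SEARCH = "http://hub.healthdata.gov/api/search/dataset"


def search_url(*terms: str) -> str:
    """Join search terms with spaces; urlencode turns the spaces into '+'."""
    return f"{SEARCH}?{urlencode({'q': ' '.join(terms)})}"


def result_names(body: str) -> list[str]:
    """Pull dataset identifiers out of an assumed {'count': n, 'results': [...]} body."""
    payload = json.loads(body)
    return [str(name) for name in payload.get("results", [])]


url = search_url("medicare", "costs")

# Hypothetical response body, for illustration only.
sample = '{"count": 1, "results": ["hospice-medicare-cost-report-data"]}'
names = result_names(sample)
```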
![Page 10: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/10.jpg)
10
hub.healthdata.gov/feeds/dataset.atom
atom feed for all
datasets (including recent
updates and changes)
![Page 11: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/11.jpg)
11
hub.healthdata.gov/feeds/custom.atom?q=medicare+cost
custom search query result
atom feed (anything with
‘medicare+cost’)
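Either feed can be consumed with a standard Atom parser. The sketch below uses only the Python standard library and a made-up sample feed; the entry contents are hypothetical, and only the Atom namespace and element names come from the Atom format itself.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed body, for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example dataset feed</title>
  <entry>
    <title>Example dataset</title>
    <updated>2012-01-01T00:00:00Z</updated>
  </entry>
</feed>"""


def entries(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, updated) for every entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    found = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM)
        found.append((title, updated))
    return found


recent = entries(SAMPLE_FEED)
```

Polling the feed and comparing the `updated` values against the last seen timestamp is one simple way to track changes to the catalog.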
![Page 12: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/12.jpg)
12
next – target architecture
• linked data
  – (closed) google knowledge graph
  – open health knowledge graph
• integration framework
  – top down modeling
  – bottom up mapping
  – social curation
![Page 13: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/13.jpg)
13
#gkg – (closed) ‘things, not strings’
“The Knowledge Graph helps us understand the relationships between things [… that are]
linked in our graph. […] It’s not just a catalog of objects; it also
models all these inter-relationships.” source
![Page 14: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/14.jpg)
14
open health knowledge graph
![Page 18: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/18.jpg)
18
Linked Data Integration Framework
GKG/Watson/Siri/… healthdata.gov
HKG
PCAST DEAS
Health Data Actor
Variety Volume Velocity
![Page 20: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/20.jpg)
20
i2 challenges
• two types
  – three domain specific
    • improve the integration and liquidity of data made available
  – four platform specific
    • enhance the capabilities of the technology components
• 3 release rounds
  – sequenced to leverage dependencies
    • round 1: June through October 2012
    • round 2: November 2012 through May 2013
    • round 3: June through December 2013
![Page 21: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/21.jpg)
21
round 1 challenges
• June 2012 through October 2012
  – domain specific
    • [1.1] cross domain and domain specific metadata
      – voluntary consensus standards organizations, de facto standards, other
  – platform specific
    • [1.2] Simplified Sign On (SSO)
      – WebID identity provider and relying parties, HDP infrastructure components
  – $35K: $20K 1st, $10K 2nd, $5K 3rd place prizes
![Page 22: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/22.jpg)
22
round 2 challenges
• November 2012 through May 2013
  – domain specific
    • [2.3] Mapping, Reconciliation and Correlation
      – structural variety, authoritative URIs, linking heuristics
  – platform specific
    • [2.4] Faceted Browsing and Visualization
      – D3 (backbone, jQuery, etc.)
    • [2.5] Custom API
      – Linked Data API ‘configurator’ for dataset resources
        » each of these builds on [1.1] results
![Page 23: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/23.jpg)
23
round 3 challenges
• June 2013 through December 2013
  – domain specific
    • [3.6] Correlating HHS and NHS Classifications
      – structural variety, authoritative URIs, linking heuristics
  – platform specific
    • [3.7] Linked Data API based Data Element Access Services
      – ‘securing the data, not just the device’
        » builds on [1.1], [1.2], and [2.5]
![Page 24: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/24.jpg)
24
domain challenge [1.1]
• Metadata
  – requests the application of existing voluntary consensus standards for metadata common to all open government data
  – and invites new designs for health domain specific metadata to classify datasets in our growing catalog, creating entities, attributes and relations
  – that form the foundations for better discovery, integration and liquidity.
• 374 on challenge.gov
![Page 27: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/27.jpg)
27
hub.healthdata.gov/dataset/hospice-medicare-cost-report-data.rdf
rdf/xml output uses dublin core and dcat metadata
(mapping issues to work out, N3 output is incomplete, etc.)
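To make the Dublin Core and DCAT output concrete, here is a hedged Python sketch that reads a few fields from an RDF/XML document. The sample document, its URI, and the chosen properties (`dct:title`, `dcat:keyword`) are illustrative assumptions and not the hub's actual output. Plain XML parsing only works for this one serialization shape; a real RDF parser would be the safer choice.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}

# Hypothetical RDF/XML body, for illustration only.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dcat="http://www.w3.org/ns/dcat#">
  <dcat:Dataset rdf:about="http://example.org/dataset/sample">
    <dct:title>Sample dataset</dct:title>
    <dcat:keyword>medicare</dcat:keyword>
    <dcat:keyword>costs</dcat:keyword>
  </dcat:Dataset>
</rdf:RDF>"""


def describe(rdf_xml: str) -> list[dict]:
    """Return URI, title, and keywords for each dcat:Dataset element."""
    root = ET.fromstring(rdf_xml)
    about = f"{{{NS['rdf']}}}about"  # ElementTree's {namespace}name attribute key
    datasets = []
    for ds in root.findall("dcat:Dataset", NS):
        datasets.append({
            "uri": ds.get(about),
            "title": ds.findtext("dct:title", default="", namespaces=NS),
            "keywords": [k.text for k in ds.findall("dcat:keyword", NS)],
        })
    return datasets


described = describe(SAMPLE)
```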
![Page 28: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/28.jpg)
28
https://github.com/HHS/hd2-ckan/blob/master/templates/package/read.rdf
ckan script that creates dc and dcat
metadata tags / values
(thanks @JoshData! public github repo
soon :-)
![Page 29: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/29.jpg)
29
W3C Data Cube – statistics
refactor CQLD vocabs/data? start here and follow imports
![Page 30: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/30.jpg)
30
W3C Provenance – change mgmt
apply to CKAN /revisions
![Page 35: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/35.jpg)
35
OMG BMM – business motivation
image source
![Page 37: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/37.jpg)
37
platform challenge [1.2]
• WebID based SSO
  – will improve community engagement
  – by providing simplified sign on (SSO) for external users interacting across multiple HDP technology components,
  – making it easier for community collaborators to contribute,
  – leveraging new approaches to decentralized authentication.
• 375 on challenge.gov
![Page 40: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/40.jpg)
40
edit WebID property ACL at IdP
![Page 41: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/41.jpg)
41
property is now visible to the RP
![Page 42: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/42.jpg)
42
domain challenge [2.3]
• Mapping, Reconciliation and Correlation
  – builds on the Metadata domain challenge [1.1]
  – begins by acknowledging disparate open government publishing practices
  – and seeks the demonstration of an innovative and automated solution for transforming semi-structured data into structured data,
  – reconciles decentralized distributions about the same data entity against the master identity of an authoritative source,
  – and correlates these master identities when multiple authoritative sources exist,
  – enabling the network effect by introducing strong identity resolution techniques that ease the ability to aggregate different data about the same entities from independent publishers.
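The reconcile and correlate steps can be illustrated with a deliberately naive Python sketch. It matches labels exactly after normalization, which is only a stand-in for the stronger identity resolution the challenge asks for. All URIs, labels, and function names here are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple


def normalize(label: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical labels match."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def reconcile(labels: List[str], authority: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each publisher label to the authoritative URI (master identity), or None."""
    index = {normalize(name): uri for uri, name in authority.items()}
    return {label: index.get(normalize(label)) for label in labels}


def correlate(a: Dict[str, str], b: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair up URIs from two authoritative sources that describe the same entity."""
    index_b = {normalize(name): uri for uri, name in b.items()}
    return [
        (uri, index_b[normalize(name)])
        for uri, name in a.items()
        if normalize(name) in index_b
    ]


# Hypothetical authoritative sources (URI -> label), for illustration only.
AUTH_A = {"http://example.org/a/1": "General Hospital, Springfield"}
AUTH_B = {"http://example.org/b/9": "GENERAL HOSPITAL SPRINGFIELD"}

matches = reconcile(["general hospital - springfield", "Unknown Clinic"], AUTH_A)
links = correlate(AUTH_A, AUTH_B)
```

Labels that fail to match come back as `None`, which is where fuzzier linking heuristics or social curation would take over.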
![Page 47: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/47.jpg)
47
platform challenge [2.4]
• Faceted Browsing and Visualization
  – builds on the Metadata domain challenge [1.1]
  – uses the most popular browser based UI frameworks and libraries to realize novel exploration and discovery techniques for traversing large amounts of interrelated data,
  – contributing to a growing collection of open source widgets that make it easy for third parties to create new applications and embed health data in their content.
![Page 48: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/48.jpg)
48
surfing the domain schemata
no domain knowledge required to discover
entities and relationships
![Page 49: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/49.jpg)
49
agents construct e/r queries
Siri, which {LA County} Hospitals have the best {Heart Attack} stats?
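One way to picture an agent constructing such a query: the two braced slots become literals in a SPARQL template. This is only a sketch. The `ex:` vocabulary and its property names are invented for illustration, and ordering by ascending value as a proxy for "best" is an assumption about the underlying measure.

```python
def sparql_literal(text: str) -> str:
    """Quote a plain string as a SPARQL literal (escapes backslashes and quotes only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query(region: str, condition: str) -> str:
    """Fill the {region} and {condition} slots of a hospital-statistics query."""
    return "\n".join([
        "PREFIX ex: <http://example.org/health#>",  # hypothetical vocabulary
        "SELECT ?hospital ?value WHERE {",
        "  ?hospital a ex:Hospital ;",
        f"            ex:regionName {sparql_literal(region)} .",
        "  ?stat ex:aboutHospital ?hospital ;",
        f"        ex:conditionName {sparql_literal(condition)} ;",
        "        ex:value ?value .",
        "}",
        "ORDER BY ?value",
    ])


query = build_query("LA County", "Heart Attack")
```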
![Page 51: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/51.jpg)
51
platform challenge [2.5]
• Custom API
  – also builds on the Metadata domain challenge [1.1]
  – makes it possible to tune programmatic access in accordance with dataset metadata, leveraging an existing ‘Web 3.0’ framework and Linked Data API (LDA) implementation to provide specialized interfaces
![Page 52: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/52.jpg)
52
a ‘Web 3.0’ API ‘configurator’
• Linked Data API (LDA)
– http://code.google.com/p/linked-data-api/
• open source impl here
– http://code.google.com/p/puelia-php/
• example usage here
– http://reference.data.gov.uk/doc/department
• example api reference docs here
– http://environment.data.gov.uk/lab/doc/api-bwq-reference-v0.2.html
• commercialization example here
– http://kasabi.com/tour
![Page 53: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/53.jpg)
53
domain challenge [3.6]
• Correlating HHS – NHS Classifications
  – builds on both the Metadata [1.1] and Mapping, Reconciliation and Correlation [2.3] domain challenges,
  – and uses the US and UK health domain specific classification schemes to exercise the capabilities demonstrated by the automated solution to [2.3],
  – resulting in better international integration of frameworks for understanding societal outcomes and their corresponding health statistics.
![Page 54: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/54.jpg)
54
platform challenge [3.7]
• Linked Data API based Data Element Access Services
  – builds on the Metadata domain challenge [1.1], and the WebID based SSO [1.2] and Custom API [2.5] platform challenges
  – augmenting WebID based authentication with metadata driven authorization,
  – introducing an innovative security and privacy implementation of ‘data element access services’ (DEAS) as described by the PCAST Health IT Report,
  – resulting in a Custom API configured by domain specific metadata that governs fine grained access to provide the right data to the right user.
• ‘secure the data, not just the devices’
![Page 55: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/55.jpg)
55
LDA + PPO = DEAS
![Page 57: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/57.jpg)
57
user 1 AuthZ ‘1101’ all attributes
![Page 58: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/58.jpg)
58
multiple machine readable formats
![Page 59: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/59.jpg)
59
user 2 AuthZ ‘1101’ no attributes
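The contrast between the two users can be sketched as attribute-level filtering. This toy Python example is not the LDA or PPO machinery named in the slides; the policy table, the record contents, and the function name are all hypothetical and only illustrate the idea of returning the same record with different visible attributes.

```python
from typing import Dict, Set

Record = Dict[str, str]

# Hypothetical policy: the attributes each authenticated user may read.
POLICY: Dict[str, Set[str]] = {
    "user1": {"name", "city", "measure"},
    "user2": set(),
}

# Hypothetical record store keyed by record id.
RECORDS: Dict[str, Record] = {
    "1101": {"name": "Example Hospital", "city": "Springfield", "measure": "0.12"},
}


def read_record(user: str, record_id: str) -> Record:
    """Return only the attributes of the record that the user is authorized to see."""
    allowed = POLICY.get(user, set())  # unknown users see nothing
    record = RECORDS[record_id]
    return {key: value for key, value in record.items() if key in allowed}


view_user1 = read_record("user1", "1101")  # all attributes
view_user2 = read_record("user2", "1101")  # no attributes
```

In the architecture described by the slides, the policy would be driven by domain specific metadata rather than a hard-coded table.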
![Page 60: HDI III - Healthdata.gov - Now, Next and Challenges](https://reader036.vdocuments.us/reader036/viewer/2022062617/54c5e3f54a7959bd458b45a9/html5/thumbnails/60.jpg)
60
thanks!
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix drm: <http://vocab.data.gov/def/drm#> .
@prefix sdo: <http://schema.org/> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://hhs.gov/staff/georgethomas#>
    rdf:type drm:DataSteward , sdo:Person ;
    vcard:email "george dot thomas 1 at hhs dot gov" ;
    dc:contributor <healthdata.gov> ,
        <data.gov/semantic> .