prov-o-viz: interactive provenance visualization


Upload: rinke-hoekstra

Post on 19-Jan-2015


DESCRIPTION

Prov-O-Viz is a visualisation service for provenance graphs expressed using the W3C PROV vocabulary. It uses the Sankey-style visualisation from D3.js. See http://provoviz.org

TRANSCRIPT

Page 1: Prov-O-Viz: Interactive Provenance Visualization

PROV-O-Viz Interactive Provenance Visualization

Rinke Hoekstra and Paul Groth
VU University Amsterdam/University of Amsterdam

Data2Semantics: From Data to Semantics for Scientific Data Publishers

Many slides courtesy of Paul Groth

Page 2: Prov-O-Viz: Interactive Provenance Visualization

Provenance?

Page 3: Prov-O-Viz: Interactive Provenance Visualization

Provenance by Jennifer Compton

http://stillcraic.blogspot.nl/2014/01/tuesday-poem-provenance-by-jennifer.html

Page 4: Prov-O-Viz: Interactive Provenance Visualization

Definition (Oxford English Dictionary)

• The fact of coming from some particular source or quarter; origin, derivation;

• the history or pedigree of a work of art, manuscript, rare book, etc.;

• concretely, a record of the passage of an item through its various owners.

Page 5: Prov-O-Viz: Interactive Provenance Visualization

Provenance

Page 15: Prov-O-Viz: Interactive Provenance Visualization

Provenance

Making trust judgements on the Web

Licensing and attribution of combined information

Liability, trust and privacy in open government data

Compliance and auditing of business processes

Safeguarding quality, reproducibility and integrity of the scientific process

Page 16: Prov-O-Viz: Interactive Provenance Visualization

“Web Design Issues”

“At the toolbar (menu, whatever) associated with a document there is a button marked ‘Oh, yeah?’. You press it when you lose that feeling of trust. It says to the Web, ‘so how do I know I can trust this information?’. The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons.”

Tim Berners-Lee, Web Design Issues, September 1997

Page 17: Prov-O-Viz: Interactive Provenance Visualization

Provenance in Web Documents

Page 18: Prov-O-Viz: Interactive Provenance Visualization

Provenance in Web Documents

Standards for ethical aggregation?

Curator’s code for attributing discovery?

Page 19: Prov-O-Viz: Interactive Provenance Visualization

Provenance in Open Government

Need provenance for data integration and reuse

• diversity of data sources

• varying quality

• different scope

• different assumptions

“Provenance is the number one issue that we face when publishing government data in data.gov.uk”

John Sheridan, UK National Archives, data.gov.uk

Page 20: Prov-O-Viz: Interactive Provenance Visualization

Provenance in Science

“We need a paradigm that makes it simple […] to perform and publish reproducible computational research. […] a Reproducible Research Environment (RRE) […] provides computational tools together with the ability to automatically track the provenance of data, analysis, and results and to package them (or pointers to persistent versions of them) for redistribution.”

Jill Mesirov, Chief Informatics Officer of the MIT/Harvard Broad Institute, in Science, January 2010

Need provenance for reproducibility and verification of processes

Page 21: Prov-O-Viz: Interactive Provenance Visualization
Page 22: Prov-O-Viz: Interactive Provenance Visualization

W3C Working Group

Provenance is a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing.

http://www.w3.org/TR/prov-overview

Luc Moreau & Paul Groth
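
To make the definition above concrete, here is a minimal sketch of a provenance record as plain subject/predicate/object triples. The `prov:` terms are classes and properties from the W3C PROV-O vocabulary; everything under `ex:` (the chart, the dataset, the plotting step, Alice) is invented for illustration and does not come from the slides.

```python
# Illustrative only: the "ex:" identifiers are made up for this sketch.
# The "prov:" terms are classes and properties of the W3C PROV-O vocabulary.
TRIPLES = [
    ("ex:chart", "rdf:type", "prov:Entity"),
    ("ex:dataset", "rdf:type", "prov:Entity"),
    ("ex:plotting", "rdf:type", "prov:Activity"),
    ("ex:alice", "rdf:type", "prov:Agent"),
    ("ex:plotting", "prov:used", "ex:dataset"),
    ("ex:chart", "prov:wasGeneratedBy", "ex:plotting"),
    ("ex:plotting", "prov:wasAssociatedWith", "ex:alice"),
    ("ex:chart", "prov:wasAttributedTo", "ex:alice"),
]


def describe(subject, triples=TRIPLES):
    """Return the (predicate, object) pairs recorded for one resource."""
    return [(p, o) for s, p, o in triples if s == subject]


def inputs_of(entity, triples=TRIPLES):
    """Return the entities used by the activities that generated `entity`."""
    activities = {o for s, p, o in triples
                  if s == entity and p == "prov:wasGeneratedBy"}
    return sorted(o for s, p, o in triples
                  if s in activities and p == "prov:used")


if __name__ == "__main__":
    print(describe("ex:chart"))
    print(inputs_of("ex:chart"))
```

Following `prov:wasGeneratedBy` and then `prov:used` is one simple way to step backwards through the history of an entity; a real application would more likely load the RDF with a dedicated library than keep tuples in a list.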

Page 23: Prov-O-Viz: Interactive Provenance Visualization

Provenance?

• Provenance = Metadata? Provenance can be seen as metadata, but not all metadata is provenance

• Provenance = Trust? Provenance provides a substrate for deriving different trust metrics

• Provenance = Authentication? Provenance records can be used to verify and authenticate amongst users

Page 34: Prov-O-Viz: Interactive Provenance Visualization

Three Dimensions

• Content: Capturing and representing provenance information

• Management: Storing, querying, and accessing provenance information

• Use: Interpreting and understanding provenance in practice

recording, annotating, workflow systems

scalability, interoperability

trust, accountability, compliance, explanation, debugging

Page 35: Prov-O-Viz: Interactive Provenance Visualization

Basic Idea

Page 36: Prov-O-Viz: Interactive Provenance Visualization

What you can do…

Page 37: Prov-O-Viz: Interactive Provenance Visualization
Page 38: Prov-O-Viz: Interactive Provenance Visualization

Warning: provenance is about history!

Page 39: Prov-O-Viz: Interactive Provenance Visualization

Visualization Anyone?

Page 40: Prov-O-Viz: Interactive Provenance Visualization

Naive Approaches

InProv: Visualizing Provenance Graphs with Radial Layouts and Time-Based Hierarchical Grouping, Madelaine D. Boyd, http://www.seas.harvard.edu/sites/default/files/files/archived/Boyd.pdf

Orbiter has several limitations. It does not have capabilities for query subgraph highlighting, regular expression filters, process grouping, annotations, or programmable views[16]. Furthermore, the structure of each summary node, where child nodes are grouped within parents and are hidden until the parent is expanded, benefits queries earlier in the dependency chain. Initial overviews often correspond with system bootup, and appear very similar across different traces (time slices of system activity).

Figure 10: In these screenshots of Orbiter, the presence of edges overwhelms the visibility of nodes. By relying on a node-link graph layout and using spatial location to encode object relationships, Orbiter’s graph layout algorithm must draw many long edges to communicate node connections. Without edge bundling or opacity variation, the meanings of these relationships are obscured.

Another one of Orbiter’s weaknesses is its node-link diagram layout. As a result, each node’s position in the X-Y plane and the length and angle of connecting lines are wasted attributes. The chosen graph layout algorithm (dot by default) arranges nodes to minimize edge crossings and total edge lengths. However, depending on the interrelationships among nodes, it may be impossible to find an optimal layout. In this case, undesirable designs with dense quantities of long edges may emerge, as seen in Figure 10. At the scale of a typical provenance graph, related nodes may be drawn far apart. This weakens the effectiveness of edges as “connections” that show relationships between nodes.

2.4 Large Graph Visualization

While a complete survey of graph and tree visualization is beyond the scope of this paper, I will summarize some notable approaches. See Herman et al. for a more detailed overview of graphs and information visualization[27], or see Ellis and Dix for an overview of clutter reduction techniques for visualization of large data sets[20].

There is a variety of current efforts to visualize large graphs. Many of these tools were designed for social network or genomics data sets, for which there is a motivation to see both patterns in the data set at large, as well as node-level detail. Visualization attempts for large graphs mostly fall within three categories: summary node-link diagrams, tree

Figure 11: (Top): A screenshot of the portion of the graph generated by GraphViz for a trace of the third provenance challenge. (Bottom): A zoomed-in view of the same graph. The horizontal black bars across the images are dense collections of edges.

Effective large graph visualizations present the user with a summary view that can be explored, filtered, and expanded interactively.

2.5 Tree Visualization

While trees are a subcategory of graphs, because of their hierarchical composition, tree visualization forms its own subfield of research. A survey of over two-hundred tree visualizations is given at Hans-Jörg Schulz’s treevis.net. Visitors can narrow down by dimensionality (2D, 3D, or mixed), representation (explicit node-link diagram, implicit treemap, or combination), alignment (XY plot, radial layout, or free diagram)[55]. These categories are shown by the icons in Figure 13.

Figure 12: Left: Pajek uses various summary node-link and matrix-based representations depending on the structure of the supplied data set. Pictured is a main core subgraph extracted from routing data on the Internet. Right: TopoLayout optimizes the choice of visualization display depending on the underlying graph structure. The right column is TopoLayout’s output, while the left and middle columns are the outputs of the GRIP and FM graph layout algorithms.

Figure 13: treevis.net defines different categories for tree maps. Tree maps can be categorized by dimensionality (2D, 3D, or mixed), representation (explicit, implicit, or mixed), or alignment (XY, radial, or spring).

Tree visualizations are either explicit or implicit. Explicit representations resemble node-link diagrams. An example of an implicit representation is a tree map, a diagram where the entire tree is inscribed in a rectangle representing the root node. This root is subdivided hierarchically into more rectangles, which represent child nodes, and each child node is subdivided into more child nodes. Treemaps are excellent for displaying hierarchical or categorical data[57]. One famous example, shown in Figure 14, is the “Map of the Market” from SmartMoney.com, which displays in red and green the changes in market value of publicly-traded companies, grouped by market sector, with cell size proportional to market capitalization[64].

TreePlus is an example of a tree-inspired graph visualization tool (Figure 15). It uses the guiding metaphor of “plant a seed to watch it grow” to summarize navigation of its tree-based large graph visualization tool[42]. The visual interface displays a tree, starting from the graph root or a user-specified starting node. Nodes at the same level are listed vertically; parents and children are listed to the left or right. When the user hovers over displayed

Page 41: Prov-O-Viz: Interactive Provenance Visualization

InProv

InProv: Visualizing Provenance Graphs with Radial Layouts and Time-Based Hierarchical Grouping, Madelaine D. Boyd, http://www.seas.harvard.edu/sites/default/files/files/archived/Boyd.pdf

6 Final Design

Figure 30: A view of a cluster of system activity. This particular timeslice shows the activity of the init.sh and mount processes.

This visualization was designed with the Visual Information-Seeking Mantra in mind: “overview first, zoom and filter, then details-on-demand”[56].

Nodes are colored according to their type. Processes are dark green, files are light

Page 42: Prov-O-Viz: Interactive Provenance Visualization

D3.js

Visualize the magnitude of flow between nodes in a network
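
As a rough sketch of what "magnitude of flow" means for a Sankey-style diagram, the Python below sizes each node from the flows on its links and builds a nodes/links structure of the kind commonly passed to a Sankey layout. The file and step names are hypothetical, and sizing a node by the larger of its total inflow and outflow is an assumed convention, not something stated on the slide.

```python
from collections import defaultdict

# Hypothetical flows as (source, target, value); all names are invented.
LINKS = [
    ("raw.csv", "clean", 3.0),
    ("clean", "clean.csv", 3.0),
    ("clean.csv", "plot", 1.0),
    ("clean.csv", "model", 2.0),
]


def node_values(links):
    """Size each node by the larger of its total inflow and total outflow."""
    inflow, outflow = defaultdict(float), defaultdict(float)
    for source, target, value in links:
        outflow[source] += value
        inflow[target] += value
    return {name: max(inflow.get(name, 0.0), outflow.get(name, 0.0))
            for name in set(inflow) | set(outflow)}


def to_sankey_input(links):
    """Build a nodes/links structure whose links refer to node indices."""
    names = sorted({name for s, t, _ in links for name in (s, t)})
    index = {name: i for i, name in enumerate(names)}
    return {
        "nodes": [{"name": name} for name in names],
        "links": [{"source": index[s], "target": index[t], "value": v}
                  for s, t, v in links],
    }


if __name__ == "__main__":
    print(node_values(LINKS))
    print(to_sankey_input(LINKS))
```

The resulting dictionary can be serialised with `json.dumps` and handed to a browser-side layout; link thickness then follows `value`, and node height follows the totals computed in `node_values`.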

Page 44: Prov-O-Viz: Interactive Provenance Visualization

PROV-O-Viz: http://provoviz.org

Insert any PROV-O RDF

Or connect to a SPARQL endpoint
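
For the SPARQL route, the kind of query involved might look like the sketch below. It selects activities together with the entities they used and generated, and encodes the query as a GET request in the way the SPARQL Protocol describes (a `query` parameter). The endpoint URL is a placeholder, and the query is an assumption about what a provenance visualisation needs, not the query PROV-O-Viz itself sends.

```python
from urllib.parse import urlencode

# Assumed query: activities with the entities they used and generated.
QUERY = """
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?activity ?input ?output WHERE {
  ?activity a prov:Activity .
  OPTIONAL { ?activity prov:used ?input . }
  OPTIONAL { ?output prov:wasGeneratedBy ?activity . }
}
"""


def sparql_get_url(endpoint, query):
    """Encode a query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query.strip()})


if __name__ == "__main__":
    # "http://example.org/sparql" is a placeholder endpoint, not a real service.
    print(sparql_get_url("http://example.org/sparql", QUERY))
```

The sketch only builds the URL; sending it, and asking for JSON results with an `Accept: application/sparql-results+json` header, is left to whichever HTTP client is at hand.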

Page 45: Prov-O-Viz: Interactive Provenance Visualization
Page 46: Prov-O-Viz: Interactive Provenance Visualization
Page 47: Prov-O-Viz: Interactive Provenance Visualization

Width of activities and entities is based on information flow

Activities and entities are extracted from an ego graph
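
An ego graph is usually understood as the subgraph of all nodes within a given distance of one chosen node. The sketch below extracts one with a breadth-first search. The edge list is invented, and treating dependencies as undirected when measuring distance is an assumption made here, since the slide does not say how PROV-O-Viz does it.

```python
from collections import deque

# Hypothetical dependency edges (from, to); names are invented.
EDGES = [
    ("report", "plot"),
    ("plot", "clean.csv"),
    ("clean.csv", "clean"),
    ("clean", "raw.csv"),
    ("unrelated", "other"),
]


def ego_graph(edges, center, radius):
    """Return (nodes, edges) within `radius` hops of `center`."""
    # Assumption: distance ignores edge direction.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the requested radius
        for nxt in neighbours.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    kept = set(distance)
    return kept, [(a, b) for a, b in edges if a in kept and b in kept]


if __name__ == "__main__":
    print(ego_graph(EDGES, "clean.csv", 1))
```

Restricting the view to a neighbourhood like this keeps the diagram small even when the full provenance graph is large, which is the problem the earlier slides on naive approaches illustrate.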

Page 48: Prov-O-Viz: Interactive Provenance Visualization

Move activities and entities around

Hover over interesting dependencies

Page 49: Prov-O-Viz: Interactive Provenance Visualization

Embed graph into your own webpage

Page 50: Prov-O-Viz: Interactive Provenance Visualization

Tom de Nies (Ghent University)

Sara Magliacane (VU University Amsterdam)

Page 51: Prov-O-Viz: Interactive Provenance Visualization
Page 52: Prov-O-Viz: Interactive Provenance Visualization
Page 53: Prov-O-Viz: Interactive Provenance Visualization
Page 54: Prov-O-Viz: Interactive Provenance Visualization
Page 55: Prov-O-Viz: Interactive Provenance Visualization

Discussion

• Provenance is vital in many areas: government, science, industry, …

• PROV is the W3C standard for expressing provenance

• Provenance graphs can be overwhelming and complex

• PROV-O-Viz builds intuitive Sankey-style visualizations

• … for any provenance trace expressed using PROV

Data2Semantics: From Data to Semantics for Scientific Data Publishers

http://semweb.cs.vu.nl/provoviz

Thanks to: Paul Groth, Provenance XG, WG, Luc Moreau, James Cheney, Paolo Missier, Olaf Hartig, Satya Sahoo