TRANSCRIPT
Linked Data Visualization
Matt Bernier
Joey Murphy
David Coleman
Needs Analysis
Allow users to view data sets graphically using intuitive and efficient controls
Specifically, to view links among data points
Contemporary methods include: diagrams, graphs, and lists
Enable users to perform analysis on their data
Market Analysis
Linked data is present in several environments:
• Search engines (page ranking)
• Social networks (recreational, academic, professional)
• Other database-driven sites
• Computer networks
Market Analysis
Market Users
• Website owners (50-100 million active domains, multiple sites per domain)
• Enterprise internal site managers
• Social network operators and users (more than 200 sites online)
• Network administrators
Background
Web sites supporting large numbers of users are very popular
Finding common usage statistics can be very beneficial
• Purchasing similar products
• Participating in common discussions
• Common browsing habits
Background
Showing Web links
• How websites link together
• Visualizing the web
Visualizing any linked data sets
Linked Data Example
An Existing Application
Create Random Nodes
Goals and Objectives
Overall goal is to create an intuitive web-based tool that allows users to see links within their data
Allow users to analyze and infer information from the links
Make it easy for web programmers to implement the graph on their site using a PHP class structure
Tools
HTML, CSS (data presentation)
PHP (data objects, processing)
JavaScript (graph creation, interaction)
• JSViz (framework for dynamic views; its force-directed algorithm creates a graph that is aesthetically pleasing)
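The force-directed idea can be illustrated with a minimal sketch. This is not the JSViz API; the function names (`layoutStep`, `layout`) and the constants (`repulsion`, `stiffness`, `restLength`) are assumptions chosen for illustration. Every pair of nodes repels, every link acts as a spring, and repeated small steps settle the graph into an evenly spaced layout.

```javascript
// Minimal force-directed layout sketch (illustrative only, NOT the JSViz API).
// nodes: array of {x, y}; links: array of [indexA, indexB] pairs.
// All constants below are assumed values for the example.
function layoutStep(nodes, links, { repulsion = 1000, stiffness = 0.05, restLength = 50 } = {}) {
  const forces = nodes.map(() => ({ x: 0, y: 0 }));

  // Pairwise repulsion, inverse-square in distance.
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const dx = nodes[i].x - nodes[j].x;
      const dy = nodes[i].y - nodes[j].y;
      const dist = Math.max(Math.hypot(dx, dy), 0.01); // avoid division by zero
      const f = repulsion / (dist * dist);
      forces[i].x += (dx / dist) * f;
      forces[i].y += (dy / dist) * f;
      forces[j].x -= (dx / dist) * f;
      forces[j].y -= (dy / dist) * f;
    }
  }

  // Spring force along each link (Hooke's law around restLength).
  for (const [a, b] of links) {
    const dx = nodes[b].x - nodes[a].x;
    const dy = nodes[b].y - nodes[a].y;
    const dist = Math.max(Math.hypot(dx, dy), 0.01);
    const f = stiffness * (dist - restLength); // negative when too close: pushes apart
    forces[a].x += (dx / dist) * f;
    forces[a].y += (dy / dist) * f;
    forces[b].x -= (dx / dist) * f;
    forces[b].y -= (dy / dist) * f;
  }

  // Move each node along its net force.
  nodes.forEach((n, i) => {
    n.x += forces[i].x;
    n.y += forces[i].y;
  });
}

// Run a fixed number of steps and return the (mutated) nodes.
function layout(nodes, links, iterations = 200, options = {}) {
  for (let k = 0; k < iterations; k++) layoutStep(nodes, links, options);
  return nodes;
}
```

Calling `layout([{ x: 0, y: 0 }, { x: 10, y: 0 }], [[0, 1]])` moves two linked nodes apart until the spring and the repulsion balance, slightly beyond the rest length.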
System Diagram
Literature Review
General Ideas
Building graphs from data sets
Displaying data
Data analysis
• Examining and inferring relationships
• Prediction
• Application to real world
Literature Review
Presenting data to users
• Tree structures, Data -> Information
• "Inducing the chosen mental model in the mind of the observer"
• Easy to understand
• Allows for more information to be absorbed by observers
Aaron Kershenbaum and Keitha Murray. In Journal of Circuit Systems and Computers
Literature Review
Many theories and techniques for graph analysis, but not construction
Choice of nodes and links
• What is represented by a node?
• What is represented by a link?
• Greatly influence meaning in a linked data display
• e.g. hyperlinks, Enron email dataset
A. Badia and M. Kantardzic. In Proceedings of the 3rd international workshop on Link discovery LinkKDD '05
J. Shetty and J. Adibi. In KDD ’05
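As a sketch of how the choice of nodes and links shapes a graph, the example below treats each email address as a node and each sender-to-recipient pair as a weighted link. The record shape (`from`, `to`), the function name `buildGraph`, and the sample addresses are assumptions for illustration, not the format of the Enron dataset.

```javascript
// Build a node/link graph from message records (hypothetical record shape).
// Node = email address; link = sender -> recipient, weighted by message count.
function buildGraph(messages) {
  const nodes = new Set();
  const linkWeights = new Map(); // "from->to" -> number of messages

  for (const { from, to } of messages) {
    nodes.add(from);
    nodes.add(to);
    const key = `${from}->${to}`;
    linkWeights.set(key, (linkWeights.get(key) || 0) + 1);
  }

  const links = [...linkWeights].map(([key, weight]) => {
    const [source, target] = key.split("->");
    return { source, target, weight };
  });
  return { nodes: [...nodes], links };
}

// Made-up sample data.
const sample = [
  { from: "alice@example.com", to: "bob@example.com" },
  { from: "alice@example.com", to: "bob@example.com" },
  { from: "bob@example.com", to: "carol@example.com" },
];
const graph = buildGraph(sample);
```

Choosing a different link definition, such as "appeared on the same thread", would produce a different graph from the same records.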
Literature Review
Link Mining: analyzing links
• Makes use of descriptive and predictive modeling (data mining)
• e.g. determining webpage relevance based on anchor text and surrounding text of incoming hyperlinks
• e.g. segregating website users into groups based on common behaviours
L. Getoor. In ACM SIGKDD Explorations Newsletter, Vol. 5, Issue 1, 2003
Literature Review
Link prediction
• Uses node proximity
• "Information about future interactions can be extracted from network topology alone"
• Predicting links that represent online social interaction can help to determine the feasibility of adding new interaction features to a site
D. Liben-Nowell and J. Kleinberg. In CIKM '03
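One simple proximity measure is the number of common neighbors: two unlinked nodes that share many neighbors are candidates for a future link. The sketch below assumes undirected links given as `[a, b]` pairs; the function names and sample data are hypothetical.

```javascript
// Common-neighbors link prediction sketch (illustrative names and data).
function buildAdjacency(links) {
  const adj = new Map();
  const add = (a, b) => {
    if (!adj.has(a)) adj.set(a, new Set());
    adj.get(a).add(b);
  };
  for (const [a, b] of links) {
    add(a, b);
    add(b, a); // treat links as undirected
  }
  return adj;
}

// Score every unlinked pair by how many neighbors the two nodes share.
function predictLinks(links) {
  const adj = buildAdjacency(links);
  const ids = [...adj.keys()];
  const scores = [];
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      const a = ids[i];
      const b = ids[j];
      if (adj.get(a).has(b)) continue; // already linked
      let common = 0;
      for (const n of adj.get(a)) if (adj.get(b).has(n)) common++;
      if (common > 0) scores.push({ pair: [a, b], score: common });
    }
  }
  return scores.sort((x, y) => y.score - x.score); // highest score first
}

const predictions = predictLinks([
  ["a", "c"], ["a", "d"], ["b", "c"], ["b", "d"], ["d", "e"],
]);
```

Here `a` and `b` are not linked but share the neighbors `c` and `d`, so they receive the highest score in this sample.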
Patent Analysis
Computer-implemented system and method for handling linked data views, Patent number 7,068,267, held by SAS Institute Inc.
• A first view and a second view are used to display at least a portion of the data observations contained in the data model. Conditional data that is associated with the second view specifies how the second view's display is modified based upon a selection of a data observation within the first view.
Timeline
Advantages
Design allows for customization
Custom data objects
Almost all visual aspects of the graph are easily changed or left as default settings
Disadvantages
Requires a network connection and a browser
• Or an Apache and PHP installation on a local machine
As dataset grows larger, application performance may degrade
Possible browser compatibility issues
• These are typical web issues with HTML, JavaScript, and CSS rendering
Requirements Analysis
Functionality (performance)
Flexibility
• Allow users and developers to customize and deploy application as they see fit
Reliability
• Provide an accurate data representation
Quality
• Provide a meaningful, visual representation of data
Requirements Analysis
Operating Environment
• Scripts:
  Run on a web server with a PHP (4.0+) installation
  Can interface with databases
• Users:
  Cross-system
  Cross-browser
Requirements Analysis
Interfaces
• A PHP class is provided, and the data to be visualized is added by the user.
Performance Requirements
• Time required to produce display varies with size of dataset
• 1-10 seconds
• Restrict size of datasets to prevent browser/computer from suffering
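One possible way to restrict dataset size before display is to keep only the most-connected nodes and the links among them. This is a sketch under assumptions: the function name `limitGraph`, the default cap of 100 nodes, and the degree-based selection are illustrative choices, not part of the project's design.

```javascript
// Cap a graph at maxNodes by keeping the highest-degree nodes
// (assumed strategy and assumed default cap, for illustration only).
// nodes: array of ids; links: array of [idA, idB] pairs.
function limitGraph(nodes, links, maxNodes = 100) {
  const degree = new Map(nodes.map((id) => [id, 0]));
  for (const [a, b] of links) {
    degree.set(a, (degree.get(a) || 0) + 1);
    degree.set(b, (degree.get(b) || 0) + 1);
  }

  // Sort a copy by descending degree and keep the first maxNodes ids.
  const kept = new Set(
    [...nodes].sort((x, y) => degree.get(y) - degree.get(x)).slice(0, maxNodes)
  );

  return {
    nodes: nodes.filter((id) => kept.has(id)),
    // Drop any link that touches a removed node.
    links: links.filter(([a, b]) => kept.has(a) && kept.has(b)),
  };
}

const limited = limitGraph(
  ["a", "b", "c", "d"],
  [["a", "b"], ["a", "c"], ["a", "d"], ["b", "c"]],
  3
);
```

In this sample the least-connected node `d` is dropped along with its single link, so the display cost stays bounded as the dataset grows.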
Requirements Analysis
Resources
• Design was conceived prior to undertaking project, 10 man-hours to refine design
• Coding: 20 man-hours
• Testing: 15 man-hours
Demo
Example
Future
More complex display
• Hyperlinks and/or pictures as nodes
• Re-centering graph by clicking a node
• Mouse-over events for more detail