Visual analytic tools for monitoring and understanding the emergence and evolution of innovations in science & technology
Cody Dunne
Dept. of Computer Science and Human-Computer Interaction Lab, University of Maryland
[email protected]
OECD KNOWINNO Workshop, November 14-15, 2011, Alexandria, VA, USA
Links from this talk:
bit.ly/stmwant
Outline
1. Academic literature exploration
2. Case study: Tree visualization techniques
3. Case study: Business intelligence news
4. Case study: Pennsylvania innovations
5. STICK approach
1. Academic literature exploration
Users are looking for:
1. Foundations
2. Emerging research topics
3. State of the art/open problems
4. Collaborations & relationships between communities
5. Field evolution
6. Easily understandable surveys
Action Science Explorer
User requirements
• Control over the paper collection
– Choose custom subset via query, then iteratively drill down, filter, & refine
• Overview either as visualization or text statistics
– Orient within subset
• Easy to understand metrics for identifying interesting papers
– Ranking & filtering
• Create groups & annotate with findings
– Organize discovery process
– Share results
Action Science Explorer
• Bibliometric lexical link mining to create a citation network and citation context
• Network clustering and multi-document summarization to extract key points
• Potent network analysis and visualization tools
www.cs.umd.edu/hcil/ase
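A minimal sketch of the "ranking & filtering" requirement on a citation network. The slides do not say which metrics Action Science Explorer computes, so this assumes the simplest one, times cited (in-degree); the paper IDs and the function name are hypothetical.

```python
from collections import Counter


def rank_by_citations(citations, min_citations=1):
    """Rank papers by in-degree (times cited) in a citation network.

    citations: iterable of (citing_id, cited_id) pairs.
    Returns (paper_id, count) pairs, most cited first, ties broken by ID.
    Papers cited fewer than min_citations times are filtered out.
    """
    in_degree = Counter(cited for _citing, cited in citations)
    ranked = sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))
    return [(paper, n) for paper, n in ranked if n >= min_citations]


# Hypothetical toy collection: each pair means "first paper cites second".
edges = [
    ("P3", "P1"), ("P4", "P1"), ("P4", "P2"),
    ("P5", "P1"), ("P5", "P2"), ("P5", "P3"),
]
print(rank_by_citations(edges, min_citations=2))
```

Raising `min_citations` is the filtering step; the sorted list is the ranking step.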
2. Case study: Tree visualization
• Problem: Traditional 2D node-link diagrams of trees become too large
• Solutions:
– Treemaps: Nested Rectangles
– Cone Trees: 3D Interactive Animations
– Hyperbolic Trees: Focus + Context
• Measures:
– Papers, articles, patents, citations,…
– Press releases, blog posts, tweets,…
– Users, downloads, sales,…
Treemaps: nested rectangles
www.cs.umd.edu/hcil/treemap-history
Smartmoney MarketMap Feb 27, 2007
smartmoney.com/marketmap
Cone trees: 3D interactive animations
Robertson, G. G., Card, S. K., and Mackinlay, J. D., Information visualization using 3D interactive animation, Communications of the ACM, 36, 4 (1993), 51-71.
Robertson, G. G., Mackinlay, J. D., and Card, S. K., Cone trees: Animated 3D visualizations of hierarchical information, Proc. ACM SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York, (April 1991), 189-194.
Hyperbolic trees: focus & context
Lamping, J. and Rao, R., Laying out and visualizing large trees using a hyperbolic space, Proc. 7th Annual ACM symposium on User Interface Software and Technology, ACM Press, New York (1994), 13-14.
Lamping, J., Rao, R., and Pirolli, P., A focus+context technique based on hyperbolic geometry for visualizing large hierarchies, Proc. SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York (1995), 401-408.
Tree visualization publishing
TM = Treemaps, CT = Cone Trees, HT = Hyperbolic Trees
[Chart panels: Trade Press Articles, Academic Papers, Patents]
Tree visualization citations
TM = Treemaps, CT = Cone Trees, HT = Hyperbolic Trees
[Chart panels: Academic Papers, Patents]
Insights
• Emerging ideas may benefit from open access
• Compelling demonstrations with familiar applications help
• Many components to commercial success
• 2D visualizations w/spatial stability successful
• Term disambiguation & data cleaning are hard
Shneiderman, B., Dunne, C., Sharma, P. & Wang, P. (2011), "Innovation trajectories for information visualizations: Comparing treemaps, cone trees, and hyperbolic trees", Information Visualization. http://www.cs.umd.edu/localphp/hcil/tech-reports-search.php?number=2010-16
3. Case study: Business intelligence news
ProQuest 2000-2009

Term                               Frequency
hyperion                           3122
data mining                        889
business intelligence              434
knowledge mgmt.                    221
data warehouse                     207
data warehousing                   139
cognos                             112
competitive intelligence           86
electronic data itrch.             69
meta data                          69
decision support system            39
business process reengineering     36
data mart                          29
business analytics                 21
text mining                        19
predictive analytics               18
business performance mgmt          6
online analytical processing       5
knowledge discovery in database    1
ad hoc query                       1
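A table like the one above could be produced by counting phrase occurrences across the article texts. This is a sketch under assumptions: the slides do not say whether "Frequency" counts occurrences or articles, and the sample sentences below are made up for illustration.

```python
import re
from collections import Counter


def count_terms(documents, terms):
    """Count case-insensitive whole-phrase occurrences of each term.

    Assumption: every occurrence counts, not just one per document.
    """
    counts = Counter()
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = sum(len(pattern.findall(doc)) for doc in documents)
    return counts


# Hypothetical article snippets.
docs = [
    "Data mining and business intelligence tools.",
    "The agency defended its data mining program.",
    "Text mining differs from data mining.",
]
terms = ["data mining", "business intelligence", "text mining"]
for term, n in count_terms(docs, terms).most_common():
    print(term, n)
```

Whole-phrase matching keeps "data warehouse" and "data warehousing" as separate rows, as in the table.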
PQ Business Intelligence 2000-2009
Co-occurrence of concepts with organizations
[Line chart: Frequency by Year]
Data Mining
• National Security Agency
• NSA
• White House
• FBI
• AT&T
• American Civil Liberties Union
• Electronic Frontier Foundation
• Dept. of Homeland Security
• CIA
Business Intelligence 2000-2009
Matrix showing Co-Occurrence of concepts and orgs.
Business Intelligence 2000-2009: (subset)
Business Intelligence 2000-2009: Data mining
• NSA
• CIA
• FBI
• White House
• Pentagon
• DOD
• DHS
• AT&T
• ACLU
• EFF
• Senate Judiciary Committee
Business Intelligence 2000-2009:
Tech1
• Google
• Yahoo
• Stanford
• Apple
Tech2
• IBM, Cognos
• Microsoft
• Oracle
Finance
• NASDAQ
• NYSE
• SEC
• NCR
• MicroStrategy
Business Intelligence 2000-2009:
• Air Force
• Army
• Navy
• GSA
• UMD*
Insights
• Useful groupings in PQ BI terms based on events and long-term collaborators
• Interactive line charts useful for looking at co-occurrence relationships over time
• Clustered heatmaps useful for overall co-occurrence relationships
stick.ischool.umd.edu
4. Case study: Pennsylvania innovations
• Innovation relationships during 1990
– State & federal funding
– Patents (both strong and weak ties)
– Location
• Connecting
– State & federal agencies
– Universities
– Firms
– Inventors
[Network diagrams: Pennsylvania innovation relationships]
Legend:
PatentTech
SBIR (federal)
PA DCED (state)
Related patent
2: Federal agency
3: Enterprise
5: Inventors
9: Universities
10: PA DCED
11/12: Phil/Pitt metro cnty
13-15: Semi-rural/rural cnty
17: Foreign countries
19: Other states
Diagram labels:
Pittsburgh Metro
Westinghouse Electric
Pharmaceutical/Medical
No Location Philadelphia
Navy
Insights
• Meta-layouts useful for showing:
– Groups (clusters, attributes, manual)
– Relationships between them
• User comments
– “We've never been able to see anything like this”
– “This is going to be huge”
www.terpconnect.umd.edu/~dempy/
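A meta-layout needs group-level relationships as input. The sketch below shows one way to derive them by collapsing node-level edges into counts between groups. It is an illustration, not the method used in the talk; the node names and the function name are hypothetical, and the group names are borrowed from the diagram legend.

```python
from collections import Counter


def group_level_edges(edges, group_of):
    """Collapse node-level edges into edge counts between groups.

    edges: iterable of (node_a, node_b) undirected pairs.
    group_of: mapping from node to its group (cluster, attribute, or manual).
    Returns {(group_x, group_y): count} with each group pair sorted.
    """
    totals = Counter()
    for source, target in edges:
        pair = tuple(sorted((group_of[source], group_of[target])))
        totals[pair] += 1
    return dict(totals)


# Hypothetical nodes assigned to groups named in the legend.
group_of = {
    "Firm A": "Enterprise",
    "Firm B": "Enterprise",
    "Univ X": "Universities",
    "Inventor 1": "Inventors",
}
edges = [
    ("Firm A", "Univ X"),
    ("Firm B", "Univ X"),
    ("Inventor 1", "Firm A"),
    ("Firm A", "Firm B"),
]
print(group_level_edges(edges, group_of))
```

Pairs within the same group, such as ("Enterprise", "Enterprise"), give the internal density of a group; the rest give the relationships between groups.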
5. STICK approach
• NSF SciSIP Program
– Science of Science & Innovation Policy
– Goal: Scientific approach to science policy
• The STICK Project
– Science & Technology Innovation Concept Knowledge-base
– Goal: Monitoring, Understanding, and Advancing the (R)Evolution of Science & Technology Innovations
STICK approach cont…
• Scientific, data-driven way to track innovations
– Vs. current expert-based, time-consuming approaches (e.g., Gartner’s Hype Cycle, tire track diagrams)
• Includes both concept and product forms
– Study relationships between them
• Study the innovation ecosystem
– Organizations & people
– Both those producing & using innovations
stick.ischool.umd.edu
STICK Process (overview)
• Identify concepts
– Business intelligence, cloud computing, customer relationship management, health IT, web 2.0, electronic health records, biotech
• Query data sources
– News, Dissertation, Academic, Patent, Blogs
• Processing
– Automatic entity recognition
– Crowd-sourced verification
– Co-occurrence networks
• Visualizing & analyzing
– Overall statistics
– Co-occurrence networks
– Network evolution
• Sharing results
Process
1. Collecting
2. Processing
3. Visualizing & Analyzing
4. Collaborating
Cleaning
Collecting
Identify Concepts
• Begin with target concepts
– Business Intelligence
– Health IT
– Cloud Computing
– Customer Relationship Management
– Web 2.0
– Personal Health Records
– Nanotechnology
• Develop 20-30 sub concepts from domain experts, wikis
Data Sources
• News
• Dissertation
• Academic
• Patent
• Blogs
Collecting (2)
• Form & Expand Queries
ABS("customer relationship management" OR "customers relationship management" OR "customer relation management") OR TEXT(…) OR SUB(…) OR TI(…)
• Scrape Results
Processing
Automatic Entity Recognition
• BBN IdentiFinder
Crowd-Sourced Verification
• Extract most frequent 25%
• Assign to CrowdFlower
– Workers check organization names and sample sentences
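A minimal sketch of the "extract most frequent 25%" step. It assumes the 25% refers to the top quarter of distinct entity names ranked by mention count, which the slide does not spell out. The mention counts are made up and the function name is hypothetical.

```python
import math
from collections import Counter


def most_frequent_fraction(mentions, fraction=0.25):
    """Return the top `fraction` of distinct names by mention count.

    mentions: iterable of recognized entity names, one per mention.
    The cutoff is rounded up so that at least one name is kept
    whenever there are any mentions.
    """
    counts = Counter(mentions)
    keep = math.ceil(len(counts) * fraction)
    return [name for name, _n in counts.most_common(keep)]


# Hypothetical recognizer output: 8 distinct organizations.
mentions = (
    ["IBM"] * 5 + ["Microsoft"] * 4 + ["Oracle"] * 3 + ["Google"] * 2
    + ["Yahoo", "Apple", "NCR", "SEC"]
)
print(most_frequent_fraction(mentions))
```

Only the names returned here would be sent on for crowd-sourced checking; the long tail of rare names is skipped.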
Processing (2)
• Compute Co-Occurrence Networks
– Overall edge weights
– Slice by time to see network evolution
• Output: CSV, GraphML
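The co-occurrence step can be sketched as follows, assuming each document has already been reduced to a year and a set of verified entity names. The edge weight here is the number of documents in which two entities appear together, which is one common definition and not confirmed by the slides. The records are made up, and GraphML export is left out to keep the sketch to the standard library.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_by_year(records):
    """Build one co-occurrence network per year.

    records: iterable of (year, entities) pairs, one per document.
    Returns {year: Counter({(a, b): weight})} with a < b.
    """
    slices = defaultdict(Counter)
    for year, entities in records:
        for a, b in combinations(sorted(set(entities)), 2):
            slices[year][(a, b)] += 1
    return dict(slices)


def overall_weights(slices):
    """Sum the per-year networks into overall edge weights."""
    total = Counter()
    for counter in slices.values():
        total.update(counter)
    return total


def to_csv(slices):
    """Serialize the time-sliced edge list as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["year", "source", "target", "weight"])
    for year in sorted(slices):
        for (a, b), weight in sorted(slices[year].items()):
            writer.writerow([year, a, b, weight])
    return buffer.getvalue()


# Hypothetical documents.
records = [
    (2005, {"data mining", "NSA", "AT&T"}),
    (2006, {"data mining", "NSA"}),
    (2006, {"data mining", "NSA", "FBI"}),
]
slices = cooccurrence_by_year(records)
print(overall_weights(slices)[("NSA", "data mining")])
print(to_csv(slices))
```

Comparing the per-year counters shows how an edge strengthens or fades, while the summed counter gives the overall network.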
Visualizing & Analyzing
Spotfire
• Import CSV, Database
• Standard charts
• Multiple coordinated views
• Highly scalable
NodeXL
• CSV, Spigots, GraphML
• Automate feature
– Batch analysis & visualization
• Excel 2007/2010 template
Shared data & analysis repositories
stick.ischool.umd.edu/community
• Online Research Community
• Share data, tools, results
– Data & analysis downloads
– Spotfire Web Player
• Communication
• Co-creation, co-authoring
Ongoing Work
Collecting: Additional data sources and queries
Processing: Improving entity recognition accuracy
Visualizing & Analyzing: Visualizing network evolution
• Co-occurrence network sliced by time
Collaborating: Develop the STICK Open Community site
• Motivate user participation
• Improve the resources available
• Invitation-only testing
Outline
1. Academic literature exploration
– Citation networks and text summarization
2. Case study: Tree visualization techniques
– Papers, patents, and trade press articles
3. Case study: Business intelligence news
– News term co-occurrence
4. Case study: Pennsylvania innovations
– Patents, funding, and locations
5. STICK approach
– Tracking innovations across papers, patents, news articles, and blog posts
Take Away Messages
• Easier scientific, data-driven innovation analysis:
– Automatic collection & processing of innovation data
– Easy access to visual analytic tools for finding clusters, trends, outliers
– Communities for sharing data, tools, & results
This work has been partially supported by NSF grants IIS 0705832 (ASE) and SBE 0915645 (STICK)