visual analytic tools for monitoring and understanding the emergence and evolution of innovations in...

Post on 25-Feb-2016

DESCRIPTION

Visual analytic tools for monitoring and understanding the emergence and evolution of innovations in science & technology. Links from this talk: bit.ly/stmwant. Cody Dunne, Dept. of Computer Science and Human-Computer Interaction Lab, University of Maryland, cdunne@cs.umd.edu

TRANSCRIPT

Visual analytic tools for monitoring and understanding the emergence and evolution of innovations in science & technology

Cody Dunne

Dept. of Computer Science and Human-Computer Interaction Lab, University of Maryland

cdunne@cs.umd.edu

OECD KNOWINNO Workshop, November 14-15, 2011, Alexandria, VA, USA

Links from this talk: bit.ly/stmwant

Outline

1. Academic literature exploration
2. Case study: Tree visualization techniques
3. Case study: Business intelligence news
4. Case study: Pennsylvania innovations
5. STICK approach

1. Academic literature exploration

Users are looking for:
1. Foundations
2. Emerging research topics
3. State of the art/open problems
4. Collaborations & relationships between communities
5. Field evolution
6. Easily understandable surveys

Action Science Explorer

User requirements
• Control over the paper collection
– Choose custom subset via query, then iteratively drill down, filter, & refine
• Overview either as visualization or text statistics
– Orient within subset
• Easy to understand metrics for identifying interesting papers
– Ranking & filtering
• Create groups & annotate with findings
– Organize discovery process
– Share results


Action Science Explorer

• Bibliometric lexical link mining to create a citation network and citation context

• Network clustering and multi-document summarization to extract key points

• Potent network analysis and visualization tools

www.cs.umd.edu/hcil/ase

2. Case study: Tree visualization

• Problem: Traditional 2D node-link diagrams of trees become too large
• Solutions:
– Treemaps: Nested Rectangles
– Cone Trees: 3D Interactive Animations
– Hyperbolic Trees: Focus + Context
• Measures:
– Papers, articles, patents, citations,…
– Press releases, blog posts, tweets,…
– Users, downloads, sales,…

Treemaps: nested rectangles

www.cs.umd.edu/hcil/treemap-history

Smartmoney MarketMap Feb 27, 2007

smartmoney.com/marketmap

Cone trees: 3D interactive animations

Robertson, G. G., Card, S. K., and Mackinlay, J. D., Information visualization using 3D interactive animation, Communications of the ACM, 36, 4 (1993), 51-71.

Robertson, G. G., Mackinlay, J. D., and Card, S. K., Cone trees: Animated 3D visualizations of hierarchical information, Proc. ACM SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York, (April 1991), 189-194.

Hyperbolic trees: focus & context

Lamping, J. and Rao, R., Laying out and visualizing large trees using a hyperbolic space, Proc. 7th Annual ACM symposium on User Interface Software and Technology, ACM Press, New York (1994), 13-14.

Lamping, J., Rao, R., and Pirolli, P., A focus+context technique based on hyperbolic geometry for visualizing large hierarchies, Proc. SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York (1995), 401-408.

Tree visualization publishing
TM=Treemaps, CT=Cone Trees, HT=Hyperbolic Trees
[Chart panels: Trade Press Articles; Academic Papers; Patents]

Tree visualization citations
TM=Treemaps, CT=Cone Trees, HT=Hyperbolic Trees
[Chart panels: Academic Papers; Patents]

Insights

• Emerging ideas may benefit from open access
• Compelling demonstrations with familiar applications help
• Many components to commercial success
• 2D visualizations w/spatial stability successful
• Term disambiguation & data cleaning are hard

Shneiderman, B., Dunne, C., Sharma, P. & Wang, P. (2011), "Innovation trajectories for information visualizations: Comparing treemaps, cone trees, and hyperbolic trees", Information Visualization. http://www.cs.umd.edu/localphp/hcil/tech-reports-search.php?number=2010-16

3. Case study: Business intelligence news
ProQuest 2000-2009

Term                               Frequency
hyperion                           3122
data mining                        889
business intelligence              434
knowledge mgmt.                    221
data warehouse                     207
data warehousing                   139
cognos                             112
competitive intelligence           86
electronic data itrch.             69
meta data                          69
decision support system            39
business process reengineering     36
data mart                          29
business analytics                 21
text mining                        19
predictive analytics               18
business performance mgmt          6
online analytical processing       5
knowledge discovery in database    1
ad hoc query                       1
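A frequency table like the one above could be approximated by counting phrase matches across article texts. The sketch below is only an illustration, not the method used in the talk: the function name `count_terms`, the sample articles, and the choice to count every case-insensitive occurrence (rather than, say, one hit per article) are all assumptions.

```python
import re
from collections import Counter

def count_terms(articles, terms):
    """Count case-insensitive whole-phrase occurrences of each term across all articles."""
    counts = Counter()
    for term in terms:
        # Word boundaries stop a phrase from matching inside a longer word.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for text in articles:
            counts[term] += len(pattern.findall(text))
    return counts

# Hypothetical sample articles, invented for illustration only.
articles = [
    "Data mining and business intelligence tools are converging.",
    "The vendor added text mining to its data mining suite.",
]
terms = ["data mining", "business intelligence", "text mining"]

for term, n in count_terms(articles, terms).most_common():
    print(f"{term}\t{n}")
```

Counting per article instead of per occurrence would only require replacing `len(pattern.findall(text))` with `1 if pattern.search(text) else 0`.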

PQ Business Intelligence 2000-2009
Co-occurrence of concepts with organizations
[Line chart: Frequency by Year]

Data Mining
• National Security Agency
• NSA
• White House
• FBI
• AT&T
• American Civil Liberties Union
• Electronic Frontier Foundation
• Dept. of Homeland Security
• CIA

Business Intelligence 2000-2009
Matrix showing co-occurrence of concepts and orgs.

Business Intelligence 2000-2009: (subset)

Business Intelligence 2000-2009: Data mining
• NSA
• CIA
• FBI
• White House
• Pentagon
• DOD
• DHS
• AT&T
• ACLU
• EFF
• Senate Judiciary Committee

Business Intelligence 2000-2009:
Tech1
• Google
• Yahoo
• Stanford
• Apple
Tech2
• IBM, Cognos
• Microsoft
• Oracle
Finance
• NASDAQ
• NYSE
• SEC
• NCR
• MicroStrategy

Business Intelligence 2000-2009:
• Air Force
• Army
• Navy
• GSA
• UMD*

Insights

• Useful groupings in PQ BI terms based on events and long-term collaborators

• Interactive line charts useful for looking at co-occurrence relationships over time

• Clustered heatmaps useful for overall co-occurrence relationships

stick.ischool.umd.edu


4. Case study: Pennsylvania innovations

• Innovation relationships during 1990
– State & federal funding
– Patents (both strong and weak ties)
– Location
• Connecting
– State & federal agencies
– Universities
– Firms
– Inventors

[Network diagrams; the legend below is repeated on each]
PatentTech
SBIR (federal)
PA DCED (state)
Related patent
2: Federal agency
3: Enterprise
5: Inventors
9: Universities
10: PA DCED
11/12: Phil/Pitt metro cnty
13-15: Semi-rural/rural cnty
17: Foreign countries
19: Other states
[Group labels on the later diagrams: Pittsburgh Metro, Westinghouse Electric, Pharmaceutical/Medical, No Location, Philadelphia, Navy]

Insights

• Meta-layouts useful for showing:
– Groups (clusters, attributes, manual)
– Relationships between them
• User comments
– "We've never been able to see anything like this"
– "This is going to be huge"

www.terpconnect.umd.edu/~dempy/

5. STICK approach

• NSF SciSIP Program
– Science of Science & Innovation Policy
– Goal: Scientific approach to science policy
• The STICK Project
– Science & Technology Innovation Concept Knowledge-base
– Goal: Monitoring, Understanding, and Advancing the (R)Evolution of Science & Technology Innovations

STICK approach cont…

• Scientific, data-driven way to track innovations
– Vs. current expert-based, time-consuming approaches (e.g., Gartner’s Hype Cycle, tire track diagrams)
• Includes both concept and product forms
– Study relationships between them
• Study the innovation ecosystem
– Organizations & people
– Both those producing & using innovations

stick.ischool.umd.edu

STICK Process (overview)

• Identify concepts
– Business intelligence, cloud computing, customer relationship management, health IT, web 2.0, electronic health records, biotech
• Query data sources
– News
– Dissertation
– Academic
– Patent
– Blogs
• Processing
– Automatic entity recognition
– Crowd-sourced verification
– Co-occurrence networks
• Visualizing & analyzing
– Overall statistics
– Co-occurrence networks
– Network evolution
• Sharing results

Process

1. Collecting
2. Processing
3. Visualizing & Analyzing
4. Collaborating

Cleaning

Collecting

Identify Concepts
• Begin with target concepts
– Business Intelligence
– Health IT
– Cloud Computing
– Customer Relationship Management
– Web 2.0
– Personal Health Records
– Nanotechnology
• Develop 20-30 sub concepts from domain experts, wikis

Data Sources
• News
• Dissertation
• Academic
• Patent
• Blogs

Collecting (2)

• Form & Expand Queries

ABS("customer relationship management" OR "customers relationship management" OR "customer relation management") OR TEXT(…) OR SUB(…) OR TI(…)

• Scrape Results
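The query-expansion step can be sketched as a small string builder. This is a minimal illustration under stated assumptions: the field codes ABS, TEXT, SUB, and TI come from the slide, but the slide elides what goes inside TEXT, SUB, and TI, so reusing the same variant list in every field is purely an assumption here. The function name `build_query` is hypothetical.

```python
def build_query(variants, fields=("ABS", "TEXT", "SUB", "TI")):
    """OR together quoted phrase variants, then repeat that clause for each search field.

    Assumption: every field receives the same variants; the slide does not confirm this.
    """
    clause = " OR ".join(f'"{v}"' for v in variants)
    return " OR ".join(f"{field}({clause})" for field in fields)

variants = [
    "customer relationship management",
    "customers relationship management",
    "customer relation management",
]

# Build only the ABS(...) clause, the one part the slide spells out in full.
print(build_query(variants, fields=("ABS",)))
```

Keeping the variants in a plain list makes it easy to add spelling variations suggested by domain experts without touching the query syntax.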

Processing

Automatic Entity Recognition
• BBN IdentiFinder

Crowd-Sourced Verification
• Extract most frequent 25%
• Assign to CrowdFlower
– Workers check organization names and sample sentences
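Selecting the most frequent 25% for verification might look like the sketch below. It assumes that "25%" means the top quarter of distinct entity names ranked by mention count, which the slide does not spell out. The function name `most_frequent_quarter` and the sample mentions are hypothetical, and no real IdentiFinder or CrowdFlower API is used.

```python
import math
from collections import Counter

def most_frequent_quarter(mentions, fraction=0.25):
    """Return the top `fraction` of distinct names, ranked by how often each is mentioned."""
    counts = Counter(mentions)
    # Round up so a small collection still yields at least one name.
    keep = math.ceil(len(counts) * fraction)
    return [name for name, _ in counts.most_common(keep)]

# Hypothetical recognizer output: one entry per mention found in the text.
mentions = ["IBM"] * 5 + ["Oracle"] * 3 + ["Cognos"] * 2 + ["NCR"]

to_verify = most_frequent_quarter(mentions)
print(to_verify)
```

Checking only the most frequent names concentrates worker effort on the entities that contribute most to the resulting network.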

Processing (2)

• Compute Co-Occurrence Networks
– Overall edge weights
– Slice by time to see network evolution
• Output
– CSV
– GraphML
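The co-occurrence step can be sketched with the standard library alone. This is an illustration under assumptions, not the STICK implementation: an edge weight is taken to be the number of documents mentioning both entities, time slices are calendar years, and all function names and sample records are hypothetical. Only CSV output is shown; GraphML is omitted.

```python
import csv
import io
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Map each unordered entity pair to the number of documents mentioning both."""
    weights = Counter()
    for entities in docs:
        # Deduplicate and sort so each pair has one canonical (a, b) form.
        for pair in combinations(sorted(set(entities)), 2):
            weights[pair] += 1
    return weights

def cooccurrence_by_year(dated_docs):
    """Build one co-occurrence Counter per year from (year, entities) records."""
    by_year = {}
    for year, entities in dated_docs:
        by_year.setdefault(year, []).append(entities)
    return {year: cooccurrence(docs) for year, docs in by_year.items()}

def to_csv(weights):
    """Serialize edge weights as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "target", "weight"])
    for (a, b), w in sorted(weights.items()):
        writer.writerow([a, b, w])
    return buf.getvalue()

# Hypothetical records: (publication year, entities recognized in one article).
dated_docs = [
    (2005, ["data mining", "NSA", "AT&T"]),
    (2006, ["data mining", "NSA"]),
    (2006, ["business intelligence", "IBM"]),
]

overall = cooccurrence(entities for _, entities in dated_docs)
sliced = cooccurrence_by_year(dated_docs)
print(to_csv(overall))
```

Comparing `sliced[2005]` with `sliced[2006]` gives a crude view of network evolution: edges that appear, disappear, or change weight from one year to the next.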

Visualizing & Analyzing

Spotfire
• Import CSV, Database
• Standard charts
• Multiple coordinated views
• Highly scalable

NodeXL
• CSV, Spigots, GraphML
• Automate feature
– Batch analysis & visualization
• Excel 2007/2010 template

Shared data & analysis repositories

stick.ischool.umd.edu/community

• Online Research Community
• Share data, tools, results
– Data & analysis downloads
– Spotfire Web Player
• Communication
• Co-creation, co-authoring

Ongoing Work

Collecting: Additional data sources and queries

Processing: Improving entity recognition accuracy

Visualizing & Analyzing: Visualizing network evolution
• Co-occurrence network sliced by time

Collaborating: Develop the STICK Open Community site
• Motivate user participation
• Improve the resources available
• Invitation-only testing

Outline

1. Academic literature exploration
– Citation networks and text summarization
2. Case study: Tree visualization techniques
– Papers, patents, and trade press articles
3. Case study: Business intelligence news
– News term co-occurrence
4. Case study: Pennsylvania innovations
– Patents, funding, and locations
5. STICK approach
– Tracking innovations across papers, patents, news articles, and blog posts

Take Away Messages

• Easier scientific, data-driven innovation analysis:
– Automatic collection & processing of innovation data
– Easy access to visual analytic tools for finding clusters, trends, outliers
– Communities for sharing data, tools, & results


This work has been partially supported by NSF grants IIS 0705832 (ASE) and SBE 0915645 (STICK)

Links from this talk:

bit.ly/stmwant
