Visual Analytics: Empowering Humans for Knowledge Discovery in Big Data. Dr. Nikola S. Nikolov, Department of CSIS, UL, 3rd December 2014


Page 1: Visual Analytics - Empowering Humans for Knowledge Discovery in Big Data (Lero Talk, 3rd December 2014)

Visual Analytics: Empowering Humans for Knowledge Discovery in Big Data

Dr. Nikola S. Nikolov
Department of CSIS, UL

3rd December 2014


Information Retrieval

Data Mining

Statistical Modelling

Knowledge Discovery

Machine Learning

Information Visualisation

Data Analytics

Data Visualisation

Predictive Analytics


Outline

O Visual Analytics Overview
  O What is visual analytics?
  O The visual analytics process and method
O Visual Analytics at CSIS
  O Network Visualisation
  O Geospatial Visual Analysis
  O Visual Text Mining
O Discussion


I. Visual Analytics


Exploding Digital Universe

http://www.emc.com/collateral/analyst-reports/idc-digital-universe-2014.pdf


Exploding Digital Universe

O Problem: Management of Big Data

O Popular solution: Apache Hadoop software library which "is a framework that allows for the distributed processing of large data sets across clusters of computers… It is designed to scale up from single servers to thousands of machines, each offering local computation and storage." (hadoop.apache.org)


Exploding Digital Universe

O Opportunity: Build more precise descriptive and predictive models of virtually all human activities and natural phenomena to…
  O satisfy our curiosity
  O take well-informed decisions
  O improve quality of life

O Solution: data mining, i.e. "extraction of implicit, previously unknown and potentially useful information from data." (Witten and Frank, 2005)


Data Mining

O Input:
  O single table with data, e.g. comma-separated values
  O data rows represent instances/examples of a particular concept that are independent of each other
O Output:
  O structural patterns (knowledge) discovered in the data
  O compact description of the concept
  O summary of the data in novel ways that are both understandable and useful to the data owner
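A minimal sketch of this input/output contract, assuming a toy comma-separated table and using the 1R (one-rule) learner as a stand-in technique. Neither the data nor the choice of 1R comes from the talk; the column names and values are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical toy table: each row is one independent instance of the
# concept "play outside?"; the last column is the class to be described.
CSV_TEXT = """outlook,windy,play
sunny,false,no
sunny,true,no
overcast,false,yes
rainy,false,yes
rainy,true,no
overcast,true,yes
rainy,false,yes
"""


def one_r(rows, target):
    """Return (attribute, rules, errors) for the best single-attribute rule set."""
    best = None
    attributes = [name for name in rows[0] if name != target]
    for attribute in attributes:
        # Count the class labels seen for every value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attribute]][row[target]] += 1
        # Each attribute value predicts its majority class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if rules[row[attribute]] != row[target])
        # Keep the attribute whose rules misclassify the fewest rows.
        if best is None or errors < best[2]:
            best = (attribute, rules, errors)
    return best


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
attribute, rules, errors = one_r(rows, "play")
print(attribute, rules, errors)
```

The returned rule set is a compact, readable description of the concept, which is the kind of structural pattern the output bullets refer to.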


Data Mining

O Predictive/Supervised
  O Classification techniques
  O Numeric prediction techniques

O Descriptive/Unsupervised
  O Association learning techniques
  O Clustering techniques

AI approach

HCI approach


Information Retrieval


Visual Analytics

O Visual Analytics is the science of analytical reasoning supported by a highly interactive visual interface (Thomas & Cook, 2005)

Minority Report, 2002 (Twentieth Century Fox)


[Figure: visual analytics process diagram with the components Data, Visualisation, Models and Knowledge, and links labelled transformation, user interaction, refine and feedback loop]


Visualisation Problems

O Intuitively, the more data you have, the better…
O Problems when visualising big data:
  O Clutter
  O Performance
  O Information loss
  O Limited cognition


Visual Analytics Method

O Visual Information Seeking Mantra (Shneiderman, 1996)
  O Overview First
  O Zoom and Filter
  O Details on Demand

O Visual Analytics Mantra (Keim et al., 2010)
  O Analyse First
  O Show the Important
  O Zoom, Filter and Analyse Further
  O Details on Demand
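One way to read the four steps of the Visual Analytics Mantra is as a small pipeline over the data. The sketch below is only an illustration under that reading: the records, field layout and function names are all hypothetical, and a real system would drive each step from user interaction rather than from function calls.

```python
from collections import Counter

# Hypothetical event log: (region, category, value). Illustrative data only.
RECORDS = [
    ("north", "fraud", 120), ("north", "fraud", 300), ("north", "theft", 40),
    ("south", "theft", 60), ("south", "theft", 55), ("east", "fraud", 10),
]


def analyse_first(records):
    """Analyse first: aggregate the raw data before drawing anything."""
    totals = Counter()
    for region, _category, value in records:
        totals[region] += value
    return totals


def show_the_important(totals, k=2):
    """Show the important: keep only the k largest aggregates for display."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]


def zoom_filter_analyse(records, region):
    """Zoom, filter and analyse further: re-aggregate inside one selection."""
    totals = Counter()
    for rec_region, category, value in records:
        if rec_region == region:
            totals[category] += value
    return totals


def details_on_demand(records, region, category):
    """Details on demand: return the raw records behind one displayed mark."""
    return [rec for rec in records if rec[0] == region and rec[1] == category]


overview = show_the_important(analyse_first(RECORDS))
print(overview)
print(zoom_filter_analyse(RECORDS, "north"))
print(details_on_demand(RECORDS, "north", "fraud"))
```

The difference from the Visual Information Seeking Mantra is visible in the first function: the data is summarised before anything is shown, so the overview never has to render every record.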


Visual Analytics Triangle

[Figure: triangle diagram relating Visual Analytics to Visualisation, Interaction and Data Analysis]


II. Our Work


Visual Analytics at CSIS

O Network analysis: social, biological, technological networks

O Geospatial analytics

O [Personalised] Information Retrieval

O Text Mining/Sentiment Analysis


Network Visualisation

O Also known as Graph Drawing
O Probably the earliest and most successful branch of Information Visualisation to be scientifically researched (since the 1980s).
O The Information Visualisation community appeared a bit later
  O emerged "from research in human-computer interaction, computer science, graphics, visual design, psychology and business methods". (Bederson and Shneiderman, 2003)


Network Visualisation

O Methods (two very successful among many):
  O Force-directed drawing
    O the graph is modelled as a mechanical system of particles with forces of attraction and repulsion between them
    O let the vertices/particles move so that the system reaches mechanical equilibrium
  O Hierarchical drawing (Sugiyama method)
    O distribute vertices among multiple levels
    O order vertices within each level
    O fine-tune the positions of the vertices
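The force-directed idea can be sketched in a few lines. The version below is a simplified spring embedder in the style of Fruchterman and Reingold, chosen here for illustration and not taken from the talk: every pair of vertices repels with magnitude k²/d, every edge attracts with magnitude d²/k, and a shrinking movement limit lets the system settle. The function name, parameters and default values are assumptions.

```python
import math
import random


def force_directed_layout(vertices, edges, iterations=200, k=1.0, step=0.05, seed=0):
    """Return {vertex: (x, y)} after a simple spring-embedder simulation.

    `vertices` is a list of hashable labels, `edges` a list of (u, v) pairs.
    """
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for v in vertices}
    for it in range(iterations):
        force = {v: [0.0, 0.0] for v in vertices}
        # Repulsion between every pair of vertices, magnitude k^2 / d.
        for i, u in enumerate(vertices):
            for v in vertices[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                force[u][0] += f * dx / d
                force[u][1] += f * dy / d
                force[v][0] -= f * dx / d
                force[v][1] -= f * dy / d
        # Attraction along every edge, magnitude d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            force[u][0] -= f * dx / d
            force[u][1] -= f * dy / d
            force[v][0] += f * dx / d
            force[v][1] += f * dy / d
        # Move each vertex along its net force, capped by a shrinking limit.
        limit = 0.5 * (1.0 - it / iterations)
        for v in vertices:
            mx, my = step * force[v][0], step * force[v][1]
            length = math.hypot(mx, my)
            if length > 0.0:
                scale = min(length, limit) / length
                pos[v][0] += mx * scale
                pos[v][1] += my * scale
    return {v: tuple(p) for v, p in pos.items()}


# Usage: lay out a 4-cycle and print the resulting coordinates.
square = force_directed_layout(["a", "b", "c", "d"],
                               [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
for vertex, (x, y) in square.items():
    print(vertex, round(x, 2), round(y, 2))
```

With these two force laws a single edge comes to rest at length k, which is why k acts as the preferred edge length. The all-pairs repulsion loop is quadratic in the number of vertices, so this sketch is only suitable for small graphs.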


Synchronisation Dynamics-Driven Graph Drawing

O Work of Farshad Ghassemi Toosi (started PhD in May 2013)

O General idea:
  O Assign random scalar values (dynamic values) to all vertices of a graph
  O Simulate synchronisation dynamics on the graph according to a variation of the Kuramoto model
  O Use the evolution of the dynamic values to compute a layout of the graph
O Initial results published in the proceedings of the International Symposium on Graph Drawing 2014
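The first two steps can be illustrated with the standard Kuramoto model on a graph. This is a sketch under stated assumptions: it uses the textbook model with identical oscillators, not the variation used in this work, and it stops short of the layout step because the talk does not say how the evolving values are turned into coordinates. The example graph, function names and parameter values are hypothetical.

```python
import cmath
import math
import random


def simulate_kuramoto(adjacency, coupling=1.0, dt=0.05, steps=2000, seed=0):
    """Euler-integrate d(theta_i)/dt = K * sum_j sin(theta_j - theta_i),
    where j ranges over the neighbours of vertex i (natural frequencies
    are assumed to be zero). Returns the list of phase dictionaries."""
    rng = random.Random(seed)
    # Step 1: a random scalar value (phase) on every vertex.
    theta = {v: rng.uniform(0.0, 2.0 * math.pi) for v in adjacency}
    history = [dict(theta)]
    # Step 2: let the values evolve under the coupling along the edges.
    for _ in range(steps):
        delta = {
            v: coupling * sum(math.sin(theta[u] - theta[v]) for u in adjacency[v])
            for v in adjacency
        }
        theta = {v: theta[v] + dt * delta[v] for v in adjacency}
        history.append(dict(theta))
    return history


def order_parameter(theta):
    """Return r in [0, 1]; r close to 1 means the phases are synchronised."""
    return abs(sum(cmath.exp(1j * t) for t in theta.values())) / len(theta)


# Hypothetical example graph: two triangles joined by a single edge.
GRAPH = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}

history = simulate_kuramoto(GRAPH)
print("r at start:", round(order_parameter(history[0]), 3))
print("r at end:  ", round(order_parameter(history[-1]), 3))
```

The recorded `history` is the "evolution of the dynamic values" mentioned above. Densely connected groups of vertices tend to synchronise among themselves before the whole graph does, which is the property a layout method could exploit.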


Synchronisation Dynamics-Driven Graph Drawing

O Properties of our layouts:
  O Exact circular shape
  O Even distribution of vertices over the drawing area
O Synchronisation reveals the structure of a complex network at various scales (Arenas, 2006)
O Thus, synchronisation-driven visualisation can be particularly suitable for visual analytics


Visualisation of GitHub Data

O Work of Cathal Cronin (FYP, 2013/14)
  O http://language-connectivity.herokuapp.com/

O Goal: visual analytics solution for monitoring how popular certain programming languages are and highlighting what combinations of programming languages are most used amongst the GitHub community.
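The two quantities named in the goal, language popularity and language combinations, reduce to simple counting once each repository is represented by the set of languages it uses. The sketch below assumes that representation and uses made-up sample data; it does not call the GitHub API and is not the project's implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: the set of languages detected in each repository.
REPOSITORIES = [
    {"JavaScript", "CSS", "HTML"},
    {"Python", "JavaScript"},
    {"Python", "C"},
    {"JavaScript", "CSS"},
]


def language_popularity(repositories):
    """Count in how many repositories each language appears."""
    return Counter(lang for repo in repositories for lang in repo)


def language_pairs(repositories):
    """Count how often two languages are used together in one repository."""
    pairs = Counter()
    for repo in repositories:
        # Sorting gives every pair a canonical (alphabetical) order.
        for a, b in combinations(sorted(repo), 2):
            pairs[(a, b)] += 1
    return pairs


popularity = language_popularity(REPOSITORIES)
pairs = language_pairs(REPOSITORIES)
print(popularity.most_common(3))
print(pairs.most_common(3))
```

The pair counts form a weighted graph with languages as vertices, so they can be passed directly to a network visualisation.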


Geospatial Visual Analytics

O Work of Eimhear O'Brien (started PhD in October 2013)

O GIS analysis is a process for looking at geographic patterns in your data and at the relationships between features (Mitchell, 2005)

O Goal: Novel visual analytics solution for geospatial data


Geospatial Visual Analytics

O Work to date
  O Survey on big data management
  O Survey on network visualisation algorithms for cartography
  O Identified an algorithm for a pilot study: MapSets
  O Dataset selection for a pilot experiment
    O Irish Soil data (public dataset at teagasc.ie)
    O High number of features (multidimensional)
    O High volume and variety
    O Suitable for assessing the MapSets technique for cluster visualisation


MapSet Steps

Algorithmic pipeline of MapSets (Efrat et al., 2014)
http://www.cs.arizona.edu/~kobourov/mapsets.pdf


Visual Text Mining

O Azalden Alakrot (started PhD in September 2014)
O Performed an initial survey on text mining
O Initial goal: Visual analytics solution for crime detection in online conversations (emails, comments, tweets, etc.).
  O Possibly focusing on cyberbullying detection as a form of crime.


Visualise Text?

O Text visualisations:
  O Network of words (textexture.com)
  O Word cloud (wordle.net)
  O ThemeRiver™: thematic variations over time within a large collection of documents (Havre et al., 2000)
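The analysis behind a word cloud is a word-frequency count that is then mapped to font sizes. The sketch below assumes a small illustrative stop-word list and a linear size scale; it does not reproduce how wordle.net or any other tool computes sizes or positions words.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "of", "and", "a", "in", "for", "is", "to", "by"}


def word_frequencies(text, top=5):
    """Return the `top` most frequent non-stop-words as (word, count) pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top)


def font_sizes(frequencies, smallest=10, largest=40):
    """Scale counts linearly so the most frequent word gets the largest size."""
    max_count = max(count for _word, count in frequencies)
    return {
        word: smallest + (largest - smallest) * count / max_count
        for word, count in frequencies
    }


sample = ("Visual Analytics is the science of analytical reasoning "
          "supported by a highly interactive visual interface")
top_words = word_frequencies(sample, top=3)
print(top_words)
print(font_sizes(top_words))
```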


Word cloud for this presentation


Bibliography

O Eades, P.: On the future of graph drawing. Invited talk at the 18th International Symposium on Graph Drawing (September 24, 2010), http://www.graphdrawing.org/gd2010/invited.html.

O Keim, D., Mansmann, F. and Thomas, J.: Visual analytics: how much visualization and how much analytics? SIGKDD Explor. Newsl. 11, 2 (May 2010), pp. 5-8.

O Shneiderman, B.: The eyes have it: a task by data type taxonomy for information visualizations. Proceedings of the IEEE Symposium on Visual Languages (1996), pp. 336-343.

O Tamassia, R. (ed.): Handbook of Graph Drawing and Visualization, vol. 81 of Discrete Mathematics and Its Applications. Chapman and Hall/CRC (2013).

O Thomas, J., Cook, K.: Illuminating the Path: Research and Development Agenda for Visual Analytics. IEEE-Press (2005).

O Witten, I. H., Frank, E. and Hall, M. A.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann (2011).