Post on 29-Dec-2015
PerCon: A Personal Digital Library for Heterogeneous Data

Center for the Study of Digital Libraries

Department of Computer Science & Engineering

Texas A&M University


Outline

• Background

• Motivation

• Objective

• Approach

• System Evaluation

• Results

• Conclusion

• Appendix


Buzzwords in CS

Big Data

Hadoop

Distributed computing

Machine Learning

Cloud

Multithread

Multicore

Web services

Social network

Platform

Crowdsourcing

Information retrieval

Agile Algorithm

Data science

Data mining


Background

• Data explosion / data-intensive scientific discovery

  • Interdisciplinary research

  • Advances in devices/sensors and software

  • …

• More data of more data types

• Demands on collecting, managing, and interpreting heterogeneous data


Motivation

• Data management and analysis

  • Domain-specific representations, visualizations, interfaces, tools, etc.

  • Separate “silos” for data of each data type

• Need for a heterogeneous data environment

  • Ingesting, processing, and indexing data

  • Searching, browsing, visualizing, and annotating data

  • Representing and sharing information and knowledge

  • Facilitating interactions between a user and a system with heterogeneous data


Objective

• A digital library that supports the collection, management, and interpretation of unanticipated collections of data types

• PerCon: Personalized and Contextual Data Environment

  • A personal or small-group digital library system for data management and analysis


PerCon: Designed Workflow

[Figure: workflow diagram. Components: User, User Interface, Application, Query Parser, Data/Query Processing, Data Ingestion, Data Processing, Data Analysis, Database Repository, Web Server, Heterogeneous Scientific Dataset, System Resource, User Information/Knowledge Space. Feature/knowledge space: domain-dependent, cross-domain, and personalized feature spaces. Timestamps: T1 to T4. Legend: data flow, query flow.]


System Architecture

Resource Layer: Original data objects, computed/filtered datasets, and metadata.

Middleware Layer: Data ingestion, access, automated analysis, visualization, workspace, etc.

Application Layer: User interfaces, external systems access


System Interfaces

[Figure: interface screenshot. Labeled regions: Menu & Toolbars, Visual Workspace, Repository Viewer, Suggestion History]


Workspace

• Exploration and representation of data with visual and spatial attributes

• Translation of data into information in multiple representations

• Knowledge discovery from information

• Data object model: multiple applicable data visualizations

System Base (Object) Panel

- Visual and spatial attributes

- User expression for data interpretation

User Application (Object) Panel

- Individual visualization/application

- Application-specific interaction


Integrated Visual Workspace

• Interoperation with history mechanism

• Event records used for mixed-initiative interaction

• Representation of any Java application as individual data objects


Mixed-Initiative Interaction

• User-control system

  • Menu, toolbar, button, etc.

• System-control system

  • Automated system

  • e.g., call center

• Mixed-control system

  • Turn-taking and alternating control

  • High computation + high interpretation

  • Recommender system


Recommendation in PerCon

• Inference of user interests based on user behaviors/events/tasks/goals

• Location and recommendation of related data within the current collection


Procedures

1. Building feature space (understanding relationships in data)

2. Recording workspace events (history)

3. Inferring user interests using probabilistic networks

4. Selecting relevant data

5. Recording user’s acceptance/rejection

6. Adopting user feedback
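
The six steps above can be read as a feedback loop. The following is a minimal sketch of that loop, not PerCon's actual implementation. Everything in it is an assumption made for illustration: the toy series, the use of absolute Pearson correlation as the relationship measure, the event-share stand-in for step 3 (the slides use a probabilistic network instead), and the multiplicative weight update for feedback.

```python
from collections import Counter
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Step 1: feature space as pairwise similarity between data objects (toy values).
data = {
    "river_level":   [1.0, 1.2, 2.5, 2.0, 1.1],
    "precipitation": [0.0, 0.5, 3.0, 1.0, 0.0],
    "temperature":   [30.0, 29.0, 24.0, 26.0, 31.0],
}
similarity = {(a, b): abs(pearson(data[a], data[b]))
              for a in data for b in data if a != b}

# Step 2: workspace events recorded as (event type, data object) pairs.
history = [("ADD_SYMBOL", "river_level"), ("MOVE_SYMBOL", "river_level")]

# Step 3 (simplified stand-in): interest = share of events touching each object.
counts = Counter(obj for _, obj in history)
interest = {obj: counts[obj] / len(history) for obj in data}

# State for step 6: per-object weights, adjusted by user feedback.
weights = {obj: 1.0 for obj in data}

def suggest(interest, similarity, weights, in_workspace):
    """Step 4: pick the unseen object most related to the objects of interest."""
    scores = {}
    for cand in weights:
        if cand in in_workspace:
            continue
        scores[cand] = weights[cand] * sum(
            interest[src] * similarity[(src, cand)]
            for src in interest if src != cand)
    return max(scores, key=scores.get)

def record_feedback(weights, obj, accepted, rate=0.5):
    """Steps 5 and 6: raise the weight on acceptance, lower it on rejection."""
    weights[obj] *= (1 + rate) if accepted else (1 - rate)

in_workspace = {obj for _, obj in history}
first = suggest(interest, similarity, weights, in_workspace)
record_feedback(weights, first, accepted=False)   # the user rejects the suggestion
second = suggest(interest, similarity, weights, in_workspace)
print(first, second)
```

With these toy values the rejected object drops below the other candidate, so the second suggestion differs from the first. A real system would also need to decay old events and persist the weights across sessions.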


Data Analysis Agent

• Internal process for mixed-initiative interaction

[Figure: data analysis agent diagram. Labels: User, Agent, Dataset, Metadata, Index, Data Matrix, Data ID, Type, Date, Relationship (Similarity, Distance, etc.), Inference (Probabilistic) Network, Training, Inference, Processing, Update, Suggestion, Suggestion Request, Suggestion History, Feedback, Feedback Recording, User Events, User Activity / History, Visual Attributes, Spatial Attributes, Annotation, Query Exploration, Workspace Monitoring, Mixed-initiative interaction]


Probabilistic Inference Network

Hidden variable

• Data source(s) that a user is interested in: River level, River discharge, Precipitation, Temperature, Humidity, …

Observable variables

• Activity in workspace: ADD_SYMBOL, DELETE_SYMBOL, MOVE_SYMBOL, RESIZE_SYMBOL, CHANGE_BORDER_COLOR, …

• Data relationships between data objects explored: River level – River level, Precipitation – Precipitation, River level – Precipitation, …

• Data application: Plot, Timeline, Multimedia player, XML viewer, DB viewer, Calendar viewer, …

• Visual attribute 1 (Background color): Blue, Red, Black, Green, …

• Visual attribute 2 (Border color): Blue, Red, Black, Green, …

• Data source creation: River level, River discharge, Precipitation, Temperature, Humidity, …

• Annotation: Yes, No
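
The slide names the variables but not the network structure or its parameters. As a rough illustration only, the sketch below assumes the simplest possible structure: a naive Bayes model in which the hidden variable (the data source of interest) is the class and each observable variable is conditionally independent given it. The class name `InterestModel`, the Laplace smoothing constant `alpha`, and the training records are hypothetical, not taken from PerCon.

```python
from collections import defaultdict

class InterestModel:
    """Naive Bayes stand-in: one hidden variable, independent observables."""

    def __init__(self, sources, domains, alpha=1.0):
        self.sources = sources          # possible values of the hidden variable
        self.domains = domains          # observable name -> list of possible values
        self.alpha = alpha              # Laplace smoothing constant (assumed)
        self.prior = defaultdict(float) # source -> count
        self.cond = defaultdict(float)  # (source, variable, value) -> count

    def train(self, source, observation):
        """Count one labeled record, e.g. from an accepted suggestion."""
        self.prior[source] += 1
        for var, val in observation.items():
            self.cond[(source, var, val)] += 1

    def posterior(self, observation):
        """Return P(source | observation), normalized over all sources."""
        total = sum(self.prior.values())
        scores = {}
        for s in self.sources:
            p = (self.prior[s] + self.alpha) / (total + self.alpha * len(self.sources))
            for var, val in observation.items():
                num = self.cond[(s, var, val)] + self.alpha
                den = self.prior[s] + self.alpha * len(self.domains[var])
                p *= num / den
            scores[s] = p
        z = sum(scores.values())
        return {s: p / z for s, p in scores.items()}

sources = ["river_level", "precipitation", "temperature"]
domains = {
    "activity": ["ADD_SYMBOL", "DELETE_SYMBOL", "MOVE_SYMBOL"],
    "application": ["Plot", "Timeline"],
    "annotation": ["Yes", "No"],
}
model = InterestModel(sources, domains)
model.train("river_level", {"activity": "ADD_SYMBOL", "application": "Plot", "annotation": "Yes"})
model.train("river_level", {"activity": "MOVE_SYMBOL", "application": "Plot", "annotation": "No"})
model.train("precipitation", {"activity": "ADD_SYMBOL", "application": "Timeline", "annotation": "No"})

post = model.posterior({"activity": "ADD_SYMBOL", "application": "Plot"})
most_likely = max(post, key=post.get)
print(most_likely, round(post[most_likely], 3))
```

Observables that are missing from a query are simply left out of the product, which is convenient when only some workspace events have been seen so far.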


System Evaluation - User Study

• Hypotheses

  • H1: The visual workspace helps a user manage data and translate data into knowledge about the domain

  • H2: Mixed-initiative recommendations improve a user’s ability to explore and analyze data

• 24 participants

  • 1 undergraduate, 4 master’s students, 16 PhD students, and 3 postdoctoral researchers

  • Ages 24 to 36

• Various disciplines

  • Computer science, computer engineering, electrical engineering, soil hydrology, biomedical engineering, industrial engineering, and management information systems


Domain Data for Participant Analysis

• Two years of weather and river data (from 2011 to 2013)

  • Weather data from NOAA

    • Temperature, precipitation, relative humidity, wind speed, and wet bulb temperature

  • River data from the Brazos River Authority in Texas

    • River level and discharge

• Two equivalent “weather and river” datasets

  • Dataset 1 collected from College Station, Waco, and Temple

  • Dataset 2 collected from South Bend, Seymour, and Fort Griffin

[Figure labels: Upstream, Downstream]


Tasks

• Task 1 (20 minutes): Classifying and organizing data

  • Organize and classify river level and precipitation data according to common trends, quantities, durations, or other user-perceived criteria.

• Task 2 (10 minutes): Investigating and identifying data correlation

  • Investigate which weather factor(s) affect river level, and how.

  • Investigate how rivers at different places are correlated.

• Task 3 (5 minutes): Interpreting and estimating river data events/causes

  • Estimate the (average) time delay in the flow, if you find any.

  • Explain the changes, considering weather factors and other river stream flows.
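
Participants carried out these tasks by exploring the data in PerCon; the slides do not describe any automated delay estimate. Purely as an illustration of what the delay estimate in Task 3 could look like numerically, the sketch below picks the lag at which a downstream series correlates best with a delayed upstream series. The function names and the toy series are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def best_lag(upstream, downstream, max_lag):
    """Return (lag, correlation) for the lag, in samples, that maximizes
    the correlation between upstream[t] and downstream[t + lag]."""
    best, best_r = 0, float("-inf")
    for lag in range(max_lag + 1):
        x = upstream[:len(upstream) - lag]   # drop the tail with no delayed partner
        y = downstream[lag:]                 # drop the head with no earlier partner
        r = pearson(x, y)
        if r > best_r:
            best, best_r = lag, r
    return best, best_r

# Toy series: the downstream gauge repeats the upstream signal two samples later.
upstream = [1, 1, 5, 2, 1, 1, 4, 1, 1, 1]
downstream = [1, 1] + upstream[:-2]

lag, r = best_lag(upstream, downstream, max_lag=4)
print(lag, round(r, 3))
```

On real gauge data the peak is rarely this clean, so it helps to inspect the whole correlation-versus-lag curve rather than trust a single maximum.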


System Conditions

                          Without mixed-initiative recommendation   With mixed-initiative recommendation
Without visual workspace  Configuration 1                           Configuration 2
With visual workspace     Configuration 3                           Configuration 4


Task Procedure (120 minutes)

• Participants learn with a user manual and a 10-minute video clip

• 5-minute trial of how to use PerCon

• Tasks (Tasks 1, 2, and 3)

Group    Subgroup  Tasks with dataset 1  Tasks with dataset 2
Group 1  A         Configuration 1       Configuration 2
         B         Configuration 2       Configuration 1
Group 2  A         Configuration 1       Configuration 3
         B         Configuration 3       Configuration 1
Group 3  A         Configuration 1       Configuration 4
         B         Configuration 4       Configuration 1
Group 4  A         Configuration 2       Configuration 3
         B         Configuration 3       Configuration 2
Group 5  A         Configuration 2       Configuration 4
         B         Configuration 4       Configuration 2
Group 6  A         Configuration 3       Configuration 4
         B         Configuration 4       Configuration 3
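
The assignment table pairs every two of the four configurations and runs each pair in both orders. A short sketch that regenerates it is shown below; the function name `counterbalanced_schedule` is hypothetical, and the slides do not state how the 24 participants were distributed over the 12 subgroups.

```python
from itertools import combinations

def counterbalanced_schedule(configurations):
    """Pair every two configurations; subgroup A and B swap the order."""
    schedule = []
    pairs = combinations(configurations, 2)
    for group, (first, second) in enumerate(pairs, start=1):
        schedule.append((group, "A", first, second))  # dataset 1, then dataset 2
        schedule.append((group, "B", second, first))  # same pair, reversed
    return schedule

schedule = counterbalanced_schedule([1, 2, 3, 4])
for group, subgroup, with_dataset_1, with_dataset_2 in schedule:
    print(f"Group {group}{subgroup}: "
          f"Configuration {with_dataset_1} on dataset 1, "
          f"Configuration {with_dataset_2} on dataset 2")
```

Four configurations give six unordered pairs, so the schedule has twelve rows. Reversing the order within each pair balances any effect of which dataset or configuration a participant sees first.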


Responses to Questions Related to Workspace

[Figure: bar chart “Visual Workspace”; x-axis: Q1 to Q4; y-axis: Score (0 to 7); series: Config 1, Config 2, Config 3, Config 4]

Statements

Q1: I had enough support to understand data content in the workspace

Q2: I had enough support to express relationships in the way I wanted

Q3: It was easy to interpret and characterize given/created objects in the workspace

Q4: I had enough support to effortlessly/quickly browse and select data


Responses to Questions Related to Recommendations

[Figure: bar chart “Mixed-initiative interaction”; x-axis: Q5 to Q8; y-axis: Score (0 to 7); series: Config 2, Config 4]

Statements

Q5: I was satisfied with the data suggested

Q6: I was satisfied with the suggestion request

Q7: I had enough support to find and interpret data I was interested in

Q8: I had enough support to find correlations within the dataset


Participant Work Practices

[Figure: bar chart “Avg. number of data objects classified/analyzed”; x-axis: Config 1 to Config 4; y-axis: Avg. # of data objects (0 to 50)]


Distribution of Activities

• Ordering of user events in repository browser and workspace shows distinct patterns of work

[Figure: user event sequences for User 1 to User 4; panels: Config. 1 (without visual workspace) and Config. 3 (with visual workspace)]


A Sequence of Recommendation Events

[Figure: recommendation event sequences for User 1 to User 12; legend: User-Requested, System-Triggered]


History in PerCon

Storing and processing data

Visualizing data

Managing and analyzing data

Human activities of locating, annotating, and interpreting data / Data platform


Conclusion

• Workspace has a large effect on data (analysis) practice

• Recommendation overcomes the difficulty of locating data for users

• Visual workspace

  • Facilitates information representation

  • Aids in identification and interpretation of relationships between datasets

  • Helps users learn, solve problems, and make decisions

• Mixed-initiative interaction (recommendation)

  • Encourages users to explore data

  • Leads users to identify more evidence of correlation among data streams

  • Is valuable for data analysis


Future Work

• Improvement of workspace interactions

• More dynamic and tailorable data visualization

• Expansion of recommendation subsystem

• Cross-domain/cross-data-type similarities in the workspace

• Various similarity metrics

• Recommendation algorithms

• Exploration of the use of PerCon

• In new domains

• With new user communities


Questions?