The Role of Data in IS Research


Upload: frank-hopfgartner

Post on 16-Apr-2017


Page 1: The Role of Data in IS Research


The Role of Data in IS Research

Frank Hopfgartner

University of Glasgow

@OkapiBM25

Page 2: The Role of Data in IS Research

Question

Do you use a dataset for your research?

Page 3: The Role of Data in IS Research

Intended Learning Outcome

• By the end of this session, you will be able to

– Explain the need for datasets for scientific research

– List components that comprise test collections

– Identify appropriate datasets to answer research hypotheses

– Create your own test collections

Page 4: The Role of Data in IS Research

Outline

• Importance of Data

• Getting Data

• Using Datasets for IS Research

Page 5: The Role of Data in IS Research

Why do we use data?

Because it helps us to understand our world

Page 6: The Role of Data in IS Research

Example: Ngram Viewer

Source: https://books.google.com/ngrams

Page 7: The Role of Data in IS Research

Example: Online publishing

D. Corney, D. Albakour, M. Martinez, S. Moussa. “What do a Million News Articles look like?” in Proc. NewsIR’16, pp. 42-47, 2016.

Sampling from over 93,000 different news sources recorded in September 2015

Large-scale main News outlets

Single-author Blogs

Page 8: The Role of Data in IS Research

Summarising: Types of data

Quantitative & Qualitative

Numeric and Textual

Comparison (like with like)

Context

Point-in-time

Longitudinal (series and interval)

Page 9: The Role of Data in IS Research

Outline

• Importance of Data

• Getting Data

• Using Datasets for IS Research

Page 10: The Role of Data in IS Research

Example: Opening UK Government

Source: https://data.gov.uk/

Page 11: The Role of Data in IS Research

Example: UK Data Archive

Over 5,000 data collections

Largely economic and social

Founded in 1967

Office for National Statistics

Medical Research Council

http://www.data-archive.ac.uk/

Page 12: The Role of Data in IS Research

Example: UK Data Service

https://www.ukdataservice.ac.uk

large-scale government surveys

international macrodata

business microdata

qualitative studies

census data from 1971 to 2011

Page 13: The Role of Data in IS Research

Non-Public Data

Example: Google Trends

https://www.google.com/trends/home/all/GB

Page 14: The Role of Data in IS Research

Question

But what if I want to analyse non-public data?

Page 15: The Role of Data in IS Research

Some people just hack…

http://www.theguardian.com/news/2016/apr/03/what-you-need-to-know-about-the-panama-papers

Disclaimer: This is not an appeal to perform any illegal activities.

Page 16: The Role of Data in IS Research

Create your own data

• Record data, e.g.,

– Log files of users using information access systems

– Sensor records

– Digitise documents (accepting copyright)

– …

Page 17: The Role of Data in IS Research

Example: Campus-wide IPTV provider

• Campus wide IPTV provider

• Live and VoD content

• 16 genres

• 33 channels

• Over 7000 different programme names

• Over 500 unique users

J. Yuan, F. Sivrikaya, F. Hopfgartner, A. Lommatzsch, M. Mu. Context-Aware LDA: Balancing Relevance and Diversity in TV Content Recommenders. In Proc. RecSysTV workshop, Vienna, Austria, 2015.

Page 18: The Role of Data in IS Research

Example: Log user interaction data

[Figure: Category Distribution. Axes: time of day (0 to 22), day of week (1 to 7) and categories; colour scale: categories chosen count (20 to 140). Categories shown: ARTS, CHILDRENS, COMEDY, DRAMA, ENTERTAINMENT, FACTUAL, FILM, LEARNING, LIFESTYLE, MUSIC, NEWS, NULL, RELIGIONANDETHICS, SPORT, SPORTS, WEATHER.]

J. Yuan, F. Sivrikaya, F. Hopfgartner, A. Lommatzsch, M. Mu. Context-Aware LDA: Balancing Relevance and Diversity in TV Content Recommenders. In Proc. RecSysTV workshop, Vienna, Austria, 2015.

Page 19: The Role of Data in IS Research

Example: Video retrieval platform

F. Hopfgartner, D. Scott, H. Wang, Y. Yang, Z. Zhang, M. Zhou, C. Gurrin. Helping the Helpers: How Video Retrieval Can Assist Special Interest Groups. In Proc. MMM'13: 19th International Conference on Multimedia Modeling, pp. 493-495, 2013.

Page 20: The Role of Data in IS Research

F. Hopfgartner and J. M. Jose. Semantic User Profiling Techniques for personalised multimedia recommendation. Multimedia Systems 14(4-5):255-274, 2010.

F. Hopfgartner and J. M. Jose. An experimental evaluation of ontology-based user profiles. Multimedia Tools and Applications 73(2):1029-1051, 2014.

Page 21: The Role of Data in IS Research

Summarising: What do I need to consider?

Documentation

Terms of deposit

Permissions and re-use

Software

Methodology

Time

Place

Sampling

Data collection

Editorial control

Classification

Coding


Page 22: The Role of Data in IS Research

Outline

• Importance of Data

• Getting Data

• Using Datasets for IS Research

Page 23: The Role of Data in IS Research

Use Case: Evaluation of Information Access Systems

Information Access System

Input

Output

Page 24: The Role of Data in IS Research

Examples: Web Search Engines

Page 25: The Role of Data in IS Research

Example: Social Media Search Engines

Page 26: The Role of Data in IS Research

Example: Product Search Engines

Page 27: The Role of Data in IS Research

Examples: Multimedia Search Engines

Page 28: The Role of Data in IS Research

Example: Libraries

Page 29: The Role of Data in IS Research

How do we evaluate information access systems?

Test collection

– Document collection

– Topic set

– Relevance assessments

But how can we compare with state-of-the-art?

System A

System B
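To make the three components concrete, here is a minimal sketch of how a test collection can be used to compare two systems. Everything in it is an assumption for illustration: the document and topic identifiers, the two rankings, and the choice of precision at k as the metric are not taken from the slides.

```python
from statistics import mean

# Hypothetical toy test collection (all identifiers are made up).
documents = {"d1": "text 1", "d2": "text 2", "d3": "text 3", "d4": "text 4"}
topics = {"t1": "first information need", "t2": "second information need"}
# Relevance assessments: for each topic, the set of relevant documents.
qrels = {"t1": {"d1", "d3"}, "t2": {"d2"}}


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are judged relevant."""
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / k


def mean_precision_at_k(run, qrels, k):
    """Average precision at k over all topics that have assessments."""
    return mean([precision_at_k(run[topic], qrels[topic], k) for topic in qrels])


# Hypothetical ranked output of two systems for the same topics.
system_a = {"t1": ["d1", "d3", "d2", "d4"], "t2": ["d4", "d2", "d1", "d3"]}
system_b = {"t1": ["d2", "d1", "d4", "d3"], "t2": ["d2", "d1", "d3", "d4"]}

score_a = mean_precision_at_k(system_a, qrels, k=2)
score_b = mean_precision_at_k(system_b, qrels, k=2)
print(f"System A: {score_a:.2f}, System B: {score_b:.2f}")
```

Because both systems are scored against the same documents, topics and assessments, the two numbers are directly comparable.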

Page 30: The Role of Data in IS Research

Evaluation Campaigns

TREC

CLEF

FIRE

NTCIR

Common dataset

Pre-defined tasks

Ground truth

Evaluation protocol

Evaluation metrics

Page 31: The Role of Data in IS Research

Focus on different domains

Microblogging

Ad-hoc and Web Search

Multimedia

Federated Web Search

XML Retrieval

Information Access in the Legal Domain

Document Similarity

Page 32: The Role of Data in IS Research

Example projects

Page 33: The Role of Data in IS Research

CLEF Initiative

Source: http://www.isical.ac.in/~fire/2013/slides/other_clef_fire13.pdf

Page 34: The Role of Data in IS Research

CLEF Tracks

Source: http://www.clef-initiative.eu/track/series

eHealth

ImageCLEF

LifeCLEF

Living Labs for IR (LL4IR)

News Recommendation Evaluation Lab (NewsREEL)

Uncovering Plagiarism, Authorship and Social Software Misuse (PAN)

Social Book Search (SBS)

CLEF ’16

Page 35: The Role of Data in IS Research

In CLEF NewsREEL, participants can develop stream-based news recommendation algorithms and have them benchmarked (a) online by millions of users over the period of a few months in a living lab, and (b) offline by simulating a live stream.

NEWSREEL

F. Hopfgartner, T. Brodt, J. Seiler, B. Kille, A. Lommatzsch, M. Larson, R. Turrin, A. Sereny. “Benchmarking News Recommendations: The CLEF NewsREEL Use Case,” in SIGIR Forum, 49(2):129-136, 2015.

Page 36: The Role of Data in IS Research

Example: News Articles

Source (Image): T. Brodt of plista.com

Page 37: The Role of Data in IS Research

Profit = Clicks on recommendations

Benchmarking metric: Click-Through-Rate

Request article

Request recommendation
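The click-through rate can be computed as the number of clicks on recommendations divided by the number of recommendation requests served. The sketch below assumes a simple event list with made-up algorithm names; the event format is hypothetical and not the format used in NewsREEL.

```python
from collections import Counter

# Hypothetical event log: (event type, algorithm that served the request).
events = [
    ("recommendation_request", "algo_a"),
    ("recommendation_request", "algo_a"),
    ("click", "algo_a"),
    ("recommendation_request", "algo_b"),
    ("recommendation_request", "algo_b"),
    ("recommendation_request", "algo_b"),
    ("recommendation_request", "algo_b"),
    ("click", "algo_b"),
]


def click_through_rates(events):
    """Return clicks divided by recommendation requests, per algorithm."""
    requests = Counter()
    clicks = Counter()
    for kind, algorithm in events:
        if kind == "recommendation_request":
            requests[algorithm] += 1
        elif kind == "click":
            clicks[algorithm] += 1
    # Algorithms that never received a request are left out.
    return {name: clicks[name] / count for name, count in requests.items() if count > 0}


ctr = click_through_rates(events)
for name, value in sorted(ctr.items()):
    print(f"{name}: {value:.2%}")
```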

Page 38: The Role of Data in IS Research

Dataset

• Traffic and content updates of nine German-language news content provider websites

• Traffic: Reading article, clicking on recommendations

• Updates: adding and updating news articles

B. Kille, F. Hopfgartner, T. Brodt, T. Heintz. “The plista Dataset” in Proc. NRS'13: International Workshop and Challenge on News Recommender Systems, Hong Kong, China, pp. 16-23, 2013.

Page 39: The Role of Data in IS Research

Evaluation using offline dataset

Idomaar

request articles

simulate stream
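The idea of simulating a stream from a recorded dataset can be sketched as a replay loop: events are fed to the recommender in timestamp order, and each recorded request is checked against what the user actually clicked. This is an illustrative sketch only. It does not use the Idomaar API, and the log layout, the identifiers and the most-recent-articles baseline are all assumptions.

```python
from collections import deque

# Hypothetical recorded log: (timestamp, event type, subject, clicked article).
# For "update" events the subject is an article; for "request" events it is a user.
log = [
    (1, "update", "article-1", None),
    (2, "update", "article-2", None),
    (3, "request", "user-1", "article-2"),
    (4, "update", "article-3", None),
    (5, "request", "user-2", "article-1"),
    (6, "request", "user-3", "article-3"),
]


class MostRecentRecommender:
    """Baseline that recommends the k most recently added or updated articles."""

    def __init__(self, k):
        self.recent = deque(maxlen=k)

    def update(self, article_id):
        self.recent.append(article_id)

    def recommend(self):
        return list(self.recent)


def replay(log, recommender):
    """Replay the log in time order; return (requests, hits)."""
    requests = 0
    hits = 0
    for _timestamp, kind, subject, clicked in sorted(log, key=lambda event: event[0]):
        if kind == "update":
            recommender.update(subject)
        elif kind == "request":
            requests += 1
            # A hit means the recorded click is among the recommendations.
            if clicked in recommender.recommend():
                hits += 1
    return requests, hits


requests, hits = replay(log, MostRecentRecommender(k=2))
print(f"{hits} of {requests} recorded clicks were among the recommendations")
```

Replaying in timestamp order matters: the recommender only ever sees articles that existed at the time of each request, as it would in the live setting.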

Page 40: The Role of Data in IS Research

Example results

B. Kille, A. Lommatzsch, R. Turrin, A. Sereny, M. Larson, T. Brodt, J. Seiler, F. Hopfgartner. “Overview of CLEF NewsREEL 2015: News Recommendation Evaluation Lab,” in Working Notes of CLEF 2015, Toulouse, France, 2015.

Page 41: The Role of Data in IS Research

Example projects

Page 42: The Role of Data in IS Research

NTCIR

Source: Hideo Joho

Page 43: The Role of Data in IS Research

NTCIR-12 Tasks

Second round:

Search-Intent Mining

Mobile Click

Temporal Information Access

Spoken Query & Spoken Document Retrieval

QA Lab for Entrance Exam

First round:

Medical NLP for Clinical Documents

Personal Lifelog Access & Retrieval

Short Text Conversation

Page 44: The Role of Data in IS Research


Encourage research advances in organising and retrieving from lifelog data.

LifeLog @ NTCIR-12

Page 45: The Role of Data in IS Research

What is The Quantified Self?

The Quantified Self is about obtaining self-knowledge through self-tracking.

Page 46: The Role of Data in IS Research

What is The Quantified Self?

Self-tracking is also referred to as lifelogging, self-analysis, or self-hacking.

Page 47: The Role of Data in IS Research

Example: Visual Lifelogging

Page 48: The Role of Data in IS Research

Visual Lifelog of a day

2,000 pictures a day

Slide: Cathal Gurrin

Page 49: The Role of Data in IS Research

Lifelogging Challenges

The challenges are how to sense the person, their actions and their life, and how to make this data accessible through appropriate interfaces, search, recommendation engines and visual/aural feedback. A further challenge is exploiting the lifelog to identify context for adaptive information services.

Source (Graphic): DAI-Labor, Berlin

Page 50: The Role of Data in IS Research

Multimodal dataset with information needs

Created by three individuals over 10+ days

TEST COLLECTION

• 18.18GB

• 88,124 images

• Accompanying output of 1,000 concepts (825MB)

• Data processed pre-release (removal of personal content; face blurring, translation of concepts)

• Detailed user queries and judgments generated by the lifelogging data gatherers

C. Gurrin, H. Joho, F. Hopfgartner, L. Zhou, R. Albatal. “NTCIR-Lifelog: The First Test Collection for Lifelog Research”, in Proc. SIGIR'16: ACM International Conference on Information Retrieval, Pisa, Italy, to appear.

Page 51: The Role of Data in IS Research

Tasks

Evaluate different methods of retrieval and access.

T1: LIFELOG SEMANTIC ACCESS (LSAT)

• Models the retrieval need from lifelogs (Known-Item Search)

• Retrieve N segments that match information need

• Interactive or Automatic participation

– Interactive: Time limit for fair and comparative evaluation in an interactive system with users

– Automatic: Fully-automatic retrieval system. Automated query processing

T2: LIFELOG INSIGHT

• Models the need for reflection over lifelog data

• Exploratory task, the aim is to:

– encourage broad participation

– novel methods to visualise and explore lifelogs

• Same data as LSAT task

• Presented via demo/poster.

Page 52: The Role of Data in IS Research

Task 1: Lifelog Semantic Access

Find the moment(s) where I use my coffee machine.

Find the moment(s) where I am in the kitchen.

Find the moment(s) where I am playing with my phone.

Find the moment(s) where I am preparing breakfast.

Page 53: The Role of Data in IS Research

Task 2: Lifelog Insight Task

Provide insights on the time I spend taking breakfast.

Provide insights on the time I spend driving to work.

Provide insights on the time I spend reading a paper.

Provide insights on the time I spend working on the computer.

Page 54: The Role of Data in IS Research

Final thoughts

• Data plays an essential role in scientific research since it is used to prove or disprove a hypothesis

• You are now familiar with various sources where you can get datasets that might be useful for your own research

• When selecting data, question its credibility, e.g., is it biased? Can it be used to support your hypotheses?

• Consider accessibility of the data you want to analyse. Are you allowed to use it? Can others (e.g., other researchers) access the data?