Page 1: What we can learn about virtual scholars from usage data obtained from deep log analysis

What we can learn about virtual scholars from usage data obtained from deep log analysis

Professor David Nicholas, Dr Tom Dobrowolski and Paul Huntington

CIBER, University College London

http://www.ucl.ac.uk/ciber/

Page 2

Structure of talk

• Why we are studying the virtual scholar

• The techniques we use (DLA)

• Research projects and analyses undertaken

• What we have discovered

• Implications of our research

Page 3

The problem: everything has changed and got really big

• From control to no-control, from mediated to non-mediated

• From bibliographic systems to full-text, visual, interactive ones

• From niche to universal systems

• From a few searchers to everybody

• From little choice to massive choice

• From little change to constant change

Page 4

Which can mean – paradigm shift, no grip, floundering

• Existing knowledge base obsolescent, flawed, wholly inadequate

• And there are huge issues to deal with – OA, IR, Big Deals

• We don’t even know what questions to ask anymore

• We are left generalising about too many people

• Should be spending lots of time and money researching the user…but are not

Page 5

Mechanisms needed to provide grip and understanding – deep log analysis (DLA)

• Digital fingerprints/CCTV – refine and relate

– Proprietary software too limiting, misleading and report structure insufficiently focused on your needs

– With DLA raw logs are edited/parsed and directly imported into SPSS and usage (and search) data are analysed according to (bespoke) need

– Log data then related to demographic datasets – generated by subscriber/user databases or questionnaires – and then triangulated with focus group/observation etc. data
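
The two DLA steps described above, parsing raw logs and relating them to demographic data, might be sketched as follows. This is an illustration only: the talk used SPSS, whereas this sketch uses Python. The Common Log Format layout, the sample lines and the `DEMOGRAPHICS` lookup keyed by user id are all assumptions, not details from the talk.

```python
import re
from datetime import datetime

# Assumed layout: Apache Common Log Format (the talk does not specify one).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return a dict for one raw log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record

def relate_to_demographics(records, demographics):
    """Attach hypothetical subscriber attributes to each record by user id."""
    return [{**r, **demographics.get(r["user"], {})} for r in records]

# Hypothetical sample data, invented for illustration only.
RAW_LINES = [
    '10.0.0.1 - u1 [01/Feb/2004:10:00:00 +0000] "GET /toc/j1 HTTP/1.1" 200 512',
    '10.0.0.1 - u1 [01/Feb/2004:10:01:30 +0000] "GET /pdf/j1/a1 HTTP/1.1" 200 90321',
    'not a log line',
]
DEMOGRAPHICS = {"u1": {"occupation": "student", "subject": "Sociology"}}

# Unparseable lines are dropped rather than guessed at.
parsed = [r for r in map(parse_line, RAW_LINES) if r is not None]
related = relate_to_demographics(parsed, DEMOGRAPHICS)
```

Each entry in `related` then carries both the usage fields and whatever user attributes were available, which is the form needed for the user analyses.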

Page 6

Deep log analysis: attractions

• Size and reach. Enormous data set; no need to take a sample

• Direct & immediately available record of what people have done: not what they say they might, or would, do; not what they were prompted to say, not what they thought they did

• Data are unfiltered and provide a reality check sometimes missing from questionnaire and focus group

• Data real-time and continuous. Creates a digital lab environment for innovation and the monitoring of change

• Raises the questions that need to be asked by questionnaire, focus group and interview

Page 7

CIBER deep log studies

1. Maximising library investments in digital collections through better data gathering and analyses (MaxData): OhioLINK study. Institute of Museum and Library Services, 2005-2007

2. Virtual Scholar research programme – use and impact of digital libraries in academe. Blackwell/Emerald, 2003-2004.

3. Characterising open access journal users and establishing their information seeking behaviour using deep log analysis: case study OUP Open. OUP, 2005-2006

4. Physics journals: a deep log analysis of IoPP journals. Institute of Physics, 2005-2006

5. Core scholarly research trends study: deep log analysis of Elsevier ScienceDirect users. Elsevier, 2005.

6. Digital journals – site licensing, library consortia deals and journal use statistics. The Ingenta Institute, 2002.

Page 8

Kinds of analysis conducted

Use analysis

• By number of items viewed, number of sessions conducted, site penetration, repeat visits, time online, kind of items viewed, pattern of item use (TOC, abstract, full-text)

User analysis

• By age, gender, occupation (student, practitioner), organisational affiliation, heavy/light, referral link used, type of university (research/teaching), subject/discipline of journal, subject discipline of the user, department of the subnet, search approach adopted, geographical location; whether purchased online or not; use of additional functions
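
Several of the use measures listed above (number of sessions, items viewed per session, time online) can be derived once requests are grouped into sessions. The sketch below assumes a 30-minute inactivity timeout to split sessions and a small invented list of `(user, timestamp, item_type)` requests. Neither the timeout nor the data comes from the talk.

```python
from datetime import datetime, timedelta
from itertools import groupby

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed cut-off, not from the talk

def build_sessions(requests):
    """Group (user, time, item_type) requests into per-user sessions."""
    sessions = []
    ordered = sorted(requests, key=lambda r: (r[0], r[1]))
    for _user, user_requests in groupby(ordered, key=lambda r: r[0]):
        current = []
        for request in user_requests:
            # Start a new session after a long gap in activity.
            if current and request[1] - current[-1][1] > SESSION_TIMEOUT:
                sessions.append(current)
                current = []
            current.append(request)
        sessions.append(current)
    return sessions

def session_metrics(session):
    """Summarise one session: items viewed, time online, item types seen."""
    return {
        "user": session[0][0],
        "items_viewed": len(session),
        "time_online": session[-1][1] - session[0][1],
        "item_types": sorted({r[2] for r in session}),
    }

# Hypothetical requests, invented for illustration only.
t0 = datetime(2004, 2, 1, 10, 0)
REQUESTS = [
    ("u1", t0, "toc"),
    ("u1", t0 + timedelta(minutes=2), "abstract"),
    ("u1", t0 + timedelta(hours=3), "full-text"),
    ("u2", t0 + timedelta(minutes=5), "full-text"),
]

metrics = [session_metrics(s) for s in build_sessions(REQUESTS)]
```

The choice of timeout changes the session count, so any comparison across platforms would need the same cut-off applied to each log.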

Page 9

What have we learnt

• We have never had such a large data set of usage data.

• From the digital fingerprints of millions of users and tens of millions of transactions from a wide range of digital journal platforms we have drawn some interesting and controversial conclusions about the behaviour of the virtual scholar

• I don’t recognise the users you are describing.

Page 10

Information seeking characteristic 1

Phenomenally active and interested

• In the case of Blackwell Synergy, about half a million people used the site in a month; nearly 5 million items were viewed during the same period

• In the case of OhioLINK, 6,000 journals were available and all bar about 5 were used within a month

• Two-thirds of EmeraldInsight visitors non-subscribers

Page 11

Information seeking characteristic 2

• Shallow searchers, suggesting a checking-comparing, dipping sort of behaviour ("flicking") that is a result of easy access, a shortage of time and huge digital choice

• Over two-thirds typically view no more than three items in a session and then leave; Scientists view fewer (66% view no more than three items) and Humanities scholars more (56%); overall just 10% view more than ten items

• Differences in what they view when online

Page 12

A digital consumer trait…scholarly journal users

Type of user/session    Number of items viewed    Emerald Insight (Jan-Dec 2002), %    Blackwell Synergy (February 2004), %
Bouncer/checker         1 to 3                    70                                   67
Moderately engaged      4 to 10                   20                                   26
Engaged                 11 to 20                  6                                    5
Seriously engaged       Over 21                   4                                    2
Total                                             100                                  100
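
The banding in the table above can be reproduced from per-session item counts. A minimal sketch, assuming invented counts in `ITEMS_PER_SESSION` and treating 21 or more items as "seriously engaged" (the slide's bands leave that boundary ambiguous):

```python
from collections import Counter

def engagement_band(items_viewed):
    """Map a session's item count to the bands used on the slide."""
    if items_viewed <= 3:
        return "Bouncer/checker"
    if items_viewed <= 10:
        return "Moderately engaged"
    if items_viewed <= 20:
        return "Engaged"
    return "Seriously engaged"  # assumption: 21 or more items

def band_percentages(counts):
    """Return the percentage of sessions falling in each band."""
    tally = Counter(engagement_band(n) for n in counts)
    total = len(counts)
    return {band: 100 * n / total for band, n in tally.items()}

# Hypothetical item counts for ten sessions, invented for illustration.
ITEMS_PER_SESSION = [1, 2, 1, 3, 2, 1, 5, 8, 12, 30]
shares = band_percentages(ITEMS_PER_SESSION)
```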

Page 13

Type of item requested by subject category of article (Synergy)

[Chart: percentage scale (0 to 100) for the Professional, Academic, Medical and Science categories, split by item type: PDF article, HTML article, Abstract, ToC, List of Issues.]

Page 14

Information seeking characteristic 3

• Unpredictable form of behaviour in which there appears to be little user loyalty, repeat behaviour or use of memory

• Within a year it appeared that two-thirds of people did not come back

• Some more likely to return….
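
The return-visit finding above rests on counting how many separate visits each user makes over a period. A minimal sketch, assuming a visit is identified by a `(user, day)` pair and using invented data. How a returning person is recognised in the logs (for example by IP address or cookie) is itself an assumption that the slide does not spell out.

```python
from collections import Counter
from datetime import date

def visits_per_user(visits):
    """Count distinct visit days per user from (user, day) pairs."""
    return Counter(user for user, _day in set(visits))

def share_not_returning(visits):
    """Fraction of users seen on only one day in the period."""
    counts = visits_per_user(visits)
    once = sum(1 for n in counts.values() if n == 1)
    return once / len(counts)

# Hypothetical visit records, invented for illustration only.
VISITS = [
    ("u1", date(2004, 2, 1)),
    ("u1", date(2004, 2, 1)),  # same day: counted as one visit
    ("u2", date(2004, 2, 3)),
    ("u3", date(2004, 2, 4)),
    ("u3", date(2004, 5, 9)),  # u3 comes back later
]

not_returning = share_not_returning(VISITS)
```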

Page 15

Some more likely to return (Synergy)

[Chart: number of visits (Once, 2 to 5, 6 to 15, Over 15) on a percentage scale (0 to 100) for the Professional, Academic, Medical and Science categories.]

Page 16

Information seeking characteristic 4

• Search a variety of sites to find what they want…together with characteristic 2 this makes them ‘promiscuous’ in information seeking terms

• Younger scholars more promiscuous

Page 17

Information seeking characteristic 5

• A bouncing, checking, promiscuous and consumer form of behaviour creates enormous volatility and unpredictability

• Digital visibility, sales mentality

• “I may read books, surf, ask, watch telly even - the answer could come from anywhere”

Page 18

Volatility (EmeraldInsight)

Movement % change

Increase

More than 100% 11

+ 75-99% 2

+ 50-74% 3

+ 25-49% 16

+ 1-24% 29

Decrease

- 1-24% 24

- 25-49% 10

- 50-74% 3

- 75-99% 1

Average 17
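
Bands like those in the volatility table can be computed from usage counts in two periods. A minimal sketch, assuming the table reports the percentage change in use per journal between two periods (the slide does not say exactly what is compared). The journal names and counts in `USAGE` are invented.

```python
# Band lower bounds and labels, mirroring the slide's table.
BANDS = [(75, "75-99%"), (50, "50-74%"), (25, "25-49%"), (1, "1-24%")]

def percent_change(before, after):
    """Percentage change from one period's usage count to the next."""
    return 100 * (after - before) / before

def change_band(change):
    """Place a percentage change into a signed band."""
    size = round(abs(change))
    if size == 0:
        return "No change"  # not a band on the slide; added for completeness
    sign = "+" if change > 0 else "-"
    if size >= 100:
        return f"{sign} 100% or more"  # the slide's top band is "More than 100%"
    for lower, label in BANDS:
        if size >= lower:
            return f"{sign} {label}"

# Hypothetical usage counts (before, after), invented for illustration.
USAGE = {
    "Journal A": (200, 450),
    "Journal B": (400, 300),
    "Journal C": (100, 110),
}

bands = {
    name: change_band(percent_change(before, after))
    for name, (before, after) in USAGE.items()
}
```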

Page 19

Sales mentality (EmeraldInsight)

[Chart: Employee Relations and Int Jrnl of Public Sector Management plotted at weekly points from 01.06.2002 to 29.06.2002, on a scale of 0 to 14000.]

Page 20

Information seeking characteristic 6

• Increased visibility leads to increased exposure and use of older scientific material

• History: 54% of downloads were to material more than 5 years old – the same for Language and Literature; Materials Science (59%); Physiology (64%)
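
The percentages above are shares of downloads going to material more than five years old. A minimal sketch of that calculation, assuming each download record carries the article's publication year and that age is measured against the year of the log (2004 here). The records in `DOWNLOADS` are invented and do not reproduce the slide's figures.

```python
def share_older_than(pub_years, log_year, years=5):
    """Percentage of downloads whose article is more than `years` old."""
    old = sum(1 for pub_year in pub_years if log_year - pub_year > years)
    return 100 * old / len(pub_years)

# Hypothetical publication years of downloaded articles, per subject.
DOWNLOADS = {
    "History": [1990, 1996, 1998, 2001, 2003, 2004],
    "Physiology": [1989, 1992, 1995, 1997, 2004],
}

older_share = {
    subject: share_older_than(pub_years, log_year=2004)
    for subject, pub_years in DOWNLOADS.items()
}
```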

Page 21

Information seeking characteristics 7-9

• Untrusting: trust up for grabs, authority to be won (and checked). Brand problems - Tesco

• Seemingly ‘lazy’ and easily led in retrieval terms - determined by digital visibility, promotion, search engines and poorly thought-through search expressions

• Search approach/form of navigation taken has an enormous impact on what is seen/used. People using the search engine were far more likely to conduct a session that included a view of an old article; they were also more likely to view more subjects and more journals, and viewed more articles and abstracts too.

Page 22

Conclusions and implications

• Choice and a common, multi-function retrieval platform are changing us all, making us all a little bit more similar, and should make us strongly question our assumptions about the scholar

• We are not good at using the evidence…digital concrete and digital fog…so big questions here for our funders, libraries etc

• We need to get closer to the user but we are moving further apart; data enables us to get closer

• Evaluation is actually part of a system and not separate from it


Page 24

Sample analyses

Page 25

Number of different journals viewed by access method (OhioLINK)

[Chart: percentage scale (0 to 100). Access methods: Alpha, Subject, Search, Alpha/subj & search, Alpha & subject. No. of journals: One, 2 to 3, 4 to 9, 10 or more.]

Page 26

Journal subject categories viewed by Sociology (OhioLINK)

[Chart. Categories as listed: Other; Sociology and Social; Psychology and Psych; Politics Political S; Medical Specialties; Management and Econo; History; General Social Scien; Education. Percentage labels as listed: 10.4%, 31.5%, 18.9%, 3.9%, 3.4%, 17.2%, 5.9%, 5.6%, 3.2%.]

Page 27

Subject of journal by date of material viewed (OhioLINK)

[Chart: percentage scale (0 to 100) showing the date of material viewed (1988 to 1994, 1995 to 1999, 2000 to 2003, 2004) for each journal subject category shown, from General Physics through Zoology.]