making sense of users' web activities

Making sense of Users’ Web activities Mathieu d'Aquin Knowledge Media Institute, The Open University, UK

Upload: mathieu-daquin

Post on 24-May-2015


DESCRIPTION

Keynote at the Personal Semantic Data (PSD) workshop, co-located with EKAW 2010

TRANSCRIPT

Page 1: Making sense of users' Web activities

Making sense of Users’ Web activities

Mathieu d'Aquin, Knowledge Media Institute, The Open University, UK

Page 2: Making sense of users' Web activities

A bit of sci-fi to start with

“… from people who are afraid that someone else knows information that they don’t and is gaining an unfair advantage by it. For all the claims one hears about the liberating impact of the data-net, the truth is that it wished on most of us a brand-new reason for paranoia”

John Brunner, The Shockwave Rider, 1975

Page 3: Making sense of users' Web activities

What we don’t know that they know

Simple important things: What are all the websites that know my e-mail address?

And more complex important things… What does amazon.co.uk or the website of my favorite airline know about me?

Page 4: Making sense of users' Web activities

Is this Personal Information Management?

• Yes, but…
• Looking at an individual user’s information exchange, and more generally activities, on the Web
• This is:
– Big
– Heterogeneous
– Distributed
– Fragmented
– Sometimes implicit
• And hard to collect!

Page 5: Making sense of users' Web activities

So, what do we do?

Unrestricted monitoring of information exchange on the Web by an individual user

Page 6: Making sense of users' Web activities

[Diagram: local Web agents (e.g., browser) send HTTP requests through a Local Logging Proxy to external Web sites and receive HTTP responses back through it; the proxy writes Web Exchange RDF Logs]

Page 7: Making sense of users' Web activities

<REQUEST RDF:ABOUT="#REQUEST-1257949232709-1257949233757"> <STARTEDAT>1257949232709</STARTEDAT> <ENDEDAT>1257949233757</ENDEDAT> <ORIGIN RDF:RESOURCE="127.0.0.1" /> <ONPORT>80</ONPORT> <TOHOST RDF:RESOURCE="API.FACEBOOK.COM" /> <METHOD RDF:RESOURCE="POST"/> <TOURL RDF:RESOURCE="HTTP://API.FACEBOOK.COM/RESTSERVER.PHP" /> <HTTPVERSION RDF:RESOURCE="HTTP-1.1" /> <HOST RDF:RESOURCE="API.FACEBOOK.COM" /> <CONTENT-TYPE RDF:RESOURCE="APPLICATION--X-WWW-FORM-URLENCODED" /> <USER-AGENT RDF:RESOURCE="MOZILLA--5.0_(MACINTOSH;_U;_INTEL_MAC_OS_X;_EN)_APPLEWEBKIT--526.9+_(KHTML._LIKE_GECKO)_ADOBEAIR--1.5.2" /> <REFERER RDF:RESOURCE="APP:--TWEETDECK.SWF" /> <X-FLASH-VERSION RDF:RESOURCE="10.0.32.18" /> <ACCEPT RDF:RESOURCE="*--*" /> <ACCEPT-LANGUAGE RDF:RESOURCE="EN-US" /> <ACCEPT-ENCODING RDF:RESOURCE="GZIP._DEFLATE" /> <COOKIE RDF:RESOURCE= "__QCA=1239783354-42963995-12118014;___UTMA=87286159.357565716.1239892196.1252686326.1257582307.16;___UTMZ=87286159.1257582307.16.16.UTMCCN= (REFERRAL)|UTMCSR=FACEBOOK.COM|UTMCCT=--TOS.PHP|UTMCMD=REFERRAL;_C_USER=605559235;_CUR_MAX_LAG=2;_DATR=1239398136-0711BF1215821A9C58848BF0FFD0020EC8450CFA7154B9E228C29;_LSD=P3ZPN;_LXE=METM.DAQUIN%40VIRGIN.NET;_LXS=3;_S_VSN_FACEBOOKPOC_1=9874874320812" /> <CONTENT-LENGTH RDF:RESOURCE="984" /> <CONNECTION RDF:RESOURCE="KEEP-ALIVE" /> <PROXY-CONNECTION RDF:RESOURCE="KEEP-ALIVE" /> <DATA RDF:RESOURCE="DATA_C22B691F691DABD5AE893B9CB2F8ADD7" /> <RESPONSE> <RESPONSE RDF:ABOUT="#RESPONSE-1257949232709--1257949233757"> <HTTPVERSION RDF:RESOURCE="HTTP--1.0" /> <RESPONSECODE RDF:RESOURCE="200_OK" /> <CACHE-CONTROL RDF:RESOURCE="PRIVATE._NO-STORE._NO-CACHE._MUST-REVALIDATE._POST-CHECK=0._PRE-CHECK=0" /> <CONTENT-TYPE RDF:RESOURCE="APPLICATION--JSON" /> <EXPIRES RDF:RESOURCE="MON._26_JUL_1997_05:00:00_GMT" /> <PRAGMA RDF:RESOURCE="NO-CACHE" /> <CONTENT-ENCODING RDF:RESOURCE="GZIP" /> <CONTENT-LENGTH RDF:RESOURCE="5943" /> <X-CACHE RDF:RESOURCE="MISS_FROM_ROEBURN.OPEN.AC.UK" /> <PROXY-CONNECTION 
RDF:RESOURCE="KEEP-ALIVE" /> <DATA RDF:RESOURCE="DATA_5CCF6054FD0FBA3EE7EB444E178EAF19" /> </RESPONSE></RESPONSE></REQUEST>

2.5 months = 3 million HTTP requests, 100 million RDF triples

Page 8: Making sense of users' Web activities

What this talk is about

Using ontologies and external datasets to:
– Generate abstractions of this low-level data
– Enrich it with external knowledge and models
– Interpret it to give useful information back to the user

Page 9: Making sense of users' Web activities

[Diagram: HTTP Ontology; Web Site Information; Location Information; Online Activities Ontology; Parameters and Website info.; Personal Information; Trust Model]

Page 10: Making sense of users' Web activities

HTTP Ontology

• Built bottom-up from the data

• Can help infer simple things from it

• And answer questions through SPARQL queries

[Diagram: HTTP ontology]
Classes and attributes:
– Request: time (DateTime), toURL (URL), referer (URL)
– Response: time (DateTime), responseCode (int)
– InternetPoint: time (DateTime)
– WebHost: domain (String)
– WebAgent: ID (String)
– DataFile: ID (String)
– DataFormat: MimeID (String)
Relations: hasResponse, origin, toHost, User-Agent, Content, Content-Type

Page 11: Making sense of users' Web activities

Simple examples

Requests per User Agents

Requests per time of day

Requests per Host
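Counts such as these are simple group-by aggregations over the logged requests. In the talk they are answered with SPARQL over the RDF logs; the sketch below does the equivalent in plain Python over a simplified, hypothetical `Request` record whose fields loosely follow the HTTP ontology's time, toHost and User-Agent.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Request:
    # Simplified, hypothetical record loosely following the HTTP ontology.
    time_ms: int      # request start time, milliseconds since the epoch
    to_host: str
    user_agent: str


def requests_per_host(requests):
    return Counter(r.to_host for r in requests)


def requests_per_user_agent(requests):
    return Counter(r.user_agent for r in requests)


def requests_per_hour(requests, tz=timezone.utc):
    # Hour of day (0-23) in the given time zone; UTC by default.
    return Counter(datetime.fromtimestamp(r.time_ms / 1000, tz).hour for r in requests)
```

Passing the user's own time zone to `requests_per_hour` matters if the goal is to compare activity with the user's daily routine.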

Page 12: Making sense of users' Web activities

Integrating basic info

Domain name → IP → Location

“What!? What requests have I made to websites in Nigeria? What data did I send?” This can be answered with a SPARQL query.
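The Nigeria question amounts to a join between requests, the IP address each domain resolves to, and the location of that IP. The sketch below assumes the domain-to-IP and IP-to-country mappings have already been collected into plain dictionaries. The `Exchange` type, the tables, the domains and the addresses (taken from documentation ranges) are all hypothetical; the talk stores this information as RDF and queries it with SPARQL.

```python
from dataclasses import dataclass, field


@dataclass
class Exchange:
    host: str
    method: str
    params: dict = field(default_factory=dict)  # data sent with the request


def requests_to_country(requests, domain_to_ip, ip_to_country, country):
    """Return (host, method, params) for every request sent to a host located in `country`."""
    results = []
    for req in requests:
        ip = domain_to_ip.get(req.host)
        # Hosts with no known IP or no known location are skipped.
        if ip is not None and ip_to_country.get(ip) == country:
            results.append((req.host, req.method, req.params))
    return results
```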

Page 13: Making sense of users' Web activities

More information about websites

• The linked data cloud is full of it.
• Using the domain name to address this information:

CONSTRUCT { <domain_name> ?p ?y }
WHERE {
  { ?x dbpedia:homepage <http://domain_name> .
    ?x ?p ?y . }
  UNION
  { ?x dbpedia:homepage <http://domain_name> .
    ?x owl:sameAs ?z .
    ?z ?p ?y . }
}
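The intent of this query is to collect facts about the resource whose homepage matches the domain, together with facts about resources linked to it by owl:sameAs, and re-attach them all to the domain. The sketch below illustrates that logic over an in-memory list of triples. Property names are written as plain prefixed strings, and the example resources (`dbpedia:YouTube`, `fb:youtube` and their properties) are hypothetical, not actual DBpedia or Freebase data.

```python
HOMEPAGE = "dbpedia:homepage"
SAME_AS = "owl:sameAs"


def describe_domain(domain, triples):
    """Re-attach to `domain` the facts about the resource whose homepage is
    http://<domain>, plus facts about resources it is owl:sameAs-linked to."""
    homepage = f"http://{domain}"
    subjects = {s for s, p, o in triples if p == HOMEPAGE and o == homepage}
    # Follow owl:sameAs links one step from each matching resource.
    linked = {o for s, p, o in triples if p == SAME_AS and s in subjects}
    sources = subjects | linked
    return {(domain, p, o) for s, p, o in triples if s in sources}
```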

Page 14: Making sense of users' Web activities

Examples

[Diagram: facts retrieved from DBpedia and Freebase for www.youtube.com, www.google-analytics.com and www.google.com, linking them to google (type Company) through parent/subsidiaryOf, developer and owner relations, and to categories including Google Services, Entertainment Websites, Video Hosting, Video sharing, Web Analytics, Internet Search Engine, Search Engine and Web Search Engine]

Page 15: Making sense of users' Web activities

Activities

• Can we now understand the user activities?
• Based on website categories and on their parameters:

GET http://uk.search.yahoo.com/beacon/module?p=idiocracy&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0387808%2F

POST format=JSON&method=fql%2Emultiquery&api%5Fkey=51d350e8d92da1f5623512a9e801da2b&v=1%2E0&queries=%7B%22query2%22%3A%22SELECT%20app%5Fid%2C%20display%5Fname%20FROM%20application%20WHERE%20app%5Fid%20IN%20%28SELECT%20app%5Fid%20FROM%20%23query1%29%22%2C%22query1%22%3A%22SELECT%20post%5Fid%2C%20source%5Fid%2C%20created%5Ftime%2C%20updated%5Ftime%2C%20actor%5Fid%2C%20target%5Fid%2C%20app%5Fid%2C%20message%2C%20attachment%2C%20comments%2C%20likes%2C%20permalink%2C%20attribution%2C%20type%20FROM%20stream%20WHERE%20filter%5Fkey%20IN%20%28SELECT%20filter%5Fkey%20FROM%20stream%5Ffilter%20WHERE%20uid%20%3D%20605559235%20AND%20type%20%3D%20%27newsfeed%27%29%20AND%20%28created%5Ftime%20%3E%3D%201257443596%29%20AND%20%28%28created%5Ftime%20%3E%201257945423%29%20OR%20%28updated%5Ftime%20%21%3D%20created%5Ftime%29%29%20ORDER%20BY%20created%5Ftime%20DESC%20LIMIT%20200%22%7D&call%5Fid=12565739074246102&sig=01a13a72825ed83ed6d23bdf2791ad1a&session%5Fkey=be312ffdf9b9e1a5ec6c5768%2D605559235
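Request parameters are what make activities recognisable: the GET request above carries the search term in `p` and the followed result in `url`. Python's standard `urllib.parse` module is enough to decode both. Which parameter holds the search term is site-specific, so the `SEARCH_PARAM` table below is a hypothetical mapping covering only this example.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical, site-specific knowledge: which query parameter carries the search term.
SEARCH_PARAM = {"uk.search.yahoo.com": "p"}


def decode_params(url):
    """Return the host and the decoded query parameters of a GET request URL."""
    parts = urlsplit(url)
    # parse_qs percent-decodes values and returns a list of values per parameter name;
    # keep only the first value of each.
    return parts.hostname, {k: v[0] for k, v in parse_qs(parts.query).items()}


def search_term(url):
    host, params = decode_params(url)
    key = SEARCH_PARAM.get(host)
    return params.get(key) if key else None
```

The same `parse_qs` call also decodes a URL-encoded POST body such as the Facebook one, where `method` decodes to `fql.multiquery`.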

Page 16: Making sense of users' Web activities

Activities in an Ontology

• Derived in a bottom-up way from categories of activities/request

• Can be used to characterize overall activities, individual activities or correlations between activities

[Diagram: activity ontology, with classes ActivityBasedRequest, ExplicitActivity, ImplicitActivity, ReportToAnalytics, CheckStatusFeed (AutoCheckStatusFeed, ManualCheckStatusFeed), Search (SearchVideo, SearchImage) and FollowLink (FollowSearchResult)]
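Mapping individual requests onto these classes can be done with simple rules that combine the website category (obtained from Linked Data) with the request's parameters and referer. The rules below are hypothetical illustrations of that idea, covering only four of the classes. The query-parameter names in `QUERY_PARAMS` and the category strings are assumptions; these are not the rules used in the talk.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical parameter names that carry a search query on search engines.
QUERY_PARAMS = ("q", "p")


def is_search(url, categories):
    params = parse_qs(urlsplit(url).query)
    return "Search Engine" in categories and any(k in params for k in QUERY_PARAMS)


def classify(url, categories, referer=None, referer_categories=frozenset()):
    """Return an activity class name for a request, or None if no rule applies.

    `categories` are the (assumed) Linked Data categories of the requested site;
    `referer_categories` are those of the site named in the Referer header."""
    if "Web Analytics" in categories:
        return "ReportToAnalytics"   # an implicit activity
    if is_search(url, categories):
        return "Search"
    if referer is not None and is_search(referer, referer_categories):
        return "FollowSearchResult"
    if referer is not None:
        return "FollowLink"
    return None
```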

Page 17: Making sense of users' Web activities

Example Activity: Search

Search keywords

Page 18: Making sense of users' Web activities

Example Activity: Search

link-followed = inverseOf(referer)
InformationalSearch = SearchRequest and (min 2 link-followed)
NavigationalSearch = SearchRequest and (= 1 link-followed)

Prominence of Navigational Searches

IndexedSite = exists referer NavigationalSearch
IndexedSite(?x), NavigationalSearch(?y), referer(?x, ?y), searchTerm(?y, ?z) → IndexedWithKeyword(?x, ?z)
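Read procedurally, these definitions count, for each search request, how many later requests name it as their referer. Exactly one followed link makes it navigational, two or more make it informational, and the site reached from a navigational search is indexed with that search's keyword. The sketch below applies this in plain Python to hypothetical request records, given as a dictionary from request ID to host, referer ID and search term.

```python
from collections import defaultdict


def classify_searches(requests):
    """requests: dict id -> {"host": str, "referer": id or None, "search_term": str or None}.
    A request counts as a search request if it has a search term."""
    followed = defaultdict(list)  # search id -> ids of requests whose referer is that search
    for rid, r in requests.items():
        ref = r.get("referer")
        if ref in requests and requests[ref].get("search_term"):
            followed[ref].append(rid)
    kinds = {}
    indexed_with = set()  # (site host, keyword) pairs, i.e. IndexedWithKeyword
    for sid, r in requests.items():
        if not r.get("search_term"):
            continue
        n = len(followed[sid])
        if n >= 2:
            kinds[sid] = "InformationalSearch"
        elif n == 1:
            kinds[sid] = "NavigationalSearch"
            site = requests[followed[sid][0]]["host"]
            indexed_with.add((site, r["search_term"]))
    return kinds, indexed_with
```

A search request from which no link was followed matches neither definition and is left unclassified.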

Page 19: Making sense of users' Web activities

Example Activity: Search

Search Keywords → OpenCalais → Topics of interest

Page 20: Making sense of users' Web activities

Personal data exchange

Request Parameters

Personal Information (Profile)

Trust Model

Page 21: Making sense of users' Web activities

Tool used to create mappings between the data sent to websites (from the logs, on the right) and the user profile (on the left), effectively reconstructing the profile from the data.

Page 22: Making sense of users' Web activities

User profile re-constructed from Web activities

• 36 attributes with 1,080 values, sent to 123 domains

• A model of which piece of personal information was sent where (it can answer the questions raised at the start)
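Once profile attributes are mapped to values found in the logs, the model of "what was sent where" is an index from attributes to the domains that received them. This index directly answers questions such as "what are all the websites that know my e-mail address?". The sketch below replaces the interactive mapping tool with automatic, case-insensitive exact matching on decoded parameter values, which is a simplification. The profile, domains and parameters are hypothetical.

```python
from collections import defaultdict


def sent_where(profile, requests):
    """profile: attribute -> set of known values for that attribute.
    requests: iterable of (domain, params) pairs with decoded parameter values.
    Returns attribute -> set of domains that received one of its values."""
    # Invert the profile so each normalised value points back to its attribute(s).
    value_to_attrs = defaultdict(set)
    for attr, values in profile.items():
        for v in values:
            value_to_attrs[v.strip().lower()].add(attr)
    model = defaultdict(set)
    for domain, params in requests:
        for value in params.values():
            for attr in value_to_attrs.get(str(value).strip().lower(), ()):
                model[attr].add(domain)
    return model
```

Exact matching misses values that websites transform (hashed, truncated or reformatted), which is one reason a human-in-the-loop mapping tool is useful.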

Page 23: Making sense of users' Web activities

What that tells us about trust

Taking the point of view of an external observer, we can derive an observed model of trust and of the criticality of data:
– If this piece of data is critical to you and you give it to Bob, you must trust Bob
– If you give this piece of data to many untrusted people, you probably don’t consider it critical

Page 24: Making sense of users' Web activities

Formally

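• Trust in a domain w = maximum criticality of the data it received:
  $T(w) = \max_{d \in D(w)} C(d)$, where $D(w)$ is the set of data items sent to $w$

• Criticality of a piece of data d:
  $C(d) = \frac{1}{1 + \sum_{w \in W(d)} \left(1 - T(w)\right)}$, where $W(d)$ is the set of websites that received $d$

• These two formulas are interdependent, so they are computed as a sequence, starting from initial values of 0.5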
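The interdependent trust and criticality definitions can be computed by iterating from the initial value of 0.5. The sketch below assumes a fixed number of iterations (a hypothetical `iterations` parameter) and updates both quantities from the previous step's values. The talk may schedule the updates or stop the iteration differently, and the data items and domains in the usage example are hypothetical.

```python
from collections import defaultdict


def trust_and_criticality(sent, iterations=20, initial=0.5):
    """sent: iterable of (data_item, domain) pairs meaning data_item was sent to domain.
    Returns (trust per domain, criticality per data item)."""
    received = defaultdict(set)    # domain -> data items it received, D(w)
    recipients = defaultdict(set)  # data item -> domains that received it, W(d)
    for item, domain in sent:
        received[domain].add(item)
        recipients[item].add(domain)
    trust = {w: initial for w in received}
    crit = {d: initial for d in recipients}
    for _ in range(iterations):
        # Both updates use the values from the previous step.
        new_crit = {d: 1.0 / (1.0 + sum(1.0 - trust[w] for w in recipients[d]))
                    for d in recipients}
        new_trust = {w: max(crit[d] for d in received[w]) for w in received}
        crit, trust = new_crit, new_trust
    return trust, crit
```

For example, an e-mail address given to four domains ends up less critical than a passport number given only to one of them, and that one domain ends up more trusted than the others.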

Page 25: Making sense of users' Web activities

Interacting with the model

Expose the user to their own behavior, as observed, so that they can try to align it with their intended behavior

Page 26: Making sense of users' Web activities

Demo

Page 27: Making sense of users' Web activities

Conclusion

• A first set of tools exploiting logs of personal Web activity

• Demonstrate the need for ways to abstract and interpret activity data to support Web users

• Demonstrate the ability of semantic technologies, ontologies and enrichment through external data to provide such support

Page 28: Making sense of users' Web activities

So much more to do

Can I collect this tweet? From HTTPS? From my mobile phone?

Can I link it to where I am?

To what I’m doing? To what I have been doing?

To the abstract of the presentation? To the slides on SlideShare.net? To blogs mentioning it?

Can I cope with the scale of all this information? Can I decide what to share? Can I store all this securely? Can I get usable access to it? Can I learn something from it?

Page 29: Making sense of users' Web activities

Thank you


@mdaquin