dr. alexandra i. cristea acristea/ user modelling (um)
TRANSCRIPT
Dr. Alexandra I. Cristea
http://www.dcs.warwick.ac.uk/~acristea/
User Modelling (UM)
Reading Reminder
• Recommended Reading: User Modelling for the Social Semantic Web (Plumbaum et al, 2011); Ontological technologies for user modelling (Sosnovsky, Dicheva, 2010); Generic User Modelling (Kobsa, 2007); User Profiles (Gauch et al, 2007).
• Recommended Papers for reference: Challenges of SW for UM (Dolog, Nejdl, 2003).2
Overview: UM1. Introduction
2. History
3. Information collection
4. User Model Representation
5. User Model Construction
6. Issues
3
Overview: UM
1. Introduction
A. Definition
B. Motivation
C. Application fields
D. Classifications
4
Definition: What is user model (ling)?• “If a program can change its behaviour
based on something related to the user, then the program does (implicit or explicit) user modelling.”
• “A user model is a representation of the user in an information system, in the form of information which the system collects and maintains in order to improve the quality of information access” (Brusilovsky)
• Compare with Wikipedia def5
Core vs. Extended User Profile • Core profile
– info related to user’s search goals and interests • Extended profile
– info related to user as a person: demographic info, e.g., name, age, country
– education level – abilities – profession...
• Determined by the application needs 6
EG: http://webprotege.stanford.edu/#Edit:projectId=aec7af2a-e644-48b3-916b-f2dc7ef76bd8
7
Motivation: Why user modelling?• pertinent information
– What is pertinent to me may not be pertinent to you– information should flow within and between users– users should control the level of information-push
• large amounts of information– too much information, too little time– people often become aware of information when it is not
immediately relevant to their needs– Difficult to handle– Etc.
8
Application fields: What for?• Semantic Web• Web 2.0• Recommender Systems (e.g., commercial
systems; content-based recommenders)• Information retrieval • Information filtering• User Simulation (e.g. of user roles)• Intelligent Tutoring Systems• Expert Systems• Adaptive Hypermedia • TOTO ADAPT ADAPT TO THETO THE USER USER
9
Classification• According to the way User Information is collected:
– explicit, through user intervention – implicit, through agents that passively monitor user activities
• According to the life-period of the profile / model– Static profiles that maintain the same information over time. – Dynamic profiles that can be modified or augmented.
• Short-term profiles represent the user’ s current interests – Long-term profiles indicate interests that are not subject to
frequent changes over time • Model structure
– Keyword profiles – Semantic net profiles – Concept profiles 10
The Big Picture
Data Collection
Model Constructor
Technology or Application
User Personalised Services
User Info Model
Structure
11
A (classical) thought on UM:
IF Man cannot understand Man,
HOW can a machine built by Man understand Man?
12
Overview: UM1. Introduction
2. HistoryA. Early days
B. Early systems
C. User Modelling Shells
D. User Modelling Servers
13
Early days• Start: 1978, 79:
– Allen, Cohen & Perrault: Speech research for dialogue coherence
– Elaine Rich: Building & Exploiting User Models (PhD thesis)
• 10 year period of developments– UM performed by application system– No clear distinction between UM components &
other system tasks
• mid 80’s: Kobsa, Allgayer, etc. (see UMUAI)– Distinction appears– No reusability consideration
14
15
• GUMS (Finin & Drager, 1989; Kass
1991)– General User Modelling System – Stereotype hierarchies– Stereotype members + rules about them– Consistency verification
set framework for General UM systems
• Called UM shell systems (Kobsa)
Early systems
16
UM shell services (Kobsa ‘95)
17
UM shells Requirements• Generality
– As many services as possible– “Concessions”: student-adaptive tutoring systems
• Expressiveness– Able to express as many types of assumptions as
possible (about UU)
• Strong Inferential Capabilities– AI, formal logic (predicate l., modal reasoning,
reasoning w. uncertainty, conflict resolution)
18
Characteristics of UM Server
• Client-server architecture for the WEB !!!• Advantages:
– Central repository w. UU info for 1/more applic.• Info sharing between applications• Complementary info from client DB integrated easily
– Info stored non-redundant • Consistency & coherence check possible• Info on user groups maintained w. low redundancy
(stereotypes, a-priori or computed)
– Security, id, authentication, access control, encryption can be applied for protecting UM
19
UM server Services• Comparison of UU selective actions
– Amazon: “Customers who bought this book also bought: […]”
• Import of external UU info– ODBC(Open Database Connectivity)
interfaces, or support for a variety of DB
• Privacy support– Company privacy policies, industry, law
20
• Quick adaptation– Preferably, at first interaction, to attract customers
levels of adaptation, depending on data amount
• Extensibility– To add own methods, other tools API for UU info
exchange
• Load balancing– Reaction to increased load: e.g., CORBA based
components, distributed on the Web
• Failover strategies (in case of breakdown)• Transaction Consistency
– Avoidance of inconsistencies, abnormal termination
UM server Requirements
21
UM server trends
• More recommender systems than real UM• Based on network environments• Less sophisticated UM, other issues (such as
response time, privacy) are more important• Separation of tasks is essential, to give flexibility:
– Not only system functions separately from UM functions, but also
– UM functions separation: • domain modelling, knowledge, cognitive modelling, goals and
plans modelling, moods and emotion modelling, preferences modelling, and finally, interface related modelling
– In this way, the different levels of modelling can be added at different times, and by different people 22
Overview: UM1. Introduction
2. History
3. Information collectionA. Information type
B. Information sources
C. User Identification Method
D. User Information Collection Method
23
Information type:What can we adapt to?• U knowledge
• U Cognitive properties • (learning style, personality, etc.)
• U Tasks, Goals and Plans
• U Mood and Emotions
• U preferences
24
Adaptation to User Knowledge
• Conceptual knowledge – that can be explained by the system
• User option knowledge – about possible actions via an interface
• Problem solving knowledge– how knowledge can be applied to solve
particular problems
• Misconceptions – erroneous knowledge 25
26
What can we adapt to? User knowledge
• Cognitive properties • learning style, personality, etc.
• User goals and plans
• User mood and emotions
• User preferences
27
Kolb scale
converger (abstract, active)
diverger (concrete, reflective)
assimilator (abstract, reflective)
accomodator (concrete, active)
"How?"
"Why?" "What?"
"What if?"
active
abstract
reflective
concrete
Child,Budda,philosopher
Business person
Teacher,reviewer
Programmer
28
• Last time: UM definition, history, started information collection: knowledge, cognitive properties: cognitive styles
• Next: continue cognitive styles, other user info collection types, user info sources, user info collection methods
29
Van der Veer et al.
30
Other cognitive styles• Visual-verbal (verbalizer-imager; image-text)• Sequential-global (analytic-wholistic)• Active-reflective• Sensing-intuitive
• See (and try out) ILS questionnaire: • http://www.engr.ncsu.edu/learningstyles/
ilsweb.html
• Etc. 31
What can we adapt to?
User knowledge
Cognitive properties • (learning style, personality, etc.)
• User goals and plans
• User mood and emotions
• User preferences
32
User Goals and Plans• What is meant by this?
– A user goal is a situation that a user wants to achieve. – A plan is a sequence of actions or event that the user
expects will lead to the goal.
• System can: – Infer the user’s goal and suggest a plan – Evaluate the user’s plan and suggest a better one – Infer the user’s goal and automatically fulfil it (partially) – Select information or options to user goal(s) (shortcut
menus)
33
What can we adapt to?
User knowledge
Cognitive properties • (learning style, personality, etc.)
User goals and plans
• User mood and emotions
• User preferences
34
Moods and emotions?
• New, relatively unexplored area! • Unconscious level difficult to recognise, but
it is possible to look at type speed, error rates / facial expressions, sweat, heartbeat rate...
• Conscious level can be guessed from task fulfilment (e.g. failures)
• Emotions affect the user’s cognitive capabilities it can be important to affect the user’s emotions (e.g. reduce stress) 35
Emotional Modelling
Gratch, 5th Int. Conf. on Autonomous Agents, Montreal, Canada, 2001
We address how emotions arise from an evaluation of the relationship between environmental events &an agent’s plans and goals, as well as the impact of emotions on behaviour, in particular the impact
on the physical expressions of emotional state through suitable choice of gestures &
body language.36
Sample model of emotion assessment
Conati, AAAI, North Falmouth, Massachusetts 2001
37
What can we adapt to?
User knowledge
Cognitive properties • (learning style, personality, etc.)
User goals and plans
User mood and emotions
• User preferences
38
Adaptation to user preferences/ interests
• So far, the most successful type of adaptation. Preferences can in turn be related to knowledge / goals / cognitive traits, but one needs not care about that.
• Examples: – Firefly – www.amazon.com – Mail filters – Grundy (Rich: personalized book recommendation expert system)
39
Other UM parameters to adapt to
• User context
• User environment
• User group
• User social interaction
• Etc.
40
Overview: UM1. Introduction
2. History
3. Information collectionA. Information type
B. Information sources
C. User Identification Method
D. User Information Collection Method
41
42
‘ Deep’ or ‘shallow’ modelling?
• Deep models give more inferential power! • Same knowledge can affect several parts of
the functionality, or even several applications • Better knowledge about how long an
inference stays valid • But deep models are more difficult to acquire • How do we get information about the user?
43
Overview: UM1. Introduction
2. History
3. Information collectionA. Information type
B. Information sources
C. User Identification Method
D. User Information Collection Method
44
User Identification• Five basic approaches to user identification:
– software agents – logins – enhanced proxy servers – cookies – session ids
• The first 3 techniques are more accurate, but require active participation of the user. The last 2 are less invasive
45
Overview: UM1. Introduction
2. History
3. Information collectionA. Information type
B. Information sources
C. User Identification Method
D. User Information Collection Method
46
User Information Collection• Explicit Feedback Systems
– direct user intervention, e.g. via HTML forms – (potentially) accurate, extra burden on users – e.g. demographic info (birthday, marriage status, job,
personal interests) – Users may not accurately report interests – Static profile versus changing user interests.
• Implicit Feedback Systems – Collect information while user is performing regular tasks – Open Web personalization needs extra software capturing
user activity.47
What Kind of Implicit Feedback?
• Tracking of regular activities – Clicks– Time spent – Scrolling and mouse movement – Eye tracking, etc.
• Enabling and tracking additional interest-bearing activities – Bookmarking (e.g., del.icio.us)– Downloading (Scienstein paper recommender) – Annotating (Knowledge Sea, Brusilovsky) – Q & A 48
Info Collection: Browser Cache and Proxy Server
• Browsing histories collection: – users share browsing caches on a periodic basis – users install a proxy server as gateway to Internet,
capturing all Internet traffic generated
• Disadvantages: – Sharing histories: too much work from the user – shared with one particular Website only– Typically from a single computer. Otherwise:
• Share browsing cache from multiple computers • Install same proxy server on each computer • Use a login system with same user profile
49
Info Collection: Browser Agent• either standalone application with browsing capabilities
or plug-in to existing browser (i.e., HeyStaks) • Advantage:
– richer information about user (browsing history, actions performed: bookmarking, downloading, scrolling and mousing).
• Disadvantages: – users needs to install a new application or plugin– large software development/ maintenance investment– typically only available on that particular computer
• Or install on multiple computers and assure synchronization
50
Info Collection: Desktop Agents
• searches not limited to Web: includes databases, users personal documents (Google Desktop Search –
discontinued; Windows Search 4.0) – enhanced user profile
• Client-side approaches: burden on users to collect/share activity log (unless integrated w. OS)– Microsoft, Apple, and Google have been actively working on
it
51
Info Collection: Web and Search Logs
• Weblogs capture browsing histories of users at website – to adapt website based on user behaviour
• Search logs contain info about queries + date/time/result– For user profiling
• Advantage: user does not need to install a desktop application and/or upload their information to the service.
• Disadvantage: only activities at search site tracked, less info available.
• Heavily used by IBM, Google and Microsoft
52
• Last time: (finished) user information type, user information sources, user identification, user data collection
• Next time: UM representation, UM construction, UM issues
53
User modelling is always about guessing …
Concluding: What information is available?
54
Overview: UM1. Introduction
2. History
3. Information collection
4. User Model Representation1. Keyword Profiles
2. Semantic Network Profiles
3. Concept Profiles
55
Generic User Modelling Techniques
• Rule-based frameworks
• Frame-based frameworks – (incl. Keyword Profiles)
• Network-based frameworks – (incl. Semantic Networks and Concept Profiles)
• Probability-based frameworks
• Sub-symbolic techniques
• Example-based frameworks 56
Keyword Profiles• from web pages visited, bookmarked, saved, explicitly provided• Bag-of-words
– most popular words– may be associated with numerical weight representing importance
• Profile vectors – overlay of a keyword vector– 0-1 vector – Weighted vector
• Benefits – Simplicity
• Shortcomings: – Words may have multiple meanings. Same idea expressed by different
words. polysemy, synonymy, >> ambiguous, inaccurate 57
0-1Keyword profiles• Rows represent document terms
• Columns represent users
• User 1 liked document “the cat is on the mat”
• User 2 liked document “the mat is on the floor”
User 1 User 2
Cat 1 0
floor 0 1
mat 1 1
Weighted Keyword Profiles
Term 1 Term 2 Term n
User 1 W11 W12 … W1n
User 2 W21 W22 … W2n
…
User m Wm1 Wm2 … Wnm
Advanced Keyword Profiles• Deal with shortcomings: synonymy, polysemy,
interest drift • Examples:
– Representing user as a set of keyword vectors (e.g., one per bookmark (interest))
– representing each interest with three keyword vectors, i.e., a long-term descriptor and two short-term descriptors, one positive and one negative
– separate profiles for each topic and distinguish short and long-term profiles
Domain (topic)-based UM
Art
0.60 0.72 … 0.45 0.33
Portrait Sculpture … Watercolor Painting
Sports
0.88 0.27 … 0.79 0.33
Soccer Bat … Touchdown Score
Music
0.15 0.87 … 0.31 0.63
Rock Symphony … Score Orchestra
Semantic Network Profile• polysemy => weighted semantic network:
– node contains particular word found in corpus, arcs represent co-occurrences of two
• To increase accuracy, words can be grouped in “synsets”. – nodes are “synsets”, – arcs are co-occurrences of the “synsets” members within a document
of interest to the user, – node and arc weights represent the users level of interest.
62
Semantic Network Profile• Advanced relevance network for query expansionJava -> Java and programming -> Java and (programming or development)
63
Concept Profile• Similar to semantic network: with nodes and arcs. • nodes represent abstract topics considered
interesting to the user, not specific words (sets)• hierarchical concepts enable generalizations. • simplest: constructed from reference taxonomy
(WordNet) or thesaurus• complex: from reference ontology • levels in hierarchy: fixed, or change dynamically
according to users interests.
64
Concept profile over news taxonomy• For each domain concept (or taxon) an overlay
model stores estimated level of user interests
65
Additional types of UM representation and classification
• Overlay: user’s characteristics are a subset of expert system’s characteristics (e.g. knowledge of a domain)
• Predictive: run models and see which best fits user’s input
• Analytic: analyse user’s input and see which features exist and try and account for structure (data mining)
66
Overview: UM1. Introduction
2. History
3. Information collection
4. User Model Representation
5. User Model Construction– Building Keyword Profiles – Building Semantic Network Profiles – Building Concept Profiles
67
Building Keyword Profiles1. extracting keywords from Web pages collected.
2. keyword weighting1. popular weighting scheme: Tf*Idf (information retrieval
theory)
2. Latent Semantic Indexing (LSI)
3. Linear Least Squares Fit (LLSF) for creating the keyword-based feature vectors.
3. number of words frequently capped: only top most highly weighted terms from any page contribute to profile.
68
TF*IDF• term frequency tft,d of term t in document d = number
of times that t occurs in d.
• document frequency dft,D of t = number of documents that contain t in the whole corpus D.
• N = number of documents in D.
• inverse document frequency idft,D of t in D:
• The tf-idf weight of a term is the product of its tf weight and its idf weight.
)|d|/( log idf Dt,10, NDt
Dtdt idftfDdt ,,,,
w
Latent Semantic Indexing• Principle:
– words that are used in the same contexts tend to have similar meanings.
– Uses SVD (singular value decomposition).– Overcomes synonymy and polysemy.
• Algorithm:– Constructs term-document matrix of occurrences (term: row;
doc:column): sparse, large; applies weighting with local term weight (tf in doc) and global weight (df in corpus) –with various fcts (binary, log, etc.)
– Rank-reduced SVD (finds relationships between terms): transforms matrix in 3 matrixes: term-concept, singular values, concept-document; truncates it to largest k (100-300) entries in sing. val.
Linear Least Squares Fit• finding the best-fitting curve to a given set of points by
minimizing the sum of the squares of the offsets ("the residuals") of the points from the curve
For obtaining keyword-based feature vectors
Building Semantic Network Profile
• keywords added to a semantic net– if keyword already in, node’s score increased by
user’s feedback (or decreased, for negative feedback). – otherwise, a new node is created.
• update the weights on the keyword
co-occurrence arcs.
72
Building Concept Profiles• Similar to semantic network; however:
• Usually, a reference ontology (often, hierarchy) is used as a base
• User profiles can be a weighted mapping over the ontology
73
Overview: UM1. Introduction
2. History
3. Information collection
4. User Model Representation
5. User Model Construction
6. Issues1. Privacy
2. Profile exchange
3. Profile editing74
Privacy Issues• Personal user information: critical data
– Where is it stored?• Local machine • Not at all
– How is it protected?– Can users alter it?– Real identity? Country laws?
• Anonymity: session ids, cookies, login w. pseudonyms
75
Profile Exchange• multiple systems collect user info
– Integrating, exchanging profiles => better personalization
– Ubiquitous User Modelling– Ontologies for profile exchange
• GUMO (follow-up)
76
Who Maintains the Model/Profile?
• user/administrator– Sometimes the only choice
• system (automatic personalization)
• collaborative - user and system– User creates, system maintains– User can influence and edit– Does it help or not?
77
Conclusions• accurate representation of user is crucial to performance of
personalized information access systems • surveyed user modelling definitions, history, popular
techniques for collecting user formation, representing and building user models, issues (privacy)
• On-going research topics... – How to improve profile accuracy? – How to quickly achieve profile stability? How to identify major/minor,
long-term/short-term interest of users? – How to determine appropriate level of depth in the interest hierarchy
in user profile...?
78