intelligent information integrationtakeda/lecture-kss/iii2007e-part2.pdf · zmain task: how to...

Hideaki Takeda / National Institute of Informatics NII

Intelligent information integrationPart II: Information Gathering

Hideaki Takeda

National Institute of Informatics

takeda@nii.ac.jp

http://www-kasm.nii.ac.jp/~takeda/

Intelligent Information Integration

Information Gathering

Information Integration

Information Distribution

Information Gathering

Model: Relationship between agents and information sources

Main task: how to provide access from the agent to information sources

Methods for Information Gathering

Information Retrieval

Explicit specification of users needs

Currently available information sources

Information Filtering

Dynamic information sources

Not available at the current moment, but available in future

Browsing

Implicit specification of users needs

Browsing

Characteristics as Information Gathering

Users initiative

Applicable even with vague purpose

Cons: No warranty to reach the goal

Human habit: up to down, side trip

Human limitation: Sequential access

Problems

How to support users with keeping users initiative

How to obtain users preference

Browsing

Problems

How to support users with keeping users initiative

How to obtain users preference

Web Watcher

Letizia

Syskill & Webert

InforSpinder

Web Watcher

Use of machine learning

Learn users preference from browsing process

Functions

Recommendation of links which the system infers useful to the user among links in browsing pages

Recommendation of links which the systems infers useful to the user among all links

Web Watcher

Learning target

LinkUtility: Page×Goal×User×Link→[0,1]

UserChoice: Page×Goal×Link→[0,1]

Page: Keyword vector of 200 words extracted from pages

link: Keyword vector of 200 words from links and 100 words from texts surrounding links

Goal: Keyword vector of 30 words

Web Watcher

Learning Method

TFIDF(term frequency times inverse document frequency) [Salton] and other two methods

Results

2 or 3 times better than random recommendation

TF/IDF

To measure importance of document d within document set D by word t

W(d,t,D) = TF(d,t)･IDF(t,D)

TF(d,t): frequency of word t in document d

IDF(t,D) = log(|D|/freq(D,t))

freq(D, t): number of document in D in which t appears

Intuitively

More frequent the specific word appears, more important

But, more documents have the specific words, less important

Use of TF/IDF in WebWatcher

TF/IDF is calculated for the goal page

Then each vector for “userchoice” is calculated with TF/IDF values for words as weight

Welcome from the dean

0.0 Baseball0.7 welcome0.0 graphics0.1 the0.0 hockey…0.2 from0.0 space0.7 dean

TFIDF weights

Reinforcement Learning

A model that an agent can learn its behavior through trial-and-error in a dynamic environment

Learning Agent

Environment

Action state Reward

Basic elements in reinforcement learning

Environment

The environment is (partially) observable by sensors

A set of states described by some parameters

Reinforcement (Reward) function

Reinforcement or reward is feedback from the environment to the agent

The goal of reinforcement learning is to find a function to maximize rewards that it will receive in the future

Examples for reward: delayed rewards, minimal time …

value function

policy

Decide next action in each state

Mapping from state to action

Optimal policy: mapping from any state to the state where maximal reward is given

state value

Sum of rewards from the initial state to the state

Value function

Mapping from a state to a state value

It depends on policy

The given environment

-1-1-1

Optimal state value

-1-1-1

Q-Learning

R(s): reward at state s

Q(s, a): evaluation function for action a at state s

Delayed rewards

)]','([)'(),(

asQsRasQ

nactiona

=∑maxλ

1 2 3 4 Tγ γ γ γ

Q-learning in WebWatcher

Page: state

Hyperlink: action

Reward for page: how much it is interesting for user

It is measured by sum of words weighted by TFIDF

Letizia

Browser which learns in browsing

Functions

Recommend pages starting from the page which the user is currently browsing

Methods

Collect pages starting from the current page in breadth-first way

Stop by thresholds or if the current page is changed

Users profile

Letizia

Syskill & Webert

Browser which learn by Supervised Learning

User specify hot, cold, or lukewarm

System recommends either hot, cold, worth to look, or worth not to look

Selection of keywords

Calculation for keyword selection

( ) ( ) ( ) ( ) ( ) ( )[ ]E W S I S P W present I S P W absent I Sw present w absent, = − = + == =

( ) ( ) ( )( )I S p S p Sc c= −∑ log2

Selection of keywords

hot:30 cold:70

pr:20h&p:10 c&p:10

h&a:20 h&c:60ab:80

total:100

I I I I

h p c p

h a c a

= = = === = = =

= = = =

= − ⋅ + ⋅ =

0 3 0 7

0 5 0 5

0 25 0 75

052 0 36

05 0 31

088 0 2 1 08 081 0 03

| . | .

. ( . . . ) .

hot:30 cold:70

pr:20h&p:15 c&p:5

h&a:20 h&c:60ab:80

total:100

I I I I

h p c p

h a c a

= = = === = = =

= = = =

= − ⋅ + ⋅ =

0 3 0 7

0 75 0 25

0 25 0 75

0 52 0 36

0 31 0 5

0 5 0 31

0 88 0 2 081 0 8 0 81 0 07

| . | .

. ( . . . . ) .

InfoSpider[Menczer]

Adaptive information agent by Artificial Life

Agent follows links, then obtain energy dependent on page content and sometimes propagates

Identify pages related to the questions

InfoSpinder

Information Filtering / Recommendation system

Content-based filtering

Estimate users preference by comparing keywords in pages and users profiles

Social filtering / Collaborative filtering

Estimate users preference by collecting and analyzing preferences of many users

Problem definition

How to estimate missing information from the given matrix?

Each vector represents preference of each person

Some values are missing because she has not experience them

Estimate these values

Solution: Use similarity between users

Article Person A

１ 1 4 2 22 5 2 4 43 34 2 5 55 4 1 16 ? 2 5

Person B Person C Person D

Algorithm for Social filtering(correlation coefficient)

12)1(2

8.044111144

)2(12)1()1(212

=++⋅+−⋅−

−=++++++

−⋅+⋅−+−⋅+⋅−=

∑∑

−−

−−=

)1',1()',(

lklkkkl

mkmkkkCov

r KKσσ

Article Person A

１ 1 4 2 22 5 2 4 43 34 2 5 55 4 1 16 ? 2 5

Person B Person C Person D

−−=

))(()',(

lklkkkl

xxxxkkCov

xxσStandard deviation

Calculate relation between user k and k’ by correlation coefficient (相関係数）

Covariance

Algorithm for Cooperative filtering(correlation coefficient)

∑∑

kkkkklk

'''' )(

56.418.0

12)8.0()1(3' 6 =

+⋅+−⋅−

１ 1 4 2 22 5 2 4 43 34 2 5 55 4 1 16 ? 2 5

Article Person A Person B Person C Person D

GroupLensCollaborative filtering system for NetNews

Collaborative Filtering: Pros and Cons

Robust for content change

No need for content analysis

Applicable for non-text data

Few users actions

Just only evaluate items

“Cold start” problem

Massive evaluation data is need before reliable recommendation

No evaluation, no recommendation

Items without evaluation are never recommended

MovieLens

Content-based filtering: pros and cons

Precise recommendation is possible

For users

For providers

For providers: Difficulty to design profiles

For users: Difficulty for keeping users profiles

Update

Not adaptive for new contents

Lifestyle Finder

Recommendation for buying, visiting, etc

Estimate users preference by categorizing users to some clusters

Fewer feedback are needed for learning preference

Categorization of users with large-scale demographic data

Demographic data tell us a good example of users categorization

Ex., A survey categorizes Americans into 62 clusters by using U.S. census, magazine subscriptions, catalog purchases, …

Create larger clusters by merging the given clusters

Ex., “Prefer TV news” ⊃ “Prefer the specific TV news program”

Lifestyle Finder

intelligent information integrationtakeda/lecture-kss/iii2007e-part2.pdf · zmain task: how to...

Documents

kss truck scale

hideaki chijiiwa _ color harmony

kss especificacion

kss-consulting gbr presentation from the -kss-consulting...

kss-213c/kss-213q laser assemblies - vocopro...

model no. kss - 101 systems new catalog 1.pdf · model no....

catalogo kss

eduroam jp and development of upki roaming yoshikazu...

ikterus neonatorum mklh kss 3

kss global standard plastic injection tool standard kss...

albert health 2013 kss

hideaki catalog

kss greens - biz profile 2015

cin kss session

kss - welding defect

control center kss

kss consolidated storage agreement

kss 5thed chap5 problems

hideaki ishii - sc.dis.titech.ac.jp

quest kss engineering drawings