Ufa State Aviation Technical University
Distributed Collaborative Filtering System
as a Prototype of a New Information Messaging Media
Ufa, 2007
Paranoia: a web-based blog and RSS aggregation system
Grigory A. Makeev
2
Information messaging
A person, being an element of a social system, needs adequate information to interact with others. We therefore suppose that every person wishes to receive information messages that are:
• important from his own point of view (selectivity);
• delivered in time (operativeness);
• complete, covering most of the existing important messages (pervasion).
However, natural limitations are evident:
• importance can be estimated only by the user himself;
• there are too many messages to handle in time;
• there are too many messages to process them all.
3
Hypothesis: collaboration
At least until the semantics of natural languages can be processed effectively, the importance of a message will always have to be estimated manually at first, by a human user.
• A single user has to process messages manually.
• Many collaborating users can effectively process a large set of messages, exchanging the important messages they find.
• Can the importance of a message be estimated only once?
• Can a user use (trust) an estimate made by arbitrary other users?
4
Collaborative filtering problem
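[Diagram: users U1, U2, …, Un and the message set M = {m1, m2, m3, m4, …, mj}. Each user has a preference function P1(mk), P2(mk), …, Pn(mk) and recommends a subset of M, e.g. {m1, m3}, {m1, m2}, {m4, m2}. For user Ui, with preference function Pi(mk), the recommended set { ? } is unknown.]
Building a recommendation
[Diagram: for user Ui, a subset M′ ⊆ M is selected from M to fill { ? }, using models and methods of recommender systems, subject to restrictions.]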
5
Recommender systems
Search engines:
• Google
Web-based recommender systems:
• GroupLens
• IOwl
Online stores:
• Amazon
• eBay
Resources with elements of social networks
General drawbacks of existing collaborative filtering systems:
• recommendations are built using data from all users, so the result has poor selectivity;
• centralization;
• vulnerability on the logical and physical layers;
• users lack control over the process;
• users lack an explanation of the results;
• the systems do not allow an objective estimation of efficiency.
Approaches to recommender systems:
• Content analysis
• Recommendation support systems
• Social data mining
• Collaborative filtering
6
An approach to collaborative filtering
• Users U1, U2, …, Ui;
• Every Ui controls a peer of a p2p network, identified by a pair of security keys;
• Every Ui manages a set of messages Mi;
• If a message is in Mi, Ui is said to recommend this message;
• Only user Ui may manage the messages of the set Mi;
• Other users may retrieve Mi, thereby receiving a recommendation from Ui.
Data structures: messages
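[Diagram: user Ui holds a public key, a private key and a user name. The messages of Ui are grouped by channel (e.g. "Climate"); each message (e.g. "February 13th, a strong hurricane approached central Antarctica") carries a signature (Sgntr). Example profile fields: UserName: Ivanov I.I.; Location: Ufa; Language: Russian; ...]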
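The message data structure on this slide can be sketched roughly as follows. This is only an illustration: the class and field names (`Message`, `User`, `channels`, `recommend`) are hypothetical, and the signature is kept as opaque bytes because the slides do not specify a signature scheme.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Message:
    text: str
    # Signature made with the author's private key; the scheme is not
    # specified in the slides, so it is stored as opaque bytes here.
    signature: bytes


@dataclass
class User:
    name: str
    public_key: bytes
    # Channel name -> messages the user recommends in it (the set Mi).
    channels: Dict[str, List[Message]] = field(default_factory=dict)

    def recommend(self, channel: str, message: Message) -> None:
        """Add a message to the user's own set; only the owner does this."""
        self.channels.setdefault(channel, []).append(message)


ui = User(name="Ui", public_key=b"<public key bytes>")
ui.recommend(
    "Climate",
    Message(
        text="February 13th, a strong hurricane approached central Antarctica",
        signature=b"<signature bytes>",
    ),
)
print(len(ui.channels["Climate"]))
```

The private key is deliberately left out of the shared structure, since other peers only ever need the public key to check a signature.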
7
An approach to collaborative filtering
• Users U1, U2, …, Ui;
• Every Ui controls a set of rates Ri: pairs (Uj, vij), vij ∈ [0, 1], which may carry additional information, such as a channel;
• Only user Ui may manage the rates in Ri;
• Other users may retrieve Ri.
Data structures: rates
[Diagram: rates of user Ui, grouped by channel]
Channel    User    Rate
Climate    Uj      0.9
...
Society    Uk      0.7
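A rates set Ri with per-channel values could be represented as below. This is a minimal sketch under assumptions: the names `Rate` and `rates_in_channel` are hypothetical, and the range check simply enforces the stated constraint that vij lies in [0, 1].

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rate:
    user: str      # the rated user Uj
    value: float   # vij, must lie in [0, 1]
    channel: str   # additional information attached to the rate

    def __post_init__(self) -> None:
        if not 0.0 <= self.value <= 1.0:
            raise ValueError("rate value must be within [0, 1]")


def rates_in_channel(rates: List[Rate], channel: str) -> Dict[str, float]:
    """Return {rated user: value} for one channel of a rates set."""
    return {r.user: r.value for r in rates if r.channel == channel}


# The two example rows from the slide.
rates_of_ui = [Rate("Uj", 0.9, "Climate"), Rate("Uk", 0.7, "Society")]
print(rates_in_channel(rates_of_ui, "Climate"))
```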
8
An approach to collaborative filtering
Extending rates set and message aggregation
• Every user directly rates a limited number of users: those he knows, or those he is reasonably sure of;
• Transitivity lets us extend the set of users included in collaborative filtering for a particular user;
• The transitive rate is computed with a special function TRF(Ui, Uj, Uk), which is to be found;
• Messages retrieved from all users included in the filtering process are sorted by how many users recommended them and by the rates of those users;
• The aggregation function AMF(m, R*i) is also to be found.
[Diagram: a chain of rates Ui → Uj → Uk with rates 0.9 and 0.8; TRF(Ui, Uj, Uk) = 0.8 * 0.9 = 0.72]
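The slide's example, TRF(Ui, Uj, Uk) = 0.8 * 0.9 = 0.72, is consistent with taking the product of the direct rates along the chain. The sketch below assumes exactly that; the slides present the product only as an example and say the actual TRF is still to be found, so `trf_product` is a hypothetical name for one candidate.

```python
from math import prod
from typing import Iterable


def trf_product(path_rates: Iterable[float]) -> float:
    """Candidate transitive rate: the product of direct rates on a chain.

    With every rate in [0, 1] the result stays in [0, 1] and can only
    decrease as the chain gets longer.
    """
    return prod(path_rates)


# Ui rates Uj, Uj rates Uk; the two values are 0.9 and 0.8 as on the slide.
print(round(trf_product([0.9, 0.8]), 2))
```

A product is a natural candidate because trust through an intermediary never exceeds trust in the intermediary itself.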
9
A proposed scheme of collaborative filtering
Stage 1
1. The user computes an extended rates set of sufficient depth.
[Diagram: rate graph over users UA, UB, UC, UD, UE with edge rates 0.9, 0.8, 0.8, 0.7]
Extended rates set: (UD, 0.72, 1), (UB, 0.9, 0), (UC, 0.63, 1), (UE, 0.8, 0)
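Stage 1 can be sketched as a breadth-first walk over the rate graph. Several things here are assumptions rather than facts from the slides: the edge assignment (UA→UB 0.9, UA→UE 0.8, UB→UD 0.8, UB→UC 0.7) is inferred from the listed triples, the third element of each triple is read as the number of intermediaries, the transitive rate is taken as a product, and when a user is reachable by several paths the highest rate is kept.

```python
from collections import deque
from typing import Dict, Tuple

# Direct rates, inferred from the numbers on the slide (an assumption).
direct = {
    "UA": {"UB": 0.9, "UE": 0.8},
    "UB": {"UD": 0.8, "UC": 0.7},
}


def extended_rates(
    direct: Dict[str, Dict[str, float]], root: str, max_depth: int
) -> Dict[str, Tuple[float, int]]:
    """Return {user: (transitive rate, depth)} for users reachable from root.

    depth is the number of intermediaries: 0 for directly rated users.
    """
    best: Dict[str, Tuple[float, int]] = {}
    queue = deque([(root, 1.0, -1)])
    while queue:
        user, rate, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour, value in direct.get(user, {}).items():
            if neighbour == root:
                continue
            candidate = rate * value
            if neighbour not in best or candidate > best[neighbour][0]:
                best[neighbour] = (candidate, depth + 1)
                queue.append((neighbour, candidate, depth + 1))
    return best


for user, (rate, depth) in extended_rates(direct, "UA", max_depth=1).items():
    print(user, round(rate, 2), depth)
```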
10
A proposed scheme of collaborative filtering
Stages 2-3
2. Retrieving messages from many peers, the user builds the extended message set M*i, the unsorted result of collaborative filtering;
3. Calculating a value for every message, the user builds the extended message set MR*i, the sorted result of collaborative filtering.
[Diagram: UA with extended rates UD 0.72, UB 0.9, UC 0.63, UE 0.8; recommended sets UD {m1, m2}, UB {m1, m3}, UC {m4, m5}, UE {m1, m4}]
Unsorted result M*i:
m     U     v
m1    UD    0.72
m2    UD    0.72
m1    UB    0.9
m3    UB    0.9
m4    UC    0.63
m5    UC    0.63
m1    UE    0.8
m4    UE    0.8
Sorted result MR*i:
m     U             v
m1    UD, UB, UE    2.42
m4    UC, UE        1.43
m3    UB            0.9
m2    UD            0.72
m5    UC            0.63
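The values for m1 (0.72 + 0.9 + 0.8 = 2.42) and m4 (0.63 + 0.8 = 1.43) match a plain sum of the recommenders' rates, so the sketch below assumes the aggregation function is that sum. The slides leave the actual AMF open, and `aggregate` is a hypothetical name. Under this sum, m2, recommended only by UD, gets 0.72.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Extended rates of the filtering user and the sets retrieved from peers.
extended = {"UD": 0.72, "UB": 0.9, "UC": 0.63, "UE": 0.8}
recommended = {
    "UD": ["m1", "m2"],
    "UB": ["m1", "m3"],
    "UC": ["m4", "m5"],
    "UE": ["m1", "m4"],
}


def aggregate(
    extended: Dict[str, float], recommended: Dict[str, List[str]]
) -> List[Tuple[str, List[str], float]]:
    """Sum the rates of all recommenders per message, highest value first."""
    totals: Dict[str, float] = defaultdict(float)
    sources: Dict[str, List[str]] = defaultdict(list)
    for user, messages in recommended.items():
        for message in messages:
            totals[message] += extended[user]
            sources[message].append(user)
    ranked = sorted(totals, key=lambda m: totals[m], reverse=True)
    return [(m, sources[m], round(totals[m], 2)) for m in ranked]


for message, users, value in aggregate(extended, recommended):
    print(message, ",".join(users), value)
```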
11
A proposed scheme of collaborative filtering
Stages 4-5
4. The user updates his own set of messages Mi;
5. The user updates his own set of rates.
m     U             v
m1    UD, UB, UE    2.42
m4    UC, UE        1.43
m3    UB            0.9
m2    UD            0.72
m5    UC            0.63
[Diagram: updated data of user Ui. Rates, by channel: UB 0.4, ..., UE 0.8. Messages, by channel, each with a signature: m1, ..., m4, ..., m6, ...]
12
Advantages of the approach
Features of a system implementing the proposed approach:
• Decentralization
• Anonymity of authors
• Authors can prove their identity and their ownership of a message
• Selectivity
• Controllability
• Explainability
• Flood resistance
• Antagonistic societies can coexist and even collaborate
13
Results of the formal analysis and experiments
• Criteria of controllability and persistency for users and messages were found and formalized;
• Several transitivity functions TRF and message aggregation functions AMF were found and checked against these criteria, and the best one was chosen;
• A system of virtual users that seek and exchange important messages was created:
  • messages were represented as numbers;
  • every user had a favourite number;
  • users built their sets of trusted neighbours as they went, starting from a random rates set or a preset one;
  • users aimed to collect the most favourable messages;
  • an objective efficiency of the system was calculated;
  • dependencies of the efficiency on many factors were investigated.
14
Proposed prototype implementation
• HTTP instead of p2p network protocols
• DNS routing instead of ad hoc p2p naming and routing protocols
• A web server instead of a p2p node
• Users sharing common web servers instead of users on p2p nodes
• RSS as the message delivery protocol
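With RSS as the delivery protocol, retrieving a peer's messages reduces to fetching a feed over HTTP and reading its items. The sketch below covers only the parsing step, on an inline sample, using Python's standard `xml.etree.ElementTree`. The feed content, the example.org URLs and the name `read_items` are hypothetical, and this is not Paranoia's actual code.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# A hypothetical RSS 2.0 feed, as a blog on some server might publish it.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Blog 1</title>
    <link>http://example.org/blog1</link>
    <description>Hypothetical feed</description>
    <item>
      <title>a message</title>
      <link>http://example.org/blog1/1</link>
    </item>
    <item>
      <title>another message</title>
      <link>http://example.org/blog1/2</link>
    </item>
  </channel>
</rss>"""


def read_items(rss_text: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (title, link) for every <item> element in an RSS document."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


for title, link in read_items(SAMPLE_FEED):
    print(title, link)
```

In a real deployment the text would come from an HTTP request to the feed URL, and feeds from untrusted servers would need a hardened XML parser.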
A web-based RSS aggregator
It looks like a web-based RSS aggregator, but a typical RSS aggregator:
• does not actually "aggregate"; it merely "collects".
It looks like a typical web-based collaborative filtering system, but most such systems:
• use a "general" reputation, influenced by everyone;
• are server-based and centralized;
• are not customizable.
As a working prototype we propose Paranoia, an open-source (GNU GPL) web-based RSS aggregator, available at
http://greg.southural.ru/paranoia/
15
Proposed prototype implementation
An open-source web-based RSS aggregator - Paranoia
[Diagram: a Paranoia server hosts Blog 1 (a list of messages) and publishes it as HTML and as RSS; the RSS feed is consumed by LiveJournal (Syndicated feeds), by another Paranoia server and by an RSS aggregator.]
A Paranoia blog is accessible both in browsers and in RSS aggregators:
• in LiveJournal, through the Syndicated feeds feature;
• on another Paranoia server;
• in any other RSS aggregator.
16
Proposed prototype implementation
An open-source web-based RSS aggregator - Paranoia
[Diagram: "My news" on a Paranoia server is assembled from RSS feeds: Blog 2 (a list of messages), another Paranoia server and LiveJournal.]
Paranoia can aggregate messages from different sources: users of the same system, users of a remote Paranoia system, users and communities of LiveJournal, and arbitrary RSS feeds.
17
Proposed prototype implementation
An open-source web-based RSS aggregator - Paranoia
[Diagram: on a Paranoia server, both "My news" (the aggregated news) and "My messages" (the user's own messages) are available as RSS and as HTML.]
18
Proposed prototype implementation
An open-source web-based RSS aggregator - Paranoia
[Diagram: example rates given to sources A, B and LJ (LiveJournal). A: channel "politics", rate 0.5; channel "handiwork", rate 0.2. B: channel "handiwork", rate 0.8. LJ: channel "main", rate 0.2.]
This rate means the following: "I want to receive messages from user A in the channel 'politics', and I value him at 0.5 in this channel."
19
Proposed prototype implementation
An open-source web-based RSS aggregator - Paranoia
A news item is ranked higher the more users it has come from and the higher you value those users.
[Diagram: a user rates sources A, B, C, D, E, ... located on Paranoia servers and LiveJournal, in channels such as "politics", "handiwork", "knitwork" and "main", with rates from 0.2 to 0.8. The resulting "My news" list is sorted by value: 0.5, 0.4, 0.4, ...]
20
Proposed prototype implementation
An open-source web-based RSS aggregator - Paranoia
[Diagram, shown in two variants: users A, B, C, D, E, F on a Paranoia server and a news feed.]
• A and B process the news feed on their own, so they cannot use each other's labour.
• A and B collaborate in processing news feeds, so a message coming from both a feed and a fellow user receives a higher rank in the news result set.
21
Proposed prototype implementation
Non-trivial features
The environment turns out to be very flexible, and many tasks can be solved trivially within it:
1. Administrator notifications: every user automatically rates the local administrator in the channel 'system'.
[Diagram: rate in channel "system": 0.5. Admin messages in "system" (notification1, notification2, notification3, ...) appear in the user's news in "system".]
2. User feedback: the local administrator automatically rates every user in the channel 'feedback'.
[Diagram: the administrator rates users (A, B, C, D, ...) 0.1 in channel "feedback". User A messages in "feedback": feedback1, feedback2, ... User B messages in "feedback": feedback1, feedback3, ... Admin news in "feedback": feedback1 0.2, feedback2 0.1, feedback3 0.1, ...]
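The feedback example works out as a sum of rates: feedback1 is sent by both A and B, each rated 0.1 by the administrator, so it scores 0.2, while feedback2 and feedback3 score 0.1 each. The sketch below assumes that summing rule and a uniform automatic rate; the names `FEEDBACK_RATE` and `admin_feedback_news` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Rate the administrator automatically gives every user in 'feedback'.
FEEDBACK_RATE = 0.1

# Messages each user placed in their own 'feedback' channel (from the slide).
feedback_messages = {
    "A": ["feedback1", "feedback2"],
    "B": ["feedback1", "feedback3"],
}


def admin_feedback_news(
    messages_by_user: Dict[str, List[str]], rate: float = FEEDBACK_RATE
) -> List[Tuple[str, float]]:
    """Rank feedback items by the summed rates of the users who sent them."""
    totals: Dict[str, float] = defaultdict(float)
    for messages in messages_by_user.values():
        for message in messages:
            totals[message] += rate
    # sorted() is stable, so ties keep their first-seen order.
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


for message, value in admin_feedback_news(feedback_messages):
    print(message, value)
```

Feedback reported by several users rises to the top of the administrator's list without any dedicated feedback mechanism.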
22
Proposed prototype implementation
Non-trivial features
3. Comments on messages are merely one's own messages, stored in a special channel:
• comments do not leave their creator's peer;
• comments are retrieved when needed, following the same rules as any other message.
[Diagram: a rate of 0.5 in channel "politics". User B messages in "politics": message2, ... User B messages in "comments": comment1, ... User A news in "politics": message1 with comment1 and comment2, message2, ...]
4. If comments are retrieved only from trusted peers and are not stored locally:
• no one except trusted peers can spam the discussion;
• different groups, with rates among group members, can discuss the same message without interfering with each other.
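Retrieving comments only from trusted peers amounts to a filter over the peers a reader rates. In the sketch below the peer names, the `(target message, text)` layout of a comments channel and the function `visible_comments` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Each peer keeps its own 'comments' channel: (target message id, text).
comment_channels: Dict[str, List[Tuple[str, str]]] = {
    "B": [("message1", "comment1")],
    "C": [("message1", "comment2")],
    "X": [("message1", "spam")],
}


def visible_comments(
    message_id: str,
    trusted_peers: List[str],
    channels: Dict[str, List[Tuple[str, str]]],
) -> List[Tuple[str, str]]:
    """Collect comments on a message, but only from peers the reader rates."""
    return [
        (peer, text)
        for peer in trusted_peers
        for target, text in channels.get(peer, [])
        if target == message_id
    ]


# A reader who rates B and C never retrieves X's comment.
print(visible_comments("message1", ["B", "C"], comment_channels))
# A different group, rating only X, sees its own discussion of the message.
print(visible_comments("message1", ["X"], comment_channels))
```

Because nothing is stored centrally, the two groups read different discussions of the same message, which is the non-interference property in point 4.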
23
Conclusion
In our opinion, messaging systems (news messaging or otherwise) will evolve gradually:
• to be distributed among many storages;
• to have many original sources of information,
  • with an emphasis on direct witnesses;
• to implement collaborative filtering that is
  • specific to every user;
  • controllable by every user;
  • resistant to most types of malicious behaviour.
Thank you!